GB2540922B - Full resolution plenoptic imaging

Info

Publication number
GB2540922B
Authority
GB
United Kingdom
Prior art keywords
pixels
sensor
camera
pattern
pixel
Prior art date
Legal status
Active
Application number
GB1502601.6A
Other versions
GB2540922A (en)
GB201502601D0 (en)
Inventor
Vicente Blasco Claret Jorge
Lena Blasco Whyte Isabel
Victoria Blasco Whyte Carmen
Jorge Blasco Whyte William
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to GB1502601.6A
Publication of GB201502601D0
Publication of GB2540922A
Application granted
Publication of GB2540922B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06T 5/80
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/282 Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images
    • G06T 7/596 Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85 Stereo camera calibration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/204 Image signal generators using stereoscopic image cameras
    • H04N 13/207 Image signal generators using stereoscopic image cameras using a single 2D image sensor
    • H04N 13/232 Image signal generators using stereoscopic image cameras using a single 2D image sensor using fly-eye lenses, e.g. arrangements of circular lenses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/21 Indexing scheme for image data processing or generation, in general involving computational photography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10052 Images from lightfield camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation

Description

Full resolution Plenoptic Imaging
BACKGROUND
Field of application.
The present inventions relate generally to image capture and processing, and in particular to image acquisition with solid-state sensors, including CMOS and CCD, although they can also apply to traditional camera film.
Summary of the invention.
The present invention describes a method to compose a three-dimensional virtual image outside of a plenoptic camera, in the three-dimensional object world. In one of the embodiments, not only the image itself is recorded, but also the DOA (direction of arrival) of the light rays reaching the camera (or sensor), and post-processing is performed on the acquired image to obtain a "virtual image" outside of the camera (in the object world) that matches the real object world. The net result produces images that include not only all the pixels of the sensor, but also their distances in the real world.
Unlike plenoptic (or light-field) cameras, which only reach a small fraction of the full resolution of the sensor, the present inventions produce images with the same resolution as the sensor itself. Additionally, it would be desirable for the present inventions to allow advanced post-processing features such as re-focusing at different depths (after the shot), artificially sharpening or blurring different parts of the image, all-in-focus images (where different parts of the image are focused at different depths to yield a "sharp-image-everywhere"), calculation of distances to the real world, and generation and display of 3D images: stereo images (with active or passive glasses), autostereo images (without glasses, using multi-stereo displays) and future integral displays and light-field displays (without glasses). This field includes optical elements, image sensors and computing means to process the acquired images. Several embodiments can be used to implement the inventions described, using traditional discrete optical components and lenses, but also using micro-optic techniques, as for example the production of WLMC (Wafer Level MiniCameras), yielding extremely high quality, low bulkiness, high-volume production and low-cost cameras that can be used in mobile telephones, tablet computers, laptop computers, desktop computers, digital cameras and, in general, in consumer goods incorporating cameras; especially, but not exclusively, in combination with CMOS image sensors.
Description of the drawings.
The invention may be more completely understood in consideration of the detailed description of various embodiments of the invention that follows in connection with the accompanying drawings, in which:
Figure 1. Consequences of unfocused images on the pattern over the film (or over the sensor plane).
Figure 2. Prior art of a plenoptic sensor or camera.
Figure 3. Projection of rays over a plenoptic sensor in the case where the focused image plane would form beyond the MLA and sensor position.
Figure 4. Projection of rays over a plenoptic sensor in the case where the focused image plane forms before the MLA and the sensor position.
Figures 5.A, 5.B and 5.C. Way in which 3 pixels below every microlens in the microlens array (MLA) discriminate 3 different directions of arrival of the light beam.
Figure 6. Prior art showing a plenoptic lens that can be attached to an existing camera; the MLA (Microlens Array) is not near the sensor but is part of the lens.
Figure 7. Prior art showing a Bayer pattern of a sensor with square pixels of several colors (Green marked with a V, Red marked with an R, and Blue marked with an A). The smaller square at the center of every pixel is the active area and the X-shape is the projection of the pixel microlens as per figure 7.bis.
Figure 7.bis. “Pixel microlens” at the top of a pixel (two pixels at the bottom to show how microlenses of adjacent pixels might overlap), showing the total area of the pixel (covered by the microlens) and the active area (small square at the center of the pixel).
Figure 8. Top view of the bottom of figure 7.bis.
Figure 9. Prior art, Yotsuba pattern, similar to the Bayer pattern in Figure 7 but including additional B/W (Black and White) pixels marked with a B.
Figure 10. Top view of an array of 4x4 pixels below a lens in the MLA (Microlens array).
Figure 11. Lateral view of figure 10 showing a microlens (5) from the MLA at the top, a lower refraction index material (4), pixel microlenses (3), color filters for three different colors (6, 7 and 8) and the active area of the pixels (2) over a semiconductor substrate (1).
Figure 12. Extension of figure 11 showing 3 microlenses from the MLA (Microlens Array).
Figure 13. Different embodiment providing the same function as figure 12. The low refraction index material has been replaced by air (or other gas) and the MLA (5) is held in place by a separating wafer 4'.
Figure 14. Different embodiment in which, over a substrate (1), several photosensitive devices (2) have been built, covered by color filters of several colors (6, 7 and 8), pixel microlenses (3), a low refraction index material (4), an MLA (5) and several consecutive layers of low refraction index materials (4', 4'', 4''' and 4'''') alternated with high refraction index materials (5', 5'', 5''' and 5'''') forming 4 different lenses on top of the MLA.
Figure 15. Different embodiment providing the same function as figure 14, in which the low refraction index material has been replaced by air (or another gas). The MLA and the 4 different lenses are held in place by separating wafers 4''''.
Figure 16. x-u representation of rays in a camera, where x represents the position of the ray at the surface of the photosensor (or the film in a traditional camera) or the MLA in a plenoptic camera and u represents the position of the ray at the aperture of the main lens.
Figures 17.A and 17.B. x-u representation of rays in a camera: the left figure (A) shows the case where the film plane on which the image is formed falls after the parameterization plane x, and the right figure (B) shows the case where the film plane on which the image is formed falls before the parameterization plane x. The x-u representations of points in the object world are no longer single points (as in figure 16) but lines with a positive slope (bottom of figure 17.A) or negative slope (bottom of figure 17.B).
Figures 18.A, 18.B and 18.C. Graphs (not to real scale) showing how rays coming from single points at infinite distance from the camera hit different pixels below the central microlens of the camera depending on the direction of arrival: coming from the left (18.A), from the optical axis of the camera (18.B) and from the right (18.C).
Figures 19.A and 19.B. Square green patterns in the object world that hit the exact entire area of two adjacent green pixels in the Bayer pattern of a photosensor. Figure 19.A shows the patterns at twice the distance from the camera compared with 19.B; its square patterns are larger but drawn at a different scale.
Figure 20. Projection of a “white image over black background” over the pixels of a sensor with a Bayer pattern, used to compensate aberrations.
Figure 21. Explains the cos^4 light attenuation towards the sides of the sensor.
Figure 22. Conjugation of a red and a green pixel at a large distance of the camera on the object world.
Figures 23.A and 23.B. Airy patterns for circular lenses (23.A) and for circular lenses and square microlenses (top of 23.B).
Figure 24. Inter-pixel interference and SNR (signal-to-noise ratio) deterioration between adjacent pixels of different colors in an Airy pattern.
Figure 25. Maximum size of the Airy patterns that will not cause image-blurring in a Bayer pattern below a microlens.
Figure 26. Techniques to minimize inter-pixel interference between neighbouring green pixels in a Bayer sensor.
Figures 27.A and 27.B. Microlens aligned with a 5x5 pixel array (27.A) or misaligned (27.B).
Description of related art.
Figure 1 (extracted from the PhD dissertation of Dr. Ren Ng, provided as an addendum to the filing with application number PCT/US2007/003420) shows how, in a converging lens (and in general in any converging lens or group of lenses in series), a point image is formed at the "Best focus" position after the converging lens (top part of figure 1), and a situation where the point image is formed before the image plane (bottom part of figure 1), in which the sensor (or, more traditionally, the recording film) is located beyond the "Best focus" position and the beam has started to diverge. The net result is that in the second situation a point in the object world is recorded as a circle of confusion (instead of a point) on the sensor area, blurring the resulting image if a large number of points were imaged.
The advent of light-field (or plenoptic) cameras, exemplified in figure 2 (extracted from patent US 8,290,358 B1; inventor Todor G. Georgiev), located an MLA (Micro-Lens Array, marked with an arrow near the bottom of figure 2) a small distance before the sensor (or before the film in traditional cameras), with several sensor pixels (or several particles of minimum size on a film) below every one of the lenses in the MLA; the array of pixels appears in figure 2 below the MLA, nearly at the bottom of the camera.
Figures 3 and 4 (extracted from patent GB 2488905; inventors Norihito Hiasa and Koshi Hatakeyama) and figure 5 (extracted from Reference 1, H. Navarro et al.) show how the recording is made in a light-field camera such as that exemplified in figure 2 (if a film were used instead of photo-sensor pixels, the principles would be exactly the same). A plane 201 in the object world projects all its points through a lens 101 and an image plane 202 is formed inside the camera; however, two different situations are exemplified: the image plane 202 forming before the MLA (figure 4), which in a traditional camera would enlarge the recorded circles and yield a blurry image, and the case where the image plane would form beyond the MLA 102 and the recording pixels 103 (figure 3).
Figure 5 shows how light-field cameras record DOAs (Directions of Arrival), in a simplified version with 3 pixels per microlens (or 3x3 = 9 pixels per microlens if the MLA is composed of square two-dimensional microlenses). Assuming that the pixels are located at the focal distance of the lenses in the MLA (the discussion can be generalized for different distances), a beam of light coming from a point at infinite distance from the MLA and reaching the MLA perpendicularly would concentrate the incoming light at the central pixel of the lenses in the MLA (figure 5.C). If the light reaches the MLA travelling upwards, it will concentrate at the top pixel of every microlens (figure 5.B). If the light is travelling downwards, it will concentrate at the bottom pixel of every microlens (figure 5.A). Hence the MLA-pixel array of figures 5.A, 5.B and 5.C can discriminate 3 different DOAs in the "flatland" simplification shown in the figures (9 different DOAs with square microlenses and 9 (3x3) pixels below every microlens).
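As an illustrative sketch of the DOA discrimination just described (the microlens focal length, pixel pitch and pixel count below are placeholder values, not taken from this disclosure), the pixel reached by a parallel beam under a single microlens can be modeled as follows:

```python
import math

def pixel_index_for_doa(theta_deg, f_microlens_um=50.0, pixel_pitch_um=5.0, pixels_per_lens=3):
    """Toy model of DOA discrimination under one microlens.

    A parallel beam arriving at angle theta (measured from the microlens optical axis)
    focuses at a lateral offset of roughly f * tan(theta) on the pixel plane, assuming
    the pixels sit at the focal distance of the microlens.  Returns the index of the
    pixel that receives the focused spot (0 = central pixel, negative/positive = the
    two sides), or None if the spot falls outside this microlens.
    """
    offset_um = f_microlens_um * math.tan(math.radians(theta_deg))
    index = round(offset_um / pixel_pitch_um)
    half = pixels_per_lens // 2
    return index if -half <= index <= half else None

# A perpendicular beam lands on the central pixel, tilted beams on the side pixels,
# mirroring the three cases of figures 5.A, 5.B and 5.C.
for theta in (-5.0, 0.0, 5.0):
    print(theta, "->", pixel_index_for_doa(theta))
```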
The structures mentioned are used in what some call "Light-Field Cameras" (because these cameras do not only integrate the light hitting every pixel, but also sample the "light field" by sampling the DOAs). Others call them "plenoptic cameras" (because they sample the "plenoptic function" by parameterizing/sampling the light crossing two different planes: the plane of the MLA (Micro-Lens Array) and the plane of the aperture of the main lens, or, equivalently, the plane of the pixels). Others call it "integral photography", and still others "computational photography" (due to the computational effort involved in post-processing the images recorded at the sensors, as we exemplify below).
Whichever name is used, these structures and their associated procedures allow advanced features not available in traditional photography and video, some of which are described below:
1. From the information recorded it is possible to compute two different images corresponding to what a pair of human eyes would see if located at the place of the camera and looking in the same direction, opening the possibility of recording 3-dimensional images with a camera having a single lens and a single image sensor (or film), something formerly reserved for stereo cameras (two cameras looking in the same direction and separated from each other by about 6 centimeters, equivalent to the distance separating a pair of human eyes). The information recorded by "single-lens", "single-sensor" plenoptic cameras can be processed to drive 3D stereoscopic displays (displays sending two different images to the two human eyes by using special glasses that filter every image to its corresponding eye, either with polarized glasses, colored glasses (so-called anaglyph) or active glasses that change the properties of the two glasses in real-time interaction with the display), something formerly limited to stereo cameras duplicating sensors and optics. Whatever the display/glasses technology used, creating the 3D image in the human brain in the very same way as the brain does when it has two different points of view at the two eyes looking at the real world was formerly as simple as sending to the right eye the information recorded by the right camera and to the left eye the information recorded by the left camera. With light-field cameras a single image is recorded and the two images for the two eyes are computationally synthesized from the recorded information. Further evolutions of display technology allow "multi-stereo" (also called auto-stereo) images by sending multiple beams of stereo images (2 images for the two eyes) in several possible directions in front of the display, avoiding the need for glasses by spatially separating the light sent to the right and to the left eye from the display; even prototypes of "integral displays" (matched to the "integral cameras" we just described) allow a glasses-free display of 3D images by filtering the light departing from the display towards the eyes of the viewers with an MLA (Micro-Lens Array) at the display performing the opposite function of the MLAs used to record the images (MLAs 102 in figures 3 and 4), offering viewers more realistic 3D images than the ones obtained using stereoscopic techniques.
2. Several different computations can be performed with the recording of the DOAs. For example, in figure 4 it is possible to trace back the rays recorded by the pixels 103 and microlenses 102 to where we want the rays to be (as we know the propagation direction) instead of where they were recorded; in this way the image recorded by the MLA/pixel assembly can be re-computed to create what would have been the image at the left side of 102 (plane 202 in figure 4), obtaining a focused sharp image from unfocused, blurred recordings.
3. Figure 3 exemplifies the opposite: by re-tracing the rays to where they would have formed the image we can compute a clear, sharp image (that would have formed at the right side of the MLAs 102 and recording pixels 103) starting from the blurry image recorded by the assembly 102-103.
4. According to the procedures described in 2 and 3 above, it is possible to bring into focus any part of the image, at any possible focal plane, to the right or to the left of the MLA plane.
5. It is possible to compute "all-in-focus" images where all the parts of the image are completely sharp and in focus. There are several procedures to achieve this, more or less quick, and more or less demanding in terms of computational power. The easiest to understand is to compute a set of images called the focal stack: several different images computed at several different focal depths (or image planes); instead of the 2 planes computed in figures 3 and 4 we compute a larger set (maybe 16 images, maybe up to 1024 images or more, or any number we design), we analyze all the pictures of the focal stack to verify "contrast" in every part of the picture, and we compose a final picture by collecting from each image of the focal stack the areas with the highest contrast (the areas focused in that particular image of the focal stack), finally re-creating a "better picture" by choosing the "best patches" from every one of the images of the focal stack (see the sketch after this list).
6. Many other computations can be performed using computational-photography techniques. We summarized above the more specific techniques related to integral photography; however, a non-exhaustive list of the different image-processing functions that can be performed includes features such as contrast enhancement, dynamic range enhancement, noise reduction, thresholding, etc.
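By way of illustration only, the all-in-focus selection of item 5 above can be sketched as follows; the contrast measure (a smoothed absolute Laplacian), the window size and the focal-stack layout are assumptions of this sketch, not part of the disclosure:

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def all_in_focus(focal_stack):
    """focal_stack: array of shape (n_planes, height, width), greyscale images
    refocused at n_planes different depths.  Returns (composite, depth_index)."""
    stack = np.asarray(focal_stack, dtype=float)
    # Local contrast measure: smoothed absolute Laplacian (one of many possible choices).
    sharpness = np.stack([uniform_filter(np.abs(laplace(img)), size=9) for img in stack])
    best = np.argmax(sharpness, axis=0)          # index of the sharpest plane per pixel
    rows, cols = np.indices(best.shape)
    composite = stack[best, rows, cols]          # pick every pixel from its sharpest plane
    return composite, best                       # 'best' is also a coarse depth map
```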
The MLA is usually located very near the image sensor (as shown in figures 2, 3, 4 and 5); however, it is also possible to place the MLA (or lenslet array) further away from the plane of the image sensor and produce the same effect with intermediate lenses that achieve the same result: assigning a small number of DOA-discriminating pixels to every microlens in the lenslet array (as for example in figure 6, extracted from patent US 2011/0169994, from the inventors DiFrancesco, Selkirk, Duff, VandeWettering and Flowers). Some other implementations place a small number of lenses (and prisms) between the main lens and the object world.
The basic concepts of the plenoptic camera were already described by Lippmann in 1908; however, technological difficulties prevented significant advances and implementations until 2004, when Ng, Hanrahan, Levoy and Horowitz, in their patent US 2007/025074 (as well as in the "Stanford Tech Report" CTSR 2005-02), described a camera with a resolution equaling the number of microlenses used. In October 2004, Ng, Hanrahan, Horowitz and Levoy (patent US 2012/0300097) described a plenoptic sensor implementing the same functions as described above. Some of the first industrial implementations were introduced by Raytrix (www.raytrix.com) and by Lytro (www.lytro.com), a company founded by some of the Stanford researchers and authors of the aforementioned patents.
During 2008, Lumsdaine, Georgiev and Intwala described a design with higher effective resolution (prior implementations yielded only one pixel per microlens) than in the patents mentioned above ("A. Lumsdaine and T. Georgiev. Full resolution lightfield rendering. Technical report, Adobe Systems, January 2008", and patent US 2009/0041448).
At present, the main disadvantage of plenoptic or light-field cameras is their relatively low spatial resolution. As can be seen from figures 2, 3, 4 and 5, there are several pixels for every microlens (9 pixels per square microlens in figure 5, although actual implementations have around 100 pixels per microlens). The information recorded by the pixels is used to infer the DOA (Direction of Arrival) of the light rays, and thereafter to compute how to trace the rays forward (as in figure 3) or backwards (as in figure 4). It can be shown that with square microlenses of NxN pixels the number of exactly sampled and focused planes that can be calculated is 2N-1; hence, if we have 100 pixels per microlens (10x10 pixels with square microlenses), the number of exactly focused planes that can be inferred is 19 ((2x10)-1). Nineteen is a good number to obtain "all-in-focus" images, or to refocus at 19 different depths; however, if for example we adopted 4 pixels per microlens (2x2), the number of focused planes that could be obtained would only be 3 ((2x2)-1), which is not sufficient to obtain satisfactory re-focusing results or "all-in-focus" images. Taking an intermediate trade-off between both approaches, for example 49 pixels per square microlens (7x7), would yield 13 focused planes ((2x7)-1), which is a more reasonable number than only 3, yet still far from the 19 planes obtained with 100 pixels per microlens.
In summary, plenoptic or light-field cameras necessarily have to make a trade-off between spatial and directional resolution, and it is the directional resolution that enables all the processing advantages mentioned previously, including refocusing, all-in-focus images, 3D images, etc. If, for example, the objective is to obtain images with 1 Megapixel resolution, a 4 Megapixel sensor will be required to sample the DOAs (Directions of Arrival) that offer the data to calculate as few as 3 focused planes; a 49 Megapixel sensor will be required to compute 13 focused planes; and a 100 Megapixel sensor will be needed to compute 19 focused planes.
This is the main handicap of plenoptic or light-field cameras: spatial resolution is sacrificed for directional resolution. As explained above, to obtain 1 Megapixel images with a plenoptic camera one would need 49 Megapixels to refocus on 13 different planes, or 100 Megapixels to refocus on 19 different planes.
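The trade-off can be made concrete with a short calculation that merely restates the numbers quoted above (the helper function and its name are ours, introduced only for illustration):

```python
def plenoptic_tradeoff(target_megapixels, pixels_per_microlens_side):
    """For a conventional plenoptic camera with NxN pixels under every (square)
    microlens, the number of exactly focused planes is 2N-1 and the sensor must
    carry N*N physical pixels for every pixel of the final image."""
    n = pixels_per_microlens_side
    focused_planes = 2 * n - 1
    sensor_megapixels = target_megapixels * n * n
    return focused_planes, sensor_megapixels

# Reproducing the numbers quoted in the text for a 1 Megapixel output image:
for n in (2, 7, 10):
    planes, sensor = plenoptic_tradeoff(1, n)
    print(f"{n}x{n} pixels/microlens -> {planes} focused planes, {sensor} Mpixel sensor")
# 2x2 -> 3 planes, 4 Mpixels; 7x7 -> 13 planes, 49 Mpixels; 10x10 -> 19 planes, 100 Mpixels
```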
The distance between the "equivalent main lens" of the camera and the plane of the MLA (101 to 102 in figures 3 and 4) is usually equal to the focal distance of the main lens (101). The distance between the MLA plane and the sensor plane is usually the focal distance of the lenses in the MLA. Most of the features and limitations described above and below remain applicable even if those distances are reduced or increased; however, as the two planes (102 and 103) approach each other, directional resolution is traded for spatial resolution. In the extreme, when the distance between the MLA plane and the sensor plane is 0, the net result is the same as not having an MLA at all, as the light is sampled by the pixels immediately after the lens, without allowing the light rays any space to deviate from their original positions before the lens.
Sometimes, for practical geometrical reasons, the microlenses are designed with hexagonal shapes, because these polygonal shapes approach the form of circles (normally used for the main lens) and at the same time have a good fill factor (approaching 100% of the surface of the light path).
The image sensors in the prior art use color filters, as in figure 7 (showing a grid of 6x6 pixels). We can appreciate the so-called Bayer pattern composed of pixels with red filters (marked with an R), green filters (marked with a V) and blue filters (marked with an A). It is also possible to appreciate the active photosensitive area at the center of the pixels (the little squares drawn with thin lines at the center of every pixel). The X-shape on top of every pixel is the transparent "pixel microlens", which is explained below.
Figure 7.bis shows the 3-dimensional optical structure usually placed on top of every pixel. Every pixel usually has an associated microlens on top of it, whose function is to concentrate the light onto the active photosensitive area only (the small square drawn with thin lines at the center of the pixels in figures 7 and 7.bis), so that the quantity of light that hits the non-active area between pixels (normally used for polarization and reading circuitry) is negligible or zero. The top of figure 7.bis shows a pixel in which the percentage of active area vs. the total area of the pixel is relatively low; the bottom of figure 7.bis shows two adjacent pixels in which the active areas occupy a large percentage of the total area of the pixel (more usual with the advent of CMOS imagers reaching submicronic technologies). The microlenses on top of every pixel can be manufactured by depositing a transparent material and melting it, in which case they take a semi-spherical shape; however, more advanced processes use photolithographic methods to build more controllable geometries. In figure 7.bis we illustrate how the microlenses have been built using 4 different spherical sectors (instead of a semi-sphere) that intersect each other along four lines (seen as an X in the top views of figures 7, 8 and 9). Figure 7.bis also shows the edges where the lens of one pixel ends and the lens of the adjacent pixel starts. Figure 8 is the top view of two adjacent pixels (pixel limits drawn with thick lines; the active area is the square drawn with thin lines; and the X-shape with thin lines represents the projections marking the four spherical sections composing the pixel microlens).
Figure 9 shows a Yotsuba pattern that, besides having filters for the 3 fundamental colors (Red-Green-Blue), also has white pixels (marked in white and with a B).
Figures 10 and 11 explain how a plenoptic sensor (or light-field sensor) is built monolithically: below every lens 5 of the MLA (Micro-Lens Array), built with a material of high refractive index, there is a material 4 (of low refractive index) separating by a distance x2 the "plenoptic microlens" 5 from four "pixel microlenses" 3. Below the pixel microlenses there are three color filters 7, 6 and 8 (in reality 4x4 = 16 square filters), for example (but not exclusively) following the Bayer pattern (Red-Green-Blue), even if any other pattern could have been followed (for example using yellow, magenta and cyan as fundamental colors). Below the color filters there is an "active photosensitive area" 2 (where the photons are converted into electrons, causing an electrical current) built into a substrate 1 that can, for example, be a semiconductor substrate, and in particular a CMOS semiconductor substrate.
Figure 12 shows an embodiment in which a plenoptic sensor comprises three plenoptic microlenses 5 with four pixels below every plenoptic microlens (sixteen pixels in the case of square microlenses); in this case the distance between the plenoptic microlenses and the pixel microlenses is x (built out of a low refraction index material). Figure 13 shows an embodiment in which the low refraction index material is air (or any other non-corrosive gas with good properties regarding humidity penetration and high inertia to chemical reaction); in this structure (possibly built via wafer stacking: optical wafers on top of micro-electronic wafers) there are separators 4' (possibly separating wafers) guaranteeing that the physical distance x between the pixel microlenses 3 and the plenoptic microlenses 5 is held constant; the plenoptic microlenses 5 have holders at their edges holding the microlenses on top of the separators 4'.
Figure 14 is an example of a monolithic integration where high refractive index optical layers 5, 5', 5'', 5''' and 5'''' are built on top of low refractive index optical layers 4, 4', 4'', 4''' and 4'''', building a plenoptic structure on top of the semiconductor substrate 1 and photo-sensors 2, which together with the color filters 6, 7 and 8 and the pixel microlenses 3 constitute a plenoptic image sensor; the higher layers 5, 5', 5'', 5''' and 5'''' form several lenses on top of the sensors: the MLA 5, a convex-concave lens 5', a biconcave lens 5'', a concave-convex lens 5''' and a bi-convex lens 5''''. Figure 15 is a similar embodiment built with separating wafers 4'''' holding the lenses 5 in place.
It is important to note that plenoptic or light-field structures allow the calculation of the distances from the pixels inside the camera to their origins in the object world. If we parameterize the rays entering the camera by the places in which they cross two different planes (see figure 16, extracted from Dr. Ng's PhD dissertation), the x-y plane (y being the axis perpendicular to the paper and to the x-axis in figure 16) and the u-v plane (v being the axis perpendicular to the u-axis and to the paper in figure 16), a ray starting at the "World focal plane" and reaching the photo-sensor (or the film, or the MLA plane in a plenoptic camera) can be parameterized by an x-u graph (at the right side of figure 16). Even if for simplicity we only represent the x-u axes in "flatland", the extension to the x-y-u-v planes in a 3-dimensional world is evident, and we can continue the discussion without any loss of generality.
Figure 17.A (extracted from Dr. Ng's PhD dissertation) is a situation equivalent to figure 3: the rays departing from a point in the World focal plane (201) hit the MLA plane (102) and the sensor plane (103) before the point at which all the rays would converge on the Camera focal plane (202); unlike in figure 16, the rays in the x-u diagram show a correspondence between x and u which is not a single point but a line leaning towards the right (bottom of figure 17.A).
Figure 17.B (equivalent to figure 4) shows the opposite situation: the image plane (202) is located before the MLA (102) and the sensor (103). In this case the x-u diagram is a line leaning left (bottom of figure 17.B). An intermediate situation occurs when the image plane (202) lies exactly over the MLA plane (102), or over the film plane in a traditional camera. A straightforward triangulation (see figure 3) shows that:
d_101_202 = d_101_102 / (1 - s)
where:
d_101_202 is the distance from the main lens (101) to the image formation plane (202);
d_101_102 is the known, fixed distance from the main lens (101) to the MLA plane (102);
s is the slope in the x-u diagram of the line representing a ray departing from the World focal plane (201) and reaching the MLA plane (102), as shown at the bottom of figures 17.A and 17.B.
As a consequence, it is straightforward to calculate the distance at which the imaging plane (202) lies for every point in the World focal plane (201). The distance from the World focal plane (201) to the main lens of the camera (101) is then a straightforward calculation knowing the characteristics of the camera (just a conjugation of the image plane 202 into the object world through the lens 101 to find the distance at which the World focal plane (201) is located).
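A minimal numerical sketch of this triangulation and conjugation, assuming a thin-lens model for the conjugation step and placeholder values for the focal length, the lens-to-MLA distance and the measured slope:

```python
def image_plane_distance(d_lens_to_mla_mm, slope):
    """d_101_202 = d_101_102 / (1 - s): distance from the main lens (101)
    to the plane (202) where the rays of a given world point converge."""
    return d_lens_to_mla_mm / (1.0 - slope)

def object_distance(image_dist_mm, focal_length_mm):
    """Conjugate the image plane back into the object world with the thin-lens
    equation 1/f = 1/d_object + 1/d_image (a simplification of the real optics)."""
    return 1.0 / (1.0 / focal_length_mm - 1.0 / image_dist_mm)

# Placeholder example: a 50 mm main lens with the MLA placed 50 mm behind it,
# and a slope of 0.2 measured on the epipolar (x-u) line of some world point.
d_img = image_plane_distance(50.0, 0.2)   # 62.5 mm: the image forms beyond the MLA
d_obj = object_distance(d_img, 50.0)      # ~250 mm: the point sits about 25 cm away
print(d_img, d_obj)
```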
Hence the calculation of distances to the real object world can be performed starting from the slopes found in the x-u diagram and the distance between the main lens of the camera (or its equivalent in a multi-lens objective) and the MLA plane. A calculation of the distances to a real scene (vs. the simplified view of figure 3) is only slightly more complex, as we do not have a single World focal plane (201) with a single point, but real objects of different sizes situated at different distances from the camera, and in a plenoptic camera the rays converging towards the image plane in figure 3 (or having passed the image plane as in figure 4) are "recorded" as an array of color pixels on a sensor (103) after the rays have been deviated by the MLA (102).
There are many algorithms to calculate distances from the pixels to their equivalents in the real world in a plenoptic camera, most of them based on the identification of linear patterns in the x-u graphs in order to calculate the slope of the x-u line corresponding to a point in the real world. The fact that there is not just a single x-u graph but a collection of graphs (one for every value of y-v), representing the scene in a 3-dimensional world, adds slightly to the complexity of the explanation without any loss of generality; similarly, the pixels are not points of infinitesimal size but bi-dimensional photodetectors, which makes the lines at the bottom of figures 17.A and 17.B a bit wider than lines of "infinitesimal width" (as wide as a pixel in a photo-sensor).
The images captured by a plenoptic camera with sensors as in figures 3 and 4 (103), or sensors as in figures 7 to 15, capture the "Light Field". In a "Light Field", every value of the parameters x-y corresponds to its respective microlens (x-y), and every pixel below every microlens has a corresponding u-v parameter within that microlens that maps the x-y plane to the u-v plane (and hence to the aperture of the main lens) according to the direction of arrival of the rays.
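As an illustration of this x-y / u-v indexing (the array sizes and the row-major microlens layout are assumptions of the sketch, not a description of any particular sensor), a raw plenoptic capture can be reorganized into a 4D light field L[y, x, v, u]:

```python
import numpy as np

def raw_to_lightfield(raw, lenses_y, lenses_x, px_per_lens):
    """Reorder a raw plenoptic frame of shape (lenses_y*px_per_lens, lenses_x*px_per_lens)
    into a 4D light field indexed as L[y, x, v, u]:
    (y, x) selects the microlens, (v, u) selects the pixel under that microlens."""
    raw = np.asarray(raw)
    assert raw.shape == (lenses_y * px_per_lens, lenses_x * px_per_lens)
    lf = raw.reshape(lenses_y, px_per_lens, lenses_x, px_per_lens)
    return lf.transpose(0, 2, 1, 3)   # -> (y, x, v, u)

# Toy usage: 32x32 microlenses with 9x9 pixels under each one.
raw = np.random.rand(32 * 9, 32 * 9)
L = raw_to_lightfield(raw, 32, 32, 9)
print(L.shape)   # (32, 32, 9, 9)
```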
In an array of sensor pixels as in figures 7 or 9, if we re-order the pixels into x-u graphs for a fixed value of the pixel coordinates y-v, we obtain the so-called epipolar images (a different epipolar image for every value of y-v). It is also possible to obtain epipolar images showing y-v graphs for every value of x-u. Hence, to calculate distances to the real world we need to identify and measure slopes in the epipolar images and apply the calculation described above to extrapolate distances from slopes.
However, in images recording real scenes from the object world, the linear patterns in the epipolar images are not always as easy to identify and measure as in figures 17.A and 17.B. For example, if the sensor "records" a large uniform white pattern from the object world, it becomes impossible to identify any slopes. This is no surprise, bearing in mind that even the human brain and eyes, when "looking at" an infinite white plane in front of the observer, would not be able to discriminate distances. The higher the contrast between neighboring points, the easier it is for the human brain/eyes to perceive distances, and the easier it is to identify clear lines in the epipolar images. Many different algorithms have been developed to measure slopes in epipolar images, especially in low-contrast scenes, where very complex image-processing techniques have been applied.
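A deliberately simple sketch of slope measurement on an epipolar image is given below; it assumes an isolated bright feature over a dark background (the textbook case of figures 17.A and 17.B), whereas the robust algorithms mentioned above are far more elaborate. The light-field indexing follows the L[y, x, v, u] layout of the previous sketch.

```python
import numpy as np

def epipolar_image(lightfield, y, v):
    """Horizontal epipolar image: fix the microlens row y and the pixel row v,
    keep all microlens columns x and pixel columns u -> 2D slice of shape (u, x)."""
    return lightfield[y, :, v, :].T

def line_slope(epi):
    """Fit a straight line to the brightest position of every row of the epipolar
    image (adequate only for an isolated bright point on a dark background).
    The returned slope plays the role of s in d_101_202 = d_101_102 / (1 - s),
    up to the sampling conventions of the particular camera."""
    u = np.arange(epi.shape[0])
    x_peak = epi.argmax(axis=1)            # brightest x position for each u row
    slope, _ = np.polyfit(u, x_peak, 1)    # x ~ slope*u + intercept
    return slope
```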
DETAILED DESCRIPTION
The following text and the accompanying drawings describe preferred embodiments and particular ways to implement the inventions of this application; however, their purpose is only to illustrate the principles of the inventions, and the preferred embodiments, drawings and implementations should not be interpreted in a limiting way. The precise extent of the inventions must be determined by reference to the claims of this same document.
As we have mentioned, the main handicap of plenoptic or light-field cameras is that the images they produce have a much smaller number of pixels than the number of pixels of the sensor, because the number of useful pixels is limited to the number of microlenses; and while this number can be slightly increased through super-resolution and/or interpolation techniques, those techniques have limitations regarding the number of useful high-quality pixels reflecting the reality of the object world. For example, if a 1 Megapixel image is required and 9x9 pixels per microlens are used (a typical value), a sensor of 81 Megapixels is needed.
In contrast, the present inventions compute "virtual images" that correspond to the "real world" using all the pixels of the sensor, by re-tracing the rays hitting every pixel of the sensor back into the real world, with every pixel traced back to the distance we know it occupies in the real world, resulting in an image containing as many pixels as there are pixels on the sensor.
By way of introduction, it is important to note that a world point centered on the main lens of the camera, at infinite distance from the camera in a perpendicular direction, will produce a set of parallel rays hitting the main lens as in figure 18.B (please note the figure is not at real scale). If we locate the microlens plane 5 at a distance from the main lens 5'''' equal to the focal distance of the main lens (it is straightforward to generalize the discussion for a different distance), all those rays will hit the microlens at the center of the MLA array, and if the pixels (2-3-6-7-8) are located at the focal distance of the lenses in the MLA, they will all hit the central pixel of that microlens (for practical illustrative purposes figures 18 have five pixels per microlens, but could have any other number). Please note that if the distances 5''''-to-5 and 5-to-3 were not the focal distances of the main lens or of the microlenses, we would only change the sampling pattern of the light field outside and inside the camera, and the discussion would still be applicable. Please note also that in a real-world implementation the distance 5''''-to-5 is extremely large compared to the thickness and the lateral extension of a microlens 5, and that the distance 5-to-3 is large compared to the size of the pixel lens (3); figures 18 are not drawn at real scale because it would be impossible to appreciate the smaller features and the relative distances between lenses (microlenses vs. the main lens, and pixels vs. the microlens).
If the point at infinite distance is not centered on the main lens but slightly towards the left of the camera, its parallel rays reaching the main lens of the camera (5'''' in figure 18.A) will have a slight inclination (exaggerated in figure 18.A), reaching the central microlens at slightly different angles than in figure 18.B, being deviated in a slightly different direction than in 18.B, and hitting the pixel to the right of the central pixel of the microlens. If the point at infinite distance is slightly towards the right, the arrival pattern will be as in figure 18.C, and the rays will hit the pixel to the left of the central pixel, as shown in figure 18.C.
Please note again that figures 18.A to 18.C are not drawn to scale; the distances 5'''' to 5 are very large compared to the dimensions of the microlenses, so the rays from 5'''' to 5 coming from the same point in the object world are nearly parallel. The main purpose of the figures is to illustrate how points at infinite distance, coming from slightly different directions, are sampled by different pixels below the central lens of the MLA. If we keep moving the point at infinite distance towards the left, the incoming rays will eventually hit the last pixel below the central microlens. Moving it a bit further towards the left will cause the rays to hit the next microlens in the MLA array, and this second microlens will deviate all the rays to one of its pixels. As we tilt the incoming rays more and more, the receiving pixel in the second microlens will change, eventually reaching its last pixel, then moving on to the third microlens, and so on until the last pixel of the last microlens is finally reached. This process ends when the point at infinite distance moves outside the FOV (Field of View) of the camera.
In a real situation drawn with realistic scales, bearing in mind that the distance 5''''-to-5 is effectively infinite compared to the lateral size of every microlens 5 or its thickness, the rays between 5'''' and 5 hitting a single pixel 2-3 are nearly parallel (unlike those depicted in figure 18). In a real situation a pixel of the sensor will integrate all the rays coming from slightly different directions as deviated by the main lens and its microlens in the MLA array, and in a real-world situation not all the points of the object world can be considered to be at infinite distance; some objects will be nearer the camera.
Unlike in figure 18 (which considers a single point in the object world), the rays coming from the real world will hit the main lens of the camera 5'''' from many different directions, due to two different facts: they come from different points at infinite distance; or they come from single points nearer the camera, and the rays coming from those points are not parallel when they hit the main lens. Regardless, if the MLA is situated at the focal distance of the main lens (the discussion and the principles are also valid if the distance is different), an image of the World focal plane at infinite distance will be formed over the MLA array, and every lens of the MLA will separate all the directions of arrival onto a different pixel below it (discretized by the number of pixels below every microlens, sampling as many different directions of arrival as there are pixels below every microlens). With traditional plenoptic algorithms, if the distance from microlenses to pixels is not the focal distance of the microlenses, the number of sampled directions of arrival is lower, but the lateral resolution of the plenoptic camera is higher. In the present invention we maximize simultaneously the number of directions of arrival and the lateral resolution (which is the same as the number of pixels of the sensor), whereas the processes used in plenoptic cameras only maximize the number of directions of arrival. For example, if in both a plenoptic camera with traditional plenoptic algorithms and in our invention we wish to discriminate 81 different directions of arrival (with 9x9 pixels below every microlens) in order to obtain 17 "fully focused planes", a plenoptic camera would need an 81 Megapixel sensor (and 1 Megalens in the MLA) to obtain 1 Megapixel images, while we would only need a 1 Megapixel sensor and 32x32 lenses in the MLA (or a different number of microlenses in the horizontal direction (X) and the vertical direction (Y), so that X x Y = 1024, for images with a different horizontal-vertical ratio), representing a huge saving of resources. We explain below how this is achieved.
We have limited our discussion to the simple case where the microlenses are placed at the focal distance of the main lens, and the sensors at the focal distance of the microlenses. However, as stated above, the principles and arguments discussed herein are equally applicable and valid for different distances in both cases. It is evident that, for a given pixel, there exists a square pattern in the object world that emits a beam of light which, after crossing the camera main lens (5'''' in figures 18) and a single microlens (5), projects its beam of light over the entire area of that single pixel (3-2 in figures 11, 12, 13, 14, 15 and 18). If that square pattern is green it will only go through the green filters (6); if red, it will only go through the red filters (7); and if blue, it will only go through the blue filters (8), when the incoming light is sampled with a Bayer pattern (as in figure 7). Furthermore, the discussion is still valid when any other color pattern is used, or when letting all the visible light through in some pixels if using a Yotsuba pattern (as in figure 9), or for different wavelengths, such as for example infrared, if using different filters. However, the pattern that illuminates the entire area of a pixel is not unique; it can correspond to square patterns of different sizes located at different distances from the camera.
Similarly, applying the principle of optical reciprocity, the light sampled by a square pixel will come from a square pattern of the object world whose rays hit the pixel after being deviated by the main lens and a microlens, and the larger the number of pixels, the finer the granularity with which we sample the real scene in the real world. However, a priori, several patterns of several different square sizes at several different distances from the camera would be able to produce the same effect on that pixel. To illustrate the point, consider what patterns would be able to "equally illuminate", in exactly the same way, two collateral green pixels in a Bayer pattern as in figure 7 (the discussion is equally valid for different colors, for different color-sampling patterns or for different distances). Figure 19.B shows a pattern in the object world that illuminates the whole exact area of two collateral green pixels at a given distance from the camera (after crossing the main lens 5'''' and the plenoptic microlenses 5) and is sampled by that particular pair of pixels. Figure 19.A shows two different square patterns in the real world, at twice the distance from the camera compared with the patterns of figure 19.B, that illuminate the same exact area as before (or, in other words, would illuminate exactly the same area of those 2 pixels of the sensor). Please note that the squares at the left are larger than the squares at the right (the two images, 19.A at the left and 19.B at the right, are at different scales) due to the fact that they are further away from the camera.
While a plenoptic camera would use both pixels (plus all the other pixels below a given microlens 5) to produce a single pixel of a plenoptic image, the present inventions are able to separate them into two different pixels of the real object world; yet there is an infinite number of possible sizes of those patterns in the real world (located at an infinite number of possible different distances from the camera) that could have illuminated that exact pattern over the two pixels of the sensor.
Obviously, the discussion can be extrapolated to all the pixels of the sensor. All the pixels can be traced back into the real world to obtain an infinite number of virtual patterns in the object world that would have printed the pattern sampled by those pixels of the sensor of the plenoptic camera. Pixels of the sensor traced back into the object world will not overlap with each other (except at very short distances from the main lens, where rays coming from different directions overlap, but do not mix, over the same space), in the same way as the two pixels in figures 19.A and 19.B do not overlap each other; in other words, we can create a virtual image in the object world with as many non-overlapping pixels as pixels we have been able to sample at the sensor.
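The back-projection of a single pixel into the object world can be sketched with similar triangles through the lens center (a pinhole-style simplification; the pixel size and distances below are placeholder values, not taken from this disclosure):

```python
def backproject_pixel(px_center_mm, px_size_mm, d_image_mm, d_object_mm):
    """Trace a square sensor pixel back into the object world through the lens
    center (similar-triangles / pinhole approximation; a real design would use the
    full lens model).  Returns the center and side of the square world patch that
    would illuminate exactly that pixel when placed at distance d_object_mm."""
    magnification = d_object_mm / d_image_mm
    # The image is inverted through the lens, hence the sign change of the center.
    world_center = (-px_center_mm[0] * magnification, -px_center_mm[1] * magnification)
    world_side = px_size_mm * magnification
    return world_center, world_side

# A 5 um pixel imaged through a lens whose image plane sits ~50 mm behind it:
# at 1 m the conjugate patch is 0.1 mm wide, at 2 m it is 0.2 mm wide,
# echoing the doubling of the squares between figures 19.B and 19.A.
for z in (1000.0, 2000.0):
    print(backproject_pixel((1.0, 0.5), 0.005, 50.0, z))
```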
Furthermore, as stated above, a plenoptic camera allows the calculation of distances to the real world through a straightforward triangulation; referring to figure 3, we have stated that:
d_101_202 = d_101_102 / (1 - s)
where:
d_101_202 is the distance from the main lens (101) to the image formation plane (202);
d_101_102 is the known, fixed distance from the main lens (101) to the MLA plane (102);
s is the slope in the x-u diagram of the line representing a ray departing from the World focal plane (201) and reaching the MLA plane (102), as shown at the bottom of figures 17.A and 17.B.
Furthermore, by analyzing the epipolar images extracted from the light field (recorded by a plenoptic camera and re-ordered to compose the epipolar views), the slopes of the linear patterns are used to compute the distances from the pixels recorded by the sensor to the real world. Hence it is possible to calculate the distance at which each pixel recorded by the sensor has to be traced back into the real world to obtain a "virtual image" in the world that closely corresponds to the real-world scene (as closely as permitted by the state of the art: the lateral resolution is set by the pixels of the sensor, not by the lateral resolution of an MLA lens comprising several pixels, and the distance resolution is related to the angular resolution of the "MLA-pixel assembly").
Patterns near the camera can illuminate several pixels on the sensor (or very near the sensor), or, if they are very large patterns in the object world, they can even "illuminate" several microlenses (as exemplified in figure 3); this may seem at first to be a problem, but it only reflects what happens in a real camera or in a real human eye. For example, a red apple (or a red circular pattern the size of an apple) would print more red pixels on the human eye (or on a camera) if located at 10 cm from the eye (or from the camera) than if located at 10 meters from the eye (or from the camera).
In spite of optical aberrations, in cameras such as those exemplified in figures 2, 3, 4, 6, 12, 13, 14, 15 and 18 (and, in general, in any camera and/or sensor including microlenses and several pixels per microlens), the patterns sensed by the microlens-pixel pair are predictable. It is possible to guarantee that a known pattern (for example a set of white lines over a black background) should produce a "known pattern" over the microlenses/pixels of the sensor (at least with a "known granularity" below the "granularity" of the squares of the microlenses, approaching the granularity of the pixels, down to the pixel level with the appropriate light test pattern, and even below that, as explained below). In other words, a known pattern in the "real world" should "illuminate" over a film or over a sensor (such as CMOS or CCD) an image pattern distorted by the aberrations of the lenses between the object world and the sensor; however, in the particular embodiment shown in the figures that describe the present invention, the "illuminated" pattern is known to be "as square" as the microlens pattern is. Any departure from the "square pattern" of the microlenses (numbered 5 in figures 11, 12, 13, 14 and 15) is guaranteed to be caused by optical aberrations, and it is possible to establish a correspondence between the "sensed pattern" and "what the sensed pattern should have been with perfect lenses"; in other words, it is possible to correct the optical aberrations. This statement holds true even for traditional cameras (even if it is more difficult to implement, as explained below), for plenoptic or light-field cameras, and for any shape of the microlenses and the pixels (hexagons, triangles and any polygon in general).
Figure 20 explains and expands on the statements made previously by presenting an example showing a small subset of the "sensor-microlenses" assembly of a plenoptic camera viewed from the top (with 9x9 pixels below every microlens following a Bayer pattern). It shows a 2x2 square microlens pattern, a small fraction of what would constitute a plenoptic camera with up to several hundred microlenses in the horizontal and vertical dimensions. It is possible to expose a camera including this sensor to a pattern of white lines over a black background in the object world (a vertical and horizontal grid of white lines over a black background); the white lines in the object world are separated from each other by the appropriate distance to ensure that every white line (vertical and horizontal) is spaced from the next parallel white line by the distance necessary to "illuminate" its image in one out of every three columns, and one out of every three rows, of pixels. Such a pattern in the object world ensures that the white lines "illuminate" their pattern over the red, blue and green pixels (if we use a Bayer pattern, but the discussion also applies to any other colors and/or patterns and/or wavelengths). If the measurements are taken in an environment free of noise and stray light, the result at the sensor (for example a CMOS sensor, but equally any other sensor or film can be used) is a pattern in which one out of every three columns of the photo-sensor pixels is illuminated, and one out of every three rows is also illuminated. Any departure of the "sensed patterns" from such a rectilinear grid of vertical and horizontal lines over a black background is due to aberrations and can be corrected (since it is known where the horizontal and vertical lines are and where they should have been). It is easy to establish a correspondence function between the "pixels sensed" and what "should have been sensed", and in the case of a plenoptic camera between the "sensed light field" (SLf) and the "real light field" (Lf):
Lf(X, Y, U, V) = Cf(SLf(x, y, u, v))
where Cf is a correction function between both; the real light-field coordinates are denoted with capitals while the sensed light field follows the traditional lower-case notation. While it can be inferred from the figure how pincushion or barrel distortions are corrected (by turning curves in the sensed light field into straight lines in the real light field), it is possible to correct any aberration using this technique, since the direction of arrival of the rays is known, providing the information needed to know whether a pixel would be further from or nearer to the MLA and sensing planes. Furthermore, the method to correct color aberrations can be inferred. If, for example, we use a Bayer pattern, the vertical lines originating from a white line in the object world should be a vertical succession of green and blue pixels surrounded by two columns of dark pixels at the right and at the left ("non-illuminated" with a black pattern, or absence of light, hence no signal in the absence of noise, or applying noise-thresholding techniques), and the horizontal lines should be a succession of green and red pixels surrounded by two rows of dark pixels above and two rows of dark pixels below. These principles are also applicable to any pattern and/or light wavelength.
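A toy check of the color sequences just described, assuming one particular Bayer phase (green where row and column parities match, red sharing the even rows, blue sharing the odd rows); real sensors may use any of the four possible phases, and the specific rows and columns picked below are only illustrative:

```python
def bayer_color(row, col):
    """RGGB-style Bayer mosaic for one assumed phase: green where row and column
    parities match, red on even rows / odd columns, blue on odd rows / even columns."""
    if row % 2 == col % 2:
        return 'G'
    return 'R' if row % 2 == 0 else 'B'

# A vertical test line falling on an even column crosses alternating green and blue
# pixels, while a horizontal line on an even row crosses alternating green and red
# pixels, matching the successions described in the text for this Bayer phase.
print([bayer_color(r, 4) for r in range(6)])   # ['G', 'B', 'G', 'B', 'G', 'B']
print([bayer_color(4, c) for c in range(6)])   # ['G', 'R', 'G', 'R', 'G', 'R']
```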
It will be obvious to experts how coma or other aberrations are corrected in a "continuous (but discretized)" way (with the granularity of a pixel) over the whole grid of pixels. The same principles are applicable to film sensors. One of the options of the present invention proposes illuminating one out of every three columns (and rows) of pixels (vs. other possible solutions like, for example, a chess-like light pattern or illuminating one out of every two columns (and rows) of pixels) to avoid polluting the measures with possible misalignments in which the white line might illuminate an area covering two adjacent pixels; such a situation can be detected with the proposed pattern but can potentially generate ambiguities with patterns in which the lines are nearer each other. The fact that at the center of a square of 4x4 (=16) pixels there are 2x2 (=4) pixels not sampled by the white grid is of very minor importance, as the Cf (correction function) for these pixels is extrapolated from their nearest neighbors with very good results (either from the two nearest pixels, or from the 16 surrounding pixels, or from gradient algorithms of the sampled squares of pixels in the vicinity). It is also important to bear in mind that 100% of the pixels have been exercised (possibly before) during the final test of the device, and the devices with a number of pixels not performing to specifications (below a threshold of working pixels) have been scrapped or assigned to lower-spec applications.
Cf can be very complex, and it can include thorough calculations to compensate any aberration, but it can also be reduced to just a LUT (look-up table) that translates what has been recorded to what should have been recorded (by, for example, considering only distortion). The image sensor should be exposed to a pattern that covers the entire FOV (field of view) of the camera, which should correspond with the entire sensor in an ideal design.
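As an illustration only (not part of the patent text), the following minimal sketch shows one way such a LUT-style correction function Cf could be built from the detected grid-line crossings of the test pattern and then applied to a captured frame. The grid-detection step, the array names and the use of NumPy/SciPy interpolation are assumptions of this sketch rather than features of the invention.

```python
import numpy as np
from scipy.interpolate import griddata

def build_correction_lut(ideal_crossings, sensed_crossings, sensor_shape):
    """Build a per-pixel remap (a LUT) from the crossings of the test grid.

    ideal_crossings:  (N, 2) array of (row, col) positions where the grid
                      crossings should have fallen with perfect optics.
    sensed_crossings: (N, 2) array of (row, col) positions actually detected.
    Returns float maps (map_row, map_col): for every ideal pixel, the sensor
    coordinate whose reading "should have been" recorded there.
    """
    rows, cols = np.mgrid[0:sensor_shape[0], 0:sensor_shape[1]]
    ideal_grid = np.stack([rows.ravel(), cols.ravel()], axis=1)
    map_row = griddata(ideal_crossings, sensed_crossings[:, 0], ideal_grid, method='linear')
    map_col = griddata(ideal_crossings, sensed_crossings[:, 1], ideal_grid, method='linear')
    # Pixels not surrounded by sampled crossings are extrapolated from
    # their nearest sampled neighbours, as described in the text.
    near_r = griddata(ideal_crossings, sensed_crossings[:, 0], ideal_grid, method='nearest')
    near_c = griddata(ideal_crossings, sensed_crossings[:, 1], ideal_grid, method='nearest')
    map_row = np.where(np.isnan(map_row), near_r, map_row).reshape(sensor_shape)
    map_col = np.where(np.isnan(map_col), near_c, map_col).reshape(sensor_shape)
    return map_row, map_col

def apply_correction(sensed_image, map_row, map_col):
    """Nearest-neighbour form of Cf: corrected(X, Y) = sensed(x, y)."""
    r = np.clip(np.rint(map_row).astype(int), 0, sensed_image.shape[0] - 1)
    c = np.clip(np.rint(map_col).astype(int), 0, sensed_image.shape[1] - 1)
    return sensed_image[r, c]
```

A per-color version would simply build three such maps, one per color plane, in line with the three per-color warps discussed below.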
For clarity, Figure 20 has shown the vertical and horizontal white lines composing the image of the "pattern grid" over the sensor in grey (instead of white), and shows how the pattern projects its light over the pixels of the sensor in a system with lenses completely corrected in aberrations, while diffraction effects were not considered in this figure. Also for clarity, "thin lines" (thinner than a pixel) are shown. However, more testing strategies were implemented to compensate aberrations. In a particular test it makes sense that the width of the white lines (in the object world) is exactly what should illuminate the exact width of a pixel over the sensor; in that way, if the testing starts at the central pixel of the central microlens (where the paraxial approximation holds true), a particular "electrical level" will be obtained from that particular pixel, and a 0 level (or the NFL: noise floor level) in the surrounding pixels not covered by the light pattern. As we move up or down, left or right across the sensor, aberrations start warping the grid; this warp is continuous, decreasing the electrical level gradually as we move up that column of pixels and increasing the electrical level at the pixels in the column at its right or at its left, so that the "warping" can be obtained in the correction function (Cf) with subpixel precision, and the different warps can be obtained for the three fundamental colors (RGB - Red, Green, Blue in the embodiment exemplified in the figures, but this can equally be applied to other embodiments using a non-Bayer pattern with different colors or frequencies). The warps for the three colors are not necessarily the same, but rather result in three different correction functions Cf (one for every color).
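A minimal numerical sketch of the subpixel estimate described above (the function name and the noise-floor handling are illustrative assumptions, not part of the patent text):

```python
import numpy as np

def subpixel_line_center(levels, noise_floor=0.0):
    """Estimate where a thin vertical test line falls along one sensor row.

    With perfect optics exactly one pixel of the row is lit; when the warp
    moves the line between two adjacent pixels the light splits between
    them, and the intensity-weighted centroid recovers the line position
    with subpixel precision (assuming a linear, noise-thresholded response).
    """
    levels = np.clip(np.asarray(levels, dtype=float) - noise_floor, 0.0, None)
    cols = np.arange(levels.size)
    return float(np.sum(cols * levels) / np.sum(levels))

# Example: the warped line straddles columns 4 and 5 with a 60/40 split.
print(subpixel_line_center([0, 0, 0, 0, 0.6, 0.4, 0, 0, 0]))   # ~4.4
```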
The Cf function is used not only to correct aberrations, but also other practical problems like, for example, the "cosine-to-the-fourth" factor. Referring to Figure 21 (extracted from a classic text on optical engineering, Warren J. Smith's; see Reference 2), it is well known that:
Illumination at H = cos⁴(θ) × (Illumination at A)
where the argument of the cosine is the angle θ in figure 21. This phenomenon can become quite harmful in some designs for portable devices, which oblige a thickness for the total camera (including sensor and lenses) of only a few millimeters (below 5 mm or even down to 3 mm in some modern tablets or mobile phones), requiring very small distances between the exit pupil and the image plane and hence very large angles. Knowing that at 30, 45 and 60 degrees that multiplying factor becomes 0.56, 0.25 and 0.06, we have the tool to judge how severe the problem is for a particular design, especially with consumer cameras in portable devices that offer a wide FOV and very small thickness requirements.
From the tests proposed in the present invention, we know that the illuminated pixels at the extreme rows/columns of pixels at the top, bottom, right side and left side of the sensor (and especially the pixels at its four extreme corners, the most sensitive to this phenomenon) "should have received" the same light power per pixel as the pixels at the very center of the sensor; hence we can compensate, for example as part of the correction function (Cf), which then corrects not only aberrations and geometry defects but also light-power levels.
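By way of illustration, a simple gain map for this compensation could be computed as below. This is a sketch that assumes a pinhole-like chief-ray geometry; in practice the gains would rather be taken from the measured levels of the test pattern itself, and all names and values here are example assumptions.

```python
import numpy as np

def cos4_gain_map(sensor_w_mm, sensor_h_mm, exit_pupil_dist_mm, shape):
    """Per-pixel gain 1 / cos^4(theta) that undoes the natural fall-off,
    with theta taken as the angle between the optical axis and the ray
    from the exit pupil to the centre of each pixel."""
    ys = (np.arange(shape[0]) + 0.5) / shape[0] * sensor_h_mm - sensor_h_mm / 2
    xs = (np.arange(shape[1]) + 0.5) / shape[1] * sensor_w_mm - sensor_w_mm / 2
    xx, yy = np.meshgrid(xs, ys)
    theta = np.arctan(np.hypot(xx, yy) / exit_pupil_dist_mm)
    return 1.0 / np.cos(theta) ** 4

# The factors quoted in the text for 30, 45 and 60 degrees:
for deg in (30, 45, 60):
    print(deg, round(np.cos(np.radians(deg)) ** 4, 2))   # 0.56, 0.25, 0.06
```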
As mentioned above, lines are used for the vertical and horizontal patterns (in Figure 20) that illuminate the total width of a pixel; however, several different patterns are also used to test and fix different aberrations and different problems. Many different patterns can be used, from very simple patterns to very complex images, including real 3D scenarios or patterns, real images, or synthetic images in the object world to test the "depth-extraction capabilities" of the camera or the sensor, as we will explain below. A very simple but useful pattern is one where the lines cover only 50% of the width of a pixel; when evaluating the output of the sensor, one in every three pixels along the illuminated rows or columns (in the horizontal or vertical dimension) should have 75% of the electrical level (where the vertical and horizontal lines cross each other), followed by two pixels with only 50% of the electrical level.
Another simple pattern that is very beneficial is to illuminate the pixels with very bright white lines whose width is smaller than the width of the pixels, spacing the lines so as to be able to follow the patterns created by the optical aberrations while at the same time guaranteeing that the lines do not interfere with each other by simultaneously illuminating a single pixel (as we have seen, by separating the lines by more than two pixel widths). This helps overcome the difficulty of the optics to produce patterns as small as the pixels of the sensors (especially in small-dimension mini-cameras for mobile phones, tablets and laptops) and makes it commercially viable to leverage the huge progress made in miniaturization and the reduction in microelectronics dimensions and pixel sizes, and their production at increasingly lower costs per megapixel.
While the aberration correction techniques described herein can be used for traditional sensors and cameras and for plenoptic sensors and cameras, it is important to analyze the differences between both, as well as the limitations of the technique and the capability to correct aberrations with low sophistication optics in both cases.
The example implementation discussed above has been explained with optical aberrations producing a pincushion pattern over the sensor (or film); however, its generalization for barrel patterns or any pattern caused by any other optical aberration is straightforward. The white grid of horizontal and vertical lines over a black background (or any other pattern designed to trace and correct aberrations) has been designed to cover the whole FOV of the camera (in the horizontal and in the vertical dimension); however, the reality of non-perfect optics will result in a pattern over the sensor that is not exactly as rectangular as the sensor is. For example, in a standard 36 mm x 24 mm imaging plane, or the smaller sensors normally used for mini-cameras in mobile phones, the pincushion pattern (or any other) will cause either the four corners of the pattern to fall outside of the "rectangle of the sensor" (or square) or, if the four corners fall inside the sensor, the pixels at the center of the top and bottom row(s) and at the center of the extreme right-left column(s) are not illuminated, and nothing can be done to fix it except to compute the "should have been the pattern" from the "recorded pattern". As stated before, this computation can be a straightforward LUT (look-up table) or a "hardwired logic correction" implementing the correction function (Cf); it is clear the correction function will be very near an identity function for the pixels at the center of the sensor (where the paraxial approximation holds true even for lenses with high aberrations). However, at the extreme rows at the top and the bottom of the sensor (or the extreme columns at the right or at the left) the correction function is far from an identity: the "should have been the pattern" must be produced from the "pattern sampled". For a pincushion pattern 100% inside the sensor that will mean computing values for all the pixels of the sensor, but especially for the dark pixels (with zero readings or the NFL) at the center of the top-bottom row(s) and for the dark pixels (with zero readings or the NFL) at the center of the extreme right-left column(s). The other possibility is that the extreme corners of the pincushion illuminate beyond the edges of the sensor while the top-bottom row(s) are perfectly illuminated by the less distorted lines of the pattern; in this situation the correction function will compute the values for the extreme pixels, recreating from the available samples the "computed samples" of the light that should have illuminated those extreme pixels but instead illuminated beyond the edges of the sensor.
The generalization of the statements of the previous paragraph to compensate for barrel distortion, coma or any other aberration is straightforward, and this compensation technique works well up to the extreme in which the aberrations are so severe that two adjacent lines are brought together by the aberrations over the width of a single pixel. If using patterns with very narrow white lines every three pixels, the fact that two parallel adjacent lines do not illuminate a single pixel means that up to that point it is possible to correct aberrations that do not mix lines separated by that distance. In the presence of higher aberrations the only way to compensate them is to separate the lines from each other by a distance equal to a higher number of pixel widths (at the cost of having a correction function Cf which will not re-create samples that reflect reality with the same precision); the alternative is to re-design the optical system with fewer aberrations (most likely with more optical elements or more complex aspheric surfaces). The mixing of adjacent lines is most likely to happen first at the edges of the sensor (the parts furthest from the paraxial approximation).
There are differences in the application of this technique to normal sensors and to plenoptic sensors. The classic way to reduce aberrations in traditional cameras without more sophisticated optics (without more lens elements and/or aspheric profiles), and without the benefit of the technique described herein, is to use smaller apertures, obliging all the rays to reach the sensor passing through a very small area near the center of the lens, hence nearer the paraxial approximation and hence causing fewer aberrations. A straightforward examination of the millimeter size of the aperture in most cameras used as mobile phone mini-cameras (using sensors of a few millimeters) shows that this is a solution used for consumer applications, especially when there are space and thickness limitations that also limit the number of optical elements in the lens. However, the implementation of that measure has a price: the light energy reaching the sensor is lower than with larger apertures, hence more time is required to produce the same level of electrons out of a smaller number of incoming photons, limiting the shutter speeds and the number of frames per second for video applications. In the present invention the pattern of pixels below every microlens provides the opportunity to reduce the aperture numerically (computationally) while still capturing all the light, which is very useful for some applications, as sketched below. In the present invention, once an extreme is reached where the white lines in the testing pattern described above start hitting the same pixel and it becomes impossible to discriminate where the light comes from in order to create a correction function, it is still possible to compute the values of pixels illuminated by light that went through parts of the optics with high aberrations based on aberration-free samples from their nearest neighbors.
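The numerical aperture reduction mentioned above can be sketched as follows (illustrative only; the [y, x, v, u] indexing of the light field and the simple averaging are assumptions of this sketch, not a statement of the invention's algorithms):

```python
import numpy as np

def reduce_aperture(lightfield, keep=3):
    """Numerically stop-down a plenoptic capture.

    lightfield: 4-D array indexed as [y, x, v, u] -- one (v, u) block of
    pixels per microlens.  Keeping only the central keep x keep pixels of
    every microlens is equivalent to using a smaller aperture (rays near
    the lens centre, closer to the paraxial regime), without discarding
    the rest of the capture, which remains available for other uses.
    """
    ny, nx, nv, nu = lightfield.shape
    v0 = (nv - keep) // 2
    u0 = (nu - keep) // 2
    central = lightfield[:, :, v0:v0 + keep, u0:u0 + keep]
    return central.mean(axis=(2, 3))     # one value per microlens

# e.g. 9x9 pixels per microlens, keep the central 3x3 sub-apertures
lf = np.random.rand(100, 150, 9, 9)
image = reduce_aperture(lf, keep=3)      # shape (100, 150)
```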
In a single pixel in a sensor without microlenses the pixel integrates all the light coming to it without discriminating where the light came from. The use of microlenses makes it possible to discriminate from which part of the aperture of the main lens the light came, where it crossed a specific microlens and where it finally hit the pixel. More specifically, the difference in aberrations between the pixels below the microlenses around the center of the sensor is very small; however, as we approach the left edge of the sensor (the same principle applies to the right, top and bottom edges) it is known that the light sensed by the pixels at the right below these extreme microlenses came from the extreme left zone of the main lens (with heavy aberrations), whereas the light sensed by the left pixels of these extreme microlenses came from a zone of the main lens nearer the center (with lower aberrations). In other words (considering now the right edge rather than the left edge of the sensor), in one of these extreme microlenses towards the right side of the sensor the pixels at its left will have readings with higher aberrations than the pixels at its right, hence the white lines of the pattern will mix first at the pixels at the left of that particular microlens. In a traditional plenoptic camera or sensor with traditional plenoptic algorithms, as all the pixels mix together to compose a single pixel per microlens, it is easy to give less weight to, or even disregard (weight zero), those pixels that reached the sensor through the parts of the optics with higher aberrations (increasing the weight of the remaining pixels below that particular microlens). In the present invention, even if the pixels at the left start having current levels that make it evident that two adjacent lines of the pattern illuminated the same pixel, as the pattern on those pixels should have been a continuation of what illuminated the pixels at the right of that "aberrated pixel", it is possible to "split the recorded reading" of those pixels into what "should have been" the reading (for that particular pixel and for the surrounding pixels that should have recorded the reading with perfect optics free of aberrations) based on its neighbor pixels at the right side of the same microlens and/or on the right-side pixels of the microlens immediately to the left.
This fact improves the percentage of the sensor area that can be considered free of aberrations well beyond that achieved in a traditional camera (even if the traditional camera is also using the same aberration compensation technique), as with the microlenses we extend the area that can be compensated towards the edges of the sensor, since some pixels sample light coming from a zone with less aberrations (or without aberrations). We also improve the correction of aberrations vs. a traditional plenoptic camera with traditional plenoptic algorithms. In the best of cases, traditional plenoptic algorithms compensate aberrations to reach diffraction limits equivalent to Airy patterns the size of a microlens; in the present invention, with similar specs for the optics, we go down to compensate at the pixel level (144 times better if we use 12x12 pixels per microlens, 81 times better if we use 9x9 pixels per microlens and, in the extreme, 4 times better when going to 2x2 pixels per microlens, an extreme that should not be reached).
Of course this technique is not a panacea either. If, in the example discussed above, the right-side pixels of a microlens at the right side of the sensor also mix into a single pixel straight lines that should be separated by three pixels but are now brought together by the optical aberrations over the area of a single pixel, it becomes impossible to compensate aberrations at those edges of the sensor.
The reason for this is that in a plenoptic camera every microlens is hit by light coming from the entire aperture of the lens, and if we use the sub-aperture views of the light field (the "u times v" views composed by the images using the same u, v position in every microlens), the sub-aperture views at the extreme sides (low and high u and v numbers) have their pixels formed with light coming through areas of the main lens with different aberrations from different zones of the main lens aperture (resembling in that aspect traditional cameras). In our case we start from the assumption that any light sample is good (even samples that reach the sensor through zones of the optics with heavy aberrations) if we use that sample to compute what the sample should have been in an aberration-free camera.
The final issue to be resolved to ensure that the procedures explained above result in a good correction of aberrations is to ensure that the plane with the testing patterns is parallel to the plane of the sensor, perpendicular to the optical axis of the camera, and with its center located at the optical axis. While the first and second conditions are easy to ensure with mechanical fixtures that hold in place the testing-correcting pattern and the sensor, the third condition is a bit trickier, as small movements of one micron (or the size of the pixels, which will be smaller in the future) will misalign the pixels (the sensor itself) with the testing-pattern illumination over the sensor rows and columns of pixels. Not only will the illuminated row or column be a row (or a column) higher or lower (or more to the left or to the right) than it should be, illuminating the rows of pixels higher or lower (or more to the right or more to the left) than it should have illuminated, but the fact that the horizontal (or vertical) lines of the testing pattern are not completely parallel to the rows (or columns) of pixels can result in a complete straight line of white light (as in figure 20) running along two or more rows (columns) of pixels, as the pixels can be as small as a micron (even smaller in the future) but the sensor has a total horizontal and/or vertical length of several thousand pixels (several millimeters). There are two ways to fix the problem:
• Use an active alignment machine in which the readings of the sensor are used as feedback to the alignment machines to drive submicron precision in the alignments.
• Use reference patterns in the white grid of testing patterns, so that even if the rows and columns of pixels are not parallel to the illuminated rows and columns it is possible to build the aberration correction function by referring all the pixels of the camera to these reference patterns.
For obvious reasons we propose the second option (and because at present it is much cheaper to implement), but we do not disregard using the first option in the future. We have used as alignment (or reference) a slight change to the central horizontal line and the central vertical line of the white testing grid pattern (see figure 20), both of them drawn from bottom to top as lines that should illuminate a pixel and a half. If the sensor detects a column (or a row) completely illuminated, surrounded by two columns (or two rows) with 50% of the illumination, that would mean perfect alignment in the vertical (or horizontal) direction. However, if the degree of illumination of the two side columns (rows) changes with distance, it is because the light pattern and the sensor columns and rows of pixels are not parallel; we can in this way measure how parallel the alignment of the pattern is with the sensor, and refer all the corrections for all the other pixels of the sensor to this "thick cross". Obviously, we need to measure the degree of parallelism at the central part of the sensor (the paraxial zone), as possible disparities towards the edges can be due to optical aberrations. It would have been possible to use any other alignment or reference pattern(s), for example two X-shaped white lines starting at the center of the white pattern plane (with the 4 legs of the X diverging at an angle of 90 degrees between each other and 45 degrees vs. the grid of the pattern), or any smaller recognizable patterns that allow referring all the other pixels of the sensor to the pixels which sensed those patterns.
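As an illustrative sketch of how the "thick cross" readings could be turned into a tilt estimate, assuming a linear response proportional to the covered pixel area (the function, its arguments and the fitting step are hypothetical, not part of the patent text):

```python
import numpy as np

def estimate_tilt_degrees(left_col, centre_col, right_col):
    """Estimate the small angle between the 1.5-pixel-wide vertical test
    line and the sensor columns, using only central (paraxial) rows.

    With perfect parallelism the centre column is fully lit and the extra
    half pixel splits 50/50 between the two neighbours; a tilt makes the
    split drift linearly with the row index, and the drift rate (pixels
    per row) is the tangent of the tilt angle.
    """
    left = np.asarray(left_col, dtype=float)
    centre = np.asarray(centre_col, dtype=float)
    right = np.asarray(right_col, dtype=float)
    # Sub-pixel offset of the line centre at every row (centre = full level).
    offset = (right - left) / (2.0 * centre)
    rows = np.arange(offset.size)
    drift_per_row = np.polyfit(rows, offset, 1)[0]
    return float(np.degrees(np.arctan(drift_per_row)))
```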
To test the efficiency of the distance estimation algorithms, these are exposed to real "object world" conditions that offer feedback enabling the computed results to be compared with known distances. For that, small computed distances are compared to "previously measured indoor scenarios" (below and around 10 meters away from the camera); for medium distances (tens to little more than a hundred meters) the computed distances are tested vs. known distances within industrial units; for moderately large distances open-air scenario comparisons are performed (straight roads with milestones every 100 and 1000 meters); and for very large distances comparisons are performed in mountains with little vegetation (where distances from the peak of a geodesic summit to the surrounding peaks can be known with quite good precision with the help of maps of the area and topographical instruments). These comparisons led to the preparation of tests for cameras and sensors where not only the pixels and the aberrations are tested, but also the capability of the microlens/pixel duet to estimate distances (tested for all the microlenses and for all the pixels of the camera), as well as the impact of optical aberrations of lenses and microlenses on these metrics and on the capability of the algorithms to measure distances, and the limitations to correct aberrations and to calculate the correction function (Cf); this also led to the use of more practical test scenarios to measure camera performance by exposing the camera under measurement to photos with a large number of pixels and to synthetic images that exercise and test the 3D-distance measurement capabilities of all the microlenses and all the pixels of the camera, as we explain below.
As mentioned above, the grid of white vertical and horizontal lines over a black background can illuminate the microlenses/pixels of the sensor as described in figure 20 (and the discussion below is applicable independently of the width of the white lines, whether designed to illuminate the complete width of a pixel, or more than that, or very narrow lines that can be considered of nearly infinitesimal size, or half a pixel, or any width in general). We could design such grid patterns to produce the desired effect at, let us say, 100 meters from the camera, but it is possible to design a pattern that produces exactly the same illumination pattern over the pixels by situating a homothetic pattern at just 10 meters from the camera, and with different, smaller homothetic patterns at any distance (for example at 1 or 0.5 meters from the camera), the latter distances being more practical for testing in an industrial environment, as for example in final tests in a manufacturing line before shipping commercial products to customers. The limit is on the smallest patterns that can be printed with state-of-the-art printing techniques, which will impact the size of the pattern (its width and height and the separation between lines) and the distance between the testing pattern plane and the camera (we can test large distances in open-air conditions but we should not have kilometric patterns in a production line).
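A rough sketch of the homothetic scaling, under a simple thin-lens/pinhole assumption for a distant target (the focal length, pixel pitch and distances below are only example values, not parameters of the invention):

```python
def line_spacing_on_target(pixel_pitch_um, pixels_per_period, focal_mm, target_dist_m):
    """Spacing (mm) between white lines on a flat chart at target_dist_m so
    that their images land every `pixels_per_period` pixels on the sensor.
    For a distant target the object-to-image scale factor is roughly
    distance / focal length."""
    image_period_mm = pixel_pitch_um * 1e-3 * pixels_per_period
    scale = (target_dist_m * 1000.0) / focal_mm
    return image_period_mm * scale

# A homothetic pair of charts producing the same pattern over the pixels:
for d in (10.0, 0.5):
    spacing = line_spacing_on_target(1.12, 3, 4.0, d)
    print(d, "m ->", round(spacing, 2), "mm between lines")
```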
Let us now design a fictitious 3D pattern in which "rectilinear wires" starting very near the camera go away from the camera to very large distances (even as far as distances that can be considered to be at infinity from the camera). Let us assume that those wires are held static by invisible columns, and let us assume that we can also design the diameter of the wire (for example a larger diameter as the distance from the camera increases). Theoretically (even if we would face insurmountable practical difficulties as we increase the number of wires), it is possible to design a 3D pattern of wires that exercises most of the pixels of the camera and tests the 3D capabilities of the algorithms. Please note that we can extrapolate the measurements for the non-exercised pixels (in the same way that in figure 20 we have extrapolated for the 4 pixels at the center of the 4x4 (=16) pixel square). It would be impractical to produce such a 3D pattern covering large distances, and it would be very difficult to produce smaller homothetic 3D patterns that could be used in real applications (for example in a manufacturing or testing environment); but, as we have done, it is possible to create synthetic 2D images multiplexing slightly different patterns in the time domain to simulate different parallaxes as perceived by the sensor (with, for example, the help of state-of-the-art displays or display arrays with a very large number of pixels) that produce several patterns over the sensor in the time domain which, adequately processed, produce the desired light field that should have been produced by such a 3D pattern. These synthetic 2D images can be used for practical applications such as testing in manufacturing lines, with those 2D patterns (simulating real 3D scenes) testing not only the pixels of the camera but also the measures of distance and the 3D capabilities of the devices, and also the impact of optical aberrations on the 3D capabilities of the camera, as well as the impact of tolerances (for example manufacturing tolerances on a device-per-device basis). More importantly, these are used to compensate optical aberrations by comparing the "sensed light field" (SLf) and the "should have been light field" (Lf), the "real light field". Hence, it is possible to create a 3-dimensional aberration correction function 3DCf:
Lf (X, Y, U, V) = 3DCf (SLf(x, y, u, v))
These correction functions (2D and 3D) enable the manufacture of optics with less tight tolerance requirements and a smaller number of lenses to meet the required specification; optics that are, in general, much easier to manufacture and hence much cheaper to produce, compensated on an individual device basis.
Finished cameras, or sensors such as those depicted in figures 12, 13, 14 and 15 (and in general any image sensor or camera), need to be tested before being sent to customers; hence it is possible to perform the "aberration compensation" of every camera on an individual basis, compensating the manufacturing tolerances of any unit produced. Those "aberration compensations" can be "stored" (or hardwired) in "non-volatile memory" that can be shipped with the camera (or the sensor) itself, as for example an EEPROM (electrically erasable programmable read-only memory), EPROM, ROM or any other memory. It is also possible to store the "aberration compensations" in a "firmware" routine (or in a software file) that can be either shipped with the camera (or the sensor) or remotely downloaded (for example on a serial-number basis).
The aberration compensation routines described above can be implemented (or re-written or re-initialized) on every individual device (sensor or camera):
• As a step in the manufacturing process (for example, during the final test) in order to make the cameras (or sensors) more tolerant to any deviations or tolerances in the manufacturing process, making the optics easier to manufacture to the requested specs (and hence with lower production costs).
• Periodically, by final users, in order to correct aberration drifts with time, vibrations during transport and travel, and temperature drifts in very low or very high temperature environments.
Aberration correction functions are important. They make it easier to reach diffraction-limited devices because we compensate aberrations computationally, but they are not a panacea. If aberrations are so strong that the line grid patterns in figure 20 become indistinguishable, for example reaching an extreme in which the patterns that should have illuminated a single column and a single row illuminate with nearly the same intensity three columns or three rows, the compensation techniques described would become non-implementable. Besides, the aberration compensation techniques described can only take us to the point where diffraction phenomena dominate, and the quantum and wave nature of light starts producing unexpected patterns over the sensors. However, as we will explain below, the inventions are more immune to diffraction limits and aberrations than traditional cameras with the same pixel size.
This is exemplified in Figure 22, which demonstrates how light originating at two adjacent 1.12x1.12 square-micron pixels in a Bayer pattern (a green and a red pixel) propagates several kilometers into the object world, as diffracted through an 11.2x11.2 micron micro-lens and through the main lens of the camera. The result at such a long distance from the camera is still a decently good square green pixel, but the red pixel is diffracting an important part of its energy outside the area it would have covered without diffraction. The reality is just the optical conjugation of what we have described in figure 22: a couple of green and red squares several kilometers away from the camera would produce a decently good green pattern matching quite well the area of the green pixel; however, the red energy would diffract into adjacent pixels, bringing several undesired effects. First of all, part of the energy that should produce "red electrons" through the red filter has been wasted outside its desired area; even if diffraction in the vicinity of the green and red pixels is filtered out, that energy is wasted. Moreover, when the diffraction pattern introduces a noticeable amount of energy beyond 1.12 microns we have an undesired inter-pixel noise leaking between pixels of the same color, deteriorating the SNR (signal-to-noise ratio) and hence the contrast of the camera.
Let us analyze this phenomenon more in depth. A point in the object world going through a series of circular lenses will produce beyond those lenses a pattern like that in figure 23.A, where the light power levels form the so-called Airy circle and Airy rings (or nearly-square patterns, as at the top of figure 23.B, in the case of square micro-lenses), given by the expression:
I(ρ) = I₀ [2 J₁(z) / z]², with z = (2π/λ) n' ρ sin σ',

where J₁ represents the first-order Bessel function of the first kind, λ the wavelength, n' the refraction index, σ' the exit angle between the exit pupil and the photo-sensor, and ρ the radial coordinate on the image plane. The central Airy circle contains 84% of the power of the beam. The first zero of the function appears at ρ' = 0.61 λ / (n' sin σ'), showing that the Airy circle radius depends on the incident wavelength. This explains why, in the extreme example we have used in figure 22, propagating the two pixels over several kilometers, the red energy scatters away more quickly than the green energy (the green pixel is at the center of the lenses, centered on the very optical axis of the camera, having a much lower exit angle when crossing the lenses, and it also has a lower wavelength, hence creating different diffraction patterns).
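For illustration, the first-zero radius can be evaluated directly from the expression above; the numerical aperture value below (roughly corresponding to f/2) and the wavelengths are assumptions of this sketch:

```python
def airy_first_zero_um(wavelength_nm, n_prime=1.0, sin_sigma=0.25):
    """Radius (microns) of the first zero of the Airy pattern,
    rho' = 0.61 * lambda / (n' * sin(sigma'))."""
    return 0.61 * wavelength_nm * 1e-3 / (n_prime * sin_sigma)

for name, wl_nm in (("blue", 470), ("green", 530), ("red", 630)):
    print(name, round(airy_first_zero_um(wl_nm), 2), "um")   # red spreads the most
```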
To design a camera free of blurry images caused by diffraction (assuming we have achieved a good diffraction-limited design free of aberrations, or with the aberrations compensated with techniques as described in this invention), the recipe is to guarantee that the Airy patterns that illuminate a pixel do not illuminate adjacent pixels, especially adjacent pixels of the same color, as we explain below. Figure 24 shows how this phenomenon happens with 4 adjacent pixels in a Bayer pattern (but the discussion is applicable to any pattern with different fundamental colors or different frequencies): the two peaks of the graphs at the right and left of figure 24 show the power levels over the two adjacent green pixels; the light that leaks through diffraction from the red pixel at the top into the area of the green pixels below (or into the blue pixel at the bottom) is severely attenuated; the dotted line shows the red light level after being filtered and attenuated by only 6 dB (10 dB in the line at the bottom), and color filters of moderate thickness attenuate more than that.
Hence the orthogonality between colors makes the problem less severe than it looks at first sight. Furthermore, using our invention, the important factor is the size of the Airy circle produced by the main lens of the camera at the top of every lens of the MLA (not at the top of the pixel). If, for example, we use square pixels of 1.12 microns and every microlens in the MLA contains 10x10 pixels, the Airy-circle diameter required to properly receive 84% of the light power that has gone through the main lens without leakages leading to blur is 11.2 microns (vs. the 3.36 microns (or 1.12 microns to use all the light power) that would be required in a traditional camera); this size (×3 or ×10) relaxes the optical specs, enabling them to be achieved with much lower-cost lens designs (including fewer elements and less sensitivity to manufacturing tolerances). This is due to the fact that it is less critical to design how the Airy circle deteriorates between the microlens and the pixels. In this particular example, achieving properly designed microlenses that propagate Airy circles below 1.12 microns is not a bottleneck in the design budget (bearing in mind that the distance between the microlenses and the sensors would be very small: around 20 microns with the pixel and microlens sizes we referred to above if we aim for an f/2 spec). Furthermore, bearing in mind that we can allow Airy circles 3 times the diameter of the side of a pixel (see figure 25) because the power leakage consequences are attenuated by the orthogonality between colors, the design budget assigned to the micro-lens becomes even less critical. In other words: the circle at the center of figure 25 (even if in reality the Airy pattern is square due to the shape of the micro-lens) has a diameter (side of the square Airy pattern) that can be as large as 3 pixels without causing prohibitively blurry images (84% of the power does not cause blur due to diffraction). We have shown how that happens for a blue pixel at the center of figure 25, for a red pixel (with the dotted circle up-left) and for a green pixel (marked with a semi-dotted circle below).
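A back-of-the-envelope sketch of this relaxation, using the common far-field approximation that the Airy-disc diameter is about 2.44·λ·N for an f-number N; the f-number framing and the numerical values are an illustration of the relaxed budget, not a statement of the invention:

```python
def max_f_number(allowed_airy_diameter_um, wavelength_um=0.53):
    """Largest f-number whose diffraction spot (~2.44 * lambda * N)
    still fits inside the allowed spot size."""
    return allowed_airy_diameter_um / (2.44 * wavelength_um)

pixel_um = 1.12
microlens_um = 10 * pixel_um           # 10x10 pixels per microlens
cases = (("traditional, 1 pixel", pixel_um),
         ("traditional, 3 pixels (colour orthogonality)", 3 * pixel_um),
         ("plenoptic, 1 microlens", microlens_um))
for label, spot in cases:
    print(label, "-> diffraction spot fits up to about f/%.1f" % max_f_number(spot))
```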
An important topic for a correct implementation of the invention described is an appropriate estimation of the distances; even if the human eye is tolerant to small artifacts and tolerances in distances (images keep appearing to the human eye as "high quality images" even if the distance estimations are not 100% correct), disturbing artifacts can appear if we do not perform distance estimation properly. New algorithms to produce high-quality distance estimations with low computational load will surely keep appearing as the state of the art keeps evolving. Old algorithms which only perform the computational equivalent of a visual inspection to identify slopes of lines, as in figures 17.A and 17.B, might not produce the best results. It is possible to use any algorithm from prior art, and it will be possible to use future algorithms; we have used what we call "trigonometrical analysis of the four derivatives of the light field", where the angle alfa of the slope of any point (x, y, u, v) of a light field (the angle of the slopes in figures 17 for every point of the light field) is computed as follows:

alfa = 0.5 arctg( 2·((dLf/dx)(dLf/du) + (dLf/dy)(dLf/dv)) / ( ((dLf/dx)(dLf/dx) + (dLf/dy)(dLf/dy)) − ((dLf/du)(dLf/du) + (dLf/dv)(dLf/dv)) ) )
Where (dLf/di) is the partial derivative of the light field with respect to i, with i = (x, y, u, v).
The expression above has been derived from trigonometrical considerations observing the distances in figure 3 (that is, the way in which the "arc whose tangent is" has been introduced in the formula of the partial derivatives), and from differential considerations and function-minimization algorithms applied to target functions in the images that form the light field sampled by the sensor. This algorithm offers a good balance between the quality of the slope estimations and the computing power requirements (vs. other approaches where the computing requirements are prohibitive).
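A minimal numerical sketch of this computation on a discretized light field (the use of finite differences via np.gradient and of arctan2 to handle a zero denominator are choices of this sketch, not of the patent text):

```python
import numpy as np

def slope_angle(lf):
    """Per-sample slope angle 'alfa' of a 4-D light field lf[x, y, u, v],
    using the trigonometrical four-derivative formula quoted above.
    np.gradient approximates the four partial derivatives with finite
    differences along each axis."""
    dLf_dx, dLf_dy, dLf_du, dLf_dv = np.gradient(lf)
    num = 2.0 * (dLf_dx * dLf_du + dLf_dy * dLf_dv)
    den = (dLf_dx * dLf_dx + dLf_dy * dLf_dy) - (dLf_du * dLf_du + dLf_dv * dLf_dv)
    return 0.5 * np.arctan2(num, den)

lf = np.random.rand(64, 64, 9, 9)   # toy light field sampled as (x, y, u, v)
alfa = slope_angle(lf)              # one angle (radians) per light-field sample
```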
There are multiple embodiments in which the inventions of this application can be implemented. It would be out of context to describe all the possible permutations of all the possible embodiments of all the inventions, and all the possible permutations with different embodiments in the state of the art and with other future inventions in optics and in image processing; hence the examples described in this application should not be interpreted in a way that limits all the possible permutations, and the extent of the inventions must be extracted from the claims.
For example, in figures 3 and 4 we only show an equivalent main lens of the camera system (101) for simplicity, while in reality it would be several lenses in order to reduce aberrations or design a specified FOV; figure 6 shows the MLA included within the structure of a lens (vs. figures 3 and 4, in which the MLA is very near the sensor), and the main inventions of this application are also applicable to those large-size lenses (not only to embodiments where the MLA is near the sensor as in figures 2, 3 and 4); smaller monolithically integrated structures as exemplified in figures 10 to 15 (very convenient to manufacture in high volumes at very low cost, sometimes using WLP (wafer level processing), not only for the silicon but also for the optics) can be used together with the inventions of this application; figure 7 shows a Bayer pattern filtering the light through red, blue and green color filters, while in reality any other pattern like the Yotshuba pattern (figure 9) can also be used to sample the light that will be processed with the inventions of this application, including filters for non-visible light (IR, ultra-violet, etc.), as well as patterns with different fundamental colors such as yellow, magenta and cyan; figures 7 to 15 are mostly based on square pixels and square microlenses, while in reality it is possible to use the inventions of this application with hexagonal microlenses, triangular pixels, and other polygonal shapes for pixels and microlenses, which improve practical implementation features from some points of view; in figures 16 and 17 we describe the rays in terms of their intersection with 2 planes (x, y, u, v coordinates), while it would have been possible to describe them with a single plane (x, y) and two angles of arrival to that plane; figures 19.A and 19.B describe the patterns of the real world that would hit 2 adjacent green pixels in a Bayer or Yotshuba pattern, however the same statements are applicable to any other frequencies (red, blue, IR, UV, etc.); we can use the inventions described in this application in conjunction with CMOS sensors, CCD sensors, films for traditional cameras or any other past or future pixel-sensing technology or photodetectors; when using CMOS image sensors, the inventions described can be used with FSI technologies (Front Side Illumination) or with more modern BSI technologies (Back Side Illumination); we did not refer to some traditional techniques generally applicable to the design of cameras, as for example filtering away undesired spectrum (for example the IR spectrum in visible-range cameras), and these (and all other) traditional techniques generally applicable to cameras should be borne in mind when designing devices that incorporate our inventions. The extent of the invention is defined in the claims of the application.
Examples of use and differences vs. State of the art.
The "virtual image/real world" patterns obtained by tracing the pixel images back to the real world can be used to produce 3-dimensional images of the real world by computing two stereo-images from a single scene, and to feed 3D displays (with anaglyph glasses and traditional displays, or with state-of-the-art stereoscopic displays with passive or active glasses), to feed glasses-free multi-stereoscopic displays sending multiple stereo-images in multiple directions, or to feed current prototypes (or future products) of integral displays, or even future holographic displays.
The main advantage vs. stereo-cameras is that we divide the complexity and the cost by two: only one sensor and only one set of lenses (vs. two sensors and double the number of lenses in stereoscopic cameras). A possible disadvantage of the approach of our invention would be a lower parallax: the several centimeters of distance between the two lenses in a stereo camera make it easy to discriminate distances as far away from the camera as can be discriminated by the human eyes. This disadvantage is compensated in large cameras using our inventions by using large lenses with a large diameter, up to several centimeters (increasing the parallax up to the several centimeters of the entrance pupil); this approach is very rarely used in conventional cameras due to the fact that increasing the size of the lens and using low f-numbers (large apertures) drastically increases lens aberrations, as the design points need to be very far from the paraxial approximation. However, in our case this fact is not important due to the inherent capability to "computationally compensate aberrations" associated with our invention. In mini-cameras for mobile phones, tablets, laptops and portable devices our approach can also increase the parallax: apertures in traditional mini-cameras tend to be very small (around 1 millimeter), whereas we can afford lenses and apertures as large as the sensor itself or even larger in mini-cameras with very small thicknesses because our technique makes it possible to compensate aberrations computationally; it is even possible to produce rectangular or square lenses and apertures without any fear of aberrations (apertures as large as the sensor itself or even larger). The additional computational load needed to compensate aberrations can be a one-time process used during the final test after manufacturing the devices (or during periodical calibrations to correct aberration drifts with time). The computational load needed to trace the sensor signal back to the object world (straightforward triangulations) is lower than the computational load to process the images from two different cameras in a stereo pair, and can be ported to the low-power processors in mobile phones and tablets. The use of low f-numbers without any fear of aberrations (compensated computationally) improves the sensitivity of the camera, as larger apertures allow more light energy to reach the sensor and reduce "exposure time", increasing the achievable number of frames per second in a video camera with a sensor of the same sensitivity.
The possible use of square lenses to their full dimension without any fear of aberrations at the edges not only improves our invention vs. stereo-cameras but also adds advantages vs. the traditional way in which plenoptic cameras have been implemented, as the rectangular pattern from the main lens is designed to keep the sensor fully "illuminated". In the case of circular lenses, either we produce a circular pattern larger than the rectangle of the sensor (increasing the bulkiness of the solution, especially if we do not compensate aberrations computationally) or we produce lower-bulkiness lenses that do not provide the required levels of energy at the corners of the sensor, reducing the efficiency of some of the sensor pixels in its corners. Besides, the cos⁴ compensation we introduce in the correction function (Cf) gives us a quantum leap in this aspect, increasing the luminosity of our solution and reducing bulkiness.
We have already stated before that a 1 Megapixel image using a traditional plenoptic camera and algorithms would need an 81 Megapixel sensor (if using 9x9 pixels per micro-lens), while using our inventions we only need a 1 Megapixel sensor. Even if the cost per pixel has been drastically reduced in recent years with the mass adoption of CMOS sensors and low-dimension CCD sensors, this advantage is applicable to both traditional plenoptic cameras and our invention. At the moment of writing this application there is a "conventional wisdom" that the total cost of a mini-camera for mobile phones and laptops is about a US dollar per Megapixel (5 US dollars for a 5 Megapixel mini-camera vs. 35 US dollars for a 35 Megapixel mini-camera, which would severely restrict the market size if an 81 Megapixel sensor were needed). Even if the "conventional wisdom" statement made above will be made obsolete by new technological processes, denser CMOS and CCD sensors and WLP (Wafer Level Processing) techniques for optics and electronics, the future cost reductions will benefit all (traditional cameras, stereo-cameras, traditional plenoptic cameras with traditional plenoptic algorithms and our inventions), maintaining our cost advantages in the future.
A traditional plenoptic camera can aim to reduce cost by trading off "directional resolution", using a smaller number of pixels per microlens (instead of 9x9), for example 5x5 pixels per microlens, producing "less-refocusable" images, lower-quality "all-in-focus" images, lower-quality 3D images (discriminating a lower number of different 3D depths) and more blurry images in general. However, even a 25 Megapixel sensor (and its associated optics) will be more expensive than a 1 Megapixel sensor, now and in the future. In the extreme, sacrificing image quality for cost, going down to 2x2 pixels per microlens would still necessitate a 4 Megapixel sensor vs. a 1 Megapixel sensor, which is more expensive now and will continue to be so in the future. In this extreme case, the image quality would be compromised to the extent that the quality of a plenoptic camera would not be better than the quality of a stereo-camera, reaching engineering trade-offs that do not make sense from a practical point of view.
Furthermore, one of the reasons why a large number of pixels per microlens (81 or even higher) has traditionally been chosen is that the alignment techniques (between microlenses and pixels) are much cheaper if tolerances and misalignments between microlenses and pixels are allowed: if in a square of 81 pixels (9x9) the square of the microlens (but the argument is also applicable to hexagonal microlenses and non-square pixels, or any geometry of microlenses and pixels) is completely misaligned and the edge of the microlens falls between two pixels, we still have a square (or a hexagon or whatever polygon) of useful pixels of 8x8 pixels per microlens. This fact is explained in figures 27.A and 27.B: figure 27.A shows a perfect alignment of a microlens (drawn as a black square) over a Bayer pattern of 5x5 pixels, and figure 27.B shows a typical misalignment where 5 pixels at the top and 5 pixels at the left are wasted, hence the number of useful pixels per microlens is 16 (vs. 25 with the perfect alignment of figure 27.A). This phenomenon becomes worse as the number of pixels per microlens becomes smaller: in a design of 2x2 pixels per microlens only one out of 4 pixels would be usable, and in the extreme of cost reductions, adopting structures of 2x1 pixels per microlens (where a plenoptic camera would become just an array of multiple stereo cameras), none of the pixels would be usable in case of misalignments. Perfect alignments as in figure 27.A require active alignment techniques, where the sensor needs to be connected and exposed to a known image pattern in the manufacturing line that will drive the feedback to the alignment machines in order to achieve the submicron precision that would be needed for the around-one-micron pixels (going below a micron with new technologies) that are currently state of the art. These active alignment techniques reduce throughput in the assembly line and increase costs to much higher levels, as well as deteriorating the quality of the images as described before.
We did not mention the computational costs, but the digital signal processing (real-time or not) of a million pixels is much more affordable and cheaper than the processing (real-time or not) of 81 million pixels (or of 25 or 5 million pixels), especially bearing in mind that the complexity of the plenoptic algorithms does not grow linearly with the number of pixels; it grows much faster. This fact, together with the lower number of useful final pixels, constitutes the two main handicaps of traditional plenoptic cameras (deemed by some as unaffordable due to their high computational costs). The present invention overcomes both problems: it drastically reduces the cost per useful pixel and reduces the cost of digital signal processing for a given number of useful pixels.
As a consequence we conclude that this invention yields a much lower cost per useful Megapixel and much lower processing requirements than traditional plenoptic cameras, as well as much higher quality images, lower cost and quicker exposure times than stereo-cameras.
References. 1. Capture of the spatio-angular information of a 3D scene. H. Navarro, M. Martinez-Corral, A. Dorado, G. Saavedra, A. Llavador and B. Javidi. Opt. Pura Apl. 46, pages 147-156 (2013). Sección Especial: X Reunión Nacional de Óptica. Sociedad Española de Óptica. 2. Modern Optical Engineering. Warren J. Smith. Third Edition. McGraw-Hill.

Claims (4)

The implementations described in the drawings, in the examples of use, and in the description of the inventions are presented by way of exemplary embodiments thereof, and not by way of limitation, and must not be interpreted as limitations or unique implementations. Several inventions are described, and the information supplied in this application describes several combinations of the inventions, with each other, and with several implementations of prior art; however, we did not describe all the possible permutations of all the inventions described with all the implementations of prior art, as the list would become unnecessarily large. Various modifications to these implementations will be readily apparent to those skilled in the art, and the generic principles described herein can be applied to other implementations without departing from the scope of the invention. It is further understood that the scope of the present invention fully encompasses other implementations that may become obvious to those skilled in the art and that the scope of the present invention is accordingly limited by nothing other than the appended claims. What is claimed is: 1. In a plenoptic camera, a method to compose a three-dimensional virtual image outside of the camera, in a three-dimensional object world, with the same lateral resolution as pixels in a sensor of the plenoptic camera, wherein light patterns captured by every one of the pixels in the sensor are conjugated or traced back into the three-dimensional object world at a distance, for every pattern, which it is known that pattern occupies in the three-dimensional object world, the distance of every pattern to the three-dimensional object world having been previously computed from the light patterns captured by the pixels in the sensor.
2. A method to drive three-dimensional displays from the three-dimensional virtual image as recited in claim 1, wherein shapes, sizes and the distance of the virtual image are used to compute the input to the display.
3. A method as recited in claim 1 wherein the distance of every pattern to the three-dimensional object world is calculated from the measurement of slopes in epipolar images captured by the plenoptic camera.
4. A method as recited in claim 1 wherein the distance of every pattern to the three-dimensional object world is calculated using trigonometrical analysis of four derivatives of the light field captured by the plenoptic camera, wherein the angle alfa of a slope of any point in the light field is computed by the following formula: alfa = 0.5 arctg( 2·((dLf/dx)(dLf/du) + (dLf/dy)(dLf/dv)) / ( ((dLf/dx)(dLf/dx) + (dLf/dy)(dLf/dy)) − ((dLf/du)(dLf/du) + (dLf/dv)(dLf/dv)) ) ) where (dLf/di) is a partial derivative of the light field with respect to i, being i = (x, y, u, v).
GB1502601.6A 2015-01-25 2015-01-25 Full resolution plenoptic imaging Active GB2540922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1502601.6A GB2540922B (en) 2015-01-25 2015-01-25 Full resolution plenoptic imaging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1502601.6A GB2540922B (en) 2015-01-25 2015-01-25 Full resolution plenoptic imaging

Publications (3)

Publication Number Publication Date
GB201502601D0 GB201502601D0 (en) 2015-04-01
GB2540922A GB2540922A (en) 2017-02-08
GB2540922B true GB2540922B (en) 2019-10-02

Family

ID=52781703

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1502601.6A Active GB2540922B (en) 2015-01-25 2015-01-25 Full resolution plenoptic imaging

Country Status (1)

Country Link
GB (1) GB2540922B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815823B (en) * 2017-02-22 2020-02-07 广东工业大学 Lens distortion calibration and correction method and device
CN109325981B (en) * 2018-09-13 2020-10-02 北京信息科技大学 Geometric parameter calibration method for micro-lens array type optical field camera based on focusing image points

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
R Kimmel et al, "10th Asian Conference on Computer Vision, Queenstown, New Zealand, November 8-12, 2010, Revised Selected Papers, Part II", published 2011, Springer Berlin Heidelberg, p 186-200, T Bishop et al., "Full-Resolution Depth Map Estimation from an Aliased Plenoptic Light Field" *

Also Published As

Publication number Publication date
GB2540922A (en) 2017-02-08
GB201502601D0 (en) 2015-04-01

Similar Documents

Publication Publication Date Title
CA3040006C (en) Device and method for obtaining distance information from views
US9532033B2 (en) Image sensor and imaging device
JP5929553B2 (en) Image processing apparatus, imaging apparatus, image processing method, and program
KR101605392B1 (en) Digital imaging system, plenoptic optical device and image data processing method
CN202750183U (en) Parallax imaging apparatus and parallax imaging system
US9383548B2 (en) Image sensor for depth estimation
US8717485B2 (en) Picture capturing apparatus and method using an image sensor, an optical element, and interpolation
JP5627622B2 (en) Solid-state imaging device and portable information terminal
CN102917235A (en) Image processing apparatus, image processing method, and program
JP6217918B2 (en) Array-like optical element, imaging member, imaging element, imaging device, and distance measuring device
JP4764624B2 (en) Stereoscopic display device and stereoscopic image generation method
US20140071247A1 (en) Image pick-up device and distance measuring device
Michels et al. Simulation of plenoptic cameras
GB2540922B (en) Full resolution plenoptic imaging
JP4693727B2 (en) 3D beam input device
CN109792511B (en) Full photon aperture view slippage for richer color sampling
KR102314719B1 (en) Plenoptic sub-aperture view shuffling with improved resolution
US20140184861A1 (en) Accurate plenoptic rendering with defocus blur
Drazic Optimal depth resolution in plenoptic imaging
CN113053932A (en) Apparatus and method for obtaining three-dimensional shape information using polarization and phase detection photodiodes
Drazic et al. Optimal design and critical analysis of a high-resolution video plenoptic demonstrator
Brückner et al. Driving micro-optical imaging systems towards miniature camera applications
CN117857935A (en) Image sensor
Xia et al. Mask detection method for automatic stereoscopic display
Brückner Multiaperture Cameras