A METHOD AND APPARATUS FOR VIDEO IMAGE STABILIZATION
FIELD AND BACKGROUND OF THE INVENTION
The present invention relates to an apparatus and a method for video image stabilization, and, more particularly, but not exclusively, to stabilization of images from a camera mounted on a non-stationary platform. In the context of the background art, we differentiate between two essentially separate aspects of the field of image stabilization. The first is the need for a camera mounted on a non-stationary platform to maintain a constant direction in space, and is generally addressed through inertial stabilization methods, typically utilizing gyroscopes coupled to a motorized PTZ (pan, tilt, zoom) apparatus of some sort. Such a combined stabilization apparatus, called a gimbal, is often found on surveillance aircraft and sea vessels, which are characterized by long-distance observation for which a high-zoom camera is required to capture relevant information. The gimbal apparatus ensures that the camera itself maintains its direction in space with extreme exactitude, despite its being harnessed to, say, the body of an aircraft that is free to move with six degrees of freedom. In other words, the platform that carries the camera is free to move and change its position in space, and nonetheless the camera will always remain oriented in the same direction. Any change in the camera direction is thus carried out through a type of motorized PTZ apparatus, commonly initiated by a human operator. The second aspect of image stabilization is the mitigation of a variety of types of vibration, shaking, jumping, and swaying of the camera. These movements are all too well known in the world of hand-held video cameras. No matter how steadily one attempts to hold the camera while shooting a video, the resulting video is not free
of those annoying jumps, often very irritating to a viewer. Various stabilization methods exist in the art to deal with these phenomena. In contrast to the first aspect discussed in the previous paragraph, that of maintaining directional position in space, here the directional position of the camera generally follows the position of the platform to which it is harnessed. An example would be a tank or armored vehicle, where the camera direction follows the direction of the vehicle or turret to which it is mounted, and directional changes may be very frequent or even continuous. Although directional changes are followed by the camera as desired, the problem of camera vibrations remains. Many methods have been proposed to deal with the issue of vibrations and shaking of video images. One common solution is an optical stabilizer, such as is typically found in professional video cameras and in many of the newer digital still cameras on the home consumer market. Optical stabilizers are generally built around an essentially mechanical stabilization apparatus. A shaking or vibration of the camera is immediately compensated for by a relative movement between the optical elements and the sensor, such as a CCD or CMOS. The relative movement between the lens and the CCD compensates for small displacements and/or rotations of the camera in space, such as hand vibrations. Sometimes the lens or parts thereof move and the CCD remains in place, whereas in other implementations the CCD moves and the lens remains fixed. The freedom of movement between the parts is limited so that small movements and shakes are compensated for while still enabling the overall optical elements to follow the desired direction of the camera operator. In particular, the phenomenon known as motion blur is handled fairly well by optical stabilizers. Another common solution is software based image stabilization.
Such software based methods are very common in home video cameras and less common in post-processing software for various video applications. In home video cameras for instance, the CCD is exposed to a wider field of view than is actually displayed to the user. The system calculates the image global translation, for example in relation to the previous frame, and then outputs what is referred to as a region of interest. That is to say, only a partial section of the CCD matrix is used for any given frame. When a particular frame jolts in relation to the previous one, by convenient use of the "border region" of pixels on the CCD that is not displayed to the user, the picture remains in
place on the display screen. A continuity of stable frames is thus produced. However, in contrast to optical stabilization methods, software methods lack the ability to counteract the phenomenon of motion blur.
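By way of illustration only, the region-of-interest scheme described above may be sketched as follows; the frame dimensions, border margin, and per-frame translation estimate are hypothetical stand-ins for values a real system would obtain from its motion-estimation stage.

```python
import numpy as np

def stabilize_crop(frame, dx, dy, margin=32):
    """Software stabilization by region-of-interest selection.

    frame  : full sensor image (taller and wider than the displayed view).
    dx, dy : estimated global translation of the scene content in this
             frame relative to the previous one, in pixels (hypothetical
             output of an upstream motion-estimation step).
    margin : width of the hidden border of sensor pixels surrounding the
             displayed region of interest.
    The crop window follows the measured content shift, so the scene
    stays in place on the display.
    """
    h, w = frame.shape[:2]
    # Shift the window with the content, clamped to stay on the sensor.
    ox = int(np.clip(margin + dx, 0, 2 * margin))
    oy = int(np.clip(margin + dy, 0, 2 * margin))
    return frame[oy:h - 2 * margin + oy, ox:w - 2 * margin + ox]
```

As the clamping shows, the compensation is limited by the width of the hidden border, which is why this approach handles small jolts but not large displacements, and, as noted, cannot counteract motion blur.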
A still further method in the art is the use of a type of external suspension system, such as a pneumatic or spring-based suspension system. The camera is harnessed to such a platform, whose spring or pneumatic action offsets any shaking or vibration of the camera, much as a car's suspension cushions its passengers. A gyroscopic restraining apparatus may, for example, be used to further restrict any directional changes of the suspension system. A fourth method utilizes a motorized gyroscope, of the type commonly intended for maintaining direction in space, for the purpose of stabilizing video images. In such a method, the PTZ mechanism coupled to the gyroscope does not follow the direction of the camera platform precisely, but rather follows a moderated version of the motion. In other words, slight changes and bounces of the camera platform, which may be for instance a car, plane, or other type of moving object, are negated by the gyroscope, but significant angular changes in the platform direction are indeed followed by the camera. This system is complex, delicate, physically heavy, and rather expensive, and therefore is not in wide use for purposes of shock mitigation of cameras. The drawback of the fourth method has thus been stated. The remaining three solutions, when implemented despite the drawbacks mentioned, may be viable in certain narrow angle stabilization applications. However, they are far less suited to the intrinsic problems encountered when utilizing wide angle lenses. In such cases, the x and y translations of imaged objects to compensate for camera vibrations cause a visual artifact that is seen as dynamic radial deformations of objects in image sequences. These apparent deformations of objects observed in a video image are due to a change in the image principal point over the image sequence during the process of image stabilization.
To exemplify, we turn to Fig. 1, which is a generalized illustration of the prior art. The figure shows two stabilized image frames, Image 1 and Image 2, captured over two successive time intervals, t1 and t2, with a wide angle fish eye lens. In the figure, an unstable imaging device captures, at each time interval t1 and t2, a monument structure. Illustrative video image 13 shows the registered video sequence over time
t1-t2 of Images 1 and 2. As stated, the shifting of the monument in the image frames from top to bottom is caused by instability of the imaging apparatus. The instability of the camera is such that rotation around the camera's optical center is the dominant component of frame instability, and the translational component of camera movement is negligible.
It is seen in both frames that parts of the captured structure closer to the edge of the field of view of each image frame and therefore farther from the principal point are most deformed, as is typical of fish eye lenses. The distance of the object or part thereof from the image's principal point determines the extent of distortion. For instance, in Image 1 taken at time interval t1, the stairs of the monument that are situated at the bottom part of the image frame are seen as more deformed than the roof which is situated closer to the principal point. Likewise, in Image 2 taken at time interval t2, the roof of the monument that is now shifted to the top part of the image frame is seen as more deformed than the stairs which are now closer to the principal point.
In a video sequence, image stabilization is carried out by attempting to compensate for shifts and/or rotations in each frame of the sequence. In the example of Fig. 1, Image 1 is shifted and/or rotated downwards and to the right to compensate for camera motion upwards and to the left at t1. Image 2 is shifted and/or rotated upwards and to the left to compensate for camera motion downwards and to the right at t2. The shifting and/or rotation is intended to give a resultant video sequence over t1-t2 in which the objects in each image frame overlap over the sequence in an optimized manner. The overlapping video sequence is referred to as a registered video sequence.
When image stabilization between Image 1 and Image 2 is attempted merely by frame shift and rotation in the X-Y plane, a singular center of distortion or principal point does not exist. Principal point 10 of Image 1 and principal point 12 of Image 2 are shown. These two principal points, 10 and 12, are shown as being retained in the illustration of the overlaid video image 13. The video image over time interval t1-t2 is comprised of the two image frames each having a different principal point. Since the stabilized video sequence 13 is comprised of the two image frames having different principal points, the individual distortions of the constituent frames remain, and object registration is non-overlapping as shown. The lack of a singular principal point in the image sequence 13 causes varying distortions in each image frame comprising the
image sequence. These distortions, when viewed in a video sequence, are seen as temporal deformations of imaged objects. By temporal, it is understood that the deformations are perceived over time, that is, over successive image frames in the video sequence. This temporal deformation is shown as a ghost image 13 for illustration purposes; when viewed over time, it appears as a bending of solid objects.
The above problem is seen during optical stabilization where the optical axis is free to physically move in relation to the CCD. As a result, the image principal point changes over the image sequence. Similarly, in software stabilizers, the region of interest changes, meaning that once again the image principal point is not preserved over the sequence. In either case, when utilizing wide angle video, the dynamic change of the principal point leads to visual artifacts. These deformations are sometimes referred to herein as the "optical torsion effect" or "banana effect". This effect is a common detriment of image stabilization methods known in the art.
As stated, in fish-eye lenses dynamic radial deformations appear during typical stabilization techniques. For example, solid and flat surfaces are perceived as dynamically bending, as seen in image 13 of Fig. 1, even while generally remaining in the same position in the image frame. In wide rectilinear lenses, perceived object sizes dynamically change. For example, solid objects are perceived as if they are inflated or deflated, even while generally remaining in the same position in the image frame. As a result, a human observer has difficulty in distinguishing between static and dynamic objects as well as between rigid and deformable objects. Moreover, depth perception is decreased, and vertigo and/or dizziness may occur.
In essence then, any image stabilization method for wide angle applications should not merely compensate for global motion of objects in successive image frames, but also present a way to preserve a fixed principal point for all images.
Therefore, there is an unmet need for, and it would be highly useful to have a system and method that overcomes the above drawbacks.
SUMMARY OF THE INVENTION
According to one aspect of the present invention there is provided an apparatus for image stabilization, the apparatus comprising: a) a light-sensing camera operative to sense light to capture a plurality of scene images including a first scene image at a first point in time and a second scene image at a second point in time after the first point in time; b) a camera motion-sensor operative to determine for the first and second points in time, a physical displacement of the light-sensing camera in physical rotation-translation space; c) an image-warper operative: i) to generate a first warped image by applying a first warping to the first captured image, the first warping being defined such that the first warped image is a curvilinear image; ii) to generate a second warped image by applying a second warping to the second captured image, the second warping being defined such that the:
A) second warped image is a curvilinear image; B) second warping differs from the first warping in accordance with the physical displacement that is determined by the camera motion-sensor; and d) a display unit operative to display the generated warped images. According to another aspect of the present invention there is provided a method for image stabilization, the method comprising: a) using a light-sensing camera operative to sense light to capture a plurality of scene images including a first scene image at a first point in time and a second scene image at a second point in time after the first point in time; b) determining, for the first and second points in time, a physical displacement of the light-sensing camera in physical rotation-translation space; c) generating:
i) a first warped image by applying a first warping to the first captured image, the first warping being defined such that the first warped image is a curvilinear image; ii) a second warped image by applying a second warping to the second captured image, the second warping being defined such that the:
A) second warped image is a curvilinear image;
B) second warping differs from the first warping in accordance with the physical displacement that is determined by the camera motion-sensor; and d) displaying the generated warped images.
According to another aspect of the present invention there is provided an image stabilization apparatus for use with a light-sensing camera operative to sense light to capture a plurality of scene images including a first scene image at a first point in time and a second scene image at a second point in time after the first point in time, the apparatus comprising: a) a camera motion-sensor operative to determine for the first and second points in time, a physical displacement of the light-sensing camera in physical rotation-translation space; b) an image-warper operative: i) to generate a first warped image by applying a first warping to the first captured image, the first warping being defined such that the first warped image is a curvilinear image; ii) to generate a second warped image by applying a second warping to the second captured image, the second warping being defined such that the:
A) second warped image is a curvilinear image;
B) second warping differs from the first warping in accordance with the physical displacement that is determined by the camera motion-sensor; and c) a display unit operative to display the generated warped images.
According to another aspect of the present invention there is provided an image stabilization apparatus for use with a light-sensing camera operative to sense light to capture a plurality of scene images including a first scene image at a first point
in time and a second scene image at a second point in time after the first point in time and a camera motion-sensor operative to determine for the first and second points in time, a physical displacement of the light-sensing camera in physical rotation- translation space, the apparatus comprising: a) an image-warper operative: i) to generate a first warped image by applying a first warping to the first captured image, the first warping being defined such that the first warped image is a curvilinear image; ii) to generate a second warped image by applying a second warping to the second captured image, the second warping being defined such that the:
A) second warped image is a curvilinear image;
B) second warping differs from the first warping in accordance with the physical displacement that is determined by the camera motion-sensor; and b) a display unit operative to display the generated warped images. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.
Implementation of the method and system of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware, by software running on any operating system or firmware, or by a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected stages of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
In the drawings:
FIG. 1 is a simplified diagram illustrating an example of the prior art. FIG. 2 shows an exemplary spherical mapping process of the present invention. FIG. 3 illustrates an exemplary process of transformation of a captured image to polar coordinate representation on the internal surface of a virtual sphere.
FIG. 4 illustrates an exemplary process of assignment of captured image pixel intensities to corresponding points on the virtual sphere over two successive image frames. FIG. 5 illustrates the virtual camera as well as the captured images off of the internal spherical surface as seen by the virtual camera.
FIG. 6 is a close-up version of the stabilized individual images at t1 and t2 and the resultant stabilized video image.
FIG. 7 is a simplified flow chart showing the steps in the process of image stabilization of the present embodiments.
FIG. 8 is a simplified illustration of the process of virtual camera rendering where the camera is found at different radii in relation to the transformed image on the spherical surface.
FIG. 9 shows the virtual camera positioned at the spherical center, thereby creating a virtual rectilinear lens.
FIG. 10 illustrates the advantages of the present embodiments of the invention by comparing the prior art to the results of the present embodiments.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present embodiments provide an apparatus and a method for stabilization of video images. In particular, the embodiments of the present invention describe a method for minimizing object motion as perceived in an image sequence while doing away with the notorious problem of dynamic deformations in improperly stabilized wide angle video imaging. Furthermore, the embodiments of the present invention deal with stabilization of images from a camera mounted on a non-stationary platform. The embodiments of the present invention are carried out through appropriate image warping using projective geometry.
The principles and operation of an apparatus and method according to the present invention may be better understood with reference to the drawings and accompanying description. Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
Reference is made to Fig. 2, which shows an exemplary spherical mapping construction process of the present invention including camera calibration. Light rays 200 from the imaged field of view arrive at the lens of imaging device or camera 20 at angles defined by polar coordinates (θ, φ). These azimuths and elevations are those at which the light rays arrive at the camera lens' optical center. Imaging device 20 contains image sensor 16 upon which light rays 200 impinge after entering through the lens. The image sensor may be a CCD and the locations of pixels comprising the captured image are defined by Cartesian coordinates x and y of the image sensor. Example coordinates (xm, ym and xn, yn) are shown. As such, each light ray defined by a polar coordinate pair (θ, φ) is ultimately associated with a pixel location on the sensor. In other words, a one to one correspondence exists between respective polar coordinate pairs defining an incoming light ray and Cartesian coordinate pairs defining
a pixel location on the sensor. The determination of this one to one relationship is termed the calibration process of the imaging device. The calibration process determines the intrinsic optical characteristics unique to a given imaging device.
Now, as a mathematical abstraction, it is possible to define an imaginary half-sphere 14 with radius r of three dimensional, preferably polar, coordinates in a virtual 3D space. The sphere may sometimes be referred to herein as the virtual sphere. The above described calibration process provides a known relationship between each pixel coordinate x, y on the image sensor and the azimuths and elevations of the rays of light which arrive at the camera's optical center for a given image. The virtual sphere is built based upon this determined relationship between pixel coordinates and the angles of light associated with them. Each pixel point is mapped onto a corresponding spherical point in accordance with the angles at which the light impinging that pixel hits the camera optical center when entering the camera. As such, a one-to-one spherical mapping or transformation of x-y pixel locations on the image sensor 16 to polar coordinates on the surface of the virtual half-sphere 14 may be formed. Each point on the spherical surface can be defined by the polar coordinates (θ, φ, r). This spherical mapping is unique to imaging device 20 and embodies the intrinsic camera parameters, such as radial distortion properties, inherent in the device as determined in the calibration process described above. The third coordinate r may be arbitrarily chosen, representing the radius of the sphere.
As stated, each pixel on the image sensor 16 of the imaging device 20 has a 2D Cartesian coordinate representation. A one to one relationship is determined between each of these Cartesian coordinates and a point on the 3D spherical surface. For example, points (θn, φn, r) and (θm, φm, r) are two 3D polar coordinates on the sphere that represent actual pixel locations (xn, yn and xm, ym) on the image sensor. Shaded area 22 on the sphere shows the full mapping of all points (θ, φ, r) on the spherical surface that correspond to Cartesian (x, y) coordinate points on the image sensor 16. In essence, shaded area 22 is thus an image transformation or mapping of the sensor content onto the inside surface of sphere 14. Again, the transformation process leading to the mapped image on the internal spherical surface utilizes those same azimuths and elevations determined in the calibration process. This image on the internal spherical surface then embodies the intrinsic optical camera parameters
including camera radial distortions. The list of x, y coordinates from the original image on the sensor is saved in computer memory, as is the value of the radius r.
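The one-time mapping just described may be illustrated with a minimal sketch, in which an ideal equidistant fisheye model (pixel radius = f·θ) stands in for a full per-camera calibration; the focal length and sensor dimensions are illustrative only.

```python
import numpy as np

def build_sphere_map(width, height, f, radius=1.0):
    """One-time mapping of sensor pixel coordinates (x, y) to polar
    coordinates (theta, phi, r) on a virtual half-sphere.

    Assumes an ideal equidistant fisheye projection, r_pix = f * theta,
    as a hypothetical stand-in for a real per-camera calibration.
    Returns arrays theta (angle off the optical axis) and phi (azimuth
    around the axis), both in radians, one entry per pixel, plus the
    arbitrarily chosen sphere radius r.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0   # principal point
    x, y = np.meshgrid(np.arange(width), np.arange(height))
    # Radial distance of each pixel from the principal point.
    r_pix = np.hypot(x - cx, y - cy)
    theta = r_pix / f                    # equidistant model
    phi = np.arctan2(y - cy, x - cx)     # azimuth
    return theta, phi, np.full_like(theta, radius, dtype=float)
```

The returned arrays are computed once per imaging device and then saved, exactly as the text saves the list of x, y coordinates and the radius r.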
The spherical mapping is thus a function of the lens and other intrinsic camera characteristics. In the present embodiments, for illustration purposes, only part of a complete virtual spherical surface is actually mapped. However it is understood that the spherical construction process may be applied to lenses or other optical elements of various shapes and sizes found in camera systems that may generate surfaces larger than a half-sphere.
In summation then, we see that the calibration process, which may be carried out through methods known in the art, determines the intrinsic optical camera parameters, including the radial distortion properties unique to a particular camera or imaging device. The calibration process thus provides a relationship between pixel coordinates on the sensor and the angles of impinging light associated with each pixel coordinate. This relationship in turn allows a one to one mapping of each pixel coordinate on the camera sensor to a three dimensional, preferably polar, coordinate on an imaginary or virtual spherical surface. Such a spherical mapping process, unique to a given camera, is conveniently carried out by off-the-shelf real time 3D simulation software interfaces, such as OpenGL or Direct3D. Notably, the spherical mapping for a given image sensor is determined a single time only and is fixed. As explained next, for each image captured thereafter using that image sensor, a pixel value at a given pixel location on the sensor is assigned to the same polar coordinate on the virtual sphere. In an alternative embodiment, the model could easily be extended to the case of non-fixed lenses, such as varifocal zoom lenses. Instead of calculating a single spherical surface, multiple spherical surfaces are produced, one for each possible position of the lens. The computer stores all surfaces in memory, but renders through the virtual camera only the single spherical surface which corresponds to the current state of the lens.
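The varifocal extension above amounts to a lookup among pre-computed mappings. A sketch, in which the dictionary keyed by zoom position is a hypothetical representation of the stored surfaces:

```python
def select_sphere_map(zoom_position, sphere_maps):
    """For a varifocal lens, pick the pre-computed spherical mapping
    corresponding to the lens's current state.

    sphere_maps : dict keyed by calibrated zoom positions; each value
        is a mapping built once at calibration time (the dict layout
        is illustrative, not prescribed by the text).
    The nearest calibrated position is chosen for intermediate states.
    """
    nearest = min(sphere_maps, key=lambda z: abs(z - zoom_position))
    return sphere_maps[nearest]
```

Only the selected surface is rendered through the virtual camera; the others remain idle in memory.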
We now turn to Fig. 3, which illustrates an exemplary process of transformation of a captured image to polar coordinate representation on the above discussed sphere. Image apparatus 20 is situated in front of monument 18 and an image 24 of the monument is captured at t0 and appears on sensor 16. The image apparatus at t0 is steady and positioned such that the monument appears on the sensor pixels at the center of the image. The 2D image 24 on the sensor is then transformed into a
spherical coordinate representation on sphere 14 using the known calibrated mapping intrinsic to camera 20. As explained above, area 22 on virtual sphere 14 contains the totality of spherical surface points that correspond one to one to (x,y) coordinates on the sensor. As such, the image on sensor 16 is always mapped to area 22 on the virtual sphere. Although the mapping is constant as explained above, the content of each spherical surface point on area 22 changes for each frame in accordance with the content of the point's corresponding pixel on the sensor. In other words, the precise texture that makes up spherical points comprising area 22 changes per image frame. This change per frame is a result of the assignment of pixel values for each frame on the sensor 16 to points on the sphere area 22.
To illustrate the assignment of intensity values on the sensor to 3D coordinates on the sphere, reference is made to Fig. 4. Image apparatus 20 is now seen at time t1 shifted up and to the left. T=t1 represents the time interval of one frame capture. Such a position of the image apparatus causes the imaged monument 18 to appear on a set of pixels at the lower right half of image 30 taken at t1. As a result, when transformed to polar coordinate representation onto constant area 22 of sphere 36, the monument appears on spherical surface points at the bottom of area 22 at t1.
At time t2, a second image frame 32 is captured. During this time interval, camera 20 is positioned down and to the right in relation to the camera position at t1. Such a change in camera position may be caused by many factors, such as a shaking or jolting of the camera by the camera operator. The monument 18 in image 32 now appears on a different set of pixels than the monument's location in image 30 taken at t1. The monument now appears on a set of pixels at the upper left side of second image 32. As a result, when the image pixel values are assigned onto area 22 of sphere 36, the monument appears on spherical surface points at the top of area 22. Again, area 22 is identical on sphere 36 at both t1 and t2, as it represents the mapping or coordinate transformation of sensor 16 to a virtual spherical surface and thus is constant for a given imaging device.
We therefore see that camera motion or position change between time intervals t1 and t2 leads to the monument being captured by a different set of pixels on the camera sensor for each frame. This difference in pixel content between frames t1 and t2 is transformed into different spherical point content on area 22 between t1 and t2. For each frame then, the monument is assigned to a set of spherical points on
area 22 depending on the monument's position on the image sensor pixels. Thus, the camera position change over an image sequence, along with other factors such as object motion and illumination changes, leads to intensity variations of the sensor pixels for each image frame. These intensity variations are represented on the spherical surface as a textural change for each spherically transformed image.
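Because the geometric mapping is fixed, the per-frame work reduces to assigning each new frame's pixel values to their pre-computed spherical points. A minimal sketch, representing the mapping as a flat index lookup (a hypothetical stand-in for a GPU texture binding):

```python
import numpy as np

def update_sphere_texture(frame, index_map):
    """Per-frame step: assign each captured pixel's intensity to its
    fixed point on the virtual spherical surface (area 22).

    index_map : one pre-computed spherical-point index per sensor
        pixel, built once at calibration time and never changed.
    Only the intensity content varies from frame to frame; the
    geometry of the mapping itself is constant.
    """
    texture = np.zeros(index_map.max() + 1, dtype=frame.dtype)
    texture[index_map] = frame.ravel()
    return texture
```

Calling this once per captured frame re-textures the constant area 22 with that frame's content, as described for times t1 and t2 above.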
Noticeably, at t1 the steps of the monument appear curved in the captured image 30 on the sensor. Likewise, at t2 the roof of the monument appears curved in the image 32. This is due to the fact that a fish eye wide angle lens is utilized to image the monument. The bending or distortion becomes greater the farther away from the principal point, so that the distortion is largest toward the edge of the image.
Camera motion between frames may be caused by shock, vibration, user hand movements, or other outside force that causes camera motion between the frames. The amount of motion of the real camera 20 may be measured by one or more motion sensors operatively attached to the camera, and is generally expressed in six degrees of freedom. It should be noted however, that rotational components of the camera motion in the three degrees of freedom, P (pan), T (tilt), and R (roll), are frequently the major cause of image instability. As such, in the present embodiments, only rotational displacement of the camera is taken into consideration. However, in further embodiments of the present invention the linear translational components of the camera motion may be taken into consideration to further improve image stabilization accuracy. In further embodiments, it is possible to estimate the motion values in a video stream, otherwise termed ego-motion, through software based analysis such as optical flow analysis, without the use of physical sensors. Reference is now made to Figs. 5 and 6. Starting with Fig. 5, sphere 36 is shown again at two consecutive time intervals, t1 and t2. However, now, in addition to the sphere at each time interval, a virtual camera 50 is positioned at distance d from the center point of sphere 36 for each frame. The virtual camera is a mathematical abstraction as well, and may be implemented along with other aspects of the present embodiments through graphical processing techniques known in the art. The purpose of the virtual camera is to rotate, preferably in accordance with the inverse of the actual measured rotations of the real camera, and thus render stabilized images frame by frame off of the textured internal spherical surface. As such, the virtual camera is
shown at time t1 to be tilted downwards and to the right, while at t2 the virtual camera is shown to be tilted upwards and to the left. Notably, the virtual camera's optical axis for each frame passes through the precise spherical center. The precise angle, speed, and overall motion of the virtual camera relative to the virtual sphere are a function of the real camera motion. These parameters are preferably determined either by motion sensors operatively attached to the real camera or through other software based methods of video image analysis.
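The compensating rotation described above can be sketched numerically. The following is a minimal illustration only, not the embodiment itself; the pan/tilt/roll axis conventions and the composition order are assumptions made for the sake of the example:

```python
import math

def matmul(a, b):
    # multiply two 3x3 matrices given as nested lists
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def rot_pan(a):
    # rotation about the vertical (pan) axis by angle a, in radians
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_tilt(a):
    # rotation about the horizontal (tilt) axis
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_roll(a):
    # rotation about the optical (roll) axis
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def real_camera_rotation(pan, tilt, roll):
    # composition order is an assumption; a real system fixes it by convention
    return matmul(rot_pan(pan), matmul(rot_tilt(tilt), rot_roll(roll)))

def virtual_camera_rotation(pan, tilt, roll):
    # the compensating rotation is the inverse of the measured rotation,
    # which for a rotation matrix is simply its transpose
    return transpose(real_camera_rotation(pan, tilt, roll))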
Now, sphere 36 is also shown with constant area 22 containing the texture mapping of the image sensor pixels to spherical coordinate representation at both time frames t1 and t2. At t1, virtual camera 50 is seen, as stated, to be angled down and to the right, while at t2 the virtual camera is seen as angled up and to the left.
The stabilized images rendered by the virtual camera are shown as image 44 at t1 and image 46 at t2. In contrast to the consecutive stabilized images in Fig. 1, stabilized images 44 and 46 rendered by the virtual camera share the same principal point 110. This is due to the projective geometric process that transforms sensor images in Cartesian x, y coordinates to the spherical environment. In such an environment, the optical axis of the virtual camera passes through the spherical center, a condition that imparts a constant principal point to each frame rendered off of the sphere. Fig. 6 shows that when successive stabilized and warped images 44 and 46 with a constant principal point are overlapped or registered, video image 54 is obtained free of dynamic visual artifacts. Due to the constant principal point, the overlap is perfect or near perfect, and therefore the registered stabilized video image is free from the visual artifacts discussed earlier, such as the dynamic radial deformations seen in Fig. 1.
It is understood that these stabilized images are as seen from the perspective of the virtual camera. Images 44 and 46 are referred to herein as warped images. The entire process from mapping of the respective captured images 30 and 32 to the sphere to the rendering of these mapped images by the virtual camera is referred to as the warping process. The warped images 44 and 46, containing a new principal point identical in both images, provide identical simulated camera views as the virtual camera compensates for the real camera instability. In contrast, the two original
captured images, 30 and 32, were seen from different camera views as a result of camera instability. Each warped image thus simulates a different view from that seen in the respective corresponding original captured image. So image 44 simulates a different view of the monument from that seen in captured image 30. Likewise, image 46 simulates a different view of the monument from that seen in captured image 32. Since only rotational displacement of the real camera occurs, the two sequential-in-time warped images 44 and 46 are identical. If the real camera 20 moves forward, for instance, and therefore has a translational component of motion, then the two sequential-in-time images do not overlap perfectly, as the monument grows in size in proportion to the image for each frame. Even in such a case, both warped images have the same principal point and the visual artifact seen in Fig. 1 is avoided.
At t1, since the real camera 20 position (not shown in Figs. 5 and 6) is shifted upwards and to the left, the virtual camera 50 is positioned looking downwards and to the right at the texture mapped area 22 of sphere 36. At t2, since the real camera 20 position is shifted downwards and to the right, the virtual camera 50 is positioned looking upwards and to the left at the texture mapped area 22 of the sphere. The virtual camera position is thus directly opposite to the real camera position. The virtual camera pivots about an optical axis that always passes through the center of the sphere and renders, frame by frame, the region of interest on sphere 36. Again, stabilized warped images 44 and 46 are two consecutive images at times t1 and t2 viewed by the virtual camera and having the same principal point 110. In the present example, the stabilization process ensures that the monument is centered in each virtual camera rendered image frame. To achieve such centering, the stabilization process ensures that the correct ROI is rendered, which may include an area of the sphere beyond area 22. This area is the white area in images 44 and 46. Image 54 shows the registered image without dynamic visual artifacts.
The determined instability of the real camera is thus compensated for by the virtual camera rotation in the opposite direction around the center point of the sphere. Alternatively, the sphere may rotate while the virtual camera remains fixed. By relative rotational motion between the sphere and the virtual camera, the spherical mapped transformations of the imaged monument remain in a fixed position in the virtual camera image plane. That is to say, the monument remains stable and in the
same position over a sequence of image frames rather than appearing at various positions in the image frames.
As stated, by rendering the field of view/ROI on the spherical surface for each frame in accordance with motion data from the real camera, the registered warped image sequence 54, or video image, displayed to the user is seen as stabilized. Moreover, the pixel-coordinate to polar-coordinate mapping enables x-y displacements of objects over a video sequence on the camera's sensor to be expressed in terms of angular shifts of the objects in a polar space on the virtual sphere. This allows for an important advantage: a single principal point is maintained for each successive frame in the stabilized sequence. A fixed principal point provides a stabilized sequence free of the dynamic visual artifacts discussed earlier, thus providing clear and sharp stabilized video images.
Notably, the present embodiments may be implemented through the use of texture mapping, typically with the aid of off-the-shelf graphical processors, such as found on common graphical processing unit (GPU) based hardware such as display cards manufactured by ATI® and nVidia®. The process may typically be carried out through standard software interfaces that serve 3D modeling and simulation purposes, such as OpenGL or Direct3D.
Reference is made to Fig. 7, which is a simplified flow chart illustrating the steps in the process of image stabilization of the present embodiments. In step 56, the calibration process occurs, by which the intrinsic characteristics of the camera that relate to lens and sensor type are determined. Prior to the calibration process, the optical components are chosen. A sensor and lens are preferably chosen so that their field of view is greater than that recorded and/or shown on a typical display device to a user. The extent to which the field of view is extended typically depends on the extent to which stabilization is required for a given series of images. In the present embodiments, a fixed lens is used, although extensions to the cases where variable zoom lenses and other non-fixed, varying-focus optical lenses are used are understood to be within the scope of the possible embodiments of the present invention. In step 58, the spherical map is built, as discussed above with reference to Fig. 2. That is, for each pixel coordinate on the image sensor, a corresponding 3D coordinate is associated with a point on a spherically defined surface. As stated, although polar coordinates are utilized in the present embodiments, other 3D coordinate systems,
including but not limited to 3D Cartesian coordinates, may be implemented. The spherical map is constant and is used for each frame of the video stream. The 3D coordinates may serve as a basis for the formation of a polygonal mesh. Such a mesh may be formed through polygonal triangulation of neighboring points, by non-uniform rational B-spline (NURBS) fitting, or by other point-cloud to surface construction methods known in the art. Alternatively, these 3D coordinates may remain as 3D points in space or be used in particle-based methods such as splats.
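The once-per-lens construction of step 58 might be sketched as follows. The sketch assumes an idealized equidistant (f·θ) fish-eye model with a centered principal point; a real system would substitute the calibrated lens profile determined in step 56:

```python
import math

def build_spherical_map(width, height, f, r=1.0):
    """Precompute, once per lens, the 3D sphere point for every sensor pixel.

    Assumes an equidistant fish-eye model: a pixel's distance from the
    principal point equals f * theta, where theta is the angle of the
    incoming ray from the optical axis. Purely illustrative."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0   # assumed principal point
    sphere_map = {}
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            rho = math.hypot(dx, dy)          # pixel distance from center
            theta = rho / f                   # ray angle from optical axis
            phi = math.atan2(dy, dx)          # azimuth around the axis
            # 3D point on the sphere of radius r (optical axis along +z)
            sphere_map[(x, y)] = (r * math.sin(theta) * math.cos(phi),
                                  r * math.sin(theta) * math.sin(phi),
                                  r * math.cos(theta))
    return sphere_map
```

The center pixel maps to the point where the optical axis pierces the sphere; pixels farther from the center map to points at proportionally greater angles, matching the equiangular property discussed later.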
In step 60, the virtual camera is defined. A virtual camera is a known function in common three dimensional modeling software interfaces such as OpenGL. It is understood that the virtual camera is utilized in the present embodiments merely for convenience, and other mathematical abstractions are possible. The virtual camera provides a defined location in virtual translation-rotation space.
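As a mathematical abstraction, the virtual camera reduces to a pose in translation-rotation space. A minimal sketch follows; the class name and structure are illustrative assumptions, not part of OpenGL or of the embodiments:

```python
class VirtualCamera:
    """Minimal stand-in for an OpenGL-style virtual camera: a pose
    (position plus 3x3 orientation matrix) in the same 3D space as the
    virtual sphere."""

    def __init__(self, position, rotation):
        self.position = position      # (x, y, z) in world space
        self.rotation = rotation      # 3x3 world-to-camera matrix

    def to_camera_coords(self, point):
        # express a world-space point (e.g. a point on the sphere)
        # in this camera's own coordinate frame
        d = [point[i] - self.position[i] for i in range(3)]
        return [sum(self.rotation[i][k] * d[k] for k in range(3))
                for i in range(3)]
```

A camera at the origin with the identity orientation leaves points unchanged; the stabilization steps below update this pose per frame.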
The next step in the method, step 62, is the imaging process. The real camera is preferably harnessed to a platform with a mechanical apparatus for mitigating the particular vibrations that lead to motion blur. Furthermore, several motion sensors are preferably placed on the platform of the camera as well as on the camera itself. The motion sensors may be internal or external to the camera and measure movement for each one of the six degrees of freedom of the camera. They may comprise, among others, micro-gyroscopes and/or accelerometers to measure both rotational and translational components of camera motion. In alternative embodiments, the motion sensors are replaced by a computer program. Furthermore, the sensing data may serve the system in real time, or may be recorded for later use, for instance for stabilization of recorded video.
Step 64 involves the assignment of the pixel intensities captured in each frame to corresponding coordinates on the virtual sphere. In each video frame, each pixel on the camera sensor captures light energy entering the camera lens from a different azimuth and elevation. The totality of pixel values or intensities in the camera field of view comprises the image content for that frame. Each pixel on the sensor has a corresponding location on the virtual sphere, predetermined by the spherical mapping process using the above mentioned azimuths and elevations. The intensity value of each pixel is applied to the pixel's corresponding location on the sphere. So, for example, if pixel number 204 in image frame X has a red value with intensity 145, then the spherical coordinate (θn, φn, r) corresponding to pixel 204 for that frame is
also assigned a red value with intensity 145. Now, as a result of camera instability, objects appearing in image frame X are imaged by different pixels in subsequent frame X + 1. Thus, the values of each point on the mapped area on the sphere change over time for each frame, and thus the texture or color changes, but the one to one correspondence between the 3D spherical coordinates and the 2D sensor pixel coordinates remains constant. This one to one correspondence defines a fixed area on the sphere, seen as area 22 in the previous figures. The changing texture is mapped onto this area for each frame using a given camera. The spherical mapping process is also termed herein coordinate transformation. The process of both coordinate transformation to build the sphere and the assignment of pixel values to points on the sphere is often termed texture mapping. As mentioned, such a relatively inexpensive texture mapping operation is supported by most off-the-shelf graphical processors.
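Step 64 can thus be sketched as a per-frame copy of pixel values onto the fixed spherical map. This toy version uses dictionaries rather than GPU texture memory, purely for illustration:

```python
def assign_frame_to_sphere(frame, sphere_map):
    """Per-frame texture update: copy each pixel's intensity to its fixed,
    precomputed point on the sphere. The pixel-to-point mapping never
    changes; only the values (the 'texture') change, frame by frame.

    `frame` maps (x, y) pixel coordinates to intensities; `sphere_map` is
    the constant pixel -> 3D-point map built once during calibration."""
    texture = {}
    for pixel, point in sphere_map.items():
        texture[point] = frame[pixel]
    return texture
```

Using the example from the text, a pixel holding a red intensity of 145 deposits the value 145 at its fixed point on the sphere; in the next frame a different value may land on the same point.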
Step 66 involves the rotation of the virtual camera to compensate for the motion of the real camera as measured by the motion sensor 68 or optical flow 70 reading for each frame. This step may be carried out in parallel to step 64. The virtual camera rotates relative to the virtual sphere, and images the correct region of interest or field of view once the sensor image is mapped onto the sphere. For instance, if the real camera pans upwards, the image content on the sphere will shift downwards. The virtual camera will rotate downwards as well, in a direction opposite to the rotation of the real camera, to ensure that objects in the viewed video remain stabilized over the image sequence. The series of images rendered off of the texture mapped sphere surface, step 72, leads to a stabilized series of warped images. This frame sequence, free of common visual artifacts, can be output to a display, step 74. Moreover, as mentioned, the rotations around the center of the sphere allow the principal point of each warped image to remain fixed and thus identical for each image frame. This prevents the optical torsion effect discussed above and seen in Fig. 1, which is a serious visual artifact in prior art stabilization systems.
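The effect of step 66 can be checked numerically: whatever rotation the real camera undergoes, composing the resulting texture shift with the virtual camera's inverse rotation returns the original viewing direction, so a fixed scene point lands at the same place in every rendered frame. A toy sketch (pan-only, matrices as nested lists, purely illustrative):

```python
import math

def pan_rotation(a):
    # rotation about the vertical (pan) axis by angle a, as a 3x3 matrix
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def stabilized_direction(world_dir, real_rotation):
    """Direction at which a fixed scene point finally appears to the
    virtual camera. The real camera's rotation shifts the content on the
    sphere; the virtual camera counter-rotates by the inverse (transpose),
    so the composition leaves the viewed direction unchanged."""
    on_sphere = apply(transpose(real_rotation), world_dir)  # content shifts
    return apply(real_rotation, on_sphere)                  # compensation
```

Two frames with different real camera rotations yield identical stabilized directions, which is exactly the fixed-position property the embodiments claim.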
An additional advantage of the above process is the high computation speed, as both texture mapping and virtual camera computations are handled by a graphical processing unit. These calculations can be implemented with a wide range of known and inexpensive graphical processors and off-the-shelf 3D software mentioned above.
Reference is made to Fig. 8, which shows the process of virtual camera rendering where the camera is placed at the spherical center and at varying distances from it. Such an embodiment allows for a simulated change of the optical model that characterizes the display of the video images. For example, by moving the virtual camera towards and away from the center of the sphere along the virtual camera's optical axis, it is possible to view a particular video frame with variations in lens type. In other words, frames may be viewed as if imaged using either a curvilinear lens or a rectilinear lens. A rectilinear lens is sometimes referred to as a pure-perspective or a pinhole lens. This convenient viewing flexibility is a result of the use of the virtual camera and is possible without any further calculations and without any further reduction in image quality.
A spherically transformed image of the original captured image on the real camera sensor is shown on the internal surface of sphere 36. Virtual camera 78 is shown as in previous embodiments. The virtual camera's optical axis passes through the precise center of the sphere; however, the virtual camera itself is at a distance d1 from the spherical center. This results in a rather distorted image 80 of the monument rendered by the virtual camera. Such an image shows the monument as if the real camera lens were a wide angle curvilinear lens exhibiting high barrel distortion, thus providing an image as would be perceived through a fish-eye lens.
Virtual camera 82 is positioned closer, at distance d2, to the texture mapped area of the spherical surface. The warped image 84 of the internal spherical surface as seen from the virtual camera's perspective now appears less distorted than image 80.
Finally, virtual camera 86 is positioned precisely at the spherical center, and d3 = 0. In image 88, rendered by the virtual camera off of the sphere, the monument appears without any radial distortions. Therefore, the virtual camera, when placed at the center of the sphere, provides a virtual rectilinear camera.
So by positioning the virtual camera at varying distances from the spherical center, the resultant stabilized image obtained from the virtual camera is made independent of the actual real camera lens. As such, images 80, 84, and 88 give images of the monument as seen through three types of "virtual lenses."
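The dependence of the virtual lens type on the camera's distance from the spherical center can be made concrete with a simplified pinhole model. Here the rendered image radius of a sphere point at angle theta from the optical axis is, up to a focal scale, r·sin(theta) / (d + r·cos(theta)); this geometry is a reconstruction offered for illustration, not taken from the embodiments:

```python
import math

def rendered_radius(theta, d, r=1.0):
    """Image-plane radius (up to focal scale) at which a pinhole virtual
    camera, placed a distance d behind the sphere center on its optical
    axis, sees a sphere point at angle theta from that axis."""
    return r * math.sin(theta) / (d + r * math.cos(theta))
```

At d = 0 this reduces to tan(theta), the rectilinear projection of virtual camera 86; at d = r it reduces to tan(theta/2), a stereographic (curvilinear) projection; still larger d compresses the image edges further, toward the fish-eye appearance of image 80.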
It is noted that when the virtual camera is positioned at the precise spherical center, the virtual camera views the transformed image on the sphere as a rectilinear
image. In other words, straight lines in the real imaged object (in the present example a monument) are seen as straight lines in the rendered stabilized image. Rectilinear imaging exhibits what is commonly termed perspective distortion, where an object at the image edge appears significantly larger and closer than an object found at or near the image center, even when both objects are of equal size and at equal distance from the camera. For wide angle images, the perspective distortion is even greater. Rectilinear imaging may be preferred in certain applications, such as architectural and commercial photography, where, often for aesthetic reasons, imaged objects appear straight and objects appear longer and larger than they actually are. On the other hand, in certain wide angle imaging applications the perspective distortions cause erroneous perception of relative size, depth and motion (particularly above 100° HFOV) and thus are not desired at all. In those applications where perspective distortions are not desired, such as security and automotive applications, curvilinear lenses are utilized. In curvilinear images, straight lines of an imaged object are imaged as straight lines only at the center of the image. Towards the edge of curvilinear images, these straight lines become curved as a result of the radial distortion inherent in the lens. However, curvilinear lenses exhibit little or no "perspective distortions", meaning that equidistant objects of equal size appear almost the same size in any region of the image. The viewer's perception of relative size, depth and motion between objects in all areas of the image is improved at the expense of certain shape distortions as the image edge is approached. Wide angle curvilinear lenses are commonly termed fish-eye, as they exhibit strong barrel distortions (radial distortions of a negative sign).
A fully optimized lens that provides minimal or no perspective distortion is commonly termed "equiangular", since the ratio of angular shift to pixel shift along the radii of the image is constant.
A typical example where curvilinear lenses are desired is the case of an armored vehicle equipped with a front video camera. We assume two people of similar size, A and B, are equidistant from the vehicle but at different angles. Person A is standing on the road, directly in front of the camera, while person B is standing on the side of the road and is imaged close to the edge of the image. A rectilinear lens shows person A as appearing much smaller than person B. If the field of view of the camera is, say, 120 degrees, as is common in automotive cameras, person B may appear greater than twice the size of person A. From a practical point of view, such lens
characteristics can be dangerous for the driver as his depth perception is skewed and objects in front of him appear smaller than they actually are. Therefore, in such a case and similar ones, a curvilinear lens is preferable.
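The size disparity in the armored-vehicle example can be quantified for an ideal rectilinear lens: a small object at angle theta off-axis is magnified by 1/cos(theta) tangentially and 1/cos²(theta) radially relative to the same object on-axis. A short check of this standard pinhole-model result, given here for illustration:

```python
import math

def rectilinear_scale(theta):
    """Relative magnification of a small object imaged at angle theta
    off-axis by a rectilinear (pinhole) lens, compared with the same
    object imaged on-axis."""
    tangential = 1.0 / math.cos(theta)           # stretching across the radius
    radial = 1.0 / math.cos(theta) ** 2          # stretching along the radius
    return tangential, radial
```

At 60° off-axis, the edge of a 120° field of view, the factors are 2 and 4, consistent with person B appearing more than twice the size of person A.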
The stabilization process of the present embodiments preferably utilizes a curvilinear model, or even an equiangular model as is normally present in fish-eye lenses, rather than a rectilinear model. Indeed, by positioning the virtual camera at varying distances along a line intersecting the center of the sphere, the warped image becomes increasingly equiangular. Stated otherwise, by adjusting the distance of the virtual camera from the center of the sphere, it is possible to obtain, in a continuous manner, varying levels of perspective and radial distortions. The desired tradeoff between these types of distortions can thus be suited to meet specific applications and operational conditions.
Reference is made to Fig. 9, which shows the virtual camera positioned at the precise center of the sphere at two consecutive time intervals, thereby creating a virtual rectilinear lens. Sphere 36 is shown once again with constant area 22 containing the assignment of the image sensor pixel values to spherical coordinates at both time frames t1 and t2. Virtual camera 90 is seen looking downwards and to the right to compensate for real camera motion upwards and to the left at t1. At t2, the virtual camera faces upwards and to the left to compensate for real camera motion downwards and to the right.
The stabilized images rendered by the virtual camera are shown as image 92 at t1 and image 94 at t2. It is understood that these stabilized images are as seen from the perspective of the virtual camera. Stabilized and warped images 92 and 94 rendered by the virtual camera share the same principal point 112. This is because the projective geometric process that transforms sensor images in Cartesian x, y coordinates to the spherical environment ensures that images captured by the virtual camera have a constant principal point. That is to say, for each rendered frame off of the sphere, a constant principal point is maintained because the virtual camera's optical axis passes through the precise spherical center. When successive stabilized images 92 and 94 with a constant principal point are overlapped or registered, illustrative video image 96 is obtained. Due to the constant principal point, the overlap is perfect or near perfect, and therefore the registered stabilized video
image is free from the visual artifacts discussed earlier such as dynamic radial deformations as seen in Fig. 1.
Image 92 shows a stabilized image captured by the virtual camera off of the sphere. The captured image 92 contains white region 110 that shows a part of the spherical internal surface below and to the side of region 22 containing the transformed sensor image. Surface area 110 is captured because the camera is looking downwards and to the right in order to compensate for real camera motion upwards and to the left. Similarly, image 94 shows a stabilized image captured by the virtual camera off of the sphere and also shows area 114 of the spherical internal surface. This area is captured because the virtual camera is pointed upwards and to the left in order to compensate for real camera motion downwards and to the right. These directional shifts of the virtual camera allow the monument to remain stabilized in both image 92 and image 94. The virtual camera pivots about the center point of the sphere and renders, frame by frame, the region of interest on sphere 36. Again, these stabilized and warped images 92 and 94 are two consecutive images at times t1 and t2 viewed by the virtual camera and having the same principal point 112. Notably, since the virtual camera is at the precise spherical center, the images rendered by the virtual camera are seen as rectilinear images. In addition, registered video image 96 is shown, a stabilized registered wide angle lens video image virtually free of any visual dynamic artifacts and perfectly overlapping. In essence, image 96 is created in the same fashion as image 54, only in the present embodiment the virtual camera acts as a rectilinear lens camera rather than a curvilinear lens camera. As explained in the above paragraph, the video image 96 is an overlapped image free of visual artifacts, such as the optical torsion effect seen above in the illustration of the prior art.
Reference is made to Fig. 10, which clearly illustrates the advantages of the present embodiments of the invention by comparing the prior art to the results of the present embodiments. Images 102 and 104 represent two stabilized images at times t1 and t2, respectively. Image 106 is an overlaid stabilized sequence of those two images using the prior art, in which a constant principal point is not maintained. Image warping is not carried out, and stabilization is attempted merely through global planar translations and rotations of the original captured images on the sensor. The radial dynamic deformations, discussed above in regard to methods used in the prior art, are seen in image 106.
In contrast, images 96 and 98 are images seen after undergoing the above described warping process and rendered by a virtual camera off of a defined virtual sphere as discussed in the present embodiments. Images 96 and 98 are two consecutive wide angle curvilinear images having the same principal point and rendered at times t1 and t2, respectively. The resultant registered wide angle video image 100 is stabilized and free of the dynamic visual artifacts discussed above.
The embodiments of the present stabilization method notably provide a fixed principal point across successive stabilized images of a video stream seen by a human observer, and are particularly useful in the case of wide angle video photography, where radial and/or perspective distortion increases in accordance with the increase in camera field of view and with increasing proximity to the sides of the displayed image. By maintaining a fixed image principal point in all frames of the video, dynamic deformations of rigid, inanimate objects in the image are avoided. Although polar coordinates are utilized in the present embodiments, other 3D coordinate systems, including but not limited to 3D Cartesian coordinates, may be implemented.
The problem of motion blur, found in singular image frames, is preferably left to a mechanical apparatus. That is to say, the current embodiments are intended to provide a solution for image stabilization among a sequence of images or frames in a video stream. Motion blur, which occurs over the exposure time of a single frame, is preferably solved in a more effective manner by separate mechanical techniques. These mechanical techniques and/or others may then optionally be combined with the embodiments of the present invention to comprise an image stabilization system attending to the additional phenomenon of motion blur.
It is expected that during the life of this patent many relevant devices and systems will be developed and the scope of the terms herein is intended to include all such new technologies a priori.
The term "image-warper" is used herein to refer to any apparatus or device capable of performing the image warping process described herein. The term "image warping" herein refers to shape deforming of the captured image. Mere translations, rotations, and scaling, for example as seen in Fig. 1, are not to be considered image warping under the scope of this document.
The terms "center of projection" and "optical center of projection" are used herein to refer to the cardinal point of the optical system. The cardinal point is the
point around which rotation of the optical system does not introduce any parallax.
Other terms for this point exist in the literature, including but not limited to "center of perspective", "camera center", "optical center", and "lens entrance pupil". The above terminology applies as well within the context of a virtual camera, except that no real optical elements are involved.
The terms "image principal point", "principal point", and "image center of distortion" are used herein to refer to the intersection of the camera's optical axis with the image plane. For some optical systems, the "image center of distortion" and "the image principal point" may not be identical, but for most systems, they are indeed nearly identical and therefore may practically be treated as identical.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.