WO2018116322A1 - System and method for generating pan shots from videos - Google Patents

System and method for generating pan shots from videos

Info

Publication number
WO2018116322A1
Authority
WO
WIPO (PCT)
Application number
PCT/IN2017/050605
Other languages
French (fr)
Inventor
Rajagopalan AMBASAMUDRAM NARAYANAN
Nimisha THEKKE MADAM
Original Assignee
Indian Institute Of Technology Madras
Application filed by Indian Institute Of Technology Madras
Publication of WO2018116322A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/681Motion detection
    • H04N23/6811Motion detection based on the image signal
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction
    • H04N23/683Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B15/00Special procedures for taking photographs; Apparatus therefor


Abstract

A system and method for automatic generation of a pan shot from a video of a dynamic object is disclosed. The method includes warping a plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames. The background compensated frames are used for segmenting the foreground and background to create a trimap. The foreground of each frame is correlated with a preceding and succeeding frame to obtain an inter-frame object displacement with respect to the preceding and succeeding frames. Based on the inter-frame object displacement, a net displacement and a relative depth of the object are determined. If the foreground is blurred, a deblurring operation is performed to obtain a plurality of clear frames. The plurality of clear frames are rewarped using the net displacement to create rewarped clear frames with a dynamic background and a static foreground. The rewarped clear frames are averaged to generate the pan shot, which has a blurred background and a sharp foreground.

Description

SYSTEM AND METHOD FOR GENERATING PAN SHOTS FROM VIDEOS
RELATED APPLICATION
[0001] This application claims the benefit of and priority to Indian provisional patent application No. 201641043468, titled "Pan Shots From Videos", filed on December 20, 2016, and the complete specification thereof, filed on October 6, 2017. The disclosures of these Indian applications are incorporated herein by reference for all purposes.
BACKGROUND
[0002] Pan photography, or panning, is an imaging technique that involves swiveling an image capturing device, such as a video camera, horizontally at a fixed position. The swiveling motion of the camera is used for capturing images from one part of a scene to another. Generally, the swiveling motion of the camera is used for capturing images of a moving object, such that the camera motion and the object motion are in sync. This imaging technique can be used to produce a "pan shot", which gives an artistic visual effect to objects in motion.
[0003] Pan shots have a blurred background and a high focus on the moving object in the foreground. The moving object appears sharp and frozen against a blurred background. Thus, pan shots give an aesthetic appeal to an image by accentuating the object against other elements in the frame and relegating motion to the background.
[0004] However, taking pan shots is not straightforward, as the technique involves intricate steps. A camera operator should have an approximate idea of the object velocity a priori so as to pan the camera in sync with the moving object and avoid undesirable effects. For instance, a high relative velocity between the object and the camera may result in blurring of the object.
[0005] Additionally, the technique demands a great amount of manual effort from the camera operator. For example, setting the correct shutter speed, ensuring autofocus mode, adjusting the exposure, and tracking the object must all happen in perfect harmony. This process is difficult, and it is highly likely that the event will be over by the time these settings are adjusted, especially in fast-paced scenarios such as running or car races. Therefore, creating pan shots requires a substantial amount of manual skill, and there is a dearth of automated techniques in the existing state of the art.
SUMMARY
[0006] Described herein are systems and methods for automatically generating a pan shot from a video.
[0007] According to one embodiment, the present subject matter relates to a method for automatically generating a pan shot from a video of a dynamic object. The method includes warping a plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames. The warping may include aligning dynamic backgrounds to create a static background, which is consistent, in each of the plurality of frames. The background compensated frames are used for segmenting the foreground and background to create a trimap. The foreground of each frame is correlated with a preceding and succeeding frame to obtain an inter-frame object displacement with respect to the preceding and succeeding frames. Based on the inter-frame object displacement, a net displacement and a relative depth of the object are determined. Further, a deblurring operation is performed in the foreground to obtain a plurality of clear frames if the foreground is blurred. The plurality of clear frames are rewarped using the net displacement to create rewarped clear frames, which have a dynamic background and static foreground. The rewarped clear frames are averaged to generate the pan shot, which has a blurred background and a sharp foreground.
[0008] According to another embodiment, the present subject matter relates to a system for automatically generating a pan shot from a video of a dynamic object. The system includes an image capturing unit, a processing unit, and a memory unit coupled to the processing unit. The image capturing unit captures the video of the dynamic object. The memory unit includes a warping module, a segmentation module, a correlation module, a displacement computation module, a deblurring module, a rewarping module, and an averaging module. The warping module is configured to warp a plurality of frames of the video to compensate for background motion in the video based on homographies of consecutive frames. The warping includes aligning dynamic backgrounds to create a static background, which is consistent, in the plurality of frames. The segmentation module is configured to segment a foreground, which comprises the dynamic object, from the background compensated frames to create a trimap. The correlation module is configured to correlate the foreground of each frame with a preceding and a succeeding frame to obtain an inter-frame object displacement with respect to the preceding and the succeeding frames. The displacement computation module is configured to determine a net displacement and a relative depth of the object based on the inter-frame object displacement and camera motion. The deblurring module is configured to deblur the foreground if a blur is present in the foreground to obtain a plurality of clear frames. The rewarping module is configured to rewarp the plurality of clear frames using the net displacement of the object to create rewarped clear frames, which have a dynamic background and static foreground. The averaging module is configured to perform an averaging operation on the rewarped clear frames to generate the pan shot, which has a blurred background and a sharp foreground.
[0009] According to yet another embodiment, the present subject matter relates to a computer program product having non-volatile memory carrying computer executable instructions stored therein for automatically generating a pan shot from a video of a dynamic object. The instructions include warping a plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames. The warping may include aligning dynamic backgrounds to create a static background, which is consistent, in each of the plurality of frames. The instructions further comprise segmenting the foreground and background to create a trimap in the background compensated frames. Further, the foreground of each frame is correlated with a preceding and succeeding frame to obtain an inter-frame object displacement with respect to the preceding and succeeding frames. Based on the inter-frame object displacement, a net displacement and a relative depth of the object are determined. Further, a deblurring operation is performed in the foreground to obtain a plurality of clear frames if the foreground is blurred. The plurality of clear frames are rewarped using the net displacement to create rewarped clear frames, which have a dynamic background and static foreground. The rewarped clear frames are averaged to generate the pan shot, which has a blurred background and a sharp foreground.
[0010] In one embodiment, performing the deblurring operation includes determining an object velocity from the net displacement and frame rate. The method further includes calculating an alpha-matte using the trimap to separate background, foreground and an ambiguous region. The blur weights of the foreground are estimated by uniformly sampling the net object displacement. Further, the separated background is filled using the pixels of the static background in the plurality of frames. The separated foreground in the plurality of frames is deblurred based on object velocity, net displacement, and a kernel. The deblurred foreground and the filled background are mixed to obtain the plurality of clear frames, which can be rewarped and averaged to generate the pan shot.
[0011] In some embodiments, the homography (H) is determined between a reference frame and the plurality of frames using random sample consensus (RANSAC).
[0012] In some embodiments, the segmenting is performed using a graph-cut algorithm.
[0013] In some embodiments, the graph-cut algorithm is based on a data cost function and a smoothness cost function for distinguishing foreground and background in each frame.
[0014] In some embodiments, the system further comprises a user interface configured to enable a user to interact with the system.
[0015] In some embodiments, the image capturing unit comprises at least a lens, a shutter, and an image sensor for capturing the videos and photographs.
[0016] In some embodiments, the system is a video recording device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
[0018] FIG. 1 illustrates a flow diagram of a method for automatic generation of pan shot from a video, according to one embodiment of the present subject matter.
[0019] FIG. 2 illustrates a system for automatic generation of pan shot from a video, according to one embodiment of the present subject matter.
[0020] FIG. 3A illustrates three consecutive input frames of a video, according to an example of the present subject matter.
[0021] FIG. 3B illustrates background compensated frames, according to an example of the present subject matter.
[0022] FIG. 3C illustrates foreground of the frames on segmentation, according to an example of the present subject matter.
[0023] FIG. 3D illustrates clear frames of the consecutive frames, according to an example of the present subject matter.
[0024] FIG. 3E illustrates pan shot generated using the video, according to an example of the present subject matter.
[0025] FIG. 4A illustrates an input frame of a video of a gazelle, according to another example of the present subject matter.
[0026] FIG. 4B illustrates a pan shot generated from the video of the gazelle, according to another example of the present subject matter.
[0027] FIG. 5A illustrates an input frame with blurred foreground of a video of a car, according to an example of the present subject matter.
[0028] FIG. 5B illustrates a pan shot generated from the video of a car, according to an example of the present subject matter.
DETAILED DESCRIPTION
[0029] While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt to a particular situation or material to the teachings of the invention without departing from its scope.
[0030] Throughout the specification and claims, the following terms take the meanings explicitly associated herein unless the context clearly dictates otherwise. The meaning of "a", "an", and "the" include plural references. The meaning of "in" includes "in" and "on." Referring to the drawings, like numbers indicate like parts throughout the views. Additionally, a reference to the singular includes a reference to the plural unless otherwise stated or inconsistent with the disclosure herein.
[0031] The present subject matter is further described with reference to FIGS. 1 - 5B. It should be noted that the description and figures merely illustrate principles of the present subject matter. It is thus understood that various arrangements may be devised that, although not explicitly described or shown herein, encompass the principles of the present subject matter. Moreover, all statements herein reciting principles, aspects, examples, and embodiments of the present subject matter, as well as specific examples thereof, are intended to encompass equivalents thereof.
[0032] A flow diagram of a method 100 for automatically generating a pan shot from a video of a dynamic object is illustrated in FIG. 1, according to an embodiment of the present subject matter. The dynamic object may be in the foreground of each frame of the video. The method includes receiving a plurality of frames of the video, at block 102, and warping the plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames, at block 104. The warping may include aligning dynamic backgrounds to create a static background, which is consistent, in each of the plurality of frames.
[0033] In one embodiment, scale-invariant feature transform (SIFT) based feature correspondences between two frames may be estimated. The correspondences between two frames mostly occur in the background, as the foreground may be assumed to be small. Further, on applying the random sample consensus (RANSAC) method, the homographies between the frames are determined. The frames are aligned using the homographies that relate the background in the consecutive frames to compensate for the background motion. In one embodiment, the homographies (H) may be determined between a reference frame and the plurality of frames. A code sketch of this warping step follows.
[0034] The background compensated frames are subjected to segmentation of the foreground and background to create a trimap, at block 106. In one embodiment, the segmentation may be performed by a graph-cut approach. The segmentation of the foreground may be formulated as a bilabel assignment problem, which can be incorporated into a Markov Random Field (MRF) model and effectively solved using graph cuts. The segmentation step is discussed further in subsequent paragraphs.
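As a minimal sketch of the warping step in paragraph [0033], the following Python/OpenCV code estimates a background homography via SIFT correspondences and RANSAC, then warps a frame onto the reference. The function name and the ratio-test and reprojection thresholds are illustrative assumptions; the patent does not prescribe them.

```python
import cv2
import numpy as np

def warp_to_reference(ref_gray, frame_gray, frame_color):
    """Estimate the background homography between a frame and the
    reference via SIFT + RANSAC, then warp the frame onto the reference."""
    sift = cv2.SIFT_create()
    kp_ref, des_ref = sift.detectAndCompute(ref_gray, None)
    kp_frm, des_frm = sift.detectAndCompute(frame_gray, None)

    # Most matches fall on the background, since the foreground object
    # is assumed to be small; RANSAC rejects the rest as outliers.
    matches = cv2.BFMatcher().knnMatch(des_frm, des_ref, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]

    src = np.float32([kp_frm[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = ref_gray.shape
    return cv2.warpPerspective(frame_color, H, (w, h))
```

Warping every frame onto a common reference in this way yields the background compensated frames used by the subsequent segmentation step.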
[0035] After segmentation, the foreground of each frame is correlated with a preceding and succeeding frame to obtain an inter-frame object displacement with respect to the preceding and succeeding frames, at block 108. A net displacement and a relative depth of the object are determined based on the inter-frame object displacement and camera motion, at block 110. The net displacement refers to the total displacement of the object in the video and the relative depth refers to the relative distance of object with respect to the background, which is given as Dbackground, where D refers to the distance from the camera center.
^foreground
[0036] It may be worth understanding that the background motion is dependent on the camera motion and any blur in the foreground is due to the object motion as well as the camera motion. Therefore, when the camera motion and object motion are not in sync, the object (foreground) may be blurred if the camera has a slow shutter speed. Further, if the shutter speed is not slow, the object appears in different positions in the plurality of frames of the videos.
[0037] Thus, the method further includes determining whether the foreground is blurred or not, at block 112. The blurring may be determined based on various techniques known in the art. For example, a gradient distribution method that analyzes the blur in a region of the frame may be used to determine whether the foreground is blurred. A log magnitude response of the gradient from the foreground may be correlated with a reference gradient to indicate the presence of blur. If the foreground is not blurred, then the frames are clear and ready for rewarping. However, if a blur is present in the foreground, then a deblurring operation is performed to obtain a plurality of clear frames, at block 114. A sketch of one such blur check is given below.
[0038] The deblurring operation may include constructing an alpha-matte using the trimap to separate the background, the foreground, and an ambiguous region, which may be either background or foreground. The separated background includes regions with and without pixels. The regions without pixels may be filled using the pixels of the static background of the plurality of frames to obtain a filled background. Further, blur weights of the foreground are estimated by uniformly sampling the net object displacement. A non-blind deblurring is performed using the object velocity, net object displacement, blur weights, and a kernel to deblur the foreground. The deblurred foreground and the filled background in each frame are mixed to obtain the plurality of clear frames (not blurred).
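The patent does not give a formula for the blur check at block 112. The following is a minimal sketch of one plausible realization of a gradient distribution method: it correlates the log-magnitude gradient histogram of the foreground against that of a sharp reference. The histogram size and decision threshold are assumed values.

```python
import numpy as np

def foreground_is_blurred(fg_gray, ref_gray, threshold=0.9):
    """Heuristic blur check: a low correlation between the log-magnitude
    gradient histograms of the foreground and a sharp reference
    suggests the presence of motion blur."""
    def log_grad_hist(img):
        gy, gx = np.gradient(img.astype(np.float64))
        mag = np.sqrt(gx**2 + gy**2)
        hist, _ = np.histogram(np.log1p(mag), bins=64, density=True)
        return hist

    corr = np.corrcoef(log_grad_hist(fg_gray), log_grad_hist(ref_gray))[0, 1]
    return corr < threshold
```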
[0039] The plurality of clear frames are rewarped using the net displacement to create rewarped clear frames, at block 116. The rewarped clear frames have a dynamic background similar to the background of the input frames of the video. The deblurred foreground is static and the object is at the same position in each frame. The rewarped clear frames are averaged to generate the pan shot, which has a sharp foreground and blurred background, at block 118.
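A minimal sketch of the rewarp-and-average step at blocks 116 and 118, assuming per-frame net displacements are available as (dx, dy) pairs. The sign convention depends on how the displacements were measured; an integer-pixel shift is used for brevity.

```python
import numpy as np

def make_pan_shot(clear_frames, net_displacements):
    """Rewarp each clear frame by its net object displacement so the
    object stays fixed while the background moves, then average."""
    acc = np.zeros(clear_frames[0].shape, dtype=np.float64)
    for frame, (dx, dy) in zip(clear_frames, net_displacements):
        # Integer-pixel shift for brevity; a subpixel version would use
        # cv2.warpAffine with a translation matrix.
        shifted = np.roll(frame, (int(round(dy)), int(round(dx))), axis=(0, 1))
        acc += shifted
    return (acc / len(clear_frames)).astype(np.uint8)
```

Averaging the rewarped frames smears the moving background into a streak while the aligned foreground reinforces itself, which is what produces the pan-shot look.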
[0040] A system 200 for automatically generating a pan shot from a video is illustrated in FIG. 2, according to an embodiment of the present subject matter. The system 200 includes an image capturing unit 202, a processing unit 204, and a memory unit 206 coupled to the processing unit 204. The image capturing unit 202 may provide a photo or video capturing capability and may generally include at least a lens, a shutter, an image sensor, and the like. The image capturing unit 202 captures the video of the dynamic object and stores it in a storage unit 208, which may be removable or non-removable.
[0041] The processing unit 204 may include one or more computing components including, but not limited to, central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), or other specialized microprocessors. The one or more computing components may be in communication with the memory unit 206 for executing specific functions.
[0042] The memory unit 206 includes a warping module 210, a segmentation module 212, a correlation module 214, a displacement computation module 216, a deblurring module 218, a rewarping module 220, and an averaging module 222. In one embodiment, the modules may be implemented as software code to be executed by the processing unit 204 using any suitable computer language. These software codes may be stored as a series of instructions or commands in the memory unit 206. In various embodiments, the modules may be implemented as one or more software modules, hardware modules, firmware modules, or some combination of these.
[0043] The warping module 210 receives the video as an input to remove the background motion in each frame of the video. The video may be captured using the image capturing unit 202 and stored in the storage unit 208 or may be received from another device or network. The warping module 210 is configured to warp the plurality of frames of the video to compensate for background motion in the video based on homographies of consecutive frames. The warping aligns these dynamic or changing backgrounds to create the static background, which is consistent in each of the plurality of frames.
[0044] Further, the segmentation module 212 is configured to segment the foreground from the background compensated frames to create the trimap. The foreground may include the moving object, which is required to be aligned. The segmentation module 212 may perform graph-cut segmentation to create the trimap.
[0045] The correlation module 214 is configured to correlate the foreground of each frame with a preceding and a succeeding frame to obtain the inter-frame object displacement with respect to the preceding and the succeeding frames (see the sketch below). Further, the displacement computation module 216 is configured to determine a net displacement and a relative depth of the object based on the inter-frame object displacement and the camera motion $t_{(i,j)}$.
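The patent does not name the correlation operator used in [0045]; the following minimal sketch uses OpenCV's phase correlation as one possibility, assuming single-channel images with non-foreground pixels zeroed out (the masking is an assumption).

```python
import cv2
import numpy as np

def interframe_displacement(fg_prev, fg_curr):
    """Correlate the foregrounds of two background-compensated frames
    to estimate the object shift between them."""
    (dx, dy), _response = cv2.phaseCorrelate(np.float32(fg_prev),
                                             np.float32(fg_curr))
    return dx, dy
```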
[0046] Further, the deblurring module 218 is configured to deblur the foreground, if a blur is present in the foreground, to obtain a plurality of clear frames. The deblurring module 218 may construct an alpha-matte using the trimap to separate the background, foreground, and ambiguous region. The deblurring module 218 may fill the separated backgrounds using the pixels of the static backgrounds of the plurality of frames. Blur weights of the foreground are estimated by uniformly sampling the net object displacement. Further, the foreground is deblurred using the object velocity, net displacement of the object, blur weights, and a kernel. The deblurred foreground and the filled background in each frame are mixed to obtain the plurality of clear frames, as sketched below.
[0047] The rewarping module 220 is configured to rewarp the plurality of clear frames using the net displacement of the object to create rewarped clear frames, which have a dynamic background and a static foreground. The dynamic backgrounds are similar to the backgrounds of the plurality of frames of the video. The averaging module 222 is configured to perform an averaging operation on the rewarped clear frames to generate the pan shot. The pan shot has a blurred background and a sharp foreground that gives an artistic visual effect.
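A minimal sketch of the mixing step at the end of [0046], assuming 3-channel frames and a per-pixel alpha matte in [0, 1]; the function name is illustrative.

```python
import numpy as np

def mix_with_matte(deblurred_fg, filled_bg, alpha):
    """Compose the clear frame: alpha weights the deblurred foreground
    against the hole-filled background.
    deblurred_fg, filled_bg: HxWx3 uint8 images; alpha: HxW matte."""
    a = alpha[..., None]
    out = (a * deblurred_fg.astype(np.float64)
           + (1.0 - a) * filled_bg.astype(np.float64))
    return np.clip(out, 0, 255).astype(np.uint8)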
[0048] In one embodiment, the invention also includes a computer-readable non-volatile memory or storage (not shown in figure) embodying instructions for implementing the method as illustrated and described in FIG. 1. The computer-readable memory may be media and/or devices that enable non-transitory storage of data to be executed on a system as illustrated in FIG. 2. The computer-readable memory may include removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits, and other data.
[0049] Further, the system 200 may also include a user interface 224 configured to enable a user to interact with the system 200. For instance, the user may select a mode of operation, such as pan shot mode, and initiate video capturing via the user interface 224. In one embodiment, the user interface 224 may include a display unit and a plurality of control buttons. In other embodiments, the user interface 224 may include a touch display.
[0050] In various embodiments, the user interface 224 may provide a plurality of controls for adjusting settings of the system 200. For example, the plurality of controls may include menu button, info button, shutter button, ISO button, mode dial, display illumination button, power button, flash button, erase button, and various control options as known in the art.
[0051] In some embodiments, the system 200 may include a communication unit (not shown in figure) for sending files, such as images and videos, to other devices and for receiving files from them. For example, the system may be configured with a Wi-Fi feature to connect to or disconnect from Wi-Fi networks. The system 200 may also include one or more of an NFC antenna, a Bluetooth chip, Infrared, a USB port, an HDMI port, and the like.
EXAMPLES
EXAMPLE 1: PAN SHOT FROM A VIDEO WITH CLEAR FOREGROUND
[0052] An example implementation of the method for generating pan shots from a video with a clear foreground is illustrated in FIG. 3A - FIG. 3E. The video depicts a cheetah, in the foreground, sprinting against a green background. The video was downloaded from: https://en.wikipedia.org/wiki/File:Cheetahs_on_the_Edge_(Director%27s_Cut).ogv.
[0053] Three consecutive frames of the video are depicted in FIG. 3A. As shown in FIG. 3A, the three consecutive frames (i), (ii), and (iii) of the video depict the cheetah as the object in the foreground, and each frame has a distinct background. The object and the foreground were captured clearly, i.e., without blur, as the video was captured by a video capturing device with a high shutter speed.
[0054] Each frame of the video was warped to compensate the motion in the background based on homographies (H) of consecutive frames. The homographies (H) were determined between a reference frame and the plurality of frames. As there would have been large view changes between the reference frame and, say, the $i$th frame, the warp between consecutive frames was determined to build homographies relating the $i$th frame and the reference frame (frame $k$) as $H_g^{(k,i)} = \prod_{m=k}^{i-1} H_g^{(m,m+1)}$. The background motion due to the camera was removed on aligning the frames, and the background in each frame (i), (ii), (iii) of the video was made consistent, as shown in FIG. 3B.
[0055] After background compensation, the foreground from each of the background compensated frames was segmented, as shown in FIG. 3C. The problem of segmenting the moving object can be formulated as a bilabel assignment problem, which can be incorporated into a Markov Random Field (MRF) framework and effectively solved using graph cuts. The cost of assigning a label $r$ to the pixel position $X^{(i)} = (x^{(i)}, y^{(i)})$ in the $i$th frame is given as
$$E(r_{X^{(i)}}) = E_{data}(X^{(i)}) + E_{smooth}(X^{(i)}) \quad (1)$$
where $r \in \{-1, +1\}$, and $-1$ and $+1$ may be the two labels corresponding to foreground and background, respectively. Here, the data cost at a pixel $X^{(i)}$ is defined by combining the global motion between the frames and the optical flow, while the smoothness cost depends on the color information between the neighboring pixels.
[0056] Further, an optical flow technique was used to give the displacement field between two images, which can be used to decipher point correspondences. This method uses additional constraints on gradient and flow smoothness to make the flow vectors reliable under small illumination changes.
[0057] For any two given motion compensated frames $(L^{(i)}, L^{(j)})$ and the optical flow vectors $(u_x, u_y)$ between the two frames, each pixel position $X^{(i)}$ in $L^{(i)}$ was warped to $\hat{X}^{(j)}$ in $L^{(j)}$ using the optical flow as $\hat{X}^{(j)} = [\hat{x}^{(j)}, \hat{y}^{(j)}] = [x^{(i)} + u_x, y^{(i)} + u_y]$. When the warped point $\hat{X}^{(j)}$ and $X^{(i)}$ coincide, i.e., when the flow is zero or within a limit $\delta$, it was assumed that the point is from the background, since the background motion has already been compensated; otherwise it was considered as from the foreground.
[0058] Further, the data cost in Eq. (1) is a measure of how well the label $r$ fits the pixel position $X^{(i)}$ and is defined as
$$E_{data}(X^{(i)}) = \exp\left[(r_{X^{(i)}})\left(\|\hat{X}^{(j)} - X^{(i)}\|^2 - \delta\right)\right]$$
[0059] If the pixel distance $\|\hat{X}^{(j)} - X^{(i)}\|^2 \le \delta$ (i.e., the pixels correspond to the background), then assigning the value $r_{X^{(i)}} = 1$ will minimize the data cost, and vice-versa. Further, the smoothness cost takes the color information in the neighborhood of a pixel into consideration while assigning the label, to make the cut smooth. Hence, it is assigned as
$$E_{smooth}(X^{(i)}) = \sum_{Y^{(i)} \in N} \varphi\left[r_{Y^{(i)}} \neq r_{X^{(i)}}\right] \cdot e^{-\beta \|C\|^2}$$
where $C$ is the color difference between the pixels, $N$ is the pixel neighborhood of $X^{(i)}$, and $\varphi[\cdot]$ is a function that returns 1 when the argument inside the braces is true and 0 if false. Minimizing the cost function in Eq. (1) gives the dynamic object segmented out from the background.
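A minimal sketch of this graph-cut segmentation follows, assuming the per-pixel residual flow magnitude after background compensation has already been computed. It uses the third-party PyMaxflow package, which the patent does not name, and the parameter values are illustrative.

```python
import numpy as np
import maxflow  # PyMaxflow (pip install PyMaxflow); not named in the patent

def segment_foreground(residual_flow_mag, gray, delta=1.0, beta=0.01):
    """Bilabel MRF segmentation solved with graph cuts, following Eq. (1).
    residual_flow_mag: per-pixel residual optical flow magnitude after
    background compensation; gray: grayscale frame for the smoothness term."""
    h, w = residual_flow_mag.shape
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((h, w))

    # Data costs per Eq. (1): exp[r(||X_hat - X||^2 - delta)], with r = +1
    # for background and r = -1 for foreground (exponent clipped for stability).
    e = np.clip(residual_flow_mag**2 - delta, -20.0, 20.0)
    cost_bg = np.exp(e)    # labeling a fast-moving pixel "background" is costly
    cost_fg = np.exp(-e)   # labeling a static pixel "foreground" is costly
    g.add_grid_tedges(nodes, cost_bg, cost_fg)

    # Smoothness: Potts-style penalty weighted by exp(-beta * ||C||^2),
    # approximated here from local intensity differences.
    gy, gx = np.gradient(gray.astype(np.float64))
    g.add_grid_edges(nodes, weights=np.exp(-beta * (gx**2 + gy**2)),
                     symmetric=True)

    g.maxflow()
    # Boolean mask; which segment corresponds to "foreground" depends on
    # PyMaxflow's segment convention -- invert if the labels come out swapped.
    return g.get_grid_segments(nodes)
```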
[0060] The segmented foregrounds from each of the plurality of frames were correlated to obtain the inter-frame object displacements. For example, consider the three consecutive frames $\{L^{(i)}\}_{i=1}^{3}$ with inter-frame object displacements $d_a$ and $d_b$. Let the true object displacement be $O_d$, the relative depth be $\gamma$, and the camera motion between frames $i$ and $j$ be $t_{(i,j)}$, which can be found by negating the homography (H) estimated from the background. If the frame rate is $f_p$, then the object velocity $v_x$ is $O_d f_p$. The object undergoes a displacement of $O_d - \gamma t_{(i,j)}$ between the input frames. When the background motion is compensated for, the object net displacement equivalently becomes $O_d + (1 - \gamma) t_{(i,j)}$ in the motion compensated frames. The relative depth and the net displacement of the object may be determined by
d_a = O_d + (1 − γ)·t(1,2)
d_b = O_d + (1 − γ)·t(2,3)

which, solved for the two unknowns, give

γ = 1 − (d_a − d_b) / (t(1,2) − t(2,3)) and O_d = d_a − (1 − γ)·t(1,2)
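Numerically, the two measured displacements thus give a small linear system. The sketch below, with made-up pixel values, shows the recovery of γ and O_d.

```python
def depth_and_displacement(d_a, d_b, t12, t23):
    """Solve d_a = O_d + (1-gamma)*t(1,2), d_b = O_d + (1-gamma)*t(2,3)."""
    one_minus_gamma = (d_a - d_b) / (t12 - t23)
    return 1.0 - one_minus_gamma, d_a - one_minus_gamma * t12

# Example: d_a = 12 px, d_b = 9 px, t(1,2) = 10 px, t(2,3) = 4 px
gamma, O_d = depth_and_displacement(12.0, 9.0, 10.0, 4.0)
# -> (1 - gamma) = 0.5, so gamma = 0.5 and O_d = 12 - 0.5*10 = 7 px per frame
```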
[0061] The frames were rewarped using the net displacement to create rewarped clear frames. The rewarped frames have a dynamic background and a static foreground, as shown in FIG. 3D. The rewarped clear frames were averaged to generate the pan shot. As shown in FIG. 3E, the pan shot has a sharp foreground and a blurred background.
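A sketch of this rewarp-and-average step follows, assuming a purely horizontal pan and the background-aligned frames from the earlier sketch; the function name is illustrative.

```python
import cv2
import numpy as np

def render_pan_shot(aligned_frames, O_d):
    """Shift frame i back by i*O_d so the object stays put, then average:
    the now-static object stays sharp while the background smears."""
    h, w = aligned_frames[0].shape[:2]
    acc = np.zeros((h, w, 3), np.float64)
    for i, f in enumerate(aligned_frames):
        M = np.float32([[1, 0, -i * O_d], [0, 1, 0]])  # horizontal translation
        acc += cv2.warpAffine(f, M, (w, h)).astype(np.float64)
    return (acc / len(aligned_frames)).astype(np.uint8)
```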
EXAMPLE 2: PAN SHOT FROM A VIDEO WITH CLEAR FOREGROUND
[0062] Another example implementation of the method is illustrated in FIG. 4A and FIG. 4B, which depict a gazelle. FIG. 4A illustrates one of the plurality of frames of the video of the sprinting gazelle on which the method was performed. As shown, FIG. 4A depicts a scenario where the foreground and background are devoid of any motion blur artifacts. Even though the background appears to have defocus blur here, it does not reflect the object motion. The same procedure as in the no-blur case was performed to obtain the pan shot, as shown in FIG. 4B.

EXAMPLE 3: PAN SHOT FROM A VIDEO WITH BLURRED FOREGROUND
[0063] Yet another example illustrates an implementation of the method for generating a pan shot from a video captured with a slow shutter speed. The video, as illustrated in FIG. 5A and FIG. 5B, depicts a moving car. FIG. 5A illustrates one of the plurality of frames of the video of the moving car on which the method was performed. As shown, the frame has a static background and a blurred foreground due to the slow shutter speed of the video recording device.
[0064] A global homography of the plurality of frames was determined, and the frames were warped to align the background in each of the plurality of frames. The foreground in each of the frames was then segmented and correlated to determine the net displacement and relative depth of the object. As the foreground was blurred, a deblurring operation was performed on each of the plurality of frames.
[0065] The deblurring operation included constructing an alpha-matte using the trimap to separate the background, the foreground, and an ambiguous region, which may belong to either the background or the foreground. The separated background includes regions with and without valid pixels; the regions without pixels (those occluded by the foreground) were filled using the pixels of the static backgrounds of the plurality of frames to obtain a filled background.
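One simple way to realize such hole filling is sketched below, under the assumption that the aligned frames and per-frame foreground masks are available: each missing pixel is taken as the per-pixel median over the frames in which that pixel is background.

```python
import numpy as np

def fill_background(aligned_frames, fg_masks):
    """Fill foreground holes from the other aligned frames via a
    per-pixel median over the background observations."""
    stack = np.stack(aligned_frames).astype(np.float32)  # (T, H, W, 3)
    masks = np.stack(fg_masks)[..., None]                # (T, H, W, 1)
    stack[np.broadcast_to(masks, stack.shape)] = np.nan  # hide foreground pixels
    filled = np.nanmedian(stack, axis=0)                 # background consensus
    return np.nan_to_num(filled).astype(np.uint8)        # zero where never seen
```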
[0066] The deblurring of the foreground was based on blur weights corresponding to the camera motion t(i,j) in each frame. The blur weights estimated for the i-th background frame may be denoted by ω^(i) = {ω_k^(i)}, k = 1, ..., |K|. Each entry in ω^(i) represents the fraction of the total exposure time the camera spent in a particular pose k ∈ K. Since the camera trajectory is dominated by the panning motion of the camera and is along the horizontal direction, the weights estimated for each frame can be ordered in ascending order of t_x so as to obtain the camera motion trajectory t(i,j) in each frame. From the camera trajectory and the weights ω^(i) for the i-th background frame, it may be assumed that for a fraction τ of the time the camera was static and only the dynamic foreground object moved with its own velocity v_x. The displacement experienced by the moving object in that fraction of time would be v_x·τ from its position with respect to the previous camera pose k − 1. The position of the object with respect to the k-th pose was calculated using
X_k = H_k^(γ)·X_(k−1) + v_x·τ
where H_k^(γ) is the homography corresponding to the k-th pose, scaled by γ. Hence, the weights {w} of the foreground for that particular camera pose k and background weight were found by sampling the displacement into equal intervals and distributing the k-th weight
w_p = ω_k^(i) / P, p = 1, ..., P

over the P equal displacement intervals.
On repeating this for every k = 1, 2, ..., |K|, the foreground weights for the i-th frame were derived from the estimated background weights. The foreground weights (w) will not, in general, be uniform, as they also depend on the camera motion t(i,j).
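The weight redistribution can be sketched as follows; the number of sub-samples P per pose and the use of 1/f_p as the per-frame exposure time are assumptions made for illustration.

```python
def foreground_weights(w_bg, v_x, f_p, P=5):
    """Spread each background weight w_bg[k] uniformly over the object's
    own travel v_x * tau during pose k, accumulating a 1-D foreground PSF."""
    positions, weights = [], []
    x = 0.0                            # object offset along the pan direction
    for w_k in w_bg:                   # w_bg assumed sorted by horizontal pose t_x
        tau = w_k / f_p                # time spent in pose k (exposure ~ 1/f_p assumed)
        step = v_x * tau / P           # object travel per sub-sample
        for _ in range(P):
            x += step
            positions.append(x)        # sampled object position
            weights.append(w_k / P)    # k-th weight split uniformly
    return positions, weights
```

Because different poses carry different weights w_k, the accumulated foreground weights come out non-uniform, as noted above.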
[0067] After the weight distribution, the foreground alone was subjected to non-blind deblurring using the non-uniformly distributed blur weights. In one embodiment, the foreground deblurring may be based on the Richardson-Lucy algorithm, with the update rule modified to incorporate the non-uniform blur weights. The modified update rule may be given as
f^(t+1) = [ f^(t) / (1 + λ_1·R_TV) ] · ( K^T ⊛ E^l )

where K is the blur operator formed from the non-uniform foreground weights, λ_1 = 0.002 for an image scaled in the 0-1 range, E^l is the residual error between the real blurred image and the predicted blurred image, and R_TV = −∇·(∇f^(t)/|∇f^(t)|).

[0068] Further, the deblurred foreground and the filled background in each frame were mixed to obtain the plurality of clear (non-blurred) frames. The plurality of clear frames were rewarped using the net displacement to create rewarped clear frames. The rewarped frames have a dynamic background similar to the background of the input frames of the video, while the deblurred foreground is static and at the same position in each frame. The rewarped clear frames were then averaged to generate the pan shot, which has a sharp foreground and blurred background.
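Returning to the update rule of paragraph [0067], one iteration of a total-variation-regularized Richardson-Lucy step may be sketched as below. For simplicity the sketch uses an ordinary convolution kernel assembled from the non-uniform weights in place of the pose-dependent warps, which is a simplification of the scheme described above.

```python
import numpy as np
from scipy.signal import fftconvolve

def rl_tv_step(f, blurred, kernel, lam=0.002, eps=1e-8):
    """One TV-regularized Richardson-Lucy iteration (image scaled to 0-1)."""
    pred = fftconvolve(f, kernel, mode='same')              # predicted blurred image
    E = blurred / (pred + eps)                              # residual error ratio
    corr = fftconvolve(E, kernel[::-1, ::-1], mode='same')  # adjoint (flipped) blur
    gy, gx = np.gradient(f)
    mag = np.sqrt(gx ** 2 + gy ** 2) + eps
    r_tv = -(np.gradient(gx / mag, axis=1) +
             np.gradient(gy / mag, axis=0))                 # R_TV = -div(grad f/|grad f|)
    return f * corr / (1.0 + lam * r_tv)                    # damped multiplicative update
```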
[0069] The advantages of the present subject matter and its embodiments include automatically and effortlessly generating pan shots with minimal human intervention. The method and system also account for any blurring of the object or foreground due to camera motion, object motion, shutter speed, etc., and provide an artistic visual effect to the photograph. Therefore, the subject matter may be used for augmenting the hardware capability of consumer cameras or mobile phones.
[0070] Although the detailed description contains many specifics, these should not be construed as limiting the scope of the invention but merely as illustrating different examples and aspects of the invention. It should be appreciated that the scope of the invention includes other embodiments not discussed herein. Various other modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the system and method of the present invention disclosed herein without departing from the spirit and scope of the invention as described here.
[0071] While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope.

Claims

We claim:
1. A method for automatically generating a pan shot from a video of a dynamic object, the method comprising:
warping a plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames, wherein the warping comprises aligning dynamic backgrounds to create a static background in the plurality of frames;
segmenting foreground from the background compensated frames to create a trimap, wherein the foreground comprises the dynamic object;
correlating the foreground of each frame with a preceding and a succeeding frame to obtain inter-frame object displacement (d_i) with respect to the preceding and succeeding frames;
determining a net displacement (O_d) and a relative depth (γ) of the object based on the inter-frame object displacement (d_i) and a camera motion t(i,j);
performing a deblurring operation in the foreground if a blur is present in the foreground to obtain a plurality of clear frames;
rewarping the plurality of clear frames using the net displacement (O_d) of the object to create rewarped clear frames, wherein the rewarped clear frames comprise a dynamic background and a static foreground; and
averaging the rewarped clear frames to generate the pan shot, wherein the pan shot comprises a blurred background and a sharp foreground.
2. The method of claim 1, wherein performing the deblurring operation comprises:
determining an object velocity from the net displacement (O_d) and a frame rate;
constructing an alpha-matte using the trimap to separate background, foreground, and an ambiguous region, wherein the ambiguous region comprises a combination of pixels from background and foreground;
estimating blur weights by uniformly sampling object displacement;
filling the background using the pixels of the static backgrounds in the plurality of frames;
deblurring the foreground in the plurality of frames based on object velocity, net displacement (O_d), and a kernel; and
mixing the deblurred foreground and filled background in each frame to obtain the plurality of clear frames.
3. The method of claim 1, wherein the trimap is a pre-segmented image comprising a background, a foreground, and a plurality of ambiguous regions.
4. The method of claim 1, wherein the homography (H) is determined between a reference frame and the plurality of frames using random sample consensus (RANSAC).
5. The method of claim 1, wherein determining the net displacement and the relative depth is based on:
γ = 1 − (d_a − d_b) / (t(1,2) − t(2,3)); and
O_d = d_a − (1 − γ)·t(1,2).
6. The method of claim 1, wherein the segmenting is performed using a graph-cut algorithm.
7. The method of claim 6, wherein the graph-cut algorithm is based on a data cost function and a smoothness cost function for distinguishing foreground and background in each frame.
8. The method of claim 1, wherein the camera motion t(i,j) is determined using the homographies (H).
9. The method of claim 1, further comprising displaying the generated pan shot, via an interface, to a user.
10. A system for automatically generating a pan shot from a video of a dynamic object, the system comprising:
an image capturing unit for capturing the video;
a processing unit;
a memory unit coupled to the processing unit, the memory unit comprising:
a warping module configured to warp a plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames, wherein the warping comprises aligning dynamic backgrounds to create a static background in the plurality of frames;
a segmentation module configured to segment foreground from the background compensated frames to create a trimap, wherein the foreground comprises the dynamic object;
a correlation module configured to correlate the foreground of each frame with a preceding and a succeeding frame to obtain inter-frame object displacement (d_i) with respect to the preceding and succeeding frames;
a displacement computation module configured to determine a net displacement (O_d) and a relative depth (γ) of the object based on the inter-frame object displacement (d_i) and the camera motion t(i,j);
a deblurring module configured to deblur the foreground if a blur is present in the foreground to obtain a plurality of clear frames;
a rewarping module configured to rewarp the plurality of clear frames using the net displacement (O_d) of the object to create rewarped clear frames, wherein the rewarped clear frames comprise a dynamic background and a static foreground; and
an averaging module configured to perform an averaging operation on the rewarped clear frames to generate the pan shot, wherein the pan shot comprises a blurred background and a sharp foreground.
11. The system of claim 10, wherein the image capturing unit comprises at least:
a lens for allowing light from the object and surroundings of the object to be captured;
a shutter to control an amount of light to be captured; and
an image sensor for converting the light to an electrical signal, wherein the electrical signal is stored in the memory unit.
12. The system of claim 10, wherein the deblurring module is further configured to:
determine an object velocity from the net displacement (O_d) and frame rate (f_p);
construct an alpha-matte using the trimap to separate a clear background, a clear foreground, and an ambiguous region, wherein the ambiguous region comprises a combination of pixels from background and foreground;
fill the background using the pixels of the static backgrounds in the plurality of frames;
estimate blur weights by uniformly sampling object displacement;
deblur the foreground in the plurality of frames based on object velocity, net displacement, blur weights, and a kernel; and
mix the deblurred foreground and filled background in each frame to obtain a plurality of clear frames.
13. The system of claim 10, further comprising a user interface configured to enable a user to interact with the system, wherein the user interface comprises at least:
a display unit for displaying at least the generated pan shot; and
a plurality of controls configured to enable at least:
a selection of a mode of operation comprising a pan shot mode, and an adjustment of settings of the system.
14. The system of claim 13, wherein the display unit is configured to display the plurality of controls.
15. The system of claim 10 incorporated in a video recording device.
16. The system of claim 10, further comprising a communication unit for sending data to or receiving data from other devices.
17. A computer program product having non-volatile memory therein, carrying computer-executable instructions stored therein for automatically generating a pan shot from a video of a dynamic object, the instructions performing the steps of:
warping a plurality of frames of the video to compensate for background motion in the video based on homographies (H) of consecutive frames, wherein the warping comprises aligning dynamic backgrounds to create a static background in the plurality of frames;
segmenting foreground from the background compensated frames to create a trimap, wherein the foreground comprises the dynamic object;
correlating the foreground of each frame with a preceding and a succeeding frame to obtain inter-frame object displacement (d_i) with respect to the preceding and succeeding frames;
determining a net displacement (O_d) and a relative depth (γ) of the object based on the inter-frame object displacement (d_i) and a camera motion t(i,j);
performing a deblurring operation in the foreground if a blur is present in the foreground to obtain a plurality of clear frames;
rewarping the plurality of clear frames using the net displacement (O_d) of the object to create rewarped clear frames, wherein the rewarped clear frames comprise a dynamic background and a static foreground; and
averaging the rewarped clear frames to generate the pan shot, wherein the pan shot comprises a blurred background and a sharp foreground.
18. The computer program product of claim 17, wherein the instructions to perform the deblurring operation comprise:
determining an object velocity from the net displacement (O_d) and a frame rate;
constructing an alpha-matte using the trimap to separate background, foreground, and an ambiguous region, wherein the ambiguous region comprises a combination of pixels from background and foreground;
estimating blur weights by uniformly sampling object displacement;
filling the background using the pixels of the static backgrounds in the plurality of frames;
deblurring the foreground in the plurality of frames based on object velocity, net displacement (O_d), and a kernel; and
mixing the deblurred foreground and filled background in each frame to obtain the plurality of clear frames.
19. The computer program product of claim 17, further comprising instructions for displaying at least the generated pan shot via a user interface.
20. The computer program product of claim 17, further comprising instructions for providing a plurality of controls via a user interface for:
a selection of a mode of operation, wherein the mode of operation comprises a pan shot mode, and
an adjustment of settings of the system.
21. The computer program product of claim 17, further comprising instructions for communicating with other devices.