CN112740652B - System and method for stabilizing video


Info

Publication number
CN112740652B
Authority
CN
China
Prior art keywords
trajectory
time
visual content
image capture
housing
Prior art date
Legal status
Active
Application number
CN201980061658.4A
Other languages
Chinese (zh)
Other versions
CN112740652A (en)
Inventor
Cesar Douady
Thomas Derbanne
Maxim Karpushin
Current Assignee
GoPro Inc
Original Assignee
GoPro Inc
Priority date
Filing date
Publication date
Application filed by GoPro Inc
Publication of CN112740652A
Application granted
Publication of CN112740652B
Status: Active
Anticipated expiration

Classifications

    • H04N23/685: Vibration or motion blur correction performed by mechanical compensation
    • H04N23/51: Housings
    • H04N23/54: Mounting of pick-up tubes, electronic image sensors, deviation or focusing coils
    • H04N23/55: Optical parts specially adapted for electronic image sensors; Mounting thereof
    • H04N23/631: Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/6812: Motion detection based on additional sensors, e.g. acceleration sensors
    • H04N23/683: Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
    • H04N5/144: Movement detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The image capture device captures visual content during a capture duration. The image capture device undergoes changes in position during the capture duration. The trajectory of the image capture device is smoothed based on a look-ahead of the trajectory. A cut-out of the visual content is determined based on the smoothed trajectory. The cut-out of the visual content is used to generate stabilized visual content.

Description

System and method for stabilizing video
Technical Field
The invention relates to stabilizing video using the position of an image capture device during a capture duration.
Background
Video may be captured by an image capture device that is in motion. Movement of the image capture device during video capture may cause the video to appear jerky/shaky.
Disclosure of Invention
The invention relates to stabilizing video. Visual content having a field of view may be captured by an image capture device during a capture duration. Visual information defining the visual content, position information characterizing the rotational position of the image capture device at different times within the capture duration, and/or other information may be obtained. The trajectory of the image capture device during the capture duration may be determined based on the position information and/or other information. The trajectory may reflect the rotational position of the image capture device at different times within the capture duration. The trajectory may include a first portion corresponding to a first time instant within the capture duration and a second portion corresponding to a second time instant within the capture duration that is subsequent to the first time instant. A smooth trajectory of the image capture device may be determined based on the subsequent portion of the trajectory and/or other information such that a portion of the smooth trajectory corresponding to the first portion of the trajectory may be determined based on the second portion of the trajectory. The smooth trajectory may have a smoother change in the rotational position of the image capture device than the trajectory. The viewing window for the visual content may be determined based on the smooth trajectory of the housing and/or other information. The viewing window may define one or more ranges of the visual content. The stabilized visual content of the video may be generated based on the viewing window and/or other information. The stabilized visual content may comprise a punchout (cut-out) of the range of the visual content within the viewing window.
The system for stabilizing video may include one or more electronic storages, one or more processors, and/or other components. The electronic storage may store visual information defining visual content, information related to the visual content, position information characterizing the rotational position of the image capture device, information related to the trajectory of the image capture device, information related to the smooth trajectory of the image capture device, information related to the field of view of the optical element, information related to the viewing window, information related to the stabilized visual content, information related to the punchout of the visual content, and/or other information. In some implementations, the system may include one or more optical elements, one or more image sensors, one or more position sensors, and/or other components.
One or more components of the system may be carried by a housing, such as a housing of an image capture device. For example, the optical element(s), image sensor(s), and/or position sensor(s) of the system may be carried by a housing of the image capture device. The housing may carry other components, such as processor(s) and/or electronic memory.
The optical element may be configured to direct light within the field of view to the image sensor. The field of view may be larger than the size of the punchout/viewing window used to generate the stabilized visual content. The image sensor may be configured to generate a visual output signal based on light incident on the image sensor during the capture duration. The visual output signal may convey visual information defining visual content having the field of view.
The position sensor may be configured to generate a position output signal based on a position of the housing during the capture duration. The position output signal may convey position information indicative of a rotational position of the housing at different times within the capture duration. In some embodiments, the position information may also characterize the translational position of the housing at different times within the capture duration. In some implementations, the position sensor may include one or more of a gyroscope, an accelerometer, and/or an inertial measurement unit. The location information may be determined independently of the visual information.
The processor(s) may be configured by machine-readable instructions. Execution of the machine-readable instructions may cause the processor(s) to facilitate stabilizing the video. Machine-readable instructions may comprise one or more computer program components. The computer program components may include one or more of a trajectory component, a smooth trajectory component, a viewing window component, a generation component, and/or other computer program components.
The trajectory component may be configured to determine a trajectory of the housing during the capture duration based on the position information and/or other information. The trajectory may reflect the rotational position of the housing at different times within the capture duration. In some embodiments, the trajectory may reflect the translational position of the housing at different times within the capture duration. The trajectory may include a first portion corresponding to a first time instant within the capture duration and a second portion corresponding to a second time instant within the capture duration after the first time instant.
The smooth trajectory component may be configured to determine a smooth trajectory of the housing based on subsequent portions of the trajectory and/or other information. A smooth trajectory may be determined such that a portion of the smooth trajectory corresponding to the first portion of the trajectory is determined based on the second portion of the trajectory. The smooth trajectory may have a smoother change in the rotational position of the housing than the trajectory. In some embodiments, the smooth trajectory may have a smoother change in the translational position of the housing than the trajectory.
In some embodiments, the smooth trajectory having a smoother change in the rotational position of the housing than the trajectory may be characterized by the smooth trajectory having less jitter in the rotational position of the housing than the trajectory. In some embodiments, the smooth trajectory having a smoother change in the translational position of the housing than the trajectory may be characterized by the smooth trajectory having less jitter in the translational position of the housing than the trajectory.
In some embodiments, the smooth trajectory having a smoother change in the rotational position of the housing than the trajectory may include the high-frequency changes in the rotational position of the housing within the trajectory being removed from the smooth trajectory. In some embodiments, the smooth trajectory having a smoother change in the translational position of the housing than the trajectory may include the high-frequency changes in the translational position of the housing within the trajectory being removed from the smooth trajectory.
In some implementations, the extent to which the smooth trajectory of the housing deviates from the trajectory of the housing may depend on the amount of rotational and/or translational motion the housing experiences during the capture duration, the exposure time to capture visual content, and/or other information.
In some embodiments, determining a smooth trajectory of the casing based on the subsequent portion of the trajectory may include: (1) obtaining a rotational position of the housing at a first time within the capture duration, the first time being a point in time and corresponding to a video frame of the visual content captured at the point in time within the capture duration; (2) obtaining a rotational position of the housing at a second time within the capture duration, the second time being a duration of time after the point in time within the capture duration; and (3) determining a respective rotational position of the housing within the smooth trajectory at the first time based on the rotational position of the housing at the point in time, the rotational position of the housing during a duration of time after the point in time, and/or other information. The placement of the viewing window for the visual content relative to the field of view of the visual content captured at the first instance in time may be determined based on the respective rotational position of the housing within the smooth trajectory at the first instance in time and/or other information.
In some embodiments, determining the smooth trajectory of the housing based on the subsequent portion of the trajectory may further comprise: (1) determining whether placement of the viewing window for the visual content at the first time causes one or more portions of the viewing window to exceed the field of view of the visual content; and (2) in response to determining that the portion(s) of the viewing window for the visual content at the first time exceed the field of view of the visual content, adjusting the respective rotational position of the housing within the smooth trajectory at the first time such that the viewing window for the visual content at the first time does not exceed the field of view of the visual content.
In some embodiments, the respective rotational position of the housing within the smooth trajectory at the first time may be initially determined based on a combination of the rotational position of the housing at the first time, an estimate of the respective rotational position of the housing, and/or other information. The estimate of the respective rotational position of the housing may be determined based on a minimization of a combination of a rotational speed of the housing and a rotational acceleration of the housing, and/or other information.
The viewing window component may be configured to determine the viewing window for the visual content based on the smooth trajectory of the housing and/or other information. The viewing window may define one or more ranges of the visual content.
The generation component may be configured to generate the stabilized visual content of the video based on the viewing window and/or other information. The stabilized visual content may include a punchout of the range of the visual content within the viewing window.
In some implementations, at least one of the processor(s) may be a remote processor located remotely from the housing of the image capture device. The generation of the stable visual content may be performed by the remote processor after the visual content is captured.
These and other objects, features and characteristics of the systems and/or methods disclosed herein, as well as other objects, features and characteristics of the related elements of structure, the methods of operation and functions of the same, and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.
Drawings
Fig. 1 illustrates an exemplary system for stabilizing video.
Fig. 2 illustrates an exemplary method for stabilizing video.
Fig. 3 illustrates an exemplary image capture device.
Fig. 4 shows an exemplary trajectory of the rotational position of the image capture device over the capture duration.
FIG. 5A illustrates an exemplary predicted trajectory.
Fig. 5B shows an exemplary filtered trajectory.
FIG. 5C illustrates an exemplary smoothed trajectory.
Fig. 6A shows an exemplary orientation of a viewing window relative to an image.
Fig. 6B illustrates a number of exemplary orientations of the viewing window relative to the image.
Fig. 7 shows an exemplary plot of smoothed values.
Detailed Description
Fig. 1 shows a system 10 for stabilizing video. System 10 may include one or more of a processor 11, an interface 12 (e.g., a bus interface, a wireless interface), an electronic storage 13, and/or other components. In some embodiments, system 10 may include one or more image sensors, one or more position sensors, and/or other components. Visual content having a field of view may be captured by an image capture device during a capture duration. Visual information defining the visual content, position information characterizing the rotational position of the image capture device at different times within the capture duration, and/or other information may be obtained. The visual information, position information, and/or other information may be obtained by the processor 11. The trajectory of the image capture device during the capture duration may be determined based on the position information and/or other information. The trajectory may reflect the rotational position of the image capture device at different times within the capture duration. The trajectory may include a first portion corresponding to a first time instant within the capture duration and a second portion corresponding to a second time instant within the capture duration after the first time instant. A smooth trajectory of the image capture device may be determined based on the subsequent portion of the trajectory and/or other information such that a portion of the smooth trajectory corresponding to the first portion of the trajectory may be determined based on the second portion of the trajectory. The smooth trajectory may have a smoother change in the rotational position of the image capture device than the trajectory. The viewing window for the visual content may be determined based on the smooth trajectory of the housing and/or other information. The viewing window may define one or more ranges of the visual content. The stabilized visual content of the video may be generated based on the viewing window and/or other information. The stabilized visual content may comprise a punchout of the range of the visual content within the viewing window.
Electronic storage 13 may be configured to include electronic storage media that electronically stores information. Electronic storage 13 may store software algorithms, information determined by processor 11, information received remotely, and/or other information that enables system 10 to function properly. For example, the electronic storage 13 may store information related to visual content, visual information defining visual content, information related to an image capture device, information related to an optical element, information related to an image sensor, information related to a position sensor, position information characterizing the rotational position of the image capture device, information related to the trajectory of the image capture device, information related to the smooth trajectory of the image capture device, information related to the field of view of the optical element, information related to the viewing window, information related to the stabilized visual content, information related to the punchout of the visual content, and/or other information.
Processor 11 may be configured to provide information processing capabilities in system 10. Thus, processor 11 may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. The processor 11 may be configured to execute one or more machine readable instructions 100 to facilitate stabilizing video. The machine-readable instructions 100 may include one or more computer program components. The machine-readable instructions 100 may include one or more of a trajectory component 102, a smooth trajectory component 104, a viewing window component 106, a generation component 108, and/or other computer program components.
During the capture duration, the image capture device may capture visual content (image(s), video frame(s), video(s)) having a field of view. The field of view of the visual content may define the field of view of the scene captured within the visual content. The capture duration may be measured/defined in terms of time duration and/or number of frames. For example, visual content may be captured during a capture duration of 60 seconds and/or from one point in time to another point in time. As another example, 1800 images may be captured during the capture duration. If the images are captured at a rate of 30 images/second, the capture duration may correspond to 60 seconds. Other capture durations are contemplated.
The system 10 may be remote from the image capture device or local to the image capture device. One or more portions of the image capture device may be remote from the system 10 or part of the system 10. One or more portions of the system 10 may be remote from or part of the image capture device. For example, one or more components of system 10 may be carried by a housing, such as a housing of an image capture device. For example, the image sensor(s) and position sensor(s) of system 10 may be carried by a housing of the image capture device. The housing may carry other components, such as a processor 11, electronic storage 13, and/or one or more optical elements. Reference to the housing of the image capture device may refer to the image capture device and vice versa. For example, reference to a position/motion of a housing of an image capture device may refer to a position/motion of the image capture device, and vice versa.
An image capture device may refer to a device for recording visual information in the form of images, video, and/or other media. The image capture device may be a standalone device (e.g., a camera) or may be part of another device (e.g., part of a smartphone). Fig. 3 shows an exemplary image capture device 302. Image capture device 302 may include a housing 312, and housing 312 may carry (attach to, support, hold, and/or otherwise carry) optical element 304, image sensor 306, position sensor 308, processor 310, and/or other components. Other configurations of the image capture device are contemplated.
The optical element 304 may include instrument(s), tool(s), and/or medium that act on light passing through them. For example, the optical element 304 may include one or more of a lens, a mirror, a prism, and/or other optical elements. The optical element 304 may affect the direction, deviation, and/or path of light passing through the optical element 304. The optical element 304 may have a field of view 305. The optical element 304 may be configured to direct light within the field of view 305 to the image sensor 306. The field of view 305 may include the field of view of the scene within the field of view of the optical element 304 and/or the field of view of the scene transmitted to the image sensor 306. For example, optical element 304 may direct light within its field of view to image sensor 306, or may direct light within a portion of its field of view to image sensor 306. The field of view 305 of the optical element 304 may refer to the range of the observable world seen through the optical element 304. The field of view 305 of the optical element 304 may include one or more angles (e.g., vertical angles, horizontal angles, diagonal angles) at which the optical element 304 receives light and passes it to the image sensor 306. In some embodiments, the field of view 305 may be greater than or equal to 180 degrees. In some implementations, the field of view 305 may be less than or equal to 180 degrees.
The field of view 305 may be larger than the size of the punchout/viewing window used to generate stabilized visual content. A portion of the visual content captured from light within the field of view 305 may be presented on a display and/or used to generate video. The portions of the visual content presented on the display/used to generate the video may include those portions of the visual content that are within the viewing window. The viewing window may define the extent of the visual content (e.g., of the image(s)/video frame(s)) to be included within the punchout. The viewing window may be determined such that the visual content within the presented/generated video does not appear shaky/jerky, or appears less shaky/jerky. For example, the shape, size, and/or location of the viewing window within the visual content may be determined to compensate for the motion of image capture device 302 during capture such that the video appears to have been captured from an image capture device 302 with less motion. That is, the visual content captured by image capture device 302 may be cropped to generate the stabilized visual content.
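For illustration only, the following Python sketch shows one way a fixed-size viewing window could be placed inside a larger captured frame and clamped so that it never leaves the captured field of view. The function name, the frame and window sizes, and the simple pixel-shift model are assumptions for the example and are not part of the disclosed system.
    import numpy as np

    def punchout(frame, center_xy, out_w, out_h):
        # Clamp the requested center so the window stays inside the frame.
        h, w = frame.shape[:2]
        cx = int(np.clip(center_xy[0], out_w // 2, w - out_w // 2))
        cy = int(np.clip(center_xy[1], out_h // 2, h - out_h // 2))
        return frame[cy - out_h // 2 : cy + out_h // 2,
                     cx - out_w // 2 : cx + out_w // 2]

    # Example: a 1920x1080 punchout taken from a larger capture, shifted to
    # the left to cancel a (hypothetical) rightward camera shake.
    capture = np.zeros((2028, 2704, 3), dtype=np.uint8)
    stabilized = punchout(capture, (1352 - 40, 1014), 1920, 1080)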
Image sensor 306 may include sensor(s) that convert received light into an output signal. The output signal may comprise an electrical signal. For example, image sensor 306 may include one or more of a charge coupled device sensor, an active pixel sensor, a complementary metal oxide semiconductor sensor, an N-type metal oxide semiconductor sensor, and/or other image sensors. The image sensor 306 may generate an output signal conveying information defining the visual content of one or more video frames of one or more images and/or videos. For example, the image sensor 306 may be configured to generate a visual output signal based on light incident on the image sensor 306 during a capture duration. The visual output signal may convey visual information defining a visual content having a field of view.
Position sensor 308 may include sensor(s) that measure the position and/or motion experienced. Position sensor 308 may convert the experienced position and/or motion into an output signal. The output signal may comprise an electrical signal. For example, position sensor 308 may refer to a set of position sensors that may include one or more inertial measurement units, one or more accelerometers, one or more gyroscopes, and/or other position sensors. Position sensor 308 may generate an output signal that conveys information indicative of a position and/or a motion of position sensor 308 and/or the device(s) carrying position sensor 308 (e.g., image capture device 302 and/or housing 312).
For example, position sensor 308 may be configured to generate a position output signal based on the position of the housing/image capture device during the capture duration. The position output signal may convey position information characterizing the position of the housing 312 at different times (time points, durations) within the capture duration. The position information may characterize the position of housing 312 based on a particular translational and/or rotational position of housing 312, and/or based on a change in the translational and/or rotational position of housing 312 as a function of progress over the capture duration. That is, the position information may characterize the translational and/or rotational position of the housing 312, and/or the (motion) change (e.g., direction, amount, velocity, acceleration) in the translational and/or rotational position of the housing 312 during the capture duration.
The position information may be determined based on the signal generated by the position sensor 308 and independently of the information/signal generated by the image sensor 306. That is, the location information may not be determined using the visual content/images/video generated by the image sensor 306. Determining the position/motion of housing 312/image capture device 302 using visual content/images/video can be computationally expensive in terms of processing power, processing time, and/or battery consumption. Using information/signals from position sensor 308 to determine the position/motion of housing 312/image capture device 302 may be computationally cheaper. That is, when determining the position/motion of housing 312/image capture device 302 based on information/signals from position sensor 308, less processing power, processing time, and/or battery consumption may be required than when determining the position/motion of housing 312/image capture device 302 based on information/signals from image sensor 306. The position information determined independently of the image information may be used to determine the trajectory of the housing 312/image capture device 302 during the capture duration.
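As an illustration of deriving position information independently of the image data, a minimal Python sketch follows that integrates a hypothetical stream of gyroscope yaw-rate samples into a yaw-angle trajectory. The one-dimensional yaw model, the 200 Hz sample rate, and all names are assumptions for the example; an actual device would integrate full three-dimensional rotations.
    import numpy as np

    def yaw_trajectory(yaw_rate, timestamps):
        # Integrate angular velocity (rad/s) over time to obtain a yaw angle
        # (rad) at each gyroscope sample, with no reference to image data.
        dt = np.diff(timestamps, prepend=timestamps[0])
        return np.cumsum(np.asarray(yaw_rate) * dt)

    t = np.arange(0, 2.0, 1 / 200)                  # 200 Hz gyroscope stream
    rate = 0.3 * np.sin(2 * np.pi * 0.5 * t) + 0.05 * np.random.randn(t.size)
    yaw = yaw_trajectory(rate, t)                   # observed yaw trajectory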
Processor 310 may include one or more processors (logic circuits) that provide information processing capabilities in image capture device 302. Processor 310 may provide one or more computing functions for image capture device 302. Processor 310 may operate/send command signals to one or more components of image capture device 302 to operate image capture device 302. For example, processor 310 may facilitate operation of image capture device 302 when capturing image(s) and/or video(s), operation of optical element 304 (e.g., changing the manner in which light is directed by optical element 304), and/or operation of image sensor 306 (e.g., changing the manner in which received light is converted into information defining an image/video, and/or post-processing the image/video after capture).
Processor 310 may obtain information from image sensor 306 and/or position sensor 308 and/or facilitate transmission of information from image sensor 306 and/or position sensor 308 to another device/component. Processor 310 may be remote from processor 11 or local to processor 11. One or more portions of processor 310 may be part of processor 11 and/or one or more portions of processor 11 may be part of processor 310. Processor 310 may include and/or perform one or more functions of processor 11 shown in fig. 1.
For example, the processor 310 may use the position information to stabilize visual content captured by the optical element 304 and/or the image sensor 306. Visual content may be captured by image capture device 302 during a capture duration. The trajectory (path, course) of the rotational and/or translational positions experienced by image capture device 302 during the capture duration may be determined based on the position information and/or other information. A smooth trajectory of image capture device 302 may be determined based on a look-ahead of the trajectory and/or other information. The look-ahead of the trajectory may include using subsequent portions of the trajectory to determine a previous portion of the smoothed trajectory. For example, the portion of the smoothed trajectory corresponding to a point or a duration within the capture duration may be determined based on the portion of the trajectory corresponding to a future point/future duration subsequent to that point/duration within the capture duration. For example, the portion of the smoothed trajectory corresponding to a point (video frame) within the capture duration may be determined based on the portion of the trajectory corresponding to a duration of one second after that point within the capture duration.
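A minimal Python sketch of this look-ahead idea follows, under the simplifying assumptions that orientation is a single yaw angle per frame and that a plain mean over roughly the next second of samples stands in for the actual smoothing; the frame rate, window length, and names are illustrative.
    import numpy as np

    def lookahead_smooth(observed, fps=30, lookahead_s=1.0):
        # The smoothed value for frame i is computed from the observed
        # positions of frame i and roughly the next second of frames.
        observed = np.asarray(observed, dtype=float)
        k = int(fps * lookahead_s)
        smoothed = np.empty(len(observed))
        for i in range(len(observed)):
            smoothed[i] = observed[i:min(len(observed), i + k + 1)].mean()
        return smoothed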
The smoothed trajectory may be used to determine a viewing window for stabilizing the visual content. The viewing window may define the range(s) of visual content to be cropped to stabilize the visual content. The stabilized visual content may include a cropped range of the visual content, with the viewing window defining the range of the visual content that is cropped. The range within which the viewing window can move within the visual content may be referred to as a stabilization margin. The stabilization margin may specify how much the crop may move while remaining entirely within the field of view of the visual content.
This stabilization of visual content may maintain intentional motion experienced by image capture device 302 during the capture duration. Intentional motion may refer to motion desired by a user of image capture device 302. Intentional motion may include meaningful phenomena related to the semantics of the video, which may express what the user actually wants to capture. For example, intentional motion may include a forward motion to follow an object of interest, a panning motion to capture a wide view of a landscape, and/or other motions. Unintentional motion may refer to motion that is not desired by the user of image capture device 302. Unintentional motion may result in the generation of noise within the captured visual content, such as motion of the visual content due to unintentional motion of image capture device 302 (e.g., caused by hand shake, vibration, jolting).
Intentional motion may be characterized by motion at a lower frequency than unintentional motion. Unintentional motion may be characterized by motion (e.g., jitter, shaking) at a higher frequency than intentional motion. To identify intentional motion characterized by low frequencies, it may be desirable to use a sufficiently long time range of experienced motion. Therefore, to stabilize the visual content, subsequent portions of the trajectory may be used to determine the smooth trajectory. A longer look-ahead time range may better approximate the intentional motion while causing a longer delay (a longer wait before the smooth trajectory can be determined).
A smooth trajectory (clipping trajectory) may be determined based on minimization of a value, score, and/or metric representing a high frequency quantity. The value/score/metric may be determined based on an objective function consisting of multiple sets of terms. For example, the objective function may include two sets of terms, one representing the L2-norm of the angular velocity between frames and the other representing the angular acceleration. One or more of these terms may be weighted by one or more constants. The constants may determine the amount of contribution of the respective terms to the objective function. From a signal processing point of view, these two terms may correspond to the output power of the two high-pass filters.
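As a rough illustration of an objective of this kind (not the exact patented formulation), the following Python sketch scores a candidate crop trajectory by a weighted sum of its squared per-frame angular velocity and angular acceleration; the one-dimensional yaw model and the weights w_v and w_a are assumptions.
    import numpy as np

    def smoothness_cost(theta, w_v=1.0, w_a=10.0):
        # theta: candidate (smoothed) yaw angle per frame, in radians.
        theta = np.asarray(theta, dtype=float)
        vel = np.diff(theta)        # per-frame angular velocity term
        acc = np.diff(theta, n=2)   # per-frame angular acceleration term
        return w_v * np.sum(vel ** 2) + w_a * np.sum(acc ** 2)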
The values of the smoothed trajectory over the capture duration may include values that minimize a value/score/metric representing a high frequency quantity while accounting for clipping constraints. Clipping constraints may require that the viewing window used to intercept the visual content remain within the field of view of the visual content. The viewing window (determined based on the smoothed trajectory) may not exceed the capture range of the visual content.
In some embodiments, an iterative minimization may be used to determine a smooth trajectory. The iterative minimization may alternate between two steps that refine the current estimate: (1) an update step that updates the current estimate in a way that decreases the value of the objective function, without being affected by any constraint(s); (2) a projection step that applies the constraint(s): testing whether the current estimate satisfies the constraint(s) (e.g., whether the viewing window used for the punchout of the visual content remains within the field of view of the visual content), and, if the constraint(s) are not satisfied (e.g., the viewing window exceeds the field of view of the visual content), re-projecting the current estimate.
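A minimal Python sketch of this two-step iteration follows, in the simplified one-dimensional yaw model used above: a neighbour-averaging update that lowers the velocity/acceleration cost, followed by a projection that clamps the estimate to within a fixed margin of the observed trajectory, standing in for the constraint that the viewing window stay inside the field of view. The step size, iteration count, and names are assumptions.
    import numpy as np

    def smooth_with_projection(observed, margin, iters=200, step=0.3, init=None):
        observed = np.asarray(observed, dtype=float)
        est = observed.copy() if init is None else np.asarray(init, dtype=float).copy()
        for _ in range(iters):
            # Update step: pull each interior sample toward the mean of its
            # neighbours, which lowers the velocity/acceleration cost.
            est[1:-1] += step * (0.5 * (est[:-2] + est[2:]) - est[1:-1])
            # Projection step: keep every sample within margin of the observed
            # trajectory so the punchout stays inside the field of view.
            est = np.clip(est, observed - margin, observed + margin)
        return est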
Stabilization of the visual content may include using a trajectory retrieval algorithm and a stabilization algorithm. Inputs to the trajectory retrieval algorithm may include the trajectory experienced by the image capture device 302 (e.g., rotational positions at which video frames were captured), a bootstrap trajectory, a fixed past, a number of iterations to be performed, and/or other information. The output of the trajectory retrieval algorithm may include a smoothed trajectory (e.g., for determining the virtual/actual rotational positions for the punchout of the video frames) and/or other information.
The trajectory retrieval algorithm may include one or more of the following steps. The projection may be prepared by copying the input trajectory into a projection buffer. If the bootstrap trajectory is shorter than a given length, the bootstrap trajectory may be acquired and padded. The bootstrap trajectory may be used as the current/initial guess for the smoothed trajectory. If there is no fixed past, the bootstrap trajectory may be padded on the left using the first sample. Otherwise, the bootstrap trajectory may be padded on the right using the last sample. The trajectory may be smoothed using an iterative approach. The trajectory may be smoothed using a fine-scale method and a coarse-scale method. For example, the trajectory may be smoothed using the fine-scale approach to smooth the trajectory on a small scale, and the smoothed trajectory may then be examined to determine whether the clipping constraint is satisfied. If the clipping constraint is not satisfied, a projection may be performed and the smoothed trajectory may be changed such that the clipping constraint is satisfied. Multiple iterations of fine-scale smoothing may be performed, and after the iterations of fine-scale smoothing, the current guess for the smoothed trajectory may be sub-sampled. The trajectory may then be smoothed using the coarse-scale method to smooth the trajectory over a larger scale. The coarse-scale smoothing may be performed for multiple iterations, and after the iterations of coarse-scale smoothing, the sub-sampled current guess for the smoothed trajectory may be up-sampled. The trajectory may then be smoothed again using the fine-scale approach. Iterative smoothing of the trajectory may result in convergence of the smoothed trajectory. After a number of iterations, subsequent iterations may have less impact on the smoothness of the trajectory. Once the iterations of fine-scale smoothing and coarse-scale smoothing are completed, the current guess may be output.
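The fine-scale/coarse-scale alternation could be sketched in Python as follows, reusing smooth_with_projection from the previous sketch; the sub-sampling factor of 4, the iteration counts, and the linear up-sampling are assumptions for the example, not the actual algorithm.
    import numpy as np

    def retrieve_trajectory(observed, margin, fine_iters=50, coarse_iters=50):
        observed = np.asarray(observed, dtype=float)
        # Fine-scale pass on the full-rate trajectory.
        guess = smooth_with_projection(observed, margin, iters=fine_iters)
        # Coarse-scale pass: sub-sample, smooth on the larger scale, up-sample.
        idx = np.arange(0, len(observed), 4)
        coarse = smooth_with_projection(observed[idx], margin,
                                        iters=coarse_iters, init=guess[idx])
        guess = np.interp(np.arange(len(observed)), idx, coarse)
        # Final fine-scale pass, re-projected against the full-rate observations.
        return smooth_with_projection(observed, margin, iters=fine_iters, init=guess)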
The inputs to the stabilization algorithm may include the visual content captured by image capture device 302 (video streams, video frames and their timestamps (e.g., the time at which the center scan line was captured)), position information (e.g., gyroscope streams, packets of gyroscope samples), and/or other information. The output of the stabilization algorithm may include the stabilized visual content, a viewing window for the punchout of the visual content (e.g., a temporally stable crop orientation), and/or other information.
The stabilization algorithm may include one or more of the following steps for each video frame to be stabilized. The gyroscope samples may be integrated up to the frame timestamp. The frame may be inserted into a circular buffer. When there are a sufficient number of frames in the buffer (e.g., the number of frames in the buffer is equal to and/or greater than a threshold number), the input trajectory may be unrolled. The fixed past may be prepared by taking the crop orientation history and unrolling it back relative to the first sample of the input trajectory. If there is no history, the fixed past may be considered empty. The bootstrap trajectory may be prepared by taking the crop orientation history and populating the rest with input samples according to their timestamps. If there is no history, the bootstrap trajectory may be considered equal to the input trajectory. The trajectory retrieval algorithm may be invoked/executed. The smoothed trajectory output by the trajectory retrieval algorithm may be stored in the orientation history. The frame at the head of the buffer may be warped based on the smoothed trajectory by calculating the rotation corresponding to the passage from the source orientation to the stabilized orientation. The cropped field of view may be sampled to obtain a grid of points, and the grid may be rotated. The grid may be projected onto the source camera image plane and used to warp the frame. The warped image may be encoded.
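A heavily simplified Python sketch of this per-frame loop follows, again in the one-dimensional yaw model and reusing smooth_with_projection from the earlier sketch: it integrates gyroscope samples, buffers frames until enough future orientation is known, and yields each frame together with the smoothed orientation from which its punchout would be rendered; the warping and encoding steps are omitted. All names, rates, and the look-ahead length are assumptions.
    from collections import deque
    import numpy as np

    def stabilize_stream(frames, frame_t, gyro_rate, gyro_t, lookahead=30, margin=0.1):
        # Yaw angle (rad) at each gyroscope timestamp, integrated from rate samples.
        yaw_samples = np.cumsum(np.asarray(gyro_rate) * np.gradient(np.asarray(gyro_t)))
        buf = deque()                  # buffer of (frame, observed yaw at frame time)
        for frame, t in zip(frames, frame_t):
            buf.append((frame, float(np.interp(t, gyro_t, yaw_samples))))
            if len(buf) > lookahead:   # enough "future" frames are buffered
                observed = np.array([y for _, y in buf])
                smooth = smooth_with_projection(observed, margin)
                frame_out, _ = buf.popleft()
                # The crop for the oldest buffered frame is rendered from the
                # smoothed orientation rather than the observed one.
                yield frame_out, smooth[0]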
Referring back to fig. 1, the processor 11 (or one or more components of the processor 11) may be configured to obtain information to facilitate stabilizing the video. Obtaining information may include one or more of accessing, obtaining, analyzing, determining, reviewing, identifying, loading, locating, opening, receiving, retrieving, checking, storing, and/or otherwise obtaining information. The processor 11 may obtain information from one or more locations. For example, processor 11 may obtain information from a memory location (e.g., electronic storage 13, electronic storage of information and/or signals generated by one or more sensors, electronic storage of a network-accessible device), and/or other location. Processor 11 may obtain information from one or more hardware components (e.g., image sensors, position sensors) and/or one or more software components (e.g., software running on a computing device).
For example, the processor 11 (or one or more components of the processor 11) may obtain visual information defining visual content having a field of view, positional information characterizing the position of the image capture device at different times within the capture duration, and/or other information. One or more types of information may be obtained during and/or after the image capture device acquires the visual content. For example, the visual information, location information, and/or other information may be obtained at the time the visual content is captured by the image capture device and/or after the visual content has been captured and stored in a memory (e.g., electronic storage 13).
The trajectory component 102 may be configured to determine a trajectory of the image capture device/housing of the image capture device during the capture duration based on the location information and/or other information. The determination of the trajectory may be referred to as trajectory generation. The trajectory generated by the trajectory component 102 may include a trajectory of the image capture device/housing of the image capture device as observed by one or more position sensors. An (observed) trajectory may refer to the progress of one or more paths and/or positions that the image capture device/housing follows/experiences during the capture duration. The (observed) trajectory may reflect the position of the image capturing device/housing of the image capturing device at different moments in time within the capturing duration. The position of the image capture device/image capture device housing may include a rotational position (e.g., rotation about one or more axes of the image capture device) and/or a translational position of the image capture device/image capture device housing. For example, the trajectory component 102 may determine a trajectory of the image capture device/image capture device housing during the capture duration based on position information characterizing a particular translational and/or rotational position of the image capture device/housing, and/or position information characterizing changes in the translational and/or rotational position of the image capture device/housing over the progress of the capture duration.
The (observed) trajectory may comprise different portions corresponding to different moments in time within the capture duration. For example, the (observed) trajectory may include a first portion corresponding to a first time instant within the capture duration and a second portion corresponding to a second time instant within the capture duration. The second time may be within the capture duration after the first time.
FIG. 4 illustrates an exemplary trajectory of an image capture device as observed by the position sensor(s). The trajectory may include an observed yaw trajectory 400 of the image capture device. The observed yaw trajectory 400 may reflect the yaw angular position (e.g., a rotational position defined to the left or right relative to the yaw axis) of the image capture device/image capture device housing at different times during the capture duration. The observed yaw trajectory 400 may show the image capture device rotating in a negative yaw direction, rotating in a positive yaw direction, rotating back to a forward-facing orientation, and then rotating in a negative yaw direction. For example, during capture, the image capture device may have been rotated to the right, then to the left, then back to facing forward, and then to the right. Other types of trajectories (e.g., pitch trajectory, roll trajectory, pan trajectory) are contemplated.
It may be undesirable to generate video that includes visual content (e.g., of image(s), video frame(s), video(s)) captured along the observed yaw trajectory 400. Generating a video by outputting images captured along the observed yaw trajectory 400 may result in footage that pans back and forth and/or appears to include unintentional camera motion. For example, a sharp/rapid change in the yaw angle position of the image capture device may result in a sudden change in viewing direction within the video (e.g., rapid camera motion to the left or right). Multiple changes in the yaw angle position of the image capture device may result in footage that keeps changing the viewing direction (e.g., right, left, forward, right).
The stabilized visual content may be generated to provide a smoother view of the captured visual content. Stabilization may include using a smaller portion/range of the visual content to provide a punchout view of the visual content that results in a more stable view than viewing the entire visual content. Generating the stabilized visual content may include providing a punchout view of the captured visual content using a smaller visual portion of the captured visual content (e.g., a smaller visual portion of an image/video frame). The stabilized visual content may provide a more stable view of the captured visual content than when the entire captured visual content is presented. Such stabilization may be provided, for example, by creating a stable (smooth) trajectory over the capture duration and determining the cut-out of the visual content based on the stable trajectory. A cut-out of the visual content may refer to using one or more portions of the visual content, such as a cropped portion of an image. The cut-out of the visual content may include one or more portions of the visual content presented on a display and/or one or more portions of the visual content used to generate video frames of video content. However, some stabilization techniques may not be able to maintain the intent of the user capturing the images.
For example, the video may be stabilized by predicting the position/motion of the camera based on the past position/motion. For example, when attempting to determine the position and/or shape of a cutout of an image captured at a given time within a capture duration, the position/motion of the image capture device prior to that time may be used to determine how to position/shape the cutout to create a stable view. The use of such "past" position/motion information may conflict with intentional motion by the user of the image capture device.
For example, in an observed yaw trajectory 400, a rotation of the image capture device to the right, then to the left, then to the front may be the result of the image capture device inadvertently rotating to the right, the user overcorrecting the rotation to the left, and then rotating the image capture device to the right to a forward direction. Determining a cut-out of the image using the "past" position/motion information may result in a predicted trajectory as shown in fig. 5A.
For example, based on the rightward rotation of the image capture device during duration A 502, a predicted yaw trajectory A 512 at duration B 504 may be predicted, with the predicted yaw trajectory A 512 continuing the rotation to the right. Based on the small rotation of the image capture device to the right during duration C 506, a predicted yaw trajectory B 514 at duration D 508 may be predicted, with the predicted yaw trajectory B 514 continuing the small rotation to the right. The predicted yaw trajectory A 512 may be in the opposite direction of the actual movement of the image capture device during duration B 504, and the predicted yaw trajectory B 514 may be very different from the observed yaw trajectory. Such differences between the observed yaw trajectory and the predicted yaw trajectory may result in the images not including sufficient visual information (e.g., pixels) to accommodate the attempted stabilization and/or the position/shape of the punchout.
As another example, the video may be stabilized by filtering observed changes in the position/motion of the image capture device. For example, a low pass filter may be applied to the (observed) trajectory to remove sudden rotational and/or translational changes in the position/motion of the image capturing device.
For example, as shown in fig. 5B, a filtered yaw trajectory 516 may be determined by applying a low-pass filter to the observed yaw trajectory 400. The filtered yaw trajectory 516 may have smoother changes in the position/motion of the image capture device than the observed yaw trajectory. However, such filtering may not take into account how the position/motion changes over the capture duration, and may not maintain the intent of the user capturing the images. For example, while video generated from the filtered yaw trajectory 516 may not include abrupt changes in viewing direction, the video may still contain footage that changes the viewing direction in discontinuous motions to the right, then to the left, forward, and then to the right (e.g., rotating to the right by an angle, holding that position for a period of time, then rotating to the right, then rotating back a little to the left, then rotating to the right again).
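For comparison, a simple causal low-pass filter of the kind discussed above could look like the following Python sketch; the exponential-moving-average form and the alpha value are illustrative choices, not the filter of any particular implementation. Unlike the look-ahead approach, each output sample depends only on past samples.
    import numpy as np

    def lowpass(observed, alpha=0.1):
        # First-order (exponential moving average) low-pass filter.
        observed = np.asarray(observed, dtype=float)
        out = np.empty_like(observed)
        out[0] = observed[0]
        for i in range(1, len(observed)):
            out[i] = alpha * observed[i] + (1 - alpha) * out[i - 1]
        return out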
The smooth trajectory component 104 may be configured to determine a smooth trajectory of the image capture device/housing of the image capture device based on one or more subsequent portions of the trajectory and/or other information. The determination of a smoothed trajectory may refer to the generation of a smoothed trajectory. A smooth trajectory may refer to one or more paths and/or a series of positions (e.g., rotational positions, translational positions) that are used to determine which portions (snippets) of visual content may be used to generate the video. The smoothed trajectory may be used to determine a viewing window of the cut-out from which the visual content is generated. The placement of the viewing window within the visual content (e.g., orientation of the viewing window relative to the field of view of the visual content, shape of the viewing window, size of the viewing window) may be determined based on the smoothed trajectory.
The smooth trajectory may reflect the actual position and/or the virtual position of the image capture device/housing of the image capture device at different times within the capture duration. The actual position may refer to a position assumed by the image capturing device/housing of the image capturing device. The virtual position may refer to a position not assumed by the image capturing device/housing of the image capturing device. The virtual position may be offset (rotationally and/or translationally) from the actual position of the image capturing device/housing of the image capturing device.
The smooth trajectory may have smoother changes in the position (rotational position, translational position) of the image capture device/image capture device housing than the (observed) trajectory of the image capture device/image capture device housing. That is, the smooth trajectory may have less jitter (slight irregular movement/change), fewer abrupt changes, and/or fewer discontinuous changes in the position (rotational position, translational position) of the image capture device/image capture device housing than the (observed) trajectory of the image capture device/image capture device housing. The smooth trajectory having smoother changes in the position of the image capture device/image capture device housing than the trajectory may include the high-frequency changes in the position (rotational position, translational position) of the image capture device/image capture device housing within the trajectory being removed from the smooth trajectory. That is, the smooth trajectory may not include, and/or may have fewer, high-frequency changes in the rotational position and/or the translational position of the housing than the (observed) trajectory.
Determining the smoothed trajectory based on subsequent portions of the (observed) trajectory (smoothed trajectory generation) may include determining a portion of the smoothed trajectory corresponding to a given time instant within the capture duration based on one or more portions of the (observed) trajectory corresponding to one or more subsequent time instants (future time instant(s) relative to the given time instant). That is, the smooth trajectory component 104 may "look ahead" in time (look ahead from a given time instant) to determine a portion of the smoothed trajectory. Looking ahead may comprise using one or more subsequent portions of the (observed) trajectory to determine a previous portion of the smoothed trajectory. The generation of such a smoothed trajectory may be referred to as look-ahead trajectory generation. The subsequent time instants within the capture duration may or may not be adjacent to the given time instant. Using subsequent portion(s) of the (observed) trajectory may enable the smooth trajectory component 104 to determine a smoothed trajectory that maintains the user's intended motion of the image capture device. The intentional motion of the user may refer to motion of the image capture device that the user plans/intends to perform.
To determine a smoothed trajectory based on looking forward at the (observed) trajectory, a previous portion of the smoothed trajectory may be determined using a subsequent portion of the (observed) trajectory, such that a portion of the smoothed trajectory corresponding to the portion of the (observed) trajectory (corresponding to a time instant within the capture duration) is determined based on a subsequent portion of the (observed) trajectory (corresponding to a subsequent time instant within the capture duration).
The "future" position/motion of the image capture device may be analyzed (looking forward) to determine whether the particular position (s)/motion(s) of the image capture device at a certain moment in time is an intentional motion or an unintentional motion (e.g., shaking due to vibrations, rotation due to collision/mishandling of the image capture device). For example, when determining a smooth trajectory for a certain time instant within the capture duration (e.g., corresponding to the 1000 th video frame), the position (s)/motion(s) of the image capture device for a duration after the time instant (e.g., corresponding to the next 30 video frames, corresponding to the next second captured) may be analyzed to determine whether the position/motion of the image capture device at the time instant is intentional or unintentional.
For example, to determine a smooth trajectory based on a subsequent portion of the (observed) trajectory, the rotational position of the image capture device/housing at a given moment in time within the capture duration may be obtained. The given moment in time may be a point in time within the capture duration and may correspond to a video frame of the visual content captured at the point in time within the capture duration. The rotational position of the image capture device/housing at a later time within the capture duration may be obtained. The later time may be a duration of time after the point in time within the capture duration. The respective rotational position of the image capture device/housing on the smooth trajectory at the given time may be determined based on the rotational position of the image capture device/housing at the point in time, the rotational position of the image capture device/housing during a duration of time after the point in time, and/or other information. The placement of the viewing window for the visual content relative to the field of view of the visual content captured at the given moment may be determined based on the respective rotational position of the image capture device/housing on the smooth trajectory at the given moment and/or other information.
In some implementations, determining a smooth trajectory based on the subsequent portion of the (observed) trajectory may further include checking whether the determined smooth trajectory results in the respective viewing window for the cutout of the visual content remaining within or beyond the field of view of the visual content (whether the placement of the viewing window satisfies or violates the margin constraint). For example, it may be determined whether placement of a viewing window for visual content at the given time results in one or more portions of the viewing window exceeding the field of view of the visual content. In response to determining that the portion(s) of the viewing window for the visual content exceeds the field of view of the visual content at the given time, the respective rotational position of the image capture device/housing within the smooth trajectory at the given time may be adjusted. That is, the smooth trajectory may be adjusted such that the viewing window for the visual content does not exceed the field of view of the visual content at the given time (the placement of the viewing window satisfies the margin constraint).
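By way of non-limiting illustration, the look-ahead determination and the margin check described above may be sketched as follows. Python is used purely for illustration; the single-axis (yaw) representation, the simple moving-average smoother, and the function and parameter names are assumptions made for brevity and are not part of the disclosure.

def smoothed_yaw_at(observed_yaw, t, lookahead=30, lookbehind=15, margin_deg=10.0):
    # Estimate the smoothed yaw at sample t by averaging the observed yaw over
    # a window that extends into the "future" (look-ahead), then clamp the
    # result so that the viewing window stays within the stabilization margin.
    start = max(0, t - lookbehind)
    end = min(len(observed_yaw), t + lookahead + 1)
    window = observed_yaw[start:end]
    candidate = sum(window) / len(window)
    # Margin constraint: the cutout may deviate from the observed yaw by at
    # most margin_deg without exceeding the captured field of view.
    low, high = observed_yaw[t] - margin_deg, observed_yaw[t] + margin_deg
    return min(max(candidate, low), high)

In such a sketch, a larger look-ahead window allows more of the user's intentional motion to be preserved, while the final clamp implements the margin check described above.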
In some implementations, the respective position (e.g., rotational position, translational position) of the image capture device/housing within the smooth trajectory at a given time may be initially determined based on a combination of the position of the image capture device/housing at the given time and an estimate of the respective position of the image capture device/housing, and/or other information. This initial determination of the location within the smoothed trajectory may be referred to as bootstrapping.
For example, an optimal estimate of the rotational position of the image capture device/housing within a smooth trajectory at a given moment may be determined based on a minimization of a combination of the rotational velocity of the image capture device/housing and the rotational acceleration of the image capture device/housing and/or other information. An optimal estimate of the rotational position that minimizes motion (a combination of velocity and acceleration) may result in the placement of the viewing window for the cutout of the visual content not satisfying the margin constraint. Thus, the initial rotational position within the smooth trajectory at a given instant may be determined by combining the rotational position of the image capture device/housing at the given instant with the optimal estimate of the rotational position within the smooth trajectory. The combination of the rotational position and the optimal estimate of the rotational position may comprise an average of the two rotational positions. The average of the two rotational positions may weight the two rotational positions equally or differently.
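A minimal, non-limiting sketch of this bootstrapping step, under the assumptions that positions are represented as single yaw angles and that the optimal estimate has already been computed (the names and the default weight are illustrative):

def bootstrap_yaw(observed_yaw_t, best_estimate_yaw_t, weight=0.5):
    # Initial smoothed-trajectory yaw at a given time: a weighted average of
    # the observed rotational position and the motion-minimizing optimal
    # estimate. weight=0.5 weights the two equally; other weights bias the
    # result toward one or the other.
    return weight * best_estimate_yaw_t + (1.0 - weight) * observed_yaw_t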
In some implementations, the smooth trajectory component 104 may further determine a smooth trajectory of the image capture device/housing of the image capture device based on one or more previous portions of the (observed) trajectory. Past location/motion information of the image capture device may provide context for intentional motion. In determining the smooth trajectory, the past location/motion of the image capture device may be weighted less than the future location/motion of the image capture device. The influence of the previous portion(s) of the (observed) trajectory on the determination of the smoothed trajectory may be smaller than the influence of the future portion(s) of the (observed) trajectory on the determination of the smoothed trajectory. The amount of the previous portion(s) of the (observed) trajectory used to determine the smoothed trajectory may be less than the amount of the future portion(s) of the (observed) trajectory. For example, when determining a smooth trajectory for a certain time instant within the capture duration (e.g., corresponding to the 1000th video frame), the amount of duration after the time instant used (e.g., corresponding to the next 30 video frames, corresponding to the next second captured) may be greater than the amount of duration before the time instant (e.g., corresponding to the previous 15 video frames, corresponding to the previous 0.5 seconds of capture).
FIG. 5C illustrates an exemplary smoothed trajectory determined by the smoothed trajectory component 104. The smoothed trajectory may include a smoothed yaw trajectory 532. The smoothed yaw trajectory 532 may reflect the yaw angular position (e.g., a rotational position defined to the left or right relative to the yaw axis) of the image capture device/image capture device housing to be used to determine which portions (cutouts) of the visual content may be used to generate the video. For example, the smoothed yaw trajectory 532 may include zero rotation about the yaw axis (forward direction) for the durations 522, 524, and 526, and then a smooth rotation to the right for the durations 528 and 530. Other types of smooth trajectories (e.g., smooth pitch trajectory, smooth roll trajectory, smooth pan trajectory) are contemplated.
The smoothed yaw trajectory 532 may be determined such that a portion of the smoothed yaw trajectory 532 corresponding to the portion of the observed yaw trajectory 400 is determined based on a subsequent portion of the observed yaw trajectory 400. For example, the portion(s) of the smoothed yaw trajectory 532 within the one or more portions of duration A 522 may be determined based on the portion(s) of the observed yaw trajectory 400 within duration B 524 and/or duration C 526 (looking forward from duration B 524 and/or duration C 526). The portion(s) of the observed yaw trajectory 400 over duration B 524 and/or duration C 526 may be used to determine in which direction and/or by how much the smoothed yaw trajectory 532 may differ from the observed yaw trajectory 400 over the portion(s) of duration A 522. The smoothed yaw trajectory 532 may be determined based on subsequent portion(s) of the observed yaw trajectory 400 such that the smoothed yaw trajectory 532 preserves intentional motion of the image capture device by the user. For example, based on the subsequent portion(s) of the observed yaw trajectory 400 (looking forward), the smooth trajectory component 104 may determine that the rotation of the image capture device to the right and left during the durations 522, 524 is an unintentional motion (e.g., the image capture device inadvertently rotates to the right, then the user excessively rotates to the left to correct the rotation), and may determine that the smooth yaw trajectory 532 is to be directed forward during the durations 522, 524. Based on the subsequent portion(s) of the observed yaw trajectory 400 (looking forward), the smooth trajectory component 104 may determine that the staggered rotation of the image capture device to the right during the durations 528, 530 includes unintentional motion (non-continuous right rotation), and may determine that the smooth yaw trajectory 532 includes continuous right rotation during the durations 528, 530. Other determinations of smooth trajectories are contemplated.
In some embodiments, the smooth trajectory may be determined based on a rotational speed of the image capture device/image capture device housing and a minimization of rotational acceleration of the image capture device/image capture device housing while respecting a set of constraints. For example, a smooth trajectory may be determined by generating a smooth path that obeys the set of constraints, rather than by modifying the (observed) trajectory. For example, a smooth path defining the yaw, pitch, and/or roll angular positions may be generated by finding a path of the image capture device/image capture device housing that minimizes a combination of the time derivative, the second order time derivative, and/or other time derivative(s) of the yaw, pitch, and/or roll angular positions while respecting the set of constraints:
\min_{\theta(t)} \int \left( w_{1} \left\| \frac{d\theta(t)}{dt} \right\|^{2} + w_{2} \left\| \frac{d^{2}\theta(t)}{dt^{2}} \right\|^{2} \right) dt \quad \text{subject to the set of constraints}

where θ(t) denotes the yaw, pitch, and/or roll angular positions of the image capture device/image capture device housing over the capture duration, and w1 and w2 weight the velocity (first time derivative) and acceleration (second time derivative) terms, respectively (see the weight balance parameter described below).
in some implementations, one or more portions of the minimization calculation may be changed. For example, one or more portions of the minimization calculation (e.g., the first time derivative) may be changed to have a greater or lesser effect than other portion(s) of the minimization calculation (e.g., the second time derivative), and/or other factors may be introduced into the calculation.
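By way of non-limiting illustration, one possible discrete formulation of such a constrained minimization is sketched below for a single (yaw) axis, using projected gradient descent and enforcing the margin constraint by clamping after each step. The use of NumPy, the step size, the iteration count, and the weighting scheme are illustrative assumptions rather than part of the disclosure.

import numpy as np

def smooth_path(observed, margin_deg=10.0, w_balance=0.5, iters=2000, step=0.02):
    # Generate a smooth path by minimizing a weighted combination of squared
    # first-order (velocity) and second-order (acceleration) time differences,
    # while keeping every sample within margin_deg of the observed path.
    path = np.asarray(observed, dtype=float).copy()
    lower, upper = path - margin_deg, path + margin_deg
    for _ in range(iters):
        vel = np.diff(path)            # first time derivative (per sample)
        acc = np.diff(path, n=2)       # second time derivative (per sample)
        g_vel = np.zeros_like(path)
        g_vel[:-1] -= 2 * vel
        g_vel[1:] += 2 * vel
        g_acc = np.zeros_like(path)
        g_acc[:-2] += 2 * acc
        g_acc[1:-1] -= 4 * acc
        g_acc[2:] += 2 * acc
        path -= step * ((1.0 - w_balance) * g_vel + w_balance * g_acc)
        path = np.clip(path, lower, upper)   # project onto the margin constraint
    return path

In this sketch, w_balance corresponds conceptually to the weight balance parameter described below: a value of 0 minimizes only the velocity term, and a value of 1 minimizes only the acceleration term.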
In some implementations, information about the high frequency (jitter) of image capture can be used to improve the visual characteristics of the generated video content/stabilized visual content. Certain portions of the high frequencies in the input may be maintained based on image capture configurations, such as exposure start times and exposure durations, position information (e.g., position sensor readings), and/or other information. For example, motion of the image capture device/image sensor during frame exposure may be analyzed and used to generate/modify a capture trajectory that minimizes inter-frame motion (e.g., smoothes inter-frame motion) while preserving intra-frame motion that may contain high frequencies therein. This may provide improved visual characteristics of the generated video content, for example by compensating for motion blur and/or low light image capture conditions.
For example, image capture may not occur instantaneously. Instead, an image sensor pixel site may take some time to collect light. This may result in the image sensor motion being divided in time into two phases: inter-frame motion, which may not be captured and may be suppressed, and intra-frame motion, which may be "baked" into the image and may not be removed. Taking intra-frame motion into account may provide better visual characteristics (e.g., a better visual impression): the smooth trajectory may be smooth during inter-frame phases and may correspond to the original motion during intra-frame phases. That is, the smooth trajectory may move/follow in the same direction and at the same speed as the original motion during the frame exposure phase, so that its motion coincides with motion blur in the image.
In some implementations, the degree to which the smooth trajectory of the image capture device/housing deviates from the (observed) trajectory of the image capture device/housing may depend on the amount of rotational and/or translational motion experienced by the image capture device/housing during the capture duration, the exposure time to capture visual content, and/or other information. That is, the amount of smoothing performed may take into account the amount of rotational and/or translational motion that the image capture device undergoes during the capture of the visual content and/or the exposure time that the image capture device uses to capture the visual content. For example, visual content captured during a large amount of motion of an image capture device may result in motion blur within the visual content, while smoothing a trajectory may result in motion blur becoming more apparent (e.g., motion blur becoming larger, longer) within stable visual content.
The amount of smoothing/stabilization performed/allowed may depend on the motion of the image capture device, the exposure time of the image capture device, and/or other information. Fig. 7 shows a plot 700 illustrating different smoothing values (e.g., values characterizing the frequency of motion) as a function of the motion of the image capture device and the exposure time for the image capture device to capture visual content. The smoothing values (S1, S2, S3, S4) may refer to the amount of smoothing performed to generate a smoothed trajectory. The smoothing value may affect, characterize, and/or set the power/intensity of the smoothing/stabilization performed. For example, the smoothing value may range between 0 and 1, where a value of 0 corresponds to no smoothing (smoothing disabled) and a value of 1 corresponds to full smoothing (smoothing as described herein). A value between 0 and 1 may correspond to reduced smoothing. For example, a value of 0.5 may correspond to reducing the smoothing effect by half (e.g., smoothing operating at 50%).
Using different smoothing values may apply different smoothing strengths (stabilization strengths) to different capture situations. For example, the smoothing value S1 may correspond to an amount of smoothing to be used when the exposure time is short (shorter than T1, a short exposure time threshold) and the motion of the image capture device is high (e.g., higher than and/or equal to a high motion threshold). The smoothing value S2 may correspond to an amount of smoothing to be used when the exposure time is long (longer than T2, a long exposure time threshold) and the motion of the image capture device is high. The smoothing value S3 may correspond to an amount of smoothing to be used when the exposure time is short (shorter than T1) and the motion of the image capture device is low (e.g., below and/or equal to a low motion threshold). The smoothing value S4 may correspond to an amount of smoothing to be used when the exposure time is long (longer than T2) and the motion of the image capture device is low. Thus, the exposure time and the motion of the image capture device may determine the degree to which visual content captured by the image capture device is to be stabilized. In some implementations, the exposure time of the image capture device can be automatically adjusted based on lighting conditions, motion of the image capture device, and/or other information. For example, longer exposure times may be used in low light conditions, while shorter exposure times may be used in high light conditions. For example, referring to fig. 7, an exposure time less than T1 may be used in high light conditions, an exposure time between T1 and T2 may be used in medium light conditions, and an exposure time greater than T2 may be used in low light conditions. Longer exposure times may result in a greater amount of motion blur within the captured visual content.
One or more of the smoothing values (S1, S2, S3, S4) may be adjustable. For example, the smoothing values may be set as follows: S1 = 1, S2 = 1, S3 = 1, and S4 = 1. Such smoothing value settings may result in complete stabilization of the captured visual content regardless of the motion and/or exposure time of the image capture device. As another example, the smoothing values may be set as follows: S1 = 1, S2 = 0, S3 = 1, and S4 = 0. Such smoothing value settings may result in complete stabilization of the captured visual content when the exposure time is shorter than T1, and in no stabilization of the captured visual content when the exposure time is longer than T2. The smoothing value may vary linearly between 0 and 1 as the exposure time increases from T1 to T2. As yet another example, the smoothing values may be set as follows: S1 = 0.6, S2 = 0, S3 = 1, and S4 = 0.5. Such smoothing value settings may result in complete stabilization of the captured visual content when the exposure time is shorter than T1 and the motion of the image capture device is low, and in no stabilization of the captured visual content when the exposure time is longer than T2 and the motion of the image capture device is high. When the exposure time is shorter than T1 and the motion of the image capture device is high, the stabilization may operate at 60%, and when the exposure time is longer than T2 and the motion of the image capture device is low, the stabilization may operate at 50%. Other smoothing values are contemplated. The smoothing value when the motion of the image capture device is between the high motion threshold and the low motion threshold, and when the exposure time is between the short exposure time threshold and the long exposure time threshold, may be determined as an interpolation of S1, S2, S3, and/or S4.
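A non-limiting sketch of selecting a smoothing value from the exposure time and the device motion, including the interpolation between S1, S2, S3, and S4 (the linear interpolation, the threshold handling, and the names are illustrative assumptions):

def smoothing_value(exposure, motion, t1, t2, motion_low, motion_high,
                    s1=1.0, s2=1.0, s3=1.0, s4=1.0):
    # Interpolate the smoothing value from the exposure time and the amount of
    # device motion. S1: short exposure/high motion, S2: long exposure/high
    # motion, S3: short exposure/low motion, S4: long exposure/low motion.
    # Fraction of the way from the short-exposure to the long-exposure regime.
    e = min(max((exposure - t1) / (t2 - t1), 0.0), 1.0)
    # Fraction of the way from the low-motion to the high-motion regime.
    m = min(max((motion - motion_low) / (motion_high - motion_low), 0.0), 1.0)
    low_motion = (1 - e) * s3 + e * s4    # interpolate along the exposure axis
    high_motion = (1 - e) * s1 + e * s2
    return (1 - m) * low_motion + m * high_motion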
In some implementations, a set of constraints (including one or more constraints) can be applied in generating the smooth trajectory. The set of constraints for generating a smooth trajectory may include one or more constraints that provide limitations/definitions/rules on how to generate a smooth path/smooth trajectory. For example, the set of constraints may include margin constraints that provide limit(s)/rule(s) on how far the smooth path/smooth trajectory may deviate from the (observed) trajectory. The margin constraint may be determined based on a difference between the size of a viewing window used to generate the cutout of the visual content and the field of view of the visual content, and/or other information. The field of view of the visual content may refer to the field of view of a scene captured within the visual content. That is, the field of view of the visual content may refer to the spatial extent/angle of a scene captured within the visual content. The size of the viewing window may refer to the field of view of a cutout used to generate the video based on the visual content. That is, the video may be generated based on the visual content of the images within the viewing window. The viewing window may be defined in terms of shape and/or size.
For example, fig. 6A shows an exemplary orientation of viewing window 604 relative to image A 600. Image A 600 may have a field of view 602. The viewing window 604 may have a truncated field of view 606. Image A 600 may include a captured scene within an angle defined by field of view 602. The viewing window 604 may provide a cutout of the image A 600 to be used for video generation. The truncated field of view 606 of the viewing window 604 may be smaller than the field of view 602. The amount and/or direction by which the smoothed path/smoothed trajectory may deviate from the (observed) trajectory may depend on the difference between the field of view 602 and the truncated field of view 606. The difference (e.g., 10%) between the field of view 602 and the truncated field of view 606 may define a margin 608 within which the viewing window 604 may move relative to the image A 600/field of view 602. The margin 608 may be a stabilization margin that specifies how much the viewing window may move while remaining within the field of view of the visual content.
For example, referring to fig. 6B, the viewing window 614 may be rotated relative to the field of view 612 of image B610 without exceeding the pixels captured within image B610. The viewing window 624 may move laterally relative to the field of view 622 of image C620 without exceeding the pixels captured within image C620.
Referring back to fig. 6A, a larger difference between the field of view 602 and the viewing window 604/truncated field of view 606 may enable a larger movement of the viewing window 604 relative to the field of view 602 of the image A 600, while a smaller difference between the field of view 602 and the viewing window 604/truncated field of view 606 may enable a smaller movement of the viewing window 604 relative to the field of view 602 of the image A 600. However, a larger margin 608 may result in a waste of pixel space and computational resources (e.g., processor power and/or battery consumption to capture images having a larger optical field of view than is required to generate video).
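As a non-limiting illustration of how the stabilization margin may follow from the difference between the captured field of view and the cutout field of view (single axis, angles in degrees; the helper name is an assumption):

def stabilization_margin(source_fov_deg, cutout_fov_deg):
    # Angular margin available on each side of the cutout: half of the
    # difference between the captured field of view and the cutout field of
    # view. The viewing window may move within this margin without exceeding
    # the captured pixels.
    return max(0.0, (source_fov_deg - cutout_fov_deg) / 2.0)

# For example, a 120-degree capture with a 96-degree cutout leaves a
# 12-degree margin on each side of the viewing window.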
In some implementations, the set of constraints may include trajectory constraints that provide the limit(s)/rule(s) on how the smooth path/smooth trajectory may be generated based on subsequent portions of the (observed) trajectory. Trajectory constraints may be determined based on subsequent portions of the (observed) trajectory and/or other information. That is, the trajectory constraints may include one or more constraints relating to the "future" shape of the (observed) trajectory. Trajectory constraints may preserve intentional motion of the image capture device in the generated path.
In some embodiments, the set of constraints may include a target constraint that provides limit(s)/rule(s) on how a smooth path/smooth trajectory may be generated based on a target within the image. A target may refer to a person, object, and/or thing that may be selected for inclusion in a video. For example, an image captured by an image capture device may include one or more views of a person (e.g., a person of interest), and a user may wish to create a video that includes the person. The target constraints may include one or more constraints relating to the position of the target within the image such that the image is stabilized around the position of the target within the image. That is, the target constraints may affect the generation of a smooth path/smooth trajectory such that the target is located within one or more cutouts of the visual content. Other constraints are contemplated.
In some embodiments, a set of parameters (including one or more parameters) may control the generation of the smoothed trajectory. The set of parameters for generating the smoothed trajectory may include one or more parameters that influence and/or guide how the smoothed path/smoothed trajectory is generated. For example, the set of parameters may include a weight balance parameter, a low light high pass parameter, a viscosity parameter, and/or other parameters.
The weight balance parameter may refer to a parameter that controls the type of motion that is minimized in the generation of the smooth trajectory. For example, the types of motion that may be minimized to generate a smooth trajectory/smooth path may include rotational velocity, rotational acceleration, translational velocity, translational acceleration, and/or other motion. For example, the weight balance parameter (the value of the weight balance parameter) may control the relative extent to which rotational speed and rotational acceleration are minimized when determining a smooth trajectory. The weight balance parameter may range between 0 and 1, where 0 corresponds to a minimization of angular velocity and 1 corresponds to a minimization of angular acceleration. A value between 0 and 1 may correspond to a minimization of both angular velocity and angular acceleration, with the two types of motion weighted according to the value. Other values of the weight balance parameter are contemplated.
Mere minimization of angular velocity may result in a smooth trajectory that tends to maintain a stable position when possible, and may result in piecewise linear sharp corners appearing in the smooth trajectory. Mere minimization of angular velocity may minimize apparent motion between video frames, but the resulting motion may appear unnatural. Mere minimization of angular acceleration may result in smooth trajectories with no/a reduced number of corner points. Minimization of only angular acceleration may result in constant velocity and may result in more motion than minimization of only angular velocity. Minimization of the combination of angular velocity and angular acceleration may provide intermediate generation of a smooth trajectory (with different minimization weighted according to the value of the weight balance parameter).
The low light high pass parameter may refer to a parameter that controls the amount/intensity of smoothing performed to generate a smooth trajectory. For trajectory smoothing/visual content stabilization, the motion of the image capture device may be classified as (1) inter-frame motion and (2) intra-frame motion. Inter-frame motion may refer to motion between frame exposure time periods. Inter-frame motion may be attenuated subject to the margin constraint. Intra-frame motion may refer to motion of the image capture device during the exposure period of each video frame. Intra-frame motion may be embedded in the video frames as motion blur, and attenuating intra-frame motion may cause the motion blur to pulsate, thereby creating a poor visual impression. Such visual defects can be reduced by having the smooth trajectory follow the (observed) trajectory during the exposure period.
The low light high pass parameter may range between 0 and 1, where 0 corresponds to no motion blur compensation and 1 corresponds to full motion blur compensation. A low light high pass parameter of 0 may result in the observed motion being completely attenuated and not taking into account the motion of the image capturing device during the exposure time period. A low light high pass parameter of 1 may result in the smooth track being the same as the (observed) track during the exposure period. That is, the output motion may include high frequencies that mask motion blur within the visual content. Values between 0 and 1 may change the amount/intensity of smoothing performed to generate a smooth trajectory. Other values of the low light high pass parameter are contemplated.
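The following non-limiting sketch illustrates one way the low light high pass parameter might reintroduce intra-frame (exposure-period) motion into the smoothed trajectory; the data layout, the exposure-window representation, and the names are illustrative assumptions rather than part of the disclosure.

def apply_low_light_high_pass(smoothed, observed, exposure_windows, k):
    # For each exposure window (start, end sample indices), let the output
    # trajectory follow the observed intra-frame motion scaled by k in [0, 1]:
    # k = 0 keeps the fully smoothed trajectory (no motion blur compensation),
    # k = 1 reproduces the observed motion during the exposure period.
    out = list(smoothed)
    for start, end in exposure_windows:
        for t in range(start, end):
            intra = observed[t] - observed[start]   # motion since exposure start
            out[t] = (1 - k) * smoothed[t] + k * (smoothed[start] + intra)
    return out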
The viscosity parameter may refer to a parameter that controls the extent to which the previous portion(s) of the (observed) trajectory affect the determination of a smooth trajectory. The viscosity parameter may range between 0 and 1. A viscosity parameter of 1 may result in a "lazy" stabilizer that will maximize the impact of previous motions/positions unless future motions/positions indicate that lazily determined smooth trajectories will result in violating the margin constraint. A viscosity parameter of 0 may result in the generation of a smooth trajectory without taking into account the previous portion(s) of the (observed) trajectory. A value between 0 and 1 may change the extent to which the previous portion(s) of the (observed) trajectory influence the determination of the smoothed trajectory. The value of the viscosity parameter may affect bootstrapping: the value of the viscosity parameter may determine how to weight the position of the image capture device/housing and the best estimate of the position to form an initial estimate of the smooth trajectory. Other values of the viscosity parameter are contemplated.
The viewing window component 106 may be configured to determine one or more viewing windows for visual content based on a smooth trajectory of the image capture device/housing and/or other information. The placement of the viewing window of the visual content relative to the field of view of the visual content captured at a time may be determined based on the respective position (rotational position, translational position) of the image capture device/housing within the smooth trajectory at that time and/or other information.
The viewing window may define one or more ranges of visual content. The viewing window may define the range(s) of visual content to be included within the cut-out of visual content according to the progression through the length of progression of the visual content. The length of progress of the visual content may be the same as or determined based on the duration of capture of the visual content. The viewing window may define the range(s) of visual content included in the video generated from the visual content. For example, the viewing window may define which spatial portions of the visual content captured by the image capture device during the capture duration are presented on the display(s) and/or included within the stable visual content.
The viewing window may be characterized by a viewing direction, a viewing size (e.g., viewing zoom, viewing magnification), viewing rotation, and/or other information. The viewing direction may define a viewing direction of the visual content. The viewing direction may define an angle/visual portion of the visual content at which the viewing window may be directed. The viewing direction may be defined based on rotation about an axis defining a lateral movement (e.g., yaw) of the viewing window. The viewing direction may be defined based on a rotation about an axis defining a vertical movement (e.g., a pitch) of the viewing window. The yaw and pitch values of the viewing direction may determine the position of the viewing window within the captured image/video frame.
The viewing size may define the size of the viewing window. The viewing size may define the size (e.g., size, magnification, viewing angle) of the viewable range of the visual content. The viewing size may define the dimensions of the viewing window. In some implementations, the viewing size can define different shapes of the viewing window/viewable range. For example, the viewing window may be shaped as a rectangle, triangle, circle, and/or other shape. The viewing rotation may define a rotation of the viewing window. The viewing rotation may define one or more rotations of the viewing window about one or more axes. For example, a viewing rotation may be defined based on a rotation (e.g., scrolling) about an axis corresponding to the viewing direction.
The viewing window(s) may be determined from the progression through the capture duration. That is, one or more of a viewing direction, a viewing rotation, and/or a viewing size may be determined for different portions of the capture duration. For example, different placements of the viewing window may be determined for different portions of the capture duration (determined based on viewing direction, viewing rotation, and/or viewing size). The viewing window(s) may be determined for different points in time and/or different durations within the capture duration. The viewing window(s) may be determined for different images/video frames and/or different groups of images/video frames captured during the capture duration.
The viewing window may be used to provide a cut-out of the visual content. A cut of visual content may refer to an output of one or more portions of the visual content for presentation (e.g., based on a current presentation, a future presentation of a video generated using the cut). The truncated portion of the visual content may refer to a range of the visual content obtained for viewing and/or extraction. The range of visual content viewable/extractable within the viewing window may be used to provide a view of different spatial portions of the visual content.
For example, the visual content may include a field of view, and the truncated portion of the visual content may include the entire range of the visual content (visual content within the entire field of view) or one or more ranges of the visual content (visual content within one or more portions of the field of view). The viewing window may define a range of visual content to be included within the cut-out of the visual content according to a progression through the length of progression/capture duration of the visual content. The viewing window may correspond to the entire duration of the progress length/capture duration, or to one or more portions of the progress length/capture duration (e.g., the portion including the moment of interest). The snippets of the visual content may be presented on one or more displays, included in one or more videos, and/or otherwise used to present the visual content.
Determining a viewing window for the visual content based on the smoothed trajectory may include determining one or more of a viewing direction, a viewing rotation, and/or a viewing size of the viewing window based on the smoothed trajectory. That is, determining a viewing window for the visual content may include determining an orientation of the viewing window relative to a field of view of the visual content based on the smoothed trajectory. The placement (viewing direction, viewing rotation and/or viewing size) of the viewing window at different times may be determined based on the respective positions/values of the smoothed trajectories. The orientation of the viewing window relative to the field of view of the visual content may determine which portions (snippets) of the visual content may be used to generate the video. That is, the viewing window component 106 can determine how the cutout for the visual content can be oriented relative to the field of view of the visual content.
The viewing window component 106 can be configured to determine how the viewing window should be oriented relative to the field of view of the visual content (images, video frames) based on the smooth trajectory. The viewing window component 106 can determine, based on the smoothed trajectory, how a cutout for the visual content can be positioned laterally and/or vertically with respect to a field of view of the visual content. The viewing window component 106 can determine how the cutout for the visual content can be rotated relative to the field of view of the visual content based on the smooth trajectory.
The viewing window may be oriented relative to a field of view of the visual content to provide a cutout of the visual content that is stable relative to a previous and/or next cutout of the visual content. For example, the viewing window may be oriented relative to a field of view of an image/video frame captured by the image capture device to provide a cutout of the image/video frame such that the presented content appears to have been captured by the stabilized/more stabilized image capture device (e.g., the cutout of the video frame is stabilized/more stabilized relative to a cutout of a previous video frame and/or a cutout of a subsequent video frame).
For example, referring to fig. 6B, the viewing window component 106 may determine the orientation of the viewing window 614 relative to the field of view 612 of image B610 based on a smooth trajectory corresponding to a time instance of image B610. Viewing window 614 may be oriented relative to field of view 612 to provide a cutout of image B610 that is stable relative to a previous and/or next cutout of the image (e.g., stable relative to a cutout of image a 600 using viewing window 604 of image a 600 as shown in fig. 6A). The viewing window component 106 can determine the orientation of the viewing window 624 relative to the field of view 622 of image C620 based on a smooth trajectory corresponding to a time instance of image C620. Viewing window 624 may be oriented relative to field of view 622 to provide a cutout of image C620 that is stable relative to a previous and/or next cutout of the image (e.g., stable relative to a cutout of image B610 using viewing window 614 of image B610).
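By way of non-limiting illustration, the lateral placement of the viewing window within a frame may be derived from the difference between the observed and smoothed yaw at the frame's capture time, converted from an angle to a pixel offset. The pinhole-style conversion, the sign convention, and the names below are illustrative assumptions.

import math

def window_offset_px(observed_yaw_deg, smoothed_yaw_deg, image_width_px,
                     horizontal_fov_deg):
    # Horizontal pixel offset of the viewing window center relative to the
    # frame center: the angular correction (smoothed minus observed yaw)
    # mapped to pixels using a simple pinhole projection.
    correction = math.radians(smoothed_yaw_deg - observed_yaw_deg)
    focal_px = (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    return focal_px * math.tan(correction)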
The generation component 108 can be configured to generate stabilized visual content for one or more videos based on the viewing window(s) and/or other information. The stabilized visual content may comprise a cutout of the range(s) of visual content within the viewing window(s). The stabilized visual content may be generated as video frames that include the range(s) of visual content within the viewing window(s). The stabilized visual content may be generated as an output of the visual content captured by the image capture device, where the output includes the range(s) of the visual content within the viewing window(s). The range(s) of visual content included in the video within the viewing window may be stabilized by selective cropping. The portion of the visual content that is cropped for stabilization may depend on the smooth trajectory and/or other information. The portion of the visual content that is cropped for stabilization may depend on one or more of a weight balance parameter, a low light high pass parameter, a viscosity parameter, and/or other parameters.
Video content may refer to media content that may be used as one or more video/video clips. The video content may include one or more videos/video clips, and/or other video content stored in one or more formats/containers. The format may refer to one or more ways of arranging/laying out information defining the video content (e.g., a file format). A container may refer to one or more ways (e.g., a compressed packet format) in which information defining video content is arranged/laid out in association with other information. The video content may define visual content viewable according to a progression through a progression length of the video content. The video content may include video frames defining visual content. That is, the visual content of the video content may be included within video frames of the video content.
The video frames of the video content may be determined based on the visual content of the visual content within the viewing window and/or other information. Video frames of the video content may be determined based on the truncation of the visual content from the smooth trajectory, viewing window, and/or other information. For example, referring to fig. 6A-6B, video frames of video content may be determined based on the visual content of image a 600 within viewing window 604, the visual content of image B610 within viewing window 614, the visual content of image C620 within viewing window 624, and/or other information. Such determination of the range of visual content included in the video content may enable stabilization of the video content.
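A minimal, non-limiting sketch of generating a stabilized video frame from a captured frame and a viewing window placement (the frame is assumed to be a NumPy-style image array of shape (height, width, channels); the names are illustrative):

def stabilized_frame(frame, center_x, center_y, out_w, out_h):
    # Cut out an out_w x out_h viewing window centered at (center_x, center_y).
    # The caller is assumed to have already clamped the center so that the
    # window stays within the frame (i.e., the margin constraint is satisfied).
    x0 = int(round(center_x - out_w / 2))
    y0 = int(round(center_y - out_h / 2))
    return frame[y0:y0 + out_h, x0:x0 + out_w].copy()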
In some implementations, a video frame of video content can be determined based on a distortion of an image (e.g., one or more portions of visual content of the image). The distortion of the image may provide different perspectives of the content captured within the image, the different perspectives corresponding to how the content would be viewed if the image had been captured from the image capture device on the capture track.
In some implementations, the visual content (e.g., images, video frames) and/or one or more portions of the visual content used to generate the video content can be stored in one or more buffers (e.g., 1s buffers, circular buffers). The buffer(s) may be used to store visual content/portions of the visual content that may be used to generate stabilized visual content (e.g., distorted portions included in the cutout). The buffer(s) may be used to store other information, such as visual information, location information, and/or other information for look-ahead and/or trajectory generation. For example, the buffer(s) may be used to store image/video frames that are used for look-ahead trajectory generation. After generating the respective portions of the smoothed trajectory, the relevant portions of the images in the buffer(s) (the visual content of the images within the viewing window) may be used to generate video content.
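An illustrative, non-limiting sketch of such a buffer: a fixed-length deque holding recent frames and observed positions, so that a frame is only released for stabilization once enough "future" samples are available for look-ahead. The structure and names are assumptions made for illustration.

from collections import deque

class LookAheadBuffer:
    # Hold the most recent frames and their observed positions so that a frame
    # is only stabilized once `lookahead` later samples are available.
    def __init__(self, lookahead=30):
        self.entries = deque(maxlen=lookahead + 1)

    def push(self, frame, observed_position):
        # Add a new capture; return the oldest frame together with all buffered
        # positions (its own plus the look-ahead samples) once the buffer is
        # full, otherwise return None.
        self.entries.append((frame, observed_position))
        if len(self.entries) == self.entries.maxlen:
            oldest_frame, _ = self.entries[0]
            positions = [p for _, p in self.entries]
            return oldest_frame, positions
        return None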
The video content generated by the generation component 108 can be defined by video information. The video information defining the video content may define an encoded version/copy of the video content and/or instructions for rendering the video content. For example, the video information may define an encoded version/copy of the video content/video frame, and the video information (e.g., a video file) may be opened in a video player to present the video content. The video information may define instructions to render the video content for presentation. For example, the video information may define a guide track that includes information about which visual portions of the visual content (images, video frames) should be included within the presentation of the video content. The guide track may include information about the position, size, shape and/or rotation of the cut-out of the image/video frame that will be used to provide a stable view of the image/video frame as it progresses through the video content. When video content is opened and/or to be presented, the video player may use the guide track to retrieve the relevant visual portion of the image/video frame.
The generation component 108 can be configured to enable storage of video information and/or other information in one or more storage media. For example, the video information may be stored in electronic storage 13, a remote storage location (storage medium located at/accessible through a server), and/or other location. In some implementations, the generation component 108 can enable storage of the video information through one or more intermediary devices. For example, the processor 11 may be located within a computing device that is not connected to the storage device (e.g., the computing device lacks a WiFi/cellular connection to the storage device). The generation component 108 can enable storage of video information through another device with the necessary connections (e.g., a computing device that uses a WiFi/cellular connection of a paired mobile device (e.g., smartphone, tablet, laptop) to store the information in one or more storage media). Other storage locations for video information and other storage of video information are contemplated.
In some implementations, the processor 11 may represent multiple processors, and at least one of the processors may be a remote processor located remotely from the housing of the image capture device (e.g., image capture device 302). One or more functions of the components 102, 104, 106, 108 may be performed by the image capture device 302 (e.g., by the processor 310) and/or by a remote processor. For example, one or more of trajectory determination (a function of trajectory component 102), smooth trajectory determination (a function of smooth trajectory component 104), viewing window determination (a function of viewing window component 106), and/or generation of stable visual content (a function of generation component 108) may be performed by the remote processor after the visual content is captured by the image capture device.
The image capture device may capture location information and visual content, but may not be able to stabilize the visual content in real time. For example, the image capture device may not have sufficient resources to apply the stabilization techniques described herein in real-time and/or may use its resources for other tasks. Once sufficient resources are available, stabilization of the visual content may be performed by the image capture device. The stabilization of the visual content may be performed by the remote processor. For example, the remote processor may be one or more processors of a remote computing device (e.g., mobile device, desktop, server), and the remote processor may receive visual information and location information captured/generated by the image capture device. The remote computing device (software running on the remote computing device) may apply the stabilization techniques described herein after the image capture device captures the visual content. Post-capture stabilization of visual content may be performed by the remote processor in response to receiving visual information and location information, in response to a user/system command to stabilize the visual content, in response to the visual content being turned on for playback, and/or in response to other conditions.
Although the description herein may be directed to images and video, one or more other embodiments of the systems/methods described herein may be configured for other types of media content. Other types of media content may include one or more of audio content (e.g., music, podcasts, audiobooks, and/or other audio content), multimedia presentations, images, slideshows, visual content (e.g., one or more images and/or videos), and/or other media content.
The invention may be implemented in hardware, firmware, software, or any suitable combination thereof. Aspects of the invention may be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a tangible (non-transitory) machine-readable storage medium may include read-only memory, random-access memory, magnetic disk storage media, optical storage media, flash-memory devices, etc., and a machine-readable transmission medium may include forms such as propagated signals on a carrier wave, infrared signals, digital signals, etc. Firmware, software, routines, or instructions may be described herein in terms of performing particular actions and according to particular exemplary aspects and embodiments of the present invention.
In some embodiments, some or all of the functionality attributed herein to system 10 may be provided by an external resource not included in system 10. External resources may include hosts/sources of information, calculations, and/or processing, and/or other providers of information, calculations, and/or processing external to system 10.
Although processor 11 and electronic storage 13 are shown in fig. 1 as being connected to interface 12, any communication medium may be used to facilitate interaction between any of the components of system 10. One or more components of system 10 may communicate with each other via hardwired communication, wireless communication, or both hardwired communication and wireless communication. For example, one or more components of system 10 may communicate with each other over a network. For example, the processor 11 may communicate wirelessly with the electronic memory 13. By way of non-limiting example, the wireless communication may include one or more of radio communication, bluetooth communication, Wi-Fi communication, cellular communication, infrared communication, Li-Fi communication, or other wireless communication. Other types of communication are contemplated by the present invention.
Although the processor 11 is shown as a single entity in fig. 1, this is for illustration purposes only. In some embodiments, processor 11 may include multiple processing units. These processing units may be physically located within the same device, or processor 11 may represent processing functionality of a plurality of devices operating in coordination. The processor 11 may be configured to run one or more components by: software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on processor 11.
It should be appreciated that although computer components are shown in fig. 1 as being co-located within a single processing unit, in embodiments in which processor 11 includes multiple processing units, one or more computer program components may be located remotely from other computer program components. Although described as performing, or configured to perform, operations, the computer program component may include instructions that may program the processor 11 and/or the system 10 to perform the operations.
Although the computer program components are described herein as being implemented by machine-readable instructions 100 via processor 11, this is for ease of reference only and is not intended to be limiting. In some embodiments, one or more functions of the computer program components described herein may be implemented via hardware (e.g., a special purpose chip, a field programmable gate array) rather than software. One or more functions of the computer program components described herein may be software-implemented, hardware-implemented, or both.
The description of the functionality provided by the different computer program components described herein is for illustrative purposes, and is not intended to be limiting, as any computer program component may provide more or less functionality than is described. For example, one or more computer program components may be eliminated, and some or all of its functionality may be provided by other computer program components. As another example, processor 11 may be configured to execute one or more additional computer program components that may perform some or all of the functionality attributed below to one or more of the computer program components described herein.
The electronic storage media of electronic storage 13 may be provided integrally (i.e., substantially non-removably) with one or more components of system 10 and/or removable storage connectable to one or more components of system 10 via, for example, a port (e.g., a USB port, a Firewire port, etc.) or a drive (e.g., a disk drive, etc.). Electronic storage 13 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based memory media (e.g., EPROM, EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Electronic storage 13 may be a separate component within system 10, or electronic storage 13 may be provided integrally with one or more other components of system 10 (e.g., processor 11). Although the electronic memory 13 is shown in fig. 1 as a single entity, this is for illustrative purposes only. In some embodiments, electronic storage 13 may include a plurality of storage units. These storage units may be physically located within the same device, or electronic storage 13 may represent storage functionality of a plurality of devices operating in coordination.
Fig. 2 illustrates a method 200 for stabilizing video. The operations of method 200 presented below are intended to be illustrative. In some implementations, the method 200 may be implemented with one or more additional operations not described and/or without one or more of the operations discussed. In some embodiments, two or more operations may occur substantially simultaneously.
In some implementations, the method 200 can be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices that perform some or all of the operations of method 200 in response to instructions stored electronically on one or more electronic storage media. The one or more processing devices may include one or more devices configured via hardware, firmware, and/or software, which are specifically designed to perform one or more operations of method 200.
Referring to fig. 2 and method 200, at operation 201, a visual output signal may be generated. The visual output signal may convey visual information defining a visual content having a field of view. In some implementations, operation 201 may be performed by the same or similar components as image sensor 306 (shown in fig. 3 and described herein).
At operation 202, a position output signal may be generated. The position output signal may convey position information characterizing a rotational position of the image capture device at different times within the capture duration. In some implementations, operation 202 may be performed by the same or similar components as position sensor 308 (shown in fig. 3 and described herein).
At operation 203, a trajectory of the image capture device during a capture duration may be determined based on the location information. In some implementations, operation 203 may be performed by a processor component that is the same as or similar to the trajectory component 102 (shown in fig. 1 and described herein).
At operation 204, a smooth trajectory of the image capture device may be determined based on the subsequent portion of the trajectory. In some implementations, operation 204 may be performed by a processor component that is the same as or similar to smooth trajectory component 104 (shown in fig. 1 and described herein).
At operation 205, a viewing window for the visual content may be determined based on the smoothed trajectory. The viewing window may define one or more ranges of visual content. In some implementations, operation 205 may be performed by a processor component that is the same as or similar to viewing window component 106 (shown in fig. 1 and described herein).
At operation 206, stable visual content may be generated based on the viewing window. The stabilized visual content may comprise a truncation of the range(s) of visual content within the viewing window. In some implementations, operation 206 may be performed by a processor component that is the same as or similar to generation component 108 (shown in fig. 1 and described herein).
While the system(s) and/or method(s) of the invention has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred embodiments, it is to be understood that such detail is solely for that purpose and that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present invention contemplates that, to the extent possible, one or more features of any embodiment can be combined with one or more features of any other embodiment.

Claims (20)

1. An image capture system for stabilizing video, the image capture system comprising:
a housing;
an optical element carried by the housing and configured to direct light within a field of view to an image sensor;
the image sensor carried by the housing and configured to generate a visual output signal based on light incident on the image sensor during a capture duration, the visual output signal conveying visual information defining visual content having the field of view;
a position sensor carried by the housing and configured to generate a position output signal based on a position of the housing during the capture duration, the position output signal conveying position information characterizing a rotational position of the housing at different times within the capture duration; and
one or more physical processors configured by machine-readable instructions to:
determining a trajectory of the housing during the capture duration based on the position information, the trajectory reflecting rotational positions of the housing at different times within the capture duration, the trajectory including a first portion corresponding to a first time within the capture duration and a second portion corresponding to a second time within the capture duration after the first time;
determining a portion of a smooth trajectory of the housing corresponding to a previous time within the capture duration based on a subsequent portion of the trajectory corresponding to a subsequent time within the capture duration, wherein a portion of the smooth trajectory corresponding to the first time within the capture duration is determined based on a second portion of the trajectory corresponding to a second time after the first time within the capture duration, and the smooth trajectory has a smoother change in rotational position of the housing than the trajectory;
determining a viewing window for the visual content based on the smoothed trajectory of the casing, the viewing window defining one or more ranges of the visual content; and
generating stabilized visual content of a video based on the viewing window, the stabilized visual content comprising a cut-out of the one or more ranges of the visual content within the viewing window.
2. The image capture system of claim 1, wherein the position information further characterizes a translational position of the housing at different times within the capture duration, and the trajectory further reflects the translational position of the housing at different times within the capture duration.
3. The image capture system of claim 1, wherein at least one of the one or more processors is a remote processor located remotely from the housing, and the generation of the stabilized visual content is performed by the remote processor after the visual content is captured.
4. The image capture system of claim 1, wherein the smooth trajectory having a smoother change in the rotational position of the housing than the trajectory is characterized by the smooth trajectory having less jitter in the rotational position of the housing than the trajectory.
5. The image capture system of claim 4, wherein the smooth trajectory having a smoother change in the rotational position of the housing than the trajectory comprises: high frequency variations in the rotational position of the housing within the trajectory being removed from the smooth trajectory.
6. The image capture system of claim 1, wherein determining the portion of the smooth trajectory of the housing corresponding to a previous time within the capture duration based on a subsequent portion of the trajectory corresponding to a subsequent time within the capture duration comprises:
obtaining a rotational position of the housing at the first time within the capture duration, the first time being a point in time and corresponding to a video frame of the visual content captured at the point in time within the capture duration;
obtaining a rotational position of the housing at the second time within the capture duration, the second time being a duration after the point in time within the capture duration; and
determining a respective rotational position of the housing within the smooth trajectory at the first time based on the rotational position of the housing at the point in time and the rotational position of the housing during the duration of time after the point in time;
wherein the placement of the viewing window for the visual content relative to the field of view of the visual content captured at the first time is determined based on the respective rotational position of the housing within the smooth trajectory at the first time.
7. The image capture system of claim 6, wherein determining the portion of the smooth trajectory of the housing corresponding to a previous time within the capture duration based on a subsequent portion of the trajectory corresponding to a subsequent time within the capture duration further comprises:
determining whether placement of the viewing window for the visual content at the first time results in one or more portions of the viewing window exceeding a field of view of the visual content;
in response to determining that one or more portions of the viewing window for the visual content at the first time exceed a field of view of the visual content, adjusting the respective rotational position of the housing within the smooth trajectory at the first time such that the viewing window for the visual content at the first time does not exceed the field of view of the visual content.
8. The image capture system of claim 6, wherein the respective rotational position of the housing within the smooth trajectory at the first time is initially determined based on a combination of a rotational position of the housing at the first time and an estimate of the respective rotational position of the housing.
9. The image capture system of claim 8, wherein the estimate of the respective rotational position of the housing is determined based on a minimization of a combination of a rotational velocity of the housing and a rotational acceleration of the housing.
10. The image capture system of claim 1, wherein an extent to which the smooth trajectory of the housing deviates from the trajectory of the housing depends on an amount of rotational motion experienced by the housing during the capture duration and an exposure time to capture the visual content.
11. A method for stabilizing video, the method performed by an image capture system comprising an optical element, an image sensor, a position sensor, and one or more processors, the optical element carried by an image capture device and configured to direct light within a field of view to the image sensor, the method comprising:
generating, by the image sensor, a visual output signal based on light incident on the image sensor during a capture duration, the visual output signal conveying visual information defining visual content having the field of view;
generating, by the position sensor, a position output signal based on a position of the image capture device during the capture duration, the position output signal conveying position information characterizing a rotational position of the image capture device at different times within the capture duration; and
determining, by the one or more processors, a trajectory of the image capture device during the capture duration based on the location information, the trajectory reflecting a rotational position of the image capture device at different times within the capture duration, the trajectory including a first portion corresponding to a first time within the capture duration and a second portion corresponding to a second time within the capture duration after the first time;
determining, by the one or more processors, a portion of a smooth trajectory of the image capture device corresponding to a previous time within the capture duration based on a subsequent portion of the trajectory corresponding to a subsequent time within the capture duration, wherein a portion of the smooth trajectory corresponding to the first time within the capture duration is determined based on a second portion of the trajectory corresponding to a second time after the first time within the capture duration, the smooth trajectory having a smoother change in rotational position of the image capture device than the trajectory;
determining, by the one or more processors, a viewing window for the visual content based on the smooth trajectory of the image capture device, the viewing window defining one or more ranges of the visual content; and
generating, by the one or more processors, stabilized visual content of a video based on the viewing window, the stabilized visual content comprising a cut-out of the one or more ranges of the visual content within the viewing window.
12. The method of claim 11, wherein the position information further characterizes a translational position of the image capture device at different times within the capture duration, and the trajectory further reflects the translational position of the image capture device at different times within the capture duration.
13. The method of claim 11, wherein at least one of the one or more processors is a remote processor located remotely from the image capture device, and the generating of the stabilized visual content is performed by the remote processor after capturing the visual content.
14. The method of claim 11, wherein the smooth trajectory having a smoother change in the rotational position of the image capture device than the trajectory is characterized by the smooth trajectory having less jitter in the rotational position of the image capture device than the trajectory.
15. The method of claim 14, wherein the smooth trajectory having a smoother change in the rotational position of the image capture device than the trajectory comprises: high frequency variations in the rotational position of the image capture device within the trajectory being removed from the smooth trajectory.
16. The method of claim 11, wherein determining the portion of the smooth trajectory of the image capture device that corresponds to a previous time within the capture duration based on a subsequent portion of the trajectory that corresponds to a subsequent time within the capture duration comprises:
obtaining a rotational position of the image capture device at the first time within the capture duration, the first time being a point in time and corresponding to a video frame of the visual content captured at the point in time within the capture duration;
obtaining a rotational position of the image capture device at the second time within the capture duration, the second time being a duration after the point in time within the capture duration; and
determining a respective rotational position of the image capture device within the smooth trajectory at the first time based on the rotational position of the image capture device at the point in time and the rotational position of the image capture device during the duration of time after the point in time;
wherein the placement of the viewing window for the visual content relative to the field of view of the visual content captured at the first time is determined based on the respective rotational position of the image capture device within the smooth trajectory at the first time.
17. The method of claim 16, wherein determining the portion of the smooth trajectory of the image capture device that corresponds to a previous time within the capture duration based on a subsequent portion of the trajectory that corresponds to a subsequent time within the capture duration further comprises:
determining whether placement of the viewing window for the visual content at the first time results in one or more portions of the viewing window exceeding a field of view of the visual content;
in response to determining that one or more portions of the viewing window for the visual content at the first time exceed a field of view of the visual content, adjusting the respective rotational position of the image capture device within the smooth trajectory at the first time such that the viewing window for the visual content at the first time does not exceed the field of view of the visual content.
18. The method of claim 16, wherein the respective rotational position of the image capture device within the smooth trajectory at the first time is initially determined based on a combination of a rotational position of the image capture device at the first time and an estimate of the respective rotational position of the image capture device.
19. The method of claim 18, wherein the estimate of the respective rotational position of the image capture device is determined based on a minimization of a combination of a rotational velocity of the image capture device and a rotational acceleration of the image capture device.
20. The method of claim 11, wherein the extent to which the smooth trajectory of the image capture device deviates from the trajectory of the image capture device depends on an amount of rotational motion experienced by the image capture device during the capture duration and an exposure time for capturing the visual content.
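The look-ahead smoothing recited in claims 6 to 9 and 16 to 19 can be illustrated with a minimal sketch. The yaw-only (one-dimensional) trajectory, the least-squares formulation, the weights, and the max_offset_deg field-of-view margin below are assumptions made for the example, not the claimed implementation.

```python
import numpy as np

def smoothed_position(yaw_deg, t, look_ahead, w_data=1.0, w_vel=4.0,
                      w_acc=16.0, max_offset_deg=15.0):
    """Smoothed rotational position (yaw) for the video frame captured at index t.

    yaw_deg:        1-D array of measured rotational positions (the trajectory).
    look_ahead:     number of later samples used (the subsequent portion of the
                    trajectory captured after time t); must be at least 2.
    max_offset_deg: largest deviation from the measured position that still keeps
                    the viewing window inside the captured field of view.
    """
    y = np.asarray(yaw_deg[t:t + look_ahead + 1], dtype=float)
    n = y.size

    # Least-squares system: stay close to the measured positions while minimizing
    # rotational velocity (first differences) and rotational acceleration
    # (second differences) over the look-ahead window.
    eye = np.eye(n)
    blocks = [np.sqrt(w_data) * eye,
              np.sqrt(w_vel) * np.diff(eye, axis=0),
              np.sqrt(w_acc) * np.diff(eye, n=2, axis=0)]
    targets = [np.sqrt(w_data) * y, np.zeros(n - 1), np.zeros(n - 2)]
    solution = np.linalg.lstsq(np.vstack(blocks),
                               np.concatenate(targets), rcond=None)[0]

    # The smoothed position for time t is the first sample of the solution,
    # clamped so that a viewing window placed from it stays within the field of
    # view of the visual content.
    return float(np.clip(solution[0], y[0] - max_offset_deg, y[0] + max_offset_deg))
```

With a look-ahead of roughly one second of samples, a brief shake at time t barely moves the smoothed position, while a sustained pan that begins after t pulls it gradually toward the new orientation.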
CN201980061658.4A 2018-09-19 2019-08-27 System and method for stabilizing video Active CN112740652B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201862733237P 2018-09-19 2018-09-19
US62/733,237 2018-09-19
US16/392,501 2019-04-23
US16/392,501 US10432864B1 (en) 2018-09-19 2019-04-23 Systems and methods for stabilizing videos
PCT/US2019/048292 WO2020060728A1 (en) 2018-09-19 2019-08-27 Systems and methods for stabilizing videos

Publications (2)

Publication Number Publication Date
CN112740652A CN112740652A (en) 2021-04-30
CN112740652B true CN112740652B (en) 2022-07-08

Family

ID=68063735

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201980061561.3A Active CN112740654B (en) 2018-09-19 2019-08-27 System and method for stabilizing video
CN201980061658.4A Active CN112740652B (en) 2018-09-19 2019-08-27 System and method for stabilizing video

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201980061561.3A Active CN112740654B (en) 2018-09-19 2019-08-27 System and method for stabilizing video

Country Status (4)

Country Link
US (10) US10432864B1 (en)
EP (2) EP3854071A4 (en)
CN (2) CN112740654B (en)
WO (2) WO2020060731A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9756249B1 (en) * 2016-04-27 2017-09-05 Gopro, Inc. Electronic image stabilization frequency estimator
US9922398B1 (en) 2016-06-30 2018-03-20 Gopro, Inc. Systems and methods for generating stabilized visual content using spherical visual content
NL2020562B1 (en) * 2018-03-09 2019-09-13 Holding Hemiglass B V Device, System and Methods for Compensating for Partial Loss of Visual Field
US10587807B2 (en) * 2018-05-18 2020-03-10 Gopro, Inc. Systems and methods for stabilizing videos
US10432864B1 (en) * 2018-09-19 2019-10-01 Gopro, Inc. Systems and methods for stabilizing videos
US11470254B1 (en) 2019-06-21 2022-10-11 Gopro, Inc. Systems and methods for assessing stabilization of videos
CN110659376A (en) * 2019-08-14 2020-01-07 浙江大华技术股份有限公司 Picture searching method and device, computer equipment and storage medium
US11042698B2 (en) * 2019-08-22 2021-06-22 Paul Bannister System and method of contextually converting a media file into an electronic document
CN112747818B (en) * 2019-11-11 2022-11-04 中建大成绿色智能科技(北京)有限责任公司 Blocked visual angle measuring platform, method and storage medium
US11393125B1 (en) 2019-12-09 2022-07-19 Gopro, Inc. Systems and methods for dynamic optical medium calibration
US11064118B1 (en) 2019-12-18 2021-07-13 Gopro, Inc. Systems and methods for dynamic stabilization adjustment
WO2023128545A1 (en) * 2021-12-27 2023-07-06 Samsung Electronics Co., Ltd. Method and electronic device for generating hyper-stabilized video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794451A (en) * 2010-03-12 2010-08-04 上海交通大学 Tracing method based on motion track
CN105141807A (en) * 2015-09-23 2015-12-09 北京二郎神科技有限公司 Video signal image processing method and device
CN106331480A (en) * 2016-08-22 2017-01-11 北京交通大学 Video image stabilizing method based on image stitching
WO2017112800A1 (en) * 2015-12-22 2017-06-29 Mobile Video Corporation Macro image stabilization method, system and devices

Family Cites Families (137)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US1043286A (en) 1909-04-09 1912-11-05 Benjamin Adriance Bottle-capping machine.
US1034156A (en) 1909-07-02 1912-07-30 Edward Sokal Storage battery.
US1028479A (en) 1911-11-20 1912-06-04 Charles M Morgan Display-card.
DE3634414C2 (en) 1986-10-09 1994-12-08 Thomson Brandt Gmbh TV camera with a target
US4959725A (en) * 1988-07-13 1990-09-25 Sony Corporation Method and apparatus for processing camera an image produced by a video camera to correct for undesired motion of the video camera
US6675386B1 (en) 1996-09-04 2004-01-06 Discovery Communications, Inc. Apparatus for video access and control over computer network, including image correction
US10116839B2 (en) * 2014-08-14 2018-10-30 Atheer Labs, Inc. Methods for camera movement compensation for gesture detection and object recognition
US6982746B1 (en) * 1998-02-24 2006-01-03 Canon Kabushiki Kaisha Apparatus and method for correcting shake by controlling sampling timing of shake signal
JP2002514875A (en) 1998-05-13 2002-05-21 アイ ムーヴ コーポレイション Panoramic movie that simulates movement in multidimensional space
FR2828754A1 (en) 2001-08-14 2003-02-21 Koninkl Philips Electronics Nv VISUALIZATION OF A PANORAMIC VIDEO EDITION BY APPLYING NAVIGATION COMMANDS TO THE SAME
US7240001B2 (en) 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
US20030160862A1 (en) 2002-02-27 2003-08-28 Charlier Michael L. Apparatus having cooperating wide-angle digital camera system and microphone array
KR100777457B1 (en) * 2002-10-29 2007-11-21 삼성전자주식회사 Digital Video Camera having function for compesnation image and method for control thereof
US8307273B2 (en) 2002-12-30 2012-11-06 The Board Of Trustees Of The Leland Stanford Junior University Methods and apparatus for interactive network sharing of digital video content
JP4590546B2 (en) 2004-06-14 2010-12-01 国立大学法人 東京大学 Image extraction device
JP4127248B2 (en) 2004-06-23 2008-07-30 ヤマハ株式会社 Speaker array device and audio beam setting method for speaker array device
JP4404822B2 (en) 2004-08-31 2010-01-27 三洋電機株式会社 Camera shake correction apparatus and imaging device
CN101189870A (en) * 2005-04-28 2008-05-28 德州仪器公司 Motion stabilization
JP4152398B2 (en) 2005-05-26 2008-09-17 三洋電機株式会社 Image stabilizer
JP4789614B2 (en) * 2005-12-26 2011-10-12 キヤノン株式会社 Anti-vibration control device and control method thereof
US7840085B2 (en) * 2006-04-06 2010-11-23 Qualcomm Incorporated Electronic video image stabilization
JP4717748B2 (en) 2006-08-11 2011-07-06 キヤノン株式会社 Camera body and camera system having the same
EP1983740A1 (en) 2007-04-16 2008-10-22 STMicroelectronics (Research & Development) Limited Image stabilisation method and apparatus
JP4871792B2 (en) 2007-06-08 2012-02-08 株式会社リコー Screen editing apparatus, screen editing method and program
WO2009047572A1 (en) 2007-10-09 2009-04-16 Analysis Systems Research High-Tech S.A. Integrated system, method and application for the synchronized interactive play-back of multiple spherical video content and autonomous product for the interactive play-back of prerecorded events.
US8624924B2 (en) 2008-01-18 2014-01-07 Lockheed Martin Corporation Portable immersive environment using motion capture and head mounted display
JP4969509B2 (en) * 2008-04-25 2012-07-04 オンセミコンダクター・トレーディング・リミテッド Vibration correction control circuit and imaging apparatus equipped with the same
US8022948B2 (en) 2008-07-29 2011-09-20 International Business Machines Corporation Image capture and buffering in a virtual world using situational measurement averages
KR20100018334A (en) 2008-08-06 2010-02-17 삼성디지털이미징 주식회사 Method for controlling digital image processing apparatus, medium of recording the method, and digital image processing apparatus for operating by the method
CN101511024A (en) 2009-04-01 2009-08-19 北京航空航天大学 Movement compensation method of real time electronic steady image based on motion state recognition
US8555169B2 (en) 2009-04-30 2013-10-08 Apple Inc. Media clip auditioning used to evaluate uncommitted media content
US8237787B2 (en) 2009-05-02 2012-08-07 Steven J. Hollinger Ball with camera and trajectory control for reconnaissance or recreation
US9237317B2 (en) 2009-05-02 2016-01-12 Steven J. Hollinger Throwable camera and network for operating the same
US9144714B2 (en) * 2009-05-02 2015-09-29 Steven J. Hollinger Ball with camera for reconnaissance or recreation and network for operating the same
US10440329B2 (en) 2009-05-22 2019-10-08 Immersive Media Company Hybrid media viewing application including a region of interest within a wide field of view
JP5218353B2 (en) 2009-09-14 2013-06-26 ソニー株式会社 Information processing apparatus, display method, and program
US8599238B2 (en) * 2009-10-16 2013-12-03 Apple Inc. Facial pose improvement with perspective distortion correction
CN101742122B (en) 2009-12-21 2012-06-06 汉王科技股份有限公司 Method and system for removing video jitter
WO2011076290A1 (en) 2009-12-24 2011-06-30 Nokia Corporation An apparatus
US8989436B2 (en) 2010-03-30 2015-03-24 Nikon Corporation Image processing method, computer-readable storage medium, image processing apparatus, and imaging apparatus
JP5778998B2 (en) 2010-06-04 2015-09-16 パナソニック インテレクチュアル プロパティ コーポレーション オブアメリカPanasonic Intellectual Property Corporation of America Imaging apparatus, image generation method, and computer program
EP2395452A1 (en) 2010-06-11 2011-12-14 Toyota Motor Europe NV/SA Detection of objects in an image using self similarities
US9396385B2 (en) 2010-08-26 2016-07-19 Blast Motion Inc. Integrated sensor and video motion analysis method
US8531535B2 (en) * 2010-10-28 2013-09-10 Google Inc. Methods and systems for processing a video for stabilization and retargeting
JP5833822B2 (en) 2010-11-25 2015-12-16 パナソニックIpマネジメント株式会社 Electronics
US9930225B2 (en) 2011-02-10 2018-03-27 Villmer Llc Omni-directional camera and related viewing software
CN103621055B (en) * 2011-03-15 2017-09-19 日本电气株式会社 The non-transitory computer-readable medium of imaging device, message processing device and storage program
CN102202164B (en) * 2011-05-20 2013-03-20 长安大学 Motion-estimation-based road video stabilization method
US9288468B2 (en) 2011-06-29 2016-03-15 Microsoft Technology Licensing, Llc Viewing windows for video streams
US9491375B2 (en) * 2011-09-29 2016-11-08 Texas Instruments Incorporated Method, system and computer program product for reducing a delay from panning a camera system
US8553096B2 (en) * 2011-12-06 2013-10-08 Cisco Technology, Inc. Systems and methods for performing gyroscopic image stabilization
US9628711B2 (en) * 2011-12-15 2017-04-18 Apple Inc. Motion sensor based virtual tripod method for video stabilization
AU2011265430B2 (en) 2011-12-21 2015-03-19 Canon Kabushiki Kaisha 3D reconstruction of partially unobserved trajectory
US8542975B2 (en) * 2011-12-22 2013-09-24 Blackberry Limited Method to stabilize video stream using on-device positional sensors
FR2985581B1 (en) 2012-01-05 2014-11-28 Parrot METHOD FOR CONTROLLING A ROTARY SAILING DRONE FOR OPERATING A SHOOTING VIEW BY AN ON-BOARD CAMERA WITH MINIMIZATION OF DISTURBING MOVEMENTS
US8810666B2 (en) * 2012-01-16 2014-08-19 Google Inc. Methods and systems for processing a video for stabilization using dynamic crop
US9426430B2 (en) 2012-03-22 2016-08-23 Bounce Imaging, Inc. Remote surveillance sensor apparatus
US9253373B2 (en) 2012-06-06 2016-02-02 Apple Inc. Flare detection and mitigation in panoramic images
US9232139B2 (en) * 2012-07-24 2016-01-05 Apple Inc. Image stabilization using striped output transformation unit
JP6098873B2 (en) * 2012-09-04 2017-03-22 パナソニックIpマネジメント株式会社 Imaging apparatus and image processing apparatus
JP6135848B2 (en) * 2012-09-04 2017-05-31 パナソニックIpマネジメント株式会社 Imaging apparatus, image processing apparatus, and image processing method
JP6065474B2 (en) 2012-09-11 2017-01-25 株式会社リコー Imaging control apparatus, imaging control method, and program
US8957783B2 (en) 2012-10-23 2015-02-17 Bounce Imaging, Inc. Remote surveillance system
US9479697B2 (en) 2012-10-23 2016-10-25 Bounce Imaging, Inc. Systems, methods and media for generating a panoramic view
US9279983B1 (en) 2012-10-30 2016-03-08 Google Inc. Image cropping
CN103841297B (en) 2012-11-23 2016-12-07 中国航天科工集团第三研究院第八三五七研究所 A kind of electronic image stabilization method being applicable to resultant motion shooting carrier
US10127912B2 (en) 2012-12-10 2018-11-13 Nokia Technologies Oy Orientation based microphone selection apparatus
US9071756B2 (en) 2012-12-11 2015-06-30 Facebook, Inc. Systems and methods for digital video stabilization via constraint-based rotation smoothing
US9712818B2 (en) * 2013-01-11 2017-07-18 Sony Corporation Method for stabilizing a first sequence of digital image frames and image stabilization unit
US9374532B2 (en) 2013-03-15 2016-06-21 Google Inc. Cascaded camera motion estimation, rolling shutter detection, and camera shake detection for video stabilization
US9065985B2 (en) * 2013-03-15 2015-06-23 Tolo, Inc. Diagonal collection of oblique imagery
US20150022677A1 (en) * 2013-07-16 2015-01-22 Qualcomm Incorporated System and method for efficient post-processing video stabilization with camera path linearization
KR20150011938A (en) 2013-07-24 2015-02-03 한국전자통신연구원 Method and apparatus for stabilizing panorama video captured based multi-camera platform
US10015308B2 (en) 2013-07-26 2018-07-03 Lg Electronics Inc. Mobile terminal and method of controlling the same
CN103402056B (en) 2013-07-31 2016-05-25 北京阳光加信科技有限公司 Be applied to the compensation deals method and system of image capture device
JP2015034879A (en) 2013-08-08 2015-02-19 キヤノン株式会社 Image shake correction device and control method for the same, lens barrel, optical device and imaging device
FR3012003B1 (en) * 2013-10-14 2016-12-30 E2V Semiconductors DIGITAL ANALOG CONVERSION WITH RAMP, MULTIPLE CONVERSIONS OR SINGLE CONVERSION FOLLOWING THE LEVEL OF LIGHT RECEIVED BY A PIXEL
US9686471B2 (en) 2013-11-01 2017-06-20 Light Labs Inc. Methods and apparatus relating to image stabilization
US9230310B2 (en) 2013-11-27 2016-01-05 Semiconductor Components Industries, Llc Imaging systems and methods for location-specific image flare mitigation
US9341357B2 (en) 2013-12-09 2016-05-17 Steven J. Hollinger Throwable light source and network for operating the same
US9589595B2 (en) 2013-12-20 2017-03-07 Qualcomm Incorporated Selection and tracking of objects for display partitioning and clustering of video frames
US9560254B2 (en) 2013-12-30 2017-01-31 Google Technology Holdings LLC Method and apparatus for activating a hardware feature of an electronic device
US9854168B2 (en) * 2014-03-07 2017-12-26 Futurewei Technologies, Inc. One-pass video stabilization
JP2015195569A (en) 2014-03-25 2015-11-05 パナソニック インテレクチュアル プロパティ コーポレーション オブアメリカPanasonic Intellectual Property Corporation of America Imaging device for mobile
JP6467787B2 (en) 2014-05-27 2019-02-13 株式会社リコー Image processing system, imaging apparatus, image processing method, and program
US9413963B2 (en) 2014-05-30 2016-08-09 Apple Inc. Video image stabilization
US9674438B2 (en) 2014-07-06 2017-06-06 Apple Inc. Low light video image stabilization strength modulation
EP2975576A1 (en) * 2014-07-15 2016-01-20 Thomson Licensing Method of determination of stable zones within an image stream, and portable device for implementing the method
US9363569B1 (en) 2014-07-28 2016-06-07 Jaunt Inc. Virtual reality system including social graph
WO2016041193A1 (en) 2014-09-19 2016-03-24 Intel Corporation Trajectory planning for video stabilization
CN104363381B (en) * 2014-10-15 2018-03-02 北京智谷睿拓技术服务有限公司 IMAQ control method and device
US10044785B2 (en) 2014-10-30 2018-08-07 Amadeus S.A.S. Controlling a graphical user interface
JP6803335B2 (en) * 2014-12-15 2020-12-23 ジーブイビービー ホールディングス エス.エイ.アール.エル. Image capture with improved temporal resolution and perceived image sharpness
WO2016103428A1 (en) 2014-12-25 2016-06-30 キヤノン株式会社 Display control device and method for controlling same
US10284794B1 (en) * 2015-01-07 2019-05-07 Car360 Inc. Three-dimensional stabilized 360-degree composite image capture
US9635259B2 (en) 2015-01-21 2017-04-25 Izak Jan van Cruyningen Forward motion compensated flight path
CN105847670B (en) 2015-02-04 2019-06-21 佳能株式会社 Electronic equipment, video camera controller and its control method
JP6425573B2 (en) 2015-02-04 2018-11-21 キヤノン株式会社 Electronic device and control method thereof
JP6362556B2 (en) * 2015-02-26 2018-07-25 キヤノン株式会社 Control device, imaging device, control method, program, and storage medium
US20170085740A1 (en) * 2015-05-21 2017-03-23 InvenSense, Incorporated Systems and methods for storing images and sensor data
US9813621B2 (en) 2015-05-26 2017-11-07 Google Llc Omnistereo capture for mobile devices
US11006095B2 (en) * 2015-07-15 2021-05-11 Fyusion, Inc. Drone based capture of a multi-view interactive digital media
US20170041545A1 (en) 2015-08-06 2017-02-09 Invensense, Inc. Systems and methods for stabilizing images
JP6525820B2 (en) 2015-08-28 2019-06-05 キヤノン株式会社 Electronic device and control method therefor, imaging apparatus
US10298863B2 (en) 2015-09-08 2019-05-21 Apple Inc. Automatic compensation of lens flare
US9912868B2 (en) * 2015-09-15 2018-03-06 Canon Kabushiki Kaisha Image-blur correction apparatus, tilt correction apparatus, method of controlling image-blur correction apparatus, and method of controlling tilt correction apparatus
US20170085964A1 (en) 2015-09-17 2017-03-23 Lens Entertainment PTY. LTD. Interactive Object Placement in Virtual Reality Videos
US10096130B2 (en) 2015-09-22 2018-10-09 Facebook, Inc. Systems and methods for content streaming
WO2017087537A1 (en) 2015-11-16 2017-05-26 Google Inc. Stabilization based on accelerometer data
JPWO2017090458A1 (en) 2015-11-26 2018-09-13 ソニーセミコンダクタソリューションズ株式会社 Imaging apparatus, imaging method, and program
US9743060B1 (en) * 2016-02-22 2017-08-22 Gopro, Inc. System and method for presenting and viewing a spherical video segment
US9916771B2 (en) * 2016-02-25 2018-03-13 Enhanced Vision Systems, Inc. Portable vision aid with motion pan
US10271021B2 (en) 2016-02-29 2019-04-23 Microsoft Technology Licensing, Llc Vehicle trajectory determination to stabilize vehicle-captured video
US10148880B2 (en) 2016-04-04 2018-12-04 Microsoft Technology Licensing, Llc Method and apparatus for video content stabilization
US20170302852A1 (en) * 2016-04-13 2017-10-19 Jason Tze Wah Lam Three Axis Gimbals Stabilized Action Camera Lens Unit
US9756249B1 (en) 2016-04-27 2017-09-05 Gopro, Inc. Electronic image stabilization frequency estimator
US10027893B2 (en) * 2016-05-10 2018-07-17 Nvidia Corporation Real-time video stabilization for mobile devices based on on-board motion sensing
JP6736362B2 (en) 2016-06-03 2020-08-05 キヤノン株式会社 Image processing device, image processing method, and program
US10298864B2 (en) 2016-06-10 2019-05-21 Apple Inc. Mismatched foreign light detection and mitigation in the image fusion of a two-camera system
US9787902B1 (en) 2016-06-10 2017-10-10 Apple Inc. Video image stabilization with enforced stabilization constraints
CN106027852B (en) 2016-06-24 2019-03-19 西北工业大学 A kind of video image stabilization method for micro-nano satellite
US9922398B1 (en) 2016-06-30 2018-03-20 Gopro, Inc. Systems and methods for generating stabilized visual content using spherical visual content
CN109154967A (en) 2016-11-01 2019-01-04 北京小米移动软件有限公司 The method and device of hiden application icon
US10591731B2 (en) * 2016-12-06 2020-03-17 Google Llc Ocular video stabilization
CN108184048B (en) * 2016-12-08 2020-06-23 株式会社摩如富 Image processing apparatus and method
US10477107B2 (en) 2017-01-27 2019-11-12 Invensense, Inc. Electronic image stabilization of a captured image
JP7058945B2 (en) 2017-04-24 2022-04-25 キヤノン株式会社 Image stabilization device and its control method, image pickup device
US10848741B2 (en) 2017-06-12 2020-11-24 Adobe Inc. Re-cinematography for spherical video
US10491832B2 (en) * 2017-08-16 2019-11-26 Qualcomm Incorporated Image capture device with stabilized exposure or white balance
US10262691B1 (en) * 2017-09-27 2019-04-16 Gopro, Inc. Systems and methods for generating time lapse videos
JP6995561B2 (en) * 2017-10-23 2022-01-14 キヤノン株式会社 Image stabilization device and its control method, image pickup device
JP2019106656A (en) * 2017-12-14 2019-06-27 ルネサスエレクトロニクス株式会社 Semiconductor device and electronic device
JP2019106677A (en) 2017-12-14 2019-06-27 オリンパス株式会社 Device, control method, and program
KR102519355B1 (en) 2018-02-02 2023-04-10 삼성디스플레이 주식회사 Head mount display device and driving method of the same
US10593363B2 (en) 2018-05-14 2020-03-17 Gopro, Inc. Systems and methods for generating time-lapse videos
US10587807B2 (en) 2018-05-18 2020-03-10 Gopro, Inc. Systems and methods for stabilizing videos
US10432864B1 (en) 2018-09-19 2019-10-01 Gopro, Inc. Systems and methods for stabilizing videos

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794451A (en) * 2010-03-12 2010-08-04 上海交通大学 Tracing method based on motion track
CN105141807A (en) * 2015-09-23 2015-12-09 北京二郎神科技有限公司 Video signal image processing method and device
WO2017112800A1 (en) * 2015-12-22 2017-06-29 Mobile Video Corporation Macro image stabilization method, system and devices
CN106331480A (en) * 2016-08-22 2017-01-11 北京交通大学 Video image stabilizing method based on image stitching

Also Published As

Publication number Publication date
US11979662B2 (en) 2024-05-07
US10750092B2 (en) 2020-08-18
CN112740654B (en) 2023-12-19
CN112740652A (en) 2021-04-30
EP3854070A1 (en) 2021-07-28
US10536643B1 (en) 2020-01-14
WO2020060728A1 (en) 2020-03-26
EP3854071A1 (en) 2021-07-28
US11647289B2 (en) 2023-05-09
EP3854070A4 (en) 2022-03-09
US20230319408A1 (en) 2023-10-05
EP3854071A4 (en) 2022-06-08
US20200092451A1 (en) 2020-03-19
WO2020060731A1 (en) 2020-03-26
US20200374442A1 (en) 2020-11-26
US11678053B2 (en) 2023-06-13
US20220053114A1 (en) 2022-02-17
US11228712B2 (en) 2022-01-18
US20230171496A1 (en) 2023-06-01
US11172130B2 (en) 2021-11-09
US10958840B2 (en) 2021-03-23
US10432864B1 (en) 2019-10-01
US20200120252A1 (en) 2020-04-16
CN112740654A (en) 2021-04-30
US20200092480A1 (en) 2020-03-19
US20220141367A1 (en) 2022-05-05

Similar Documents

Publication Publication Date Title
CN112740652B (en) System and method for stabilizing video
CN112136314B (en) System and method for stabilizing video
US11722772B2 (en) Systems and methods for changing stabilization of videos
US11153489B2 (en) Systems and methods for stabilizing videos
US20230308758A1 (en) Systems and methods for assessing stabilization of videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant