WO2017112800A1 - Macro image stabilization method, system and devices - Google Patents
Macro image stabilization method, system and devices
- Publication number
- WO2017112800A1 (PCT/US2016/068093)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- camera
- video
- image
- focus
- attention
- Prior art date
Links
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/68—Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
- H04N23/681—Motion detection
- H04N23/6812—Motion detection based on additional sensors, e.g. acceleration sensors
- H04N23/6815—Motion detection by distinguishing pan or tilt from motion
- H04N23/682—Vibration or motion blur correction
Definitions
- the field of the invention relates to image stabilization for image capturing devices, such as cameras that are in motion, and further relates to and provides methods, systems and devices that produce motion-stabilized video streams where the camera itself is subject to conditions of agitation or movement.
- the invention further relates to synthesizing one or more new cameras from information captured by a physical camera, including synthesis of a new camera to produce scene images with different points of view than the physical camera.
- when the camera is agitated due to movement, such as, for example, when an individual using the camera moves to chase something, there may be undesirable agitation (e.g., from running) that changes the camera image direction.
- a macro image stabilization system for video and in particular to maintain a look of natural imaging for a video stream that is taken with an image recording device, such as, a camera that is undergoing movement or agitation.
- the video image is recorded while the camera is shaking, and a live stream of video, recorded video, or both, is produced having a stabilized look as if the camera were stationary or maintained substantially in a stationary position, or traveling on a substantially smoother or otherwise more desirable trajectory.
- Undesired camera movement, though occurring, is minimized or eliminated in the video produced using the devices, systems and methods of the invention.
- the stabilization mechanism preferably may image video of subjects which themselves also may be in motion. Stabilized images also may be generated where the camera is moving relative to the subject or along with the subject.
- Preferred embodiments provide a wide angle viewing field, generate one or more synthetic cameras, implement stabilization, and synthesize multiple views.
- the plurality of synthesized views are manipulated to provide a stabilized video stream comprising the plurality of synthetic camera streams or portions thereof.
- the synthesized views or portions thereof are assembled together while the camera continues capturing the scene to provide motion stabilized video output.
- the devices preferably are configured to generate a video stream of live video of events taking place, or that have occurred, where the video stream output is motion-stabilized video.
- the devices, system and methods include replay aspects, which may involve potentially changing the viewpoint or producing a different video product well after the fact (i.e., after the event captured has taken place).
- the method, system and devices are configured to manipulate the information ascertained by the device and its components, which include a number of sensors, to output a motion stabilized live video stream that may be transmitted from the device to another location or other hardware component (computer, device, etc.) through a communication network (e.g., cellular, Wi-Fi, satellite, or other suitable communication transmission medium).
- the system, method and devices may be implemented in conjunction with a number of circumstances, including law enforcement and first responder type activities, as well as sporting events and other activities. Implementation of the invention may be made in connection with recording of sports activities, as well as by participants in a sport, such as, playing soccer, kayaking, automobile racing, cycling, running, and the like. According to some embodiments, the system, method and devices may be configured for use in connection with law enforcement activities. Some preferred embodiments may provide devices configured as law enforcement body cameras to be worn by an individual, or cameras that are to be supported on or in a vehicle (e.g., a police vehicle, car, helicopter, cycle, or bike).
- Embodiments of the devices and system may include transmission capabilities for transmitting video, including live streaming stabilized video of a scene, or target object being pursued.
- the devices may comprise cameras that are part of a system that includes a remote command center or component that can control the device and its functions.
- the device synthesized viewpoint may be remotely selected or designated (e.g., so that a scene or target object may be viewed as if being imaged from a different position).
- a fisheye lens is utilized in connection with an image capture mechanism (such as an image sensor for recording the image).
- the fisheye lens may be placed over a standard camera lens.
- a removably positionable fisheye lens attachment may be provided.
- a fisheye lens is provided to act as the primary lens for the image recording device or camera, and in yet other embodiments, the fisheye lens may be provided in conjunction with one or more other lenses (e.g. , a narrow or standard field lens) to yield alternate views.
- Embodiments of the device and system may further include a plurality of image capture components, such as, for example, a plurality of lenses, which are arranged to capture images from a respective plurality of directions.
- fisheye lenses are positioned over one or more conventional or standard lenses, or one or more of the lenses may be fisheye lenses and one or more may be a standard lens (or a telephoto or zoom lens), or combinations of these.
- the system, method and devices generate, from a single camera, one or more synthetic cameras that provide imaging, including live streaming video, that otherwise would not be obtainable from a single conventional camera.
- Other embodiments may employ a plurality of physical cameras or camera components for a plurality of points of view, from which one or more video streams may be generated that otherwise would not be obtainable from a single conventional camera, or from the plurality of cameras.
- a plurality of synthetic cameras are generated from one physical camera, or from a plurality of physical cameras, thereby providing a plurality of video streams from a respective plurality of points of view.
- a first plurality of video streams may be provided from the first plurality of synthetic cameras generated from images from a first camera, and a second (or more) plurality of video streams may be provided from a second plurality of synthetic cameras generated from images from the second camera.
- the system, method and devices generate and stream video that is adjusted to depict a point of view that includes a target object or object of interest. This may be done by fixing the camera focus of attention to a designated direction or point in space. Embodiments also may designate the focus of attention based on an event occurrence (which may be detected or designated).
- Embodiments generate a focus of attention to direct the video imaging (or image frames) to the desired virtual vantage point.
- the focus of attention may be generated using information from the sensors (including the image, motion and other sensors) that is processed to determine the instantaneous viewpoint for each of the one or more virtual cameras.
- the imaging information including processed imaging information, may be further manipulated to provide video (including streaming video) which may contain image information about a scene from any of the cameras or virtual cameras synthesized therefrom.
- the system, method and devices preferably include a stabilization mechanism.
- the stabilization mechanism generates and assembles the virtual image streams in connection with the designated look point or focus of attention to produce what is an expected view as if the camera were looking from the physical direction (the designated look direction), even though the camera has moved to image from a different or alternate look direction.
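The corrective rotation at the heart of such a scheme can be sketched as follows. This is an illustrative quaternion formulation, not the patent's actual implementation, and the function names are hypothetical: given the camera's measured orientation and the designated look direction, the quotient rotation is what a renderer would apply to restore the expected view.

```python
import numpy as np

def quat_conj(q):
    """Conjugate of a unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def quat_mul(a, b):
    """Hamilton product a*b for quaternions in [w, x, y, z] order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def corrective_rotation(q_actual, q_desired):
    """Rotation that carries the camera's actual orientation onto the
    designated look direction: q_corr = q_desired * conj(q_actual).
    Applying q_corr to the captured frame yields the expected view."""
    return quat_mul(q_desired, quat_conj(q_actual))
```

If the camera has not deviated from the designated look direction, the corrective rotation is the identity quaternion, and the frame passes through unchanged.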
- Embodiments of the stabilization mechanism comprise or utilize the sensors of the device, including position and orientation information, such as, for example, data from an inertial measurement unit (IMU) (e.g., IMU data).
- the systems and devices produce stabilized video, which may be live streaming video that may be transmitted, stored, or both.
- the stabilized video is generated through manipulation of the image and sensor information.
- the camera image sensor and other sensors associated with the device capture information.
- the captured information is associated with a timestamp.
- the image information may be manipulated by applying one or more corrections or adjustments to the image, for example, to make intrinsic corrections, extrinsic corrections, or both.
- the stabilization mechanism is configured to generate one, and preferably a plurality of virtual cameras from the contributing sources, such as data ascertained from the device sensors and associated components.
- the virtual cameras preferably have or are assigned a focus of attention.
- the focus of attention is a preferred look direction or target object to be tracked, and may be selected or determined from sensor information, an instruction or direction or other information source, including, for example, video content captured by a camera (e.g., its image sensor).
- the stabilization mechanism images a target scene, and according to preferred embodiments, an object in the scene, where the camera or other imaging device that is capturing the scene image is shaking or being agitated in some manner.
- a viewpoint is synthesized for one or more, and preferably for a corresponding plurality of cameras, which are synthetic cameras.
- the synthetic camera viewpoints provide the stabilization mechanism with a plurality of video streams, which are new video streams.
- Manipulation of the synthesized video stream preferably is done to provide a stabilized video.
- Preferred embodiments of the method, system and devices provide a video product that is stabilized and improved video from one or more cameras.
- Preferred embodiments provide stabilized video that is live streaming video, stored recorded video, or both, from a plurality of viewpoint synthesized virtual cameras, as well as one or more physical cameras, or combinations thereof.
- the video stream created preferably is generated from the physical camera image information, as well as the one or more virtual camera streams generated from manipulations of the physical camera data.
- the system, method and devices may be configured to permit motion that is intended motion, such as, for example, movement of the camera on a person or supporting object, where a change in direction is made and determined to be an intended movement as opposed to a movement for which motion correction is implemented.
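One simple way to separate intended re-aiming from agitation is to low-pass filter the orientation trace and treat only the fast residual as motion to be corrected. The patent does not specify this filter; the sketch below is an assumed approach and the smoothing constant is illustrative.

```python
import numpy as np

def split_intended_motion(yaw_samples, alpha=0.05):
    """Separate a yaw-angle trace (radians, float array) into a slow
    'intended' component, tracked by an exponential moving average, and
    a fast 'agitation' residual that stabilization would cancel.

    alpha is an illustrative smoothing constant, not a value from the
    patent; larger alpha lets the intended component follow faster pans.
    """
    intended = np.empty_like(yaw_samples)
    acc = yaw_samples[0]
    for i, y in enumerate(yaw_samples):
        acc = (1.0 - alpha) * acc + alpha * y  # first-order low-pass
        intended[i] = acc
    residual = yaw_samples - intended
    return intended, residual
```

A sustained turn of the wearer's body shows up in the slow component and is preserved, while step-by-step shake lands in the residual and is removed.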
- the invention has utility in a number of applications, and in particular where camera movement occurs and where undesirable camera movement would compromise the image being captured.
- Some particular applications for the stabilization mechanisms shown and described herein include law enforcement, security, sports, education, health and elder care industries.
- police body cameras are often used to record events as they are taking place, and although the cameras may capture a significant amount of activity, there are typically voids in the video capture, which are unusable portions. In many instances the voids or unusable portions coincide with the occurrence of the most important or interest generating events.
- the police body camera user, a law enforcement officer, may be called upon to initiate a chase on foot. The camera therefore may shake with each step and movement of the officer.
- the portion of the video during that chase time may be unusable because the image is not captured in a manner that provides suitable detail.
- the evidentiary value of the video, such as in a subsequent trial or hearing, or even for seeing whether there is an accomplice assisting the fleeing suspect, or whether the suspect disposed of an item (e.g., a weapon or stolen item), may be lost entirely or very limited.
- the present invention is designed to minimize or eliminate the undesirable effects of motion due to camera agitation so that usable, motion-stabilized video may be captured and streamed from the device in a stabilized form.
- the invention provides a system, method and devices for producing high resolution and highly detailed video images captured from a camera that is itself in motion.
- the present system, method and devices implement a stabilization mechanism to capture and produce images, including live, streaming video that has high resolution and high detail and which is obtained from a camera that is itself in motion.
- the stabilized video images or stream may be transmitted from the camera device for viewing at a location remote from the camera device.
- the method, system and device may be configured to remotely control a synthesized viewpoint.
- the control of a synthesized viewpoint may be implemented in connection with a remote component that communicates with the camera device, such as a command server. This may be utilized in connection with law enforcement activities.
- the imaging may take place where the resolution is high.
- embodiments of a camera with the motion stabilization mechanism may be configured to run at a high-definition or ultra-high-definition resolution.
- the camera image sensor may be configured to have a higher resolution than the video product output.
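The benefit of a sensor with higher resolution than the output can be put in concrete terms. The numbers below are illustrative, not from the patent: an oversampled sensor leaves spare pixels around the output crop, and those spare pixels translate into an angular range over which the crop window can slide to absorb motion.

```python
def stabilization_margin_deg(sensor_px, output_px, hfov_deg):
    """Angular margin (degrees, per side) available for stabilization
    when a sensor of sensor_px horizontal pixels feeds an output crop
    of output_px pixels, assuming a roughly uniform pixels-per-degree
    mapping across a horizontal field of view of hfov_deg."""
    px_per_deg = sensor_px / hfov_deg
    spare_px_per_side = (sensor_px - output_px) / 2.0
    return spare_px_per_side / px_per_deg

# e.g. a 3840-px-wide sensor cropped to a 1920-px output over a 120-degree
# field leaves 30 degrees of correction range on each side.
```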
- the stabilization system, method and device may be implemented in conjunction with a mobile camera that captures and streams live video.
- Other objects of the invention provide a system and method and a plurality of mobile video cameras configured with associated motion-tracking sensors that track the camera movements, together with processing components, for the purpose of synthesizing new video sequences as they would have been captured by a virtual camera with programmable location, motion, position/orientation, and characteristics.
- Fig. 1 is an exemplary embodiment schematically illustrating a device configured in accordance with the invention.
- Fig. 2 is a schematic flow diagram illustrating a preferred embodiment of image stabilization generation processes in accordance with the invention implemented in conjunction with hardware processing components.
- Fig. 3A is a schematic illustration of a camera and subject geometry for imaging a simple scene using an idealized "pinhole" camera.
- Fig. 3B is a schematic illustration representing the resulting image of the scene depicted in Fig. 3A.
- Fig. 4A is a schematic illustration of the camera and subject geometry for imaging the simple scene, as in Fig. 3A, using the idealized "pinhole" camera of Fig. 3A, but with the camera being aimed along a new optical axis OA'.
- Fig. 4B is a schematic illustration representing the resulting image of the scene depicted in Fig. 4A.
- Fig. 5 is a schematic flow diagram illustrating a preferred embodiment of image stabilization generation processes in accordance with the invention implemented in conjunction with hardware processing components, for a device configured in accordance with the invention as a single camera.
- Fig. 6 is an exemplary embodiment of a device configured as a mobile body camera with a stabilizing mechanism for generating and transmitting real-time live stabilized video of a scene.
- Fig. 7A depicts a standard image of a scene taken from a frame of a video imaged using a standard video camera, with the corresponding motion information depicted in connection with the image, where the motion information represents the change in orientation with respect to the desired target orientation.
- Fig. 7B depicts an image of the scene in Fig. 7A taken with an image capture device configured in accordance with the present invention, to produce stabilized video, also depicted with the corresponding motion information.
- Fig. 8A depicts a standard image of the scene of Fig. 7A, taken from another frame of the video imaged using the standard video camera, with the corresponding motion information depicted in connection with the image, showing movement different than that of Fig. 7A.
- Fig. 8B depicts an image of the scene in Fig. 8A from a frame of the video taken with the image capture device configured in accordance with the present invention to produce stabilized video, also depicted with the corresponding motion information.
- Fig. 9A depicts an image of a scene taken from a frame of a video imaged using a standard video camera that exhibits the extrinsic distortion artifact caused by motion of a rolling-shutter camera.
- Fig. 9B depicts an image of a scene taken from a frame of a video imaged using a device according to the invention configured to generate enhanced video where the extrinsic rolling-shutter distortion due to camera motion has been corrected.
- an exemplary embodiment of an apparatus 110 is depicted schematically using a block diagram.
- the apparatus 110 illustrates exemplary hardware that may be configured to comprise a stabilization mechanism according to the invention.
- the apparatus or device 110 according to a preferred embodiment comprises an imaging component, such as a camera 122.
- the reference to the camera 122 may represent one or more cameras.
- the camera 122 preferably comprises an image sensor for capturing images.
- the image sensor has a field comprising an area of the sensor, which may be made up of pixels.
- the pixels define spatial coordinates of the image sensor field.
- the camera 122 preferably includes a capture component or objective, such as a lens (see e.g., Fig. 6), for capturing an image of a scene and directing it onto the image sensor.
- the camera 122 may also include other camera circuitry or hardware, such as mirrors, reflectors, and a power source.
- the camera 122 may be provided as part of the device 110 and may be arranged to utilize a power supply of the device 110 as well as the other device components (e.g., storage component, processor and the like).
- the camera 122 further includes circuitry to provide information captured to the CPU 111 for processing and storage.
- the sensors and the camera (including the camera image sensor), as well as the software and processing components, are integrated as part of the device 110.
- the device 110 also may be referred to as a camera.
- communications hardware such as, for example, a radio 118 (or other transmission components) also may be provided as part of the device 110 or in association therewith.
- the radio 118 may comprise components for receiving and transmitting, the components comprising one or more transceivers, antennas, and a processing component (which in some embodiments may be shared with the device CPU 111).
- the diagram is CPU-centric, and in the exemplary embodiment, is organized around a central processing unit (CPU) 111.
- the CPU 111 carries out processing functions based on instructions from software, firmware or other stored or communicated commands, and controls and manages the system to capture, process and generate an output of stabilized video.
- the CPU 111 preferably performs or directs data-processing computations.
- the circuitry also may include one or more additional processing components, processors, co-processors, programmable logic controllers, and other separate or integrated components.
- a plurality of data-creating sources are provided for ascertaining data.
- the data-creating sources are illustrated comprising one or more cameras 122, and preferably a set of cameras with accompanying microphones 121, and include at least one Inertial Measurement Unit (IMU) 116 per camera.
- the unit may have a single IMU 116; on the other hand, where a lens or capturing objective is independently movable relative to another lens or capturing objective, an IMU 116 preferably is associated with each lens or capturing objective.
- the IMU 116 is configured to provide suitable information to ascertain movements, including positions and orientation, of an associated camera device (which preferably carries the IMU 116, so IMU-sensed motion may be, or may be designated to be, the camera motion).
- the orientation and motion of the camera 122 may be derived from the IMU data. The derivation may be accomplished by implementing a calculation based on the IMU motion and the physical relationship of the camera and IMU (assuming rigid relative placement).
- the IMU 116 provides a plurality of degrees of freedom, and, in the preferred embodiments, preferably up to at least six degrees of freedom (DOF), providing a plurality of axes along which acceleration may be measured, and about which rotation may be measured.
- the IMU 116 is configured to provide measurements of rotation and acceleration so as to provide data from which the camera location and position/orientation may be ascertained.
- each IMU 116 may include, at a minimum, a set of three mutually-orthogonal accelerometers plus a set of three mutually-orthogonal gyroscopes providing measurements of acceleration and rotation with sufficient frequency and accuracy to isolate the position and orientation of every camera.
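The role of the gyroscope triad can be illustrated with a minimal orientation integrator. This is a first-order quaternion sketch under the assumption of clean, bias-free angular rates, not the patent's actual filter; a real device would additionally fuse accelerometer (and magnetometer) data to bound drift.

```python
import numpy as np

def integrate_gyro(q, omega, dt):
    """Advance orientation quaternion q = [w, x, y, z] by body angular
    rate omega = [wx, wy, wz] (rad/s) over a small step dt, using the
    first-order quaternion kinematic equation
        q_new = q + 0.5 * dt * (q (x) [0, omega]),
    where (x) is the Hamilton product, then renormalize to unit length."""
    w, x, y, z = q
    wx, wy, wz = omega
    dq = 0.5 * dt * np.array([
        -x*wx - y*wy - z*wz,
         w*wx + y*wz - z*wy,
         w*wy - x*wz + z*wx,
         w*wz + x*wy - y*wx,
    ])
    q_new = q + dq
    return q_new / np.linalg.norm(q_new)   # keep q a unit quaternion
```

Integrating a constant rate of pi/2 rad/s about one axis for one second should accumulate, to good approximation, a 90-degree rotation about that axis.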
- the motion and position/orientation determining circuitry may include one or more additional components for facilitating accuracy.
- improved accuracy of the position/orientation and location information may be obtained with the addition of one or more magnetometers, and preferably three mutually-orthogonal magnetometers or, according to some other embodiments, with the inclusion of redundant instruments, or both.
- the block diagram illustrates the CPU 111, along with a clock 112, memory 113, a GPU 114, and an interface 115 for other input/output components.
- the device components are arranged in a suitable circuitry.
- Other input/output components may, for example, comprise displays, touch screens, keyboards or other input devices, switches, USB, SDIO, sensors and the like.
- the CPU 111 preferably is configured to collect raw image and motion data from all of its sources.
- Software with instructions for ascertaining and processing information from the sensors and other components of the device 110 or components associated or linked with the device 110 is provided for instructing a processing component to carry out processes on the data.
- a processing component is shown comprising a CPU 111, which may be controlled with the software, and, which according to some embodiments, may include an operating system software (which may be part of or separate from the software for instructing the CPU 111 on the processing of data from the device sensors and components).
- the device 110 is configured to generate one or more synthesized camera outputs.
- the CPU 111 preferably is instructed to perform the necessary manipulations of the information ascertained by the device 110 and its components.
- the manipulations may include calculations to derive one or more synthesized camera outputs.
- one or more additional components such as an additional processing component may be provided.
- an additional processing component such as an additional processing component may be provided.
- a suitable amount of memory may be provided, as well as a GPU (Graphics Processing Unit) 114.
- the additional processing component such as, for example, the GPU 114 shown in Fig. 1, may be any of those commonly used to accelerate video and image processing calculations.
- the device 110 preferably includes a reference designation component that enables data to be associated based on a reference point.
- the information provided by the sensors, including the IMU, and position data preferably is coordinated to correspond with a point in time at which the information was obtained.
- the reference designation component is shown comprising a clock 112.
- the clock 112 is used to provide a common reference timebase against which to synchronize all other incoming data.
- the reference based on the reference timebase provides knowledge of the orientation of the camera 122 (or device 110) at each point in its video stream so that data from different sensors or image sources can be meaningfully and coherently combined.
- each camera 122 may be referenced to a particular time point, so that the respective images (and other data) from the respective plurality of cameras (and sensors) may be coordinated together as they were oriented at the time of the image.
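Coordinating image and motion data against the common timebase can be sketched as interpolating the motion trace at each frame's timestamp. The sketch below is a one-angle simplification with hypothetical names; a full implementation would slerp quaternions rather than linearly interpolate a single angle.

```python
import numpy as np

def orientation_at(frame_ts, imu_ts, imu_yaw):
    """Linearly interpolate a yaw-angle trace sampled at imu_ts
    (seconds, sorted ascending) to the frame timestamps frame_ts, so
    that every video frame carries the orientation the camera had at
    the moment of capture on the shared timebase."""
    return np.interp(frame_ts, imu_ts, imu_yaw)
```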
- This stored information may be processed at any time, so as to be streamed from the device.
- Preferred embodiments of the device 110 preferably provide a nonvolatile storage component 125 for storing information, which, for example, may include images/video, position and motion data as well as associated time stamping.
- the device 110 may store raw data as well as the synthesized or processed data.
- the device 110 may be configured so that raw data products are continuously archived to nonvolatile storage.
- the nonvolatile storage component also may hold the stored program or programs used to direct the CPU 111 and GPU 114.
- Final products such as, for example, synthesized data or images, may also be archived.
- where the device 110 is configured to store the raw information used to provide the final image or data product, it is not necessary to also store the final or processed products, since all raw information required to recreate them is already stored.
- the device 110 is configured to store the raw data and transmit a processed stream of motion stabilized video (in-real-time) generated from the raw data.
- the device 110 preferably includes one or more volatile memory components 113, which may comprise memory into which information may be loaded to facilitate processing.
- the device 110 preferably is configured with one or more communications components for facilitating transmission of information from the device 110 to a remote component or location, as well as for receiving transmissions (such as, for example, instructions or operating commands).
- the communications component may comprise a radio interface 118.
- the radio interface 118 may be used to transmit products, or raw data, in real-time or on demand.
- the raw data preferably may comprise compressed data that has been compressed with a suitable compression algorithm to facilitate transmission.
- the radio interface 118 also serves as a reception point for remote commands and requests.
- the device 110 may be managed or controlled by receiving commands communicated to the device 110 through the radio interface 118.
- the radio interface 118 may be separate or integrated into the device 110, and may include one or more antennas, transceivers and processing or controllers for processing or managing communications and transmissions to or from the device 110.
- the device 110 also may be configured with one or more additional connections or ports, for connecting one or more additional components to the device. This may be done through a wired or wireless interface.
- the device 110 when used in connection with some applications may include additional input/output devices 115a as indicated on the diagram in Fig. 1.
- the device 110 may be configured to provide the data from the sensors and other associated components.
- the device 110 is shown configured as a camera and preferably has an associated IMU 116.
- the device 110 illustrated also preferably includes a microphone 121.
- a schematic flow diagram is provided illustrating a preferred embodiment of a system and devices for generating motion stabilized video.
- the flow diagrams and schematic illustrations in the figures represent an exemplary embodiment of imaging and sensor components and processes implemented in conjunction with hardware processing components.
- the device 110 is illustrated by reference schematically to device components (noting for convenience that only some of the device components shown in Fig. 1 are represented in Fig. 2).
- Referring to Fig. 2, the flow of data from sensors, such as, for example, the IMU 116, camera 122, microphone 121, and other potential sensor components that may be provided or associated with the device 110 (e.g., through the I/O, see 115 in Fig. 1), to final output is depicted in accordance with a preferred implementation of the invention.
- a system is represented wherein a plurality of image capturing devices 110 are depicted, each having a respective IMU 116, camera 122 and microphone 121.
- the sensors are components that provide inputs, which include information that is obtained based on the occurrence of events or presence or absence of conditions.
- Rectangles in Fig. 2 denote processes that operate on this data as it propagates through the diagram in the direction of the arrows.
- solid rectangular boundaries denote processes (things that take inputs, do something to them to produce outputs).
- Other shapes preferably are used to represent "objects" like sensors, cameras, and files.
- the dashed boundaries are rectangular, but intended to denote both grouping and multiplicity according to some embodiments.
- raw data from a number of sensors is transformed into fused data.
- a sensor fusion process 210 is carried out to reconcile the data sources against a common reference timebase 218.
- data provided by each sensor, such as, for example, the camera (e.g., image sensor) 122, IMU 116, microphone 121 (and other components on a device 110), is associated with a particular time, so that the data from each source has a common associated time.
- the data from the contributing sources including data ascertained from the device sensors and associated components, preferably may be stored 211 on a device storage component 125 in connection with its associated time data or time stamp.
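The reconciliation of loosely-coupled sensor streams against the common reference timebase 218 can be sketched as follows. This is an illustrative example only, not part of the disclosure; the function name and the linear-interpolation strategy are assumptions:

```python
# Illustrative sketch (hypothetical names): resample a timestamped sensor
# stream onto a common reference timebase by linear interpolation.
def align_to_timebase(samples, ref_times):
    """samples: list of (timestamp, value) pairs sorted by timestamp.
    ref_times: ascending timestamps on the common reference timebase 218.
    Returns values interpolated at each reference time (clamped at the ends)."""
    out = []
    i = 0
    for t in ref_times:
        # advance to the sample interval containing t
        while i + 2 < len(samples) and samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        if t <= t0:
            out.append(v0)      # before the first sample: clamp
        elif t >= t1:
            out.append(v1)      # after the last sample: clamp
        else:
            w = (t - t0) / (t1 - t0)
            out.append(v0 + w * (v1 - v0))
    return out
```

In this sketch the raw samples are never modified; the interpolated values are a new, fused product alongside the unmodified archive, consistent with the approach described above.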
- a series of corrections or calibrations are applied to the images from each camera 122. Both intrinsic and extrinsic image processing operations are considered, generally accounting for camera-related and motion-related idiosyncrasies/imperfections, respectively.
- the device 110 is configured to implement one or more corrective processing steps that may be carried out, which may comprise one or more intrinsic image corrections 212 and extrinsic image corrections 213.
- the correction steps 212, 213 provide a calibrated video stream from a camera 122 (or a plurality of calibrated video streams from a respective plurality of cameras 122).
- viewpoint synthesis 214 is implemented to create one or more new video streams that give the appearance of having been recorded by a virtual or synthetic camera with a specific time-dependent location and orientation that does not correspond to any physically available camera.
- Cropping 217 may also be implemented to ensure the absence of any unexposed portion of the frame.
- synthesis 214 is driven by a focus of attention (FOA) process 215 that selects the desired virtual vantage point based on selection elements 216, which may comprise one or more external requests and controls, actual camera motion, analysis of raw video, or algorithmic processing.
- Synthesized video products 220 may be communicated, transmitted and/or processed further.
- synthesized video products 220 can be transmitted 221 with a suitable transmission component (e.g., the device radio 118), displayed on a suitable display component, such as a display screen, subjected to further processing (e.g., application specific processing 223), or simply stored 224 on a storage component, or by any combination of these.
- the synthesized video products 220 preferably comprise motion stabilized video which is generated from the one or more streams of the synthesized virtual cameras.
- the present system, method and devices, such as the device 110 depicted in Figs. 1 and 2, preferably implement sensor fusion 210 to coordinate the information from a plurality of sensors.
- the reference platform contains a number of sensors each producing its own output data stream.
- the device 110 is configured to produce video images that are stabilized images (including a stabilized video stream).
- the device 1 10 is shown in Fig. 1 comprising a plurality of sensor components, such as, for example, an IMU 116, camera 122 (with an image sensor), microphone 121, as well as other possible components provided with or linked for association therewith (e.g., components supported by the I/O 115).
- the system may include a plurality of devices 110 (see, e.g., Fig. 2).
- sensor instruments, including the sensor components shown and described, may be loosely-coupled or even independent, and may operate at different and potentially time-varying rates (e.g., where a motion sensor ascertains information at 100 times per second and a camera outputs a frame at 30 frames per second).
- Some of the sensor components may even possess distinct and unsynchronized local clocks (or "timebases").
- the device 110 preferably is configured to manage the sensor data by sensor fusion 210, which manipulates the information to reconcile the data sources against a common reference timebase 218 in order to produce "fused data" from raw data.
- the system, method, and devices preferably are configured to generate fusion of data.
- creating fused data does not modify any of the raw data, leaving the raw data available for further use (e.g., processing or manipulation).
- Preferred embodiments create fused data to augment and extend raw content with sufficient cross-reference metadata to co-register (align) the raw data types using the reference timebase 218 and its known (or estimated) relationships to the separate sensor source timebases.
- each sensor may have associated with it a sensor time base.
- the camera 122 image sensor may have a camera image sensor timebase 218a, while the microphone 121 has a microphone timebase 218b, and the IMU 116 has an IMU timebase 218c.
- Sensor fusion 210 preferably may produce new sensor products or data components. Sensor fusion 210 may provide more than "synchronization" 209 (see Fig. 5) because new sensor products may also be derived at this stage, i.e., the whole is potentially greater than the simple sum of its constituent parts.
- a plurality of sensors are configured to provide sensor data. In an exemplary embodiment illustrated in Figs. 1 and 2, three sources of sensor data are identified: IMUs 116, cameras 122, and microphones 121.
- the camera 122 and microphone 121 are well-known electronic sensors recognized by nontechnical end-users, and do not require further explanation. Common practice bundles the sensors together, yielding a bonded audio/video data stream that is already fused. This is such a ubiquitous configuration that, in the sequel, the term "video" always implicitly allows for the optional presence of synchronized audio, whether such synchronization is effected at the sensor assemblies or by the sensor fusion 210 process.
- the IMU 116 preferably is provided as a single component, or may be provided as a plurality of components, and, according to some embodiments, the components may be associated together in a circuit, or in conjunction with a microcontroller or microprocessing unit.
- the IMU 116 may be configured to provide three types of data used to derive accurate relative and absolute estimates of position and orientation, which include angular rotation rates measured by gyroscopes, acceleration measured by accelerometers, and magnetic field measured by magnetometers.
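As one illustrative, non-limiting sketch of how two of the three IMU data types (gyroscope rates and accelerometer-derived tilt) might be combined into a drift-bounded orientation estimate, a simple single-axis complementary filter is shown below. The function names, the scalar (single-axis) simplification, and the blending constant are assumptions, not part of the disclosure:

```python
import math

def accel_pitch_from_gravity(ax, az):
    """Pitch angle implied by the gravity vector measured by the
    accelerometer (valid when the device is not accelerating hard)."""
    return math.atan2(ax, az)

def complementary_filter(pitch, gyro_rate, accel_pitch, dt, alpha=0.98):
    """One update step: integrate the gyroscope rate (accurate short-term,
    but drifting) and blend in the accelerometer pitch (noisy short-term,
    but drift-free) to produce a stable orientation estimate."""
    return alpha * (pitch + gyro_rate * dt) + (1.0 - alpha) * accel_pitch
```

A full implementation would run three such axes (or a quaternion filter) and fold in the magnetometer for absolute heading, as the text's three data types suggest.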
- the IMU 116 may be configured to provide the position and orientation data and may, for example, comprise a monolithic device, an integrated circuit or assembly, or aggregation of disparate sensors.
- the components comprising the IMU sensor may provide data that is separate and separately timestamped, or may provide data that is already fused together.
- whether the IMU is a monolithic device, an integrated assembly (e.g., black-box or grey-box assembly), or an aggregation of disparate sensors, the data streams from distinct IMU sensing modalities may or may not already be fused with each other.
- Embodiments of the devices according to the invention may include one or more additional components to further facilitate movement determinations.
- additional instruments may be present to aid in the position, orientation, or navigation tasks, including, for example, multiple sets of the three IMU sensing modalities (gyroscopes, accelerometers, and magnetometers).
- raw data ascertained from the sensors, including, for example, position and orientation data from the IMU 116 (and/or its components), the microphone 121, and the image sensor of the camera 122, is stored in its unmodified form on a storage component, such as the device storage component 125.
- Preferred embodiments provide a database which stores sensor data in accordance with a timestamp.
- the raw data may include the sensor timebase for each sensor.
- the storage may be provided on each device 110, and in the case of a plurality of devices 110 operating, there will be timestamped data from the sensors of the respective plurality of devices 110.
- the device 110 may provide a respective plurality of data for each camera lens component.
- Unmodified raw data along with the additional metadata utilized to achieve sensor fusion are always stored so that a database of unadulterated source materials remains available for future review.
- the resulting archive supports reprocessing with different control settings, enhanced exploratory or experimental offline processing, and complex workflows involving the fusion of external image sources or other independently-acquired data.
- although FIG. 2 depicts storage 211 occurring after the fusion 210 of the data sources, it may be advantageous to implement both processes in a distributed fashion.
- the presence of a lone "storage" block 211 in Fig. 2 is exemplary and does not limit the form of the archive to a single file or database.
- Raw data may be archived separately and immediately as it emerges from individual sensors, with fusion-related metadata written to one or more physically separate archives.
- the data may be archived in a database or other form.
- the image data preferably is processed in an intrinsic image correction step 212.
- the image may be manipulated at this step to compensate for certain properties that the camera may have imparted to the image.
- the image data is manipulated to convert or adjust the image.
- the system and devices are configured to implement intrinsic image corrections 212.
- Intrinsic image corrections 212 represent manipulations to the image based on camera properties. According to preferred embodiments, the manipulations preferably comprise computations upon and changes to an image in order to compensate for distortions or imperfections that depend only on the physical characteristics and geometry of the camera, including the lens and sensor.
- Some examples of the manipulations may include correction of distortion from lens irregularities or skewed projections attributable to manufacturing defects, compensation for anisotropic (direction-dependent) sensitivity to lighting, or removal of wide-angle lens phenomena such as fisheye warping or barrel distortion.
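A minimal sketch of one such intrinsic correction, inverting a two-coefficient radial (barrel/pincushion) distortion model by fixed-point iteration on normalized image coordinates, follows. The model choice, coefficients, and function name are illustrative assumptions, not part of the disclosure:

```python
def undistort_point(xd, yd, k1, k2, iterations=20):
    """Invert the radial distortion model
        x_d = x_u * (1 + k1*r^2 + k2*r^4),  r^2 = x_u^2 + y_u^2,
    for a distorted normalized coordinate (xd, yd), by iterating
    x_u <- x_d / factor(r^2) until the estimate settles."""
    xu, yu = xd, yd                      # initial guess: no distortion
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / factor, yd / factor
    return xu, yu
```

In practice the coefficients k1, k2 come from the camera-specific calibration measurements mentioned in the text, and the correction is applied as a whole-image remap rather than point by point.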
- camera-specific calibration measurements are utilized to perform these calculations, with no other sensors being involved - as the name indicates, the computations are based entirely on intrinsic properties of the camera.
- software containing instructions to instruct the processor to manipulate the data and provide the changes is provided.
- the software may be integrated on a chip on the device 110, or be provided on a device storage component.
- the processing component such as the CPU 111, preferably is instructed to implement the manipulations to provide an intrinsic image correction, and corresponding intrinsically corrected image (ICI).
- the ICI image (or IC image data) may be stored, or further processed, or both.
- from this intrinsic image correction manipulation step 212, effects related to the motion of the camera or subject are specifically excluded.
- manipulations for intrinsic image corrections 212, which may comprise computations, operate upon raw video data without regard for the timestamps accompanying the frames, so processing can be applied prior to sensor fusion.
- intrinsic corrections may be provided as part of the fusion process 210 itself, wherein raw images are "fused" with camera-specific calibration information to derive a new and more refined video product.
- the image preferably is processed to undergo further manipulation in one or more extrinsic image correction steps 213.
- Extrinsic image corrections 213 comprise manipulations of the image, which include operations upon an image to compensate for defects that are not inherent in the camera or sensor physical configurations, but that arise as a result of external circumstances. Some examples include rolling shutter artifact mitigation and motion blur compensation. For example, in the case of a rolling shutter, the image is not exposed all at once; instead, rows of pixels are exposed sequentially over a brief readout interval, so camera motion during that interval skews or distorts the image.
- the extrinsic image corrections 213 may be applied in conjunction with the viewpoint synthesis 214. As discussed herein it may be preferred that some - or all - extrinsic image corrections may be combined with the viewpoint synthesis 214 instead of being applied separately.
- the image is further processed in the viewpoint synthesis 214 and cropping 217 steps.
- the stabilization mechanism preferably implements viewpoint synthesis 214.
- the system and devices shown and described herein are configured to carry out viewpoint synthesis.
- Viewpoint synthesis 214 lies at the heart of the image processing chain depicted in Figure 2.
- the viewpoint synthesis is intimately associated with cropping 217.
- the depiction of multiple dashed boxes surrounding the viewpoint synthesis 214 and cropping 217 operations in Fig. 2 denotes the possibility of having more than one set of simultaneous parallel operations to create independent outputs.
- the viewpoint synthesis 214 preferably comprises a reprojection manipulation of the image.
- the reprojection of the viewpoint synthesis 214 is carried out on the image, and may be combined with other operations where appropriate to optimize and facilitate processing.
- Reprojection may be carried out after the extrinsic and intrinsic adjustments of steps 212, 213 have been applied to the sensor-fusion-processed image, or, alternatively, may be combined with the extrinsic and/or intrinsic adjustments 212, 213.
- Preferred embodiments of the devices, such as the imaging device 110, systems and method are configured to implement manipulation of a captured image by reprojection and viewpoint synthesis.
- FIG. 3A is a schematic illustrating camera and subject geometry for imaging a simple scene using an idealized "pinhole" camera.
- Fig. 3B is a schematic illustration representing the resulting image.
- the schematic illustrations of Figs. 3A and 3B present an orthographic projection viewed from above the camera, with the plane of the schematic oriented at right angles to both the focal plane and the vertical axis of the camera.
- this symmetric arrangement allows the third dimension extending upward and downward from the page to be ignored.
- a camera with infinitesimal aperture at point O is aimed directly at point A.
- OA forms the optical axis of the camera, and the projected image of point A will appear directly in the center of the sensor (i.e., the camera image sensor).
- the camera focal plane captures the image of its scene at some projected distance OX (the focal length) from the aperture; although this plane is physically located behind the aperture, it is convenient (and fairly conventional) to draw it in front of the lens as a "virtual focal plane" for clarity and ease of understanding.
- the field of view (FOV) of the camera is limited to the area contained within angle BOC.
- the only object visible in this posited exemplary scene is a sphere 300 located at the extreme right edge of the FOV, with its center in the plane of the figure. Its size and distance are unimportant, but the sphere 300 subtends angle DOC as viewed from the camera - hence its projection upon the focal plane along the horizontal axis of the camera is the line segment YZ.
- in Fig. 3B, what is represented is the view seen by the camera, as captured on its sensor (i.e., image sensor).
- if this sphere 300 represents an object of interest, in retrospect it would have been desirable to have captured its image while it was closer to the center of the sensor (the sensor field boundaries referenced as W and Z), more aligned with the optical axis OA that intuitively forms the "center of attention" of the camera.
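The pinhole geometry of Figs. 3A/3B can be sketched numerically as follows; the function names and the one-dimensional (in-plane) simplification are illustrative assumptions:

```python
import math

def project_pinhole(px, pz, focal_len):
    """Horizontal image coordinate of scene point (px, pz) for a pinhole
    camera at the origin O looking along +z (the optical axis OA)."""
    return focal_len * px / pz

def sphere_image_extent(cx, cz, radius, focal_len):
    """Endpoints of the image-plane segment (cf. YZ in Fig. 3A) subtended
    by a sphere with in-plane center (cx, cz) and the given radius."""
    d = math.hypot(cx, cz)
    half = math.asin(radius / d)   # half of the subtended angle (cf. DOC)
    ang = math.atan2(cx, cz)       # direction from O to the sphere center
    return (focal_len * math.tan(ang - half),
            focal_len * math.tan(ang + half))
```

With the sphere centered on the optical axis, the two endpoints are symmetric about the sensor center, matching the intuition that an on-axis object projects to the center of the image.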
- Figs. 4A and 4B demonstrate how this geometry could have been obtained from the existing configuration. As shown in Figs. 4A and 4B, most of the original lines and notations from Figs. 3A and 3B are retained for reference, with additional information being introduced (and having prime designations in the reference characters).
- the center of the sphere 300 projects to the center of the sensor (i.e., through X').
- the resulting image is shown in Fig. 4B (ignoring for the nonce the dark shading to the right of the image of the sphere).
- the resulting image is not identical to shifting the image from Fig. 3B leftward because the image plane W'Z' is tilted with respect to the original WZ - scene objects (if there were any present) to the left of the sphere 300 (as viewed from the aperture) will lie closer to the new image plane while those to the right will be farther away.
- when the camera undergoes movement, such as, for example, a change in its position or orientation, the FOV is changed. Physically changing the orientation of the camera (as described in the example situation immediately above) results in an FOV bounded by the new angle B'OC, but reprojection remains limited to using imaging data captured from within the initial bounds of BOC. Effectively, the FOV of the synthesized camera is reduced to B'OC. Referring to Fig. 4B, the black shading reminds that it is not possible to reproject images from regions of space that were not represented in the source image with its original optical axis.
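In the one-dimensional geometry of Figs. 3A/4A, reprojecting a captured image coordinate onto a synthetic focal plane rotated about the aperture O reduces to a tangent relation, sketched below; the function name and the scalar simplification are assumptions, not the patent's stated method:

```python
import math

def reproject_1d(x, focal_len, theta):
    """Reproject horizontal image coordinate x onto a synthetic focal
    plane rotated by theta (radians) about the aperture O.
    Each pixel corresponds to a ray through O; rotating the image plane
    shifts which pixel that ray lands on."""
    ray_angle = math.atan2(x, focal_len)        # angle of the ray through x
    return focal_len * math.tan(ray_angle - theta)
```

Note that a point at the edge of the original FOV maps to the new image center when theta equals its ray angle, mirroring how the sphere 300 is recentered in Fig. 4B, and that rays beyond the rotated plane's coverage have no valid reprojection (the black shading of Fig. 4B).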
- the system, method and devices preferably are configured with instructions for manipulating the image data to adjust the optical axis from a first optical axis designation to a second optical axis designation. Subsequent optical axis adjustments may be made in accordance with image data.
- the image data preferably includes fused sensor data, including the pixel image information from the camera image sensor, other sensor data, such as, for example, microphone data, and IMU data (e.g., orientation and position data), as well as the timebase at which the data was obtained.
- An optical axis manipulation provides a reprojection of the image, such as, for example, as illustrated in connection with the object or sphere 300, and adjusts the image plane.
- Reprojection represents a component of viewpoint synthesis 214 (see Fig. 2).
- a first new camera is synthesized to correspond with the reprojection and provides a new point of view (compare Figs. 3A,3B with Figs. 4A,4B).
- the reprojection comprises a component of the viewpoint synthesis.
- each camera clearly possesses a well-defined viewpoint. This is the first viewpoint VP1.
- a plurality of alternate viewpoints (VP2, ... VPn) may be generated through processing of the image data information of or corresponding with the first viewpoint VP1.
- a processing step of applying an adjustment or correction may be applied to viewpoint image information, and the adjustment may be based on an application of a mathematical formula applied to the image data.
- the image adjustments preferably utilize the IMU and other sensor data.
- the system, method and device preferably are configured to generate an image (frame of video or portion of a frame) corresponding with and having a virtual viewpoint VP (e.g., VP1, VP2, ... VPn).
- the images generated may be done in succession or continuously as the camera undergoes movement, or during the time that the camera is capturing a scene, i.e., recording video, where the camera may be in an intended position for some image capture and in a changed position for some other of the image capture.
- the manipulations of the video generated from the image capture information preferably may be carried out recursively, informed by data from the IMU.
- the image information obtained from the information of the physical camera image capture may be manipulated by mapping to preserve some points, straight lines and planes, but not others.
- for a designated viewpoint, which may become a designated virtual viewpoint (where the camera has been moved from its original physical position providing the designated viewpoint), ratios of distances between points on a straight line may be preserved (from the physical camera original viewpoint versus a virtual viewpoint).
- the manipulations may permit angles between lines to change as well as distances between points in the virtual viewpoint synthesized image (such as a frame or frame portion).
- the manipulation of the image preferably is carried out so that some of the parallel lines may remain parallel.
- the manipulations may comprise linear transformations (such as "affine transformations") and generation of a viewpoint and its associated field of view may be produced from applying geometrical optics principles to the image data to produce a manipulated image data for the data set image (obtained by the physical camera) but corresponding with the designated viewpoint (VP).
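As a minimal illustration of an affine transformation and the properties noted above (straight lines, parallelism, and ratios of distances along a line are preserved; angles and lengths generally are not), consider the following sketch; the function name and parameterization are assumptions:

```python
def apply_affine(points, a, b, c, d, tx, ty):
    """Apply the affine map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)
    to a list of (x, y) points. Such maps preserve collinearity and
    ratios of distances along a line, but may change angles and lengths
    (e.g., a shear), consistent with the manipulations described above."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]
```

For example, the midpoint of a segment maps to the midpoint of the mapped segment under any such transformation, which is the "ratios of distances preserved" property referenced in the text.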
- the manipulations generate an image that represents the field of view (which may be a synthetic field of view) from the designated viewpoint (VP).
- the manipulations according to a preferred embodiment comprise reassignment of the image data from the previous or reference focus of attention viewpoint (VP1) to a different or synthesized viewpoint (VP2). An angular component of the image data is applied to change the angle for the field of view.
- the angular component manipulation may change the angle with respect to the path between the scene, scene object (or image target) and the synthesized field of view (FOV) of the synthesized viewpoint (VP2), the synthesized corresponding field of view referred to as FOVVP2.
- the angular manipulation depends on the location in the field of view, as for example, in Fig. 4A, the assignment of the shifted field of view (FOV) represented by W'Y'Z'.
- an angular shift is positive in regard to the synthesized FOV for image portions to the left of the plane intersection (that is, left of where W'Z' intersects WZ), and negative in regard to the synthesized FOV for image portions to the right of the plane intersection (that is, right of where W'Z' intersects WZ).
- the image in the first field of view (FOV), such as that field represented by WZ in Fig. 3A, preferably is captured by the image sensor pixels.
- the pixels represent spatial coordinates of the image field.
- an alternate viewpoint may be considered to be where the camera has moved from its position/orientation in Fig. 3A, and an alternate corresponding field of view, such as field of view (FOVVPA), is associated with the alternate viewpoint VPA (for example, shown by the different viewpoint in Fig. 4A where FOV W'Z' is represented).
- the device is configured to manipulate the captured data associated with the pixels to generate the image from the synthesized camera having a synthesized viewpoint (SVP1).
- the synthesized viewpoint may be a synthesized viewpoint corresponding to an initial viewpoint (such as VP1), so that the image is synthesized as if the camera were still in the position depicted in Fig. 3A.
- the viewpoint and field of view represented in Fig. 3A may be synthesized from another viewpoint (e.g., such as, for example, to be imaged as a virtual synthesized viewpoint and field of view from the camera position depicted in Fig. 4A, if that were to be desirable).
- Other synthesized cameras and viewpoints may be produced.
- the image information, which preferably may be a video stream, including camera movement, may be processed rapidly so that the viewpoint and field of view manipulations are made rapidly and the video stream may be produced with the viewpoint synthesis manipulations applied.
- the processing manipulations of the image data are done rapidly in response to and in coordination with the IMU movement data.
- a video stream is produced.
- the video stream is produced and may depict the scene from the point of view of the designated viewpoint, even though the image frame (or image information) was captured at a moved position of the camera, which may be an alternate position from which imaging is done from an alternate viewpoint (VPA).
- the data captured from the alternate viewpoint (VPA) may be synthesized to have a synthetic look direction, as if imaged from the initial or designated viewpoint (e.g., VP1).
- the synthetic cameras provide frames or frame portions so that the viewpoint from the moved position or changed orientation (VPA) of the camera may be used to generate a synthetic camera (or plurality thereof) that captures the image from the initial viewpoint (VP1), even though the physical camera FOA has moved from that initial viewpoint (VP1).
- a synthetic camera may generate video images as if produced from a camera that is imaging the scene (e.g., subject or target object) from an alternate position (or a number of alternate positions), which are alternate to the camera look point. That is, even though the camera look point or FOA is imaging in one direction, the scene may be viewed from one or more other directions (i.e., as if captured from one or more directions).
- video data preferably is obtained from a first or initial direction (VP1), and may be manipulated to generate video of the scene that corresponds with an alternate or second viewpoint (e.g., VP2).
- the video generated from the alternate or second point of view (VP2) preferably is generated by manipulating the video information obtained for the scene from imaging in the direction VP1.
- Adjustments are made to the pixel data, such as, for example, an angular adjustment to provide an angle corresponding to the angle by which the FOV has changed, and a relationship adjustment for image pixels along parallel lines.
- the adjustments may be implemented as the video is being captured with the camera so that adjustments to the image data provide a look direction that is smoothed even though the camera may be undergoing multiple position/orientation changes.
- the image data preferably is manipulated rapidly to provide a stream of adjusted video.
- the adjustments or manipulations to the image data to provide a selected or designated look direction preferably also do so while the camera is undergoing desired motion, such as translational motion.
- the device and system is configured to discern desired movements from the undesired camera motion that requires adjustment.
- the desired movement preferably is determined by monitoring and evaluating the motion of the camera, and preferably, the continued motion of the camera.
- the device and system are configured to evaluate the motion and time information and determine whether the motion is a deliberate motion that is desired or acceptable motion (that is not corrected) or whether the motion is undesired motion.
- the device and system preferably are configured to ascertain movements of the camera (which provides a position of where the lens is pointing), and evaluate the times at which the movements occur.
- the device and system preferably are configured to distinguish between a first type of camera movement, which may be translational or intentional movement, and a second type of camera movement, which may be oscillating or rotational movement (such as a change in orientation of the camera).
- the look direction is selected or designated to be a location in front of the camera (although it could be designated to be another designated direction, preferably within the field of view of the camera). Activity by the individual typically will result in movement of the camera.
- the camera may experience movements as a result of the individual walking, running, ascending or descending stairs, driving in a vehicle, or other movement.
- where an individual is moving forward (e.g., walking, running, or traveling in a vehicle), the motion is typically translational; the motion of a forward-moving individual preferably is evaluated and identified by the device and system as the aforementioned first type of movement.
- the device camera movement information preferably is obtained by the sensors, e.g., the motion sensors such as one or more IMUs, and the timestamp identifies the movement as a function of a time interval.
- the time motion data is obtained, and the movement pattern is evaluated to ascertain whether, within the movement pattern, there is a threshold degree of randomness (e.g., a first randomness threshold, i.e., low randomness or relatively low randomness).
- Intended movement of the camera is where the camera movement is desired, so if the camera is moving forward, the field of view remains in front of the camera, and if the individual and camera worn by that person were to make a turn, such as, at a corner of a street, that movement would be intended movement.
- the focus of attention (FOA) of the camera may point toward a particular direction or at a desired object or subject to be followed by the camera (e.g., a target).
- the device and system preferably are configured so that movement information that identifies a first type of designated movement, such as directional movement, does not implement an image adjustment for that movement.
- the movement data is considered in connection with a time frame, and movement changes in short time durations may be designated to be movements for which adjustment or manipulation of the image is made (e.g., to reproject the viewpoint from what the camera actually views at the time of movement).
- movement changes in longer time durations may be designated to be intended movement for which no adjustment or manipulation of the image viewpoint or look direction is made to compensate for the longer duration intended movement.
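One simple, purely illustrative way to realize this short-duration/long-duration distinction is a low-pass (exponential smoothing) filter on the measured look direction: slow, deliberate changes such as turning a corner pass through, while brief jostling is attenuated, and the difference between raw and smoothed angles is the correction for viewpoint synthesis to cancel. The function name and blending constant are assumptions, not the patent's specific algorithm:

```python
def smooth_look_direction(raw_angles, alpha=0.05):
    """Exponentially smooth a sequence of measured look-direction angles.
    Small alpha -> long time constant: short-duration jitter is averaged
    away while sustained (intended) motion is eventually followed."""
    smoothed, s = [], raw_angles[0]
    for a in raw_angles:
        s = (1.0 - alpha) * s + alpha * a
        smoothed.append(s)
    return smoothed
```

The per-frame correction would then be `raw - smoothed`: large for an isolated spike (undesired motion, reprojected away) and near zero for a sustained turn (intended motion, left uncorrected).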
- the system and device identify the undesired camera movement and apply a manipulation to the image (even though the image is recording a scene where the camera is moving toward something, e.g., walking in a direction).
- the method is carried out to ascertain movement of the camera, which may be desired movement and undesired movement, which may happen at the same time or at different times. Walking may result in movement of the camera in a left or right direction, or upward or downward direction, as the individual's steps taken may jostle the camera.
- These movements are detected by the device and system components and the movement information is processed and preferably evaluated to be identified as undesirable movement.
- the camera look direction which is in a non-designated direction (based on the undesirable movement), is manipulated for that time of movement to have a look direction that represents the designated look direction (which the camera does not have).
- a synthetic camera generated by the device and system provides a look direction that is, or is substantially close to, the designated look direction.
- focus of attention on the target (such as an object or subject being followed) therefore may be maintained, or attempted to be maintained.
- the field of view of the camera may overlap with the field of view of the synthetic camera or synthesized viewpoint.
- the camera is configured to image a wide field of view to facilitate increasing the likelihood of capture of the designated or desired viewpoint within the field.
- the image sensor field may be larger than the smaller synthesized output image to allow for a more expansive (or wider) area to provide more field for the synthetic cameras and the corresponding viewpoints they may have.
- the synthetic camera output image may cover a portion of the image sensor field, and, according to preferred embodiments, the portion which the synthesized image output covers may be any location within the image sensor field, and may change so that the synthesized image output is from different portions of the image sensor field.
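A sketch of extracting the smaller synthesized output window from the larger image sensor field, clamped so the output never includes an unexposed region, follows; the function name and the row-list frame representation are illustrative assumptions:

```python
def crop_window(frame, out_w, out_h, cx, cy):
    """Extract an out_w x out_h window centered (as nearly as possible)
    at (cx, cy) from a frame given as a list of pixel rows. The window
    is clamped to the sensor field so the output contains no unexposed
    portion, consistent with the purpose of cropping 217."""
    h, w = len(frame), len(frame[0])
    x0 = min(max(cx - out_w // 2, 0), w - out_w)
    y0 = min(max(cy - out_h // 2, 0), h - out_h)
    return [row[x0:x0 + out_w] for row in frame[y0:y0 + out_h]]
```

Because the requested center can wander anywhere within the oversized sensor field, the synthesized output may come from different portions of the field from frame to frame, as the text describes.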
- the camera also moves abruptly (even when harnessed).
- the abrupt movements may otherwise produce unusable video images, if the actual camera look direction were used for each frame.
- the device and system implement manipulation of the image information to produce a video stream with video image frames that maintain a look direction, using the camera and one or more (preferably a plurality of) synthesized cameras synthesized from the one camera through which images are being recorded.
- the synthetic camera viewpoint image frames are injected into the video stream when the physical camera look direction has moved from the desired or designated look direction, and, preferably as a result of unintended camera movements.
- image adjustment may be made by redesignating a viewpoint or look direction.
- the camera may initially point in a designated look direction
- another direction from which an image is captured, including a direction from a synthetic camera, may be designated as a look direction or viewpoint.
- the system and device preferably may generate a plurality of synthetic cameras from a physical camera imaging the scene. Accordingly, a large ensemble of viable alternate viewpoints may be generated.
- the system and device are configured to generate viable alternative viewpoints.
- although the depictions shown and discussed have been provided in conjunction with individual images, the system, method and devices capture information as a video stream.
- the desired video stream preferably is a video stream that is designated or purposed to image a particular object or target. The designation may be along a particular path or focus of attention. As discussed herein, reprojection generally reduces the field of view.
- one or more additional cameras may be used to provide the missing data - possibly after their own images are subjected to reprojection.
- the image information obtained from the camera and other sensors is utilized to generate one or more new video streams from one or more respectively synthesized virtual cameras.
- a plurality of synthesized virtual camera streams may be generated.
- the device may manipulate the image data so as to fuse data from all cameras and exploit multiple video sources to synthesize one or more new video streams from correspondingly-synthesized virtual cameras.
- the system, method and device preferably are configured to provide a suitable level of coverage for the camera.
- the image field may be suitably configured to capture a field in which the activity is occurring (or likely to occur), for example, by using an appropriate lens, such as, for example a wide field type, or fisheye lens.
- the image field may be configured to capture a suitable field within which the activity is occurring (or anticipated to take place).
- Embodiments of the system, method and device are configured to provide the desired level of coverage for the activity taking place.
- a plurality of cameras may be utilized in accordance with the present system to provide a plurality of FOVs, which have corresponding respective viewpoints.
- one or more (or all) of the plurality of cameras may provide image data from the camera and other sensors, which is manipulated to provide a respective synthesized virtual camera, and generate one or more new video streams from each of the one or more respectively synthesized virtual cameras.
- a plurality of synthesized virtual camera streams may be generated from each of the plurality of physical cameras, and the respective image
- when more than one camera is not implemented (e.g., to image the same object or subject) or is unavailable, the camera preferably is configured with a wide field of view.
- the native camera sensor FOV may be adjusted to enhance the field using one or more wide-angle lenses.
- These wide-angle lenses, such as fisheye lenses, provide a greater FOV, but have distortion at the edges.
- Embodiments may employ deliberately severe distortion (such as that found in fisheye lenses) in order to achieve these goals while ameliorating the effective FOV losses.
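- As a non-limiting illustrative sketch of the fisheye trade-off described above, the following assumes the common equidistant fisheye model (r = f·θ); the model choice and the focal-length value are assumptions of this sketch rather than limitations of the embodiments. It maps a fisheye radial distance to the radius the same ray would occupy in a conventional rectilinear (pinhole) image:

```python
import numpy as np

def fisheye_to_rectilinear_radius(r_fish, f):
    """Map an equidistant-fisheye radial distance (r = f * theta) to the
    radius the same ray would have under a pinhole model (r = f * tan(theta))."""
    theta = r_fish / f            # incidence angle recovered from the fisheye radius
    return f * np.tan(theta)      # pinhole (rectilinear) radius for the same ray

# A ray 45 degrees off-axis with an illustrative 400-pixel focal length:
f = 400.0
r_fish = f * np.pi / 4                              # equidistant fisheye radius
r_rect = fisheye_to_rectilinear_radius(r_fish, f)   # f * tan(45 deg) = f
```

Near the image center the two radii nearly coincide; toward the edges the rectilinear radius grows much faster, which is the edge distortion that the unwarping step must remove.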
- the devices and systems may be configured to maintain a look direction that corresponds with the direction of the moving body or other support on which the camera or device is carried.
- the device may be moving in a forward direction, and the direction may be processed to correspond with a movement vector.
- a body carried device such as a mobile police body camera, may be determined from its motion data to be moving in a particular direction, say, for example, a radial path of travel. The motion may be determined to be substantially along a particular path of travel, the radial path, until some other change in direction is sensed.
- the camera may be configured to determine a field of view to be a path following that radial direction.
- the field of view may be adjusted slightly toward the radial direction of travel, which becomes the look direction, and the look direction is likely the desired direction of the pursuit or area of interest.
- the travel direction is along travel vector (TV1) and the camera device is moving (e.g., due to body motion of the person carrying it)
- the image may be manipulated to provide the field of view of a camera that is moving in the travel path, in this example, along TV1.
- a synthesized camera may be configured to sense and follow an intended path of travel.
- the motion stabilization also may be implemented to generate a video stream of image data representing a field of view that manipulates the image or images forming the video stream to a stabilized depiction of the scene.
- the mechanism of the device processes the movement information and preferably obtains the scene image from the expected path direction that is anticipated or perceived to be desired, based on the device configuration.
- the device preferably is configured with software containing instructions to process the data from the sensors and apply manipulations to the image data to produce stabilized video from one or more synthetic viewpoints.
- FOV reduction and its variability highlight another difference between individual camera viewpoints (whether synthesized or real) and the synthesized viewpoint of a virtual camera.
- FOV is an intrinsic property of the sensor, the lens and their relative placement.
- the lens may have one or more settings, and the lens may become a different lens when a different setting is applied.
- if the lens is not a fixed-focus lens, it effectively becomes a different lens when it is adjusted.
- the FOV of a reprojected image is limited by its source image with potential further reductions being dependent on the transformation geometry.
- Producing an aesthetically acceptable video stream from any single camera invariably requires rectangular cropping 217 (see Fig. 2) such that worst-case FOV reduction never reveals unexposed (off-camera) portions of the imaged scene.
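- The worst-case cropping constraint above can be sketched as follows; the pinhole approximation (a rotation of θ displacing the image by roughly f·tan θ) and the 5-degree rotation bound are illustrative assumptions, not prescribed values:

```python
import numpy as np

def safe_crop(sensor_w, sensor_h, f, max_rot_deg):
    """Return (width, height) of a centered rectangular crop that stays on-sensor
    even under the worst-case corrective rotation. Assumes a pinhole model in
    which a rotation of theta shifts the image by roughly f * tan(theta) pixels."""
    margin = f * np.tan(np.radians(max_rot_deg))   # worst-case pixel shift
    return sensor_w - 2 * margin, sensor_h - 2 * margin

w, h = safe_crop(1920, 1080, 1000.0, 5.0)   # 5-degree worst-case correction
```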
- output products 220 from viewpoint synthesis 214 ultimately remain limited by raw data sources but still enjoy considerable freedom in choosing the intrinsic characteristics of the synthesized camera including its field of view.
- an embodiment may employ multiple cameras to provide synthetic video output. According to some alternate embodiments, deliberate physical placement of multiple cameras can ensure this degree of flexibility. Of course, despite the freedom to select FOV explicitly, the final output must be rectangular (for the rectangular display format).
- although Fig. 2 shows cropping 217 following viewpoint synthesis 214, preferred implementations may implicitly merge the two processes, absorbing the former into the latter.
- explicit per-camera (i.e., single-camera) viewpoint synthesis 214 may be configured to manipulate the image information so as to only select subsets of image data that will be used in the construction of its final image.
- Fig. 2 illustrates a process diagram and denotes a focus of attention (FOA) 215 process.
- the system and devices such as the exemplary device 110, are configured with instructions to synthesize one or more virtual cameras from the device information.
- the device 1 10 is configured with software containing instructions to process the image and sensor data to select a desired viewpoint, which preferably is a desired instantaneous viewpoint for the virtual camera (or viewpoints in the case of a plurality of virtual cameras).
- the viewpoint selection information generated to represent the viewpoint for a virtual camera preferably is stored on the storage component of the device 110.
- a database is generated or constructed to hold the image information, including virtual camera information for the one or more virtual cameras that are synthesized.
- the database preferably includes the virtual camera image generated for the corresponding focus of attention.
- the images preferably may be a video stream (for example, up to 30 frames per second captured video).
- the device 110 also preferably is configured to stream the video as live streaming video.
- the device 110 processes the video images captured from the one or more cameras and one or more synthesized or virtual cameras.
- the images are adjusted to capture a frame image in accordance with the desired selected focus of attention (FOA).
- the focus of attention (FOA) may be generated by the device 110.
- the system and device generate a focus of attention (FOA) 215 to select or designate the desired instantaneous viewpoint for one or more virtual cameras, forming the controls that drive viewpoint synthesis.
- FOA and viewpoint are interchangeable terms; the two names distinguish between one process that selects virtual viewpoints and one or more processes that reify those viewpoints via extensive image-processing manipulations, which include computations.
- Components of the imaging computations involve designating or assigning of a look direction, which may be a focus of attention of the camera or lens.
- a component of the data profile is IMU data associated with the position and orientation of the camera.
- the image information captured by the camera and the IMU position/orientation information are designated or registered in time.
- when the camera moves or changes position or orientation so that a new look direction occurs, the camera image information is manipulated to produce an image or image portion that corresponds with the assigned or desired look direction (from which the camera was moved).
- the manipulations preferably include obtaining the relative IMU movement differential for movement of the camera taking place.
- Movement information or data is obtained from the IMU sampling, which may occur a number of times per second, and each movement may be processed relative to the prior movement. The first movement may be related to the initial or designated camera look direction, and each subsequent movement may be related to the previous movement, so that the actual image captured from a viewpoint may be adjusted based on a relative position movement or differential.
- the IMU information preferably is applied to the time point at which the corresponding image information was captured.
- the IMU movement information may be used to determine an angular adjustment and a position adjustment within the frame of the image sensor.
- the camera image sensor field captures images from the initial look direction, and then the image sensor field captures images from a direction that the IMU has detected the camera to have been moved to.
- the angular component adjustment may be applied to adjust the moved field to relate to the initial field by a rotation about an angle (or one or more angles), which may be one of the 6 degrees of freedom or three axes of the IMU.
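- The chaining of relative movements described above can be sketched by composing one small rotation per IMU sample (Rodrigues' formula); the sample rate and angular rates below are illustrative assumptions:

```python
import numpy as np

def small_rotation(omega, dt):
    """Rotation matrix for one angular-velocity step (Rodrigues' formula).
    omega: 3-vector of angular rates (rad/s) from the IMU; dt: sample interval (s)."""
    angle = np.linalg.norm(omega) * dt
    if angle < 1e-12:
        return np.eye(3)
    k = np.asarray(omega) / np.linalg.norm(omega)        # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])                     # cross-product matrix of k
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

# Chain each sample's rotation onto the previous one, as in the text:
R = np.eye(3)                                 # initial (designated) look direction
for omega in [(0.0, 0.1, 0.0)] * 100:         # 100 samples at 0.1 rad/s about y
    R = small_rotation(omega, 0.01) @ R       # dt = 10 ms per sample
# After 1 s of samples, the accumulated rotation is 0.1 rad about the y axis.
```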
- the focal length also provides a parameter that is used to determine distances for reprojected images.
- a distance component adjustment is made for objects appearing in the scene, which preferably may be applied to the pixels.
- the pixels may be manipulated to be moved to more particularly represent a camera image of the scene as if taken from the initial point of view (even though the camera has been moved).
- Kalman filtering may be used to smooth the trajectory of the movements, which may smooth the look direction, so as to reduce deviations arising from noise or other inaccuracies.
- the image manipulations may be processed in conjunction with an applied Kalman filtering so image manipulations are applied to movement information that preferably represents movement data that is optimized for the camera movements detected by the IMU.
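- A minimal sketch of the Kalman filtering described above, reduced to a 1-D constant-position filter over a noisy look-direction angle; the process- and measurement-noise variances are illustrative assumptions:

```python
import numpy as np

def kalman_smooth_angle(measurements, q=1e-4, r=1e-2):
    """Smooth a noisy sequence of look-direction angles with a 1-D Kalman filter.
    q: process-noise variance (how fast the true direction may drift);
    r: measurement-noise variance (IMU jitter). Both values are illustrative."""
    x, p = measurements[0], 1.0        # state estimate and its variance
    smoothed = []
    for z in measurements:
        p += q                          # predict: uncertainty grows by process noise
        k = p / (p + r)                 # Kalman gain
        x += k * (z - x)                # update toward the measurement
        p *= (1 - k)                    # updated uncertainty
        smoothed.append(x)
    return np.array(smoothed)

noisy = 0.5 + 0.05 * np.sin(np.linspace(0, 40, 200))   # jittery look direction (rad)
smooth = kalman_smooth_angle(noisy)
```

With these settings the filter behaves like a gentle exponential smoother, attenuating the oscillation while tracking the underlying direction.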
- the IMU data may determine movement, position, and/or orientation at a particular time and relative to a designated look direction,
- a system and device for producing a macro-stabilized video output stream from a camera that is subject to movements, the system having a camera configured according to the depictions herein.
- the camera lens moves with the camera and an IMU (e.g., movement/position/orientation sensor) ascertains information as to the position/orientation (i.e., movements) of the camera, including relative movement thereof from a previous position, or a state where the same position is maintained over the ascertainment period.
- the position data may be ascertained in a unit time, such as, for example, number of position data sets obtained per second.
- the system and device preferably capture video images wherein the objects in the field of view of the scene are represented as having a position based on a three-component vector at a particular time (relating to the image capture time for that portion or frame of video).
- the image space may be represented by the sensor field, which is made up of pixels.
- an image position may be represented by two coordinates.
- the coordinates are homogeneous coordinates.
- a component vector having a unit value is appended, so that the coordinate vectors for the image position may be [x y 1], and coordinates for the world position may be [X Y Z 1].
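- The homogeneous-coordinate relationship above can be sketched as follows; the focal length and principal-point values are illustrative assumptions:

```python
import numpy as np

# Pinhole projection with homogeneous coordinates: world [X Y Z 1] -> image [x y 1].
f, cx, cy = 800.0, 640.0, 360.0             # illustrative focal length / principal point
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0,  1]])                  # intrinsic camera matrix

P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # 3x4 projection, no rotation applied

X_world = np.array([0.5, -0.25, 2.0, 1.0])  # homogeneous world point, 2 m in front
x_homog = P @ X_world                       # homogeneous image coordinates [x y w]
x, y = x_homog[:2] / x_homog[2]             # dehomogenize to pixel coordinates
```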
- the captured image data are recorded.
- the camera may be undergoing movement (other than rotational movement), and therefore, the coordinate system may be changing in conjunction with corresponding movement.
- the system and device are configured to manipulate the data to generate a manipulated or synthesized image, and may proceed as if it is starting all over again at each frame and utilize that simple coordinate system (such as the coordinate system referred to in Fig. 3 A) for translational movement.
- camera rotations are tracked and recorded, while the system and device may ignore the translational motion because translational motion (e.g., moving the camera in a forward direction to follow a subject) actually may be desired.
- the IMU provides position data corresponding to the time at which the image data for the image coordinates was captured or obtained.
- the system and device are configured to generate a relation for the image coordinates with regard to the world coordinates (i.e., the coordinates of objects in the scene).
- a matrix component K is applied to relate the coordinates, such that the matrix component K may be applied to the world coordinates in order to produce a linear matrix multiplier.
- This matrix multiplier preferably is dependent upon intrinsic camera parameters, such as, for example, focal length and scaling constants relating to physical dimensions and pixels.
- Additional multiplicative matrices may be applied to relate the image matrix coordinates at a particular time to a corresponding movement at that time.
- the rotation information may be applied to designate camera movement by way of a rotation matrix. Accumulated rotations of the camera at time t may be captured by a rotation matrix, R(t).
- a movement component is applied to relate the movement of the camera.
- one component of the camera movement comprises rotational movement or a rotation.
- the IMU provides the camera rotation data and that camera rotation data is associated with a time.
- the system and device are configured to generate a projected image location, which may be a synthesized view point, which represents a given world location for an object or scene captured at a particular time (t).
- the device and system preferably may utilize the generated accumulated rotations and apply the rotation matrix R(t) to the matrix component K (first matrix multiplier M1), so that the rotation matrix R(t) provides a second matrix multiplier (M2), and where the first and second multipliers (M1 and M2) are applied to the matrix world coordinates to generate a projected image.
- the projected image preferably may be represented as manipulated data, and a relationship may be established between the image coordinates (image vector), and the world coordinates (world vector) based on the camera movement and the application of one or more movement components, such as a rotation matrix R(t) in this example, as well as the intrinsic camera matrix K.
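- For a camera undergoing pure rotation, the projected-image relationship above reduces to the homography K·R·K^-1 between the moved viewpoint and the designated look direction; the following sketch (with illustrative intrinsics and a 2-degree yaw) warps the old image center accordingly:

```python
import numpy as np

f, cx, cy = 800.0, 640.0, 360.0             # illustrative intrinsic parameters
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]])

theta = np.radians(2.0)                     # IMU-measured yaw since the designated direction
R = np.array([[ np.cos(theta), 0, np.sin(theta)],
              [ 0,             1, 0            ],
              [-np.sin(theta), 0, np.cos(theta)]])

H = K @ R @ np.linalg.inv(K)                # pure-rotation homography (no parallax term)

def warp(H, x, y):
    """Map a pixel through homography H and dehomogenize."""
    v = H @ np.array([x, y, 1.0])
    return v[0] / v[2], v[1] / v[2]

x_new, y_new = warp(H, cx, cy)              # where the old image center lands
```

Because no translation term appears, this mapping needs no depth information, which is why ignoring translational motion (as described above) keeps the manipulation tractable.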
- One or more additional movement components may be implemented by the device and system to generate the relationship for a projected image.
- the other movement components may be matrices, including, for example, one or more additional matrix multipliers, M3, M4 . . . Mn.
- the one or more additional movement components may be applied by the device and system to reflect one or more other movements of the camera. Corrections, such as compensation for rolling shutter may be implemented by the device and system, where a different movement component, such as a rotation matrix, applies for each row of the captured image (versus each frame).
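- The per-row rolling-shutter correction above can be sketched by assigning each sensor row its own interpolated rotation; linear interpolation of yaw across the readout period is an assumption of this sketch:

```python
import numpy as np

def row_angles(yaw_start, yaw_end, n_rows):
    """Per-row yaw for rolling-shutter correction: each row is exposed slightly
    later than the previous one, so each row gets its own interpolated rotation
    rather than one rotation per frame."""
    return np.linspace(yaw_start, yaw_end, n_rows)

angles = row_angles(0.00, 0.01, 1080)   # camera yawed 0.01 rad during frame readout
```

Each per-row angle would then drive its own rotation matrix, replacing the single per-frame R(t) described earlier.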
- Motion stabilized video may be generated from relating the image for the video frame or frame portion captured for a time.
- the system and device may be configured to coordinate camera movement, as indicated by the IMU or other motion sensing components, with the image captured.
- One or more components are applied to the image information, based on the relationship between the world coordinates and image coordinates.
- a synthetic camera may be generated so that an image may be produced having a look direction or synthesized viewpoint which represents a viewpoint that is desired or designated.
- the system and device generate a synthesized look direction where the resulting field of view substantially overlaps the one that was recorded.
- the system preferably is configured to synthesize a virtual camera and re-point the virtual camera to capture a designated target or subject.
- the system generates a synthetic camera and produces images from a selected or designated viewpoint, which may correspond with a viewpoint that the physical camera never actually held.
- One example of the implementation is the generation of an image from the synthetic camera, where the image may be produced having a look direction or synthesized viewpoint which represents a viewpoint that is desired (e.g., directed toward a target or desired direction).
- the look direction or synthesized viewpoint may maintain a desired look direction at a particular position (e.g., a position in front of the camera or the camera location).
- the look direction or synthesized viewpoint may maintain a desired look direction which may be relative to a camera position (such as an original camera position, or position of the camera within which an event, e.g., sound, flash, etc., occurs).
- the system may generate video images that are taken from a specified viewpoint, which may be a viewpoint different from the direction in which the camera was actually pointing.
- the video images may be taken from a designated viewpoint; for instances where the camera is pointed in the designated direction, the system will utilize that frame, and for instances where the camera is pointing in a direction other than the designated look direction, the system will utilize a synthetic camera image frame that has the look direction of the designated viewpoint.
- movements of the camera relative to the designated viewpoint may be related to one or more prior camera positions from which the synthetic viewpoint may be maintained.
- the designated viewpoint may be maintained by making further adjustments to the already adjusted image information. Further manipulating of the image information based on the change in image position relative to the previous position may be implemented to provide the viewpoint from the synthetic camera.
- the synthetic camera viewpoint is within a field of view that substantially overlaps the field of view recorded with the camera (e.g., on the camera image sensor).
- the adjustment components may be combined together and applied (as opposed to being sequentially applied).
- the synthetic camera may be generated for each camera movement, and the image may be stabilized by producing the video stream output that is manipulated to be from a desired or assigned point of view.
- the processing of the video images captured preferably is done at a high rate and is coordinated with the information obtained by the sensors, including the camera image sensor, the IMU and other sensors (e.g., microphone, temperature, etc.).
- the device preferably is configured to record the raw video captured, and may stream live motion stabilized video.
- the system may be configured with multiple unrelated cameras, which may be at different locations, including in widely dispersed geographic locations - lacking any pairwise overlapping FOVs.
- the FOA process may be implemented for each of the remotely situated cameras. A large number of application-dependent options are available for realizing this process.
- the FOA may be directly controlled. This may be accomplished by external inputs or requests communicated to the device 110 or camera, or may be generated and updated autonomously via a local processor; combinations where either of the two choices is used to assist or guide the other are also viable and may be implemented in conjunction with the system, method and devices.
- External requests may be manually directed by humans, driven by automation, or both. Requests may specify a fixed target direction (i.e., a fixed point at infinite distance, regardless of camera movement), or a fixed direction relative to the camera (e.g., directly in front, again regardless of camera movement). Alternately, the FOA may be adjusted at any time under external or program control in order to compensate for actual or predicted camera motion, to simulate camera motion, to filter or smooth camera motion, or to follow a desired target object.
- Fig. 2 shows the FOA process 215 making decisions based not only on external controls 216, but also knowledge of the motion of the system components 226 (typically each camera).
- the system and devices are configured to generate a focus of attention based on inputs from the image and other data utilized in connection with the sensor fusion 210.
- optional motion processing of the information may be carried out on the sensor fusion information (e.g., fused video or sensor-fused video).
- the motion processing preferably may be carried out for camera applications where a synthesized viewpoint must be achieved in the presence of camera motion.
- Also indicated in the diagram of Fig. 2 is the possibility of having the FOA process examine raw, partially-corrected, reprojected, or final image data 226 in order to guide viewpoint selection.
- the data examination capability may be implemented for achieving automatic or semi-autonomous target detection, acquisition, and tracking.
- Fig. 2 depicts motion processing 227, which is an optional process. Although motion processing 227 is depicted as a separate process item, according to some embodiments, it may be subsumed by the focus of attention (FOA) 215 mechanism. Motion processing 227 simplifies the FOA implementation by assuming responsibility for operations related to the processing of fused motion sensor information. In Fig. 2, an exemplary illustration of fused sensor information 228 is depicted in conjunction with motion processing 227.
- motion processing 227 may include complex mathematical operations (e.g., Kalman filtering) to improve the quality of IMU data by combining fused measurements from multiple sensors with potentially mixed modalities, and by deriving from the actual motion a smoothed or otherwise more desirable trajectory for any single camera or the overall sensor platform - the latter typically representing a stabilized motion that would subsequently lead to a correspondingly stabilized synthesized camera image stream.
- Kalman filtering may also be applied to create a smoothed virtual trajectory from actual motion, and for choosing a smoothed look direction (FOA) in some modes of operation.
- FIG. 5 is an illustration of another exemplary camera embodiment illustrating an implementation where a device 410 comprising a single camera is configured to image video and to produce stabilized video stream.
- the embodiment illustrated in Fig. 5 is similar to the devices and components discussed and illustrated herein and in connection with Fig. 1 , and the other figures.
- in Fig. 5, the reference to "synchronization" 209 is made instead of "sensor fusion" 210 (referenced in Fig. 2), the embodiment depicted in Fig. 5 representing an implementation where data is obtained from particular components, including a single fisheye lens 122f.
- the device 410 may include or be associated with one or more IMUs.
- a plurality of IMUs may be associated with each camera to provide information about camera movement, including position and orientation (as well as acceleration and other movement detection data).
- the system, method and devices may be configured to utilize one or more cameras (or lens options).
- One preferred embodiment is configured to use two cameras: one camera is configured with a fisheye lens, and the other camera preferably is a standard or conventional field-of-view camera that is designated for other views, such as close-ups.
- the cameras may provide separate independent video imaging and video streams; the device may switch between them to provide one or the other, may record both streams (including other information such as raw data), and may transmit one or both camera streams.
- the cameras may be configured to save power by operating one camera at a time.
- the standard or conventional camera typically requires less processing and less power than a wider field, fisheye lens camera. This may be directed by the device itself, wherein one or more of the two cameras may be able to detect event information and, based on the event information, operate the wide field camera or the close up camera.
- Event information may include any of the information provided by the sensors, and may include manual inputs that are received by one or more of the cameras (which may be transmitted remotely to the camera, or may be actuated using the camera or camera controls).
- the streams generated by the cameras are stabilized video streams, and may be generated by utilizing the information, such as the sensor data and image data.
- one or more virtual cameras with a focus of attention and field of view, as discussed herein, are generated.
- the image is manipulated to produce a stabilized image (video stream).
- intrinsic corrections to the image information are made to remove the distortion (or other effects of a lens), as discussed herein, by flattening, unwarping or unwrapping.
- the stabilized video from each camera is generated through the virtual camera information being used to construct the stream.
- within the device, the cameras preferably are configured to process their respective data streams. Alternatively, the cameras may comprise one or more separate lenses and one or more separate image sensors that are configured to operate using device components (e.g., processing components) of a single device.
- the system and methods disclosed herein may be implemented in connection with alternatively configured devices.
- although embodiments of the invention may be implemented using circuitry and components typically used in a mobile phone, the invention may also be implemented in other devices, including the mobile video cameras shown and discussed herein.
- the stabilization mechanism may be configured to provide highly detailed, high-resolution video from a mobile video camera, and may transmit live video from the field to a remote location.
- in Fig. 6 there is illustrated an exemplary embodiment of an image capture device 510 configured having the image stabilization mechanism described herein.
- the capture device 510 is depicted in an embodiment as a video camera that includes a housing 511, an image sensor 523, and circuitry 529 for capturing video from the image sensor 523 and from the other sensors of the device 510, including the IMU 516.
- the device or camera 510 preferably includes one or more components for recording audio (such as the sensor or microphone 121 represented in Fig. 1).
- the device or camera image sensor 523 preferably is a single sensor with very high resolution (currently at least 8 megapixels) and with a wide field of view (FOV) in two axes.
- the device 510 preferably includes a capture objective or lens 530, and according to preferred embodiments, the lens 530 is a fisheye lens.
- the image capture device 510 may be referred to as a camera (e.g., video camera).
- the image capture device 510 may be used while in motion itself to capture the image of a scene, a moving subject within a scene, or a number of moving subjects.
- the device 510 preferably carries out image capture in a continuous manner and provides video, which may comprise a unit number of frames per time period, such as, for example, frames per second.
- the image capture device 510 may be employed to capture a variety of moving subjects, such as, for example, individuals, vehicles, animals, as well as other objects.
- the IMU preferably may be any suitable IMU, including any of the IMU's discussed and depicted herein.
- the IMU comprises a component that provides three-axis, real-time IMU data including the orientation and motion of the camera 510.
- the camera 510 is configured with circuitry so that the camera motion (including position and orientation) preferably is captured simultaneously with camera video/audio (and in some instances, other sensor information provided by other sensors or input components).
- the camera 510 is provided with circuitry to store and communicate the image data and other sensor data outputs.
- the camera 510 preferably includes circuitry and preferably transmission components to produce and communicate real-time output comprising one or more video streams.
- the one or more output video streams each have 720p resolution (or greater).
- Embodiments of the device 510 may be configured to produce an output suitable for wireless transmission, which may comprise a field of view (FOV) that is more typical or standard than the wide-angle original image.
- the device 510 includes a storage component 525 onto which raw video and IMU data are stored locally, on the device.
- while the device 510 may communicate real-time, live video streams from the device (e.g., directly), in instances where the device 510 is operating in a location with insufficient bandwidth for real-time transmission, the stored information may be transmitted later, or transmission may resume when suitable bandwidth becomes available.
- raw data from the device 510 sensors, including the camera image sensors and other device sensors (e.g., the IMU), is stored as time-stamped data, and real-time outputs can be reproduced at any later time from this information.
- raw products, or alternative products such as, for example, one or more synthesized cameras and their respective viewpoints, or combinations thereof, may be produced from the stored raw data.
- the image capture device 510 has an image sensor 523 that has a high operating resolution, such as for example, 3.5K or 4K.
- the device 510 also includes a processing component, such as the processor 511, which is connected with other components, such as volatile memory 513, storage 525, an optional GPU 514, a clock 512, an IMU 516 and a microphone 521.
- a power supply such as, for example, a rechargeable battery 550, also is provided.
- the IMU 516 provides information that identifies the exact position and orientation of the image capture element, which in this embodiment is the device 510, where the lens 530 is fixed relative to the device body 511.
- the device 510 includes a processing component and software containing instructions to instruct the processing component to carry out manipulations of the sensor information.
- the data from the image sensor and IMU (as well as other sensors) of the device 510 or associated with the device 510 preferably are processed to provide a video stream that may be viewed on a suitable display.
- the display may be provided on the device 510, or remote from the device 510 at a location to which the video is communicated from the device through a suitable network.
- the video produced and transmitted by the device preferably is motion stabilized, high quality video.
- the device 510 is configured with instructions to process the information from the sensors by associating each data item with a timestamp.
- the sensors may be configured to record or ascertain information at a suitable time interval, such as, in the case of video, a number of frames per second (or per minute or other time interval), and, in the case of the IMU, every fraction of a second (or whenever a change from a stationary position is detected).
- the sensor information preferably is synchronized so that the processing of a video frame or stream coordinates the sensor information inputs at a point in time.
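As one illustrative sketch of this time-stamped synchronization (the sample rates, field names, and nearest-sample strategy are assumptions for illustration, not specified by the application), each video frame timestamp can be paired with the closest-in-time IMU reading:

```python
from bisect import bisect_left

def nearest_imu_sample(imu_samples, frame_ts):
    """Return the IMU sample whose timestamp is closest to frame_ts.

    imu_samples: list of (timestamp, reading) tuples sorted by timestamp.
    """
    timestamps = [t for t, _ in imu_samples]
    i = bisect_left(timestamps, frame_ts)
    if i == 0:
        return imu_samples[0]
    if i == len(imu_samples):
        return imu_samples[-1]
    before, after = imu_samples[i - 1], imu_samples[i]
    return after if after[0] - frame_ts < frame_ts - before[0] else before

# Hypothetical rates: IMU sampled at 200 Hz, video frames at 30 fps.
# Pair each frame with the nearest-in-time orientation reading.
imu = [(t * 0.005, {"yaw": t * 0.1}) for t in range(200)]   # 1 s of data
frame_times = [n / 30.0 for n in range(30)]
paired = [(ft, nearest_imu_sample(imu, ft)) for ft in frame_times]
```

Because the IMU runs much faster than the frame rate, each frame's paired sample is never more than half an IMU period away from the frame's capture time.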
- the IMU and video preferably are registered, either by design or through a calibration process.
- the device 510 may be configured with a suitable calibration routine where components, such as, for example, the IMU and camera sensor, ascertain data, and the data is related against one or more known or measurable conditions.
- the information obtained by the device or camera 510 preferably is processed for intrinsic correction, which may include processing the data to adjust the information parameters, such as, for example, using lens/camera calibration parameters to correct for static deviations, including fisheye distortion.
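As a simplified illustration of such a static fisheye correction (this equidistant-projection model and the numbers below are assumptions for illustration; a real device would use measured lens/camera calibration parameters), radii can be remapped from the fisheye projection (r = f·θ) to the rectilinear projection (r' = f·tan θ):

```python
import math

def equidistant_to_rectilinear(r_fisheye, focal_px):
    """Map a radial distance in an equidistant fisheye image (r = f * theta)
    to the radius it would have in a rectilinear (pinhole) image
    (r' = f * tan(theta)). A simplified model of static lens correction."""
    theta = r_fisheye / focal_px
    return focal_px * math.tan(theta)

# A point 300 px from center with f = 600 px subtends 0.5 rad;
# undistorted, it lands farther out, at 600 * tan(0.5) px.
r_out = equidistant_to_rectilinear(300.0, 600.0)
```

The mapping pushes off-center pixels outward, which is why straight lines that bow inward through a fisheye lens straighten after correction.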
- the device 510 also is configured to collect and store data, and provide images that are adjusted for extrinsic conditions.
- the device 510 may include software that has instructions to direct a processing component to manipulate the data (such as, for example, the fused sensor data, see Fig. 2, 210) to produce extrinsic corrections (see e.g. , Fig. 2, 213), which for example, may include compensation for rolling shutter using IMU data and video data.
- the device 510 preferably is configured to manipulate the information captured by the image sensor 523 and other sensors to reduce the effects of motion blur.
- the device stabilization mechanism may implement the processing of the data to provide corrected or adjusted data that may be used to produce motion stabilized video from a single device 510 or from a plurality of devices 510. In the case of the single device 510, one or more synthesized viewpoints are generated from the information.
- the synthesized viewpoints are generated from processing the information, preferably with the device processing component, to change the optical axis location by designating an optical axis at a different location, which may be a repositioned optical axis (if the camera was proceeding pursuant to a previous optical axis).
- the designated optical axis may be a designated location that is within the FOV of the camera.
- the optical axis may be changed or repositioned (or otherwise designated) to point anywhere within the physical FOV of the camera.
- the designation of a different or new optical axis provides a synthesized camera operating with a FOV of the designated optical axis.
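A minimal sketch of the constraint that a designated optical axis (and the synthesized camera's FOV around it) must stay within the physical FOV might look like this; the single-axis angle-budget check and all numbers are illustrative assumptions, not taken from the application:

```python
def axis_within_fov(axis_offset_deg, physical_fov_deg, virtual_fov_deg):
    """Check that a designated (virtual) optical axis, plus the half-width of
    the virtual camera's FOV, stays inside the physical FOV captured by the
    wide-angle lens (single-axis simplification)."""
    half_needed = abs(axis_offset_deg) + virtual_fov_deg / 2.0
    return half_needed <= physical_fov_deg / 2.0

# 180-degree fisheye capture; a 60-degree synthetic camera can point
# up to 60 degrees off-axis before its edge leaves the captured image.
print(axis_within_fov(45.0, 180.0, 60.0))   # True:  45 + 30 <= 90
print(axis_within_fov(75.0, 180.0, 60.0))   # False: 75 + 30 >  90
```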
- the device 510 is configured with instructions for manipulating the data by reprojecting the high-resolution data (combined with manipulation of the data for unwarping, e.g., of fish-eye or wide-field distortion) to produce a smaller FOV.
- the smaller field of view (FOV) generated by the device 510 preferably is produced having a more suitable resolution for the available transmission bandwidth.
- the device processing manipulations may be configured to produce an image resolution based on the bandwidth available.
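One way such bandwidth-driven resolution selection could be sketched is a simple format ladder; the format labels and bitrate thresholds below are purely illustrative assumptions, not values from the application:

```python
# Hypothetical ladder of output formats; bitrates are illustrative only.
FORMATS = [  # (label, width, height, minimum kbps required)
    ("1080p", 1920, 1080, 5000),
    ("720p",  1280,  720, 2500),
    ("480p",   854,  480, 1000),
    ("360p",   640,  360,  500),
]

def pick_output_format(available_kbps):
    """Choose the largest output resolution the measured bandwidth can carry."""
    for label, w, h, need in FORMATS:
        if available_kbps >= need:
            return label, w, h
    return "360p", 640, 360   # floor: always emit something

print(pick_output_format(3000))   # ('720p', 1280, 720)
```

A device could re-run this check periodically and step the stream up or down as measured bandwidth changes.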
- the device 510, although a single camera, preferably generates one or more, and preferably multiple, synthetic cameras. The multiple synthetic cameras may be utilized to provide video image streams from the respective multiple viewpoints available within the field of view.
- the image may also be manipulated to enhance the image by cropping the image to remove potential unfilled frame portions.
- the device 510 stabilization mechanism preferably manipulates the image and video (e.g., video image frames or streams) to provide enhanced video for viewing and displaying.
- the stabilization mechanism preferably has a component configured to evaluate the IMU motion history to keep track of orientation (device camera orientation).
- the device is configured with instructions that instruct the processing component to conduct a comparison of position/orientation (sensed at a point in time) against desired position/orientation. If no difference is determined, the processing concludes that movement has not taken place or, if it has, that it is not appreciable.
- the IMU preferably continues to track changes so that orientation/position is integrated and derived from these changes. In other words, even where there is no change in camera position at a particular time, the position/orientation information for that time is recorded and used for determining subsequent relative movement.
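The idea that orientation is integrated from continuously recorded rate samples, including intervals with no movement at all, can be sketched as follows (single axis, constant timestep, and all values are illustrative assumptions):

```python
def integrate_yaw(samples, dt):
    """Integrate angular-rate samples (deg/s) into an absolute yaw angle.

    Intervals with zero rate are still recorded, so every timestep has an
    orientation value that later processing can difference.
    """
    yaw = 0.0
    history = []
    for rate in samples:
        yaw += rate * dt
        history.append(yaw)
    return history

# 0.5 s of turning at 20 deg/s, then 0.5 s stationary (rate 0):
rates = [20.0] * 50 + [0.0] * 50
hist = integrate_yaw(rates, dt=0.01)
```

The stationary half of the history repeats the final 10-degree yaw value, which is exactly what lets subsequent samples be compared against it to detect new movement.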
- the device may be configured to provide thresholds for movement that would be a triggering movement for which an orientation change has occurred.
- the difference between one orientation value, such as a first orientation value (OV1), and a second orientation value (OV2) (which may be the next successive orientation value), is determined, and that difference between values (OV2-OV1) may be utilized by the device to synthesize the desired viewpoint in real time.
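Expressed with quaternions (the application does not prescribe a rotation representation, so this choice and the helper functions are illustrative assumptions), the difference OV2-OV1 is the relative rotation ov2 * conj(ov1):

```python
import math

def quat_mul(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def quat_conj(q):
    """Conjugate (inverse for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def relative_rotation(ov1, ov2):
    """OV2 - OV1 expressed as the rotation taking orientation 1 to 2."""
    return quat_mul(ov2, quat_conj(ov1))

def yaw_quat(deg):
    """Unit quaternion for a yaw (z-axis) rotation."""
    h = math.radians(deg) / 2.0
    return (math.cos(h), 0.0, 0.0, math.sin(h))

# Camera yawed from 10 to 25 degrees; the delta drives the
# counter-rotation applied to the synthesized viewpoint.
delta = relative_rotation(yaw_quat(10.0), yaw_quat(25.0))
delta_deg = math.degrees(2.0 * math.atan2(delta[3], delta[0]))
```

The recovered delta is the 15-degree yaw change, independent of the absolute starting orientation.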
- the orientation values also may include camera position, geographic specific location, as well as other information.
- a GPS component or chip of the camera, or associated with the camera preferably may provide geo-specific location information.
- the orientation values may include spatial coordinate data, (x,y,z) coordinates as well as one or more angular components, to determine spatial movement of the camera.
- translational movement, which may occur when the camera is moving, may not be included in the stabilization mechanism.
- Some alternate embodiments may employ a translation, but for most embodiments, the translation may be provided by adjunct information comprising a data parameter of geolocation (which may be provided by one or more of the device components or circuitry).
- the device preferably images and stabilizes the video in real time to generate a real-time video stream.
- a system comprising a plurality of devices, or devices with a plurality of cameras (or lenses) also may provide motion stabilized video.
- the random movements of the device or camera are adjusted to provide an enhanced video. For example, movements of an individual, such as, for example, a law enforcement officer wearing the device configured as a body camera, agitate the camera. The camera therefore experiences undesired movements, which are for the most part random movements.
- the synthesis of the video stream captured preferably is made from the image information captured through a fisheye lens. Extreme movements of the camera, e.g., pointing too high or too low, may still capture the target object within the field of view.
- the camera movement is sensed by the IMU (and other sensors) which handle rotations in all three axes.
- the image information and sensor data also are processed so as to streamline and optimize efficiency. For example, manipulations of the data for unwarping the images, stabilizing the images, adjusting the images for rolling shutter compensation, carrying out viewpoint synthesis, as well as generating a video stream with the adjustments applied thereto, may be efficiently carried out by a suitably configured device, such as those devices (110, 510).
- device processing components and stored instructions provided on the software contained on the device implement the image manipulations from the data to produce a stabilized video output (such as, for example, a live stream of stabilized video).
- the device 510 is configured to generate a target or focus of attention (FOA).
- the device implements a FOA determination that may be based on an estimated view of the law enforcement personnel, i.e., what the individual is looking at based on the individual's motion.
- the device is configured to sense the motion via the IMU-provided information, which preferably may be continuously monitored by the processing component of the device.
- the information may be stored, and, in some embodiments, may be ascertained as a number of position readings per interval, such as samples per second. According to other embodiments, the maximum amount of position data is captured and stored.
- the information is stored with a time stamp.
- the processing component may monitor the sensor data to determine when a threshold movement has occurred.
- the device may be configured to detect deliberate sudden turning motion for low-latency changes and differentiate this movement from the random sudden abrupt movements for which correction or adjustment of the video or image is beneficial.
- the device preferably is configured to follow the intended FOA in order to provide a desired field of view, by evaluating and comparing the position or movement information at subsequent time intervals.
- the device preferably processes the movement or other position data rapidly so as to make a rapid determination of whether to maintain or adjust the designated field of view.
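A rapid keep-or-adjust decision of this kind might be sketched as a classifier over recent per-frame orientation changes; the thresholds, window, and direction heuristic below are illustrative assumptions only:

```python
def classify_motion(yaw_deltas_deg, jitter_limit=3.0, sustain_frames=5):
    """Label a window of per-frame yaw changes (degrees).

    Sustained, same-signed movement above the limit is treated as a
    deliberate turn (follow it by moving the FOA); brief alternating
    spikes are treated as shake (stabilize it away).
    """
    big = [d for d in yaw_deltas_deg if abs(d) > jitter_limit]
    sustained = len(big) >= sustain_frames
    one_direction = big and (all(d > 0 for d in big) or all(d < 0 for d in big))
    return "deliberate-turn" if sustained and one_direction else "shake"

turn = classify_motion([4, 5, 6, 5, 4, 5])        # steady pan right
shake = classify_motion([7, -6, 5, -8, 4, -5])    # alternating jitter
```

A steady pan exceeds the limit in one direction for the whole window, while random agitation of similar magnitude alternates sign and is classified as shake.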
- the stabilization mechanism preferably is implemented by the device to adjust the video stream from the captured images (or video) from the camera.
- the device preferably stabilizes the video by manipulating the frame capture field of view to provide a virtual captured image, comprising an adjusted actual camera image, during the interval (however brief or long) of motion turbulence.
- the image preferably is adjusted for turbulent shaking motion, such as that produced by a camera user running with the camera.
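One simple way to picture this frame-level adjustment is a crop window that shifts against the measured shake within the wider captured frame; the pixel offsets and resolutions below are illustrative assumptions, and a real implementation would derive the offsets from the IMU-sensed rotation:

```python
def stabilized_crop(frame_w, frame_h, crop_w, crop_h, shake_dx, shake_dy):
    """Place the output crop window so it counters the measured shake.

    The sensor captures a wide frame; the delivered frame is a smaller
    window shifted opposite to the camera's motion, clamped so the crop
    never leaves the captured image (avoiding unfilled frame portions).
    """
    cx = (frame_w - crop_w) // 2 - shake_dx
    cy = (frame_h - crop_h) // 2 - shake_dy
    cx = max(0, min(cx, frame_w - crop_w))
    cy = max(0, min(cy, frame_h - crop_h))
    return cx, cy, crop_w, crop_h

# 4K sensor frame, 1080p delivered window, camera jumped
# 40 px right and 25 px down during this frame:
window = stabilized_crop(3840, 2160, 1920, 1080, 40, 25)
```

The clamping step corresponds to the cropping described above that removes potential unfilled frame portions when the shake exceeds the available margin.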
- the image and video captured by the device also may be adjusted for horizontal/vertical orientation. Embodiments also may provide remote operations of these features.
- the device may be utilized in connection with other installations where, for example, the camera is used or installed in a fixed orientation, for example, where the supposedly fixed camera may be subject to external forces like wind.
- Embodiments of the device may be configured with movement capabilities, where the lens may move (or the device may switch between one or more lenses), or with an externally directed view (e.g., directed with a joystick from an associated or remotely situated controller) to look around, or to move the device or its lens element to follow a target.
- Embodiments of the system, methods and devices preferably may be configured to allow external control, and networking allows coordination across multiple devices (or cameras). This may be implemented for increased situational awareness across one or more locations, or for optimizing transmission bandwidth by controlling the particular device and view being captured, based on its location, orientation, or other attribute.
- the device is configured as a camera to record and generate motion stabilized video from a camera being agitated.
- when the camera is shaking, the camera image moves around.
- the camera has a lens center and a direction in which it points (see, e.g., OA and OA' in Figs. 3A and 4A, respectively); when the camera moves, the orientation of this so-called look direction changes.
- Motion sensors of the device are configured to keep track of the changes and note the relative movement, i.e., a change in direction.
- the movement also may be determined from a particular location as well as a relative change.
- the motion sensors tracking the movement may be used to maintain a synthesized look direction.
- the synthesized look direction provides a focus of attention. The corrections are discussed herein and an example is illustrated in Figs. 7A,7B and 8A,8B.
- FIG. 7B shows an image taken with an image capture device configured according to the embodiments of the invention.
- the images in Figs. 7A,7B and 8A,8B are frames of a captured video stream.
- the image capture device is configured as a camera to record video.
- Fig. 7A depicts a standard image on the left (from a video frame), taken with a standard camera.
- the image (from a video frame) on the right (Fig. 7B) illustrates an image recorded with an embodiment of the camera configured according to the invention, such as, for example, the camera 510.
- Fig. 7A shows an image taken with a standard image capture device.
- each image represents a frame of video taken at the same time.
- the standard camera and the stabilization camera 510 each image the scene through a fisheye lens provided on each camera.
- the fisheye view distorts the scene.
- the image of Fig. 7B shows the scene with minimal or no distortion.
- the respective cameras that image the scenes shown in Figs. 7A and 7B were moved by shaking and rotating while continuing to record video of the scene.
- Figs. 8A and 8B illustrate the scene taken with the respective cameras under the conditions of movement, which in this example involve shaking and rotation. An indicator of the movement is provided; as shown in Figs. 8A and 8B, the conditions are roll of 1.33, pitch of 29.48 and yaw of 4.10.
- the respective cameras are imaging the scene from a particular location.
- the scene may be imaged where the cameras are changing their location, such as, for example, to follow a moving subject.
- An exemplary device, such as the camera 510, captured the images in Figs. 7B and 8B and, though subjected to the same movement conditions as the standard camera providing the respective scene images of Figs. 7A and 8A, generates an image that exhibits stabilization of the scene.
- the scene images shown in Figs. 7A and 8A show significant departure from the original positioning, although the subject has remained substantially or entirely static.
- the movements of the respective cameras are minimized in Figs. 7B and 8B to provide a stabilized image frame, whereas the standard camera shows movement within the scene frame.
- the frames depicted are frames taken from a captured video stream, and represent the stabilization in the images generated in Figs. 7B and 8B.
- the stabilization is generated for the video stream and the camera produces stabilized video.
- not only is the undesired motion "removed" (stabilized) from the generated video, but the severe distortion of the fisheye lens also has been removed.
- movement of the standard camera would be exhibited as shaking in the other frames captured while the camera is undergoing movement, and in the video generated by the displaying of the standard camera captured frames.
- the present camera such as, for example, the camera 510, is configured to image the scene from the direction of the camera lens.
- a wide field lens is used to capture more of a scene from the same camera location or viewpoint (e.g., to provide an expanded field of view).
- the capture device 510 is configured with a fisheye lens.
- the capture device 510 preferably includes circuitry for controlling the operations of the device 510.
- the circuitry includes a power supply 550, at least one image sensor 523, and may include one or more other sensors.
- In Figs. 9A and 9B, a scene is depicted to illustrate compensating for extrinsic distortion due to camera motion and rolling shutter distortion.
- Figs. 9A and 9B represent the process block 213 of Fig. 5 as an example of extrinsic adjustments that may be made.
- In Fig. 9A, the scene is depicted and exhibits extrinsic distortion.
- the extrinsic distortion in this example is due to the rolling-shutter effect. (Although the entire image depicted in Fig. 9A is subject to this effect, it is most easily seen in the high degree of "leaning" in the vertical lamppost.)
- Fig. 9B depicts an enhanced scene with a correction applied to normalize distortion.
- the scene is imaged with a standard field of view lens.
- the view of the scene illustrates certain objects as having a curvature to them due to rolling shutter. This is the extrinsic distortion due to rolling shutter, not the intrinsic distortion due to a fisheye lens.
- This video depicted in Figs. 9A and 9B was recorded without a fisheye, and is provided to demonstrate rolling shutter correction.
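The row-by-row nature of such a rolling-shutter correction can be sketched as follows; the constant-rate pan model and all numbers are illustrative assumptions, not taken from the application:

```python
def rolling_shutter_shift(rows, readout_s, yaw_rate_deg_s, px_per_deg):
    """Per-row horizontal correction for rolling-shutter 'lean'.

    Each sensor row is read at a slightly later time; a camera panning
    during readout displaces later rows sideways, so the correction
    shifts each row back by (row readout delay x angular rate x pixel
    scale). Assumes a constant pan rate over one frame.
    """
    shifts = []
    for r in range(rows):
        t = readout_s * r / (rows - 1)          # when this row was sampled
        shifts.append(-yaw_rate_deg_s * t * px_per_deg)
    return shifts

# 1080 rows read over 30 ms while panning at 20 deg/s, 50 px per degree:
shifts = rolling_shutter_shift(1080, 0.030, 20.0, 50.0)
```

The correction grows linearly from zero at the first row to its maximum at the last row, which is what straightens the "leaning" lamppost of Fig. 9A into the vertical of Fig. 9B.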
- the image on the right, Fig. 9B, is generated with the device by manipulating, with the device processing components and instructions, the captured image information that appears unprocessed in Fig. 9A.
- the extrinsic corrections 213 are applied to manipulate the image
- intrinsic corrections 212 account for the physical characteristics and geometry of the camera, including the lens and sensor.
- the intrinsic correction or adjustment is made to normalize, or reduce the distortion.
- the device stabilization mechanism preferably implements stabilization by manipulating the distortion-normalized image data to produce a stabilized image.
- references herein to the device image also refer to video.
- the camera device assigns or designates a focus of attention (a look point).
- the camera device is configured to maintain the look point.
- the camera position is moved along with the look point, so the new camera position (the moved position) has a new look point.
- the movements of the camera are detected by the IMU (and possibly other motion sensing components).
- the virtual camera is synthesized and may provide a number of synthetic images or image portions corresponding with the change in camera position or orientation.
- the image and sensor information are ascertained and stored.
- the image information also is processed so that as the camera motion takes place, and the designated look point is moved or disrupted, the camera device implements a look point from one or more synthesized virtual cameras.
- the virtual camera designated look point is used to generate or designate a video portion, such as a frame or portion of a frame, to produce a video that synthesizes from the actual image information and data a corresponding video or portion that provides the designated look point.
- the process may continue for each camera movement or disruption, and provide an output of stabilized video, which may be a stabilized stream of video.
- An image recording device according to the invention may be constructed as shown and described herein.
- the image recording device such as a camera, may comprise a stabilization mechanism having at least one movement sensor for sensing movement and providing movement data, an image sensor disposed to receive an image thereon, and a hardware processor configured with software containing instructions to process movement information comprising movement data from the motion sensor and image data from the image sensor.
- the image recording device has a lens and other components and can capture and record video frames.
- the device, through the device sensors, including one or more IMUs or IMU components, identifies changes in position between successive frame captures from the information provided by the IMU or other movement sensors.
- the changes in position are assigned a first delta which comprises a position change between a first position and a second position.
- the first position corresponds with a first frame
- the second position corresponds with a second frame.
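A sketch of recording such a first delta between the positions of successive frames follows; the pose fields and sample values are illustrative assumptions:

```python
from collections import namedtuple

Pose = namedtuple("Pose", "t yaw pitch roll")   # illustrative fields

def frame_deltas(poses):
    """Pair successive frame poses and record the position change (delta)
    between each first position and the following second position."""
    return [
        (a, b, Pose(b.t - a.t, b.yaw - a.yaw, b.pitch - a.pitch, b.roll - a.roll))
        for a, b in zip(poses, poses[1:])
    ]

# Three frames at ~30 fps with a slow pan right and slight tilt:
poses = [Pose(0.000, 0.0, 0.0, 0.0),
         Pose(0.033, 1.5, -0.4, 0.1),
         Pose(0.066, 2.9, -0.7, 0.1)]
deltas = frame_deltas(poses)
```

Each recorded delta is exactly the per-frame change that the evaluation step below compares against a movement threshold.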
- the lens of the device has a corresponding focus of attention and a field of view represented on the sensor.
- the device generates one or more virtual cameras synthesized from the information.
- the virtual camera has a first virtual camera focus of attention and a first virtual camera field of view.
- the device also utilizes the processing components and circuitry, with stored instructions contained in device software, to instruct the processor to carry out an evaluation of the movement information and determine whether the movement meets a threshold that requires a corrective adjustment. Where corrective adjustment is determined to be required, the device produces an adjusted video stream which includes one or more frames or frame portions from the first virtual camera and has the first virtual camera focus of attention for those one or more frames or frame portions.
- the device may continue to monitor camera movement or turbulence, and continue to generate synthesized virtual cameras having a desired look direction, even where the camera has moved from the intended (i.e., desired) or original look direction. From the synthetic camera, the motion stabilized video stream is produced, and continues to be generated by the imaging device.
- a plurality of physical cameras are utilized from which are produced a plurality of virtual cameras synthesized from the respective camera image information.
- the system may be configured to utilize multiple cameras, such as the camera devices 510 shown and described herein, to increase effective virtual/available FOV and increase the available angles to be synthesized.
- Embodiments provide camera captures of video, and may store the information in one or more separate databases, as well as a collective database.
- each camera may be configured to provide stabilized video.
- the plurality of cameras provides for a plurality of fields of view.
- the cameras may be coordinated with each other, or with one or more coordinating components, to provide a relative image capture location parameter, so that the images captured by one camera's field of view may be related to the FOV of any other camera.
- the cameras also preferably are configured to generate stabilized video (or images) and may generate the stabilized video from a plurality of video streams from one or more virtual synthesized cameras synthesized from one or more of the plurality of cameras.
- the imaging system includes a plurality of cameras that image the area surrounding the camera location, which may be an individual, in the case of the camera being worn as a body camera, or object where the camera is carried on an object such as a vehicle.
- the plurality of cameras preferably are recording, and the recording includes one or more, or a continuum of sequences that takes place at the same time.
- information imaged from a plurality of directions may be captured.
- the imaging also may be captured where the cameras (or one or more of them) are shaking, and the subject is in motion.
- the depiction of the optical axis in Figs. 3A, 3B and 4A, 4B considers rotation of the camera, but does not depict translation.
- the processing of image information may include steps to eliminate or reduce distortion, correct barrel distortion, or adjust a horizon, such as leveling it, as well as adjusting perspective distortion through manipulation of a vanishing point or other image data.
- Components such as the devices 110, 410, 510 and cameras 122, depict exemplary embodiments for carrying out the method, and comprising a system for producing stabilized video streams.
- the devices and cameras may include global positioning system components, such as GPS location chips, and GPS location data may be part of a device (or camera) data profile (along with other information as to position, orientation and movement).
- the features disclosed and shown herein in connection with embodiments may be applied to one or more other embodiments, and one or more features may be combined or provided together.
- the image may be adjusted to provide an isometric viewpoint, but have infinite or increased zoom capability.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Devices (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562271046P | 2015-12-22 | 2015-12-22 | |
US62/271,046 | 2015-12-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017112800A1 true WO2017112800A1 (en) | 2017-06-29 |
Family
ID=59091181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2016/068093 WO2017112800A1 (en) | 2015-12-22 | 2016-12-21 | Macro image stabilization method, system and devices |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017112800A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110493203A (en) * | 2019-07-31 | 2019-11-22 | 湖南微算互联信息技术有限公司 | A kind of cloud cell-phone camera head controlling method, system, device and storage medium |
WO2020154196A1 (en) * | 2019-01-22 | 2020-07-30 | Daqri, Llc | Systems and methods for generating composite depth images based on signals from an inertial sensor |
CN111586284A (en) * | 2019-02-19 | 2020-08-25 | 北京小米移动软件有限公司 | Scene recognition prompting method and device |
US10867220B2 (en) | 2019-05-16 | 2020-12-15 | Rpx Corporation | Systems and methods for generating composite sets of data from different sensors |
CN112734653A (en) * | 2020-12-23 | 2021-04-30 | 影石创新科技股份有限公司 | Motion smoothing processing method, device and equipment for video image and storage medium |
CN112740652A (en) * | 2018-09-19 | 2021-04-30 | 高途乐公司 | System and method for stabilizing video |
EP3767945A4 (en) * | 2018-03-16 | 2022-03-16 | Arashi Vision Inc. | Anti-shake method for panoramic video, and portable terminal |
US11696027B2 (en) | 2018-05-18 | 2023-07-04 | Gopro, Inc. | Systems and methods for stabilizing videos |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150163408A1 (en) * | 2013-11-01 | 2015-06-11 | The Lightco Inc. | Methods and apparatus relating to image stabilization |
US20150254825A1 (en) * | 2014-03-07 | 2015-09-10 | Texas Instruments Incorporated | Method, apparatus and system for processing a display from a surround view camera solution |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3767945A4 (en) * | 2018-03-16 | 2022-03-16 | Arashi Vision Inc. | Anti-shake method for panoramic video, and portable terminal |
US11696027B2 (en) | 2018-05-18 | 2023-07-04 | Gopro, Inc. | Systems and methods for stabilizing videos |
CN112740652B (en) * | 2018-09-19 | 2022-07-08 | 高途乐公司 | System and method for stabilizing video |
US11979662B2 (en) | 2018-09-19 | 2024-05-07 | Gopro, Inc. | Systems and methods for stabilizing videos |
US11678053B2 (en) | 2018-09-19 | 2023-06-13 | Gopro, Inc. | Systems and methods for stabilizing videos |
CN112740652A (en) * | 2018-09-19 | 2021-04-30 | 高途乐公司 | System and method for stabilizing video |
US11647289B2 (en) | 2018-09-19 | 2023-05-09 | Gopro, Inc. | Systems and methods for stabilizing videos |
WO2020154196A1 (en) * | 2019-01-22 | 2020-07-30 | Daqri, Llc | Systems and methods for generating composite depth images based on signals from an inertial sensor |
US11082607B2 (en) | 2019-01-22 | 2021-08-03 | Facebook Technologies, Llc | Systems and methods for generating composite depth images based on signals from an inertial sensor |
CN113491112A (en) * | 2019-01-22 | 2021-10-08 | 脸谱科技有限责任公司 | System and method for generating a synthesized depth image based on signals from inertial sensors |
CN111586284B (en) * | 2019-02-19 | 2021-11-30 | 北京小米移动软件有限公司 | Scene recognition prompting method and device |
CN111586284A (en) * | 2019-02-19 | 2020-08-25 | 北京小米移动软件有限公司 | Scene recognition prompting method and device |
US11403499B2 (en) | 2019-05-16 | 2022-08-02 | Facebook Technologies, Llc | Systems and methods for generating composite sets of data from different sensors |
US10867220B2 (en) | 2019-05-16 | 2020-12-15 | Rpx Corporation | Systems and methods for generating composite sets of data from different sensors |
CN110493203A (en) * | 2019-07-31 | 2019-11-22 | 湖南微算互联信息技术有限公司 | A kind of cloud cell-phone camera head controlling method, system, device and storage medium |
CN112734653A (en) * | 2020-12-23 | 2021-04-30 | 影石创新科技股份有限公司 | Motion smoothing processing method, device and equipment for video image and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12003692B2 (en) | Systems, methods and apparatus for compressing video content | |
WO2017112800A1 (en) | Macro image stabilization method, system and devices | |
US11647204B2 (en) | Systems and methods for spatially selective video coding | |
US11490054B2 (en) | System and method for adjusting an image for a vehicle mounted camera | |
US11475538B2 (en) | Apparatus and methods for multi-resolution image stitching | |
US10666856B1 (en) | Gaze-directed photography via augmented reality feedback | |
US11671712B2 (en) | Apparatus and methods for image encoding using spatially weighted encoding quality parameters | |
US10404915B1 (en) | Method and system for panoramic video image stabilization | |
JP5659305B2 (en) | Image generating apparatus and image generating method | |
TWI503786B (en) | Mobile device and system for generating panoramic video | |
JP5659304B2 (en) | Image generating apparatus and image generating method | |
US11398008B2 (en) | Systems and methods for modifying image distortion (curvature) for viewing distance in post capture | |
JP5769813B2 (en) | Image generating apparatus and image generating method | |
EP4362485A2 (en) | High dynamic range processing based on angular rate measurements | |
AU2012256370B2 (en) | Panorama processing | |
JP5865388B2 (en) | Image generating apparatus and image generating method | |
US10051180B1 (en) | Method and system for removing an obstructing object in a panoramic image | |
AU2019271924B2 (en) | System and method for adjusting an image for a vehicle mounted camera | |
CN113891000A (en) | Shooting method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16880044 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16880044 Country of ref document: EP Kind code of ref document: A1 |
|