US20130321396A1 - Multi-input free viewpoint video processing pipeline - Google Patents
- Publication number: US20130321396A1 (application US13/599,170)
- Authority: US (United States)
- Prior art keywords: scene, proxy, streams, vcds, current
- Legal status: Abandoned (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06T15/04—Texture mapping
- G06T15/08—Volume rendering
- G06T15/205—Image-based rendering
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- H04N13/117—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
- H04N13/194—Transmission of image signals
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
- H04N13/246—Calibration of cameras
- H04N13/257—Colour aspects
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
- H04N7/15—Conference systems
- H04N7/157—Conference systems defining a virtual conference space and using avatars or agents
- G06T2210/56—Particle system, point based geometry or rendering
- H04R2227/005—Audio distribution systems for home, i.e. multi-room use
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Definitions
- a given video generally includes one or more scenes, where each scene in the video can be either relatively static (e.g., the objects in the scene do not substantially change or move over time) or dynamic (e.g., the objects in the scene substantially change and/or move over time).
- in a conventional video, the viewpoint of each scene is chosen by the director when the video is recorded/captured, and this viewpoint cannot be controlled or changed by an end user while they are viewing the video.
- in other words, the viewpoint of each scene is fixed and cannot be modified when the video is being rendered and displayed.
- in a free viewpoint video, by contrast, an end user can interactively control and change their viewpoint of each scene at will while they are viewing the video.
- each end user can interactively request different synthetic (i.e., virtual) viewpoints of each scene on-the-fly when the video is being rendered and displayed.
- Free viewpoint video processing pipeline technique embodiments described herein are generally applicable to generating a free viewpoint video of a scene and presenting it to a user.
- an arrangement of sensors is used to capture the scene, where the arrangement includes a plurality of video capture devices and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective.
- the streams of sensor data are then input and calibrated.
- a scene proxy is then generated from the calibrated streams of sensor data, where the scene proxy geometrically describes the scene as a function of time and includes one or more types of geometric proxy data which is matched to a first set of current pipeline conditions in order to maximize the photo-realism of the free viewpoint video that results from the scene proxy at each point in time.
- This scene proxy generation includes the following actions.
- the current pipeline conditions in the first set are periodically analyzed.
- the results of this periodic analysis are then used to select one or more different 3D (three-dimensional) reconstruction methods which are matched to these current pipeline conditions.
- the selected 3D reconstruction methods are then used to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data.
- the 3D reconstructions of the scene and the results of the periodic analysis are then used to generate the scene proxy.
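The scene proxy generation steps just listed (analyze the first set of conditions, select matching 3D reconstruction methods, run them, and combine their output with the analysis results) can be sketched as follows. This is a minimal, hypothetical illustration rather than the patent's implementation: the `PipelineConditions` fields, the method names, and the rule that each available stream type enables the corresponding point cloud reconstruction (mirroring the depth, infrared and color variants of FIGS. 9-11) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PipelineConditions:
    # Hypothetical summary of the first set of current pipeline conditions.
    has_depth_streams: bool
    has_infrared_streams: bool
    num_color_streams: int


@dataclass
class SceneProxy:
    # Hypothetical container: one geometric proxy entry per selected method.
    timestamp: float
    geometry: Dict[str, object] = field(default_factory=dict)
    conditions: Optional[PipelineConditions] = None


def analyze_conditions(streams: Dict[str, list]) -> PipelineConditions:
    # Periodic analysis: inspect which calibrated stream types are present.
    return PipelineConditions(
        has_depth_streams=bool(streams.get("depth")),
        has_infrared_streams=bool(streams.get("infrared")),
        num_color_streams=len(streams.get("color", [])),
    )


def select_reconstruction_methods(c: PipelineConditions) -> List[str]:
    # Assumed matching rules: each available stream type enables the
    # corresponding point cloud reconstruction variant.
    methods = []
    if c.has_depth_streams:
        methods.append("point_cloud_from_depth")
    if c.has_infrared_streams:
        methods.append("point_cloud_from_infrared")
    if c.num_color_streams >= 2:  # assumed: stereo matching needs 2+ color views
        methods.append("point_cloud_from_color")
    return methods


def generate_scene_proxy(
    streams: Dict[str, list],
    timestamp: float,
    reconstructors: Dict[str, Callable[[Dict[str, list]], object]],
) -> SceneProxy:
    # Analyze, select, reconstruct, then bundle the results with the analysis.
    conditions = analyze_conditions(streams)
    proxy = SceneProxy(timestamp=timestamp, conditions=conditions)
    for name in select_reconstruction_methods(conditions):
        proxy.geometry[name] = reconstructors[name](streams)
    return proxy
```

Because the analysis is repeated periodically, the set of reconstruction methods can change from one proxy timestamp to the next as sensors are added, removed or rearranged.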
- the scene proxy is input.
- a current synthetic viewpoint of the scene is then generated from the scene proxy, where this current synthetic viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a second set of current pipeline conditions.
- the current synthetic viewpoint of the scene is then displayed.
- the current synthetic viewpoint generation includes the following actions.
- the current pipeline conditions in the second set are periodically analyzed.
- the results of this periodic analysis are then used to select one or more different image-based rendering methods which are matched to these current pipeline conditions.
- the selected image-based rendering methods and the results of the periodic analysis are then used to generate the current synthetic viewpoint of the scene.
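One way to picture the rendering-method selection step is a function that reduces the second set of conditions to a single quantity and picks a method from it. The sketch below is hypothetical: it uses only the one rule given later in this description (a low-fidelity proxy such as a billboard can work well when the end user's viewpoint is close to the axis of one of the VCDs). The `Vcd` type, the 10-degree threshold, and the `"view_interpolation"` fallback are illustrative placeholders, not conditions taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Vcd:
    position: Vec3
    axis: Vec3  # optical axis direction (need not be unit length)


def _angle_between(a: Vec3, b: Vec3) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def select_rendering_method(
    view_dir: Vec3, vcds: Sequence[Vcd], near_axis_rad: float = math.radians(10.0)
) -> str:
    # Condition used here: angle between the requested viewing direction
    # and the nearest VCD optical axis.
    nearest = min(_angle_between(view_dir, v.axis) for v in vcds)
    if nearest <= near_axis_rad:
        return "billboard"  # low-fidelity proxy is adequate near a VCD axis
    return "view_interpolation"  # hypothetical placeholder fallback
```

A fuller selector would add rules for image warping/morphing, view interpolation and lumigraph/light-field rendering, and would be re-run whenever the periodic analysis updates the conditions.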
- FIG. 1 is a diagram illustrating an exemplary embodiment, in simplified form, of the various stages in the free viewpoint video processing pipeline.
- FIG. 2 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a free viewpoint video of a scene.
- FIG. 3 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by an arrangement of sensors that is being used to capture the scene.
- FIG. 4 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a scene proxy from the calibrated streams of sensor data.
- FIG. 5 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a free viewpoint video of a scene to an end user.
- FIG. 6 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a current synthetic viewpoint of the scene from the scene proxy.
- FIG. 7 is a diagram illustrating an exemplary embodiment, in simplified form, of a continuum of the various exemplary image-based rendering methods which can be employed by the free viewpoint video processing pipeline technique embodiments described herein.
- FIG. 8 is a diagram illustrating the various degrees of viewpoint navigation freedom that can be supported by the pipeline technique embodiments described herein.
- FIG. 9 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using a point cloud 3D (three-dimensional) reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene.
- FIG. 10 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene.
- FIG. 11 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene.
- FIG. 12 is a diagram illustrating a simplified example of a general-purpose computer system on which various embodiments and elements of the free viewpoint video processing pipeline technique, as described herein, may be implemented.
- the term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate a stream of sensor data that represents a given scene.
- the pipeline technique embodiments described herein employ a plurality of sensors which can be configured in various arrangements to capture a scene, thus allowing a plurality of streams of sensor data to be generated each of which represents the scene from a different geometric perspective.
- Each of the sensors can be any type of video capture device (VCD) (e.g., any type of video camera), or any type of audio capture device, or any combination thereof.
- Each of the sensors can also be either static (i.e., the sensor has a fixed spatial location and a fixed rotational orientation which do not change over time), or moving (i.e., the spatial location and/or rotational orientation of the sensor change over time).
- the pipeline technique embodiments can employ a combination of different types of sensors to capture a given scene.
- baseline is used herein to refer to a ratio of the actual physical distance between a given pair of VCDs to the average of the actual physical distance from each VCD in the pair to the viewpoint of the scene.
- when this ratio is larger than a prescribed value, the pair of VCDs is referred to herein as a wide baseline stereo pair of VCDs.
- when this ratio is smaller than the prescribed value, the pair of VCDs is referred to herein as a narrow baseline stereo pair of VCDs.
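The baseline definition above is a simple ratio, so it can be computed directly from 3D positions. In this minimal sketch, positions are assumed to be points in a common world frame, and `threshold` stands in for the unspecified "prescribed value"; a ratio exactly equal to the threshold is classified as narrow here, which is an arbitrary choice since the definition does not cover that case.

```python
import math
from typing import Sequence


def baseline_ratio(
    vcd_a: Sequence[float], vcd_b: Sequence[float], viewpoint: Sequence[float]
) -> float:
    # Numerator: physical distance between the two VCDs.
    separation = math.dist(vcd_a, vcd_b)
    # Denominator: average distance from each VCD to the viewpoint of the scene.
    mean_range = (math.dist(vcd_a, viewpoint) + math.dist(vcd_b, viewpoint)) / 2.0
    return separation / mean_range


def classify_stereo_pair(
    vcd_a: Sequence[float],
    vcd_b: Sequence[float],
    viewpoint: Sequence[float],
    threshold: float,
) -> str:
    # `threshold` is a hypothetical stand-in for the prescribed value.
    ratio = baseline_ratio(vcd_a, vcd_b, viewpoint)
    return "wide" if ratio > threshold else "narrow"
```

For example, two cameras 2 m apart, each about 2.24 m from the viewpoint, give a ratio of about 0.89, while cameras 10 cm apart at the same range give roughly 0.05.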
- the pipeline technique embodiments described herein generally involve an FVV processing pipeline for generating an FVV of a given scene and presenting the FVV to one or more end users.
- the pipeline technique embodiments are advantageous for various reasons including, but not limited to, the following.
- the pipeline technique embodiments create a feeling of immersion for any end user who is viewing a rendering of the captured scene, thus enhancing their viewing experience.
- the pipeline technique embodiments also enable optimal viewpoint navigation for up to six degrees of viewpoint navigation freedom.
- the pipeline technique embodiments described herein do not rely upon having to constrain the FVV processing pipeline in order to produce a desired visual result.
- the pipeline technique embodiments eliminate the need to place constraints on the FVV processing pipeline in order to generate various synthetic viewpoints of the scene which are photo-realistic and thus are free of discernible artifacts.
- the pipeline technique embodiments eliminate having to constrain the arrangement of the sensors that are used to capture the scene. Accordingly, the pipeline technique embodiments are operational with any arrangement of sensors.
- the pipeline technique embodiments also eliminate having to constrain the complexity or composition of the scene that is being captured (e.g., neither the environment(s) in the scene, nor the types of objects in the scene, nor the number of people in the scene, among other things, has to be constrained). Accordingly, the pipeline technique embodiments are operational with any type of scene, including both relatively static and dynamic scenes.
- the pipeline technique embodiments also eliminate having to constrain the number or types of sensors that are used to capture the scene. Accordingly, the pipeline technique embodiments are operational with any number of sensors and all types of sensors.
- the pipeline technique embodiments also eliminate having to constrain the number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene. Accordingly, the pipeline technique embodiments can produce visual results having as many as six degrees of viewpoint navigation freedom.
- the pipeline technique embodiments can also produce visual results having just one degree of viewpoint navigation freedom.
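The range from one to six degrees of viewpoint navigation freedom can be illustrated by a viewpoint pose with three translational and three rotational components, where navigation requests are applied only to the degrees of freedom that are currently enabled. The `Viewpoint` type, the axis names, and the `navigate` function below are hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet

# Names for the six possible degrees of viewpoint navigation freedom.
ALL_DOF = ("x", "y", "z", "yaw", "pitch", "roll")


@dataclass(frozen=True)
class Viewpoint:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0


def navigate(
    view: Viewpoint, deltas: Dict[str, float], enabled: FrozenSet[str]
) -> Viewpoint:
    # Apply only the requested changes whose degree of freedom is enabled.
    # With enabled == {"yaw"}, the viewer has a single degree of freedom.
    updates = {}
    for dof, delta in deltas.items():
        if dof not in ALL_DOF:
            raise ValueError(f"unknown degree of freedom: {dof}")
        if dof in enabled:
            updates[dof] = getattr(view, dof) + delta
    return replace(view, **updates)
```

Making the enabled set a parameter reflects the point above: the number of degrees of freedom is a pipeline setting, not something fixed by the sensor arrangement.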
- the pipeline technique embodiments described herein do not rely upon having to use a specific 3D (three-dimensional) reconstruction method in the FVV processing pipeline to generate a 3D reconstruction of the captured scene. Accordingly, the pipeline technique embodiments support the use of any one or more 3D reconstruction methods in the pipeline and therefore provide the freedom to use whatever 3D reconstruction method(s) produces the desired visual result (e.g., the highest degree of photo-realism for the particular scene being captured and the desired number of degrees of viewpoint navigation freedom) based on the particular characteristics of the streams of sensor data that are generated by the sensors (e.g., based on factors such as the particular number and types of sensors that are used to capture the scene, and the particular arrangement of these sensors that is used), along with other current pipeline conditions.
- the pipeline technique embodiments described herein do not rely upon having to use a specific image-based rendering method in the FVV processing pipeline during the rendering and end user viewing of the captured scene. Accordingly, the pipeline technique embodiments support the use of any one or more image-based rendering methods in the pipeline and therefore provide the freedom to use whatever image-based rendering method(s) produces the desired visual result based on the particular characteristics of the streams of sensor data that are generated by the sensors, along with other current pipeline conditions.
- an image-based rendering method that renders a lower fidelity 3D geometric proxy of the captured scene may produce an optimally photo-realistic visual result when the end user's viewpoint is close to the axis of one of the VCDs (such as with billboards).
- a conventional image warping/morphing image-based rendering method may produce an optimally photo-realistic visual result.
- a conventional view interpolation image-based rendering method may produce an optimally photo-realistic visual result.
- a conventional lumigraph or light-field image-based rendering method may produce an optimally photo-realistic visual result.
- the pipeline technique embodiments described herein result in a flexible, robust and commercially viable next generation FVV processing pipeline that meets the needs of today's various creative video producers and editors.
- the pipeline technique embodiments are applicable to various types of video-based media applications such as consumer entertainment (e.g., movies, television shows, and the like) and video-conferencing/telepresence, among others.
- the pipeline technique embodiments support a broad range of features that provide for the capture (i.e., recording), processing, storage, distribution, rendering, and end user viewing of any type of FVV that can be generated.
- Various implementations of the pipeline technique embodiments are possible, where each different implementation supports a different type of FVV. Exemplary types of supported FVV are described in more detail hereafter.
- the pipeline technique embodiments described herein allow any one or more parameters in the FVV processing pipeline to be freely modified without introducing artifacts into the FVV that is presented to the one or more end users. This allows the photo-realism of the FVV that is presented to each end user to be maximized (i.e., the artifacts are minimized) regardless of the characteristics of the various sensors that are used to capture the scene, and the characteristics of the various streams of sensor data that are generated by the sensors.
- Exemplary pipeline parameters which can be modified include, but are not limited to, the following.
- the number and types of sensors that are used to capture the scene can be modified.
- the arrangement of the sensors can also be modified. Which if any of the sensors is static and which is moving can also be modified.
- the complexity and composition of the scene can also be modified. Whether the scene is relatively static or dynamic can also be modified.
- the 3D reconstruction methods and image-based rendering methods that are used can also be modified.
- the number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene can also be modified.
- FIG. 1 illustrates an exemplary embodiment, in simplified form, of the various stages in the FVV processing pipeline.
- the FVV processing pipeline 100 starts with a capture stage 102 during which, and generally speaking, the following actions take place.
- An arrangement of sensors is used to capture a given scene, where the arrangement includes a plurality of VCDs and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective.
- These streams of sensor data are input from the sensors and calibrated in a manner which will be described in more detail hereafter.
- the calibrated streams of sensor data are then output to a processing stage 104 .
- the pipeline technique embodiments described herein support the use of various types, various numbers and various combinations of sensors which can be configured in various arrangements, including both 2D and 3D arrangements, where each of the sensors can be either static or moving.
- the processing stage 104 of the FVV processing pipeline 100 inputs the calibrated streams of sensor data.
- a scene proxy which geometrically describes the captured scene as a function of time is then generated from the calibrated streams of sensor data.
- the scene proxy is then output to a storage and distribution stage 106 .
- the scene proxy includes one or more types of geometric proxy data which is matched to a first set of current conditions in the pipeline 100 , where these conditions are generally associated with the specific implementation of the pipeline technique embodiments that is being used.
- the scene proxy is generated using one or more different 3D reconstruction methods which extract 3D geometric information from the calibrated streams of sensor data.
- the particular 3D reconstruction methods that are used and the particular manner in which the scene proxy is generated are determined based on a periodic analysis of the first set of current conditions.
- the pipeline technique embodiments described herein use automated computer-vision-type 3D reconstruction methods which can operate without human input.
- the storage and distribution stage 106 of the FVV processing pipeline 100 inputs the scene proxy and based on the specific implementation of the pipeline technique embodiments described herein and the related type of FVV that is being processed in the pipeline (i.e., the type of FVV that is being generated and presented to the one or more end users), can either store the scene proxy, or output the scene proxy and distribute it to one or more end users who either are, or will be, viewing the FVV, or both.
- this distribution takes place by transmitting the scene proxy over whatever one or more data communication networks the end users' computing devices are connected to. It will be appreciated that this transmission is implemented in a manner that meets the needs of the specific implementation of the pipeline technique embodiments and the related type of FVV that is being processed in the pipeline 100 .
- a rendering stage 108 of the FVV processing pipeline 100 inputs the scene proxy which is output from the storage and distribution stage 106 .
- a current synthetic viewpoint of the captured scene is then generated from the scene proxy, where this current synthetic viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a second set of current conditions in the pipeline 100 .
- this second set of current conditions may include viewpoint navigation information which is output by a user viewing experience stage 110 , and may also include temporal navigation information which can also be output by the user viewing experience stage.
- the current synthetic viewpoint is then output to the user viewing experience stage 110 .
- the second set of current conditions is also generally associated with the specific implementation of the pipeline technique embodiments that is being used.
- the current synthetic viewpoint of the captured scene is generated using one or more different image-based rendering methods.
- the particular image-based rendering methods that are used and the particular manner in which the current synthetic viewpoint of the captured scene is generated are determined based on a periodic analysis of the second set of current conditions.
- the user viewing experience stage 110 of the FVV processing pipeline 100 generally provides the one or more end users with the ability to view the current synthetic viewpoint of the captured scene on a display device and spatiotemporally navigate/control this viewpoint on-the-fly at will.
- the user viewing experience stage 110 provides each end user with the ability to continuously and interactively navigate/control their viewpoint of the scene that is being displayed on their display device.
- the user viewing experience stage 110 may also provide each end user with the ability to interactively temporally navigate/control the FVV at will.
- the current synthetic viewpoint of the captured scene is input from the rendering stage 108 and displayed on each end user's display device.
- each end user can interactively navigate their viewpoint of the scene and based on this viewpoint navigation the rendering stage 108 will modify the current synthetic viewpoint of the scene accordingly.
- each end user can also interactively temporally control the FVV and based on this temporal control the rendering stage 108 will either temporally pause/stop, or rewind, or fast forward the FVV accordingly.
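The temporal controls just described (pause/stop, rewind, fast forward) amount to moving a playback position along the scene proxy's timeline. The sketch below is a hypothetical controller working in frame indices; the class name, method names, and default speeds are assumptions, and the rendering stage is assumed to generate the current synthetic viewpoint for whatever frame `tick` returns.

```python
from dataclasses import dataclass


@dataclass
class PlaybackController:
    # Hypothetical controller over the scene proxy timeline, in frame indices.
    num_frames: int
    frame: int = 0
    rate: int = 1  # frames per tick: 0 = paused, negative = rewinding

    def play(self) -> None:
        self.rate = 1

    def pause(self) -> None:
        self.rate = 0

    def stop(self) -> None:
        # Stop = pause and return to the start of the timeline.
        self.rate = 0
        self.frame = 0

    def fast_forward(self, speed: int = 4) -> None:
        self.rate = abs(speed)

    def rewind(self, speed: int = 4) -> None:
        self.rate = -abs(speed)

    def tick(self) -> int:
        # Advance by the current rate and clamp to the valid frame range.
        self.frame = max(0, min(self.num_frames - 1, self.frame + self.rate))
        return self.frame
```

Viewpoint navigation and temporal navigation are independent here: the pose chosen by the end user and the frame chosen by this controller together determine which synthetic viewpoint is rendered.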
- the pipeline technique embodiments described herein provide the freedom to use whatever image-based rendering method(s) produces the desired visual result (e.g., produces an optimally photo-realistic visual result) based on the current pipeline conditions.
- each different implementation supports a different type of FVV and a different user viewing experience.
- each of these different implementations differs in terms of the user viewing experience it provides, its latency characteristics (i.e., how rapidly the streams of sensor data have to be processed through the FVV processing pipeline), its storage characteristics, its transmission and related bandwidth characteristics, and the types of computing device hardware it necessitates.
- one implementation of the pipeline technique embodiments described herein supports asynchronous (i.e., non-live) FVV, which corresponds to a situation where the streams of sensor data that are generated by the sensors are pre-captured 102 , then post-processed 104 , and the resulting scene proxy is then stored and can be transmitted in a one-to-many manner (i.e., broadcast) to one or more end users 106 .
- this implementation also allows an FVV producer to optionally manually “touch-up” the streams of sensor data that are input during the capture stage 102 , and also optionally manually remove any 3D reconstruction artifacts that are introduced in the processing stage 104 .
- This particular implementation is referred to hereafter as the asynchronous FVV implementation.
- Exemplary types of video-based media that work well in the asynchronous FVV implementation include movies, documentaries, sitcoms and other types of television shows, music videos, digital memories, and the like.
- Another exemplary type of video-based media that works well in the asynchronous FVV implementation is the use of special effects technology where synthetic objects are realistically modeled, lit, shaded and added to a pre-captured scene.
- the streams of sensor data generated by the sensors are captured 102 and processed 104 , and the resulting scene proxy is stored and distributed 106 in a manner that supports the particular number of degrees of viewpoint navigation freedom that are being provided in the user viewing experience stage 110 , and also supports the particular geometric proxy that is processed by the image-based rendering method in the rendering stage 108 .
- another implementation of the pipeline technique embodiments described herein supports unidirectional (i.e., one-way) live FVV, which corresponds to a situation where the streams of sensor data that are being generated by the sensors are concurrently captured 102 and processed 104 , and the resulting scene proxy is stored and transmitted in a one-to-many manner on-the-fly (i.e., live) to one or more end users 106 .
- each end user can view 110 the scene live (i.e., each user can view the scene at substantially the same time it is being captured 102 ).
- This particular implementation is referred to hereafter as the unidirectional live FVV implementation.
- Exemplary types of video-based media that work well in the unidirectional live FVV implementation include sporting events, news programs, live concerts, and the like.
- the streams of sensor data generated by the sensors are captured 102 and processed 104 , and the resulting scene proxy is stored and distributed 106 in a manner that supports the particular number of degrees of viewpoint navigation freedom that are being provided in the user viewing experience stage 110 , and also supports the particular geometric proxy that is processed by the image-based rendering method in the rendering stage 108 .
- yet another implementation of the pipeline technique embodiments described herein supports bidirectional (i.e., two-way) live FVV such as that which is associated with various video-conferencing/telepresence applications.
- This particular implementation is referred to hereafter as the bidirectional live FVV implementation.
- This bidirectional live FVV implementation is generally the same as the unidirectional live FVV implementation with the following exception.
- a computing device at each physical location that is participating in a given video-conferencing/telepresence session is able to concurrently capture 102 streams of sensor data that are being generated by sensors which are capturing a local scene and process 104 these locally captured streams of sensor data, store and transmit the resulting local scene proxy in a one-to-many manner on the fly to the other physical locations that are participating in the session 106 , receive a remote scene proxy from each of the remote physical locations that are participating in the session 106 , and render 108 each received proxy.
- the streams of sensor data generated by the sensors are captured 102 and processed 104 , and the resulting local scene proxy is stored and distributed 106 in a manner that supports the particular number of degrees of viewpoint navigation freedom that are being provided in the user viewing experience stage 110 , and also supports the particular geometric proxy that is processed by the image-based rendering method in the rendering stage 108 .
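The three implementations differ mainly in what happens to the scene proxy once it is generated. The sketch below summarizes that difference as data; the enum and the action labels are hypothetical names chosen for illustration, and the actions are paraphrased from the descriptions above (asynchronous: stored and broadcast later; unidirectional live: stored and broadcast on-the-fly; bidirectional live: each site sends its local proxy, receives the remote proxies, and renders each one).

```python
from enum import Enum, auto
from typing import List


class FvvType(Enum):
    ASYNCHRONOUS = auto()         # pre-captured, post-processed, stored, broadcast
    UNIDIRECTIONAL_LIVE = auto()  # captured/processed concurrently, broadcast live
    BIDIRECTIONAL_LIVE = auto()   # every site sends its proxy and renders the others


def is_live(kind: FvvType) -> bool:
    # Live types capture and process concurrently with end user viewing.
    return kind is not FvvType.ASYNCHRONOUS


def proxy_actions(kind: FvvType) -> List[str]:
    # Hypothetical action labels for what happens to the scene proxy at one site.
    if kind is FvvType.ASYNCHRONOUS:
        return ["store", "broadcast"]
    if kind is FvvType.UNIDIRECTIONAL_LIVE:
        return ["store", "broadcast_live"]
    if kind is FvvType.BIDIRECTIONAL_LIVE:
        return ["store", "send_local_live", "receive_remote", "render_each_remote"]
    raise ValueError(f"unsupported FVV type: {kind}")
```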
- the pipeline technique embodiments described herein generally employ a plurality of sensors which are configured in a prescribed arrangement to capture a given scene.
- the pipeline technique embodiments are operable with any type of sensor, any number (two or greater) of sensors, any arrangement of sensors (where this arrangement can include a plurality of different geometries and different geometric relationships between the sensors), and any combination of different types of sensors.
- the pipeline technique embodiments are also operable with both static and moving sensors.
- a given sensor can be any type of VCD (examples of which are described in more detail hereafter), or any type of audio capture device (such as a microphone, or the like), or any combination thereof.
- Each VCD generates a stream of video data which includes a stream of images (also known as and referred to herein as frames) of the scene from the specific geometric perspective of the VCD.
- each audio capture device generates a stream of audio data representing the audio emanating from the scene from the specific geometric perspective of the audio capture device.
- Exemplary types of VCDs that can be employed include, but are not limited to, the following.
- a given VCD can be a conventional visible light video camera which generates a stream of video data that includes a stream of color images of the scene.
- a given VCD can also be a conventional light-field camera (also known as a “plenoptic camera”) which generates a stream of video data that includes a stream of color light-field images of the scene.
- a given VCD can also be a conventional infrared structured-light projector combined with a conventional infrared video camera that is matched to the projector, where this projector/camera combination generates a stream of video data that includes a stream of infrared images of the scene.
- a given VCD can also be a conventional monochromatic video camera which generates a stream of video data that includes a stream of monochrome images of the scene.
- a given VCD can also be a conventional time-of-flight camera which generates a stream of video data that includes both a stream of depth map images of the scene and a stream of color images of the scene.
- the term “color camera” is sometimes used herein to refer to any type of VCD that generates color images of the scene.
- the pipeline technique embodiments described herein generally employ a minimum of one VCD which generates color image data for the scene, along with one or more other VCDs that can be used in combination to generate 3D geometry data for the scene.
- It is advantageous to increase the number of sensors being used as the complexity of the scene increases.
- the use of additional VCDs serves to reduce the number of occluded areas within the scene. It may also be advantageous to capture the entire scene using a given arrangement of static VCDs, and at the same time also capture a specific higher complexity region of the scene using one or more additional moving VCDs.
- In a situation where a large number of VCDs is used to capture a complex scene, different combinations of the VCDs can be used during the processing stage of the FVV processing pipeline (e.g., a situation where a specific VCD is part of both a narrow baseline stereo pair and a different wide baseline stereo pair involving a third VCD).
- FIG. 2 illustrates an exemplary embodiment, in simplified form, of a process for generating an FVV of a scene.
- the process starts in block 200 with using an arrangement of sensors to capture the scene, where the arrangement includes a plurality of VCDs and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective.
- the streams of sensor data are then input (block 202 ) and calibrated (block 204 ).
- a given stream of sensor data will include video data whenever the sensor that generated the stream is a VCD.
- a given stream of sensor data will include audio data whenever the sensor that generated the stream is an audio capture device.
- a given stream of sensor data will include both video and audio data whenever the sensor that generated the stream is a combined video and audio capture device.
- various methods can be used to calibrate the streams of sensor data. One such method will now be described in more detail.
- FIG. 3 illustrates an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by the arrangement of sensors.
- the process starts in block 300 with determining the number of VCDs in the arrangement of sensors that is being used to capture the scene. Intrinsic characteristics of each of the VCDs are then determined (block 302 ).
- Exemplary intrinsic characteristics which can be determined for a given one of the VCDs include one or more of the VCD type, or the VCD's frame rate, or the VCD's shutter speed, or the VCD's mosaic pattern, or the VCD's white balance, or the bit depth and pixel resolution of the images that are generated by the VCD, or the focal length of the VCD's lens, or the principal point of the VCD's lens, or the VCD's skew coefficient, or the distortions of the VCD's lens, or the VCD's field of view, among others. It will be appreciated that knowing such intrinsic characteristics for each of the VCDs allows the FVV processing pipeline to understand the governing physics and optics of each of the VCDs.
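- The following sketch shows how several of these intrinsic characteristics (focal length, principal point and skew coefficient) combine into the standard 3x3 pinhole intrinsic matrix. It also shows how that matrix maps a 3D point in a VCD's coordinate frame to pixel coordinates. It assumes an ideal pinhole model and ignores lens distortion. The `Intrinsics` class is hypothetical and not part of the described embodiments.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical container for a few intrinsic characteristics of one VCD."""
    fx: float          # focal length in pixels along x
    fy: float          # focal length in pixels along y
    cx: float          # principal point x (pixels)
    cy: float          # principal point y (pixels)
    skew: float = 0.0  # skew coefficient

    def matrix(self):
        """Return the 3x3 intrinsic calibration matrix K as nested lists."""
        return [[self.fx, self.skew, self.cx],
                [0.0, self.fy, self.cy],
                [0.0, 0.0, 1.0]]

    def project(self, point_cam):
        """Project a 3D point given in camera coordinates to pixel coordinates.

        Lens distortion is ignored in this sketch.
        """
        x, y, z = point_cam
        if z <= 0:
            raise ValueError("point must lie in front of the camera")
        xn, yn = x / z, y / z                      # normalized image coordinates
        u = self.fx * xn + self.skew * yn + self.cx
        v = self.fy * yn + self.cy
        return u, v


K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
u, v = K.project((0.2, -0.1, 2.0))   # approximately (370.0, 215.0)
```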
- Extrinsic characteristics of each of the VCDs at each point in time during the capture of the scene are also determined (block 304 ).
- Exemplary extrinsic characteristics which can be determined for a given VCD include one or more of the VCD's current rotational orientation (i.e., the direction that the VCD is currently pointing), or the VCD's current spatial location (i.e., the VCD's current location within the arrangement), or whether the VCD is static or moving, or the current geometric relationship between the VCD and each of the other VCDs in the arrangement (i.e., the VCD's current position relative to each of the other VCDs in the arrangement), or the current position of the VCD relative to the scene, or whether or not the VCD is genlocked (i.e., temporally synchronized) with the other VCDs in the arrangement, among others.
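- One common way to express a VCD's rotational orientation and spatial location is a rotation R and translation t that map world points into the VCD's frame (X_cam = R X_world + t). The sketch below assumes that convention. It shows how to recover a VCD's spatial location and how to compute the geometric relationship between two VCDs. All helper names are hypothetical and use plain Python lists for 3x3 matrices.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_y(theta):
    """Rotation by theta radians about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def camera_center(R, t):
    """World-space location of a VCD whose extrinsics are X_cam = R X_world + t."""
    return [-x for x in matvec(transpose(R), t)]


def relative_pose(Ra, ta, Rb, tb):
    """Pose of VCD b relative to VCD a, so that X_b = R_ab X_a + t_ab."""
    R_ab = matmul(Rb, transpose(Ra))
    t_ab = [tb_i - v for tb_i, v in zip(tb, matvec(R_ab, ta))]
    return R_ab, t_ab


# VCD a at the world origin; VCD b located at (1, 0, 0) and turned 90 degrees.
Ra, ta = rot_y(0.0), [0.0, 0.0, 0.0]
Rb = rot_y(math.pi / 2)
tb = [-x for x in matvec(Rb, [1.0, 0.0, 0.0])]  # t = -R C for center C
R_ab, t_ab = relative_pose(Ra, ta, Rb, tb)
```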
- the determination of the intrinsic and extrinsic characteristics of each of the VCDs can be made using various conventional methods, examples of which will be described in more detail hereafter.
- the knowledge of the number of VCDs in the arrangement, and the intrinsic and extrinsic characteristics of each of the VCDs, is then used to temporally and spatially calibrate the streams of sensor data (block 306 ).
- the intrinsic and extrinsic characteristics of each of the VCDs in the arrangement are commonly determined by performing one or more calibration procedures which calibrate the VCDs, where these procedures are specific to the particular types of VCDs that are being used to capture the scene, and the particular number and arrangement of the VCDs.
- the calibration procedures are performed and the streams of sensor data which are generated thereby are input before the scene capture.
- the calibration procedures can be performed and the streams of sensor data which are generated thereby can be input either before or after the scene capture. Exemplary calibration procedures will now be described.
- In a situation where the VCDs that are being used to capture the scene are genlocked and include a combination of color cameras, VCDs which generate a stream of infrared images of the scene, and one or more time-of-flight cameras, and this combination of cameras is arranged in a static array, the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner.
- a stream of calibration data can be input from each of the cameras in the array while a common physical feature (such as a ball, or the like) is internally illuminated with an incandescent light (which is visible to all of the cameras) and moved throughout the scene.
- These streams of calibration data can then be analyzed using conventional methods to determine both an intrinsic and extrinsic calibration matrix for each of the cameras.
- In a situation where the VCDs that are being used to capture the scene include a plurality of color cameras which are arranged in a static array, the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner.
- a stream of calibration data can be input from each camera in the array while it is moved around the scene but in close proximity to its static location (thus allowing each camera in the array to view overlapping parts of the static background of the scene).
- the streams of sensor data can be analyzed using conventional methods to identify features in the scene, and these features can then be used to calibrate the cameras in the array and determine the intrinsic and extrinsic characteristics of each of the cameras by employing a conventional structure-from-motion method.
- In a situation where the VCDs that are being used to capture the scene include one or more moving VCDs, each of these moving VCDs can be calibrated and its intrinsic and extrinsic characteristics can be determined at each point in time during the scene capture by using a conventional background model to register and calibrate relevant individual images that were generated by the VCD.
- In a situation where the VCDs that are being used to capture the scene include a combination of static and moving VCDs, the VCDs can be calibrated and the intrinsic and extrinsic characteristics of each of the VCDs can be determined by employing conventional multistep calibration procedures.
- the pipeline technique embodiments described herein will both spatially and temporally calibrate the streams of sensor data generated by the VCDs at all points in time during the scene capture before the streams are processed in the processing stage.
- this spatial and temporal calibration can be performed as follows. After the scene is captured and the streams of sensor data representing the scene are input, the streams of sensor data can be analyzed using conventional methods to separate the static and moving elements of the scene.
- the static elements of the scene can then be used to generate a background model. Additionally, the moving elements of the scene can be used to generate a global timeline that encompasses all of the VCDs, and each image in each stream of sensor data is assigned a relative time.
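- A minimal sketch of assigning each image a relative time on one global timeline is shown below. It assumes that an earlier analysis step has already found the frame in which each VCD observes a common moving-scene event. It also assumes that each VCD runs at a constant, known frame rate. Under these assumptions, aligning the streams reduces to a per-VCD time offset. `StreamTiming`, `build_global_timeline` and `nearest_frame` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical per-VCD timing information."""
    name: str
    fps: float        # frame rate of the VCD (an intrinsic characteristic)
    event_frame: int  # frame in which a shared moving event was detected (assumed given)


def build_global_timeline(streams, num_frames):
    """Assign every image in every stream a time (seconds) on one global timeline.

    The shared event is placed at t = 0 for all streams, so frame i of a
    stream maps to (i - event_frame) / fps.
    """
    return {s.name: [(i - s.event_frame) / s.fps for i in range(num_frames[s.name])]
            for s in streams}


def nearest_frame(times, t):
    """Index of the image whose global time is closest to t."""
    return min(range(len(times)), key=lambda i: abs(times[i] - t))


tl = build_global_timeline(
    [StreamTiming("A", 30.0, 15), StreamTiming("B", 60.0, 20)],
    {"A": 30, "B": 60},
)
# Frame 18 of stream A lies 0.1 s after the event; the matching image in B is frame 26.
match = nearest_frame(tl["B"], tl["A"][18])
```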
- the intrinsic characteristics of each of the VCDs can be determined by using conventional methods to analyze each of the streams of sensor data.
- the intrinsic characteristics of each of the VCDs can also be determined by reading appropriate hardware parameters directly from each of the VCDs.
- In a situation where the capture stage is not directly connected to the VCDs but rather the streams of sensor data are pre-recorded and then imported into the capture stage, the number of VCDs and various intrinsic properties of each of the VCDs can be determined by analyzing the streams of sensor data using conventional methods.
- a scene proxy is generated from the calibrated streams of sensor data (block 206 ).
- the scene proxy geometrically describes the scene as a function of time and includes one or more types of geometric proxy data which is matched to a set of current pipeline conditions in order to maximize the photo-realism of the FVV that results from the scene proxy at each point in time. These conditions can be in any one or more of the aforementioned stages of the FVV processing pipeline.
- the scene proxy can also be distributed to the end user by transmitting it over the network to the other computing device (block 210 ).
- FIG. 4 illustrates an exemplary embodiment, in simplified form, of a process for generating the scene proxy from the calibrated streams of sensor data. As exemplified in FIG. 4 , the process starts in block 400 with periodically analyzing the set of current pipeline conditions.
- the set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the number of VCDs that either is being, or was, used to capture the scene, or one or more of the intrinsic characteristics of each of the VCDs (e.g., the VCD type, among others), or one or more of the extrinsic characteristics of each of the VCDs (e.g., the current position of the VCD relative to the scene, and whether the VCD is static or moving, among others), or the like.
- the set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as whether the scene proxy is being generated on-the-fly during the rendering and end user viewing of the captured scene, or the scene proxy is being generated asynchronously from the rendering and end user viewing of the scene (i.e., the particular type of FVV that is being generated and the related speed at which the scene proxy has to be generated), or the like.
- the set of current pipeline conditions can also include one or more conditions in the storage and distribution stage of the FVV processing pipeline such as the amount of storage space that is currently available to store the scene proxy, or the network transmission bandwidth that is currently available, or the like.
- the set of current pipeline conditions can also include one or more conditions in the user viewing experience stage of the pipeline such as the type of display device upon which the FVV either is, or will be, viewed, or the particular characteristics of the display device (e.g., one or more of its aspect ratio, or its pixel resolution, or its form factor, among others), or the level of data fidelity that is desired in the free viewpoint video, or the like.
- the results of this analysis are then used to select one or more different 3D reconstruction methods which are matched to the current pipeline conditions (block 402 ).
- the selected 3D reconstruction methods are then used to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data (block 404 ).
- the 3D reconstructions of the scene and the results of the periodic analysis are then used to generate the scene proxy (block 406 ).
- the actions of blocks 400 , 402 , 404 and 406 are repeated for the duration of the scene (block 408 , No).
- the 3D reconstruction methods which are selected, the types of 3D reconstructions of the scene which are generated, and thus the types of geometric proxy data in the scene proxy can change over time based upon changes in the current pipeline conditions.
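- The selection step of block 402 can be sketched as a function of a snapshot of current pipeline conditions. The sketch assumes a live FVV is restricted to high-speed methods. It also assumes an asynchronous FVV can add a mesh built from the point cloud when enough bandwidth and storage are available. The condition fields, method names and numeric thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineConditions:
    """Hypothetical snapshot of a few current pipeline conditions."""
    live: bool               # processing stage: proxy generated on the fly?
    has_depth_streams: bool  # capture stage: e.g., time-of-flight cameras present
    has_stereo_pairs: bool   # capture stage: infrared or color stereo pairs present
    bandwidth_mbps: float    # distribution stage: available network bandwidth
    storage_gb: float        # storage stage: available storage space


# Hypothetical thresholds below which a mesh proxy is not generated.
MIN_MESH_BANDWIDTH_MBPS = 50.0
MIN_MESH_STORAGE_GB = 10.0


def select_reconstruction_methods(c: PipelineConditions) -> list:
    """Return the (hypothetical) names of the 3D reconstruction methods to run."""
    methods = []
    if c.has_depth_streams or c.has_stereo_pairs:
        methods.append("point_cloud")
    if c.live:
        # Live FVV: only high-speed methods; optionally fit generic object models.
        methods.append("generic_model_fit")
    elif ("point_cloud" in methods
          and c.bandwidth_mbps >= MIN_MESH_BANDWIDTH_MBPS
          and c.storage_gb >= MIN_MESH_STORAGE_GB):
        # Asynchronous FVV: steps can run in sequence (point cloud, then mesh).
        methods.append("mesh_from_point_cloud")
    return methods


def methods_over_time(condition_snapshots):
    """One selection per analysis period; the result can change as conditions change."""
    return [select_reconstruction_methods(c) for c in condition_snapshots]
```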
- the current pipeline conditions can be analyzed using different periodicities.
- the current pipeline conditions can be analyzed on a frame-by-frame basis (i.e., for each image in the streams of sensor data).
- the current pipeline conditions can be analyzed using a periodicity of a prescribed number of sequential frames, where this number is greater than one.
- the current pipeline conditions can be analyzed using a periodicity of a prescribed period of time.
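- The three periodicities above can be expressed as a schedule of the frames at which the pipeline conditions are re-analyzed. The sketch below assumes a constant frame rate. The function name and its parameters are hypothetical.

```python
def analysis_frames(num_frames, fps, mode, every_n=2, every_seconds=1.0):
    """Return the indices of the frames at which the pipeline conditions are analyzed.

    mode is one of:
      "per_frame"      - every image in the streams of sensor data
      "every_n_frames" - every prescribed number (greater than one) of frames
      "every_seconds"  - every prescribed period of time
    """
    if mode == "per_frame":
        return list(range(num_frames))
    if mode == "every_n_frames":
        if every_n < 2:
            raise ValueError("the prescribed number of frames must be greater than one")
        return list(range(0, num_frames, every_n))
    if mode == "every_seconds":
        frames, next_t = [], 0.0
        for i in range(num_frames):
            if i / fps >= next_t:  # first frame at or after the next analysis time
                frames.append(i)
                next_t += every_seconds
        return frames
    raise ValueError(f"unknown mode {mode!r}")
```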
- the scene proxy will include one or more types of geometric proxy data, examples of which include, but are not limited to, the following.
- the scene proxy can include a stream of depth map images of the scene.
- the scene proxy can also include a stream of calibrated point cloud reconstructions of the scene. As is appreciated in the art of 3D reconstruction, these point cloud reconstructions are a low order geometric representation of the scene.
- the scene proxy can also include one or more high order geometric models, where these models can include one or more of planes, or billboards, or existing (i.e., previously created) generic object models (e.g., human body models, or human face models, or clothing models, or furniture models, or the like) which can be either modified, or animated, or both, among others.
- high order geometric models can be advantageously used to fill in occlusions that may exist in the captured scene.
- the scene proxy can also include other high fidelity proxies such as a stream of mesh models of the scene and a corresponding stream of texture maps which define texture data for each of the mesh models, among others.
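- Because the scene proxy describes the scene as a function of time and mixes several types of geometric proxy data, one possible in-memory layout is a sequence of per-time-step records with optional fields. The record below is a hypothetical data structure sketch. The field names and types are assumptions and do not represent a format defined by the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Mesh:
    vertices: List[Point]
    faces: List[Tuple[int, int, int]]  # indices into vertices


@dataclass
class SceneProxyFrame:
    """Hypothetical scene proxy contents for one point in time."""
    time: float
    depth_maps: Dict[str, List[List[float]]] = field(default_factory=dict)  # per-VCD depth images
    point_cloud: List[Point] = field(default_factory=list)                  # calibrated point cloud
    models: List[str] = field(default_factory=list)                         # ids of planes, billboards, generic models
    mesh: Optional[Mesh] = None
    texture_map: Optional[List[List[Tuple[int, int, int]]]] = None          # RGB texels for the mesh

    def proxy_types(self) -> List[str]:
        """List the kinds of geometric proxy data present at this time step."""
        present = {
            "depth_maps": bool(self.depth_maps),
            "point_cloud": bool(self.point_cloud),
            "models": bool(self.models),
            "mesh": self.mesh is not None,
            "texture_map": self.texture_map is not None,
        }
        return [name for name, ok in present.items() if ok]


frame = SceneProxyFrame(
    time=0.0,
    point_cloud=[(0.0, 0.0, 1.0)],
    mesh=Mesh([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)], [(0, 1, 2)]),
)
```

Since the types of proxy data can change over time, a scene proxy stream would simply be a list of such records whose `proxy_types()` may differ from one record to the next.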
- Since the 3D reconstruction methods that are used and the related manner in which the scene proxy is generated are based upon a periodic analysis (i.e., monitoring) of the various current conditions in the FVV processing pipeline, the 3D reconstruction methods that are used and the resulting types of data in the scene proxy can change over time based on changes in the pipeline conditions.
- the types of 3D reconstruction methods that can be used in these implementations are limited to high speed 3D reconstruction methods.
- the scene proxy that is generated will include a stream of calibrated point cloud reconstructions of the scene, and may also include one or more high order geometric models which can be either modified, or animated, or both.
- the scene proxy that is generated can include both a stream of calibrated point cloud reconstructions of the scene, as well as one or more higher fidelity geometric proxies of the scene (such as when the calibrated point cloud reconstructions of the scene are used to generate a stream of mesh models of the scene, among other possibilities).
- the asynchronous FVV implementation of the pipeline technique embodiments also allows a plurality of 3D reconstruction steps to be used in sequence when generating the scene proxy.
- For example, assume that a stream of calibrated point cloud reconstructions of the scene has been generated, but there are some noisy or error-prone stereo matches present in these reconstructions that extend beyond a human silhouette boundary in the scene.
- these noisy or error prone stereo matches can lead to the wrong texture data appearing in the mesh models of the scene, thus resulting in artifacts in the rendered scene.
- These artifacts can be eliminated by running a segmentation process to separate the foreground from the background, and then points outside of the human silhouette can be rejected as outliers.
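- The outlier-rejection step can be sketched as follows. Each point is projected into a reference view, and the point is kept only if it lands on a foreground pixel of the segmentation mask. The sketch assumes that the mask has already been produced by a conventional segmentation process, that the points are expressed in the reference VCD's coordinate frame, and that an ideal pinhole model applies. A practical system would likely test several views. The function name is hypothetical.

```python
def reject_outside_silhouette(points, mask, fx, fy, cx, cy):
    """Keep only the points that project onto foreground pixels of a mask.

    points: iterable of (x, y, z) in the reference VCD's coordinate frame.
    mask:   2D list of 0/1 values (1 = foreground, e.g., inside a human silhouette).
    """
    h, w = len(mask), len(mask[0])
    kept = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera: cannot be validated against this view
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= v < h and 0 <= u < w and mask[v][u]:
            kept.append((x, y, z))
    return kept


# 5x5 mask whose central 3x3 block is foreground.
mask = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(3)] + [[0] * 5]
cloud = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
inliers = reject_outside_silhouette(cloud, mask, 1.0, 1.0, 2.0, 2.0)
```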
- FIG. 9 illustrates an exemplary embodiment, in simplified form, of a process for using a point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene (hereafter simply referred to as different depth map image streams).
- Whenever the calibrated streams of sensor data include a plurality of different depth map image streams (block 900 , Yes), these different depth map image streams are merged into a stream of calibrated point cloud reconstructions of the scene (block 902 ). It is noted that these point cloud reconstructions are unordered and, as such, overlaps may exist therein.
- the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more different high fidelity geometric proxies of the scene (block 904 ).
- the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene, where this mesh model generation can be performed using conventional methods such as Poisson surface reconstruction, among others.
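- The merge in block 902 can be sketched by back-projecting every valid depth pixel of each calibrated depth map into world coordinates and concatenating the results. This gives an unordered point cloud in which overlaps may remain, as noted above. The sketch assumes pinhole intrinsics (fx, fy, cx, cy), extrinsics with X_cam = R X_world + t, and a depth value of 0 to mark missing data. The function names are hypothetical.

```python
def backproject(depth, fx, fy, cx, cy, R, t):
    """Convert one depth map image into world-space points."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # R transpose
    pts = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # 0 marks a missing depth sample
            x_cam = [(u - cx) * z / fx, (v - cy) * z / fy, z]
            d = [x_cam[i] - t[i] for i in range(3)]
            # X_world = R^T (X_cam - t)
            pts.append(tuple(sum(Rt[i][k] * d[k] for k in range(3)) for i in range(3)))
    return pts


def merge_depth_maps(views):
    """Merge depth maps taken at the same instant into one unordered point cloud.

    views: list of (depth, (fx, fy, cx, cy), R, t); duplicates are not removed.
    """
    cloud = []
    for depth, (fx, fy, cx, cy), R, t in views:
        cloud.extend(backproject(depth, fx, fy, cx, cy, R, t))
    return cloud


I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
views = [
    ([[0.0, 2.0]], (1.0, 1.0, 0.0, 0.0), I, [0.0, 0.0, 0.0]),  # VCD at the origin
    ([[1.0]], (1.0, 1.0, 0.0, 0.0), I, [-1.0, 0.0, 0.0]),      # VCD centered at (1, 0, 0)
]
cloud = merge_depth_maps(views)  # [(2.0, 0.0, 2.0), (1.0, 0.0, 1.0)]
```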
- FIG. 10 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene (hereafter simply referred to as different infrared image streams).
- Whenever the calibrated streams of sensor data include a plurality of different infrared image streams (block 1000 , Yes), the following actions occur. Any narrow baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of infrared image streams are identified (block 1002 ).
- a first set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified narrow baseline stereo pairs of VCDs (block 1004 ). Any wide baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of infrared image streams are then identified (block 1006 ).
- a second set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified wide baseline stereo pairs of VCDs (block 1008 ).
- the different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 1010 ). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein.
- the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more different high fidelity geometric proxies of the scene (block 1012 ).
- the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene.
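- Creating depth map image streams from a stereo pair (blocks 1004 and 1008) typically relies on the standard relation Z = f B / d for a rectified pair. Here f is the focal length in pixels, B is the baseline and d is the disparity. The sketch below assumes the pair is already rectified and that disparities have been computed by some conventional stereo matcher. Its second function approximates the depth uncertainty as Z^2 / (f B) per pixel of disparity error. This illustrates why wide baseline pairs give finer depth precision, while narrow baseline pairs see more similar views and are generally easier to match. Both function names are hypothetical.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a rectified disparity map (pixels) into a depth map (meters); 0 = invalid."""
    return [[focal_px * baseline_m / d if d > 0 else 0.0 for d in row] for row in disparity]


def depth_error(z, focal_px, baseline_m, disparity_error_px=1.0):
    """Approximate depth uncertainty: dZ ~ Z^2 / (f B) * dd."""
    return z * z / (focal_px * baseline_m) * disparity_error_px


depth = disparity_to_depth([[10.0, 0.0]], focal_px=500.0, baseline_m=0.1)  # [[5.0, 0.0]]
narrow_err = depth_error(5.0, 500.0, 0.1)  # about 0.5 m at 5 m range
wide_err = depth_error(5.0, 500.0, 0.5)    # about 0.1 m at 5 m range
```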
- FIG. 11 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene (hereafter simply referred to as different color image streams).
- Whenever the calibrated streams of sensor data include a plurality of different color image streams (block 1100 , Yes), the following actions occur. Any narrow baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of color image streams are identified (block 1102 ).
- a first set of different depth map image streams is then created from the pairs of color image streams generated by the identified narrow baseline stereo pairs of VCDs (block 1104 ).
- any wide baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of color image streams are then identified (block 1106 ).
- a second set of different depth map image streams is then created from the pairs of color image streams generated by the identified wide baseline stereo pairs of VCDs (block 1108 ).
- the different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 1110 ). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein.
- the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more different high fidelity geometric proxies of the scene (block 1112 ).
- the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene.
- a given VCD can be in a plurality of narrow baseline stereo pairs of VCDs, and can also be in a plurality of wide baseline stereo pairs of VCDs. This serves to maximize the number of different depth map image streams that are created, which in turn serves to maximize the precision of the scene proxy.
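- The identification of narrow and wide baseline stereo pairs (blocks 1002/1006 and 1102/1106) can be sketched by classifying every pair of VCDs according to the distance between their spatial locations. As described above, a single VCD may appear in several pairs of each kind. The thresholds are hypothetical. A real implementation would presumably also check that both VCDs generate the same image type and have overlapping views. Those checks are omitted here.

```python
import math
from itertools import combinations


def classify_stereo_pairs(centers, narrow_max, wide_max):
    """Split all VCD pairs into narrow and wide baseline stereo pairs.

    centers:    mapping from VCD name to its (x, y, z) location in meters.
    narrow_max: largest baseline treated as narrow (hypothetical threshold).
    wide_max:   largest baseline still usable as a wide pair (hypothetical).
    """
    narrow, wide = [], []
    for a, b in combinations(sorted(centers), 2):
        baseline = math.dist(centers[a], centers[b])
        if baseline <= narrow_max:
            narrow.append((a, b))
        elif baseline <= wide_max:
            wide.append((a, b))
    return narrow, wide


centers = {"A": (0.0, 0.0, 0.0), "B": (0.1, 0.0, 0.0), "C": (1.0, 0.0, 0.0)}
narrow, wide = classify_stereo_pairs(centers, narrow_max=0.2, wide_max=2.0)
# VCD "A" is in one narrow pair and one wide pair.
```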
- this section provides an overview description, in simplified form, of several additional implementations of the capture and processing stages 102 and 104 of the FVV processing pipeline 100 . It will be appreciated that the implementations described in this section are merely exemplary. Many other implementations of the capture and processing stages 102 and 104 are also possible which use other types of sensor arrangements and generate other types of scene proxies.
- a circular arrangement of eight genlocked VCDs is used to capture a scene which includes one or more human beings, where each of the VCDs includes a combination of one infrared structured-light projector, two infrared video cameras, and one color camera. Accordingly, the VCDs each generate a different stream of video data which includes both a stereo pair of infrared image streams and a color image stream. As described heretofore, the pair of infrared image streams and the color image stream generated by each VCD are first used to generate different depth map image streams. The different depth map image streams are then merged into a stream of calibrated point cloud reconstructions of the scene.
- a conventional view-dependent texture mapping method which accurately represents specular textures such as skin is then used to extract texture data from the color image stream generated by each VCD and map this texture data to the stream of mesh models of the scene.
- four genlocked visible light video cameras are used to capture a scene which includes one or more human beings, where the cameras are evenly placed around the scene. Accordingly, the cameras each generate a different stream of video data which includes a color image stream.
- An existing 3D geometric model of a human body can be used in the scene proxy as follows. Conventional methods can be used to kinematically articulate the model over time in order to fit (i.e., match) the model to the streams of video data generated by the cameras. The kinematically articulated model can then be colored as follows.
- a conventional view-dependent texture mapping method can be used to extract texture data from the color image stream generated by each camera and map this texture data to the kinematically articulated model.
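- The core idea of view-dependent texture mapping can be sketched as a weighted blend of the colors that each camera observes for a surface point. Cameras whose viewing direction is closest to the current synthetic viewpoint receive the largest weights. The sketch uses a simple clamped-cosine weight raised to a `sharpness` power. This is a swapped-in illustrative weighting, not the specific conventional method referred to above. It also ignores visibility and occlusion. All names and the default exponent are hypothetical.

```python
import math


def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def view_dependent_color(point, view_pos, cam_centers, cam_colors, sharpness=8.0):
    """Blend per-camera RGB samples for one surface point.

    Cameras facing away from the novel viewing direction get zero weight.
    """
    d_view = _unit([v - p for v, p in zip(view_pos, point)])
    weights = []
    for c in cam_centers:
        d_cam = _unit([ci - p for ci, p in zip(c, point)])
        cos_angle = sum(a * b for a, b in zip(d_view, d_cam))
        weights.append(max(cos_angle, 0.0) ** sharpness)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no camera observes the point from the requested side")
    return tuple(sum(w * col[k] for w, col in zip(weights, cam_colors)) / total
                 for k in range(3))


# Two cameras symmetric about the viewing direction contribute equally.
rgb = view_dependent_color((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                           [(1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)],
                           [(255, 0, 0), (0, 0, 255)])
```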
- three unsynchronized visible light video cameras are used to capture a soccer game, where each of the cameras is moving and is located far from the game (e.g., rather than the spatial location of each of the cameras being fixed to a specified arrangement, each of the cameras is hand held by a different user who is capturing the game while they freely move about). Accordingly, the cameras each generate a different stream of video data which includes a stream of color images of the game.
- Articulated billboards can be used to represent the moving players in the scene proxy of the game as follows. For each stream of video data, conventional methods can be used to generate a segmentation mask for each body part of each player in the stream. Conventional methods can then be used to generate an articulated billboard model of each of the moving players in the game from the appropriate segmentation masks. The articulated billboard model can then be colored as just described.
- This section provides a more detailed description of the rendering and user viewing experience stages of the FVV processing pipeline.
- FIG. 5 illustrates an exemplary embodiment, in simplified form, of a process for presenting an FVV of a scene to an end user.
- the process starts in block 500 with inputting a scene proxy which geometrically describes the scene as a function of time.
- a current synthetic viewpoint of the scene is then generated from the scene proxy, where this current synthetic viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a set of current pipeline conditions (block 502 ). These conditions can be in any one or more of the aforementioned stages of the FVV processing pipeline.
- the current synthetic viewpoint of the scene is then displayed on a display device (block 504 ) so that it can be viewed and navigated by the end user.
- a prescribed number of degrees of viewpoint navigation freedom are provided to the end user, where this number is greater than or equal to one and less than or equal to six.
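- The sketch below shows one way to provide a prescribed number of degrees of viewpoint navigation freedom, from one to six. Only a chosen subset of the six pose axes (three translations and three rotations) responds to the end user's navigation input. Examples include a single yaw axis for orbiting the scene, or all six axes for full free viewpoint navigation. The axis names, units and class are hypothetical.

```python
AXES = ("x", "y", "z", "yaw", "pitch", "roll")


class NavigableViewpoint:
    """Hypothetical viewpoint state exposing between one and six navigation axes."""

    def __init__(self, free_axes):
        free = set(free_axes)
        if not free <= set(AXES):
            raise ValueError(f"unknown axes: {sorted(free - set(AXES))}")
        if not 1 <= len(free) <= 6:
            raise ValueError("between one and six degrees of freedom are required")
        self.free = free
        self.pose = dict.fromkeys(AXES, 0.0)  # translations in meters, rotations in radians

    def navigate(self, **deltas):
        """Apply navigation deltas from the end user; locked axes are ignored."""
        for axis, delta in deltas.items():
            if axis in self.free:
                self.pose[axis] += delta
        return dict(self.pose)


orbit = NavigableViewpoint(["yaw"])          # one degree of freedom
pose = orbit.navigate(yaw=0.5, x=1.0)        # x is locked, so only yaw changes
```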
- FIG. 6 illustrates an exemplary embodiment, in simplified form, of a process for generating the current synthetic viewpoint of the scene from the scene proxy.
- the process starts in block 600 with periodically analyzing the set of current pipeline conditions.
- the set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the number of VCDs that either is being, or was, used to capture the scene, or one or more of the intrinsic characteristics of each of the VCDs (e.g., the VCD type, among others), or one or more of the extrinsic characteristics of each of the VCDs (e.g., the positioning of the VCD relative to the scene, and whether the VCD is static or moving, among others), or the complexity and composition of the scene, or whether the scene is relatively static or dynamic, or the like.
- the set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as the particular 3D reconstruction methods that are being (or were) used to generate the scene proxy, or the types of geometric proxy data that are in the scene proxy, or the level of data fidelity that is desired in the free viewpoint video, or the like.
- the set of current pipeline conditions can also include one or more conditions in the rendering and user viewing experience stages of the FVV processing pipeline such as the graphics processing capabilities/features that are available in the hardware of the computing device which is being used by a given end user to generate the current synthetic viewpoint of the scene, or the type of display device upon which the current synthetic viewpoint of the scene is being displayed, or the particular characteristics of the display device (described heretofore), or the number of degrees of viewpoint navigation freedom that are being provided to the end user, or the view frustum of the current synthetic viewpoint, or whether or not this computing device includes a natural user interface (and if so, the particular natural user interface modalities that are anticipated to be used by the end user), or the like.
- the set of current pipeline conditions can also include information which is generated by the end user in the user viewing experience stage that specifies desired changes to (i.e., controls) the current synthetic viewpoint of the scene. Such information can include one or more of viewpoint navigation information which is being output by this stage based upon the FVV navigation that is being performed by the end user, or temporal navigation information which may also be output by this stage based upon this FVV navigation.
- the set of current pipeline conditions can also include the type of FVV that is being presented to the end user.
- the results of this analysis are then used to select one or more different image-based rendering methods which are matched to the current pipeline conditions (block 602 ).
- the selected image-based rendering methods and the results of the periodic analysis are then used to generate the current synthetic viewpoint of the scene (block 604 ).
- the actions of blocks 600 , 602 and 604 are repeated for the duration of the scene (block 606 , No).
- the one or more image-based rendering methods which are used and the current synthetic viewpoint of the scene that is generated can change over time based upon changes in the current pipeline conditions.
- the current pipeline conditions can be analyzed using different periodicities.
- the pipeline technique embodiments described herein can use a wide variety of image-based rendering methods in various combinations, where the particular types of image-based rendering methods that are being used depend upon various current conditions in the FVV processing pipeline.
- the image-based rendering methods that are employed by the pipeline technique embodiments described herein can render novel views (i.e., synthetic viewpoints) of the scene directly from a collection of images in the scene proxy without having to know the scene geometry.
- An overview of exemplary image-based rendering methods that can be employed by the pipeline technique embodiments is provided hereafter.
- the pipeline technique embodiments described herein support using any type of display device to view the FVV including, but not limited to, the very small form factor display devices used on conventional smart phones and other types of mobile devices, the small form factor display devices used on conventional tablet computers and netbook computers, the display devices used on conventional laptop computers and personal computers, conventional televisions and 3D televisions, conventional autostereoscopic 3D display devices, conventional head-mounted transparent display devices, and conventional wearable heads-up display devices such as those that are used in virtual reality applications.
- for display devices that support stereoscopic viewing, the rendering stage of the FVV processing pipeline will simultaneously generate both left and right current synthetic viewpoints of the scene at an appropriate aspect ratio and resolution in order to create a stereoscopic effect for the end user.
- for other display devices, the rendering stage will generate just a single current synthetic viewpoint.
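For the stereoscopic case, one simple way to place the left and right synthetic viewpoints is to offset the current synthetic viewpoint by half an assumed interpupillary distance along its right vector. This is a hedged sketch: the function name, the right-handed coordinate convention, and the 0.064 m default (roughly a typical adult value) are assumptions. It only computes the two eye positions; each would then be rendered with the selected image-based rendering methods at the display's aspect ratio and resolution.

```python
import numpy as np


def stereo_eye_positions(center, forward, up, ipd=0.064):
    """Return (left, right) eye positions for a stereoscopic viewpoint pair.

    center: position of the current synthetic viewpoint (3-vector).
    forward, up: viewing direction and up vector (need not be unit length).
    ipd: assumed interpupillary distance in scene units (metres here).
    """
    center = np.asarray(center, dtype=float)
    f = np.asarray(forward, dtype=float)
    u = np.asarray(up, dtype=float)
    right = np.cross(f, u)            # right-handed: forward x up points to the viewer's right
    right /= np.linalg.norm(right)
    half = 0.5 * ipd * right
    return center - half, center + half
```

For a viewer at the origin looking down the negative z axis with y up, this places the eyes at x = -0.032 and x = +0.032.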
- the pipeline technique embodiments described herein also support using any type of user interface modality to control the current viewpoint while viewing the FVV including, but not limited to, conventional keyboards, conventional pointing devices (such as a mouse, a graphics tablet, or the like), and conventional natural user interface modalities (such as voice, a touch-sensitive display screen, the head tracking functionality that is integrated into wearable heads-up display devices, a motion and location sensing device such as the Microsoft Kinect™ (a trademark of Microsoft Corporation), or the like).
- the FVV processing pipeline can process the streams of sensor data differently in order to enable different end user viewing experiences based on the particular type(s) of user interface modality that is anticipated to be used by the end user.
- all six degrees of viewpoint navigation freedom could be provided to the end user.
- in the pipeline technique embodiments, if the end user at each physical location that is participating in a given video-conferencing/telepresence session is using a wearable heads-up display device to view and navigate the FVV, then parallax functionality can be implemented in order to provide each end user with an optimally realistic viewing experience when they control/change their viewpoint of the FVV using head movements. The pipeline can also provide for corrected conversational geometry between two end users, thus giving the appearance that both end users are looking directly at each other.
- the rendering stage can optimize the current synthetic viewpoint that is being displayed based on the end user's current spatial location in front of their display device. In this way, the end user's current spatial location can be mapped to the 3D geometry within the FVV.
- FIG. 8 illustrates the various degrees of viewpoint navigation freedom that can be supported by the pipeline technique embodiments described herein.
- the pipeline technique embodiments generally support spatiotemporal (i.e., space-time) navigation of the FVV. More particularly, the asynchronous FVV, unidirectional live FVV, and bidirectional live FVV implementations described herein can each support spatial viewpoint navigation of the FVV having as many as six degrees of freedom, which can be appropriate when the end user is viewing and navigating an FVV that includes high fidelity geometric information. As exemplified in FIG. 8, these six degrees of freedom include viewpoint navigation along the x axis, viewpoint navigation rotationally about the x axis, viewpoint navigation along the y axis, viewpoint navigation rotationally about the y axis, viewpoint navigation along the z axis, and viewpoint navigation rotationally about the z axis.
- the asynchronous FVV, unidirectional live FVV, and bidirectional live FVV implementations can also each support spatial viewpoint navigation of the FVV having just one degree of viewpoint navigation freedom, which can be appropriate when the viewpoint navigation of the FVV is constrained to a straight line that connects the sensors.
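The two navigation modes can be illustrated with a small sketch: a viewpoint carrying the six degrees of freedom, and an optional one-degree-of-freedom mode that projects the requested position onto the straight line connecting two sensors. The `Viewpoint` and `navigate` names are hypothetical, rotations are assumed to be in radians, and in the constrained mode this sketch simply ignores rotation deltas; it is not taken from the patent.

```python
from dataclasses import astuple, dataclass, replace

import numpy as np


@dataclass(frozen=True)
class Viewpoint:
    """Six degrees of freedom: translation along and rotation (radians) about x, y, z."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    rot_x: float = 0.0
    rot_y: float = 0.0
    rot_z: float = 0.0


def navigate(view, delta, sensor_line=None):
    """Apply a 6-element delta (x, y, z, rot_x, rot_y, rot_z) to a viewpoint.

    If sensor_line = (a, b) is given, navigation is reduced to one degree of
    freedom: the new position is projected onto the segment from sensor a to
    sensor b, and the rotation deltas are ignored.
    """
    moved = Viewpoint(*(v + d for v, d in zip(astuple(view), delta)))
    if sensor_line is None:
        return moved
    a, b = (np.asarray(s, dtype=float) for s in sensor_line)
    d = b - a
    if not np.any(d):
        raise ValueError("the two sensors must be at distinct positions")
    p = np.array([moved.x, moved.y, moved.z])
    s = np.clip(np.dot(p - a, d) / np.dot(d, d), 0.0, 1.0)  # fraction along the line
    q = a + s * d
    return replace(view, x=float(q[0]), y=float(q[1]), z=float(q[2]))
```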
- the asynchronous FVV implementation can also support temporal navigation of the FVV.
- a producer or editor of the FVV may want to specify the particular types of viewpoint navigation that are possible at different times during the FVV.
- for example, in one scene a movie director may want to confine the end user's viewpoint navigation to a limited area of the scene or to a specific axis, while in another scene the director may want to allow the end user to freely navigate their viewpoint throughout the entire area of the scene.
- the current synthetic viewpoint of the scene is generated using one or more image-based rendering methods which are selected based upon a periodic analysis of the aforementioned set of current pipeline conditions. Accordingly, the particular image-based rendering methods that are used can change over time based upon changes in the current pipeline conditions. For example, in a situation where the scene has a low degree of complexity and the sensors that are being (or were) used to capture the scene are located close to the scene, just a single image-based rendering method may be used to generate the current synthetic viewpoint of the scene.
- in other situations, a plurality of image-based rendering methods may be used to generate the current synthetic viewpoint of the scene, depending on the location of the current viewpoint relative to the scene and the particular types of geometric proxy data that are in the scene proxy.
- FIG. 7 illustrates an exemplary embodiment, in simplified form, of a continuum of the various exemplary image-based rendering methods which can be employed by the pipeline technique embodiments described herein.
- these various image-based rendering methods can be classified into three categories according to the amount and type of scene geometry information that is included in the scene proxy, and thus is available to be used in the rendering stage, namely rendering without scene geometry 706 (i.e., the scene geometry is unknown), rendering with implicit scene geometry 704 (i.e., correspondences), and rendering with explicit scene geometry 702 (which can be either approximate or accurate).
- These categories 702, 704 and 706 are to be viewed as a continuum 700 rather than strict and discrete categories since it will be appreciated that certain of the image-based rendering methods defy strict categorization.
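As a rough illustration, the coarse category can be chosen from what the scene proxy contains, and each category maps to the candidate methods named in this section. This is a simplification under stated assumptions: the enum, the mapping, and `classify_proxy` are hypothetical, the enum values merely reuse the FIG. 7 reference numerals, and a discrete lookup ignores exactly the blurring that the continuum 700 is meant to capture (the lumigraph, for instance, uses approximate geometry).

```python
from enum import Enum


class GeometryCategory(Enum):
    EXPLICIT = 702  # explicit scene geometry (approximate or accurate)
    IMPLICIT = 704  # implicit scene geometry (correspondences)
    NONE = 706      # scene geometry unknown


# Candidate methods per category, as named in the surrounding text.
CANDIDATE_METHODS = {
    GeometryCategory.NONE: ["light field", "lumigraph", "concentric mosaics"],
    GeometryCategory.IMPLICIT: ["view interpolation", "view morphing"],
    GeometryCategory.EXPLICIT: ["3D warping", "layered depth images",
                                "layered depth images tree",
                                "view-dependent texture mapping",
                                "view-dependent geometry"],
}


def classify_proxy(has_depth_or_3d: bool, has_correspondences: bool) -> GeometryCategory:
    """Pick the richest geometry category the scene proxy supports."""
    if has_depth_or_3d:
        return GeometryCategory.EXPLICIT
    if has_correspondences:
        return GeometryCategory.IMPLICIT
    return GeometryCategory.NONE
```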
- having less scene geometry information in the scene proxy will generally decrease the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints will generally be limited to positions between VCDs or near VCDs).
- the lower the VCD density (i.e., the smaller the number of VCDs that is used in the arrangement), the smaller the number of images that is available in the scene proxy, and thus the more scene geometry information that needs to be available in the scene proxy in order to generate synthetic viewpoints of the scene which are photo-realistic.
- having more scene geometry information in the scene proxy will generally increase the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints can be navigated to positions which are far away from the real VCD viewpoints).
- when rendering without scene geometry, the scene proxy includes a large number of images but does not include any scene geometry or correspondence information.
- the current synthetic viewpoint of the scene can be generated by using a conventional light-field method, or a conventional lumigraph method, or a conventional concentric mosaics method, among others, to process the scene proxy.
- each of these methods relies on the characterization of the conventional plenoptic function, and constructs a continuous representation of the plenoptic function from the images in the scene proxy.
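For reference, the plenoptic function in its commonly cited seven-dimensional form (due to Adelson and Bergen) gives the radiance observed from viewing position (V_x, V_y, V_z) in direction (θ, φ) at wavelength λ and time t. Light-field and lumigraph methods work with a four-dimensional reduction that assumes a static scene in free space and parameterizes each ray by its intersections (u, v) and (s, t) with two parallel planes, while concentric mosaics use a three-dimensional reduction. The notation below is the standard textbook one, not notation taken from this document.

```latex
P = P(\theta, \phi, \lambda, t, V_x, V_y, V_z) \quad \text{(7D plenoptic function)}

L = L(u, v, s, t) \quad \text{(4D light field / lumigraph: static scene, free space)}
```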
- the light-field method is generally applicable when the images of the scene are uniformly captured.
- the light-field method generates new images of the scene by appropriately filtering and interpolating the images in the scene proxy.
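The "filter and interpolate" step can be illustrated with a minimal sketch that assumes the light field has already been resampled onto a regular two-plane (u, v, s, t) grid. Quadrilinear interpolation of the 16 nearest grid rays is used here as one of the simplest reconstruction filters; the function name and array layout are assumptions, and a full renderer would evaluate it once per output pixel after intersecting that pixel's ray with the two planes.

```python
from itertools import product

import numpy as np


def sample_light_field(lf, u, v, s, t):
    """Quadrilinearly interpolate a two-plane light field at one ray.

    lf has shape (U, V, S, T) or (U, V, S, T, C): lf[i, j, k, l] is the sample
    for the ray through grid point (i, j) on the (u, v) plane and grid point
    (k, l) on the (s, t) plane. u, v, s, t are fractional grid coordinates.
    """
    coords = (u, v, s, t)
    base = [int(np.floor(c)) for c in coords]
    frac = [c - b for c, b in zip(coords, base)]
    result = 0.0
    # Blend the 16 grid rays surrounding the requested ray.
    for corner in product((0, 1), repeat=4):
        index = []
        weight = 1.0
        for dim, (bit, b, f) in enumerate(zip(corner, base, frac)):
            index.append(min(max(b + bit, 0), lf.shape[dim] - 1))  # clamp at borders
            weight *= f if bit else 1.0 - f
        result = result + weight * lf[tuple(index)]
    return result
```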
- the lumigraph method is similar to the light-field method except that the lumigraph method is generally applicable when the images of the scene are not uniformly captured.
- the lumigraph method enhances the rendering performance by applying approximated geometry to compensate for this non-uniform capture.
- the concentric mosaics method is applicable when the arrangement of VCDs is circular.
- Conventional image mosaicing methods can also be used to construct a complete plenoptic function at a fixed viewpoint from an incomplete set of images of the scene.
- when rendering with implicit scene geometry, the scene proxy does not include explicit scene geometry information, but rather it includes implicit scene geometry information in the form of feature (e.g., point) correspondences between images, where these correspondences can be computed using conventional computer vision methods.
- the current synthetic viewpoint of the scene can be generated by using various conventional transfer methods (such as a conventional view interpolation method, or a conventional view morphing method, among others) to process the scene proxy.
- transfer methods are characterized by the use of a relatively small number of images with the application of geometric constraints (which are either recovered or known a priori) to project image pixels appropriately at a given synthetic viewpoint.
- the view interpolation method generates synthetic viewpoints of the scene by interpolating optical flow between corresponding points.
- the view morphing method generates synthetic viewpoints that reside on a line which links the optical centers of two different VCDs based on point correspondences.
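The core of view interpolation, moving each matched feature a fraction of the way along its correspondence vector, can be sketched as below. This is an illustrative sketch with a hypothetical function name: it only moves point positions (a renderer would also warp or splat the image content around them and resolve visibility), and straight-line interpolation of image positions is guaranteed to give a physically valid in-between view only when the two views are parallel, which is why view morphing first prewarps (rectifies) the images and postwarps the result.

```python
import numpy as np


def interpolate_correspondences(pts0, pts1, s):
    """Feature positions in an in-between view.

    pts0, pts1: (N, 2) pixel coordinates of corresponding features in two
    source images. s in [0, 1] picks the synthetic viewpoint: 0 reproduces
    image 0's positions, 1 reproduces image 1's.
    """
    p0 = np.asarray(pts0, dtype=float)
    p1 = np.asarray(pts1, dtype=float)
    if p0.shape != p1.shape:
        raise ValueError("correspondence arrays must have the same shape")
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    flow = p1 - p0          # displacement of each feature between the two images
    return p0 + s * flow    # move each feature a fraction s along its displacement
```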
- when rendering with explicit scene geometry, the scene proxy includes explicit and accurate scene geometry information and a small number of images, where this geometry information can be in the form of either depth along known lines-of-sight, or 3D coordinates, among other things.
- the current synthetic viewpoint of the scene can be generated by using conventional 3D warping methods, or a conventional layered depth images method, or a conventional layered depth images tree method, or a conventional view-dependent texture mapping method, or a conventional view-dependent geometry method, among others, to process the scene proxy.
- the 3D warping methods can be used when the scene proxy includes both depth map images and color (or monochrome) images of the scene.
- the 3D warping methods can be used to render the image from any nearby point of view by projecting the pixels of the image to their proper 3D locations and then re-projecting them onto a new picture.
- the rendering speed of such 3D warping methods can be increased by using conventional relief texture methods which factor the warping process into a relatively simple pre-warping operation and a conventional texture mapping operation (which may be performed by conventional graphics processing hardware).
- the 3D warping methods can be applied to both traditional perspective images as well as multi-perspective images.
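The project-then-reproject core of 3D warping can be sketched for a perspective depth map under a pinhole camera model. This is a minimal sketch under stated assumptions: the function name is hypothetical, depth is taken to be measured along the source camera's z axis, and (R, t) is assumed to map source-camera coordinates to destination-camera coordinates. It returns where each source pixel lands; splatting colors, z-buffering, and hole filling are omitted.

```python
import numpy as np


def warp_pixels(depth, K_src, K_dst, R, t):
    """Forward-warp every pixel of a depth map into a new pinhole view.

    depth: (H, W) depth along the source camera's z axis.
    K_src, K_dst: 3x3 intrinsic matrices.
    R, t: rotation (3x3) and translation (3,) with X_dst = R @ X_src + t.
    Returns (H, W, 2) destination pixel coordinates and (H, W) destination depth.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    rays = np.linalg.inv(K_src) @ pix                # back-project to z = 1 rays
    X_src = rays * depth.reshape(1, -1)              # 3D points in the source frame
    X_dst = R @ X_src + np.asarray(t, dtype=float).reshape(3, 1)
    proj = K_dst @ X_dst                             # re-project into the new view
    uv = (proj[:2] / proj[2]).T.reshape(H, W, 2)
    return uv, X_dst[2].reshape(H, W)
```

With an identity pose every pixel maps to itself; translating the camera sideways shifts pixels by focal length times baseline divided by depth, which is the familiar disparity relation.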
- the view-dependent geometry method was first used in the context of 3D cartoons and trades off geometry and images, and may be used to represent the current synthetic viewpoint of the scene more compactly.
- a conventional texture-mapped models method can also be used to generate the current synthetic viewpoint of the scene.
- while the pipeline technique has been described by specific reference to embodiments thereof, it is understood that variations and modifications thereof can be made without departing from the true spirit and scope of the pipeline technique.
- the pipeline technique embodiments have been described in the context of the capture and processing stages of the FVV processing pipeline being implemented on one computing device (or a collection of computing devices), and the rendering and user viewing experience stages of the pipeline being implemented on another computing device(s) which is being used by an end user(s) to view the FVV.
- an alternate embodiment of the pipeline technique described herein is possible where the capture, processing, rendering and user viewing experience stages of the pipeline are implemented on a single computing device (i.e., the FVV can be rendered and viewed on the same computing device that is used to input/calibrate the streams of sensor data and generate the scene proxy).
- the sensors can also be a wearable body-suit that provides a stream of depth data.
- FIG. 12 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the pipeline technique, as described herein, may be implemented. It is noted that any boxes that are represented by broken or dashed lines in FIG. 12 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.
- FIG. 12 shows a general system diagram showing a simplified computing device 1200.
- Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers (PCs), server computers, handheld computing devices, laptop or mobile computers, communications devices such as cell phones and personal digital assistants (PDAs), multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and audio or video media players.
- the device should have a sufficient computational capability and system memory to enable basic computational operations.
- the computational capability is generally illustrated by one or more processing unit(s) 1210, and may also include one or more graphics processing units (GPUs) 1215, either or both in communication with system memory 1220.
- processing unit(s) 1210 may be specialized microprocessors (such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, a field-programmable gate array (FPGA), or other micro-controller) or can be conventional central processing units (CPUs) having one or more processing cores including, but not limited to, specialized GPU-based cores in a multi-core CPU.
- the simplified computing device 1200 of FIG. 12 may also include other components, such as, for example, a communications interface 1230.
- the simplified computing device 1200 of FIG. 12 may also include one or more conventional computer input devices 1240 (e.g., pointing devices, keyboards, audio (e.g., voice) input/capture devices, video input/capture devices, haptic input devices, devices for receiving wired or wireless data transmissions, and the like).
- the simplified computing device 1200 of FIG. 12 may also include other optional components, such as, for example, one or more conventional computer output devices 1250 (e.g., display device(s) 1255, audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, and the like).
- Exemplary types of input devices (herein also referred to as user interface modalities) and display devices that are operable with the pipeline technique embodiments described herein have been described heretofore.
- typical communications interfaces 1230, additional types of input and output devices 1240 and 1250, and storage devices 1260 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.
- the simplified computing device 1200 of FIG. 12 may also include a variety of computer readable media.
- Computer readable media can be any available media that can be accessed by the computer 1200 via storage devices 1260, and includes both volatile and nonvolatile media that is either removable 1270 and/or non-removable 1280, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data.
- Computer readable media may include computer storage media and communication media.
- Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as digital versatile disks (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.
- the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves.
- software, programs, and/or computer program products embodying some or all of the various embodiments of the pipeline technique described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
- pipeline technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device.
- program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types.
- the pipeline technique embodiments may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks.
- program modules may be located in both local and remote computer storage media including media storage devices.
- the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- General Physics & Mathematics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Processing Or Creating Images (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Image Generation (AREA)
- Image Processing (AREA)
- Telephonic Communication Services (AREA)
- Length Measuring Devices By Optical Means (AREA)
- Studio Devices (AREA)
Abstract
Free viewpoint video of a scene is generated and presented to a user. An arrangement of sensors generates streams of sensor data each of which represents the scene from a different geometric perspective. The sensor data streams are calibrated. A scene proxy is generated from the calibrated sensor data streams. The scene proxy geometrically describes the scene as a function of time and includes one or more types of geometric proxy data which is matched to a first set of current pipeline conditions in order to maximize the photo-realism of the free viewpoint video resulting from the scene proxy at each point in time. A current synthetic viewpoint of the scene is generated from the scene proxy. This viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a second set of current pipeline conditions. The current synthetic viewpoint is displayed.
Description
- This application claims the benefit of and priority to provisional U.S. patent application Ser. No. 61/653,983 filed May 31, 2012.
- A given video generally includes one or more scenes, where each scene in the video can be either relatively static (e.g., the objects in the scene do not substantially change or move over time) or dynamic (e.g., the objects in the scene substantially change and/or move over time). In a traditional video the viewpoint of each scene is chosen by the director when the video is recorded/captured and this viewpoint cannot be controlled or changed by an end user while they are viewing the video. In other words, in a traditional video the viewpoint of each scene is fixed and cannot be modified when the video is being rendered and displayed. In a free viewpoint video an end user can interactively control and change their viewpoint of each scene at will while they are viewing the video. In other words, in a free viewpoint video each end user can interactively request different synthetic (i.e., virtual) viewpoints of each scene on-the-fly when the video is being rendered and displayed.
- This Summary is provided to introduce a selection of concepts, in a simplified form, that are further described hereafter in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Free viewpoint video processing pipeline technique embodiments described herein are generally applicable to generating a free viewpoint video of a scene and presenting it to a user. In one exemplary embodiment an arrangement of sensors is used to capture the scene, where the arrangement includes a plurality of video capture devices and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. The streams of sensor data are then input and calibrated. A scene proxy is then generated from the calibrated streams of sensor data, where the scene proxy geometrically describes the scene as a function of time and includes one or more types of geometric proxy data which is matched to a first set of current pipeline conditions in order to maximize the photo-realism of the free viewpoint video that results from the scene proxy at each point in time. This scene proxy generation includes the following actions. The current pipeline conditions in the first set are periodically analyzed. The results of this periodic analysis are then used to select one or more different 3D (three-dimensional) reconstruction methods which are matched to these current pipeline conditions. The selected 3D reconstruction methods are then used to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data. The 3D reconstructions of the scene and the results of the periodic analysis are then used to generate the scene proxy.
- In another exemplary embodiment the scene proxy is input. A current synthetic viewpoint of the scene is then generated from the scene proxy, where this current synthetic viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a second set of current pipeline conditions. The current synthetic viewpoint of the scene is then displayed. The current synthetic viewpoint generation includes the following actions. The current pipeline conditions in the second set are periodically analyzed. The results of this periodic analysis are then used to select one or more different image-based rendering methods which are matched to these current pipeline conditions. The selected image-based rendering methods and the results of the periodic analysis are then used to generate the current synthetic viewpoint of the scene.
- The specific features, aspects, and advantages of the free viewpoint video processing pipeline technique embodiments described herein will become better understood with regard to the following description, appended claims, and accompanying drawings where:
- FIG. 1 is a diagram illustrating an exemplary embodiment, in simplified form, of the various stages in the free viewpoint video processing pipeline.
- FIG. 2 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a free viewpoint video of a scene.
- FIG. 3 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by an arrangement of sensors that is being used to capture the scene.
- FIG. 4 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a scene proxy from the calibrated streams of sensor data.
- FIG. 5 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a free viewpoint video of a scene to an end user.
- FIG. 6 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a current synthetic viewpoint of the scene from the scene proxy.
- FIG. 7 is a diagram illustrating an exemplary embodiment, in simplified form, of a continuum of the various exemplary image-based rendering methods which can be employed by the free viewpoint video processing pipeline technique embodiments described herein.
- FIG. 8 is a diagram illustrating the various degrees of viewpoint navigation freedom that can be supported by the pipeline technique embodiments described herein.
- FIG. 9 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using a point cloud 3D (three-dimensional) reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene.
- FIG. 10 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene.
- FIG. 11 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene.
- FIG. 12 is a diagram illustrating a simplified example of a general-purpose computer system on which various embodiments and elements of the free viewpoint video processing pipeline technique, as described herein, may be implemented.
- In the following description of free viewpoint video (FVV) processing pipeline technique embodiments (hereafter simply referred to as pipeline technique embodiments), reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the pipeline technique can be practiced. It is understood that other embodiments can be utilized and structural changes can be made without departing from the scope of the pipeline technique embodiments.
- It is also noted that for the sake of clarity specific terminology will be resorted to in describing the pipeline technique embodiments described herein and it is not intended for these embodiments to be limited to the specific terms so chosen. Furthermore, it is to be understood that each specific term includes all its technical equivalents that operate in a broadly similar manner to achieve a similar purpose. Reference herein to “one embodiment”, or “another embodiment”, or an “exemplary embodiment”, or an “alternate embodiment”, or “one implementation”, or “another implementation”, or an “exemplary implementation”, or an “alternate implementation” means that a particular feature, a particular structure, or particular characteristics described in connection with the embodiment or implementation can be included in at least one embodiment of the pipeline technique. The appearances of the phrases “in one embodiment”, “in another embodiment”, “in an exemplary embodiment”, “in an alternate embodiment”, “in one implementation”, “in another implementation”, “in an exemplary implementation”, and “in an alternate implementation” in various places in the specification are not necessarily all referring to the same embodiment or implementation, nor are separate or alternative embodiments/implementations mutually exclusive of other embodiments/implementations. Yet furthermore, the order of process flow representing one or more embodiments or implementations of the pipeline technique does not inherently indicate any particular order nor imply any limitations of the pipeline technique.
- The term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate a stream of sensor data that represents a given scene. Generally speaking and as will be described in more detail hereafter, the pipeline technique embodiments described herein employ a plurality of sensors which can be configured in various arrangements to capture a scene, thus allowing a plurality of streams of sensor data to be generated each of which represents the scene from a different geometric perspective. Each of the sensors can be any type of video capture device (VCD) (e.g., any type of video camera), or any type of audio capture device, or any combination thereof. Each of the sensors can also be either static (i.e., the sensor has a fixed spatial location and a fixed rotational orientation which do not change over time), or moving (i.e., the spatial location and/or rotational orientation of the sensor change over time). The pipeline technique embodiments can employ a combination of different types of sensors to capture a given scene.
- The term “baseline” is used herein to refer to a ratio of the actual physical distance between a given pair of VCDs to the average of the actual physical distance from each VCD in the pair to the viewpoint of the scene. When this ratio is larger than a prescribed value the pair of VCDs is referred to herein as a wide baseline stereo pair of VCDs. When this ratio is smaller than the prescribed value the pair of VCDs is referred to herein as a narrow baseline stereo pair of VCDs.
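- The baseline definition above can be made concrete with a short sketch. It assumes that each VCD and the scene viewpoint are given as 3D points in one shared coordinate frame, and it models the unspecified "prescribed value" as a hypothetical `threshold` parameter. A ratio exactly equal to the threshold is reported as undefined, because the definition only covers the larger and smaller cases.

```python
import math
from typing import Sequence

Point3 = Sequence[float]  # (x, y, z) in a shared world frame (assumed)


def baseline_ratio(vcd_a: Point3, vcd_b: Point3, viewpoint: Point3) -> float:
    """Distance between the two VCDs divided by their mean distance to the viewpoint."""
    separation = math.dist(vcd_a, vcd_b)
    mean_range = (math.dist(vcd_a, viewpoint) + math.dist(vcd_b, viewpoint)) / 2.0
    return separation / mean_range


def classify_stereo_pair(vcd_a: Point3, vcd_b: Point3, viewpoint: Point3,
                         threshold: float) -> str:
    """Label a VCD pair as a 'wide' or 'narrow' baseline stereo pair.

    `threshold` is a hypothetical stand-in for the prescribed value.
    """
    ratio = baseline_ratio(vcd_a, vcd_b, viewpoint)
    if ratio > threshold:
        return "wide"
    if ratio < threshold:
        return "narrow"
    return "undefined"  # ratio == threshold is not covered by the definition


# Two cameras 1 m apart, about 10 m from the viewpoint.
print(classify_stereo_pair((0, 0, 0), (1, 0, 0), (0.5, 10, 0), threshold=0.5))  # prints "narrow"
# Two cameras 10 m apart, about 11.2 m from the viewpoint.
print(classify_stereo_pair((-5, 0, 0), (5, 0, 0), (0, 10, 0), threshold=0.5))  # prints "wide"
```

Because the ratio is normalized by distance to the viewpoint, the same physical camera spacing can count as wide for a nearby scene and narrow for a distant one.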
- The pipeline technique embodiments described herein generally involve an FVV processing pipeline for generating an FVV of a given scene and presenting the FVV to one or more end users. The pipeline technique embodiments are advantageous for various reasons including, but not limited to, the following. Generally speaking and as will be appreciated from the more detailed description that follows, the pipeline technique embodiments create a feeling of immersion for any end user who is viewing a rendering of the captured scene, thus enhancing their viewing experience. The pipeline technique embodiments also enable optimal viewpoint navigation for up to six degrees of viewpoint navigation freedom.
- Furthermore, the pipeline technique embodiments described herein do not rely upon having to constrain the FVV processing pipeline in order to produce a desired visual result. In other words, the pipeline technique embodiments eliminate the need to place constraints on the FVV processing pipeline in order to generate various synthetic viewpoints of the scene which are photo-realistic and thus are free of discernible artifacts. More particularly and by way of example but not limitation, the pipeline technique embodiments eliminate having to constrain the arrangement of the sensors that are used to capture the scene. Accordingly, the pipeline technique embodiments are operational with any arrangement of sensors. The pipeline technique embodiments also eliminate having to constrain the complexity or composition of the scene that is being captured (e.g., neither the environment(s) in the scene, nor the types of objects in the scene, nor the number of people in the scene, among other things, needs to be constrained). Accordingly, the pipeline technique embodiments are operational with any type of scene, including both relatively static and dynamic scenes. The pipeline technique embodiments also eliminate having to constrain the number or types of sensors that are used to capture the scene. Accordingly, the pipeline technique embodiments are operational with any number of sensors and all types of sensors. The pipeline technique embodiments also eliminate having to constrain the number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene. Accordingly, the pipeline technique embodiments can produce visual results having as many as six degrees of viewpoint navigation freedom. The pipeline technique embodiments can also produce visual results having just one degree of viewpoint navigation freedom.
- Yet, furthermore, the pipeline technique embodiments described herein do not rely upon having to use a specific 3D (three-dimensional) reconstruction method in the FVV processing pipeline to generate a 3D reconstruction of the captured scene. Accordingly, the pipeline technique embodiments support the use of any one or more 3D reconstruction methods in the pipeline and therefore provide the freedom to use whatever 3D reconstruction method(s) produces the desired visual result (e.g., the highest degree of photo-realism for the particular scene being captured and the desired number of degrees of viewpoint navigation freedom) based on the particular characteristics of the streams of sensor data that are generated by the sensors (e.g., based on factors such as the particular number and types of sensors that are used to capture the scene, and the particular arrangement of these sensors that is used), along with other current pipeline conditions.
- Yet furthermore, the pipeline technique embodiments described herein do not rely upon having to use a specific image-based rendering method in the FVV processing pipeline during the rendering and end user viewing of the captured scene. Accordingly, the pipeline technique embodiments support the use of any one or more image-based rendering methods in the pipeline and therefore provide the freedom to use whatever image-based rendering method(s) produces the desired visual result based on the particular characteristics of the streams of sensor data that are generated by the sensors, along with other current pipeline conditions. By way of example but not limitation, in an exemplary situation where just two VCDs are used to capture a scene, an image-based rendering method that renders a lower-fidelity 3D geometric proxy of the captured scene (herein simply referred to as a scene proxy) may produce an optimally photo-realistic visual result when the end user's viewpoint is close to the axis of one of the VCDs (such as with billboards). In another exemplary situation where a large number of VCDs configured in a circular arrangement are used to capture a scene, a conventional image warping/morphing image-based rendering method may produce an optimally photo-realistic visual result. In yet another exemplary situation where a large number of VCDs configured in either a 2D (two-dimensional) or 3D array arrangement are used to capture a scene, a conventional view interpolation image-based rendering method may produce an optimally photo-realistic visual result. In yet another exemplary situation where an even larger number of VCDs is used, a conventional lumigraph or light-field image-based rendering method may produce an optimally photo-realistic visual result.
- It will thus be appreciated that the pipeline technique embodiments described herein result in a flexible, robust and commercially viable next generation FVV processing pipeline that meets the needs of today's various creative video producers and editors. By way of example but not limitation and as will be appreciated from the more detailed description that follows, the pipeline technique embodiments are applicable to various types of video-based media applications such as consumer entertainment (e.g., movies, television shows, and the like) and video-conferencing/telepresence, among others. The pipeline technique embodiments support a broad range of features that provide for the capture (i.e., recording), processing, storage, distribution, rendering, and end user viewing of any type of FVV that can be generated. Various implementations of the pipeline technique embodiments are possible, where each different implementation supports a different type of FVV. Exemplary types of supported FVV are described in more detail hereafter.
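- The exemplary rendering situations described earlier pair a VCD count and arrangement with the image-based rendering method that tends to give the most photo-realistic result. The sketch below encodes them as a simple lookup. The numeric cutoffs for "a large number" and "an even larger number" of VCDs, and the fallback returned for situations the text does not cover, are hypothetical placeholders rather than values from the source.

```python
from enum import Enum


class Arrangement(Enum):
    CIRCULAR = "circular"
    ARRAY_2D = "2D array"
    ARRAY_3D = "3D array"
    OTHER = "other"


# Hypothetical cutoffs: the text gives no numbers for "large" or "even larger".
LARGE_VCD_COUNT = 8
VERY_LARGE_VCD_COUNT = 64


def choose_rendering_method(num_vcds: int, arrangement: Arrangement) -> str:
    """Map the exemplary capture situations to an image-based rendering method."""
    if num_vcds >= VERY_LARGE_VCD_COUNT:
        return "lumigraph/light-field rendering"
    if num_vcds >= LARGE_VCD_COUNT and arrangement is Arrangement.CIRCULAR:
        return "image warping/morphing"
    if num_vcds >= LARGE_VCD_COUNT and arrangement in (Arrangement.ARRAY_2D,
                                                       Arrangement.ARRAY_3D):
        return "view interpolation"
    if num_vcds == 2:
        # Works best when the end user's viewpoint is close to one VCD's axis.
        return "low-fidelity scene proxy (e.g., billboards)"
    return "scene-proxy rendering (hypothetical default)"


print(choose_rendering_method(2, Arrangement.OTHER))
print(choose_rendering_method(16, Arrangement.CIRCULAR))
```

In the pipeline itself such a choice would be one input to the periodic analysis of current conditions rather than a fixed rule, since the end user's viewpoint also affects which method looks best.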
- Additionally, the pipeline technique embodiments described herein allow any one or more parameters in the FVV processing pipeline to be freely modified without introducing artifacts into the FVV that is presented to the one or more end users. This allows the photo-realism of the FVV that is presented to each end user to be maximized (i.e., the artifacts are minimized) regardless of the characteristics of the various sensors that are used to capture the scene, and the characteristics of the various streams of sensor data that are generated by the sensors. Exemplary pipeline parameters which can be modified include, but are not limited to, the following. The number and types of sensors that are used to capture the scene can be modified. The arrangement of the sensors can also be modified. Which if any of the sensors is static and which is moving can also be modified. The complexity and composition of the scene can also be modified. Whether the scene is relatively static or dynamic can also be modified. The 3D reconstruction methods and image-based rendering methods that are used can also be modified. The number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene can also be modified.
- FIG. 1 illustrates an exemplary embodiment, in simplified form, of the various stages in the FVV processing pipeline. As exemplified in FIG. 1, the FVV processing pipeline 100 starts with a capture stage 102 during which, generally speaking, the following actions take place. An arrangement of sensors is used to capture a given scene, where the arrangement includes a plurality of VCDs and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. These streams of sensor data are input from the sensors and calibrated in a manner which will be described in more detail hereafter. The calibrated streams of sensor data are then output to a processing stage 104. As described heretofore, the pipeline technique embodiments described herein support the use of various types, various numbers and various combinations of sensors which can be configured in various arrangements, including both 2D and 3D arrangements, where each of the sensors can be either static or moving.
- Referring again to FIG. 1, the processing stage 104 of the FVV processing pipeline 100 inputs the calibrated streams of sensor data. A scene proxy which geometrically describes the captured scene as a function of time is then generated from the calibrated streams of sensor data. The scene proxy is then output to a storage and distribution stage 106. As will be described in more detail hereafter, the scene proxy includes one or more types of geometric proxy data which is matched to a first set of current conditions in the pipeline 100, where these conditions are generally associated with the specific implementation of the pipeline technique embodiments that is being used. As will also be described in more detail hereafter, the scene proxy is generated using one or more different 3D reconstruction methods which extract 3D geometric information from the calibrated streams of sensor data. The particular 3D reconstruction methods that are used and the particular manner in which the scene proxy is generated are determined based on a periodic analysis of the first set of current conditions. As will be appreciated from the more detailed description that follows, unlike the 3D reconstruction methods that are used in the arts of video gaming and special effects, the pipeline technique embodiments described herein use automated computer-vision-type 3D reconstruction methods which can operate without human input.
- Referring again to FIG. 1, the storage and distribution stage 106 of the FVV processing pipeline 100 inputs the scene proxy and, based on the specific implementation of the pipeline technique embodiments described herein and the related type of FVV that is being processed in the pipeline (i.e., the type of FVV that is being generated and presented to the one or more end users), can either store the scene proxy, or output the scene proxy and distribute it to one or more end users who either are, or will be, viewing the FVV, or both. In an exemplary embodiment of the pipeline technique described herein where the capture and processing stages pipeline 100.
- Referring again to FIG. 1, a rendering stage 108 of the FVV processing pipeline 100 inputs the scene proxy which is output from the storage and distribution stage 106. A current synthetic viewpoint of the captured scene is then generated from the scene proxy, where this current synthetic viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a second set of current conditions in the pipeline 100. As will be described in more detail hereafter, this second set of current conditions may include viewpoint navigation information which is output by a user viewing experience stage 110, and may also include temporal navigation information which can also be output by the user viewing experience stage. The current synthetic viewpoint is then output to the user viewing experience stage 110. In addition to including this viewpoint and temporal navigation information, the second set of current conditions is also generally associated with the specific implementation of the pipeline technique embodiments that is being used. Generally speaking, the current synthetic viewpoint of the captured scene is generated using one or more different image-based rendering methods. The particular image-based rendering methods that are used and the particular manner in which the current synthetic viewpoint of the captured scene is generated are determined based on a periodic analysis of the second set of current conditions.
- Referring again to FIG. 1, the user viewing experience stage 110 of the FVV processing pipeline 100 generally provides the one or more end users with the ability to view the current synthetic viewpoint of the captured scene on a display device and spatiotemporally navigate/control this viewpoint on-the-fly at will. In other words, the user viewing experience stage 110 provides each end user with the ability to continuously and interactively navigate/control their viewpoint of the scene that is being displayed on their display device. The user viewing experience stage 110 may also provide each end user with the ability to interactively temporally navigate/control the FVV at will. During the user viewing experience stage 110 the current synthetic viewpoint of the captured scene is input from the rendering stage 108 and displayed on each end user's display device. Each end user can interactively navigate their viewpoint of the scene, and based on this viewpoint navigation the rendering stage 108 will modify the current synthetic viewpoint of the scene accordingly. In situations where the capture and processing stages rendering stage 108 will either temporally pause/stop, or rewind, or fast forward the FVV accordingly. As described heretofore, the pipeline technique embodiments described herein provide the freedom to use whatever image-based rendering method(s) produces the desired visual result (e.g., produces an optimally photo-realistic visual result) based on the current pipeline conditions.
- As noted heretofore, various implementations of the pipeline technique embodiments described herein are possible, where each different implementation supports a different type of FVV and a different user viewing experience. As will now be described in more detail, each of these different implementations differs in terms of the user viewing experience it provides, its latency characteristics (i.e., how rapidly the streams of sensor data have to be processed through the FVV processing pipeline), its storage characteristics, its transmission and related bandwidth characteristics, and the types of computing device hardware it necessitates.
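- The data flow through the five stages of FIG. 1 can be summarized in a minimal sketch. Every stage body is a placeholder: calibration is reduced to tagging frames, the rule that picks a 3D reconstruction method from the first set of current conditions is hypothetical, and rendering returns a description string instead of an image. What the sketch does show is the ordering of stages 102 through 110, and that the rendering stage is re-run for each viewpoint or temporal navigation event coming from the user viewing experience stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SceneProxy:
    """Hypothetical stand-in for the time-varying geometric scene proxy."""
    reconstruction_method: str
    frames_by_time: Dict[int, Tuple[str, ...]]  # time index -> calibrated frames


def capture_stage(raw_streams: List[List[str]]) -> List[List[str]]:
    """Capture stage 102: input the streams and mark them calibrated (placeholder)."""
    return [[f"calibrated:{frame}" for frame in stream] for stream in raw_streams]


def processing_stage(calibrated: List[List[str]], first_conditions: Dict) -> SceneProxy:
    """Processing stage 104: choose a 3D reconstruction method from the first set
    of current conditions (hypothetical rule) and build a time-indexed proxy."""
    method = "two-view stereo" if first_conditions["num_vcds"] == 2 else "multi-view stereo"
    return SceneProxy(method, dict(enumerate(zip(*calibrated))))


def storage_and_distribution_stage(proxy: SceneProxy,
                                   archive: List[SceneProxy]) -> SceneProxy:
    """Storage and distribution stage 106: store the proxy and pass it downstream."""
    archive.append(proxy)
    return proxy


def rendering_stage(proxy: SceneProxy, second_conditions: Dict) -> str:
    """Rendering stage 108: synthesize the viewpoint and time the user asked for."""
    t = second_conditions["time"]               # temporal navigation
    viewpoint = second_conditions["viewpoint"]  # viewpoint navigation
    return f"viewpoint {viewpoint} at t={t} from {len(proxy.frames_by_time[t])} streams"


# User viewing experience stage 110: each navigation event triggers a re-render.
archive: List[SceneProxy] = []
proxy = storage_and_distribution_stage(
    processing_stage(capture_stage([["a0", "a1"], ["b0", "b1"]]), {"num_vcds": 2}),
    archive,
)
navigation_events = [{"viewpoint": (0.0, 0.0, 1.0), "time": 0},
                     {"viewpoint": (0.5, 0.0, 1.0), "time": 1}]
rendered = [rendering_stage(proxy, event) for event in navigation_events]
print(rendered)
```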
- Referring again to FIG. 1, one implementation of the pipeline technique embodiments described herein supports asynchronous (i.e., non-live) FVV, which corresponds to a situation where the streams of sensor data that are generated by the sensors are pre-captured 102, then post-processed 104, and the resulting scene proxy is then stored and can be transmitted in a one-to-many manner (i.e., broadcast) to one or more end users 106. As such, there is effectively an unlimited amount of time available for the processing stage 104. This allows an FVV producer to optionally manually “touch-up” the streams of sensor data that are input during the capture stage 102, and also optionally manually remove any 3D reconstruction artifacts that are introduced in the processing stage 104. This particular implementation is referred to hereafter as the asynchronous FVV implementation. Exemplary types of video-based media that work well in the asynchronous FVV implementation include movies, documentaries, sitcoms and other types of television shows, music videos, digital memories, and the like. Another exemplary type of video-based media that works well in the asynchronous FVV implementation is the use of special effects technology where synthetic objects are realistically modeled, lit, shaded and added to a pre-captured scene. It will be appreciated that the streams of sensor data generated by the sensors are captured 102 and processed 104, and the resulting scene proxy is stored and distributed 106 in a manner that supports the particular number of degrees of viewpoint navigation freedom that are being provided in the user viewing experience stage 110, and also supports the particular geometric proxy that is processed by the image-based rendering method in the rendering stage 108.
- Referring again to FIG. 1, another implementation of the pipeline technique embodiments described herein supports unidirectional (i.e., one-way) live FVV, which corresponds to a situation where the streams of sensor data that are being generated by the sensors are concurrently captured 102 and processed 104, and the resulting scene proxy is stored and transmitted in a one-to-many manner on-the-fly (i.e., live) to one or more end users 106. As such, each end user can view 110 the scene live (i.e., each user can view the scene at substantially the same time it is being captured 102). This particular implementation is referred to hereafter as the unidirectional live FVV implementation. Exemplary types of video-based media that work well in the unidirectional live FVV implementation include sporting events, news programs, live concerts, and the like. As with the asynchronous FVV implementation, the streams of sensor data generated by the sensors are captured 102 and processed 104, and the resulting scene proxy is stored and distributed 106 in a manner that supports the particular number of degrees of viewpoint navigation freedom that are being provided in the user viewing experience stage 110, and also supports the particular geometric proxy that is processed by the image-based rendering method in the rendering stage 108.
- Referring again to FIG. 1, yet another implementation of the pipeline technique embodiments described herein supports bidirectional (i.e., two-way) live FVV such as that which is associated with various video-conferencing/telepresence applications. This particular implementation is referred to hereafter as the bidirectional live FVV implementation. This bidirectional live FVV implementation is generally the same as the unidirectional live FVV implementation with the following exception. In the bidirectional live FVV implementation a computing device at each physical location that is participating in a given video-conferencing/telepresence session is able to concurrently capture 102 streams of sensor data that are being generated by sensors which are capturing a local scene and process 104 these locally captured streams of sensor data, store and transmit the resulting local scene proxy in a one-to-many manner on-the-fly to the other physical locations that are participating in the session 106, receive a remote scene proxy from each of the remote physical locations that are participating in the session 106, and render 108 each received proxy. As with the asynchronous FVV implementation, the streams of sensor data generated by the sensors are captured 102 and processed 104, and the resulting local scene proxy is stored and distributed 106 in a manner that supports the particular number of degrees of viewpoint navigation freedom that are being provided in the user viewing experience stage 110, and also supports the particular geometric proxy that is processed by the image-based rendering method in the rendering stage 108.
- Referring again to FIG. 1, it will be appreciated that in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments described herein, in order for an end user to be able to view the scene live, the capture, processing, storage and distribution, rendering, and user viewing experience stages 102/104/106/108/110 have to be completed within a very short period of time.
- This section provides a more detailed description of the capture and processing stages of the FVV processing pipeline. The pipeline technique embodiments described herein generally employ a plurality of sensors which are configured in a prescribed arrangement to capture a given scene. The pipeline technique embodiments are operable with any type of sensor, any number (two or greater) of sensors, any arrangement of sensors (where this arrangement can include a plurality of different geometries and different geometric relationships between the sensors), and any combination of different types of sensors. The pipeline technique embodiments are also operable with both static and moving sensors. A given sensor can be any type of VCD (examples of which are described in more detail hereafter), or any type of audio capture device (such as a microphone, or the like), or any combination thereof. Each VCD generates a stream of video data which includes a stream of images (also known as and referred to herein as frames) of the scene from the specific geometric perspective of the VCD. Similarly, each audio capture device generates a stream of audio data representing the audio emanating from the scene from the specific geometric perspective of the audio capture device.
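- The relationship between sensors and the streams they produce, as described above, can be captured in a small data model. This is a sketch under assumptions: `Capability`, `Sensor`, `SensorStream`, and `is_valid_arrangement` are hypothetical names, and frames and audio samples are reduced to plain lists. A combined video and audio capture device carries both flags, so its stream holds both kinds of data, and the validity check encodes the requirement that the arrangement include a plurality of VCDs.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import List


class Capability(Flag):
    VIDEO = auto()  # the sensor is a VCD
    AUDIO = auto()  # the sensor is an audio capture device, e.g., a microphone


@dataclass
class Sensor:
    name: str
    capability: Capability
    is_static: bool = True  # False for a moving sensor


@dataclass
class SensorStream:
    sensor: Sensor
    frames: List[bytes] = field(default_factory=list)          # images from a VCD
    audio_samples: List[float] = field(default_factory=list)   # audio data

    @property
    def has_video(self) -> bool:
        return bool(self.sensor.capability & Capability.VIDEO)

    @property
    def has_audio(self) -> bool:
        return bool(self.sensor.capability & Capability.AUDIO)


def is_valid_arrangement(sensors: List[Sensor]) -> bool:
    """True if the arrangement includes a plurality (two or more) of VCDs."""
    return sum(1 for s in sensors if s.capability & Capability.VIDEO) >= 2


rig = [
    Sensor("cam_left", Capability.VIDEO),
    Sensor("cam_right", Capability.VIDEO),
    Sensor("handheld", Capability.VIDEO | Capability.AUDIO, is_static=False),
    Sensor("mic", Capability.AUDIO),
]
streams = [SensorStream(s) for s in rig]
print(is_valid_arrangement(rig))  # prints "True"
```

Using a flag enum rather than a fixed list of sensor kinds keeps the "any combination thereof" case from the text expressible without extra types.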
- Exemplary types of VCDs that can be employed include, but are not limited to, the following. A given VCD can be a conventional visible light video camera which generates a stream of video data that includes a stream of color images of the scene. A given VCD can also be a conventional light-field camera (also known as a “plenoptic camera”) which generates a stream of video data that includes a stream of color light-field images of the scene. A given VCD can also be a conventional infrared structured-light projector combined with a conventional infrared video camera that is matched to the projector, where this projector/camera combination generates a stream of video data that includes a stream of infrared images of the scene. This projector/camera combination is also known as a “structured-light 3D scanner”. A given VCD can also be a conventional monochromatic video camera which generates a stream of video data that includes a stream of monochrome images of the scene. A given VCD can also be a conventional time-of-flight camera which generates a stream of video data that includes both a stream of depth map images of the scene and a stream of color images of the scene. For simplicity's sake, the term “color camera” is sometimes used herein to refer to any type of VCD that generates color images of the scene.
- It will be appreciated that variability in factors such as the composition and complexity of a given scene, and each end user's viewpoint navigation during the user viewing experience stage of the FVV processing pipeline, among other factors, can impact the determination of how many sensors to use to capture the scene, the particular type(s) of sensors to use, and the particular arrangement of the sensors to use. The pipeline technique embodiments described herein generally employ a minimum of one VCD which generates color image data for the scene, along with one or more other VCDs that can be used in combination to generate 3D geometry data for the scene. In situations where an outdoor scene is being captured or the sensors are located far from the scene, it is advantageous to capture the scene using both a wide baseline stereo pair of color cameras and a narrow baseline stereo pair of color cameras. In situations where an indoor scene is being captured, it is advantageous to capture the scene using a narrow baseline stereo pair of VCDs both of which generate video data that includes a stream of infrared images of the scene in order to eliminate the dependency on scene lighting variables.
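- The capture guidelines above can be read as a small decision rule. The sketch below is illustrative only: `recommend_vcds` is a hypothetical helper, and the text does not say how the outdoor/far rule and the indoor rule combine when an indoor scene is captured from far away, so this sketch assumes both apply. A standalone color camera is added only when no color stereo pair already supplies color image data.

```python
from typing import List


def recommend_vcds(outdoor: bool, far_from_scene: bool) -> List[str]:
    """Suggest VCD groupings following the capture guidelines in the text."""
    rig: List[str] = []
    if outdoor or far_from_scene:
        rig += ["wide-baseline color stereo pair", "narrow-baseline color stereo pair"]
    if not outdoor:
        # Infrared imaging removes the dependency on scene lighting variables.
        rig.append("narrow-baseline infrared stereo pair")
    if not any("color" in item for item in rig):
        # At least one VCD must generate color image data for the scene.
        rig.insert(0, "color camera")
    return rig


print(recommend_vcds(outdoor=False, far_from_scene=False))
# ['color camera', 'narrow-baseline infrared stereo pair']
```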
- Generally speaking, it is advantageous to increase the number of sensors being used as the complexity of the scene increases. In other words, as the scene becomes more complex (e.g., as additional people are added to the scene), the use of additional VCDs serves to reduce the number of occluded areas within the scene. It may also be advantageous to capture the entire scene using a given arrangement of static VCDs, and at the same time also capture a specific higher complexity region of the scene using one or more additional moving VCDs. In a situation where a large number of VCDs is used to capture a complex scene, different combinations of the VCDs can be used during the processing stage of the FVV processing pipeline (e.g., a situation where a specific VCD is part of both a narrow baseline stereo pair and a different wide baseline stereo pair involving a third VCD).
- FIG. 2 illustrates an exemplary embodiment, in simplified form, of a process for generating an FVV of a scene. As exemplified in FIG. 2, the process starts in block 200 by using an arrangement of sensors to capture the scene, where the arrangement includes a plurality of VCDs and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. The streams of sensor data are then input (block 202) and calibrated (block 204). It will be appreciated that a given stream of sensor data will include video data whenever the sensor that generated the stream is a VCD. A given stream of sensor data will include audio data whenever the sensor that generated the stream is an audio capture device. A given stream of sensor data will include both video and audio data whenever the sensor that generated the stream is a combined video and audio capture device. It will also be appreciated that various methods can be used to calibrate the streams of sensor data. One such method will now be described in more detail.
- FIG. 3 illustrates an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by the arrangement of sensors. As exemplified in FIG. 3, the process starts in block 300 with determining the number of VCDs in the arrangement of sensors that is being used to capture the scene. Intrinsic characteristics of each of the VCDs are then determined (block 302). Exemplary intrinsic characteristics which can be determined for a given one of the VCDs include one or more of the VCD type, or the VCD's frame rate, or the VCD's shutter speed, or the VCD's mosaic pattern, or the VCD's white balance, or the bit depth and pixel resolution of the images that are generated by the VCD, or the focal length of the VCD's lens, or the principal point of the VCD's lens, or the VCD's skew coefficient, or the distortions of the VCD's lens, or the VCD's field of view, among others. It will be appreciated that knowing such intrinsic characteristics for each of the VCDs allows the FVV processing pipeline to understand the governing physics and optics of each of the VCDs. Extrinsic characteristics of each of the VCDs at each point in time during the capture of the scene are also determined (block 304). Exemplary extrinsic characteristics which can be determined for a given VCD include one or more of the VCD's current rotational orientation (i.e., the direction that the VCD is currently pointing), or the VCD's current spatial location (i.e., the VCD's current location within the arrangement), or whether the VCD is static or moving, or the current geometric relationship between the VCD and each of the other VCDs in the arrangement (i.e., the VCD's current position relative to each of the other VCDs in the arrangement), or the current position of the VCD relative to the scene, or whether or not the VCD is genlocked (i.e., temporally synchronized) with the other VCDs in the arrangement, among others. The determination of the intrinsic and extrinsic characteristics of each of the VCDs can be made using various conventional methods, examples of which will be described in more detail hereafter. The knowledge of the number of VCDs in the arrangement, and the intrinsic and extrinsic characteristics of each of the VCDs, is then used to temporally and spatially calibrate the streams of sensor data (block 306).
- As is appreciated in the art of video recording, the intrinsic and extrinsic characteristics of each of the VCDs in the arrangement are commonly determined by performing one or more calibration procedures which calibrate the VCDs, where these procedures are specific to the particular types of VCDs that are being used to capture the scene, and the particular number and arrangement of the VCDs. In the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments described herein, the calibration procedures are performed and the streams of sensor data which are generated thereby are input before the scene capture. In the asynchronous FVV implementation of the pipeline technique embodiments, the calibration procedures can be performed and the streams of sensor data which are generated thereby can be input either before or after the scene capture. Exemplary calibration procedures will now be described.
- In a situation where the VCDs that are being used to capture the scene are genlocked and include a combination of color cameras, VCDs which generate a stream of infrared images of the scene, and one or more time-of-flight cameras, and this combination of cameras is arranged in a static array, the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner. A stream of calibration data can be input from each of the cameras in the array while a common physical feature (such as a ball, or the like) is internally illuminated with an incandescent light (which is visible to all of the cameras) and moved throughout the scene. These streams of calibration data can then be analyzed using conventional methods to determine both an intrinsic and extrinsic calibration matrix for each of the cameras.
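- The intrinsic and extrinsic calibration matrices mentioned above are conventionally used through the pinhole camera model. As a hedged illustration rather than the procedure of the pipeline technique, the sketch below builds an intrinsic matrix K from a focal length, principal point and skew coefficient, applies an extrinsic rotation R and translation t, and models lens distortion with a simple two-coefficient radial polynomial. The distortion model, the function names and all numeric values are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def intrinsic_matrix(fx: float, fy: float, cx: float, cy: float,
                     skew: float = 0.0) -> Matrix3:
    """Pinhole intrinsic matrix K from focal lengths (pixels), principal point, skew."""
    return [[fx, skew, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def project(point_world: Sequence[float], K: Matrix3, R: Matrix3,
            t: Sequence[float], k1: float = 0.0, k2: float = 0.0) -> Tuple[float, float]:
    """Project a world point to pixel coordinates (u, v).

    Extrinsics: X_cam = R @ X_world + t. Lens distortion uses a two-term
    radial model with coefficients k1 and k2 (an assumed model).
    """
    x_cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    x, y = x_cam[0] / x_cam[2], x_cam[1] / x_cam[2]  # normalized image coordinates
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2             # radial distortion factor
    x, y = x * scale, y * scale
    u = K[0][0] * x + K[0][1] * y + K[0][2]
    v = K[1][1] * y + K[1][2]
    return u, v


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = intrinsic_matrix(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(project((0.1, -0.05, 2.0), K, IDENTITY, (0.0, 0.0, 0.0)))  # approximately (360.0, 220.0)
```

A calibration procedure such as the one described above estimates K, R, t and the distortion coefficients so that projections like this one match where the tracked feature actually appears in each camera's images.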
- In another situation where the VCDs that are being used to capture the scene include a plurality of color cameras which are arranged in a static array, the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner. A stream of calibration data can be input from each camera in the array while it is moved around the scene but in close proximity to its static location (thus allowing each camera in the array to view overlapping parts of the static background of the scene). After the scene is captured by the static array of color cameras and the streams of sensor data generated thereby are input, the streams of sensor data can be analyzed using conventional methods to identify features in the scene, and these features can then be used to calibrate the cameras in the array and determine the intrinsic and extrinsic characteristics of each of the cameras by employing a conventional structure-from-motion method.
- In yet another situation where one or more of the VCDs that are being used to capture the scene are moving VCDs (such as when the spatial location of a given VCD changes over time, or when controls on a given VCD are used to optically zoom in on the scene while it is being captured (which is commonly done during the recording of sporting events, among other things)), each of these moving VCDs can be calibrated and its intrinsic and extrinsic characteristics can be determined at each point in time during the scene capture by using a conventional background model to register and calibrate relevant individual images that were generated by the VCD. In yet another situation where the VCDs that are being used to capture the scene include a combination of static and moving VCDs, the VCDs can be calibrated and the intrinsic and extrinsic characteristics of each of the VCDs can be determined by employing conventional multistep calibration procedures.
- In yet another situation where there is no temporal synchronization between the VCDs that are being used to capture the scene and the arrangement of the sensors can randomly change over time (such as when a plurality of mobile devices are held up by different users and the VCDs on these devices are used to capture the scene), the pipeline technique embodiments described herein will both spatially and temporally calibrate the streams of sensor data generated by the VCDs at all points in time during the scene capture before the streams are processed in the processing stage. In an exemplary embodiment of the pipeline technique this spatial and temporal calibration can be performed as follows. After the scene is captured and the streams of sensor data representing the scene are input, the streams of sensor data can be analyzed using conventional methods to separate the static and moving elements of the scene. The static elements of the scene can then be used to generate a background model. Additionally, the moving elements of the scene can be used to generate a global timeline that encompasses all of the VCDs, and each image in each stream of sensor data is assigned a relative time. The intrinsic characteristics of each of the VCDs can be determined by using conventional methods to analyze each of the streams of sensor data.
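The source does not specify how the global timeline is built from the moving elements. The following is one simple possibility, offered purely as an assumed illustration. Each stream is reduced to a one-dimensional motion-energy signal, defined as the mean absolute difference between consecutive frames. The frame lag that maximizes the normalized cross-correlation between two such signals then places one stream's images on the other's timeline. The sketch assumes both streams share a frame rate and overlap in time, and all function names are hypothetical.

```python
import numpy as np


def motion_energy(frames):
    """Mean absolute difference between consecutive frames: one value per frame step."""
    f = np.asarray(frames, dtype=float)
    return np.abs(np.diff(f, axis=0)).mean(axis=tuple(range(1, f.ndim)))


def estimate_offset(sig_a, sig_b, max_lag):
    """Integer lag such that sig_b[n + lag] best matches sig_a[n] (normalized correlation)."""
    a = np.asarray(sig_a, dtype=float)
    b = np.asarray(sig_b, dtype=float)
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:len(b) + lag]
        n = min(len(x), len(y))
        if n < 2:
            continue  # too little overlap to compare
        score = float(np.dot(x[:n], y[:n])) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def relative_times(num_frames, fps, lag):
    """Times (s) of stream B's frames on stream A's timeline, given the estimated lag."""
    return (np.arange(num_frames) - lag) / fps
```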
- In an embodiment of the pipeline technique described herein where the capture stage of the FVV processing pipeline is directly connected to the VCDs that are being used to capture the scene, the intrinsic characteristics of each of the VCDs can also be determined by reading appropriate hardware parameters directly from each of the VCDs. In another embodiment of the pipeline technique where the capture stage is not directly connected to the VCDs but rather the streams of sensor data are pre-recorded and then imported into the capture stage, the number of VCDs and various intrinsic properties of each of the VCDs can be determined by analyzing the streams of sensor data using conventional methods.
- Referring again to FIG. 2, after the streams of sensor data have been calibrated (block 204), a scene proxy is generated from the calibrated streams of sensor data (block 206). As will be described in more detail hereafter, the scene proxy geometrically describes the scene as a function of time and includes one or more types of geometric proxy data which is matched to a set of current pipeline conditions in order to maximize the photo-realism of the FVV that results from the scene proxy at each point in time. These conditions can be in any one or more of the aforementioned stages of the FVV processing pipeline. After the scene proxy has been generated (block 206), it can be stored (block 208). In a situation where a given end user either is, or will be, viewing the FVV on another computing device which is connected to a data communication network, the scene proxy can also be distributed to the end user by transmitting it over the network to the other computing device (block 210).
FIG. 4 illustrates an exemplary embodiment, in simplified form, of a process for generating the scene proxy from the calibrated streams of sensor data. As exemplified in FIG. 4, the process starts in block 400 with periodically analyzing the set of current pipeline conditions. The set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the number of VCDs that either is being, or was, used to capture the scene, or one or more of the intrinsic characteristics of each of the VCDs (e.g., the VCD type, among others), or one or more of the extrinsic characteristics of each of the VCDs (e.g., the current position of the VCD relative to the scene, and whether the VCD is static or moving, among others), or the like. The set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as whether the scene proxy is being generated on-the-fly during the rendering and end user viewing of the captured scene or asynchronously from the rendering and end user viewing of the scene (i.e., the particular type of FVV that is being generated and the related speed at which the scene proxy has to be generated), or the like.
- The set of current pipeline conditions can also include one or more conditions in the storage and distribution stage of the FVV processing pipeline such as the amount of storage space that is currently available to store the scene proxy, or the network transmission bandwidth that is currently available, or the like.
The set of current pipeline conditions can also include one or more conditions in the user viewing experience stage of the pipeline such as the type of display device upon which the FVV either is, or will be, viewed, or the particular characteristics of the display device (e.g., one or more of its aspect ratio, or its pixel resolution, or its form factor, among others), or the level of data fidelity that is desired in the free viewpoint video, or the like.
- Referring again to FIG. 4, after the analysis of the set of current pipeline conditions has been completed (block 400), the results of this analysis are then used to select one or more different 3D reconstruction methods which are matched to the current pipeline conditions (block 402). The selected 3D reconstruction methods are then used to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data (block 404). The 3D reconstructions of the scene and the results of the periodic analysis are then used to generate the scene proxy (block 406). The actions of blocks
- It will thus be appreciated that the pipeline technique embodiments described herein can use a wide variety of 3D reconstruction methods in various combinations, where the particular types of 3D reconstruction methods that are being used depend upon various current conditions in the FVV processing pipeline. Accordingly and as will be described in more detail hereafter, the scene proxy will include one or more types of geometric proxy data, examples of which include, but are not limited to, the following. The scene proxy can include a stream of depth map images of the scene. The scene proxy can also include a stream of calibrated point cloud reconstructions of the scene. As is appreciated in the art of 3D reconstruction, these point cloud reconstructions are a low order geometric representation of the scene. The scene proxy can also include one or more high order geometric models, where these models can include one or more of planes, or billboards, or existing (i.e., previously created) generic object models (e.g., human body models, or human face models, or clothing models, or furniture models, or the like) which can be either modified, or animated, or both, among others. Such high order geometric models can be advantageously used to fill in occlusions that may exist in the captured scene.
The scene proxy can also include other high fidelity proxies such as a stream of mesh models of the scene and a corresponding stream of texture maps which define texture data for each of the mesh models, among others. It will further be appreciated that since the particular 3D reconstruction methods that are used and the related manner in which the scene proxy is generated are based upon a periodic analysis (i.e., monitoring) of the various current conditions in the FVV processing pipeline, the 3D reconstruction methods that are used and the resulting types of data in the scene proxy can change over time based on changes in the pipeline conditions.
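The selection step can be pictured as a small rule table that is re-evaluated each time the conditions are re-analyzed. The sketch below is hypothetical: the `PipelineConditions` fields, the proxy-type labels, and the storage and bandwidth thresholds are illustrative assumptions. The split between on-the-fly and asynchronous generation mirrors the behavior this document describes for its live and asynchronous FVV implementations.

```python
from dataclasses import dataclass

# Hypothetical thresholds below which a mesh-plus-texture proxy is not produced.
MIN_STORAGE_GB = 1.0
MIN_BANDWIDTH_MBPS = 20.0


@dataclass
class PipelineConditions:
    """A few of the capture, processing and distribution conditions (illustrative only)."""
    has_depth_streams: bool   # capture stage: some VCDs output depth maps
    on_the_fly: bool          # processing stage: proxy built while the FVV is viewed
    storage_gb_free: float    # storage and distribution stage
    bandwidth_mbps: float     # storage and distribution stage


def select_proxy_types(c: PipelineConditions) -> list:
    """Return the kinds of geometric proxy data to generate under conditions c."""
    types = []
    if c.has_depth_streams:
        types.append("depth_maps")
    types.append("point_cloud")
    if c.on_the_fly:
        # Live FVV: only fast reconstruction; high order models help fill occlusions.
        types.append("high_order_models")
    elif c.storage_gb_free >= MIN_STORAGE_GB and c.bandwidth_mbps >= MIN_BANDWIDTH_MBPS:
        # Asynchronous FVV with room to store and ship it: add a higher fidelity proxy.
        types.append("mesh_and_texture")
    return types
```

Because the function is called on every re-analysis, its output, and hence the contents of the scene proxy, can change partway through a capture.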
- Generally speaking, for the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments described herein, due to the fact that the capture, processing, storage and distribution, rendering, and user viewing experience stages of the FVV processing pipeline have to be completed within a very short period of time, the types of 3D reconstruction methods that can be used in these implementations are limited to high speed 3D reconstruction methods. By way of example but not limitation, in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments the scene proxy that is generated will include a stream of calibrated point cloud reconstructions of the scene, and may also include one or more high order geometric models which can be either modified, or animated, or both. It will be appreciated that 3D reconstruction methods which can be implemented in hardware are also favored in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments. The use of VCDs which generate infrared images of the scene is also favored in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments.
- For the asynchronous FVV implementation of the pipeline technique embodiments described herein, due to the fact that the capture and processing stages of the FVV processing pipeline operate asynchronously from the rendering and user viewing experience stages (and as such, there is effectively an unlimited amount of time available for the processing stage), more rigorous (and thus slower) 3D reconstruction methods can be used in this implementation. By way of example but not limitation, in the asynchronous FVV implementation of the pipeline technique embodiments the scene proxy that is generated can include both a stream of calibrated point cloud reconstructions of the scene, as well as one or more higher fidelity geometric proxies of the scene (such as when the calibrated point cloud reconstructions of the scene are used to generate a stream of mesh models of the scene, among other possibilities). The asynchronous FVV implementation of the pipeline technique embodiments also allows a plurality of 3D reconstruction steps to be used in sequence when generating the scene proxy.
By way of example but not limitation, consider a situation where a stream of calibrated point cloud reconstructions of the scene has been generated, but there are some noisy or error prone stereo matches present in these reconstructions that extend beyond a human silhouette boundary in the scene. It will be appreciated that these noisy or error prone stereo matches can lead to the wrong texture data appearing in the mesh models of the scene, thus resulting in artifacts in the rendered scene. These artifacts can be eliminated by running a segmentation process to separate the foreground from the background, and then points outside of the human silhouette can be rejected as outliers.
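The outlier-rejection step can be sketched as follows. This is a minimal sketch that assumes a boolean foreground mask per VCD from the segmentation process and a 3x4 projection matrix P = K[R | t] from calibration. The function name is hypothetical. Each point is projected into the view and kept only if it lands on a foreground pixel.

```python
import numpy as np


def reject_outside_silhouette(points, P, mask):
    """Keep the 3D points whose projection falls on a foreground pixel of mask.

    points: (N, 3) world coordinates; P: (3, 4) projection matrix of one VCD;
    mask: (H, W) boolean foreground mask for that VCD's image at the same instant.
    """
    pts = np.asarray(points, dtype=float)
    proj = np.column_stack([pts, np.ones(len(pts))]) @ np.asarray(P, dtype=float).T
    in_front = proj[:, 2] > 0
    z = np.where(in_front, proj[:, 2], 1.0)  # avoid dividing by zero or negative depth
    u = np.round(proj[:, 0] / z).astype(int)
    v = np.round(proj[:, 1] / z).astype(int)
    h, w = mask.shape
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = np.zeros(len(pts), dtype=bool)
    keep[inside] = mask[v[inside], u[inside]]
    return pts[keep]
```

With several VCDs, applying the filter once per view in sequence keeps only points that are consistent with every silhouette, which is a visual-hull-style test. Note that points falling outside a view's image are also discarded here, which may be too strict for cameras that cover only part of the scene.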
- FIG. 9 illustrates an exemplary embodiment, in simplified form, of a process for using a point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene (hereafter simply referred to as different depth map image streams). As exemplified in FIG. 9, whenever the calibrated streams of sensor data include a plurality of different depth map image streams (block 900, Yes), these different depth map image streams are merged into a stream of calibrated point cloud reconstructions of the scene (block 902). It is noted that these point cloud reconstructions are unordered and, as such, overlaps may exist therein. Depending on the current pipeline conditions, the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more different high fidelity geometric proxies of the scene (block 904). By way of example but not limitation, the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene, where this mesh model generation can be performed using conventional methods such as Poisson surface reconstruction, among others.
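Merging depth map image streams into a point cloud amounts to back-projecting every valid depth pixel through its camera's calibration and then concatenating the results for each time instant. The sketch below makes the following assumptions: pinhole intrinsics K; extrinsics (R, t) with x_cam = R·X_world + t; depth stored as distance along the optical axis; and 0 marking invalid pixels. The helper names are illustrative.

```python
import numpy as np


def backproject(depth, K, R, t):
    """(N, 3) world points from one depth image; camera model x_cam = R @ X + t."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0  # 0 marks pixels with no depth measurement
    z = depth[valid]
    pix = np.stack([u[valid], v[valid], np.ones_like(z)]).astype(float)  # (3, N)
    cam = (np.linalg.inv(K) @ pix) * z  # scale each normalized ray by its depth
    world = R.T @ (cam - np.reshape(t, (3, 1)))
    return world.T


def merge_depth_maps(depths, cameras):
    """Point cloud for one time instant from several depth maps.

    cameras: one (K, R, t) tuple per depth map.
    """
    return np.vstack([backproject(d, K, R, t) for d, (K, R, t) in zip(depths, cameras)])
```

The merged array is the unordered, possibly overlapping point set noted above. A meshing step such as Poisson surface reconstruction would typically also need oriented per-point normals, which this sketch does not estimate.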
FIG. 10 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene (hereafter simply referred to as different infrared image streams). As exemplified in FIG. 10, whenever the calibrated streams of sensor data include a plurality of different infrared image streams (block 1000, Yes), the following actions occur. Any narrow baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of infrared image streams are identified (block 1002). A first set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified narrow baseline stereo pairs of VCDs (block 1004). Any wide baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of infrared image streams are then identified (block 1006). A second set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified wide baseline stereo pairs of VCDs (block 1008). The different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 1010). It is noted that these point cloud reconstructions are unordered and, as such, overlaps may exist therein. Depending on the current pipeline conditions, the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more different high fidelity geometric proxies of the scene (block 1012). By way of example but not limitation and as just described, the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene.
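For a rectified stereo pair, whether narrow or wide baseline, each disparity d in pixels converts to depth as Z = f·B/d. Here f is the focal length in pixels and B is the baseline in meters. Differentiating gives a depth uncertainty of roughly Z²·Δd/(f·B) for a disparity error Δd. This suggests one plausible reason to combine both pair types: a wider baseline shrinks the error at a given depth, while a narrower baseline tends to make matching easier because the two views look more alike. The sketch assumes rectified images, and the function names are illustrative.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Z = f * B / d for a rectified pair; pixels with d <= 0 are marked invalid (0)."""
    d = np.asarray(disparity, dtype=float)
    depth = np.zeros_like(d)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth


def depth_uncertainty(depth_m, focal_px, baseline_m, disparity_err_px=1.0):
    """First-order depth error for a given disparity error: dZ ~ Z^2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)
```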
FIG. 11 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene (hereafter simply referred to as different color image streams). As exemplified in FIG. 11, whenever the calibrated streams of sensor data include a plurality of different color image streams (block 1100, Yes), the following actions occur. Any narrow baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of color image streams are identified (block 1102). A first set of different depth map image streams is then created from the pairs of color image streams generated by the identified narrow baseline stereo pairs of VCDs (block 1104). Any wide baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of color image streams are then identified (block 1106). A second set of different depth map image streams is then created from the pairs of color image streams generated by the identified wide baseline stereo pairs of VCDs (block 1108). The different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 1110). It is noted that these point cloud reconstructions are unordered and, as such, overlaps may exist therein. Depending on the current pipeline conditions, the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more different high fidelity geometric proxies of the scene (block 1112). By way of example but not limitation and as just described, the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene.
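Color pairs need a matching step before disparities can be converted to depth. The brute-force sum-of-absolute-differences (SAD) block matcher below is a deliberately simple stand-in for the conventional stereo methods the source alludes to. It makes three assumptions. First, the pair is rectified, so corresponding points share a row and a left pixel at column x appears at x - d in the right image. Second, RGB is converted to luma using the Rec. 601 weights. Third, occlusions and subpixel refinement are ignored.

```python
import numpy as np


def to_gray(rgb):
    """Rec. 601 luma from an (H, W, 3) RGB array."""
    return np.asarray(rgb, dtype=float) @ np.array([0.299, 0.587, 0.114])


def block_match(left, right, max_disp, win=2):
    """Integer disparity per left pixel by minimizing SAD over (2*win+1)^2 windows.

    Border pixels where a full window does not fit keep disparity 0.
    """
    L = np.asarray(left, dtype=float)
    R = np.asarray(right, dtype=float)
    h, w = L.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(win, h - win):
        for x in range(win, w - win):
            patch = L[y - win:y + win + 1, x - win:x + win + 1]
            best_cost, best_d = np.inf, 0
            # Only search shifts that keep the right-image window inside the image.
            for d in range(0, min(max_disp, x - win) + 1):
                cand = R[y - win:y + win + 1, x - d - win:x - d + win + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```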
- It will be appreciated that depending on the particular arrangement of sensors that is used to capture the scene, a given VCD can be in a plurality of narrow baseline stereo pairs of VCDs, and can also be in a plurality of wide baseline stereo pairs of VCDs. This serves to maximize the number of different depth map image streams that are created, which in turn serves to maximize the precision of the scene proxy.
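Identifying the narrow and wide baseline pairs can be as simple as thresholding the distance between the optical centers of every pair of VCDs. Under the x_cam = R·X + t convention, each center is -R^T t. The sketch below uses hypothetical thresholds. A practical version would also check that the two VCDs view overlapping parts of the scene. As the paragraph above notes, one VCD can appear in several pairs of each kind.

```python
import itertools

import numpy as np


def classify_pairs(centers, narrow_max, wide_max):
    """Split all VCD pairs into narrow- and wide-baseline lists by optical-center distance.

    centers: dict mapping a VCD name to its 3D optical center. narrow_max and wide_max
    (same units as the centers) are hypothetical thresholds; pairs farther apart than
    wide_max are ignored.
    """
    narrow, wide = [], []
    for a, b in itertools.combinations(sorted(centers), 2):
        baseline = float(np.linalg.norm(np.subtract(centers[a], centers[b])))
        if baseline <= narrow_max:
            narrow.append((a, b))
        elif baseline <= wide_max:
            wide.append((a, b))
    return narrow, wide
```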
- Referring again to FIG. 1, this section provides an overview description, in simplified form, of several additional implementations of the capture and processing stages of the FVV processing pipeline 100. It will be appreciated that the implementations described in this section are merely exemplary. Many other implementations of the capture and processing stages
- In one implementation of the capture and processing stages of the FVV processing pipeline, a circular arrangement of eight genlocked VCDs is used to capture a scene which includes one or more human beings, where each of the VCDs includes a combination of one infrared structured-light projector, two infrared video cameras, and one color camera. Accordingly, the VCDs each generate a different stream of video data which includes both a stereo pair of infrared image streams and a color image stream. As described heretofore, the pair of infrared image streams and the color image stream generated by each VCD are first used to generate different depth map image streams. The different depth map image streams are then merged into a stream of calibrated point cloud reconstructions of the scene. These point cloud reconstructions are then used to generate a stream of mesh models of the scene. A conventional view-dependent texture mapping method which accurately represents specular textures such as skin is then used to extract texture data from the color image stream generated by each VCD and map this texture data to the stream of mesh models of the scene.
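View-dependent texture mapping blends the color images so that the VCDs whose viewing direction is closest to the current synthetic viewpoint dominate. This preserves view-dependent effects such as specular highlights on skin. The weighting below, a clamped cosine raised to a power, is one common choice and is assumed here; it is not necessarily the specific conventional method the source refers to. Per-pixel visibility testing is omitted.

```python
import numpy as np


def view_dependent_weights(view_dir, camera_dirs, power=8.0):
    """Normalized blend weights for N color cameras at one surface point.

    view_dir: direction from the surface point toward the synthetic viewpoint.
    camera_dirs: (N, 3) directions from the surface point toward each VCD.
    """
    v = np.asarray(view_dir, dtype=float)
    v = v / np.linalg.norm(v)
    c = np.asarray(camera_dirs, dtype=float)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    cos = np.clip(c @ v, 0.0, 1.0)  # VCDs more than 90 degrees from the view get zero
    w = cos ** power  # a larger power favors the closest VCDs more sharply
    total = w.sum()
    return w / total if total > 0 else w


def blend_colors(colors, weights):
    """Weighted sum of the (N, 3) colors sampled from each VCD's image."""
    return np.asarray(weights, dtype=float) @ np.asarray(colors, dtype=float)
```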
- In another implementation of the capture and processing stages of the FVV processing pipeline four genlocked visible light video cameras are used to capture a scene which includes one or more human beings, where the cameras are evenly placed around the scene. Accordingly, the cameras each generate a different stream of video data which includes a color image stream. An existing 3D geometric model of a human body can be used in the scene proxy as follows. Conventional methods can be used to kinematically articulate the model over time in order to fit (i.e., match) the model to the streams of video data generated by the cameras. The kinematically articulated model can then be colored as follows. A conventional view-dependent texture mapping method can be used to extract texture data from the color image stream generated by each camera and map this texture data to the kinematically articulated model.
- In yet another implementation of the capture and processing stages of the FVV processing pipeline three unsynchronized visible light video cameras are used to capture a soccer game, where each of the cameras is moving and is located far from the game (e.g., rather than the spatial location of each of the cameras being fixed to a specified arrangement, each of the cameras is hand held by a different user who is capturing the game while they freely move about). Accordingly, the cameras each generate a different stream of video data which includes a stream of color images of the game. Articulated billboards can be used to represent the moving players in the scene proxy of the game as follows. For each stream of video data, conventional methods can be used to generate a segmentation mask for each body part of each player in the stream. Conventional methods can then be used to generate an articulated billboard model of each of the moving players in the game from the appropriate segmentation masks. The articulated billboard model can then be colored as just described.
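A billboard is a textured planar quad that is turned to face the viewer. In an articulated billboard model, each body part gets its own quad hinged at a joint. The sketch below computes the corners of one vertical billboard, rotated about the up axis toward the current viewpoint; this is the basic operation such a model would apply to each part. It is an illustrative assumption rather than the source's method, and the function name is hypothetical.

```python
import numpy as np


def billboard_corners(base_center, width, height, camera_pos, up=(0.0, 1.0, 0.0)):
    """Corners of a vertical quad standing on base_center, turned about 'up' to face camera_pos.

    Corners are returned counter-clockwise as seen from the camera:
    bottom-left, bottom-right, top-right, top-left.
    """
    base = np.asarray(base_center, dtype=float)
    up = np.asarray(up, dtype=float)
    up = up / np.linalg.norm(up)
    to_cam = np.asarray(camera_pos, dtype=float) - base
    to_cam -= up * (to_cam @ up)  # keep only the horizontal component
    n = np.linalg.norm(to_cam)
    if n < 1e-9:
        raise ValueError("camera is directly above or below the billboard")
    normal = to_cam / n
    right = np.cross(up, normal)
    half = 0.5 * width * right
    top = height * up
    return np.array([base - half, base + half, base + half + top, base - half + top])
```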
- This section provides a more detailed description of the rendering and user viewing experience stages of the FVV processing pipeline.
- FIG. 5 illustrates an exemplary embodiment, in simplified form, of a process for presenting an FVV of a scene to an end user. As exemplified in FIG. 5, the process starts in block 500 with inputting a scene proxy which geometrically describes the scene as a function of time. A current synthetic viewpoint of the scene is then generated from the scene proxy, where this current synthetic viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a set of current pipeline conditions (block 502). These conditions can be in any one or more of the aforementioned stages of the FVV processing pipeline. The current synthetic viewpoint of the scene is then displayed on a display device (block 504) so that it can be viewed and navigated by the end user. As described herein, a prescribed number of degrees of viewpoint navigation freedom are provided to the end user, where this number is greater than or equal to one and less than or equal to six.
FIG. 6 illustrates an exemplary embodiment, in simplified form, of a process for generating the current synthetic viewpoint of the scene from the scene proxy. As exemplified in FIG. 6, the process starts in block 600 with periodically analyzing the set of current pipeline conditions. The set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the number of VCDs that either is being, or was, used to capture the scene, or one or more of the intrinsic characteristics of each of the VCDs (e.g., the VCD type, among others), or one or more of the extrinsic characteristics of each of the VCDs (e.g., the positioning of the VCD relative to the scene, and whether the VCD is static or moving, among others), or the complexity and composition of the scene, or whether the scene is relatively static or dynamic, or the like. The set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as the particular 3D reconstruction methods that are being (or were) used to generate the scene proxy, or the types of geometric proxy data that are in the scene proxy, or the level of data fidelity that is desired in the free viewpoint video, or the like.
- The set of current pipeline conditions can also include one or more conditions in the rendering and user viewing experience stages of the FVV processing pipeline such as the graphics processing capabilities/features that are available in the hardware of the computing device which is being used by a given end user to generate the current synthetic viewpoint of the scene, or the type of display device upon which the current synthetic viewpoint of the scene is being displayed, or the particular characteristics of the display device (described heretofore), or the number of degrees of viewpoint navigation freedom that are being provided to the end user, or the view frustum of the current synthetic viewpoint, or whether or not this computing device includes a natural user interface (and if so, the particular natural user interface modalities that are anticipated to be used by the end user), or the like. The set of current pipeline conditions can also include information which is generated by the end user in the user viewing experience stage that specifies desired changes to (i.e., controls) the current synthetic viewpoint of the scene. Such information can include one or more of viewpoint navigation information which is being output by this stage based upon the FVV navigation that is being performed by the end user, or temporal navigation information which may also be output by this stage based upon this FVV navigation. The set of current pipeline conditions can also include the type of FVV that is being presented to the end user.
- Referring again to FIG. 6, after the analysis of the set of current pipeline conditions has been completed (block 600), the results of this analysis are then used to select one or more different image-based rendering methods which are matched to the current pipeline conditions (block 602). The selected image-based rendering methods and the results of the periodic analysis are then used to generate the current synthetic viewpoint of the scene (block 604). The actions of blocks
- It will thus be appreciated that the pipeline technique embodiments described herein can use a wide variety of image-based rendering methods in various combinations, where the particular types of image-based rendering methods that are being used depend upon various current conditions in the FVV processing pipeline. Unlike the rendering methods that are employed in conventional 3D computer graphics applications where the 3D geometry of the scene that is being rendered is known (i.e., the geometric primitives for the scene are known), the image-based rendering methods that are employed by the pipeline technique embodiments described herein can render novel views (i.e., synthetic viewpoints) of the scene directly from a collection of images in the scene proxy without having to know the scene geometry. An overview of exemplary image-based rendering methods which can be employed by the pipeline technique embodiments is provided hereafter.
- The pipeline technique embodiments described herein support using any type of display device to view the FVV including, but not limited to, the very small form factor display devices used on conventional smart phones and other types of mobile devices, the small form factor display devices used on conventional tablet computers and netbook computers, the display devices used on conventional laptop computers and personal computers, conventional televisions and 3D televisions, conventional autostereoscopic 3D display devices, conventional head-mounted transparent display devices, and conventional wearable heads-up display devices such as those that are used in virtual reality applications. In a situation where the end user is using an autostereoscopic 3D display device to view the FVV, then the rendering stage of the FVV processing pipeline will simultaneously generate both left and right current synthetic viewpoints of the scene at an appropriate aspect ratio and resolution in order to create a stereoscopic effect for the end user. In another situation where the end user is using a conventional television to view the FVV, then the rendering stage will generate just a single current synthetic viewpoint.
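Generating the left and right synthetic viewpoints amounts to offsetting the single viewpoint along its right vector by half the interpupillary distance on each side. The sketch below assumes a right-handed camera frame and a default separation of 64 mm; that value is a typical adult figure and is not taken from the source. Each eye position would then be rendered with the same image-based rendering path as a single viewpoint.

```python
import numpy as np


def stereo_eye_positions(center, forward, up, ipd=0.064):
    """Left and right eye positions for one synthetic viewpoint (same units as center).

    forward and up must not be parallel.
    """
    f = np.array(forward, dtype=float)
    f /= np.linalg.norm(f)
    u = np.array(up, dtype=float)
    right = np.cross(f, u)  # right-handed frame: right = forward x up
    right /= np.linalg.norm(right)
    c = np.array(center, dtype=float)
    half = 0.5 * ipd * right
    return c - half, c + half
```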
- The pipeline technique embodiments described herein also support using any type of user interface modality to control the current viewpoint while viewing the FVV including, but not limited to, conventional keyboards, conventional pointing devices (such as a mouse, or a graphics tablet, or the like), and conventional natural user interface modalities (such as voice, or a touch-sensitive display screen, or the head tracking functionality that is integrated into wearable heads-up display devices, or a motion and location sensing device (such as the Microsoft Kinect™ (a trademark of Microsoft Corporation), among others), or the like). It will be appreciated that if the end user either is, or will be, using one or more natural user interface modalities while they are viewing the FVV, this can influence the spatiotemporal navigation capabilities that are provided to the end user. In other words, the FVV processing pipeline can process the streams of sensor data differently in order to enable different end user viewing experiences based on the particular type(s) of user interface modality that is anticipated to be used by the end user. By way of example but not limitation, in a situation where a given end user is using the wearable heads-up display device to view and navigate the FVV, then all six degrees of viewpoint navigation freedom could be provided to the end user. 
In the bidirectional live FVV implementation of the pipeline technique embodiments, if the end user at each physical location that is participating in a given video-conferencing/telepresence session is using the wearable heads-up display device to view and navigate the FVV, then parallax functionality can be implemented in order to provide each end user with an optimally realistic viewing experience when they control/change their viewpoint of the FVV using head movements. The pipeline can also provide for corrected conversational geometry between two end users, thus providing the appearance that both end users are looking directly at each other. In another situation where a given end user is using the motion and location sensing device to navigate the FVV, the rendering stage can optimize the current synthetic viewpoint that is being displayed based on the end user's current spatial location in front of their display device. In this way, the end user's current spatial location can be mapped to the 3D geometry within the FVV.
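One established way to map the viewer's tracked position to the rendered view is head-coupled perspective. The screen is treated as a window, and an asymmetric (off-axis) view frustum is computed from the eye position relative to the screen rectangle. The sketch below illustrates that technique and is an assumption, not the source's method. It places the screen in the z = 0 plane, centered on the origin, and expects the eye position in the same metric units with z > 0. It returns glFrustum-style near-plane bounds.

```python
import numpy as np


def off_axis_frustum(eye, screen_w, screen_h, near):
    """Near-plane bounds (left, right, bottom, top) for an eye in front of a z = 0 screen.

    The screen spans [-w/2, w/2] x [-h/2, h/2]; the eye looks down -z toward it.
    """
    ex, ey, ez = np.asarray(eye, dtype=float)
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    scale = near / ez  # similar triangles: screen edges projected onto the near plane
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top
```

The virtual camera sits at the eye position and looks down -z without rotation. As the viewer moves, the frustum skews and nearby content shifts relative to distant content, which produces motion parallax.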
- FIG. 8 illustrates the various degrees of viewpoint navigation freedom that can be supported by the pipeline technique embodiments described herein. As described heretofore, the pipeline technique embodiments generally support spatiotemporal (i.e., space-time) navigation of the FVV. More particularly, the asynchronous FVV, unidirectional live FVV, and bidirectional live FVV implementations described herein can each support spatial viewpoint navigation of the FVV having as many as six degrees of freedom, which can be appropriate when the end user is viewing and navigating an FVV that includes high fidelity geometric information. As exemplified in FIG. 8, these six degrees of freedom include viewpoint navigation along the x axis, viewpoint navigation rotationally about the x axis (θx), viewpoint navigation along the y axis, viewpoint navigation rotationally about the y axis (θy), viewpoint navigation along the z axis, and viewpoint navigation rotationally about the z axis (θz). The asynchronous FVV, unidirectional live FVV, and bidirectional live FVV implementations can also each support spatial viewpoint navigation of the FVV having just one degree of viewpoint navigation freedom, which can be appropriate when the viewpoint navigation of the FVV is constrained to a straight line that connects the sensors. The asynchronous FVV implementation can also support temporal navigation of the FVV.
- In some embodiments of the pipeline technique described herein, such as the asynchronous FVV implementation described herein, a producer or editor of the FVV may want to specify the particular types of viewpoint navigation that are possible at different times during the FVV.
By way of example but not limitation, in one scene a movie director may want to confine the end user's viewpoint navigation to a limited area of the scene or a specific axis, but in another scene the director may want to allow the end user to freely navigate their viewpoint throughout the entire area of the scene.
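The six navigation parameters of FIG. 8 map directly onto a rigid viewpoint pose. An editor's restriction, or the one-degree-of-freedom case, can be enforced by projecting the requested position onto an allowed set. The sketch below assumes angles in radians, a camera-to-world 4x4 pose, and a rotation order of Rz·Ry·Rx (any fixed order would do). `constrain_to_segment` is a hypothetical helper that confines the viewpoint to the straight line between two sensor positions.

```python
import numpy as np


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def viewpoint_pose(x, y, z, theta_x, theta_y, theta_z):
    """Camera-to-world 4x4 pose from the six navigation parameters (angles in radians)."""
    pose = np.eye(4)
    pose[:3, :3] = rot_z(theta_z) @ rot_y(theta_y) @ rot_x(theta_x)
    pose[:3, 3] = (x, y, z)
    return pose


def constrain_to_segment(p, a, b):
    """Closest point to p on the segment from sensor position a to sensor position b."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    s = np.clip((p - a) @ ab / (ab @ ab), 0.0, 1.0)
    return a + s * ab
```

A director's per-scene restriction could swap `constrain_to_segment` for another projection, such as onto a box, a plane, or a single axis, without changing the pose code.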
- As described heretofore, the current synthetic viewpoint of the scene is generated using one or more image-based rendering methods which are selected based upon a periodic analysis of the aforementioned set of current pipeline conditions. Accordingly, the particular image-based rendering methods that are used can change over time based upon changes in the current pipeline conditions. It will thus be appreciated that in one situation where the scene has a low degree of complexity and the arrangement of sensors which either is being, or was, used to capture the scene is located close to the scene, just a single image-based rendering method may be used to generate the current synthetic viewpoint of the scene. In another situation where the scene has a high degree of complexity and the arrangement of sensors which either is being, or was, used to capture the scene is located far from the scene, a plurality of image-based rendering methods may be used to generate the current synthetic viewpoint of the scene depending on the location of the current viewpoint relative to the scene and the particular types of geometric proxy data that are in the scene proxy.
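The image-based rendering categories of FIG. 7 suggest one way to structure this selection: classify what the scene proxy contains, then choose rendering methods from the matching category. The mapping below is an illustrative assumption. For instance, it treats depth maps as explicit geometry and dense correspondences as implicit geometry. The proxy-type labels are hypothetical.

```python
# Hypothetical proxy-type labels grouped by the kind of scene geometry they carry.
EXPLICIT = {"depth_maps", "point_cloud", "high_order_models", "mesh_and_texture"}
IMPLICIT = {"correspondences"}


def rendering_category(proxy_types):
    """Richest FIG. 7 category supported by the geometric proxy data present."""
    kinds = set(proxy_types)
    if kinds & EXPLICIT:
        return "explicit scene geometry"
    if kinds & IMPLICIT:
        return "implicit scene geometry"
    return "without scene geometry"
```

Because the proxy contents can change over time, re-running this classification whenever the conditions are re-analyzed lets the renderer switch categories mid-stream.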
FIG. 7 illustrates an exemplary embodiment, in simplified form, of a continuum of the various exemplary image-based rendering methods which can be employed by the pipeline technique embodiments described herein. As exemplified in FIG. 7, for didactic purposes these various image-based rendering methods can be classified into three categories according to the amount and type of scene geometry information that is included in the scene proxy and is thus available to the rendering stage, namely rendering without scene geometry 706 (i.e., the scene geometry is unknown), rendering with implicit scene geometry 704 (i.e., correspondence), and rendering with explicit scene geometry 702 (which can be either approximate or accurate). These categories are better viewed as regions of a continuum 700 than as strict and discrete categories, since certain of the image-based rendering methods defy strict categorization.

As also exemplified in FIG. 7, a trade-off exists between the amount and type of scene geometry information that is available to the rendering stage and the number of images that must be available in the scene proxy in order to generate photo-realistic synthetic viewpoints of the scene. Generally speaking, the higher the VCD density in the arrangement of sensors that is being used to capture the scene (i.e., the larger the number of VCDs used in the arrangement), the larger the number of images available in the scene proxy, and thus the less scene geometry information is needed in the scene proxy to generate photo-realistic synthetic viewpoints of the scene. However, having less scene geometry information in the scene proxy will generally decrease the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints will generally be limited to positions between or near the VCDs). Correspondingly, the lower the VCD density (i.e., the smaller the number of VCDs used in the arrangement), the smaller the number of images available in the scene proxy, and thus the more scene geometry information is needed in the scene proxy to generate photo-realistic synthetic viewpoints of the scene. However, having more scene geometry information in the scene proxy will generally increase the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints can be navigated to positions which are far away from the real VCD viewpoints).

On the left side 706 of the continuum 700 exemplified in FIG. 7, the scene proxy includes a large number of images but does not include any scene geometry or correspondence information. In this situation the current synthetic viewpoint of the scene can be generated by using a conventional light-field method, a conventional lumigraph method, or a conventional concentric mosaics method, among others, to process the scene proxy. As is appreciated in the art of image-based rendering, each of these methods relies on the characterization of the conventional plenoptic function, and constructs a continuous representation of the plenoptic function from the images in the scene proxy. The light-field method is generally applicable when the images of the scene are uniformly captured; it generates new images of the scene by appropriately filtering and interpolating the images in the scene proxy. The lumigraph method is similar to the light-field method except that it is generally applicable when the images of the scene are not uniformly captured; it enhances the rendering performance by applying approximated geometry to compensate for this non-uniform capture. Unlike the light-field and lumigraph methods, which are applicable when the arrangement of VCDs is a 2D grid, the concentric mosaics method is applicable when the arrangement of VCDs is circular. Conventional image mosaicing methods can also be used to construct a complete plenoptic function at a fixed viewpoint from an incomplete set of images of the scene.

In the middle 704 of the continuum 700 exemplified in FIG. 7, the scene proxy does not include explicit scene geometry information, but rather includes implicit scene geometry information in the form of feature (e.g., point) correspondences between images, where these correspondences can be computed using conventional computer vision methods. In this situation the current synthetic viewpoint of the scene can be generated by using various conventional transfer methods (such as a conventional view interpolation method or a conventional view morphing method, among others) to process the scene proxy. As is appreciated in the art of image-based rendering, such transfer methods are characterized by the use of a relatively small number of images together with geometric constraints (which are either recovered or known a priori) to project image pixels appropriately at a given synthetic viewpoint. These geometric constraints can be in the form of known depth values at each pixel, epipolar constraints between stereo pairs of images, or trifocal/trilinear tensors that link correspondences between triplets of images. The view interpolation method generates synthetic viewpoints of the scene by interpolating optical flow between corresponding points. The view morphing method generates, based on point correspondences, synthetic viewpoints that reside on a line which links the optical centers of two different VCDs.

On the right side 702 of the continuum 700 exemplified in FIG. 7, the scene proxy includes explicit and accurate scene geometry information and a small number of images, where this geometry information can be in the form of either depth along known lines-of-sight or 3D coordinates, among other things. In this situation the current synthetic viewpoint of the scene can be generated by using conventional 3D warping methods, a conventional layered depth images method, a conventional layered depth images tree method, a conventional view-dependent texture mapping method, or a conventional view-dependent geometry method, among others, to process the scene proxy. As is appreciated in the art of image-based rendering, the 3D warping methods, the layered depth images method, or the layered depth images tree method can be used when the scene proxy includes both depth map images and color (or monochrome) images of the scene. When the scene proxy includes depth information for all the points in an image, the 3D warping methods can be used to render the image from any nearby point of view by projecting the pixels of the image to their proper 3D locations and then re-projecting them onto a new picture. The rendering speed of such 3D warping methods can be increased by using conventional relief texture methods, which factor the warping process into a relatively simple pre-warping operation and a conventional texture mapping operation (which may be performed by conventional graphics processing hardware). It is noted that the 3D warping methods can be applied to both traditional perspective images and multi-perspective images. The view-dependent geometry method, which was first used in the context of 3D cartoons, trades off geometry and images, and may be used to represent the current synthetic viewpoint of the scene more compactly. A conventional texture-mapped models method can also be used to generate the current synthetic viewpoint of the scene.
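The 3D warping step described above (project each pixel to its 3D location, then re-project it into the new view) can be sketched in plain Python. This is a minimal forward-warping illustration under stated assumptions: a pinhole camera model, identical intrinsics `fx, fy, cx, cy` for the source and target views, a rigid transform `(R, t)` from the source to the target camera frame, and nearest-pixel splatting with a z-buffer. All names are hypothetical, and the sketch makes no attempt at hole filling or the relief-texture acceleration mentioned in the text.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pixel (u, v) with depth along the optical axis -> 3D point (source camera frame)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def rigid_transform(p: Vec3, R: Matrix3, t: Vec3) -> Vec3:
    """Map a point from the source camera frame to the target camera frame: R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Optional[Vec3]:
    """3D point (target camera frame) -> (u, v, z), or None if behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy, z)


def warp_image(color, depth, fx, fy, cx, cy, R, t):
    """Forward-warp a color image with a per-pixel depth map into a new view.

    Both views are assumed to share the same pinhole intrinsics. Pixels that
    land on the same target location are resolved with a z-buffer (nearest
    wins); target pixels that receive nothing stay None (disocclusion holes).
    """
    h, w = len(depth), len(depth[0])
    out: List[List[Optional[object]]] = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:
                continue  # no valid depth for this pixel
            p = rigid_transform(unproject(u, v, d, fx, fy, cx, cy), R, t)
            q = project(p, fx, fy, cx, cy)
            if q is None:
                continue
            ui, vi, z = int(round(q[0])), int(round(q[1])), q[2]
            if 0 <= ui < w and 0 <= vi < h and z < zbuf[vi][ui]:
                zbuf[vi][ui] = z
                out[vi][ui] = color[v][u]
    return out
```

Target pixels left as `None` correspond to disoccluded regions that no source pixel reaches; a practical renderer would fill them, for example by blending a second warped view.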
While the pipeline technique has been described by specific reference to embodiments thereof, it is understood that variations and modifications thereof can be made without departing from the true spirit and scope of the pipeline technique. By way of example but not limitation, rather than the capture and processing stages of the FVV processing pipeline being implemented on one computing device (or a collection of computing devices), and the rendering and user viewing experience stages of the pipeline being implemented on another computing device(s) which is being used by an end user(s) to view the FVV, an alternate embodiment of the pipeline technique described herein is possible where the capture, processing, rendering and user viewing experience stages of the pipeline are implemented on a single computing device (i.e., the FVV can be rendered and viewed on the same computing device that is used to input/calibrate the streams of sensor data and generate the scene proxy). Furthermore, in addition to the sensors being any type of VCD, or any type of audio capture device, or any combination thereof as described heretofore, the sensors can also be a wearable body-suit that provides a stream of depth data.
It is also noted that any or all of the aforementioned embodiments can be used in any combination desired to form additional hybrid embodiments. Although the pipeline technique embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described heretofore. Rather, the specific features and acts described heretofore are disclosed as example forms of implementing the claims.
The pipeline technique embodiments described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations.

FIG. 12 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the pipeline technique, as described herein, may be implemented. It is noted that any boxes that are represented by broken or dashed lines in FIG. 12 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.

For example, FIG. 12 shows a general system diagram of a simplified computing device 1200. Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers (PCs), server computers, handheld computing devices, laptop or mobile computers, communications devices such as cell phones and personal digital assistants (PDAs), multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and audio or video media players.

To allow a device to implement the pipeline technique embodiments described herein, the device should have sufficient computational capability and system memory to enable basic computational operations. In particular, as illustrated by FIG. 12, the computational capability is generally illustrated by one or more processing unit(s) 1210, and may also include one or more graphics processing units (GPUs) 1215, either or both in communication with system memory 1220. Note that the processing unit(s) 1210 may be specialized microprocessors (such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, a field-programmable gate array (FPGA), or other micro-controller) or can be conventional central processing units (CPUs) having one or more processing cores including, but not limited to, specialized GPU-based cores in a multi-core CPU.

In addition, the simplified computing device 1200 of FIG. 12 may also include other components, such as, for example, a communications interface 1230. The simplified computing device 1200 of FIG. 12 may also include one or more conventional computer input devices 1240 (e.g., pointing devices, keyboards, audio (e.g., voice) input/capture devices, video input/capture devices, haptic input devices, devices for receiving wired or wireless data transmissions, and the like). The simplified computing device 1200 of FIG. 12 may also include other optional components, such as, for example, one or more conventional computer output devices 1250 (e.g., display device(s) 1255, audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, and the like). Exemplary types of input devices (herein also referred to as user interface modalities) and display devices that are operable with the pipeline technique embodiments described herein have been described heretofore. Note that typical communications interfaces 1230, additional types of input and output devices 1240 and 1250, and storage devices 1260 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.

The simplified computing device 1200 of FIG. 12 may also include a variety of computer readable media. Computer readable media can be any available media that can be accessed by the computer 1200 via storage devices 1260, and includes both volatile and nonvolatile media that is either removable 1270 and/or non-removable 1280, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example but not limitation, computer readable media may include computer storage media and communication media. Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as digital versatile disks (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.

Storage of information such as computer-readable or computer-executable instructions, data structures, program modules, and the like, can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
Furthermore, software, programs, and/or computer program products embodying some or all of the various embodiments of the pipeline technique described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
Finally, the pipeline technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types. The pipeline technique embodiments may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Additionally, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
Claims (20)
1. A computer-implemented process for generating a free viewpoint video of a scene, comprising:
using a computing device to perform the following process actions:
using an arrangement of sensors to capture the scene, wherein the arrangement comprises a plurality of video capture devices (VCDs) and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective;
inputting the streams of sensor data;
calibrating the streams of sensor data; and
generating a scene proxy from the calibrated streams of sensor data, wherein the scene proxy geometrically describes the scene as a function of time and comprises one or more types of geometric proxy data which is matched to a set of current pipeline conditions in order to maximize the photo-realism of the free viewpoint video that results from the scene proxy at each point in time, said generation comprising the actions of,
periodically analyzing the current pipeline conditions,
using results of the periodic analysis to select one or more different three-dimensional (3D) reconstruction methods which are matched to the current pipeline conditions,
using the selected 3D reconstruction methods to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data, and
using the 3D reconstructions of the scene and the results of the periodic analysis to generate the scene proxy.
2. The process of claim 1, wherein the process action of calibrating the streams of sensor data comprises the actions of:
determining the number of VCDs in the arrangement of sensors;
determining intrinsic characteristics of each of the VCDs;
determining extrinsic characteristics of each of the VCDs at each point in time during the capture of the scene; and
using knowledge of the number of VCDs in the arrangement, and the intrinsic and extrinsic characteristics of each of the VCDs, to temporally and spatially calibrate the streams of sensor data.
3. The process of claim 2, wherein each of the VCDs generates a stream of video data comprising a stream of images of the scene, and the intrinsic characteristics of each of the VCDs comprise one or more of:
the VCD type; or
the frame rate of the VCD; or
the shutter speed of the VCD; or
the mosaic pattern of the VCD; or
the white balance of the VCD; or
the bit depth of said images; or
the pixel resolution of said images; or
the focal length of a lens of the VCD; or
the principal point of said lens; or
the skew coefficient of the VCD; or
the distortions of said lens; or
the field of view of the VCD.
4. The process of claim 2, wherein the extrinsic characteristics of each of the VCDs comprise one or more of:
the current rotational orientation of the VCD; or
the current spatial location of the VCD; or
whether the VCD is static or moving; or
the current geometric relationship between the VCD and each of the other VCDs in the arrangement of sensors; or
the current position of the VCD relative to the scene; or
whether or not the VCD is genlocked with the other VCDs in the arrangement.
5. The process of claim 2, wherein the set of current pipeline conditions comprises one or more of:
the number of VCDs in the arrangement of sensors; or
one or more of the intrinsic characteristics of each of the VCDs; or
one or more of the extrinsic characteristics of each of the VCDs.
6. The process of claim 1, wherein,
the free viewpoint video being generated comprises either asynchronous free viewpoint video, or unidirectional live free viewpoint video, or bidirectional live free viewpoint video, and
the set of current pipeline conditions comprises the type of free viewpoint video being generated and the related speed at which the scene proxy has to be generated.
7. The process of claim 1, further comprising an action of storing the scene proxy, wherein the set of current pipeline conditions comprises the amount of storage space that is currently available to store the scene proxy.
8. The process of claim 1, further comprising an action of distributing the scene proxy to an end user, wherein,
the end user either is, or will be, viewing the free viewpoint video on another computing device which is connected to a data communication network,
said distribution comprises transmitting the scene proxy over the network to said other computing device, and
the set of current pipeline conditions comprises the network transmission bandwidth that is currently available.
9. The process of claim 8, wherein the other computing device comprises a display device upon which the free viewpoint video either is, or will be, viewed, and the set of current pipeline conditions comprises one or more of:
the display device type; or
particular characteristics of the display device, said characteristics comprising one or more of the aspect ratio of the display device, or the pixel resolution of the display device, or the form factor of the display device; or
a level of data fidelity that is desired in the free viewpoint video.
10. The process of claim 1, wherein the scene proxy comprises one or more of:
a stream of depth map images of the scene; or
a stream of calibrated point cloud reconstructions of the scene; or
one or more high order geometric models comprising one or more of planes, or billboards, or previously created generic object models; or
a stream of mesh models of the scene.
11. The process of claim 1, wherein whenever the calibrated streams of sensor data comprise a plurality of different depth map image streams, the process action of using the selected 3D reconstruction methods to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data comprises the actions of:
merging said depth map image streams into a stream of calibrated point cloud reconstructions of the scene; and
using said stream of calibrated point cloud reconstructions to generate one or more different high fidelity geometric proxies of the scene.
12. The process of claim 1, wherein whenever the calibrated streams of sensor data comprise a plurality of different infrared image streams, the process action of using the selected 3D reconstruction methods to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data comprises the actions of:
identifying any narrow baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of infrared image streams;
creating a first set of different depth map image streams from the pairs of infrared image streams generated by the identified narrow baseline stereo pairs of VCDs;
identifying any wide baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of infrared image streams;
creating a second set of different depth map image streams from the pairs of infrared image streams generated by the identified wide baseline stereo pairs of VCDs;
merging the different depth map image streams in the first and second sets into a stream of calibrated point cloud reconstructions of the scene; and
using said stream of calibrated point cloud reconstructions to generate one or more different high fidelity geometric proxies of the scene.
13. The process of claim 1, wherein whenever the calibrated streams of sensor data comprise a plurality of different color image streams, the process action of using the selected 3D reconstruction methods to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data comprises the actions of:
identifying any narrow baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of color image streams;
creating a first set of different depth map image streams from the pairs of color image streams generated by the identified narrow baseline stereo pairs of VCDs;
identifying any wide baseline stereo pairs of VCDs that exist in the arrangement of sensors and generate pairs of color image streams;
creating a second set of different depth map image streams from the pairs of color image streams generated by the identified wide baseline stereo pairs of VCDs;
merging the different depth map image streams in the first and second sets into a stream of calibrated point cloud reconstructions of the scene; and
using said stream of calibrated point cloud reconstructions to generate one or more different high fidelity geometric proxies of the scene.
14. A computer-implemented process for presenting a free viewpoint video of a scene to a user, comprising:
using a computing device to perform the following process actions:
inputting a scene proxy which geometrically describes the scene as a function of time;
generating a current synthetic viewpoint of the scene from the scene proxy, wherein said generation maximizes the photo-realism of said viewpoint based upon a set of current pipeline conditions, and said generation comprises the actions of,
periodically analyzing the current pipeline conditions,
using results of the periodic analysis to select one or more different image-based rendering methods which are matched to the current pipeline conditions, and
using the selected image-based rendering methods and the results of the periodic analysis to generate the current synthetic viewpoint of the scene; and
displaying the current synthetic viewpoint of the scene.
15. The process of claim 14, wherein the set of current pipeline conditions comprises information generated by the user that specifies desired changes to the current synthetic viewpoint of the scene, said information comprising one or more of:
viewpoint navigation information; or
temporal navigation information.
16. The process of claim 14, wherein an arrangement of sensors comprising a plurality of video capture devices (VCDs) is used to capture the scene, and the set of current pipeline conditions comprises one or more of:
the number of VCDs in the arrangement of sensors; or
one or more intrinsic characteristics of each of the VCDs; or
one or more extrinsic characteristics of each of the VCDs; or
the complexity and composition of the scene; or
whether the scene is relatively static or dynamic.
17. The process of claim 14, wherein the scene proxy is generated using one or more different three-dimensional (3D) reconstruction methods and comprises one or more types of geometric proxy data, and the set of current pipeline conditions comprises one or more of:
the different 3D reconstruction methods used to generate the scene proxy; or
the types of geometric proxy data that are in the scene proxy; or
a level of data fidelity that is desired in the free viewpoint video.
18. The process of claim 14, wherein the computing device comprises a display device upon which the current synthetic viewpoint of the scene is being displayed, a prescribed number of degrees of viewpoint navigation freedom are provided to the user, said number is greater than or equal to one and less than or equal to six, and the set of current pipeline conditions comprises one or more of:
graphics processing capabilities that are available in the hardware of the computing device being used to generate the current synthetic viewpoint of the scene; or
the display device type; or
particular characteristics of the display device, said characteristics comprising one or more of the aspect ratio of the display device, or the pixel resolution of the display device, or the form factor of the display device; or
the number of degrees of viewpoint navigation freedom that are provided to the user; or
the view frustum of the current synthetic viewpoint; or
whether or not the computing device comprises a natural user interface, and if so, the natural user interface modalities that are anticipated to be used by the user.
19. The process of claim 14, wherein,
the free viewpoint video being presented comprises either asynchronous free viewpoint video, or unidirectional live free viewpoint video, or bidirectional live free viewpoint video, and
the set of current pipeline conditions comprises the type of free viewpoint video being presented.
20. A computer-implemented process for generating a free viewpoint video of a scene and presenting the free viewpoint video to a user, comprising:
using a computing device to perform the following process actions:
using an arrangement of sensors to capture the scene, wherein the arrangement generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective;
inputting the streams of sensor data;
calibrating the streams of sensor data;
generating a scene proxy from the calibrated streams of sensor data, wherein the scene proxy geometrically describes the scene as a function of time and comprises one or more types of geometric proxy data which is matched to a first set of current pipeline conditions in order to maximize the photo-realism of the free viewpoint video that results from the scene proxy at each point in time, said scene proxy generation comprising the actions of,
periodically analyzing the first set of current pipeline conditions,
using results of the periodic analysis of said first set to select one or more different three-dimensional (3D) reconstruction methods which are matched to said first set,
using the selected 3D reconstruction methods to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data, and
using the 3D reconstructions of the scene and the results of the periodic analysis of said first set to generate the scene proxy;
generating a current synthetic viewpoint of the scene from the scene proxy, wherein said current synthetic viewpoint generation maximizes the photo-realism of said viewpoint based upon a second set of current pipeline conditions, and said current synthetic viewpoint generation comprises the actions of,
periodically analyzing the second set of current pipeline conditions,
using results of the periodic analysis of said second set to select one or more different image-based rendering methods which are matched to said second set, and
using the selected image-based rendering methods and the results of the periodic analysis of said second set to generate the current synthetic viewpoint of the scene; and
displaying the current synthetic viewpoint of the scene.
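Claims 12 and 13 recite creating depth map streams from narrow and wide baseline stereo pairs of VCDs and merging them into calibrated point cloud reconstructions. The sketch below illustrates that idea using the standard rectified-stereo relation Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity, along with its first-order error estimate, which shows why wide baselines give finer depth resolution at a distance (though, in general, at the cost of harder correspondence matching). The function names, the camera-to-world pose convention `(R, t)`, and the use of `None` for missing depth are assumptions for illustration, not details taken from the specification.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pose = Tuple[Sequence[Sequence[float]], Vec3]   # (R, t), assumed camera -> world
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy) in pixels


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> Optional[float]:
    """Rectified stereo: Z = f * B / d. Returns None for non-positive disparity."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_uncertainty(depth_m: float, focal_px: float, baseline_m: float,
                      disparity_err_px: float = 1.0) -> float:
    """First-order depth error dZ ~ Z^2 * dd / (f * B): wider baselines shrink it."""
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)


def merge_depth_maps(depth_maps: Sequence[Sequence[Sequence[Optional[float]]]],
                     intrinsics: Sequence[Intrinsics],
                     poses: Sequence[Pose]) -> List[Vec3]:
    """Back-project each valid depth pixel and express it in one world frame."""
    cloud: List[Vec3] = []
    for depth, (fx, fy, cx, cy), (R, t) in zip(depth_maps, intrinsics, poses):
        for v, row in enumerate(depth):
            for u, d in enumerate(row):
                if d is None or d <= 0:
                    continue  # missing or invalid measurement
                p = ((u - cx) * d / fx, (v - cy) * d / fy, d)
                cloud.append(tuple(
                    sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)))
    return cloud
```

Simple concatenation leaves duplicate points where views overlap; a fuller merge would also filter or fuse them.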
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/599,170 US20130321396A1 (en) | 2012-05-31 | 2012-08-30 | Multi-input free viewpoint video processing pipeline |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261653983P | 2012-05-31 | 2012-05-31 | |
US13/599,170 US20130321396A1 (en) | 2012-05-31 | 2012-08-30 | Multi-input free viewpoint video processing pipeline |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130321396A1 (en) | 2013-12-05 |
Family
ID=49669652
Family Applications (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/566,877 Active 2034-02-16 US9846960B2 (en) | 2012-05-31 | 2012-08-03 | Automated camera array calibration |
US13/588,917 Abandoned US20130321586A1 (en) | 2012-05-31 | 2012-08-17 | Cloud based free viewpoint video streaming |
US13/598,536 Abandoned US20130321593A1 (en) | 2012-05-31 | 2012-08-29 | View frustum culling for free viewpoint video (fvv) |
US13/599,263 Active 2033-02-25 US8917270B2 (en) | 2012-05-31 | 2012-08-30 | Video generation using three-dimensional hulls |
US13/599,170 Abandoned US20130321396A1 (en) | 2012-05-31 | 2012-08-30 | Multi-input free viewpoint video processing pipeline |
US13/598,747 Abandoned US20130321575A1 (en) | 2012-05-31 | 2012-08-30 | High definition bubbles for rendering free viewpoint video |
US13/599,678 Abandoned US20130321566A1 (en) | 2012-05-31 | 2012-08-30 | Audio source positioning using a camera |
US13/599,436 Active 2034-05-03 US9251623B2 (en) | 2012-05-31 | 2012-08-30 | Glancing angle exclusion |
US13/614,852 Active 2033-10-29 US9256980B2 (en) | 2012-05-31 | 2012-09-13 | Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds |
US13/790,158 Abandoned US20130321413A1 (en) | 2012-05-31 | 2013-03-08 | Video generation using convict hulls |
Family Applications Before (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/566,877 Active 2034-02-16 US9846960B2 (en) | 2012-05-31 | 2012-08-03 | Automated camera array calibration |
US13/588,917 Abandoned US20130321586A1 (en) | 2012-05-31 | 2012-08-17 | Cloud based free viewpoint video streaming |
US13/598,536 Abandoned US20130321593A1 (en) | 2012-05-31 | 2012-08-29 | View frustum culling for free viewpoint video (fvv) |
US13/599,263 Active 2033-02-25 US8917270B2 (en) | 2012-05-31 | 2012-08-30 | Video generation using three-dimensional hulls |
Family Applications After (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/598,747 Abandoned US20130321575A1 (en) | 2012-05-31 | 2012-08-30 | High definition bubbles for rendering free viewpoint video |
US13/599,678 Abandoned US20130321566A1 (en) | 2012-05-31 | 2012-08-30 | Audio source positioning using a camera |
US13/599,436 Active 2034-05-03 US9251623B2 (en) | 2012-05-31 | 2012-08-30 | Glancing angle exclusion |
US13/614,852 Active 2033-10-29 US9256980B2 (en) | 2012-05-31 | 2012-09-13 | Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds |
US13/790,158 Abandoned US20130321413A1 (en) | 2012-05-31 | 2013-03-08 | Video generation using convict hulls |
Country Status (1)
Country | Link |
---|---|
US (10) | US9846960B2 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9191643B2 (en) | 2013-04-15 | 2015-11-17 | Microsoft Technology Licensing, Llc | Mixing infrared and color component data point clouds |
US9661312B2 (en) * | 2015-01-22 | 2017-05-23 | Microsoft Technology Licensing, Llc | Synthesizing second eye viewport using interleaving |
EP3266199A4 (en) * | 2015-03-01 | 2018-07-18 | NEXTVR Inc. | Methods and apparatus for supporting content generation, transmission and/or playback |
EP3425592A1 (en) * | 2017-07-06 | 2019-01-09 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and program, for generating a virtual viewpoint image |
WO2019034807A1 (en) | 2017-08-15 | 2019-02-21 | Nokia Technologies Oy | Sequential encoding and decoding of volymetric video |
WO2019034808A1 (en) | 2017-08-15 | 2019-02-21 | Nokia Technologies Oy | Encoding and decoding of volumetric video |
CN109462749A (en) * | 2017-09-06 | 2019-03-12 | 佳能株式会社 | Information processing unit, information processing method and medium |
EP3460761A1 (en) * | 2017-09-22 | 2019-03-27 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, image processing system, and program |
CN110073414A (en) * | 2016-11-30 | 2019-07-30 | 佳能株式会社 | Image processing equipment and method |
US10510111B2 (en) | 2013-10-25 | 2019-12-17 | Appliance Computing III, Inc. | Image-based rendering of real spaces |
US10554713B2 (en) | 2015-06-19 | 2020-02-04 | Microsoft Technology Licensing, Llc | Low latency application streaming using temporal frame transformation |
CN110769241A (en) * | 2019-11-05 | 2020-02-07 | 广州虎牙科技有限公司 | Video frame processing method and device, user side and storage medium |
CN111052750A (en) * | 2017-08-30 | 2020-04-21 | 三星电子株式会社 | Method and device for point cloud stream transmission |
CN111343447A (en) * | 2017-09-19 | 2020-06-26 | 佳能株式会社 | Data providing apparatus, control method of data providing apparatus, and storage medium |
US10944960B2 (en) * | 2017-02-10 | 2021-03-09 | Panasonic Intellectual Property Corporation Of America | Free-viewpoint video generating method and free-viewpoint video generating system |
US11044570B2 (en) | 2017-03-20 | 2021-06-22 | Nokia Technologies Oy | Overlapping audio-object interactions |
US11074036B2 (en) | 2017-05-05 | 2021-07-27 | Nokia Technologies Oy | Metadata-free audio-object interactions |
US11096004B2 (en) * | 2017-01-23 | 2021-08-17 | Nokia Technologies Oy | Spatial audio rendering point extension |
CN113905221A (en) * | 2021-09-30 | 2022-01-07 | 福州大学 | Adaptive method and system for asymmetric transmission of stereoscopic panoramic video streams |
US11250619B2 (en) | 2016-11-30 | 2022-02-15 | Canon Kabushiki Kaisha | Image processing apparatus and method |
US11395087B2 (en) | 2017-09-29 | 2022-07-19 | Nokia Technologies Oy | Level-based audio-object interactions |
US11632489B2 (en) | 2017-01-31 | 2023-04-18 | Tetavi, Ltd. | System and method for rendering free viewpoint video for studio applications |
US11748918B1 (en) * | 2020-09-25 | 2023-09-05 | Apple Inc. | Synthesized camera arrays for rendering novel viewpoints |
Families Citing this family (231)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007043036A1 (en) * | 2005-10-11 | 2007-04-19 | Prime Sense Ltd. | Method and system for object reconstruction |
US8866920B2 (en) | 2008-05-20 | 2014-10-21 | Pelican Imaging Corporation | Capturing and processing of images using monolithic camera array with heterogeneous imagers |
US11792538B2 (en) | 2008-05-20 | 2023-10-17 | Adeia Imaging Llc | Capturing and processing of images including occlusions focused on an image sensor by a lens stack array |
US20150373153A1 (en) * | 2010-06-30 | 2015-12-24 | Primal Space Systems, Inc. | System and method to reduce bandwidth requirement for visibility event packet streaming using a predicted maximal view frustum and predicted maximal viewpoint extent, each computed at runtime |
US9892546B2 (en) * | 2010-06-30 | 2018-02-13 | Primal Space Systems, Inc. | Pursuit path camera model method and system |
US8878950B2 (en) | 2010-12-14 | 2014-11-04 | Pelican Imaging Corporation | Systems and methods for synthesizing high resolution images using super-resolution processes |
WO2013049699A1 (en) | 2011-09-28 | 2013-04-04 | Pelican Imaging Corporation | Systems and methods for encoding and decoding light field image files |
US9001960B2 (en) * | 2012-01-04 | 2015-04-07 | General Electric Company | Method and apparatus for reducing noise-related imaging artifacts |
US9300841B2 (en) * | 2012-06-25 | 2016-03-29 | Yoldas Askan | Method of generating a smooth image from point cloud data |
EP3869797B1 (en) | 2012-08-21 | 2023-07-19 | Adeia Imaging LLC | Method for depth detection in images captured using array cameras |
US10079968B2 (en) | 2012-12-01 | 2018-09-18 | Qualcomm Incorporated | Camera having additional functionality based on connectivity with a host device |
US9519968B2 (en) * | 2012-12-13 | 2016-12-13 | Hewlett-Packard Development Company, L.P. | Calibrating visual sensors using homography operators |
US9224227B2 (en) * | 2012-12-21 | 2015-12-29 | Nvidia Corporation | Tile shader for screen space, a method of rendering and a graphics processing unit employing the tile shader |
US8866912B2 (en) | 2013-03-10 | 2014-10-21 | Pelican Imaging Corporation | System and methods for calibration of an array camera using a single captured image |
US9144905B1 (en) * | 2013-03-13 | 2015-09-29 | Hrl Laboratories, Llc | Device and method to identify functional parts of tools for robotic manipulation |
US9578259B2 (en) | 2013-03-14 | 2017-02-21 | Fotonation Cayman Limited | Systems and methods for reducing motion blur in images or video in ultra low light with array cameras |
US9445003B1 (en) * | 2013-03-15 | 2016-09-13 | Pelican Imaging Corporation | Systems and methods for synthesizing high resolution images using image deconvolution based on motion and depth information |
WO2014162824A1 (en) * | 2013-04-04 | 2014-10-09 | ソニー株式会社 | Display control device, display control method and program |
US10262462B2 (en) | 2014-04-18 | 2019-04-16 | Magic Leap, Inc. | Systems and methods for augmented and virtual reality |
US9208609B2 (en) * | 2013-07-01 | 2015-12-08 | Mitsubishi Electric Research Laboratories, Inc. | Method for fitting primitive shapes to 3D point clouds using distance fields |
CN105308953A (en) * | 2013-07-19 | 2016-02-03 | 谷歌技术控股有限责任公司 | Asymmetric sensor array for capturing images |
US10140751B2 (en) * | 2013-08-08 | 2018-11-27 | Imagination Technologies Limited | Normal offset smoothing |
CN104424655A (en) * | 2013-09-10 | 2015-03-18 | 鸿富锦精密工业(深圳)有限公司 | System and method for reconstructing point cloud curved surface |
JP6476658B2 (en) * | 2013-09-11 | 2019-03-06 | ソニー株式会社 | Image processing apparatus and method |
US9286718B2 (en) * | 2013-09-27 | 2016-03-15 | Ortery Technologies, Inc. | Method using 3D geometry data for virtual reality image presentation and control in 3D space |
US10591969B2 (en) | 2013-10-25 | 2020-03-17 | Google Technology Holdings LLC | Sensor-based near-field communication authentication |
US9888333B2 (en) * | 2013-11-11 | 2018-02-06 | Google Technology Holdings LLC | Three-dimensional audio rendering techniques |
US10119808B2 (en) | 2013-11-18 | 2018-11-06 | Fotonation Limited | Systems and methods for estimating depth from projected texture using camera arrays |
US9426361B2 (en) | 2013-11-26 | 2016-08-23 | Pelican Imaging Corporation | Array camera configurations incorporating multiple constituent array cameras |
EP2881918B1 (en) * | 2013-12-06 | 2018-02-07 | My Virtual Reality Software AS | Method for visualizing three-dimensional data |
US9233469B2 (en) * | 2014-02-13 | 2016-01-12 | GM Global Technology Operations LLC | Robotic system with 3D box location functionality |
US9530226B2 (en) * | 2014-02-18 | 2016-12-27 | Par Technology Corporation | Systems and methods for optimizing N dimensional volume data for transmission |
WO2015130320A1 (en) | 2014-02-28 | 2015-09-03 | Hewlett-Packard Development Company, L.P. | Calibration of sensors and projector |
US9396586B2 (en) | 2014-03-14 | 2016-07-19 | Matterport, Inc. | Processing and/or transmitting 3D data |
US10600245B1 (en) * | 2014-05-28 | 2020-03-24 | Lucasfilm Entertainment Company Ltd. | Navigating a virtual environment of a media content item |
CN104089628B (en) * | 2014-06-30 | 2017-02-08 | 中国科学院光电研究院 | Adaptive geometric calibration method for a light field camera |
US11051000B2 (en) | 2014-07-14 | 2021-06-29 | Mitsubishi Electric Research Laboratories, Inc. | Method for calibrating cameras with non-overlapping views |
US10169909B2 (en) * | 2014-08-07 | 2019-01-01 | Pixar | Generating a volumetric projection for an object |
US10547825B2 (en) | 2014-09-22 | 2020-01-28 | Samsung Electronics Company, Ltd. | Transmission of three-dimensional video |
US11205305B2 (en) | 2014-09-22 | 2021-12-21 | Samsung Electronics Company, Ltd. | Presentation of three-dimensional video |
CN107077743B (en) | 2014-09-29 | 2021-03-23 | 快图有限公司 | System and method for dynamic calibration of an array camera |
US9600892B2 (en) * | 2014-11-06 | 2017-03-21 | Symbol Technologies, Llc | Non-parametric method of and system for estimating dimensions of objects of arbitrary shape |
EP3221851A1 (en) * | 2014-11-20 | 2017-09-27 | Cappasity Inc. | Systems and methods for 3d capture of objects using multiple range cameras and multiple rgb cameras |
US9396554B2 (en) | 2014-12-05 | 2016-07-19 | Symbol Technologies, Llc | Apparatus for and method of estimating dimensions of an object associated with a code in automatic response to reading the code |
DE102014118989A1 (en) * | 2014-12-18 | 2016-06-23 | Connaught Electronics Ltd. | Method for calibrating a camera system, camera system and motor vehicle |
US11019330B2 (en) | 2015-01-19 | 2021-05-25 | Aquifi, Inc. | Multiple camera system with auto recalibration |
US9686520B2 (en) * | 2015-01-22 | 2017-06-20 | Microsoft Technology Licensing, Llc | Reconstructing viewport upon user viewpoint misprediction |
CN111866022B (en) * | 2015-02-03 | 2022-08-30 | 杜比实验室特许公司 | Post-meeting playback system with perceived quality higher than that originally heard in meeting |
EP3070942B1 (en) * | 2015-03-17 | 2023-11-22 | InterDigital CE Patent Holdings | Method and apparatus for displaying light field video data |
US10878278B1 (en) * | 2015-05-16 | 2020-12-29 | Sturfee, Inc. | Geo-localization based on remotely sensed visual features |
JP6975642B2 (en) * | 2015-06-11 | 2021-12-01 | Conti Temic microelectronic GmbH | Method for creating a virtual image of a vehicle's surroundings |
US9460513B1 (en) | 2015-06-17 | 2016-10-04 | Mitsubishi Electric Research Laboratories, Inc. | Method for reconstructing a 3D scene as a 3D model using images acquired by 3D sensors and omnidirectional cameras |
KR101835434B1 (en) * | 2015-07-08 | 2018-03-09 | 고려대학교 산학협력단 | Method and Apparatus for generating a protection image, Method for mapping between image pixel and depth value |
US9848212B2 (en) * | 2015-07-10 | 2017-12-19 | Futurewei Technologies, Inc. | Multi-view video streaming with fast and smooth view switch |
EP3335418A1 (en) | 2015-08-14 | 2018-06-20 | PCMS Holdings, Inc. | System and method for augmented reality multi-view telepresence |
GB2543776B (en) * | 2015-10-27 | 2019-02-06 | Imagination Tech Ltd | Systems and methods for processing images of objects |
US10812778B1 (en) | 2015-11-09 | 2020-10-20 | Cognex Corporation | System and method for calibrating one or more 3D sensors mounted on a moving manipulator |
US20180374239A1 (en) * | 2015-11-09 | 2018-12-27 | Cognex Corporation | System and method for field calibration of a vision system imaging two opposite sides of a calibration object |
US11562502B2 (en) * | 2015-11-09 | 2023-01-24 | Cognex Corporation | System and method for calibrating a plurality of 3D sensors with respect to a motion conveyance |
US10757394B1 (en) * | 2015-11-09 | 2020-08-25 | Cognex Corporation | System and method for calibrating a plurality of 3D sensors with respect to a motion conveyance |
CN108369639B (en) * | 2015-12-11 | 2022-06-21 | 虞晶怡 | Image-based image rendering method and system using multiple cameras and depth camera array |
US10352689B2 (en) | 2016-01-28 | 2019-07-16 | Symbol Technologies, Llc | Methods and systems for high precision locationing with depth values |
US10145955B2 (en) | 2016-02-04 | 2018-12-04 | Symbol Technologies, Llc | Methods and systems for processing point-cloud data with a line scanner |
KR20170095030A (en) * | 2016-02-12 | 2017-08-22 | 삼성전자주식회사 | Scheme for supporting virtual reality content display in communication system |
CN107097698B (en) * | 2016-02-22 | 2021-10-01 | 福特环球技术公司 | Inflatable airbag system for a vehicle seat, seat assembly and method for adjusting the same |
US10989542B2 (en) | 2016-03-11 | 2021-04-27 | Kaarta, Inc. | Aligning measured signal data with slam localization data and uses thereof |
WO2017155970A1 (en) | 2016-03-11 | 2017-09-14 | Kaarta, Inc. | Laser scanner with real-time, online ego-motion estimation |
US11573325B2 (en) | 2016-03-11 | 2023-02-07 | Kaarta, Inc. | Systems and methods for improvements in scanning and mapping |
US11567201B2 (en) | 2016-03-11 | 2023-01-31 | Kaarta, Inc. | Laser scanner with real-time, online ego-motion estimation |
US10721451B2 (en) | 2016-03-23 | 2020-07-21 | Symbol Technologies, Llc | Arrangement for, and method of, loading freight into a shipping container |
CA2961921C (en) | 2016-03-29 | 2020-05-12 | Institut National D'optique | Camera calibration method using a calibration target |
US10762712B2 (en) | 2016-04-01 | 2020-09-01 | Pcms Holdings, Inc. | Apparatus and method for supporting interactive augmented reality functionalities |
US9805240B1 (en) | 2016-04-18 | 2017-10-31 | Symbol Technologies, Llc | Barcode scanning and dimensioning |
CN107341768B (en) * | 2016-04-29 | 2022-03-11 | 微软技术许可有限责任公司 | Grid noise reduction |
WO2017197114A1 (en) | 2016-05-11 | 2017-11-16 | Affera, Inc. | Anatomical model generation |
WO2017197294A1 (en) | 2016-05-12 | 2017-11-16 | Affera, Inc. | Three-dimensional cardiac representation |
EP3264759A1 (en) | 2016-06-30 | 2018-01-03 | Thomson Licensing | An apparatus and a method for generating data representative of a pixel beam |
US10192345B2 (en) * | 2016-07-19 | 2019-01-29 | Qualcomm Incorporated | Systems and methods for improved surface normal estimation |
US11082471B2 (en) * | 2016-07-27 | 2021-08-03 | R-Stor Inc. | Method and apparatus for bonding communication technologies |
US10574909B2 (en) | 2016-08-08 | 2020-02-25 | Microsoft Technology Licensing, Llc | Hybrid imaging sensor for structured light object capture |
US10776661B2 (en) | 2016-08-19 | 2020-09-15 | Symbol Technologies, Llc | Methods, systems and apparatus for segmenting and dimensioning objects |
US9980078B2 (en) | 2016-10-14 | 2018-05-22 | Nokia Technologies Oy | Audio object modification in free-viewpoint rendering |
US10229533B2 (en) * | 2016-11-03 | 2019-03-12 | Mitsubishi Electric Research Laboratories, Inc. | Methods and systems for fast resampling method and apparatus for point cloud data |
US11042161B2 (en) | 2016-11-16 | 2021-06-22 | Symbol Technologies, Llc | Navigation control method and apparatus in a mobile automation system |
US10451405B2 (en) | 2016-11-22 | 2019-10-22 | Symbol Technologies, Llc | Dimensioning system for, and method of, dimensioning freight in motion along an unconstrained path in a venue |
EP3336801A1 (en) * | 2016-12-19 | 2018-06-20 | Thomson Licensing | Method and apparatus for constructing lighting environment representations of 3d scenes |
US10354411B2 (en) | 2016-12-20 | 2019-07-16 | Symbol Technologies, Llc | Methods, systems and apparatus for segmenting objects |
JP7320352B2 (en) * | 2016-12-28 | 2023-08-03 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 3D model transmission method, 3D model reception method, 3D model transmission device, and 3D model reception device |
JP7086522B2 (en) * | 2017-02-28 | 2022-06-20 | キヤノン株式会社 | Image processing equipment, information processing methods and programs |
WO2018172614A1 (en) | 2017-03-22 | 2018-09-27 | Nokia Technologies Oy | A method and an apparatus and a computer program product for adaptive streaming |
US10726574B2 (en) | 2017-04-11 | 2020-07-28 | Dolby Laboratories Licensing Corporation | Passive multi-wearable-devices tracking |
JP6922369B2 (en) * | 2017-04-14 | 2021-08-18 | 富士通株式会社 | Viewpoint selection support program, viewpoint selection support method and viewpoint selection support device |
US10939038B2 (en) * | 2017-04-24 | 2021-03-02 | Intel Corporation | Object pre-encoding for 360-degree view for optimal quality and latency |
US10663590B2 (en) | 2017-05-01 | 2020-05-26 | Symbol Technologies, Llc | Device and method for merging lidar data |
US10726273B2 (en) | 2017-05-01 | 2020-07-28 | Symbol Technologies, Llc | Method and apparatus for shelf feature and object placement detection from shelf images |
US10591918B2 (en) | 2017-05-01 | 2020-03-17 | Symbol Technologies, Llc | Fixed segmented lattice planning for a mobile automation apparatus |
US11367092B2 (en) | 2017-05-01 | 2022-06-21 | Symbol Technologies, Llc | Method and apparatus for extracting and processing price text from an image set |
US11449059B2 (en) | 2017-05-01 | 2022-09-20 | Symbol Technologies, Llc | Obstacle detection for a mobile automation apparatus |
US11093896B2 (en) | 2017-05-01 | 2021-08-17 | Symbol Technologies, Llc | Product status detection system |
US10949798B2 (en) | 2017-05-01 | 2021-03-16 | Symbol Technologies, Llc | Multimodal localization and mapping for a mobile automation apparatus |
DE112018002314T5 (en) | 2017-05-01 | 2020-01-23 | Symbol Technologies, Llc | METHOD AND DEVICE FOR DETECTING AN OBJECT STATUS |
WO2018201423A1 (en) | 2017-05-05 | 2018-11-08 | Symbol Technologies, Llc | Method and apparatus for detecting and interpreting price label text |
CN108881784B (en) * | 2017-05-12 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Virtual scene implementation method and device, terminal and server |
US10165386B2 (en) | 2017-05-16 | 2018-12-25 | Nokia Technologies Oy | VR audio superzoom |
US10154176B1 (en) * | 2017-05-30 | 2018-12-11 | Intel Corporation | Calibrating depth cameras using natural objects with expected shapes |
WO2018227001A1 (en) * | 2017-06-07 | 2018-12-13 | Google Llc | High speed, high-fidelity face tracking |
CN110999281B (en) | 2017-06-09 | 2021-11-26 | Pcms控股公司 | Method and device for allowing exploration in virtual landscape |
BR102017012517A2 (en) * | 2017-06-12 | 2018-12-26 | Samsung Eletrônica da Amazônia Ltda. | Method for 360° media display or bubble interface |
CN110832553B (en) | 2017-06-29 | 2024-05-14 | 索尼公司 | Image processing apparatus and image processing method |
US11049218B2 (en) | 2017-08-11 | 2021-06-29 | Samsung Electronics Company, Ltd. | Seamless image stitching |
US10521914B2 (en) | 2017-09-07 | 2019-12-31 | Symbol Technologies, Llc | Multi-sensor object recognition system and method |
US10572763B2 (en) | 2017-09-07 | 2020-02-25 | Symbol Technologies, Llc | Method and apparatus for support surface edge detection |
US11818401B2 (en) | 2017-09-14 | 2023-11-14 | Apple Inc. | Point cloud geometry compression using octrees and binary arithmetic encoding with adaptive look-up tables |
US10861196B2 (en) * | 2017-09-14 | 2020-12-08 | Apple Inc. | Point cloud compression |
US10897269B2 (en) | 2017-09-14 | 2021-01-19 | Apple Inc. | Hierarchical point cloud compression |
US10909725B2 (en) | 2017-09-18 | 2021-02-02 | Apple Inc. | Point cloud compression |
US11113845B2 (en) | 2017-09-18 | 2021-09-07 | Apple Inc. | Point cloud compression using non-cubic projections and masks |
CN107610182B (en) * | 2017-09-22 | 2018-09-11 | 哈尔滨工业大学 | Calibration method for the microlens array centers of a light-field camera |
EP3467777A1 (en) * | 2017-10-06 | 2019-04-10 | Thomson Licensing | A method and apparatus for encoding/decoding the colors of a point cloud representing a 3d object |
WO2019099605A1 (en) | 2017-11-17 | 2019-05-23 | Kaarta, Inc. | Methods and systems for geo-referencing mapping systems |
US10607373B2 (en) | 2017-11-22 | 2020-03-31 | Apple Inc. | Point cloud compression with closed-loop color conversion |
US10951879B2 (en) | 2017-12-04 | 2021-03-16 | Canon Kabushiki Kaisha | Method, system and apparatus for capture of image data for free viewpoint video |
WO2019123547A1 (en) * | 2017-12-19 | 2019-06-27 | 株式会社ソニー・インタラクティブエンタテインメント | Image generator, reference image data generator, image generation method, and reference image data generation method |
KR102334070B1 (en) | 2018-01-18 | 2021-12-03 | 삼성전자주식회사 | Electric apparatus and method for control thereof |
US11158124B2 (en) | 2018-01-30 | 2021-10-26 | Gaia3D, Inc. | Method of providing 3D GIS web service |
US10417806B2 (en) * | 2018-02-15 | 2019-09-17 | JJK Holdings, LLC | Dynamic local temporal-consistent textured mesh compression |
JP2019144958A (en) * | 2018-02-22 | 2019-08-29 | キヤノン株式会社 | Image processing device, image processing method, and program |
WO2019165194A1 (en) * | 2018-02-23 | 2019-08-29 | Kaarta, Inc. | Methods and systems for processing and colorizing point clouds and meshes |
US10542368B2 (en) | 2018-03-27 | 2020-01-21 | Nokia Technologies Oy | Audio content modification for playback audio |
WO2019195270A1 (en) | 2018-04-03 | 2019-10-10 | Kaarta, Inc. | Methods and systems for real or near real-time point cloud map data confidence evaluation |
WO2019193696A1 (en) | 2018-04-04 | 2019-10-10 | 株式会社ソニー・インタラクティブエンタテインメント | Reference image generation device, display image generation device, reference image generation method, and display image generation method |
US10740911B2 (en) | 2018-04-05 | 2020-08-11 | Symbol Technologies, Llc | Method, system and apparatus for correcting translucency artifacts in data representing a support structure |
US10809078B2 (en) | 2018-04-05 | 2020-10-20 | Symbol Technologies, Llc | Method, system and apparatus for dynamic path generation |
US11327504B2 (en) | 2018-04-05 | 2022-05-10 | Symbol Technologies, Llc | Method, system and apparatus for mobile automation apparatus localization |
US10823572B2 (en) | 2018-04-05 | 2020-11-03 | Symbol Technologies, Llc | Method, system and apparatus for generating navigational data |
US10832436B2 (en) | 2018-04-05 | 2020-11-10 | Symbol Technologies, Llc | Method, system and apparatus for recovering label positions |
US11010928B2 (en) | 2018-04-10 | 2021-05-18 | Apple Inc. | Adaptive distance based point cloud compression |
US10909726B2 (en) | 2018-04-10 | 2021-02-02 | Apple Inc. | Point cloud compression |
US10867414B2 (en) | 2018-04-10 | 2020-12-15 | Apple Inc. | Point cloud attribute transfer algorithm |
US10939129B2 (en) | 2018-04-10 | 2021-03-02 | Apple Inc. | Point cloud compression |
US10909727B2 (en) | 2018-04-10 | 2021-02-02 | Apple Inc. | Hierarchical point cloud compression with smoothing |
US11017566B1 (en) | 2018-07-02 | 2021-05-25 | Apple Inc. | Point cloud compression with adaptive filtering |
WO2020009826A1 (en) | 2018-07-05 | 2020-01-09 | Kaarta, Inc. | Methods and systems for auto-leveling of point clouds and 3d models |
US11202098B2 (en) | 2018-07-05 | 2021-12-14 | Apple Inc. | Point cloud compression with multi-resolution video encoding |
US11012713B2 (en) | 2018-07-12 | 2021-05-18 | Apple Inc. | Bit stream structure for compressed point cloud data |
US11367224B2 (en) | 2018-10-02 | 2022-06-21 | Apple Inc. | Occupancy map block-to-patch information compression |
US11010920B2 (en) | 2018-10-05 | 2021-05-18 | Zebra Technologies Corporation | Method, system and apparatus for object detection in point clouds |
US11506483B2 (en) | 2018-10-05 | 2022-11-22 | Zebra Technologies Corporation | Method, system and apparatus for support structure depth determination |
US11430155B2 (en) | 2018-10-05 | 2022-08-30 | Apple Inc. | Quantized depths for projection point cloud compression |
US10972835B2 (en) * | 2018-11-01 | 2021-04-06 | Sennheiser Electronic Gmbh & Co. Kg | Conference system with a microphone array system and a method of speech acquisition in a conference system |
US11003188B2 (en) | 2018-11-13 | 2021-05-11 | Zebra Technologies Corporation | Method, system and apparatus for obstacle handling in navigational path generation |
US11090811B2 (en) | 2018-11-13 | 2021-08-17 | Zebra Technologies Corporation | Method and apparatus for labeling of support structures |
CN109661816A (en) * | 2018-11-21 | 2019-04-19 | 京东方科技集团股份有限公司 | Method and display device for generating and displaying panoramic images based on a rendering engine |
US11079240B2 (en) | 2018-12-07 | 2021-08-03 | Zebra Technologies Corporation | Method, system and apparatus for adaptive particle filter localization |
CN109618122A (en) * | 2018-12-07 | 2019-04-12 | 合肥万户网络技术有限公司 | Virtual office conference system |
US11416000B2 (en) | 2018-12-07 | 2022-08-16 | Zebra Technologies Corporation | Method and apparatus for navigational ray tracing |
US11100303B2 (en) | 2018-12-10 | 2021-08-24 | Zebra Technologies Corporation | Method, system and apparatus for auxiliary label detection and association |
US11015938B2 (en) | 2018-12-12 | 2021-05-25 | Zebra Technologies Corporation | Method, system and apparatus for navigational assistance |
US11423572B2 (en) | 2018-12-12 | 2022-08-23 | Analog Devices, Inc. | Built-in calibration of time-of-flight depth imaging systems |
KR20210096285A (en) * | 2018-12-13 | 2021-08-04 | 삼성전자주식회사 | Method, apparatus and computer readable recording medium for compressing 3D mesh content |
US10731970B2 (en) | 2018-12-13 | 2020-08-04 | Zebra Technologies Corporation | Method, system and apparatus for support structure detection |
US10818077B2 (en) | 2018-12-14 | 2020-10-27 | Canon Kabushiki Kaisha | Method, system and apparatus for controlling a virtual camera |
CA3028708A1 (en) | 2018-12-28 | 2020-06-28 | Zih Corp. | Method, system and apparatus for dynamic loop closure in mapping trajectories |
JP7211835B2 (en) * | 2019-02-04 | 2023-01-24 | i-PRO株式会社 | IMAGING SYSTEM AND SYNCHRONIZATION CONTROL METHOD |
WO2020164044A1 (en) * | 2019-02-14 | 2020-08-20 | 北京大学深圳研究生院 | Free-viewpoint image synthesis method, device, and apparatus |
JP6647433B1 (en) * | 2019-02-19 | 2020-02-14 | 株式会社メディア工房 | Point cloud data communication system, point cloud data transmission device, and point cloud data transmission method |
US10797090B2 (en) | 2019-02-27 | 2020-10-06 | Semiconductor Components Industries, Llc | Image sensor with near-infrared and visible light phase detection pixels |
US11037365B2 (en) | 2019-03-07 | 2021-06-15 | Alibaba Group Holding Limited | Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data |
US11057564B2 (en) | 2019-03-28 | 2021-07-06 | Apple Inc. | Multiple layer flexure for supporting a moving image sensor |
JP7479793B2 (en) * | 2019-04-11 | 2024-05-09 | キヤノン株式会社 | Image processing device, system for generating virtual viewpoint video, and method and program for controlling the image processing device |
US11402846B2 (en) | 2019-06-03 | 2022-08-02 | Zebra Technologies Corporation | Method, system and apparatus for mitigating data capture light leakage |
US11341663B2 (en) | 2019-06-03 | 2022-05-24 | Zebra Technologies Corporation | Method, system and apparatus for detecting support structure obstructions |
US11151743B2 (en) | 2019-06-03 | 2021-10-19 | Zebra Technologies Corporation | Method, system and apparatus for end of aisle detection |
US11662739B2 (en) | 2019-06-03 | 2023-05-30 | Zebra Technologies Corporation | Method, system and apparatus for adaptive ceiling-based localization |
US11080566B2 (en) | 2019-06-03 | 2021-08-03 | Zebra Technologies Corporation | Method, system and apparatus for gap detection in support structures with peg regions |
US11200677B2 (en) | 2019-06-03 | 2021-12-14 | Zebra Technologies Corporation | Method, system and apparatus for shelf edge detection |
US11960286B2 (en) | 2019-06-03 | 2024-04-16 | Zebra Technologies Corporation | Method, system and apparatus for dynamic task sequencing |
US11711544B2 (en) | 2019-07-02 | 2023-07-25 | Apple Inc. | Point cloud compression with supplemental information messages |
CN110624220B (en) * | 2019-09-04 | 2021-05-04 | 福建师范大学 | Method for obtaining optimal standing long jump technical template |
WO2021055585A1 (en) | 2019-09-17 | 2021-03-25 | Boston Polarimetrics, Inc. | Systems and methods for surface modeling using polarization cues |
US11562507B2 (en) | 2019-09-27 | 2023-01-24 | Apple Inc. | Point cloud compression using video encoding with time consistent patches |
US11627314B2 (en) | 2019-09-27 | 2023-04-11 | Apple Inc. | Video-based point cloud compression with non-normative smoothing |
EP4036863A4 (en) * | 2019-09-30 | 2023-02-08 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Human body model reconstruction method and reconstruction system, and storage medium |
US11538196B2 (en) | 2019-10-02 | 2022-12-27 | Apple Inc. | Predictive coding for point cloud compression |
US11895307B2 (en) | 2019-10-04 | 2024-02-06 | Apple Inc. | Block-based predictive coding for point cloud compression |
MX2022004162A (en) | 2019-10-07 | 2022-07-12 | Boston Polarimetrics Inc | Systems and methods for augmentation of sensor systems and imaging systems with polarization. |
US11315326B2 (en) * | 2019-10-15 | 2022-04-26 | At&T Intellectual Property I, L.P. | Extended reality anchor caching based on viewport prediction |
US12058510B2 (en) * | 2019-10-18 | 2024-08-06 | Sphere Entertainment Group, Llc | Mapping audio to visual images on a display device having a curved screen |
US11202162B2 (en) | 2019-10-18 | 2021-12-14 | Msg Entertainment Group, Llc | Synthesizing audio of a venue |
KR20230116068A (en) | 2019-11-30 | 2023-08-03 | 보스턴 폴라리메트릭스, 인크. | System and method for segmenting transparent objects using polarization signals |
US11507103B2 (en) | 2019-12-04 | 2022-11-22 | Zebra Technologies Corporation | Method, system and apparatus for localization-based historical obstacle handling |
US11107238B2 (en) | 2019-12-13 | 2021-08-31 | Zebra Technologies Corporation | Method, system and apparatus for detecting item facings |
US11734873B2 (en) | 2019-12-13 | 2023-08-22 | Sony Group Corporation | Real-time volumetric visualization of 2-D images |
US11798196B2 (en) | 2020-01-08 | 2023-10-24 | Apple Inc. | Video-based point cloud compression with predicted patches |
US11625866B2 (en) | 2020-01-09 | 2023-04-11 | Apple Inc. | Geometry encoding using octrees and predictive trees |
CN115552486A (en) | 2020-01-29 | 2022-12-30 | 因思创新有限责任公司 | System and method for characterizing an object pose detection and measurement system |
WO2021154459A1 (en) | 2020-01-30 | 2021-08-05 | Boston Polarimetrics, Inc. | Systems and methods for synthesizing data for training statistical models on different imaging modalities including polarized images |
US11240465B2 (en) | 2020-02-21 | 2022-02-01 | Alibaba Group Holding Limited | System and method to use decoder information in video super resolution |
US11430179B2 (en) * | 2020-02-24 | 2022-08-30 | Microsoft Technology Licensing, Llc | Depth buffer dilation for remote rendering |
US11822333B2 (en) | 2020-03-30 | 2023-11-21 | Zebra Technologies Corporation | Method, system and apparatus for data capture illumination control |
US11700353B2 (en) * | 2020-04-06 | 2023-07-11 | Eingot Llc | Integration of remote audio into a performance venue |
US11953700B2 (en) | 2020-05-27 | 2024-04-09 | Intrinsic Innovation Llc | Multi-aperture polarization optical systems using beam splitters |
US11776205B2 (en) * | 2020-06-09 | 2023-10-03 | Ptc Inc. | Determination of interactions with predefined volumes of space based on automated analysis of volumetric video |
US11620768B2 (en) | 2020-06-24 | 2023-04-04 | Apple Inc. | Point cloud geometry compression using octrees with multiple scan orders |
US11615557B2 (en) | 2020-06-24 | 2023-03-28 | Apple Inc. | Point cloud compression using octrees with slicing |
US11450024B2 (en) | 2020-07-17 | 2022-09-20 | Zebra Technologies Corporation | Mixed depth object detection |
US11875452B2 (en) * | 2020-08-18 | 2024-01-16 | Qualcomm Incorporated | Billboard layers in object-space rendering |
JP7386888B2 (en) * | 2020-10-08 | 2023-11-27 | グーグル エルエルシー | Two-shot composition of the speaker on the screen |
US11593915B2 (en) | 2020-10-21 | 2023-02-28 | Zebra Technologies Corporation | Parallax-tolerant panoramic image generation |
US11392891B2 (en) | 2020-11-03 | 2022-07-19 | Zebra Technologies Corporation | Item placement detection and optimization in material handling systems |
US11847832B2 (en) | 2020-11-11 | 2023-12-19 | Zebra Technologies Corporation | Object classification for autonomous navigation systems |
US11527014B2 (en) * | 2020-11-24 | 2022-12-13 | Verizon Patent And Licensing Inc. | Methods and systems for calibrating surface data capture devices |
US11874415B2 (en) * | 2020-12-22 | 2024-01-16 | International Business Machines Corporation | Earthquake detection and response via distributed visual input |
US11703457B2 (en) * | 2020-12-29 | 2023-07-18 | Industrial Technology Research Institute | Structure diagnosis system and structure diagnosis method |
US12020455B2 (en) | 2021-03-10 | 2024-06-25 | Intrinsic Innovation Llc | Systems and methods for high dynamic range image reconstruction |
US12069227B2 (en) | 2021-03-10 | 2024-08-20 | Intrinsic Innovation Llc | Multi-modal and multi-spectral stereo camera arrays |
US11651538B2 (en) * | 2021-03-17 | 2023-05-16 | International Business Machines Corporation | Generating 3D videos from 2D models |
US11948338B1 (en) | 2021-03-29 | 2024-04-02 | Apple Inc. | 3D volumetric content encoding using 2D videos and simplified 3D meshes |
US11954886B2 (en) | 2021-04-15 | 2024-04-09 | Intrinsic Innovation Llc | Systems and methods for six-degree of freedom pose estimation of deformable objects |
US11290658B1 (en) | 2021-04-15 | 2022-03-29 | Boston Polarimetrics, Inc. | Systems and methods for camera exposure control |
US12067746B2 (en) | 2021-05-07 | 2024-08-20 | Intrinsic Innovation Llc | Systems and methods for using computer vision to pick up small objects |
US11954882B2 (en) | 2021-06-17 | 2024-04-09 | Zebra Technologies Corporation | Feature-based georegistration for mobile computing devices |
US11689813B2 (en) | 2021-07-01 | 2023-06-27 | Intrinsic Innovation Llc | Systems and methods for high dynamic range imaging using crossed polarizers |
CN113761238B (en) * | 2021-08-27 | 2022-08-23 | 广州文远知行科技有限公司 | Point cloud storage method, device, equipment and storage medium |
US11823319B2 (en) | 2021-09-02 | 2023-11-21 | Nvidia Corporation | Techniques for rendering signed distance functions |
CN114355287B (en) * | 2022-01-04 | 2023-08-15 | 湖南大学 | Ultra-short baseline underwater sound distance measurement method and system |
WO2023159180A1 (en) * | 2022-02-17 | 2023-08-24 | Nutech Ventures | Single-pass 3d reconstruction of internal surface of pipelines using depth camera array |
CN116800947A (en) * | 2022-03-16 | 2023-09-22 | 安霸国际有限合伙企业 | Rapid RGB-IR calibration verification for mass production process |
WO2024006997A1 (en) * | 2022-07-01 | 2024-01-04 | Google Llc | Three-dimensional video highlight from a camera source |
US20240185461A1 (en) * | 2022-12-05 | 2024-06-06 | Verizon Patent And Licensing Inc. | Calibration methods and systems for an under-calibrated camera capturing a scene |
WO2024144805A1 (en) * | 2022-12-29 | 2024-07-04 | Innopeak Technology, Inc. | Methods and systems for image processing with eye gaze redirection |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5850352A (en) * | 1995-03-31 | 1998-12-15 | The Regents Of The University Of California | Immersive video, including video hypermosaicing to generate from multiple video views of a scene a three-dimensional video mosaic from which diverse virtual video scene images are synthesized, including panoramic, scene interactive and stereoscopic images |
US5952993A (en) * | 1995-08-25 | 1999-09-14 | Kabushiki Kaisha Toshiba | Virtual object display apparatus and method |
US20030085992A1 (en) * | 2000-03-07 | 2003-05-08 | Sarnoff Corporation | Method and apparatus for providing immersive surveillance |
US20050075167A1 (en) * | 2001-08-09 | 2005-04-07 | Igt | Game interaction in 3-D gaming environments |
Family Cites Families (105)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5602903A (en) | 1994-09-28 | 1997-02-11 | Us West Technologies, Inc. | Positioning system and method |
US6327381B1 (en) | 1994-12-29 | 2001-12-04 | Worldscape, Llc | Image transformation and synthesis methods |
US6163337A (en) | 1996-04-05 | 2000-12-19 | Matsushita Electric Industrial Co., Ltd. | Multi-view point image transmission method and multi-view point image display method |
US5926400A (en) | 1996-11-21 | 1999-07-20 | Intel Corporation | Apparatus and method for determining the intensity of a sound in a virtual world |
US6064771A (en) | 1997-06-23 | 2000-05-16 | Real-Time Geometry Corp. | System and method for asynchronous, adaptive moving picture compression, and decompression |
US6072496A (en) | 1998-06-08 | 2000-06-06 | Microsoft Corporation | Method and system for capturing and representing 3D geometry, color and shading of facial expressions and other animated objects |
US6226003B1 (en) | 1998-08-11 | 2001-05-01 | Silicon Graphics, Inc. | Method for rendering silhouette and true edges of 3-D line drawings with occlusion |
US6556199B1 (en) | 1999-08-11 | 2003-04-29 | Advanced Research And Technology Institute | Method and apparatus for fast voxelization of volumetric models |
US6509902B1 (en) | 2000-02-28 | 2003-01-21 | Mitsubishi Electric Research Laboratories, Inc. | Texture filtering for surface elements |
US6968299B1 (en) | 2000-04-14 | 2005-11-22 | International Business Machines Corporation | Method and apparatus for reconstructing a surface using a ball-pivoting algorithm |
US6750873B1 (en) | 2000-06-27 | 2004-06-15 | International Business Machines Corporation | High quality texture reconstruction from multiple scans |
US7538764B2 (en) | 2001-01-05 | 2009-05-26 | Interuniversitair Micro-Elektronica Centrum (Imec) | System and method to obtain surface structures of multi-dimensional objects, and to represent those surface structures for animation, transmission and display |
US6919906B2 (en) | 2001-05-08 | 2005-07-19 | Microsoft Corporation | Discontinuity edge overdraw |
GB2378337B (en) | 2001-06-11 | 2005-04-13 | Canon Kk | 3D Computer modelling apparatus |
US6990681B2 (en) | 2001-08-09 | 2006-01-24 | Sony Corporation | Enhancing broadcast of an event with synthetic scene using a depth map |
US6781591B2 (en) | 2001-08-15 | 2004-08-24 | Mitsubishi Electric Research Laboratories, Inc. | Blending multiple images using local and global information |
US7023432B2 (en) | 2001-09-24 | 2006-04-04 | Geomagic, Inc. | Methods, apparatus and computer program products that reconstruct surfaces from data point sets |
US7096428B2 (en) | 2001-09-28 | 2006-08-22 | Fuji Xerox Co., Ltd. | Systems and methods for providing a spatially indexed panoramic video |
JPWO2003067527A1 (en) | 2002-02-06 | 2005-06-02 | デジタルプロセス株式会社 | 3D shape display program, 3D shape display method, and 3D shape display device |
US20040217956A1 (en) | 2002-02-28 | 2004-11-04 | Paul Besl | Method and system for processing, compressing, streaming, and interactive rendering of 3D color image data |
US7515173B2 (en) | 2002-05-23 | 2009-04-07 | Microsoft Corporation | Head pose tracking system |
US7030875B2 (en) | 2002-09-04 | 2006-04-18 | Honda Motor Company Ltd. | Environmental reasoning using geometric data structure |
US7106358B2 (en) | 2002-12-30 | 2006-09-12 | Motorola, Inc. | Method, system and apparatus for telepresence communications |
US20050017969A1 (en) | 2003-05-27 | 2005-01-27 | Pradeep Sen | Computer graphics rendering using boundary information |
US7480401B2 (en) | 2003-06-23 | 2009-01-20 | Siemens Medical Solutions Usa, Inc. | Method for local surface smoothing with application to chest wall nodule segmentation in lung CT data |
US7321669B2 (en) * | 2003-07-10 | 2008-01-22 | Sarnoff Corporation | Method and apparatus for refining target position and size estimates using image and depth data |
GB2405775B (en) | 2003-09-05 | 2008-04-02 | Canon Europa Nv | 3D computer surface model generation |
US7184052B2 (en) | 2004-06-18 | 2007-02-27 | Microsoft Corporation | Real-time texture rendering using generalized displacement maps |
US7292257B2 (en) | 2004-06-28 | 2007-11-06 | Microsoft Corporation | Interactive viewpoint video system and process |
US20060023782A1 (en) | 2004-07-27 | 2006-02-02 | Microsoft Corporation | System and method for off-line multi-view video compression |
US7671893B2 (en) | 2004-07-27 | 2010-03-02 | Microsoft Corp. | System and method for interactive multi-view video |
US7142209B2 (en) | 2004-08-03 | 2006-11-28 | Microsoft Corporation | Real-time rendering system and process for interactive viewpoint video that was generated using overlapping images of a scene captured from viewpoints forming a grid |
US7561620B2 (en) | 2004-08-03 | 2009-07-14 | Microsoft Corporation | System and process for compressing and decompressing multiple, layered, video streams employing spatial and temporal encoding |
US7221366B2 (en) | 2004-08-03 | 2007-05-22 | Microsoft Corporation | Real-time rendering system and process for interactive viewpoint video |
US8477173B2 (en) | 2004-10-15 | 2013-07-02 | Lifesize Communications, Inc. | High definition videoconferencing system |
JPWO2006062199A1 (en) | 2004-12-10 | 2008-06-12 | 国立大学法人京都大学 | Three-dimensional image data compression apparatus, method, program, and recording medium |
WO2006084385A1 (en) | 2005-02-11 | 2006-08-17 | Macdonald Dettwiler & Associates Inc. | 3d imaging system |
DE102005023195A1 (en) | 2005-05-19 | 2006-11-23 | Siemens Ag | Method for expanding the display area of a volume recording of an object area |
US8228994B2 (en) | 2005-05-20 | 2012-07-24 | Microsoft Corporation | Multi-view video coding based on temporal and view decomposition |
US20070070177A1 (en) | 2005-07-01 | 2007-03-29 | Christensen Dennis G | Visual and aural perspective management for enhanced interactive video telepresence |
JP4595733B2 (en) | 2005-08-02 | 2010-12-08 | カシオ計算機株式会社 | Image processing device |
US7551232B2 (en) | 2005-11-14 | 2009-06-23 | Lsi Corporation | Noise adaptive 3D composite noise reduction |
US7623127B2 (en) | 2005-11-29 | 2009-11-24 | Siemens Medical Solutions Usa, Inc. | Method and apparatus for discrete mesh filleting and rounding through ball pivoting |
US7577491B2 (en) | 2005-11-30 | 2009-08-18 | General Electric Company | System and method for extracting parameters of a cutting tool |
KR100810268B1 (en) | 2006-04-06 | 2008-03-06 | 삼성전자주식회사 | Embodiment Method For Color-weakness in Mobile Display Apparatus |
US7778491B2 (en) | 2006-04-10 | 2010-08-17 | Microsoft Corporation | Oblique image stitching |
US7679639B2 (en) | 2006-04-20 | 2010-03-16 | Cisco Technology, Inc. | System and method for enhancing eye gaze in a telepresence system |
EP1862969A1 (en) | 2006-06-02 | 2007-12-05 | Eidgenössische Technische Hochschule Zürich | Method and system for generating a representation of a dynamically changing 3D scene |
US20080043024A1 (en) | 2006-06-26 | 2008-02-21 | Siemens Corporate Research, Inc. | Method for reconstructing an object subject to a cone beam using a graphic processor unit (gpu) |
USD610105S1 (en) | 2006-07-10 | 2010-02-16 | Cisco Technology, Inc. | Telepresence system |
US20080095465A1 (en) | 2006-10-18 | 2008-04-24 | General Electric Company | Image registration system and method |
US8213711B2 (en) | 2007-04-03 | 2012-07-03 | Her Majesty The Queen In Right Of Canada As Represented By The Minister Of Industry, Through The Communications Research Centre Canada | Method and graphical user interface for modifying depth maps |
GB0708676D0 (en) | 2007-05-04 | 2007-06-13 | Imec Inter Uni Micro Electr | A Method for real-time/on-line performing of multi view multimedia applications |
US8253770B2 (en) | 2007-05-31 | 2012-08-28 | Eastman Kodak Company | Residential video communication system |
US8063901B2 (en) | 2007-06-19 | 2011-11-22 | Siemens Aktiengesellschaft | Method and apparatus for efficient client-server visualization of multi-dimensional data |
JP4947593B2 (en) | 2007-07-31 | 2012-06-06 | Kddi株式会社 | Apparatus and program for generating free viewpoint image by local region segmentation |
US8223192B2 (en) | 2007-10-31 | 2012-07-17 | Technion Research And Development Foundation Ltd. | Free viewpoint video |
US8451265B2 (en) | 2007-11-16 | 2013-05-28 | Sportvision, Inc. | Virtual viewpoint animation |
US8160345B2 (en) | 2008-04-30 | 2012-04-17 | Otismed Corporation | System and method for image segmentation in generating computer models of a joint to undergo arthroplasty |
KR101335346B1 (en) * | 2008-02-27 | 2013-12-05 | 소니 컴퓨터 엔터테인먼트 유럽 리미티드 | Methods for capturing depth data of a scene and applying computer actions |
TWI357582B (en) | 2008-04-18 | 2012-02-01 | Univ Nat Taiwan | Image tracking system and method thereof |
US8442355B2 (en) | 2008-05-23 | 2013-05-14 | Samsung Electronics Co., Ltd. | System and method for generating a multi-dimensional image |
US7840638B2 (en) | 2008-06-27 | 2010-11-23 | Microsoft Corporation | Participant positioning in multimedia conferencing |
US8106924B2 (en) | 2008-07-31 | 2012-01-31 | Stmicroelectronics S.R.L. | Method and system for video rendering, computer program product therefor |
WO2010023580A1 (en) | 2008-08-29 | 2010-03-04 | Koninklijke Philips Electronics, N.V. | Dynamic transfer of three-dimensional image data |
JP5170249B2 (en) | 2008-09-29 | 2013-03-27 | パナソニック株式会社 | Stereoscopic image processing apparatus and noise reduction method for stereoscopic image processing apparatus |
JP5243612B2 (en) | 2008-10-02 | 2013-07-24 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | Intermediate image synthesis and multi-view data signal extraction |
US8200041B2 (en) | 2008-12-18 | 2012-06-12 | Intel Corporation | Hardware accelerated silhouette detection |
US8436852B2 (en) | 2009-02-09 | 2013-05-07 | Microsoft Corporation | Image editing consistent with scene geometry |
US8477175B2 (en) | 2009-03-09 | 2013-07-02 | Cisco Technology, Inc. | System and method for providing three dimensional imaging in a network environment |
JP5222205B2 (en) | 2009-04-03 | 2013-06-26 | Kddi株式会社 | Image processing apparatus, method, and program |
US20100259595A1 (en) | 2009-04-10 | 2010-10-14 | Nokia Corporation | Methods and Apparatuses for Efficient Streaming of Free View Point Video |
US8719309B2 (en) | 2009-04-14 | 2014-05-06 | Apple Inc. | Method and apparatus for media data transmission |
US8665259B2 (en) | 2009-04-16 | 2014-03-04 | Autodesk, Inc. | Multiscale three-dimensional navigation |
US8755569B2 (en) | 2009-05-29 | 2014-06-17 | University Of Central Florida Research Foundation, Inc. | Methods for recognizing pose and action of articulated objects with collection of planes in motion |
US8629866B2 (en) | 2009-06-18 | 2014-01-14 | International Business Machines Corporation | Computer method and apparatus providing interactive control and remote identity through in-world proxy |
KR101070591B1 (en) * | 2009-06-25 | 2011-10-06 | (주)실리콘화일 | distance measuring apparatus having dual stereo camera |
US9648346B2 (en) | 2009-06-25 | 2017-05-09 | Microsoft Technology Licensing, Llc | Multi-view video compression and streaming based on viewpoints of remote viewer |
US8194149B2 (en) | 2009-06-30 | 2012-06-05 | Cisco Technology, Inc. | Infrared-aided depth estimation |
US8633940B2 (en) | 2009-08-04 | 2014-01-21 | Broadcom Corporation | Method and system for texture compression in a system having an AVC decoder and a 3D engine |
US8908958B2 (en) | 2009-09-03 | 2014-12-09 | Ron Kimmel | Devices and methods of generating three dimensional (3D) colored models |
US8284237B2 (en) | 2009-09-09 | 2012-10-09 | Nokia Corporation | Rendering multiview content in a 3D video system |
US8441482B2 (en) | 2009-09-21 | 2013-05-14 | Caustic Graphics, Inc. | Systems and methods for self-intersection avoidance in ray tracing |
US20110084983A1 (en) | 2009-09-29 | 2011-04-14 | Wavelength & Resonance LLC | Systems and Methods for Interaction With a Virtual Environment |
US9154730B2 (en) | 2009-10-16 | 2015-10-06 | Hewlett-Packard Development Company, L.P. | System and method for determining the active talkers in a video conference |
US8537200B2 (en) | 2009-10-23 | 2013-09-17 | Qualcomm Incorporated | Depth map generation techniques for conversion of 2D video data to 3D video data |
US20110122225A1 (en) | 2009-11-23 | 2011-05-26 | General Instrument Corporation | Depth Coding as an Additional Channel to Video Sequence |
US8487977B2 (en) | 2010-01-26 | 2013-07-16 | Polycom, Inc. | Method and apparatus to virtualize people with 3D effect into a remote room on a telepresence call for true in person experience |
US20110211749A1 (en) | 2010-02-28 | 2011-09-01 | Kar Han Tan | System And Method For Processing Video Using Depth Sensor Information |
US8898567B2 (en) | 2010-04-09 | 2014-11-25 | Nokia Corporation | Method and apparatus for generating a virtual interactive workspace |
EP2383696A1 (en) | 2010-04-30 | 2011-11-02 | LiberoVision AG | Method for estimating a pose of an articulated object model |
US20110304619A1 (en) | 2010-06-10 | 2011-12-15 | Autodesk, Inc. | Primitive quadric surface extraction from unorganized point cloud data |
US8411126B2 (en) | 2010-06-24 | 2013-04-02 | Hewlett-Packard Development Company, L.P. | Methods and systems for close proximity spatial audio rendering |
KR20120011653A (en) * | 2010-07-29 | 2012-02-08 | 삼성전자주식회사 | Image processing apparatus and method |
US8659597B2 (en) | 2010-09-27 | 2014-02-25 | Intel Corporation | Multi-view ray tracing using edge detection and shader reuse |
US8787459B2 (en) | 2010-11-09 | 2014-07-22 | Sony Computer Entertainment Inc. | Video coding methods and apparatus |
US9123115B2 (en) * | 2010-11-23 | 2015-09-01 | Qualcomm Incorporated | Depth estimation based on global motion and optical flow |
JP5858380B2 (en) * | 2010-12-03 | 2016-02-10 | 国立大学法人名古屋大学 | Virtual viewpoint image composition method and virtual viewpoint image composition system |
US8693713B2 (en) | 2010-12-17 | 2014-04-08 | Microsoft Corporation | Virtual audio environment for multidimensional conferencing |
US8156239B1 (en) | 2011-03-09 | 2012-04-10 | Metropcs Wireless, Inc. | Adaptive multimedia renderer |
US9117113B2 (en) | 2011-05-13 | 2015-08-25 | Liberovision Ag | Silhouette-based pose estimation |
US8867886B2 (en) | 2011-08-08 | 2014-10-21 | Roy Feinson | Surround video playback |
WO2013049388A1 (en) | 2011-09-29 | 2013-04-04 | Dolby Laboratories Licensing Corporation | Representation and coding of multi-view images using tapestry encoding |
US9830743B2 (en) | 2012-04-03 | 2017-11-28 | Autodesk, Inc. | Volume-preserving smoothing brush |
US9058706B2 (en) | 2012-04-30 | 2015-06-16 | Convoy Technologies Llc | Motor vehicle camera and monitoring system |
2012
- 2012-08-03: US13/566,877, US9846960B2 (Active)
- 2012-08-17: US13/588,917, US20130321586A1 (Abandoned)
- 2012-08-29: US13/598,536, US20130321593A1 (Abandoned)
- 2012-08-30: US13/599,263, US8917270B2 (Active)
- 2012-08-30: US13/599,170, US20130321396A1 (Abandoned)
- 2012-08-30: US13/598,747, US20130321575A1 (Abandoned)
- 2012-08-30: US13/599,678, US20130321566A1 (Abandoned)
- 2012-08-30: US13/599,436, US9251623B2 (Active)
- 2012-09-13: US13/614,852, US9256980B2 (Active)

2013
- 2013-03-08: US13/790,158, US20130321413A1 (Abandoned)
Non-Patent Citations (6)
Title |
---|
Funkhouser, Thomas A., and Carlo H. Séquin. "Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments." Proceedings of the 20th annual conference on Computer graphics and interactive techniques. ACM, 1993. * |
Matthies, Larry, and Masatoshi Okutomi. "A Bayesian Foundation for Active Stereo Vision." 1989 Advances in Intelligent Robotics Systems Conference. International Society for Optics and Photonics, 1990. * |
Petit, Benjamin, et al. "Multicamera real-time 3d modeling for telepresence and remote collaboration." International journal of digital multimedia broadcasting 2010 (2009). * |
Rankin, Arturo L., et al. "Passive perception system for day/night autonomous off-road navigation." Defense and Security. International Society for Optics and Photonics, 2005. * |
Würmlin, Stephan, Edouard Lamboray, and Markus Gross. "3D video fragments: Dynamic point samples for real-time free-viewpoint video." Computers & Graphics 28.1 (2004): 3-14. * |
Yang, Zhenyu, et al. "A multi-stream adaptation framework for bandwidth management in 3D tele-immersion." Proceedings of the 2006 international workshop on Network and operating systems support for digital audio and video. ACM, 2006. * |
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9191643B2 (en) | 2013-04-15 | 2015-11-17 | Microsoft Technology Licensing, Llc | Mixing infrared and color component data point clouds |
US11783409B1 (en) | 2013-10-25 | 2023-10-10 | Appliance Computing III, Inc. | Image-based rendering of real spaces |
US11449926B1 (en) | 2013-10-25 | 2022-09-20 | Appliance Computing III, Inc. | Image-based rendering of real spaces |
US10592973B1 (en) | 2013-10-25 | 2020-03-17 | Appliance Computing III, Inc. | Image-based rendering of real spaces |
US11610256B1 (en) | 2013-10-25 | 2023-03-21 | Appliance Computing III, Inc. | User interface for image-based rendering of virtual tours |
US10510111B2 (en) | 2013-10-25 | 2019-12-17 | Appliance Computing III, Inc. | Image-based rendering of real spaces |
US11948186B1 (en) | 2013-10-25 | 2024-04-02 | Appliance Computing III, Inc. | User interface for image-based rendering of virtual tours |
US11062384B1 (en) | 2013-10-25 | 2021-07-13 | Appliance Computing III, Inc. | Image-based rendering of real spaces |
US9661312B2 (en) * | 2015-01-22 | 2017-05-23 | Microsoft Technology Licensing, Llc | Synthesizing second eye viewport using interleaving |
US11870967B2 (en) | 2015-03-01 | 2024-01-09 | Nevermind Capital Llc | Methods and apparatus for supporting content generation, transmission and/or playback |
EP3266199A4 (en) * | 2015-03-01 | 2018-07-18 | NEXTVR Inc. | Methods and apparatus for supporting content generation, transmission and/or playback |
US10397538B2 (en) | 2015-03-01 | 2019-08-27 | Nextvr Inc. | Methods and apparatus for supporting content generation, transmission and/or playback |
US10742948B2 (en) | 2015-03-01 | 2020-08-11 | Nextvr Inc. | Methods and apparatus for requesting, receiving and/or playing back content corresponding to an environment |
US10574962B2 (en) * | 2015-03-01 | 2020-02-25 | Nextvr Inc. | Methods and apparatus for requesting, receiving and/or playing back content corresponding to an environment |
US10701331B2 (en) | 2015-03-01 | 2020-06-30 | Nextvr Inc. | Methods and apparatus for supporting content generation, transmission and/or playback |
US10554713B2 (en) | 2015-06-19 | 2020-02-04 | Microsoft Technology Licensing, Llc | Low latency application streaming using temporal frame transformation |
US11250619B2 (en) | 2016-11-30 | 2022-02-15 | Canon Kabushiki Kaisha | Image processing apparatus and method |
CN110073414A (en) * | 2016-11-30 | 2019-07-30 | 佳能株式会社 | Image processing equipment and method |
EP3550522A4 (en) * | 2016-11-30 | 2020-07-08 | C/o Canon Kabushiki Kaisha | Image processing device and method |
US11096004B2 (en) * | 2017-01-23 | 2021-08-17 | Nokia Technologies Oy | Spatial audio rendering point extension |
US20210329400A1 (en) * | 2017-01-23 | 2021-10-21 | Nokia Technologies Oy | Spatial Audio Rendering Point Extension |
US11665308B2 (en) | 2017-01-31 | 2023-05-30 | Tetavi, Ltd. | System and method for rendering free viewpoint video for sport applications |
US11632489B2 (en) | 2017-01-31 | 2023-04-18 | Tetavi, Ltd. | System and method for rendering free viewpoint video for studio applications |
US10944960B2 (en) * | 2017-02-10 | 2021-03-09 | Panasonic Intellectual Property Corporation Of America | Free-viewpoint video generating method and free-viewpoint video generating system |
US11044570B2 (en) | 2017-03-20 | 2021-06-22 | Nokia Technologies Oy | Overlapping audio-object interactions |
US11442693B2 (en) | 2017-05-05 | 2022-09-13 | Nokia Technologies Oy | Metadata-free audio-object interactions |
US11604624B2 (en) | 2017-05-05 | 2023-03-14 | Nokia Technologies Oy | Metadata-free audio-object interactions |
US11074036B2 (en) | 2017-05-05 | 2021-07-27 | Nokia Technologies Oy | Metadata-free audio-object interactions |
CN109214265A (en) * | 2017-07-06 | 2019-01-15 | 佳能株式会社 | Image processing apparatus, its image processing method and storage medium |
JP2019016161A (en) * | 2017-07-06 | 2019-01-31 | キヤノン株式会社 | Image processing device and control method thereof |
KR102316056B1 (en) * | 2017-07-06 | 2021-10-22 | 캐논 가부시끼가이샤 | Image processing apparatus, image processing method thereof and program |
KR20190005765A (en) * | 2017-07-06 | 2019-01-16 | 캐논 가부시끼가이샤 | Image processing apparatus, image processing method thereof and program |
US11025878B2 (en) | 2017-07-06 | 2021-06-01 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method thereof and storage medium |
EP3425592A1 (en) * | 2017-07-06 | 2019-01-09 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and program, for generating a virtual viewpoint image |
EP3669333A4 (en) * | 2017-08-15 | 2021-04-07 | Nokia Technologies Oy | Sequential encoding and decoding of volymetric video |
WO2019034807A1 (en) | 2017-08-15 | 2019-02-21 | Nokia Technologies Oy | Sequential encoding and decoding of volymetric video |
US11109066B2 (en) * | 2017-08-15 | 2021-08-31 | Nokia Technologies Oy | Encoding and decoding of volumetric video |
US20200244993A1 (en) * | 2017-08-15 | 2020-07-30 | Nokia Technologies Oy | Encoding and decoding of volumetric video |
WO2019034808A1 (en) | 2017-08-15 | 2019-02-21 | Nokia Technologies Oy | Encoding and decoding of volumetric video |
US11405643B2 (en) | 2017-08-15 | 2022-08-02 | Nokia Technologies Oy | Sequential encoding and decoding of volumetric video |
US11290758B2 (en) | 2017-08-30 | 2022-03-29 | Samsung Electronics Co., Ltd. | Method and apparatus of point-cloud streaming |
EP3669547A4 (en) * | 2017-08-30 | 2020-10-21 | Samsung Electronics Co., Ltd. | Method and apparatus for point-cloud streaming |
CN111052750A (en) * | 2017-08-30 | 2020-04-21 | 三星电子株式会社 | Method and device for point cloud stream transmission |
CN111770327A (en) * | 2017-09-06 | 2020-10-13 | 佳能株式会社 | Information processing apparatus, information processing method, and medium |
EP3454562A1 (en) * | 2017-09-06 | 2019-03-13 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and program |
CN109462749A (en) * | 2017-09-06 | 2019-03-12 | 佳能株式会社 | Information processing unit, information processing method and medium |
US11202104B2 (en) | 2017-09-06 | 2021-12-14 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and medium |
US10659822B2 (en) | 2017-09-06 | 2020-05-19 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and medium |
CN111343447A (en) * | 2017-09-19 | 2020-06-26 | 佳能株式会社 | Data providing apparatus, control method of data providing apparatus, and storage medium |
US11750786B2 (en) | 2017-09-19 | 2023-09-05 | Canon Kabushiki Kaisha | Providing apparatus, providing method and computer readable storage medium for performing processing relating to a virtual viewpoint image |
US11196973B2 (en) | 2017-09-19 | 2021-12-07 | Canon Kabushiki Kaisha | Providing apparatus, providing method and computer readable storage medium for performing processing relating to a virtual viewpoint image |
EP3721957A1 (en) * | 2017-09-19 | 2020-10-14 | Canon Kabushiki Kaisha | Providing apparatus, providing method and computer program |
EP3460761A1 (en) * | 2017-09-22 | 2019-03-27 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, image processing system, and program |
US10701332B2 (en) | 2017-09-22 | 2020-06-30 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, image processing system, and storage medium |
US11395087B2 (en) | 2017-09-29 | 2022-07-19 | Nokia Technologies Oy | Level-based audio-object interactions |
CN110769241A (en) * | 2019-11-05 | 2020-02-07 | 广州虎牙科技有限公司 | Video frame processing method and device, user side and storage medium |
US11748918B1 (en) * | 2020-09-25 | 2023-09-05 | Apple Inc. | Synthesized camera arrays for rendering novel viewpoints |
US20230368432A1 (en) * | 2020-09-25 | 2023-11-16 | Apple Inc. | Synthesized Camera Arrays for Rendering Novel Viewpoints |
US12039632B2 (en) * | 2020-09-25 | 2024-07-16 | Apple Inc. | Synthesized camera arrays for rendering novel viewpoints |
CN113905221A (en) * | 2021-09-30 | 2022-01-07 | Fuzhou University | Stereo panoramic video asymmetric transmission stream self-adaption method and system |
Also Published As
Publication number | Publication date |
---|---|
US20130321410A1 (en) | 2013-12-05 |
US20130321586A1 (en) | 2013-12-05 |
US20130321593A1 (en) | 2013-12-05 |
US20130321590A1 (en) | 2013-12-05 |
US20130321575A1 (en) | 2013-12-05 |
US9251623B2 (en) | 2016-02-02 |
US20130321413A1 (en) | 2013-12-05 |
US20130321418A1 (en) | 2013-12-05 |
US20130321589A1 (en) | 2013-12-05 |
US20130321566A1 (en) | 2013-12-05 |
US9256980B2 (en) | 2016-02-09 |
US8917270B2 (en) | 2014-12-23 |
US9846960B2 (en) | 2017-12-19 |
Similar Documents
Publication | Title |
---|---|
US20130321396A1 (en) | Multi-input free viewpoint video processing pipeline |
US10893250B2 (en) | Free-viewpoint photorealistic view synthesis from casually captured video |
Serrano et al. | Motion parallax for 360 RGBD video |
Attal et al. | MatryODShka: Real-time 6DoF video view synthesis using multi-sphere images |
AU2021203688B2 (en) | Volumetric depth video recording and playback |
Casas et al. | 4d video textures for interactive character appearance |
US9237330B2 (en) | Forming a stereoscopic video |
US20080246759A1 (en) | Automatic Scene Modeling for the 3D Camera and 3D Video |
US20130127988A1 (en) | Modifying the viewpoint of a digital image |
US20130127993A1 (en) | Method for stabilizing a digital video |
US20130129192A1 (en) | Range map determination for a video frame |
US20130129193A1 (en) | Forming a steroscopic image using range map |
Richardt et al. | Capture, reconstruction, and representation of the visual real world for virtual reality |
US20230152883A1 (en) | Scene processing for holographic displays |
WO2020193703A1 (en) | Techniques for detection of real-time occlusion |
Thatte | Cinematic virtual reality with head-motion parallax |
Lipski | Virtual video camera: a system for free viewpoint video of arbitrary dynamic scenes |
Yang | Towards immersive VR experience |
Wetzstein | Capture, Reconstruction, and Representation of the Visual Real World for Virtual Reality |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIRK, ADAM;MITRA, KANCHAN;SWEENEY, PATRICK;AND OTHERS;SIGNING DATES FROM 20120807 TO 20120827;REEL/FRAME:028877/0521 |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0541. Effective date: 20141014 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |