WO2009155688A1 - Method for seeing ordinary video in 3d on handheld media players without 3d glasses or lenticular optics - Google Patents

Method for seeing ordinary video in 3d on handheld media players without 3d glasses or lenticular optics

Info

Publication number
WO2009155688A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
movie
viewpoint
scene
foreground objects
Prior art date
Application number
PCT/CA2009/000847
Other languages
French (fr)
Inventor
Craig Summers
Original Assignee
Craig Summers
Priority date
Filing date
Publication date
Priority to US4477708P
Priority to US61,044,777
Application filed by Craig Summers
Publication of WO2009155688A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N13/117 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/167 Synchronising or controlling image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N13/279 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals the virtual viewpoint locations being selected by the viewers or determined by tracking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2213/00 Details of stereoscopic systems
    • H04N2213/006 Pseudo-stereoscopic systems, i.e. systems wherein a stereoscopic effect is obtained without sending different images to the viewer's eyes

Abstract

Ordinarily when a picture or a computer screen is moved, there is no occlusion or motion parallax with the flat image on its surface. The methods of the present invention involve keyframing ordinary video to segment foreground objects. By rendering and texture-mapping an ordinary movie into a 3D scene model, the movie and wireframe release the 3D depth information that is normally trapped inside a printed picture or flat display screen. The wireframe can move from one position to the next at regular intervals, in synch with the movie content. Until now, 3D could only be released from a flat monitor display with 3D glasses, a lenticular overlay, real-time motion tracking, or active navigation within the 3D scene. However, each of these methods has inherent limitations. Here, we outline a method for producing a convenient 3D viewing experience on mobile media players, by changing the viewpoint based on feedback from the tilt sensors. Tilting the phone in any direction moves the viewpoint of the 3D scene model, causing motion parallax and occlusion. It is then possible to see around foreground objects in an ordinary video with depth perception on an ordinary display screen.

Description

Method for Seeing Ordinary Video in 3D on Handheld Media Players Without 3D Glasses or Lenticular Optics

Background

Automatic segmentation of foreground objects in digital video is often described as an unsolved problem. Biological visual systems make it seem easy. Adding supplementary data like laser range-finders or stereoscopic cameras can improve depth perception in computer vision. But for image processing with an ordinary movie clip, segmenting objects depth-wise is sometimes called the Holy Grail of content conversion.

It is difficult to accurately segment video in real time (i.e., automatically). Relative motion (motion parallax) can be used to separate foreground objects from the background in single-camera video, although it is often difficult to define an object in the proper depth automatically. There are many scenes in video that require higher knowledge to interpret properly. For example, the camera may coincidentally move at the same time as an object does, or the camera may look through a window whose reflections move in unexpected ways. It can also be difficult to detect the edges of foreground objects. An object like someone's nose does not have well-defined edges and changes shape from different angles. It can also be difficult to find the edges to segment a foreground person when they stand in front of a colored wall while wearing a shirt the same color. Single-camera automatic processing requires pattern recognition and sensory abilities that are generally not well enough understood to be implemented in computer programs. It is therefore a very difficult problem to generate a moving depth map automatically in running video.

It is possible to pre-render a movie in 3D to avoid the need for real-time automatic segmentation. The movie studios have been making 3D animated cartoon movies that are not photorealistic based on 3D modeling. This is also the state-of-the-art for modern 3D computer games. However, the 3D is still locked inside a flat display screen in most cases. There is a growing resurgence of movie theaters showing 3D movies with stereoscopic glasses based on polarized or shutter lenses. However, these movies are either cartoon-type animations, or were captured with specialized dual-lens stereoscopic cameras. This is a tiny percentage of all of the video available. Ironically, most of the demonstrations of converting video footage for "3D" viewing concern stereoscopic viewing, with "3D" glasses in movie theaters. These are only stereoscopic however, with horizontal displacement of the two viewpoints, and are not full 3D data. 3D glasses have also been criticized for poor viewing. Red-blue anaglyphs cause ghosting and headaches. Polarized and shutter glasses are inconvenient and potentially expensive in large quantities for movie theaters. People don't like to wear headgear during their viewing entertainment. There are also binocular video visors, although these are not transparent and prevent interaction with others and observation of the surrounding environment.

One approach that is emerging that allows a 3D viewing experience without the need to wear 3D glasses is lenticular monitors. Several LCD manufacturers have started to release lenticular versions of their large flat-screen displays, for television or computers. It is also possible to put a thin lenticular overlay on a display screen with the correct optics to enable 3D viewing without the need for headgear. However, lenticular overlays require precision optics and expensive manufacturing. Lenticular monitors generally only work from side to side, and are notorious for having a narrow "sweet spot" for optimal stereoscopic viewing. In addition, if a lenticular overlay is placed on a touch-screen smart phone, it obstructs the ability to touch the screen.

Taking a different approach to resolve some of these limitations, this author created a software program called the Hologram Controller in 2004. A web cam was used to track the user's movements, and convert these into navigation input from the mouse or arrow keys on a computer. In this way, as the person moved, the perspective changed on-screen in a computer game or virtual tour. There were no optics involved, so in principle this software could be distributed easily and could produce an immersive 3D experience. On an ordinary computer screen, you could move back and forth and see around foreground objects, like in a hologram (hence the name Hologram Controller).

At that time, the video segmentation was flawed, because it was fully automated and operating with the video from the web cam in real time. As noted above, we do not understand depth perception well enough in the current state of the art to implement it in software without mistakes. If you are playing a computer game and controlling your movement by moving your body in front of the web cam, and the segmentation is not accurate, the playing experience is not good. For the camera and software to track you best, the room needs to be bright as well, although that can sometimes interfere with the gaming experience. A method is needed to enable motion parallax, occlusion and look-around effects for video running in a 3D scene. Unlike the Hologram Controller software, we need a method of motion analysis that does not rely on automatic scene segmentation in real time. A method is also needed for producing depth perception without lenticular optics or 3D glasses.

Detailed Description of the Preferred Embodiments

Accordingly, a 3D rendering process is disclosed for converting ordinary movies for 3D viewing on handheld media players without manual navigation, 3D glasses or lenticular overlays. As tilt is detected within the handheld device while the movie plays, the viewpoint is moved in the 3D scene, producing a 3D viewing experience.

Depth perception is produced based on motion parallax, occlusion and look-around capabilities on mobile media players and smart phones, without the need for 3D glasses or lenticular optics.

A new possibility has emerged to control the perspective in 3D video. Instead of using a web cam to track movement and control the viewpoint, we could pre-segment the video with manual keyframing into a 3D depth map. Then, instead of using the web cam to control the virtual camera position to change the perspective in real time, we use a unique new feature that these handheld media players now have: 3-axis tilt sensors or accelerometers.

Handheld electronic devices are becoming capable of detecting their orientation, which opens up a new possibility for 3D depth perception. Tilt sensors are used in digital cameras and handheld media players to detect whether the device is tilted in portrait or landscape mode. The display can then be automatically rotated and resized to the correct aspect ratio. This raises an interesting opportunity with 3D video. If the viewpoint in the 3D scene can be moved with the user's physical movement, a compelling depth perception can be produced based on motion parallax. Motion parallax refers to the visual effect in which foreground objects move laterally at a faster rate than objects farther away. This is a depth cue. So even on an ordinary display screen, if the 3D scene is moved correctly as the handheld device is moved, 3D depth can be experienced. The problem with the web cam is that real-time segmentation had flaws. But with handheld devices with accelerometers and/or tilt sensors, there is no ambiguity in the physical movement. The tilt detection is based on accelerometers in the device, which generate data that can be used in software running on the device. Given the above problems with 3D viewing from web cams, this may be a powerful and cost-effective new way to enable a 3D viewing experience with video, without need for 3D glasses or lenticular overlays.
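The depth cue described above can be illustrated with a simple pinhole-camera sketch. The function name, the focal length and the units are assumptions chosen for illustration and are not part of the disclosed method. When the viewpoint moves sideways, a point at a given distance from the camera shifts on screen by roughly the focal length times the movement divided by the distance, so nearer objects move more than distant ones.

```python
def screen_shift(viewpoint_dx: float, depth: float, focal_length: float = 1.0) -> float:
    """Approximate on-screen shift of a point when the viewpoint moves sideways.

    Assumes a pinhole camera; depth is the distance from the camera.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # The image moves opposite to the viewpoint, hence the minus sign.
    return -focal_length * viewpoint_dx / depth


# Same viewpoint movement, two different depths (hypothetical values).
near = screen_shift(viewpoint_dx=2.0, depth=25.0)   # foreground object
far = screen_shift(viewpoint_dx=2.0, depth=100.0)   # background
print(near, far)
```

In this sketch the foreground point shifts four times as far as the background point, which is the relative motion the viewer reads as depth.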

There are limitations with both 3D glasses and lenticular screens. The 3D experience could also be created by navigating through the 3D scene. This is what modern computer games do, as opposed to the first generations of computer games that were like flat board games. Those evolved into games that were "side scrolling" (but still flat). However, when watching television for example, active navigation is a distraction from the intention to relax and observe.

Here, we outline a method for producing a convenient 3D viewing experience on mobile media players, simply by moving the viewing perspective in the 3D scene based on feedback from the tilt sensors. There has been no way to view photorealistic video in 3D using tilt sensors until now. Using the methods disclosed here, we can pre-render the scene, and then view it using tilt sensors. Rather than active navigation through the scene, the tilt sensors can be used to provide a compelling passive 3D viewing experience if the viewpoint is moved from side to side as the handheld device is tilted in the hand.

In the method disclosed here, we pre-process a movie with keyframing, and save the 3D scene model. Then we release the movie for others to play in 3D with XYZ coordinates on mobile media players. As you tilt the device a little bit in your hand, the viewpoint in the movie shifts, and you can look around foreground objects in an ordinary movie. That is a slightly different way of producing a depth effect than with stereoscopic depth from lenticular lenses sending a different view to each eye. But it should be a better viewing experience than ordinary flat video.

There are several industry standard formats or "graphics engines" for playing 3D on mobile devices such as smart phones and media players, including OpenGL ES and M3G. OpenGL ES is a mobile version of the OpenGL graphics platform on graphics cards for computers. It is the method used for processing 3D graphics such as mobile computer games, in smart phone formats from several of the major manufacturers. M3G is the Mobile 3D Graphics format for running the Java programming language on mobile devices. While these standard formats would be understood by those skilled in the art, we do not restrict the methods disclosed here to any particular graphics format.

Two main phases are involved in this process: the initial segmentation of the movie, and then later 3D tilt viewing. The method disclosed here is based on semi-automatic processing, in which a user manually segments foreground objects on every Xth frame (a process called "keyframing"). We then save the video, mapped onto a 3D model. To help reduce the computing resources required when viewing on a smart phone or other handheld device, the 3D data can be reduced at this point. For example, we can reduce the dimensions of the movie frames to a typical resolution for handheld smart phones (e.g., 320 x 480 pixels). We can also reduce the frame rate of the original movie for playback on the handheld, to perhaps 10 frames per second. This frame rate can also guide playback keyframes for the wireframe mesh.
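As a rough illustration of the frame-rate reduction mentioned above, the following sketch picks which source frames to keep when lowering the rate. The function name and the 30 frames-per-second source rate in the example are assumptions; the text only gives the target of about 10 frames per second.

```python
from typing import List


def keep_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Indices of source frames to keep when lowering the frame rate."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        # Nothing to drop: keep every frame.
        return list(range(total_frames))
    step = source_fps / target_fps                      # source frames per kept frame
    kept = int(total_frames * target_fps / source_fps)  # number of frames to keep
    return [int(i * step) for i in range(kept)]


# One second of hypothetical 30 fps video reduced to 10 fps.
print(keep_frame_indices(30, 30, 10))
```

The same indices could also serve as candidate playback keyframes for the wireframe, since the text suggests the reduced frame rate can guide them.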

Once the scene model exists for the 3D model with the viewpoint in the original real camera perspective, we will then control the camera perspective with the accelerometers, as the smart phone is tilted.

There are several practical advantages of this approach:

• Lenticular effects only work in one direction, typically horizontally, for stereoscopic depth. With this new 3D tilt viewing for phones, the 3D viewing experience works in all directions.

• There is no "sweet spot" this way. Having a narrow viewing area where the eyes must be placed is a major limitation in lenticular viewing.

• There is also no "screen door effect", where you can see the plastic ridges interfering with the video.

• There are no problems aligning plastic lenticules on the interlacing.

• With touch-screen phones, putting a lenticular overlay on the screen blocks the touch ability. 3D tilt viewing does not obstruct the touch screen.

• For lenticular optics that are not removable, there is also the problem that ordinary (non-interlaced) content is blurry and illegible behind lenticular optics. That is not a problem with the method disclosed here, where no lenticular optic is used. We can bypass all of the time, cost, distribution and quality control issues with plastic manufacturing this way.

• Using alternating strips for the left and right views reduces the resolution in lenticular viewing. If there are two views, one for each eye, the resolution is halved. If there is a look-around effect with 6 frames, the resolution is only 1/6th of normal. In the method disclosed here, resolution is not reduced at all.

The end effect will be like holding a small box in your hand with some objects inside. As you tilt the box, the perspective changes. Some objects become closer, and some occlude others more. We can move the virtual camera position the same way in the 3D scene model, based on feedback from tilt sensors in the handheld media device. Tilting is not used for navigation through a 3D space; it is used to control the location of the viewpoint.

With stereoscopic lenticulars, moving the viewpoint causes problems because of the narrow sweet spot. In the method disclosed here, moving the viewpoint actually causes the depth perception. Thus, we can use ordinary video to create a photorealistic 3D scene in which both the video and wireframe run at a particular frame rate. When the handheld device is tilted, the accelerometers produce a depth perception in which you can look around foreground objects.

Feedback from the tilt sensors controls the viewpoint, similar to the actual occlusion and motion parallax when a diorama is moved when held in the hand. This 3D viewing method based on tilt sensors has not appeared previously because of the difficulty in creating true 3D video. Even if stereoscopic cameras are used, this only gives horizontal motion parallax. The method disclosed here allows the 3D perspective to move in any direction horizontally or vertically, as the handheld device is tilted in any direction.
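One way the tilt feedback could drive the viewpoint is sketched below. It is only an illustration under stated assumptions: the function names, the axis and sign conventions, the 30-degree tilt limit and the 10-unit offset limit are all hypothetical, and real devices report accelerometer data through their own platform APIs. The sketch estimates tilt from a gravity reading and maps it to a clamped horizontal and vertical viewpoint offset.

```python
import math
from typing import Tuple


def tilt_from_gravity(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Estimate tilt angles in degrees from a 3-axis gravity reading.

    Assumes the device lying flat and face up reads (0, 0, 1).
    """
    tilt_x = math.degrees(math.atan2(ax, az))  # left/right tilt
    tilt_y = math.degrees(math.atan2(ay, az))  # forward/back tilt
    return tilt_x, tilt_y


def viewpoint_offset(tilt_x_deg: float, tilt_y_deg: float,
                     max_tilt_deg: float = 30.0,
                     max_offset: float = 10.0) -> Tuple[float, float]:
    """Map tilt to a viewpoint offset in scene units, clamped to a limit."""
    def axis(tilt: float) -> float:
        clamped = max(-max_tilt_deg, min(max_tilt_deg, tilt))
        return max_offset * clamped / max_tilt_deg

    return axis(tilt_x_deg), axis(tilt_y_deg)


print(viewpoint_offset(*tilt_from_gravity(0.0, 0.0, 1.0)))  # device held flat
print(viewpoint_offset(15.0, -60.0))                        # second value is clamped
```

Clamping keeps the virtual camera within a limited range, so that the viewpoint shifts for a look-around effect rather than navigating through the scene.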

The method disclosed here provides a unique method for converting ordinary video to 3D, for a 3D viewing experience on handheld media players and smart phones that contain tilt sensors. Tilt sensors are in use on handheld devices already, although they are primarily used to control the orientation of photos or to navigate in 3D computer games based on cartoon animation. It is an object of the present invention to use the accelerometers or tilt sensors to control the viewpoint in 3D video, causing motion parallax, occlusion and look-around effects to create depth perception while the movie plays. One important distinction with the present embodiment is that this is not a method for navigating with 3D video. Using tilt sensors to navigate in 3D video is an alternative embodiment of the methods disclosed here. However, while navigation is central to 3D game playing, it is not a relaxing way to watch a movie. The present embodiment uses the tilt sensors to control the viewpoint to enable motion parallax, occlusion and look-around effects.

In the present embodiment, keyframing is used to segment foreground objects from the background in all frames of the video, and a corresponding depth mesh of interconnected vertices with XYZ coordinates is created. As the video plays at a specified frame rate, it is mapped onto the mesh surface using standard texture mapping methods that are available in industry-standard graphics engines. If a particular graphics engine does not support video texture-mapping, we can play the movie frames as a series of images in sequence at a particular rate.

We have previously attempted to allow look-around effects on ordinary monitors by tracking the person with a web cam in real time. We have also viewed the 3D scene model with lenticular displays. However, in the present embodiment, tilt sensors are used to control the viewpoint so that the video can be passively viewed while still getting the 3D viewing experience.

Part 1: 2D-3D Conversion

Keyframing (or "rotoscoping") provides a way to manually verify the segmentation of foreground objects throughout a movie. This segmentation is done using a process previously described in a US Provisional Patent filing by Craig Summers (Dec. 24, 2007), titled "2D-3D Conversion for Lenticular Video on Any Screen" (No. 61/016,523). However, that previous patent filing generated the 3D model in order to create two stereoscopic views of the scene. Here, we are only concerned with viewing the initial 3D model.

To improve the precision of tracking and 3D depth, the method disclosed here uses keyframing to manually fine-tune the segmentation of foreground objects on every Xth frame. The user may need to adjust on frame 30 and then on frame 40. In-between these keyframes, the software will process the intervening frames automatically, to move the 3D model as the objects move in each successive image. This also allows more precise depth information to be entered for each object. In motion analysis alone, although a foreground object may be segmented, additional information is needed to know whether it is a large object far away or a small object up close. The method outlined here therefore uses manual verification of segmentation and depths on the Z-axis, interpolating between keyframes.
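The interpolation between keyframes can be sketched as simple linear blending of each outline dot. This is a minimal illustration, assuming each keyframe stores the same dots in the same order; the function name and the data layout are hypothetical. It divides by the frame gap (10 for keyframes 30 and 40) so that the last step lands exactly on the next keyframe.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]  # (X, Y, Z) of one dot


def interpolate_keyframes(frame_a: int, verts_a: List[Vertex],
                          frame_b: int, verts_b: List[Vertex]) -> Dict[int, List[Vertex]]:
    """Linearly interpolate dot positions for every frame from frame_a to frame_b."""
    if frame_b <= frame_a:
        raise ValueError("frame_b must come after frame_a")
    if len(verts_a) != len(verts_b):
        raise ValueError("both keyframes need the same number of dots")
    steps = frame_b - frame_a
    frames: Dict[int, List[Vertex]] = {}
    for k in range(steps + 1):
        t = k / steps  # 0.0 at the first keyframe, 1.0 at the second
        frames[frame_a + k] = [
            tuple(a + (b - a) * t for a, b in zip(va, vb))
            for va, vb in zip(verts_a, verts_b)
        ]
    return frames


# One dot moving right, up and closer between keyframes 30 and 40 (made-up values).
frames = interpolate_keyframes(30, [(0.0, 0.0, 50.0)], 40, [(10.0, 20.0, 40.0)])
print(frames[35])
```

For the made-up dot above, frame 35 sits halfway between the two keyframes in X, Y and Z.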

The 2D-3D conversion process is embodied in a software program that performs the following steps:

1. An ordinary two-dimensional movie is opened with the software. The number of frames in the movie and frames per second are determined by the software using standard data available in the movie format. The user can then enter the desired keyframe spacing.

2. The program saves frames as separate bitmap images from every Xth frame (keyframe).

3. The program displays the first keyframe with picture editing tools. Although a variety of line-drawing, selection, masking or motion analysis tools could be available, in the present embodiment, we use the cursor to click self-connecting dots around the edges of a foreground object to be segmented. Starting at the bottom left, dots are clicked in a clockwise direction going around the edge of the foreground object.

4. When the boundary of the foreground object has been outlined, the depth of the object along the Z axis is entered in a text box. Although any scale could be used, for convenience, in the present embodiment, the Z axis defines the distance from the camera with the background at 100% distant. The X axis is defined as the lateral direction, and in the present embodiment, the Y axis is vertical. If an object is halfway to the background, it would be given a depth value of 50 in the present embodiment. In the present embodiment, object distances are measured away from the camera, although distances from the background could also be used.

5. It would be easiest to texture-map the video itself onto the wire mesh, although at present, few graphics formats support texture-mapping of video for mobile devices. We therefore either save the video for texture-mapping or texture-map a sequence of images, which can be buffered and displayed in sequence at a specified frame rate.

6. The scene model could be projected onto a variety of geometric shapes to model visual perspective. While not excluding common approaches like projecting an immersive scene model onto a ground plane or the inside of a sphere or cube, for simplicity the present embodiment begins with the visual perspective in the video and simply sets the foreground object shapes in front of a vertical background wall at Z=100.

7. Additional objects can be added or removed as they enter and exit the scene. During the keyframing, a button is used to indicate that dots are being added or stopped for a particular object that is entering or disappearing.

8. Any number of objects can be segmented in the first keyframe. Then, a Next button is pressed to go on to marking dots for the same objects in each subsequent keyframe.

9. When every keyframe has been segmented manually, the software gives a message indicating that this is done. The segmentation process can be done on separate scenes from a long movie and later combined into a long series of scenes. Or, a complete movie can be processed.

10. It is also possible to use the motion analysis techniques outlined in a US Provisional Patent filing by Craig Summers (dated Feb. 23, 2005) to partially or fully automate the process. This was entitled "Automatic Scene Modeling for the 3D Camera and 3D Video" (initially US Provisional Patent #60/655,514, and later PCT International #CA2006/000265). Using that method, even if keyframes are defined to frequently check on accurate segmentation of foreground objects, the dots could be moved automatically using motion analysis. In this way, we do not rely on fully automatic processing, but can verify how the software has tracked foreground objects on each subsequent keyframe, to make the segmentation easier and faster while maintaining quality.

11. Indicate the frame rate for saving image sequences and for synchronizing the wireframe movement with the movement of foreground objects in these images.

12. An interpolation button in the software interface can then be selected to generate a single data file with XYZ coordinates and a frame number for each dot clicked, as well as the number of image files based on the frames per second.

13. For frames in-between keyframes, the difference between the X, Y and Z coordinates in the previous keyframe and the next keyframe is calculated. These differences are divided by the number of intervening frames, to indicate how far X, Y and Z vertices in the wireframe should change on each frame.

14. This data may be written in a standard format like the industry standard XML, so that the data can be saved and moved from the rendering computer to a mobile device. The XML would be written for use in a standard mobile 3D graphics format such as OpenGL ES.

15. Region-filling: Standard methods involve stretching in the edges or cloning them to cover the blank spot behind foreground objects. Another approach defined in the Summers (2005) filing is to buffer any background that is observed in a scene model independent of the foreground objects. That way even if the background is occluded to the camera, it is still visible when the viewpoint moves. Our present embodiment with the method disclosed here is to "emboss" foreground objects by continuing their connection to the background with a soft corner and then a connecting surface. Parameters can be defined to control the sharpness of the curve, the amount of pop-out on the Z axis, and the angle of slope in the surface connecting to the background. Embossing is a way to cover the blank area behind foreground objects without artificial filling-in or seams.

16. An alternative embodiment would be to create a buffer behind the foreground object in pre-rendering, which would be saved with the other images and XYZ locations for playback with tilt controls.

17. The mesh is generated using a method of generating triangle surfaces described in Summers 2005.

18. When foreground objects are moved from the background image into XYZ foreground coordinates based on distances indicated during keyframing, they are reduced in size so that they still subtend the same angle when moved closer to the viewer. Following our method previously disclosed in Summers (2007), in one embodiment, we can use trigonometry to calculate the correct reduction in size of foreground objects that are pulled closer in the 3D scene model. We can calculate the degrees that an object subtends in the original image. Then we want to change the size so that it subtends the exact same number of degrees when it is pulled into the foreground. Using tan = opposite / adjacent, we can calculate that the new size for the height, width or any cross section equals the distance from the camera times tan of the degrees subtended. A right triangle is needed, so for objects in the periphery it is possible to measure distances from the center of the image (although this level of precision is usually not necessary in practice). The first step is to get tan of the angle subtended by the object in the original movie frame, by dividing the height of the object by the distance away, based on tan = opposite / adjacent. (If the view is to the middle of the object, the overall height can be halved to maintain a right angle from the camera line to the object on the background.) For the second step, we know tan alpha and the new distance from the camera. From these, we can derive the missing variable which is the new size that subtends the same angle.

There is also an alternative embodiment that is more convenient although not as accurate. Conceptually, as the object is brought closer to the camera, it needs to get proportionately smaller to subtend the same angle. We can implement this logic quantitatively:

New Size = Original size in background x distance from camera / 100

This means that if we bring the object 70% closer (to a distance of 30), it should be reduced by 70%, to 30% of its original size, to subtend the same angle.

19. Then, the software automatically moves the wireframe 3D scene model from the first manual keyframe to the next, interpolating on in-between frames. A data file is written of the location of the wireframe on each frame, or keyframing is used in the later playback to move from one position at a particular time to the next.

20. In the software, a button or menu item can then be selected to look at the 3D scene model. This starts on frame 1, but can be played and paused on any later frame as well. A number of industry-standard graphics engines can be used for this 3D rendering. We can input the XYZ coordinates from the data file of wireframe vertices in each frame, using industry-standard 3D platforms such as DirectX, OpenGL, OpenGL ES or VRML. The methods disclosed here can be used with any of these standard graphic formats.

21. While viewing the 3D scene on the computer, the viewpoint can be moved (within adjustable limits) using keys such as the arrow keys or mouse, to verify what the scene will look like when viewed on a handheld media player.

22. In one embodiment, individual images are then saved from the 3D scene. The images would be saved during the export from the keyframing program, with a reduced size and frame rate suitable for the handheld device (e.g. 480x320, at a rate of 10 per second). In an alternate embodiment, it would also be possible to save a movie with a specified frame rate, for handheld devices that are able to texture-map video. We place the images or video and the XML file containing the wireframe data all in the same folder.
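The size correction described in step 18 can be sketched as follows. This is an illustration only: the function names are hypothetical, the background is assumed to sit at distance 100 as in step 4, and the object is assumed to be centred on the camera axis so that a right triangle applies.

```python
import math

BACKGROUND_DISTANCE = 100.0  # background wall at Z = 100, as in step 4


def size_by_angle(original_size: float, new_distance: float) -> float:
    """Two-step trigonometric method: keep the subtended angle constant."""
    # Step 1: angle subtended by the object as drawn on the background.
    alpha = math.atan(original_size / BACKGROUND_DISTANCE)
    # Step 2: size that subtends the same angle at the new distance.
    return new_distance * math.tan(alpha)


def size_by_proportion(original_size: float, new_distance: float) -> float:
    """Shorthand formula: new size = original size x distance / 100."""
    return original_size * new_distance / 100.0


# An object 40 units tall on the background, pulled forward to Z = 30.
print(size_by_angle(40.0, 30.0), size_by_proportion(40.0, 30.0))
```

For an object centred on the camera axis, both functions give the same value in this sketch, apart from floating-point rounding.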

Part 2: 3D Movie Player

Once the movie is pre-rendered, the exported data can be distributed on disk, in email or from a server, for later viewing. A separate viewing program is needed for playing the 3D content on the handheld media player. This viewer is a software program running on the operating system of the handheld device. In the present embodiment, this program would be running on an industry-standard 3D graphics engine. The methods disclosed here are not limited to OpenGL ES or the Mobile 3D Graphics ("M3G") format, but are consistent with industry standards such as those.

Regardless of whether a folder of images or a video with a specified frame rate was saved in Part I , in both cases, a text file for XYZ data would also be saved in an industry standard format such as XMl .. XMI . is convenient to use because it can be saved during data export from the keyframing program, and read into a different program on the mobile device.
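
As an illustration of how the XYZ data might travel between the two programs, the following Python sketch writes per-frame vertices to XML and reads them back. The element and attribute names (`wireframe`, `frame`, `vertex`, `fps`) are hypothetical; no particular schema is specified here.

```python
import xml.etree.ElementTree as ET


def wireframe_to_xml(frames, fps):
    """Serialize per-frame vertex lists [(x, y, z), ...] into an XML string."""
    root = ET.Element("wireframe", fps=str(fps))
    for index, vertices in enumerate(frames):
        frame_el = ET.SubElement(root, "frame", index=str(index))
        for x, y, z in vertices:
            ET.SubElement(
                frame_el, "vertex", x=repr(float(x)), y=repr(float(y)), z=repr(float(z))
            )
    return ET.tostring(root, encoding="unicode")


def wireframe_from_xml(text):
    """Parse the XML produced above back into (fps, frames)."""
    root = ET.fromstring(text)
    fps = float(root.get("fps"))
    frames = []
    for frame_el in root.findall("frame"):
        frames.append(
            [
                (float(v.get("x")), float(v.get("y")), float(v.get("z")))
                for v in frame_el.findall("vertex")
            ]
        )
    return fps, frames
```

The writer would run during export from the keyframing program, and the reader inside the player on the mobile device.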

Several approaches can be used in the 3D playback to get the smoothest performance depending on whether the graphics card supports video texture-mapping or not. The present embodiment is to texture-map the movie at a given frame rate onto the moving wireframe. However, if video texture-mapping is not supported, an alternate embodiment is to rapidly copy frames from the movie file in sequence, and texture-map each individual frame. Either of these approaches allows us to use an existing movie with the movie player. It can be specified with a local file path or a web address. The only data that then needs to be provided is the XYZ data file defining the wireframe. Finally, another embodiment is also possible if these first two are not supported. In that case, during the pre-rendering phase, a series of images can be saved with a specified number per second. The folder of these images can then be provided with the sound track and data file of XYZ coordinates defining the shape and movement of the wireframe.
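
The choice among these three playback approaches can be expressed as a simple fallback chain. The Python sketch below is illustrative only; the two capability flags are hypothetical inputs that a real player would have to obtain from its graphics engine.

```python
from enum import Enum


class PlaybackMode(Enum):
    VIDEO_TEXTURE = "texture-map the movie onto the moving wireframe"
    FRAME_COPY = "copy frames from the movie and texture-map each one"
    IMAGE_SEQUENCE = "texture-map pre-rendered images saved in a folder"


def choose_playback_mode(
    supports_video_textures: bool, can_copy_movie_frames: bool
) -> PlaybackMode:
    """Pick the first approach the device supports, in the order described above."""
    if supports_video_textures:
        return PlaybackMode.VIDEO_TEXTURE
    if can_copy_movie_frames:
        return PlaybackMode.FRAME_COPY
    return PlaybackMode.IMAGE_SEQUENCE
```

Only the last mode requires the folder of pre-rendered images; the first two work from an existing movie plus the XYZ data file.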

The movie player on the handheld device would have standard VCR-type controls including play, pause and rewind. In the present embodiment, it would also have a progress bar showing how much of the movie is complete and remaining. The main difference from normal movie player programs is that this program does not play flat movies, but is actually displaying a 3D scene, in which the frames are played in sequence. The movie frames or separate images are texture-mapped onto the depth mesh in the 3D scene. The XYZ coordinates and timing for each vertex in the 3D surface will be provided from the keyframing program in the previous section. In the present embodiment, the wireframe (also called a depth mesh) has a frame rate which is synchronised with the frame rate for the images. If a foreground object in the video moves across the picture, the wireframe model would therefore change at the same time. Although the images are displayed at a certain rate per second, in the present embodiment the wireframe does not need to be defined for every image frame. To reduce processing resources, the wireframe could use its own keyframes for playback, and move in interpolated steps from a defined position in one frame to its next defined position in a later frame.
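
When the wireframe carries its own sparse keyframes during playback, the player has to look up the mesh for an arbitrary playback time. A minimal Python sketch of such a lookup is given below, assuming keyframes are stored as time-stamped vertex lists and that linear interpolation is acceptable; the function name and data layout are assumptions for the example.

```python
from bisect import bisect_right


def wireframe_at(time_s, keyframes):
    """Return interpolated vertices for a playback time in seconds.

    keyframes: list of (time_seconds, [(x, y, z), ...]) with strictly
    increasing times. Times outside the range clamp to the nearest keyframe.
    """
    times = [t for t, _ in keyframes]
    i = bisect_right(times, time_s)  # index of the first keyframe after time_s
    if i == 0:
        return list(keyframes[0][1])
    if i == len(keyframes):
        return list(keyframes[-1][1])
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    f = (time_s - t0) / (t1 - t0)  # fraction of the way from t0 to t1
    return [tuple(a + (b - a) * f for a, b in zip(p, q)) for p, q in zip(v0, v1)]
```

Because the lookup is driven by time rather than by image frame number, the wireframe can be defined far less often than the images are displayed.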

Those skilled in the art will know that texture-mapping is a standard process in which XY coordinates are given defining an image, and these are mapped onto XYZ coordinates on the wireframe. In this way, we drape the original video onto a relief map, not unlike projecting the movie onto a bumpy surface. The elevated parts of the wireframe are foreground areas that were segmented during the keyframing phase.
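
To make the mapping between image XY coordinates and wireframe XYZ coordinates concrete, the following Python sketch builds a regular grid mesh from a small table of depth values. It is a simplified stand-in: the regular grid, the normalized (u, v) texture coordinates and the function name are assumptions, and a real graphics engine would consume these arrays through its own API.

```python
def build_textured_mesh(depth, width, height):
    """Build a relief mesh for an image of size width x height.

    depth: grid (at least 2 x 2) of Z values, one per mesh vertex.
    Returns (vertices, uvs, triangles), where uvs[i] is the image position
    (0..1 in each axis) that is drawn at vertices[i].
    """
    rows, cols = len(depth), len(depth[0])
    vertices, uvs = [], []
    for r in range(rows):
        for c in range(cols):
            u = c / (cols - 1)
            v = r / (rows - 1)
            vertices.append((u * width, v * height, depth[r][c]))  # XYZ on the wireframe
            uvs.append((u, v))                                     # XY in the image
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c
            # Two triangles per grid cell, giving flat surfaces between vertices.
            triangles.append((i, i + 1, i + cols))
            triangles.append((i + 1, i + cols + 1, i + cols))
    return vertices, uvs, triangles
```

Raising the Z value of the centre vertex in a 3 x 3 grid, for example, produces a single bump onto which the middle of each video frame is draped.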

To maintain the time sequencing, in the present embodiment the audio track is given priority during playback. Those skilled in the art will know that this is a standard procedure for realistic playback of digital video. It is not essential, but is better to drop a frame of video if computer resources are limited, than to let the sound skip. The sound track is of a defined length. When it starts, that cues the frame rates for the images and wireframe movement. As the audio is played, there can be "callbacks" indicating what point the sound track is at. Those can be used to synchronize the frame rates of the images or video, and of the wireframe.
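
One way to give the audio priority is to let each audio callback decide which image frame should be on screen, skipping frames when the video has fallen behind. The Python sketch below illustrates that idea; the class, its method names and the way the callback delivers the sound-track position are all hypothetical.

```python
def frame_for_audio_time(audio_seconds, fps, frame_count):
    """Index of the frame that should be displayed at this sound-track position."""
    index = int(audio_seconds * fps)
    return max(0, min(index, frame_count - 1))


class AudioClockedPlayer:
    """Keeps image and wireframe frames in step with the sound track."""

    def __init__(self, fps, frame_count):
        self.fps = fps
        self.frame_count = frame_count
        self.current_frame = -1  # nothing displayed yet
        self.dropped = 0         # frames skipped to keep up with the audio

    def on_audio_callback(self, audio_seconds):
        """Called periodically with the current sound-track position in seconds."""
        target = frame_for_audio_time(audio_seconds, self.fps, self.frame_count)
        if target > self.current_frame + 1:
            # Video fell behind: skip ahead rather than let the sound wait.
            self.dropped += target - self.current_frame - 1
        self.current_frame = target
        return target
```

The returned frame index would select both the image to texture-map and the wireframe position for that moment.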

Although segmented objects are pulled closer to the camera on the Z axis, their boundaries are still connected (embossed) with a rounded edge to the background, so that you cannot see behind them to a blank area where they were cut out of the background.
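
The embossed edge can be thought of as a depth ramp that rises from the background at the object's outline to the object's full depth a short distance inside it. The Python sketch below shows one possible ramp; the smoothstep curve, the `ramp_width` and `roundness` parameters and the function name are assumptions chosen for illustration, not the formula used by the keyframing software.

```python
def embossed_depth(distance_inside, full_depth, ramp_width, roundness=1.0):
    """Z offset for a point `distance_inside` units inside an object's outline.

    ramp_width controls the slope of the embossing (wider = gentler);
    roundness blends between a straight ramp (0.0) and a rounded one (1.0).
    """
    if distance_inside <= 0:
        return 0.0  # on or outside the outline: stays joined to the background
    if distance_inside >= ramp_width:
        return float(full_depth)
    t = distance_inside / ramp_width
    linear = t                    # sharp, straight-sided ramp
    smooth = t * t * (3 - 2 * t)  # smoothstep: rounded at both ends
    return full_depth * (roundness * smooth + (1 - roundness) * linear)
```

Because the depth is zero at the outline itself, the mesh never tears away from the background, so there is no blank area to see behind the object.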

Once the photorealistic 3D scene model exists with the viewpoint in the original real camera perspective, we will then control the camera perspective with tilt sensors or accelerometers, as the mobile device or smart phone is tilted in the hand. Although tilt sensors are used for a variety of applications, the object of the present invention is to enable depth perception in a handheld media player without need for 3D glasses or lenticular optics, simply by moving the viewpoint in the 3D movie as it plays, based on movement of the device. This creates motion parallax and occlusion in the 3D movie, which are depth cues. This could also be described as a "look-around effect", in which you can see around foreground objects in the movie when the device is tilted. The visual experience is like seeing a hologram, but on an ordinary display screen.

As tilt sensors are activated, the viewpoint is moved as if an actual object or diorama was being held in the hand. In the present embodiment, there is a speed sensitivity parameter, to control the rate of movement of the XYZ viewpoint coordinates, based on the rate and amount of tilt of the device.

Although the viewpoint is moved around based on tilt, the view is still centered on the 3D scene model. In the present embodiment, limits can be set and adjusted for the amount the viewpoint can move. Obviously, the handheld media player could be tilted right around backwards until the screen could no longer be seen. That would defeat the purpose. However, it would be possible to have a ratio so that for every degree it is tilted, the viewpoint moves multiple degrees around the digital 3D model, so that you can see farther around foreground objects in the 3D scene without needing to tilt the handheld device so much.

In the present embodiment, there is also an adjustable parameter for whether the viewpoint moves back and forth along the X axis, or whether it rotates around the object. This control could also allow a varying combination of both.
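
The tilt ratio, the adjustable limits and the choice between sliding and rotating can be combined in a single mapping from device tilt to viewpoint position. The Python sketch below is a simplified, one-axis illustration: the parameter names and default values are assumptions, the scene centre is placed at the origin, and the camera is assumed to always look at that centre.

```python
import math


def viewpoint_from_tilt(
    tilt_deg, ratio=3.0, max_angle_deg=30.0, distance=100.0, rotate_weight=1.0
):
    """Return the camera (x, z) position for a left/right tilt of the device.

    ratio: degrees of viewpoint movement per degree of device tilt.
    max_angle_deg: adjustable limit on how far the viewpoint may move.
    rotate_weight: 1.0 orbits around the scene, 0.0 slides along the X axis,
    values in between mix the two.
    """
    angle = max(-max_angle_deg, min(max_angle_deg, tilt_deg * ratio))
    rad = math.radians(angle)
    # Orbit: move on a circle of radius `distance` around the scene centre.
    orbit = (distance * math.sin(rad), distance * math.cos(rad))
    # Slide: move sideways along X only, keeping the original Z distance.
    slide = (distance * math.sin(rad), distance)
    w = rotate_weight
    return (w * orbit[0] + (1 - w) * slide[0], w * orbit[1] + (1 - w) * slide[1])
```

With the defaults shown, a 5 degree tilt of the device moves the viewpoint 15 degrees around the scene, and any tilt beyond 10 degrees is held at the 30 degree limit.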

In an alternative embodiment, the tilt sensors could be used to navigate through the 3D movie scene as it plays, as can be done in computer games. However, most people want a passive viewing experience for watching video such as movies and television. Navigation interrupts their viewing experience, rather than enhancing it. Nevertheless, where 3D navigation or object manipulation is desired with moving video in a 3D scene model, such as for photorealistic video game production, the methods disclosed here would be useful for minimizing production costs for a 3D scene model in which tilt navigation is quite possible.

Another alternative embodiment of the methods outlined here would be to use a different type of sensor such as a digital compass to control the viewpoint and display of the 3D movie while it plays. In that case, as the device is moved left or right, pointing more north or south for example, the view of the movie scene could move accordingly. The experience would be similar to looking through a camera and swiveling it left or right, except that with the methods outlined here, you would be looking around in a moving photorealistic 3D scene that was initially an ordinary movie.

Although the 3D experience is generated by moving the viewpoint with the tilt sensors in the handheld device, this primarily produces a perception of depth behind the screen. In another embodiment, it is also possible to produce the perception of foreground objects popping out in front of the screen, by adding lenticular overlays to combine both approaches.

Rather than using tilt sensors or accelerometers, an alternate embodiment could use a video camera in a smart phone or handheld device to detect movement and tilting of the screen. This would be an alternate way to control the viewpoint in the 3D scene model to produce motion parallax, occlusion and depth perception. However, motion analysis, edge detection and optic flow analysis require real-time processing with the video camera, like our previous Hologram Controller software. While it is possible to use the video camera instead of tilt sensors to control the viewpoint, this would only be an alternate embodiment because of situations where performance could be limited, such as when attempting to infer movement of the handheld device and camera when the camera is pointed at a blank wall.

For a member of the public who wants to view 3D video using these methods, in one embodiment they would download one compiled executable file containing the images, audio, wireframe data and movie player. They could play that, and when finished, could delete it. Another embodiment would be to download the 3D viewer, which could then be saved and re-used any time 3D video data is accessed. In that case, a folder or compiled file is downloaded containing the video (or images and audio) and wireframe data, each time the user wants to watch a movie.

The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

Embodiments of the Invention in Which an Exclusive Privilege or Property is Claimed
1. A method for keyframing ordinary single-camera video, wherein
a) foreground objects in the original video are segmented from the background to create a 3D scene model;
b) the user specifies the number of frames for automatic interpolation between each manual keyframe;
c) boundaries of foreground objects can be manually indicated in keyframes by clicking auto-connecting dots;
d) distances between the camera and background (depths) can be manually defined as a percentage of the distance to the background, or shown as defaults from the previous keyframe.
2. The method of keyframing disclosed in claim 1, wherein motion analysis can be used to track objects during interpolation between keyframes, to automatically predict object locations in subsequent keyframes.
3. The method of motion analysis during keyframe interpolation defined in claim 2, wherein relative motion can also predict the depth value (Z) for foreground objects.
4. The method of keyframing and interpolation defined in claim 1, wherein the amount of movement between the object's XYZ coordinates in the earlier and later keyframes is divided by the number of intervening frames, to match the 3D wireframe model in each frame with the object's movement in each frame of video.
5. The method of segmenting described in claim 1, in which a parameter defines a range for the number of rows of polygons to be generated.
6. The method of defining polygons defined in claim 5, wherein triangles or polygons are drawn on each row to create flat surfaces between each of the vertices.
7. The method of building a wireframe mesh in each keyframe defined in claim 6, wherein video or individual images are texture-mapped onto polygons or triangles in the wireframe structure.
8. The method of texture mapping defined in claim 7, wherein the foreground objects may be set in front of a planar background, within a volume such as a cube or sphere, or on a ground plane.
9. The method of texture mapping defined in claim 7, wherein region-filling for the space in the background where the foreground object was segmented from, is accomplished by embossing foreground objects so that their edges are connected to the background edges they were pulled out from.
10. The method of segmenting foreground objects in claim 1, wherein foreground objects are proportionately reduced in size the nearer they get to the viewpoint, based on projective geometry or a simpler formula in which Smaller Size = Original Size on Background × Distance from Camera / 100.
11. The method of embossing defined in claim 9, in which a variable can be defined to indicate the slope of the embossing, and another variable can be defined to indicate how much of a curve or sharp edge should exist at the edges of the embossing.
12. The method of segmenting foreground objects defined in claim 1, wherein additional objects can be added or removed as they enter or exit the video frame.
13. The method of keyframing described in claim 1, wherein the resulting 3D scene data can be saved and distributed for later use, avoiding any need for fully-automated, real-time video segmentation.
14. The method of saving described in claim 13, wherein the 3D scene can be viewed before saving, moving the viewpoint with the keyboard or mouse to verify the result before saving.
15. The method of saving the 3D scene model defined in claim 13, wherein the XYZ data defining the wireframe and the video (or separate images and audio track) are saved at a given rate per second.
16. The method of exporting 3D data described in claim 13, wherein the visuals are resized for the small screen.
17. The method of segmenting foreground objects defined in claim 1, wherein video is saved for texture-mapping onto the wireframe if the intended handheld device has a graphics engine supporting video texture-mapping, or wherein separate images are saved at a specified frame rate if the graphics engine is only able to texture-map images.
18. The method of segmenting video foreground objects into a running 3D scene model in claim 1, wherein the 3D scene can be played with software using standard video-cassette recorder (VCR) controls: pause, play, rewind, fast-forward, and progress bar.
19. The movie player in claim 18, wherein it does not play flat movies, it plays texture-mapped movies in 3D scenes.
20. A method for segmenting video foreground objects into a running 3D scene model in claim 1, wherein tilt sensors or accelerometers are used to move the viewpoint producing motion parallax, occlusion and look-around effects when viewing 3D video.
21. The method of tilt viewing described in claim 20 wherein the 3D perspective and motion parallax can move in any direction side to side and up and down with no sweet spot with the optimal viewing optics.
22. The method of changing the perspective in claim 21, in which moving the coordinates of the virtual camera viewpoint is an advantage for creating motion parallax, occlusion and look-around effects, as opposed to lenticular optics where the sweet spot is lost when the viewpoint is moved.
23. The method of controlling the viewpoint with tilt in claim 20 in which tilting the handheld device moves the XYZ coordinates of the viewpoint, within adjustable limits.
24. The method of controlling the viewpoint by tilting as defined in claim 20, wherein there is an adjustable ratio of speed of movement between tilt and movement of viewpoint.
25. The movie player described in claim 19, wherein a graphics engine is used to display the XYZ data from pre-rendering, synchronized with the sound track and images at a given frame rate.
26. The method of controlling viewpoint described in claim 20, wherein motion parallax and occlusion are created as the viewpoint moves, allowing you to see around foreground objects, producing depth perception in handheld devices without stereoscopic optical devices.
27. The method for viewing movies in 3D perspective as described in claim 18, wherein the software to play the 3D scene can be distributed online or on disk.
28. Movie data comprised of XYZ coordinates, images and audio as described in claim 17, wherein this data can be distributed online or on disk.
29. The method for distributing 3D movie data described in claim 28, in which the movie information (audio, images, wireframe) could be downloaded for viewing in 3D player software that the user has obtained separately, or in which the movie information and player can be downloaded in a single folder or compiled file.
30. The method for creating motion parallax in claim 20, wherein a passive TV- or movie-viewing experience is produced by moving the viewpoint while keeping the view centered on the video content, rather than navigating within 3D space.
31. The 3D viewing experience described in claim 20, wherein it is possible to look around foreground objects in the running movie by tilting the device, with no physical optics required.
32. The method of playing the movie data that was saved as defined in claim 17, in which (a) the movie is texture-mapped, (b) frames are taken sequentially from the movie by the player if only image-mapping is supported by the graphics engine available, or (c) individual images are saved and played in sequence, if only image-mapping is supported.
33. The method of playing back the 3D content in a movie-player interface as described in claim 18, wherein the timing of the sound track is monitored using periodic callbacks, which are then used to synchronize the video or images and the wireframe.
PCT/CA2009/000847 2008-06-23 2009-06-19 Method for seeing ordinary video in 3d on handheld media players without 3d glasses or lenticular optics WO2009155688A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US4477708P true 2008-06-23 2008-06-23
US61,044,777 2008-06-23

Publications (1)

Publication Number Publication Date
WO2009155688A1 true WO2009155688A1 (en) 2009-12-30

Family

ID=41443934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2009/000847 WO2009155688A1 (en) 2008-06-23 2009-06-19 Method for seeing ordinary video in 3d on handheld media players without 3d glasses or lenticular optics

Country Status (1)

Country Link
WO (1) WO2009155688A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015048529A1 (en) * 2013-09-27 2015-04-02 Amazon Technologies, Inc. Simulating three-dimensional views using planes of content
US9241147B2 (en) 2013-05-01 2016-01-19 Legend3D, Inc. External depth map transformation method for conversion of two-dimensional images to stereoscopic images
US9367203B1 (en) 2013-10-04 2016-06-14 Amazon Technologies, Inc. User interface techniques for simulating three-dimensional depth
US9437038B1 (en) 2013-09-26 2016-09-06 Amazon Technologies, Inc. Simulating three-dimensional views using depth relationships among planes of content
US9497501B2 (en) 2011-12-06 2016-11-15 Microsoft Technology Licensing, Llc Augmented reality virtual monitor
US9530243B1 (en) 2013-09-24 2016-12-27 Amazon Technologies, Inc. Generating virtual shadows for displayable elements
US9591295B2 (en) 2013-09-24 2017-03-07 Amazon Technologies, Inc. Approaches for simulating three-dimensional views
AU2015261677B2 (en) * 2012-10-12 2017-11-02 Ebay Inc. Guided photography and video on a mobile device
US9883090B2 (en) 2012-10-12 2018-01-30 Ebay Inc. Guided photography and video on a mobile device
CN107864372A (en) * 2017-09-22 2018-03-30 捷开通讯(深圳)有限公司 Solid picture-taking method, apparatus and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020048395A1 (en) * 2000-08-09 2002-04-25 Harman Philip Victor Image conversion and encoding techniques
US20050232587A1 (en) * 2004-04-15 2005-10-20 Microsoft Corporation Blended object attribute keyframing model
US20060244757A1 (en) * 2004-07-26 2006-11-02 The Board Of Trustees Of The University Of Illinois Methods and systems for image modification
US20080052242A1 (en) * 2006-08-23 2008-02-28 Gofigure! Llc Systems and methods for exchanging graphics between communication devices

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10497175B2 (en) 2011-12-06 2019-12-03 Microsoft Technology Licensing, Llc Augmented reality virtual monitor
US9497501B2 (en) 2011-12-06 2016-11-15 Microsoft Technology Licensing, Llc Augmented reality virtual monitor
US9883090B2 (en) 2012-10-12 2018-01-30 Ebay Inc. Guided photography and video on a mobile device
AU2015261677B2 (en) * 2012-10-12 2017-11-02 Ebay Inc. Guided photography and video on a mobile device
US10341548B2 (en) 2012-10-12 2019-07-02 Ebay Inc. Guided photography and video on a mobile device
US10750075B2 (en) 2012-10-12 2020-08-18 Ebay Inc. Guided photography and video on a mobile device
US9241147B2 (en) 2013-05-01 2016-01-19 Legend3D, Inc. External depth map transformation method for conversion of two-dimensional images to stereoscopic images
US10049490B2 (en) 2013-09-24 2018-08-14 Amazon Technologies, Inc. Generating virtual shadows for displayable elements
US9530243B1 (en) 2013-09-24 2016-12-27 Amazon Technologies, Inc. Generating virtual shadows for displayable elements
US9591295B2 (en) 2013-09-24 2017-03-07 Amazon Technologies, Inc. Approaches for simulating three-dimensional views
US9437038B1 (en) 2013-09-26 2016-09-06 Amazon Technologies, Inc. Simulating three-dimensional views using depth relationships among planes of content
US9224237B2 (en) 2013-09-27 2015-12-29 Amazon Technologies, Inc. Simulating three-dimensional views using planes of content
WO2015048529A1 (en) * 2013-09-27 2015-04-02 Amazon Technologies, Inc. Simulating three-dimensional views using planes of content
US9367203B1 (en) 2013-10-04 2016-06-14 Amazon Technologies, Inc. User interface techniques for simulating three-dimensional depth
CN107864372A (en) * 2017-09-22 2018-03-30 捷开通讯(深圳)有限公司 Solid picture-taking method, apparatus and terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09768658

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09768658

Country of ref document: EP

Kind code of ref document: A1