GB2586831A - Three-Dimensional Content Generation using real-time video compositing - Google Patents


Info

Publication number
GB2586831A
Authority
GB
United Kingdom
Prior art keywords
component
virtual
camera
sub
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB1912741.4A
Other versions
GB201912741D0 (en)
Inventor
Couche Guillaume
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wolf In Motion Ltd
Original Assignee
Wolf In Motion Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wolf In Motion Ltd filed Critical Wolf In Motion Ltd
Priority to GB1912741.4A (GB2586831A, en)
Publication of GB201912741D0 (en)
Publication of GB2586831A (en)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A system and method for generating and inserting a virtual object into video footage in real time. The system has a data replication component 105 to replicate the attributes of a physical camera 100 on a virtual camera 110, wherein the attributes of the virtual camera are synchronised with those of the physical camera for each frame of the video footage via a frame synchronisation component 160. The system has a user inputs component comprising a playback sub-component 1030 that controls the playback speed and pausing of the video footage; a drawing sub-component 1010 that creates/draws the virtual object on a virtual drawing plane; and a layout sub-component 1020 for positioning and moving the virtual object. A real-time 3D compositing component 190 spatially merges the user generated content and the video footage, allowing the composited footage to be visualised on a physical display. The virtual object is created using ray casting on a drawing plane. Alternatively, two or more video sources may be used, and the user inputs may allow the user to choose the footage and camera 1040.

Description

Three-Dimensional Content Generation Using Real-Time Video Compositing
Field of the Invention
The present invention relates to methods and apparatus for three-dimensional content generation, and in particular for creating and displaying three-dimensional content using a computer.
Definitions, Acronyms and Abbreviations
Throughout this specification, the following definitions are employed:
Physical camera: refers to a movie camera, film camera or cine-camera that takes a rapid sequence of photographs (video footage) on an image sensor or on film.
Physical context: refers to an interior, a park, a section of a city or any place existing in the real world that is chosen by the user as the target location for new developments represented by the three-dimensional (3D) content.
Polygon mesh: refers to a collection of virtual points (vertices), edges and faces (usually triangles or quadrilaterals) defining a virtual three-dimensional (3D) shape.
Three-dimensional (3D) content: refers to any virtual entity, such as a point, a vector, a triangle, a quadrilateral or a group of any of those, that possesses spatial coordinates (position and orientation described on three axes).
Timecode: refers to a sequence of numeric codes generated for each frame captured by a physical camera to give them a chronological order.
Virtual camera: refers to an entity mimicking a physical camera in a virtual three-dimensional (3D) scene by using physically accurate optical projection laws.
Typical attributes include position, rotation and field of view.
Virtual display: refers to a plane that can display two-dimensional (2D) images or video footage in a virtual three-dimensional (3D) scene.
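As an aside to make the virtual camera definition concrete, the following is a minimal sketch of the kind of optical projection law such a camera applies, assuming a pinhole model and an OpenGL-style matrix convention; nothing in this sketch is taken from the patent and all names are illustrative.

    import math

    def perspective_matrix(fov_y_deg, aspect, near, far):
        """Standard pinhole perspective projection (OpenGL-style,
        column vectors) built from a vertical field of view."""
        f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
        return [
            [f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0],
        ]

    # A virtual camera mimicking a physical lens with a 47-degree vertical FOV:
    proj = perspective_matrix(fov_y_deg=47.0, aspect=16 / 9, near=0.1, far=1000.0)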
Background of the Invention
There are various ways to generate and display 3D content using a computer. Traditional systems rely on the use of a personal computer or touch-screen device to display the 3D content as a stream of two-dimensional images on the computer monitor. Using various input devices such as a mouse or a keyboard, the user can change the position and orientation of the virtual camera and hence gain a three-dimensional understanding of the 3D content displayed in two dimensions. Using the same input devices, the user can generate original 3D content or modify existing 3D content. A commercial example of such a system is Cinema4D® commercialised by Maxon®.
Another type of system, which we will refer to as immersive systems, uses technologies commonly referred to as Virtual Reality, Augmented Reality or Mixed Reality. In that case, the system will generally use two virtual cameras acting as a pair of virtual eyes that display the 3D content, in a stereoscopic fashion, in front of the eyes of the user as a replacement of reality or as an overlay on reality. This is achieved through spatial headset tracking used to drive the movement of the virtual cameras, simplifying the user interaction and allowing for an immediate three-dimensional understanding of the 3D content displayed. A commercial example of such a system is Tilt Brush® commercialised by Google®.
When it comes to generating and displaying 3D content in a desired physical context, whether this 3D content is made of simple notes and sketches or of more advanced 3D shapes, both categories of systems rely on complex 3D operations such as 3D scanning and importing, which add a substantial amount of 3D data to process and extra complexity to the creation process.
Alternatively, since they natively overlay 3D content on reality, immersive systems using Augmented Reality or Mixed Reality technologies allow for generating and displaying 3D content on location without 3D scanning. This can be a limitation, as both the person who creates the content and the audience are required to be on location in order to see the 3D content in its desired physical context. Another limitation of this approach is that the 3D content can only be visualised in its desired physical context from a point of view accessible by a human.
It is an object of the present invention to make the process of generating and displaying 3D content in a desired physical context easier, faster and cheaper.
Summary of the Invention
The present invention relates to a method and system for generating 3D content in its desired physical context, wherein the virtual 3D content is composited in real time onto video footage depicting the desired physical context, thereby eliminating the need for complex 3D operations such as 3D scanning, or the need for the user to be physically on location to generate, manipulate or display the 3D content in the desired physical context. This makes the process of both creating and sharing the virtual 3D content in the desired physical context substantially easier, faster and cheaper than with existing methods and systems.
Therefore, according to the present invention there is provided for the first time a system for generating and displaying 3D content on video footage in real time, the system comprising:
(a) a data replication component to drive a virtual camera based on the data from a physical camera;
(b) a playback component to display the video footage captured by the physical camera on a virtual display;
(c) a frame synchronisation component to ensure that, for each frame, the virtual camera behaves as the physical camera did at the time the frame was captured;
(d) a user inputs component to receive user inputs, comprising: (I) a drawing sub-component, (II) a layout sub-component and (III) a playback sub-component;
(e) a real-time 3D compositing component to spatially merge together user generated content and the video footage; and
(f) an output component to allow for visualisation of the composite scene on a physical display.
The playback sub-component allows the user to: play the video footage captured by the physical camera at different speeds, pause it at any frame, or choose a specific frame to display.
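The following is a minimal sketch of the behaviour this paragraph describes, assuming footage indexed by frame number at a fixed frame rate; the class and its methods are illustrative, not taken from the patent.

    class Playback:
        """Maps wall-clock time to a frame index, supporting variable
        speed, pausing and direct frame selection."""

        def __init__(self, frame_count, fps):
            self.frame_count = frame_count
            self.fps = fps
            self.speed = 1.0      # 0.5 = half speed, 2.0 = double speed
            self.paused = False
            self.position = 0.0   # current position in frames (fractional)

        def tick(self, dt):
            """Advance playback by dt seconds of wall-clock time."""
            if not self.paused:
                self.position = (self.position + dt * self.fps * self.speed) % self.frame_count

        def seek(self, frame_index):
            """Choose a specific frame to display."""
            self.position = float(max(0, min(frame_index, self.frame_count - 1)))

        @property
        def frame(self):
            """Index of the frame currently displayed."""
            return int(self.position)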
Preferably, the drawing sub-component turns 2D position inputs into strokes by ray casting on a virtual drawing plane or a virtual ground plane to form a 2D drawing or note.
Optionally, the drawing sub-component turns 2D position inputs into curves by ray casting on a virtual drawing plane or a virtual ground plane to form a 3D drawing.
Optionally, the drawing sub-component turns 2D position inputs into polygon meshes by ray casting on a virtual drawing plane or a virtual ground plane to form a 3D drawing.
Preferably, the layout sub-component turns 2D position inputs into selection and movement instructions by ray casting on a virtual ground plane to move existing user generated content.
Optionally, two or more video sources are used and positioned in space in reference to each other through a cameras synchronisation component, allowing the user to select at any time which camera to use through a camera selection sub-component.
Preferably, the 2D position inputs are read from a touch screen on which the output component outputs the composite image in a 2D form.
Optionally, the 2D position inputs are read from a touch input or a mouse.
Optionally, the output component outputs the composite image in a 2D form on a traditional display.
Optionally, the output component outputs the composite image in a 3D form on a holographic display.
Detailed Description of the Embodiments
Figure 1 is a schematic diagram of the system.
In the data replication component 105, the changes of position, rotation and field of view over time of the physical camera 100 are used to drive the corresponding changes of position, rotation and field of view over time of the virtual camera 110. In the playback component 145, the video footage captured by the physical camera 140 is used for video playback on the virtual display 150. Using frame synchronisation 160, the system ensures that, for any given frame of the video footage captured by the physical camera 140, the changes of position, rotation and field of view of the virtual camera relative to the previous frame match the changes of position, rotation and field of view that the physical camera underwent between the moment the previous frame was captured and the moment the given frame was captured.
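Purely as an illustration of this frame synchronisation requirement (the sketch is not taken from the patent), the behaviour can be read as a per-frame transfer of pose and field-of-view deltas from a recorded physical camera track to the virtual camera; the track format and camera attributes assumed here are hypothetical.

    import numpy as np

    def sync_virtual_camera(track, frame, virtual_cam):
        """Apply to the virtual camera the change the physical camera
        underwent between frame - 1 and frame.

        track: per-frame records with 'position' (3-vector), 'rotation'
        (3x3 world-from-camera matrix) and 'fov' (degrees).
        """
        if frame == 0:
            return
        prev, curr = track[frame - 1], track[frame]
        # Deltas of the physical camera between the two captures,
        # expressed in world space.
        d_pos = np.asarray(curr["position"]) - np.asarray(prev["position"])
        d_rot = np.asarray(curr["rotation"]) @ np.asarray(prev["rotation"]).T
        d_fov = curr["fov"] - prev["fov"]
        # The virtual camera reproduces the same changes.
        virtual_cam.position = virtual_cam.position + d_pos
        virtual_cam.rotation = d_rot @ virtual_cam.rotation
        virtual_cam.fov = virtual_cam.fov + d_fov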
If more than one physical camera has been used, a camera synchronisation component 165 positions the corresponding virtual cameras in space in reference to each other. A user inputs component 1000 gathers instructions for the following sub-components: drawing 1010, layout 1020, playback 1030 and camera selection 1040.
The changes of position, rotation and field of view over time of the virtual camera 110, the video playback on the virtual display 150 and user inputs 1000 are combined through real-time 3D compositing 190 and shown to the user through an output 195.
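A sketch of how these components could interact once per frame follows, continuing the hypothetical names used in the sketches above; the renderer, virtual display and scene objects are assumed stand-ins, not APIs from the patent.

    def composite_frame(frame_index, track, footage, virtual_cam,
                        virtual_display, scene, renderer):
        """One iteration of real-time 3D compositing (component 190)."""
        # Frame synchronisation: the virtual camera mirrors the motion
        # of the physical camera at the moment this frame was captured.
        sync_virtual_camera(track, frame_index, virtual_cam)
        # Playback: the captured frame is shown on the virtual display,
        # which keeps a constant pose relative to the virtual camera.
        virtual_display.set_image(footage[frame_index])
        virtual_display.follow(virtual_cam)
        # User generated 3D content is rendered by the virtual camera
        # over the video frame, producing the composite output.
        return renderer.render(scene, camera=virtual_cam,
                               background=virtual_display)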
Figures 2A, 3A, 4A, 5A, 6A and 7A show details of the real-time 3D compositing 190 component in specific configurations. A virtual camera 200, a virtual display 300 and a virtual drawing plane 400 are shown in Figures 2A and 3A. Both the virtual display 300 movements and the virtual drawing plane 400 movements are synchronised with those of the virtual camera 200 in such a way that their position and rotation in reference to the virtual camera 200 in the Cartesian system 210 stay constant over time.
Figure 2A shows details of the real-time 3D compositing 190 component where the user has drawn a stick character using ray casting on the virtual drawing plane 400 through the virtual camera 200. This outline is shown as reference 500. The timecode 600 is 00:00. The virtual display 300 is displaying the frame corresponding to this timecode. This frame is shown as reference 700.
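The drawing technique in Figure 2A, casting a ray through the virtual camera onto the drawing plane, amounts to a standard ray-plane intersection. The following is a minimal sketch, assuming a pinhole camera with a vertical field of view and normalised screen coordinates; the function and attribute names (cam.fov, cam.aspect, cam.rotation, cam.position) are illustrative, not taken from the patent.

    import numpy as np

    def ray_cast_to_plane(screen_xy, cam, plane_point, plane_normal):
        """Cast a ray from the virtual camera through a 2D screen position
        and intersect it with a plane (drawing plane or ground plane)."""
        x, y = screen_xy  # normalised device coordinates in [-1, 1]
        # Ray direction in camera space, derived from the field of view.
        tan_half = np.tan(np.radians(cam.fov) / 2.0)
        dir_cam = np.array([x * tan_half * cam.aspect, y * tan_half, -1.0])
        # Rotate into world space (cam.rotation is world-from-camera).
        direction = cam.rotation @ dir_cam
        direction = direction / np.linalg.norm(direction)
        denom = direction @ plane_normal
        if abs(denom) < 1e-8:
            return None  # ray parallel to the plane, no hit
        t = ((plane_point - cam.position) @ plane_normal) / denom
        return cam.position + t * direction if t > 0 else None

Successive intersection points gathered while the user draws can then be joined into the strokes, curves or polygon meshes produced by the drawing sub-component.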
Figure 2B shows the output 195 of the real-time 3D compositing 190 in Figure 2A captured by the virtual camera 200 in Figure 2A, displayed on a physical display device 800. The stick character 500 is composited on the frame 700.
Figure 3A shows details of the real-time 3D compositing 190 component with the same elements as in Figure 2A. The timecode 610 is 05:00. The virtual display 300 is displaying the frame corresponding to this timecode. This frame is shown as reference 710. Due to frame synchronisation, the virtual camera 200 has moved on the X axis of the Cartesian system 210. The stick character 500 is static in the Cartesian system 210.
Figure 3B shows the output 195 of the real-time 3D compositing 190 in Figure 3A captured by the virtual camera 200 in Figure 3A, displayed on a physical display device 800. The stick character 500 is composited on the frame 710. The projected shape of the character 500 in Figure 3B, compared with the projected shape of the character 500 in Figure 2B, together with the change of frames between Figure 2B and Figure 3B, gives the visual impression that the character 500 is part of the physical context described by the change of frames.
A virtual camera 200, a virtual display 300 and a virtual ground plane 420 are shown in Figures 4A, 5A, 6A and 7A. The virtual display 300 movements are synchronised with those of the virtual camera 200 in such a way that its position and rotation in reference to the virtual camera 200 in the Cartesian system 210 stay constant over time. The virtual ground plane 420 remains static in the Cartesian system 210.
Figure 4A shows details of the real-time 3D compositing 190 component where the user has drawn a road using ray casting on the virtual ground plane 420 through the virtual camera 200. This outline is shown as reference 510. The timecode 600 is 00:00.
The virtual display 300 is displaying the frame corresponding to this timecode. This frame is shown as reference 700.
Figure 4B shows the output 195 of the real-time 3D compositing 190 in Figure 4A captured by the virtual camera 200 in Figure 4A, displayed on a physical display device 800. The road outline 510 is composited on the frame 700.
Figure 5A shows details of the real-time 3D compositing 190 component with the same elements as in Figure 4A. The timecode 620 is 20:00. The virtual display 300 is displaying the frame corresponding to this timecode. This frame is shown as reference 720. Due to frame synchronisation, the virtual camera 200 has moved on the X and Z axes and rotated around the Y axis of the Cartesian system 210.
Figure 5B shows the output 195 of the real-time 3D compositing 190 in Figure 5A captured by the virtual camera 200 in Figure 5A, displayed on a physical display device 800. The road outline 510 is composited on the frame 720.
The projected shape of the road outline 510 in Figure 5B, compared with the projected shape of the road outline 510 in Figure 4B, together with the change of frames between Figure 4B and Figure 5B, gives the visual impression that the road outline 510 is part of the physical context described by the change of frames.
Figure 6A shows details of the real-time 3D compositing 190 component where the user has drawn a first tree outline 520 and a second tree outline 530 using drawing techniques detailed in Figure 2A and a road outline 510 using drawing techniques detailed in Figure 4A. The timecode 600 is 00:00. The virtual display 300 is displaying the frame corresponding to this timecode. This frame is shown as reference 700.
Figure 6B shows the output 195 of the real-time 3D compositing 190 in Figure 6A captured by the virtual camera 200 in Figure 6A, displayed on a physical display device 800. The road outline 510, the first tree outline 520 and the second tree outline 530 are composited on the frame 700.
Figure 7A shows details of the real-time 3D compositing 190 component where the user has moved the second tree outline 530 on the X axis of the Cartesian system 210 using ray casting on the virtual ground plane 420 through the virtual camera 200. The timecode 600 is 00:00. The virtual display 300 is displaying the frame corresponding to this timecode. This frame is shown as reference 700.
Figure 7B shows the output 195 of the real-time 3D compositing 190 in Figure 7A captured by the virtual camera 200 in Figure 7A, displayed on a physical display device 800. The road outline 510, the first tree outline 520 and the second tree outline 530 are composited on the frame 700.
The projected shape of the second tree outline 530 in Figure 7B, compared with the projected shape of the second tree outline 530 in Figure 6B, gives the visual impression that the second tree outline 530 is part of the physical context as it is being moved by the user.
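The move operation shown in Figure 7A can be read as two such ray casts, one at the start and one at the end of the drag, with the content translated by their difference on the ground plane. The following is a minimal sketch under the same assumptions, reusing the hypothetical ray_cast_to_plane helper from the earlier sketch; all names remain illustrative.

    import numpy as np

    GROUND_POINT = np.zeros(3)                 # ground plane through the origin
    GROUND_NORMAL = np.array([0.0, 1.0, 0.0])  # Y-up, static in system 210

    def drag_on_ground(obj, cam, start_xy, end_xy):
        """Translate existing user generated content along the ground
        plane by the displacement between two ray casts."""
        # ray_cast_to_plane is the hypothetical helper sketched earlier.
        start = ray_cast_to_plane(start_xy, cam, GROUND_POINT, GROUND_NORMAL)
        end = ray_cast_to_plane(end_xy, cam, GROUND_POINT, GROUND_NORMAL)
        if start is not None and end is not None:
            obj.position = obj.position + (end - start)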

Claims (13)

What is claimed is:

1. A system for generating and displaying 3D content on video footage in real time, the system comprising:
   a. a data replication component to drive the changes of position, rotation and field of view over time of a virtual camera based on the changes of position, rotation and field of view over time of a physical camera;
   b. a playback component to display the video footage captured by the physical camera on a virtual display;
   c. a frame synchronisation component to ensure that the changes of position, rotation and field of view of the virtual camera relative to its position, rotation and field of view at the previous frame match the changes of position, rotation and field of view of the physical camera between the moment the previous frame was captured and the moment the frame was captured;
   d. a user inputs component to receive user inputs, comprising: (I) a drawing sub-component, (II) a layout sub-component and (III) a playback sub-component;
   e. a real-time 3D compositing component to spatially merge together user generated content and the video footage; and
   f. an output component to allow for visualisation of the composite scene on a physical display.
2. The system of claim 1, wherein said drawing sub-component turns 2D position inputs into strokes by ray casting on a virtual drawing plane to form a 2D drawing or a note.
3. The system of claim 1, wherein said drawing sub-component turns 2D position inputs into curves by ray casting on a virtual drawing plane to form a 3D drawing.
4. The system of claim 1, wherein said drawing sub-component turns 2D position inputs into polygon meshes by ray casting on a virtual drawing plane to form a 3D drawing.
5. The drawing sub-component of claims 2, 3 and 4, wherein the ray casting is performed on a virtual ground plane instead.
6. The system of claim 1, wherein said layout sub-component turns 2D position inputs into selection and movement instructions by ray casting on a virtual ground plane to move existing user generated content.
7. The user inputs component of claims 2, 3, 4, 5 and 6, wherein the 2D position inputs are read from a touch surface or touch screen.
8. The user inputs component of claims 2, 3, 4, 5 and 6, wherein the 2D position inputs are read from a mouse.
9. The system of claim 1, wherein said playback sub-component allows for:
   • playing the video footage captured by the physical camera at different speeds;
   • pausing the video footage captured by the physical camera at any frame;
   • choosing a specific frame of the video footage captured by the physical camera to display.
10. The system of claim 1, wherein:
   • two or more video sources are used and positioned in space in reference to each other through a cameras synchronisation component; and
   • said user inputs component comprises an additional sub-component, a camera selection sub-component, allowing for selecting which virtual camera and footage to use.
11. The system of claim 1, wherein said output component outputs the composite image onto the same touch screen used by said user inputs component.
12. The system of claim 1, wherein said output component outputs the composite image in a 2D form onto a traditional display.
13. The system of claim 1, wherein said output component outputs the composite image in a 3D form onto a holographic display.
GB1912741.4A 2019-09-05 2019-09-05 Three-Dimensional Content Generation using real-time video compositing Pending GB2586831A (en)

Priority Applications (1)

Application Number: GB1912741.4A (GB2586831A, en); Priority Date: 2019-09-05; Filing Date: 2019-09-05; Title: Three-Dimensional Content Generation using real-time video compositing


Publications (2)

Publication Number Publication Date
GB201912741D0 GB201912741D0 (en) 2019-10-23
GB2586831A 2021-03-10

Family

ID=68241200

Family Applications (1)

Application Number: GB1912741.4A (GB2586831A, en, pending); Priority Date: 2019-09-05; Filing Date: 2019-09-05; Title: Three-Dimensional Content Generation using real-time video compositing

Country Status (1)

Country Link
GB (1) GB2586831A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190172220A1 (en) * 2009-10-19 2019-06-06 Apple Inc. Method for Determining the Pose of a Camera and for Recognizing an Object of a Real Environment
WO2013167901A1 (en) * 2012-05-09 2013-11-14 Ncam Technologies Limited A system for mixing or compositing in real-time, computer generated 3d objects and a video feed from a film camera

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Composite a 3D object into video footage with Boujou, Maya and After Effects" available on or before 08.06.2010. Available from: https://cgi.tutsplus.com/tutorials/composite-a-3d-object-into-video-footage-with-boujou-maya-and-after-effects--cg-3157 [Accessed 04.03.20] *
"Halide FX Introducing Halide FX by Lightcraft" available on or before 24.11.2018. Available from: https://halidefx.com/ [Accessed 04.03.20] *
"Tutorial: Adding 3D Elements to Your Videos in After Effects CS6" available on or before 20.02.2013. Available from: https://www.shutterstock.com/blog/tutorial-adding-3d-elements-to-your-videos-in-after-effects-cs6 [Accessed 04.03.20] *
J A Okun et al, "Visual Effects Society Handbook of Visual Effects", published 2013, Focal Press *

