GB2549723A - A system and method for video editing in a virtual reality environment - Google Patents

Info

Publication number
GB2549723A
Authority
GB
United Kingdom
Prior art keywords: dimensional; video content; sector; editing; sectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1607189.6A
Inventor
Juhani Oikkonen Markku
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1607189.6A priority Critical patent/GB2549723A/en
Publication of GB2549723A publication Critical patent/GB2549723A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements

Abstract

An apparatus and method for generating a three dimensional virtual environment for editing video content; populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed (e.g. brightness, contrast, balance, colour hue, position and/or zoom).

Description

A System and Method for Video Editing in a Virtual Reality Environment
Field
The present invention relates to video editing, and in particular to a system and method for facilitating editing of video content in a three dimensional virtual reality environment.
Background
In the field of video editing, there are challenges relating to the handling of multidirectional image content. Although 360 degree virtual reality is becoming a part of consumer audiovisual media, the tools and methods for creating and editing this content are not well developed. Conventional editing systems reduce the three dimensional content into a series of two dimensional images. Generally an editor uses a flat panel screen on their desk to view and edit the images. Thus a key component of the content is not clearly communicated to the editor. The editing process is also time consuming and the editor must attempt to visualize what the full three dimensional environment would look like. A more intuitive and efficient way of editing multidirectional image content is therefore required. There are additional challenges relating to multidirectional video content, as 360 degree cameras will inevitably record unwanted elements, such as the camera crew and lights.
Summary
A first aspect of the invention provides an apparatus configured to: generate a three dimensional virtual environment for editing video content; populate the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receive a selection input from a user interface, and in response to the selection input, cause a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receive an editing input from a user interface, and in response to the editing input, cause one or more visual properties of the first three dimensional sector to be changed. A base of each three dimensional sector may be rectangular. Each pyramid shaped three dimensional sector may be defined by the vertex located at the virtual position of the imaging device, a direction vector and first and second orthogonal angles.
If the video content defined by one of the plurality of three dimensional sectors is two dimensional, or a two dimensional projection of three dimensional content, then the video content may occupy a base of that three dimensional sector. If the video content defined by one of the plurality of three dimensional sectors is three dimensional, then the video content may occupy the three dimensional space defined by that three dimensional sector.
The video content defined by at least one of the plurality of three dimensional sectors may be three dimensional video content. The visual property of the selected first three dimensional sector may be the length of the direction vector. The visual property of the selected first three dimensional sector may be the direction of the direction vector. The visual property of the selected first three dimensional sector may be the position of the vertex. The visual property of the selected first three dimensional sector may be the size of the first angle and/or the size of the second orthogonal angle.
The selection input may comprise a gestural input. The editing input may comprise a gestural input. The selection input may comprise a user interface input from one or more handheld controllers. The editing input may comprise a user interface input from one or more handheld controllers.
The apparatus may be configured to receive a second editing input, and in response to the second editing input, may cause a computer generated three dimensional sector containing video content to be added to the three dimensional virtual environment.
The video content may be created as 360 degree content and subsequently divided into the plurality of three dimensional sectors. Alternatively, the video content may be created as the plurality of three dimensional sectors.
The apparatus may be further configured to cause a first timeline of the video content corresponding to the first three dimensional sector to be displayed such that successive frames of the video content are arranged along the direction vector of the first three dimensional sector and such that the frames of the video content are perpendicular to the direction vector. The apparatus may be further configured to receive a third editing input, and in response to the third editing input, may cause a different one of the successive frames of the video content to be displayed. The apparatus may be further configured to (a) receive a second selection input from a user interface, and in response to the second selection input, cause a second one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; (b) cause a second timeline of video content corresponding to the second three dimensional sector to be displayed such that successive frames of the video content are arranged along a direction vector of the second three dimensional sector, and such that the frames of the video content are perpendicular to the direction vector; and (c) receive a fourth editing input, and in response to the fourth editing input, cause a time offset between the first and second timelines to be changed.
A second aspect of the invention provides a method comprising: generating a three dimensional virtual environment for editing video content; populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed.
The method may further comprise receiving a second editing input, and in response to the second editing input, causing a computer generated three dimensional sector containing video content to be added to the three dimensional virtual environment.
The method may further comprise causing a first timeline of the video content corresponding to the first three dimensional sector to be displayed such that successive frames of the video content are arranged along the direction vector of the first three dimensional sector and such that the frames of the video content are perpendicular to the direction vector. The method may further comprise receiving a third editing input, and in response to the third editing input, causing a different one of the successive frames of the video content to be displayed. The method may further comprise (a) receiving a second selection input from a user interface, and in response to the second selection input, causing a second one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; (b) causing a second timeline of video content corresponding to the second three dimensional sector to be displayed such that successive frames of the video content are arranged along a direction vector of the second three dimensional sector, and such that the frames of the video content are perpendicular to the direction vector; and (c) receiving a fourth editing input, and in response to the fourth editing input, causing a time offset between the first and second timelines to be changed. A third aspect of the invention provides computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform the method of the second aspect of the invention. 
A fourth aspect of the invention provides a computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causes performance of at least: generating a three dimensional virtual environment for editing video content; populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed.
A fifth aspect of the invention provides an apparatus comprising: means for generating a three dimensional virtual environment for editing video content; means for populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; means for receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and means for receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed.
Brief Description of the Figures
For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following description taken in connection with the accompanying figures in which:
Figure 1 is a simplified schematic of a first example of a multidirectional video editing system;
Figure 2 is a schematic illustration of components of a three dimensional content rendering device;
Figure 3 is another schematic illustration of the three dimensional content rendering device of Figure 2 incorporated into a system to allow editorial input;
Figure 4a shows a pyramid shaped 3D sector representing an exemplary building block of the editing system described herein;
Figure 4b shows several 3D sectors arranged with a common vertex;
Figure 4c shows an example of video content in the virtual reality environment, where the virtual reality environment is divided into six sectors;
Figure 4d shows the 3D sectors of Figure 4b rearranged so as to have different orientations and an additional 3D sector containing computer generated content;
Figure 5a shows an embodiment in which the video content is 3D content which has been filmed with a stereoscopic camera and processed to create a 3D model;
Figure 5b shows an arrangement of four 3D pyramid sectors each containing three dimensional video content;
Figures 6a-d show two timelines each comprising a stack of video frames and a process of imposing a time offset and viewing the result;
Figures 6e-f show a generalised case where a number of 3D timeline stacks having different frame sizes and varying clip lengths are combined;
Figures 7a, 7b, 7c and 7d show another embodiment of how timelines are presented and manipulated inside the virtual reality environment;
Figure 8 is a flowchart describing exemplary operation of the video editing system according to embodiments of the invention.
Detailed Description
In the description and drawings, like reference numerals may refer to like elements throughout.
Figure 1 is a simplified schematic of a first example of a virtual reality system 100 (also referred to herein as a virtual environment system 100). The system 100 comprises a virtual reality content rendering device 102. The virtual reality content rendering device 102 may be a head mounted display or pair of glasses. An example of such a device currently available is the Oculus Rift headset, developed by Oculus VR, LLC.
The virtual reality content rendering device 102 may furthermore comprise headphones which may be integral with or separate from the head mounted display. The headphones may be capable of providing spatialized audio. The content (video and audio) presented by the virtual reality content rendering device 102 may be referred to herein as a virtual reality environment. The content may be 360 degree video content captured, for example, by the OZO virtual reality camera, manufactured by Nokia Technologies. Alternatively the content may be a 360 degree virtual reality movie or game. Alternatively, the content may be a mixture of recorded video and computer generated content. A user experiencing the virtual reality environment provided by the virtual reality content rendering device 102 may be referred to herein as an immersed user 108.
The system 100 also comprises a virtual reality content providing device 104. The virtual reality content providing device 104 may be a computer such as a desktop or laptop PC or a tablet computer. The virtual reality content providing device 104 may alternatively be a video player such as a DVD or Blu-ray player. The virtual reality content providing device 104 could also be a console computer or other computing device specifically designed for use with the virtual reality content rendering device 102. The virtual reality content providing device 104 may have a wired or wireless link to the virtual reality content rendering device 102 for exchanging information between these components.
The system 100 may comprise further user interface devices 110 with which the immersed user 108 may interact. These may include one or more hand held controllers, a keyboard, mouse, trackball or microphone. These peripheral devices may be controlled by the immersed user to interact with the virtual reality content and to perform editing operations on the virtual reality content. The user interface devices 110 may communicate directly with the virtual reality content rendering device 102 or directly with the virtual reality content providing device 104, or both.
The system 100 further comprises a motion sensor device 106. The motion sensor device 106 may be a depth sensor or stereo camera for example. In some examples, the motion sensor device 106 is a depth sensor using infrared projection and an infrared camera to sense the motion of nearby objects in three dimensions. In some other examples, the motion sensor device 106 may emit infrared light in a predetermined pattern and a peripheral controller may detect this light and determine its position in three dimensions. The peripheral controller may feed its position back to the motion sensor device 106 or directly to the virtual reality content providing device 104 via a wireless link. In some other examples, the motion sensor device 106 comprises a stereo camera comprising two or more optical axes for capturing two or more images from different positions. Software running on the motion sensor device 106 or on the virtual reality content providing device 104 may compare the multiple captured images and calculate the depth of the different parts of the images.
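The stereo depth calculation mentioned above follows the standard pinhole stereo relation. A minimal sketch (the function name and parameters are illustrative, not from the patent):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Classic pinhole stereo relation: Z = f * B / d.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two optical axes, in metres
    disparity_px: horizontal shift of the same feature between the
                  two images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float('inf')
    return focal_px * baseline_m / disparity_px
```

For example, with a 700 px focal length, a 10 cm baseline and a 35 px disparity the feature lies 2 m from the camera.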
The motion sensor device 106 is configured to communicate with the virtual reality content providing device 104 over a wired or wireless data connection. Software for interpreting the signals produced by the motion sensor device 106 may be stored on and run on the virtual reality content providing device 104. The motion sensor device 106 and the virtual reality content providing device 104 may be co-located, i.e. occupy the same general physical space or room. Therefore, the sensor device 106 is co-located with the virtual reality content rendering device 102 and immersed user 108. The motion sensor device 106 is configured to detect gestures made by the user 108. The motion sensor device 106 sends signals associated with the gestures to the virtual reality content providing device 104 which runs software for interpreting these gestures. These gestures may be a form of input for interacting with the virtual reality content.
Figure 2 is a schematic illustration of components of the virtual reality content providing device 104. The device 104 comprises a processor 202 for executing software and controlling various operations of the device 104. The device 104 comprises at least one memory 204. The memory 204 may be a writable memory such as a magnetic hard drive or flash memory. The memory 204 may store an operating system (not shown) for controlling general operation of the device 104 in conjunction with the processor 202. The memory 204 stores a rendering software module 205 which contains program code for rendering the virtual reality environment. The memory 204 also stores an editing software module 206 which contains program code for managing and executing editing operations on the rendered content in the virtual reality environment.
The virtual reality content providing device 104 has a first communication port 208 and a second communication port 210. The first communication port 208 is used to exchange data with the virtual reality content rendering device 102. This includes sending video and audio data to the virtual reality content rendering device 102 and receiving movement and positioning data back from the device 102. The second communication port 210 is used to exchange data with the motion sensor device 106 and/or the other user interface devices no.
The editing software module 206 may comprise instructions for interpreting the signals received from the motion sensor device 106. For example, the editing software module 206 may be able to determine a number of different types of gestures based on the information received and to treat the different types of gestures as different user inputs respectively. The editing software module 206 may comprise instructions for interpreting the signals received from the other user interface devices 110. Although the rendering software module 205 and the editing software module 206 are depicted as separate software modules, there may instead be a single software module which handles both the rendering and the editing. For example, the software running on the virtual reality content providing device 104 may need to register the direction in which the immersed user is looking at the same time as an editing command is received in order to accurately allow for editing of the displayed content.
Figure 3 is another schematic illustration of the virtual reality content providing device 104 of Figure 2 incorporated into a larger system. The system further comprises the virtual reality content rendering device 102, motion sensor 106 and optionally the other user interface devices 110 shown in Figure 1.
The features of the virtual reality content providing device 104 are the same as those described with reference to Figure 2 and are not described in detail again here. The virtual reality content providing device 104 communicates with the motion sensor device 106 and the other user interface devices 110 using the second communication port 210.
The virtual reality content rendering device 102 comprises its own processor 302 and memory 304 storing software 306. The virtual reality content rendering device 102 has a communication port 308 for exchanging data with the virtual reality content providing device 104. The virtual reality content rendering device 102 has one or more display devices 310 for displaying the virtual reality environment to the immersed user 108 and optionally a power input port 312. The software 306 may for example comprise display drivers for controlling the display device 310. The virtual reality content providing device 104 may comprise a corresponding power output port 212 for supplying power to the virtual reality content rendering device 102. Alternatively, the virtual reality content rendering device 102 may have an internal power source.
The virtual reality content rendering device 102 also optionally comprises one or more gyroscopes 314, one or more accelerometers 316. The gyroscopes 314 and accelerometers 316 allow the virtual reality content rendering device 102 to report its position and aspect to the software 306. The system of Figure 3 comprises headphones 320 for rendering virtual reality audio to the immersed user. The headphones 320 may be integral with the virtual reality content rendering device 102 or a separate device.
The headphones 320 may be capable of producing spatialized audio output.
The present application describes a system and method for editing video content in a three dimensional virtual reality environment. In other words, the user (editor) uses the virtual reality content rendering device 102 to immerse themselves in a virtual reality environment. The virtual reality environment is populated with the video content to be edited and the user is able to view the video content in situ inside the virtual reality environment. The editor is also able to use a peripheral user interface device to issue editorial commands to the system. The editor can then view the effects of these editorial commands on the displayed content in the virtual reality environment.
Figure 4a shows a 3D sector 400. The 3D sector 400 may also be referred to herein as an “angle of view” or “3D angle of view”, or as a 3D pyramid. The 3D sector is defined by a vertex (x, y, z), a direction vector (F) which originates at the vertex and two plane angles (α and β) which subtend the vertex, and in the example of Figure 4a are orthogonal. Thus the 3D sector 400 forms a 3D pyramid sector, sometimes referred to as a pyramid of vision, with the image capture position being the vertex of the pyramid and the view area being the base of the pyramid. Although the 3D pyramid sector 400 shown in Figure 4a has a rectangular base perpendicular to the direction vector, the term “pyramid” is intended to encompass bases formed by any closed curves or any regular or irregular base shape, provided that all the ‘sides’ of the pyramid meet at a common vertex. The 3D sector 400 shown in Figure 4a is a specific example of a building block of the editing system described herein.
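The sector definition above maps naturally onto a small data structure. The following sketch (all names are illustrative assumptions, not taken from the patent) represents a pyramid sector by its vertex, direction vector, viewing distance and the two plane angles, and derives the size of the rectangular base perpendicular to the direction vector:

```python
import math
from dataclasses import dataclass

@dataclass
class Sector3D:
    """A pyramid-shaped 3D sector ('angle of view') - illustrative sketch.

    vertex:    virtual position (x, y, z) of the imaging device
    direction: unit vector F from the vertex toward the base centre
    length:    distance along F to the base (the viewing distance)
    alpha, beta: the two orthogonal plane angles (radians) subtending
                 the vertex
    """
    vertex: tuple
    direction: tuple
    length: float
    alpha: float
    beta: float

    def base_size(self):
        """Width and height of the rectangular base at the far end."""
        w = 2.0 * self.length * math.tan(self.alpha / 2.0)
        h = 2.0 * self.length * math.tan(self.beta / 2.0)
        return w, h
```

With a viewing distance of 1 and a 90-degree horizontal angle, the base is 2 units wide, which matches the usual frustum geometry.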
Figure 4b shows first, second and third exemplary 3D pyramid sectors 402, 404, 406 which are arranged so that they have a common vertex position. The 3D pyramids 402, 404, 406 may be arranged in the same orientation as they were when originally captured, or their orientation may be changed by the user editing the content but keeping the vertices at the same location. In some embodiments, when the virtual environment is first populated with the video content, the 3D pyramid sectors are displayed in their original orientations. Each of the 3D pyramid sectors 402, 404, 406 may have different values for the angles α and β and a different length of the direction vector F. Therefore, the size, shape and view distance of each pyramid base may be different for each sector.
When the edited video content is eventually viewed by a user who is consuming the multimedia, that user observes the rendered video content from the position of the vertex of the pyramids, i.e. the position from which each 3D pyramid sector 400 was captured. However, the editor of the content may also choose to view the sectors from different angles. For example, as shown in Figures 4b, 4d, 5a and 5b, the editing user may view the content from “outside” of the 3D sectors. The editing user may select, modify and change the orientation of the sectors from this viewpoint. The editing user may also cause their viewpoint to be rotated or zoomed in or out.
The three sectors shown in Figure 4b are narrow and fill only a small part of the 3D space around the vertex (x, y, z). The sectors 402, 404, 406 may represent a live image video capture intended to be incorporated into a computer generated 3D environment. The example of Figure 4b in which a few narrow sectors are shown is to illustrate the principle of the editing system. In a more realistic implementation, the sectors would be larger, adjacent to each other and cover wide parts of the space around the vertex. Computer generated sectors may also be created and added to the virtual reality environment to be displayed with the video capture sectors. For example, Figure 4d shows an example in which the angle of view of the second and third sectors 404, 406 has been changed, while the first sector 402 remains in the same orientation. A fourth sector 408, which contains computer generated content, is added and is also orientated so that the vertex of the pyramid 408 is co-located with the vertices of the other pyramids.
Figure 4c shows another example of video content in the virtual reality environment. For clarity only two dimensions of the pyramids are shown. The example consists of six pyramids which occupy the whole of the area around the vertex (x, y, z). Representations of the video content in each sector are shown. For example, the video content may have been captured in all six sectors simultaneously with a 360 degree camera and then divided into sectors for editing. Alternatively, a camera with a 60 degree viewing angle may have been used to film the content in each sector individually, and the sectors subsequently combined in the virtual reality environment. The sectors together may cover the whole 360 degree environment.
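The division of the full 360 degrees around the vertex into equal angular sectors, as in the six-sector example of Figure 4c, can be sketched as follows (illustrative function names; the patent does not prescribe an implementation):

```python
def sector_boundaries(n_sectors=6, offset_deg=0.0):
    """Split the 360 degrees around the vertex into n equal angular
    sectors. Returns (start, end) angles in degrees for each sector."""
    width = 360.0 / n_sectors
    return [((offset_deg + i * width) % 360.0,
             (offset_deg + (i + 1) * width) % 360.0)
            for i in range(n_sectors)]

def sector_for_angle(angle_deg, n_sectors=6, offset_deg=0.0):
    """Index of the sector containing a given viewing direction."""
    width = 360.0 / n_sectors
    return int(((angle_deg - offset_deg) % 360.0) // width)
```

With six sectors each spans 60 degrees, so a viewing direction of 90 degrees falls in the second sector (index 1).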
Figure 5a illustrates embodiments in which the video content is 3D content which has been filmed with a stereoscopic camera. The content in each sector has been processed to create a 3D model with objects at varying distances from the vertex (x, y, z). This may be achieved using Lidar or some other suitable technology utilizing the depth information about the objects. Figure 5a shows a sector 502 and a sector 504, both containing 3D models of their respective captured scenes. As before, when editing the content, the editing user may view the scene from “outside” of the sectors 502, 504.
This allows the three dimensional nature of the content in the sectors (in particular the distance of the 3D models from the vertex) to be better appreciated.
Figure 5b shows an arrangement of four 3D pyramid sectors each containing three dimensional video content. First to third 3D pyramid sectors 402, 404, 406 are shown in the same altered orientation as in Figure 4d. Another sector is added consisting of two parts 506, 508. This other sector is composed of two source sectors containing content taken from two different filming sequences. From the first filming sequence the closest part 506 has been taken and from the second filming sequence the farthest part 508 has been taken. The first and second parts 506, 508 may have been captured sequentially in the position shown in Figure 5b, or one or both of the parts may have been captured in a different orientation. In some other embodiments, a sector may be constructed which consists of three or more source sectors. Each source sector may represent a different depth region of the 3D content, or the source sectors may partially or completely overlap. The editing user may adjust and select the depth position (the distance along the direction vector) at which the sectors are stitched together.
The immersed user 108 may perform a number of different editing operations on the displayed content. For example, the user may adjust the direction of the direction vector F so that a sector appears in a different orientation from that in which it was captured (as shown in Figure 4d). The user may also change the size of one or both of angles α and β, so that the image is magnified or shrunk (where α and β are changed in proportion) or distorted in one direction (where α and β are not changed in proportion). Where the sectors contain three dimensional content, the user may also change the length of the direction vector F so that the focus of the image is changed. In this way, the user can choose the viewing distance from the camera so that different objects come into focus, since the area that will be in focus is represented by the base of the pyramid.
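The magnify-versus-distort behaviour described above can be illustrated with a short sketch (hypothetical helper functions, not from the patent): scaling both plane angles by the same factor leaves the aspect ratio of the pyramid base unchanged, while unequal scale factors stretch the image in one direction.

```python
import math

def base_aspect(alpha, beta):
    """Aspect ratio (width / height) of the pyramid base for plane
    angles alpha and beta; the base is perpendicular to F."""
    return math.tan(alpha / 2.0) / math.tan(beta / 2.0)

def scale_angles(alpha, beta, s_alpha, s_beta=None):
    """Scale the two plane angles. Equal factors magnify or shrink the
    view; unequal factors distort the image in one direction."""
    if s_beta is None:
        s_beta = s_alpha  # proportional change by default
    return alpha * s_alpha, beta * s_beta
```

Equal angles give a square base (aspect ratio 1), and passing a single scale factor changes both angles in proportion.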
Various other editing operations may be performed on the sectors, either individually, in groups or on all of the sectors. These include adjusting the brightness, contrast, balance and/or colour hue in various parts of the spectrum. The editing user may also cause one or more of the sectors to be copied and may then paste copied sectors elsewhere in the virtual reality environment.
All of the features described above allow the editing user to intuitively and effectively handle three dimensional video content. The present invention allows the user to perform the video editing from inside the 3D virtual environment and also provides a number of editing methods for use in the 3D environment which utilise the spatial dimensions of that environment to allow quick and intuitive editing of three dimensional video and image content.
The editing user 108 may enable or disable a “snap” feature, which causes the vertex of a three dimensional sector which is being moved to locate precisely to the central vertex position. The central vertex position is the location from which the content will eventually be viewed. Once the sector has snapped to the central vertex position, the editor may change the orientation of the sector, while its vertex remains at the central vertex position. This feature allows the editing user to quickly populate the virtual environment with multiple sectors.
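A minimal sketch of such a snap feature, assuming a simple distance threshold around the central vertex position (the function name, the threshold value and its default are invented for illustration):

```python
def snap_vertex(vertex, central_vertex=(0.0, 0.0, 0.0), snap_radius=0.5, enabled=True):
    """Snap a moved sector's vertex to the central vertex position.

    If the snap feature is enabled and the vertex has been dragged to within
    `snap_radius` of the central vertex position (the point from which the
    content will eventually be viewed), it locates precisely onto it;
    otherwise the vertex is left where it is.
    """
    if not enabled:
        return vertex
    dist = sum((a - b) ** 2 for a, b in zip(vertex, central_vertex)) ** 0.5
    return tuple(central_vertex) if dist <= snap_radius else vertex
```

After snapping, only the sector's orientation would remain adjustable, as described above.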
The immersed user may also view and adjust the timelines of the video content in each 3D pyramid sector, as will now be described with reference to Figures 6a-f and 7a-c.
In the embodiments represented by Figures 6a-f, a 360 degree video has been captured in a number of 3D sectors. For the present description, it will be assumed that the video is a 2D video (or a 3D video which has been transferred into a 2D projection format for the editing phase). The timelines are represented as a stack of still frames. The frame stacks shown in Figures 6a-6f have been simplified into rectangular prisms for clarity of illustration. When implemented in the virtual reality environment, the frame stacks retain the pyramid 3D sector shape shown in Figure 4d, for example, such that the frames which are further away from the virtual point of view of the editor appear larger. The vertex of the pyramid may be removed such that the stacks have the shape of a truncated pyramid or frustum (see for example Figure 7a).
Two of these timelines 600, 602 are shown adjacent one another in Figure 6a. The editing user 108 uses the virtual reality content rendering device 102 and one or more user interface devices 110 (or the motion sensor device 106) to view and manipulate the timelines 600, 602 in the virtual reality environment. The timelines 600, 602 are initially displayed synchronously, i.e. such that frames having the same timecodes are next to each other.
The user may then provide selection and editing inputs via the user interface device 110 or by making a gesture that is detected by the motion sensor device 106 in order to scroll through the displayed timelines 600, 602. The user may therefore change the relative positions of the two stacks in order to change the time offset between the corresponding timelines. The editing user may select either timeline and cause it to be moved backwards or forwards along the direction vector. Figure 6b shows the two timelines 600, 602 after the user has scrolled to a later point in the second timeline 602, by selecting the second timeline 602 and causing it to be moved backwards relative to the first timeline 600. The bold line 604 in Figure 6b shows the frame in stack 600 which is now in synchronicity with the first frame of stack 602. In this way the user can easily manipulate the timelines and compare the resulting image frames side by side to choose a time offset at which to combine the two sectors. To assist with this determination, and to view the resulting video combination, the user may scroll along both offset timelines simultaneously. Figure 6c shows how the user can scroll along the two timelines. For example, a cursor-like coloured line 606 may be superimposed onto the upper side and/or sides of the frames to indicate the selected position. While scrolling the timelines, the two selected, adjacent, synchronized frame contents are shown and the remainder of the timelines in front are shown as wireframes. The user can also scroll along the two timelines without the wireframe, as in Figure 6d. The editing user can therefore easily see the resulting combination of video frames. Figures 6e and 6f show a generalised case where a number of 3D timeline stacks (again simplified as rectangular prisms) having different frame sizes and varying clip lengths are combined.
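The time-offset manipulation described above reduces to simple frame-index arithmetic. A minimal sketch, with all names and the frame counts assumed for illustration (offsets are measured in frames, matching the one-frame granularity of the stacks):

```python
class Timeline:
    """A 3D-sector timeline reduced to its frame indices and a time offset
    (in frames). Names are illustrative, not from the patent."""

    def __init__(self, num_frames, offset=0):
        self.num_frames = num_frames
        self.offset = offset  # frames by which this stack has been slid back

    def frame_at(self, cursor):
        """Frame shown at scroll-cursor position `cursor`, or None if the
        cursor falls outside this clip."""
        i = cursor - self.offset
        return i if 0 <= i < self.num_frames else None


# Sliding the second timeline back by 24 frames: its first frame now sits
# next to frame 24 of the first timeline (like the bold line 604 in Figure 6b).
t1 = Timeline(300)
t2 = Timeline(300, offset=24)
```

Scrolling both offset timelines simultaneously then corresponds to advancing a single shared cursor and reading `frame_at(cursor)` from each stack.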
As with the previous example, a cursor-like coloured line 608 may be superimposed onto the upper side and/or sides of the frames to indicate the selected position. The cursor line 608 in Figure 6e illustrates a point in the timelines selected by the editing user 108. The user 108 may also select and move (along the time axis) individual ones of the timelines and change the time offsets between the timelines, as previously discussed. Having selected a particular point in one timeline, the user 108 can then view all the frames synchronized to that point of time and select a new subsector 610 for further use, as shown in Figure 6f.
Figures 7a, 7b, 7c and 7d show another embodiment of how the timelines may be presented and manipulated inside the virtual reality environment. Again, for the sake of clarity, we will assume the captured footage is in 2D, or that it was captured in 3D but in the editing phase, a 2D projection of it is used. Figure 7a illustrates a simplified representation of a case with video footage shot in three horizontal sectors 700, 702, 704. In the full sphere case several such sectors would cover a sphere around the editing user 108. It is not necessary that the sectors have been shot separately. In some embodiments, a full 360 degree camera (such as the OZO system) records a video in full sphere. Later, in the editing phase, the user 108 splits the content into sectors, so that different actions (effects etc.) can be applied separately onto the sectors. The shaded image frames 701, 703, 705 represent the first frame in each timeline and all initially have the same timestamp.
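Splitting full-sphere footage into sectors in the editing phase, as described above, can be sketched as dividing the 360 degree yaw range into equal horizontal slices. This ignores the pitch dimension and the video data itself, and all names are assumptions:

```python
def split_into_sectors(num_sectors, start_yaw=0.0):
    """Divide a full 360-degree horizontal sweep into equal yaw sectors,
    returning (yaw_start, yaw_end) pairs in degrees. A full-sphere split
    would also carry pitch ranges; this sketch covers only one horizontal
    band of sectors, like the three sectors 700, 702, 704 of Figure 7a."""
    width = 360.0 / num_sectors
    return [((start_yaw + i * width) % 360.0,
             (start_yaw + (i + 1) * width) % 360.0)
            for i in range(num_sectors)]
```

Each returned yaw range could then receive its own effects or timeline adjustments independently of the others.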
Figures 7b-d show a two dimensional representation of the same sectors, viewed from above. Figures 7a and 7b illustrate a situation where the content in all the sectors 700, 702, 704 is in the original time synchronicity, as it was when the footage was captured. Several frames of each sector are illustrated. The user may then provide selection and editing inputs via the user interface device 110 or by making a gesture that is detected by the motion sensor device 106 in order to pull the middle sector 702 inwards (towards the user's viewpoint) and thus cause a timecode difference between the middle sector 702 and the remaining sectors 700, 704, as shown in Figure 7c. The difference is one frame in this example, such that the second frame 706 of the middle sector 702 is aligned with the first frames 701, 705 of the other sectors 700, 704.
In some embodiments, when the middle sector 702 is pulled inwards, the size of the frontmost image frame 703 and the subsequent image frames is decreased, as shown in Figure 7c. This allows the immersed user 108 to see which new frame of the middle sector 702 corresponds to the time instant of the remaining sectors 700, 704, as this frame will have the same size as the adjacent frames in the remaining sectors 700, 704. The effect for the editing user 108 may be that the frames of the middle sector 702 appear to move towards and underneath them, while shrinking in size. At some limit, the closest frames may vanish from view or become blurred. Once the user 108 has aligned the desired frame 706 in the middle sector 702 with the remaining frames 701, 705, they may confirm their selection and view the resulting composition, as shown in Figure 7d.
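The shrinking-frame behaviour can be modelled with a simple linear scaling rule, assumed here for illustration (the patent does not specify the exact scaling law, and both function names are invented):

```python
def drawn_frame_size(base_size, frame_distance, reference_distance):
    """Size at which a frame is rendered after its sector is pulled inwards.

    Frames shrink in proportion to their distance from the viewpoint, so a
    frame sitting at the reference distance of the static sectors is drawn
    at exactly the base size -- which is how the user spots the frame that
    shares a time instant with the adjacent, unmoved sectors."""
    return base_size * (frame_distance / reference_distance)


def aligned_frame_index(pull_frames):
    """After pulling a sector inwards by `pull_frames` frames, the frame at
    this index is the one time-aligned with the static sectors' first frame
    (one frame in the Figure 7c example)."""
    return pull_frames
```

Under this rule, a frame pulled to half the reference distance is drawn at half size, and only the frame remaining at the reference distance matches the neighbouring stacks.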
Figure 8 is a flowchart describing exemplary operation of the video editing system according to embodiments of the invention.
The process begins at step 800, in which the virtual environment is generated. The virtual environment is generated by the virtual reality content providing device 104, and in particular by the processor 202 in conjunction with the memory 204. At step 802 the virtual environment is populated with video content. The video content is stored in the memory 204 of the virtual reality content providing device 104 and rendered by the rendering software 205 in conjunction with the processor 202. The generated virtual environment and the video content within the environment are communicated to the virtual reality content rendering device 102 via the communication ports 208, 308. The editing user 108 wears the virtual reality content rendering device 102 to view and edit the video content.
At step 804 a selection input is received. The selection input is received and interpreted by the virtual reality content providing device 104. The selection input relates to one of the plurality of three dimensional sectors. The selection input may be received via the motion sensor 106 in response to a gestural movement performed by the editing user 108. This gestural movement may be an arm or hand movement, for example. The selection input may, alternatively or in addition, be received via a user interface device 110, such as a handheld controller. The user may wear VR gloves, either to allow their hand and finger movements to be tracked more accurately or to act as a type of user interface device 110. The VR gloves may also provide haptic feedback to the user that their inputs are registered. In some embodiments, multiple three dimensional sectors may be selected simultaneously. Therefore step 804 may be followed by further selection steps before step 806 is performed.
At step 806, in response to receiving the selection input, the virtual reality content providing device 104 identifies the three dimensional sector to which the input relates and causes it to be selected. The way in which the selected three dimensional sector is rendered in the virtual environment may be changed to show the editing user 108 that the selection input has been successful. For example, the selected three dimensional sector may be provided with a coloured outline or may be periodically changed in brightness or transparency.
At step 808 an editing input is received. The editing input is received and interpreted by the virtual reality content providing device 104. The editing input relates to the previously selected three dimensional sector or sectors. The editing input may be received via the motion sensor 106 in response to a gestural movement performed by the editing user 108. This gestural movement may be an arm or hand movement, for example, or a combination or sequence of different arm and/or hand movements. The editing input may, alternatively or in addition, be received via a user interface device 110, such as a handheld controller. The user may wear VR gloves, either to allow their hand and finger movements to be tracked more accurately or to act as a type of user interface device 110. The VR gloves may also provide haptic feedback to the user that their inputs are registered. The editing input may take one of a number of forms depending on the type of editing command desired by the user.
At step 810, in response to receiving the editing input, the virtual reality content providing device 104 causes a visual property of the selected three dimensional sector(s) to be changed. As previously discussed, this change may relate to the position of the three dimensional sector, the size and aspect ratio of the three dimensional sector, the focus of the three dimensional sector (particularly where the sector contains 3D information) and/or the brightness, contrast or colour hue of the three dimensional sector. The editing input may also cause the selected three dimensional sector to be copied, pasted or deleted.
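The flow of steps 800-810 can be sketched as a minimal event loop; the event and log structures below are invented for illustration and stand in for the devices and rendering machinery described above:

```python
def run_editing_session(environment, content_sectors, input_events):
    """Minimal sketch of the Figure 8 flow (steps 800-810): generate and
    populate the environment, then react to selection and editing inputs.

    `input_events` is an assumed list of dicts, e.g.
    {"type": "select", "sector_id": ...} or
    {"type": "edit", "property": ...}.
    Returns a log of (action, sector_id) pairs for inspection."""
    environment["sectors"] = list(content_sectors)  # steps 800 / 802
    selected = []
    log = []
    for event in input_events:
        if event["type"] == "select":               # steps 804 / 806
            selected.append(event["sector_id"])
            log.append(("highlight", event["sector_id"]))
        elif event["type"] == "edit":               # steps 808 / 810
            for sector_id in selected:
                log.append((event["property"], sector_id))
    return log
```

An editing input is applied to every currently selected sector, mirroring the possibility of selecting multiple sectors before step 806.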
In the embodiments described above, the video content being edited was pre-recorded 2D or 3D content. However, the invention is equally applicable to real-time situations, such as a live broadcast or video streaming. As an example, consider the live virtual reality broadcast of an opera. The director/editor views the content, divided into three dimensional sectors as described above, in the live shooting phase. The editor may then, for example, eliminate one sector from a first camera output (e.g. the sector where the camera crew is visible) and substitute that with another sector output from a second camera. The output of the second camera may be substantially similar to the deleted sector, but without the camera crew, or it may show something different, such as a background which complements the scene from the first camera. The live output of the broadcast is then the first camera, without the one deleted sector, and the substituted sector from the second camera.
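The live substitution described above amounts to swapping one sector's feed for another before output. A sketch under the assumption that each camera's output is a mapping from sector identifiers to content (the representation and function name are illustrative):

```python
def live_composite(camera1_sectors, camera2_sectors, replace_id):
    """Live-broadcast sketch: drop one sector from the first camera's output
    (e.g. the one showing the camera crew) and substitute the sector with
    the same identifier from a second camera. The broadcast output is the
    first camera's sectors with that single sector replaced."""
    output = dict(camera1_sectors)
    output[replace_id] = camera2_sectors[replace_id]
    return output
```

In a real pipeline this swap would be performed per frame on the live streams; the dictionary stands in for one frame's worth of sector content.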

Claims (44)

    1. Apparatus configured to: generate a three dimensional virtual environment for editing video content; populate the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receive a selection input from a user interface, and in response to the selection input, cause a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receive an editing input from a user interface, and in response to the editing input, cause one or more visual properties of the first three dimensional sector to be changed.
  2. Apparatus according to claim 1, wherein a base of each three dimensional sector is rectangular.
  3. Apparatus according to claim 2, wherein each pyramid shaped three dimensional sector is defined by the vertex located at the virtual position of the imaging device, a direction vector and first and second orthogonal angles.
  4. Apparatus according to any preceding claim, wherein if the video content defined by one of the plurality of three dimensional sectors is two dimensional, or a two dimensional projection of three dimensional content, then the video content occupies a base of that three dimensional sector.
  5. Apparatus according to any preceding claim, wherein if the video content defined by one of the plurality of three dimensional sectors is three dimensional, then the video content occupies the three dimensional space defined by that three dimensional sector.
  6. Apparatus according to any preceding claim, wherein the video content defined by at least one of the plurality of three dimensional sectors is three dimensional video content.
  7. Apparatus according to claim 6, wherein the visual property of the selected first three dimensional sector is the length of the direction vector.
  8. Apparatus according to any preceding claim, wherein the visual property of the selected first three dimensional sector is the direction of the direction vector.
  9. Apparatus according to any preceding claim, wherein the visual property of the selected first three dimensional sector is the position of the vertex.
  10. Apparatus according to claim 3, wherein the visual property of the selected first three dimensional sector is the size of the first angle and/or the size of the second orthogonal angle.
  11. Apparatus according to any preceding claim, wherein the selection input and/or editing input comprises a gestural input.
  12. Apparatus according to any preceding claim, wherein the selection input and/or editing input comprises a user interface input from one or more handheld controllers.
  13. Apparatus according to any preceding claim, wherein the apparatus is configured to receive a second editing input, and in response to the second editing input, cause a computer generated three dimensional sector containing video content to be added to the three dimensional virtual environment.
  14. Apparatus according to any preceding claim, wherein the video content is created as 360 degree content and subsequently divided into the plurality of three dimensional sectors.
  15. Apparatus according to any of claims 1 to 13, wherein the video content is created as the plurality of three dimensional sectors.
  16. Apparatus according to any preceding claim, wherein the apparatus is further configured to cause a first timeline of the video content corresponding to the first three dimensional sector to be displayed such that successive frames of the video content are arranged along the direction vector of the first three dimensional sector and such that the frames of the video content are perpendicular to the direction vector.
  17. Apparatus according to claim 16, wherein the apparatus is further configured to receive a third editing input, and in response to the third editing input, cause a different one of the successive frames of the video content to be displayed.
  18. Apparatus according to claim 17, wherein the apparatus is further configured to: receive a second selection input from a user interface, and in response to the second selection input, cause a second one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; cause a second timeline of video content corresponding to the second three dimensional sector to be displayed such that successive frames of the video content are arranged along a direction vector of the second three dimensional sector, and such that the frames of the video content are perpendicular to the direction vector; and receive a fourth editing input, and in response to the fourth editing input, cause a time offset between the first and second timelines to be changed.
  19. A method comprising: generating a three dimensional virtual environment for editing video content; populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed.
  20. Method according to claim 19, wherein a base of each three dimensional sector is rectangular.
  21. Method according to claim 20, wherein each pyramid shaped three dimensional sector is defined by the vertex located at the virtual position of the imaging device, a direction vector and first and second orthogonal angles.
  22. Method according to any of claims 19 to 21, wherein if the video content defined by one of the plurality of three dimensional sectors is two dimensional, or a two dimensional projection of three dimensional content, then the video content occupies a base of that three dimensional sector.
  23. Method according to any of claims 19 to 22, wherein if the video content defined by one of the plurality of three dimensional sectors is three dimensional, then the video content occupies the three dimensional space defined by that three dimensional sector.
  24. Method according to any of claims 19 to 23, wherein the video content defined by at least one of the plurality of three dimensional sectors is three dimensional video content.
  25. Method according to claim 24, wherein the visual property of the selected first three dimensional sector is the length of the direction vector.
  26. Method according to any of claims 19 to 25, wherein the visual property of the selected first three dimensional sector is the direction of the direction vector.
  27. Method according to any of claims 19 to 26, wherein the visual property of the selected first three dimensional sector is the position of the vertex.
  28. Method according to claim 21, wherein the visual property of the selected first three dimensional sector is the size of the first angle and/or the size of the second orthogonal angle.
  29. Method according to any of claims 19 to 28, wherein the selection input and/or editing input comprises a gestural input.
  30. Method according to any of claims 19 to 29, wherein the selection input and/or editing input comprises a user interface input from one or more handheld controllers.
  31. Method according to any of claims 19 to 30, wherein the method further comprises receiving a second editing input, and in response to the second editing input, causing a computer generated three dimensional sector containing video content to be added to the three dimensional virtual environment.
  32. Method according to any of claims 19 to 31, wherein the video content is created as 360 degree content and subsequently divided into the plurality of three dimensional sectors.
  33. Method according to any of claims 19 to 31, wherein the video content is created as the plurality of three dimensional sectors.
  34. Method according to any of claims 19 to 33, wherein the method further comprises causing a first timeline of the video content corresponding to the first three dimensional sector to be displayed such that successive frames of the video content are arranged along the direction vector of the first three dimensional sector and such that the frames of the video content are perpendicular to the direction vector.
  35. Method according to claim 34, wherein the method further comprises receiving a third editing input, and in response to the third editing input, causing a different one of the successive frames of the video content to be displayed.
  36. Method according to claim 35, wherein the method further comprises: receiving a second selection input from a user interface, and in response to the second selection input, causing a second one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; causing a second timeline of video content corresponding to the second three dimensional sector to be displayed such that successive frames of the video content are arranged along a direction vector of the second three dimensional sector, and such that the frames of the video content are perpendicular to the direction vector; and receiving a fourth editing input, and in response to the fourth editing input, causing a time offset between the first and second timelines to be changed.
  37. Computer-readable instructions which, when executed by an apparatus, cause the apparatus to perform methods as claimed according to any of claims 19 to 36.
  38. Computer-readable instructions according to claim 37, wherein the computer-readable instructions are comprised in a non-transitory computer readable medium.
  39. A computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of at least: generating a three dimensional virtual environment for editing video content; populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed.
  40. Apparatus comprising: means for generating a three dimensional virtual environment for editing video content; means for populating the generated three dimensional virtual environment with video content, wherein the video content is divided into a plurality of pyramid shaped three dimensional sectors, and wherein the vertex of each sector corresponds to the virtual position of an imaging device which captured or created the corresponding video content and sides of each sector define the field of view of the imaging device; means for receiving a selection input from a user interface, and in response to the selection input, causing a first one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; and means for receiving an editing input from a user interface, and in response to the editing input, causing one or more visual properties of the first three dimensional sector to be changed.
  41. Apparatus according to claim 40, wherein the apparatus comprises: means for receiving a second editing input; and means, responsive to receiving the second editing input, for causing a computer generated three dimensional sector containing video content to be added to the three dimensional virtual environment.
  42. Apparatus according to claim 40 or claim 41, comprising means for dividing the video content into the plurality of three dimensional sectors.
  43. Apparatus according to any of claims 40 to 42, further comprising means for causing a first timeline of the video content corresponding to the first three dimensional sector to be displayed such that successive frames of the video content are arranged along the direction vector of the first three dimensional sector and such that the frames of the video content are perpendicular to the direction vector.
  44. Apparatus according to claim 43, further comprising: means for receiving a third editing input; and means, responsive to receiving the third editing input, for causing a different one of the successive frames of the video content to be displayed.
  45. Apparatus according to claim 44, further comprising: means for receiving a second selection input from a user interface; means, responsive to receiving the second selection input, for causing a second one of the plurality of three dimensional sectors to be selected in the three dimensional virtual environment; means for causing a second timeline of video content corresponding to the second three dimensional sector to be displayed such that successive frames of the video content are arranged along a direction vector of the second three dimensional sector, and such that the frames of the video content are perpendicular to the direction vector; means for receiving a fourth editing input; and means, responsive to receiving the fourth editing input, for causing a time offset between the first and second timelines to be changed.
GB1607189.6A 2016-04-26 2016-04-26 A system and method for video editing in a virtual reality enviroment Withdrawn GB2549723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1607189.6A GB2549723A (en) 2016-04-26 2016-04-26 A system and method for video editing in a virtual reality enviroment


Publications (1)

Publication Number Publication Date
GB2549723A true GB2549723A (en) 2017-11-01

Family

ID=59997534

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1607189.6A Withdrawn GB2549723A (en) 2016-04-26 2016-04-26 A system and method for video editing in a virtual reality enviroment

Country Status (1)

Country Link
GB (1) GB2549723A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963203A (en) * 1997-07-03 1999-10-05 Obvious Technology, Inc. Interactive video icon with designated viewing position
EP1995961A1 (en) * 2006-03-14 2008-11-26 Sony Corporation Image processing device and image processing method
WO2013121239A1 (en) * 2012-02-15 2013-08-22 Thomson Licensing User interface for depictive video editing


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3506081A1 (en) * 2017-12-27 2019-07-03 Nokia Technologies Oy Audio copy-paste function
WO2019130141A1 (en) * 2017-12-27 2019-07-04 Nokia Technologies Oy Audio copy-paste function
CN113170058A (en) * 2018-11-23 2021-07-23 三星电子株式会社 Electronic device and control method thereof
EP3863277A4 (en) * 2018-11-23 2022-05-04 Samsung Electronics Co., Ltd. Electronic device and control method thereof
US11763535B2 (en) 2018-11-23 2023-09-19 Samsung Electronics Co., Ltd. Electronic device and control method thereof

Similar Documents

Publication Publication Date Title
US9703446B2 (en) Zooming user interface frames embedded image frame sequence
US9549174B1 (en) Head tracked stereoscopic display system that uses light field type data
KR101860313B1 (en) Method and system for editing scene in three-dimensional space
WO2020213426A1 (en) Image processing device, image processing method, and program
US11284061B2 (en) User input device camera
US9703400B2 (en) Virtual plane in a stylus based stereoscopic display system
US11044398B2 (en) Panoramic light field capture, processing, and display
US20210056662A1 (en) Image processing apparatus, image processing method, and storage medium
US20190155465A1 (en) Augmented media
US10732706B2 (en) Provision of virtual reality content
WO2020166376A1 (en) Image processing device, image processing method, and program
JP2019512177A (en) Device and related method
GB2549723A (en) A system and method for video editing in a virtual reality enviroment
EP3616402A1 (en) Methods, systems, and media for generating and rendering immersive video content
US20240070973A1 (en) Augmented reality wall with combined viewer and camera tracking
Foote et al. One-man-band: A touch screen interface for producing live multi-camera sports broadcasts
JP7459195B2 (en) Generation device, generation method, and program
US20230033201A1 (en) Image processing apparatus, image processing method, and storage medium
US10032447B1 (en) System and method for manipulating audio data in view of corresponding visual data
WO2019241712A1 (en) Augmented reality wall with combined viewer and camera tracking
JP5523295B2 (en) Movie generator
JP5222407B2 (en) Image display device, image display method, and image correction method
US11816785B2 (en) Image processing device and image processing method
EP3506210A1 (en) Handling image content in a virtual reality environment
US20210400255A1 (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)