EP3120541A1 - Encoding and decoding of three-dimensional image data - Google Patents

Encoding and decoding of three-dimensional image data

Info

Publication number
EP3120541A1
Authority
EP
European Patent Office
Prior art keywords
data
frame
view
image
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP15715495.6A
Other languages
German (de)
French (fr)
Inventor
Anand Avinash Jayanth CHANGA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of EP3120541A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/366Image reproducers using viewer tracking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/111Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N13/117Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194Transmission of image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/243Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/282Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems

Definitions

  • the various aspects relate to encoding and decoding of three-dimensional image data, for rendering and display of data enabling a user to perceive a three-dimensional view of a scenery.
  • Presenting a three-dimensional view of a scenery to a user is known for example in cinemas, where people are presented with spectacles with polarised lenses. This allows a right view to be delivered to the right eye and the left view to the left eye. Data for delivery is acquired by a two-way camera system, used for filming.
  • Presentation may also be provided by means of a two-display system that may be worn over the head of a user. By sensing movements of the head, the image provided may be different, in accordance with a new position of the head.
  • a system and method for capturing video of a real-world scene enabling the creation of a navigable, panoramic three-dimensional virtual reality environment.
  • Two optical elements are used, each for imaging different viewpoints of the real-world scene, and combining them in a stereoscopic view.
  • the optical elements may be provided in the form of parabolic reflectors, or elements spaced equidistant around a circle, however such optical set-ups are not capable of generating an omnidirectional image.
  • a multi-camera head comprising a head frame and a plurality of stereo cameras mounted to the head frame and arranged around an axis. On top of the head frame a stereo camera is mounted.
  • US patent application no. 2007/0201296A1 relates to a memory arrangement including an interface configured to transmit data in the form of data packets according to a predefined protocol.
  • a method of constructing a stereoscopic view of the scene, based on the stereoscopic image data provided, would be appreciated as well.
  • a first aspect provides a method of providing stereoscopic image data for constructing a stereoscopic image of a scene.
  • the method comprises receiving a first multitude of left images from at least one left image capturing device, the captured left images forming a substantially omnidirectional image data set representing the scene from a left point of view and receiving a second multitude of right captured images from at least one right image capturing device, the captured right images forming a substantially omnidirectional image data set representing the scene from a right point of view.
  • the left captured images are mapped in a left frame comprising left compound image data and the right images are mapped in a right frame comprising right compound image data.
  • the left frame and the right frame are communicated.
  • a second aspect provides a method of constructing a stereoscopic view of a scene.
  • the method comprises receiving a left frame comprising data representing a substantially omnidirectional image data set representing the scene from a left point of view, receiving a right frame comprising data representing a substantially omnidirectional image data set representing the scene from a right point of view and receiving virtual observer data comprising data on a virtual observer position relative to the scene. Based on the virtual observer position, left view data comprised by the left frame is determined and based on the virtual observer position, right view data comprised by the right frame is determined.
  • the left view data and the right view data as stereoscopic view data of the scene are provided to a display arrangement.
  • a third aspect provides a computer programme product comprising computer executable code enabling a computer programmed with the computer executable code to perform the method according to the first aspect.
  • a fourth aspect provides a computer programme product comprising computer executable code enabling a computer programmed with the computer executable code to perform the method according to the second aspect.
  • a fifth aspect provides a device for providing stereoscopic image data for constructing a stereoscopic image of a scene.
  • the device comprises a data input module arranged to receive a first multitude of left images from at least one left image capturing device, the captured left images forming a substantially omnidirectional image data set representing the scene from a left point of view and to receive a second multitude of right captured images from at least one right image capturing device, the captured right images forming a substantially omnidirectional image data set representing the scene from a right point of view.
  • the device further comprises a processing unit arranged to map the left captured images in a left frame comprising left compound image data and map the right images in a right frame comprising right compound image data.
  • the device also comprises a data communication module arranged to communicate the left frame and the right frame.
  • a sixth aspect provides a device for constructing a stereoscopic view of a scene.
  • the device comprises a data input module arranged to receive a left frame comprising data representing a substantially omnidirectional image data set representing the scene from a left point of view, receive a right frame comprising data representing a substantially omnidirectional image data set representing the scene from a right point of view and receive virtual observer data comprising data on a virtual observer position relative to the scene.
  • the device further comprises a processing unit arranged to, based on the virtual observer position, determine left view data comprised by the left frame, and, based on the virtual observer position, determine right view data comprised by the right frame.
  • the device also comprises a data communication module arranged to provide the left view data and the right view data as stereoscopic view data of the scene to a display arrangement.
  • Figure 1 shows a device for encoding image data
  • Figure 2 shows a device for decoding image data and constructing three-dimensional view data
  • Figure 3 shows a first flowchart
  • Figure 4A shows a camera rig
  • Figure 4B shows a three-dimensional view of the camera rig
  • Figure 5A shows a view mapping cube
  • Figure 5B shows a plan of the view mapping cube
  • Figure 5C shows a frame comprising three-dimensional image data
  • Figure 6 shows a second flowchart
  • Figure 7A shows an image sphere
  • Figure 7B shows a top view of the image sphere
  • Figures 8A-8D show other embodiments of a device mounted to a camera rig according to the invention.
  • Figures 9A-9C show other embodiments of a camera rig according to the invention.
  • Figure 1 shows an image coding device 100.
  • the image coding device 100 is coupled to a left image capturing device (camera module) 152 and a right image capturing device (camera module) 154.
  • the image coding device 100 comprises a data input unit 112, a first buffer 114, a stitching module 116, a second buffer 118, an encoder module 120, a third buffer 122, a data output unit 124, an encoding processing module 102 and an encoding memory module 104.
  • the encoding processing module 102 is arranged for controlling the various components of the image coding device 100. In another embodiment, all functions carried out by the various modules depicted in Figure 1 are carried out by an encoding processing module specifically programmed to carry out the specific functions.
  • the encoding processing module 102 is coupled to a first memory module 104.
  • the first memory module 104 is arranged for storing data received by means of the data input unit 112, data processed by the various components of the image coding device 100 and computer executable code enabling the various components of the image coding device 100 to execute various methods as discussed below.
  • Optional intermediate buffers are provided between various components to enable smooth transfer of data.
  • Figure 2 shows an image decoding device 200.
  • the image decoding device 200 is coupled to a personal display device 250 comprising a right display 252 for displaying a right image to a right eye of a user and a left display 254 for displaying a left image to a left eye of a user.
  • the image decoding device 200 comprises a data receiving module 212, a fourth buffer 214, a decoding module 216, a data mapping module 218, a view determining module 220, a rendering module 222 and a data output module 224.
  • the image decoding device further comprises a decoding processing module 202 and a decoding memory module 204.
  • the decoding processing module 202 is arranged for controlling the various components of the image decoding device 200. In another embodiment, all functions carried out by the various modules depicted in Figure 2 are carried out by a decoding processing module specifically programmed to carry out the specific functions.
  • the decoding processing module 202 is coupled to the decoding memory module 204.
  • the decoding memory module 204 is arranged for storing data received by means of the data receiving module 212, data processed by the various components of the image decoding device 200 and computer executable code enabling the various components of the image decoding device 200 to execute various methods as discussed below.
  • Figure 3 shows a first flowchart 300 for encoding image data.
  • the method depicted by the first flowchart 300 may be executed by the image encoding device 100.
  • the list below provides a short summary of the components of the first flowchart 300
  • the process starts in a terminator 302. This is preferably combined or performed after initialization of the image coding device 100. Subsequently, the image coding device 100 receives a left image from the left image capturing device or left camera module 152 in step 304 and a right image from the right image capturing device or right camera module 154 in step 306. The images are received by means of the data input unit 112 and subsequently stored in the first buffer 114 for transmission to the stitching module 116.
  • the left image capturing device or left camera module 152 and the right image capturing device or right camera module 154 are arranged to provide substantially omnidirectional image data of a scenery.
  • Omnidirectional may be interpreted as 360 degrees around a camera module, with a pre- determined angle defining an upper limit and a lower limit. Such angle may have a value of 90 degrees.
  • omnidirectional may be interpreted as 360 degrees around a camera module, with no upper and lower limit. This means the pre-determined viewing angle is 180 degrees.
  • Such omnidirectional image acquisition is also known as 360-360 or full-spherical data acquisition.
  • the left image capturing device or left camera module 152 and the right image capturing device or right camera module 154 preferably comprise six cameras each or more.
  • any image sensing or image capturing device can be implemented, such as a camera module (e.g. light-field cameras, also called plenoptic cameras, CCD cameras, night vision cameras, EBCMOS cameras, etc.) or a photo sensor, all capable of capturing images in the visible and invisible wavelength domain of the human eye.
  • Figure 4A shows a side view of an omnidirectional stereoscopic camera module 400.
  • Figure 4B shows a three- dimensional view of the omnidirectional stereoscopic camera module 400.
  • the omnidirectional stereoscopic camera unit 400 comprises a first left camera 402, a second left camera unit 412, a third left camera unit 422, a fourth left camera unit 432, a fifth left camera unit 442 and a sixth left camera module as the left camera module 152.
  • the omnidirectional stereoscopic camera unit 400 further comprises a first right camera 404, a second right camera unit 414, a third right camera unit 424, a fourth right camera unit 434, a fifth right camera unit 444 and a sixth right camera module as the right camera module 154.
  • the omnidirectional stereoscopic camera module 400 comprises a mounting unit 450 for mounting the omnidirectional stereoscopic camera module 400 to a tripod or a similar device envisaged for the same purpose.
  • the camera modules comprise six cameras each.
  • each camera module comprises only one camera.
  • Single omnidirectional cameras are exotic devices.
  • catadioptric cameras are an option.
  • if omnidirectional is to be truly 360-360 omnidirectional, preferably at least two cameras are used for the left camera module 152 and two for the right camera module 154.
  • fish-eye cameras with a viewing angle of over 180 degrees may be used.
  • six cameras are used per camera module, with a substantially square angle between adjacent cameras. A viewing angle of ninety degrees per camera is in theory sufficient for capturing an omnidirectional image, though a somewhat larger viewing angle is preferred to create some overlap.
  • a first image data set is formed for all left image data and a second image data set is formed for all right image data.
  • besides image data - visual and audio data - further data may be acquired as well.
  • location data may be acquired by means of a GPS (Global Positioning System) data reception unit.
  • image data comprises visual data and may comprise audio data as well.
  • Each of the cameras of the camera modules may be equipped with a microphone for capturing audio data. If that is the case the audio data is coupled to visual data acquired by the same camera. For a significant amount of processing steps discussed in this description, audio data is processed in a way similar to visual data. If this is not the case, this may be explained in further detail below with respect to audio data.
  • Figure 5A shows an image data cube 500.
  • Figure 5B shows a plan of the image data cube 500.
  • the image data cube visualises how image data is captured.
  • it is assumed that the image data cube 500 is acquired for a left view, depicting the first image data set.
  • data is acquired by the fifth left camera unit 442.
  • data is acquired by the first left camera unit 402.
  • Number 2 corresponds to the third left camera unit 422, number 3 to the fourth left camera unit 432, number 4 to the second left camera unit, number 5 to the fifth left camera unit 442 and number 6 to the sixth left camera unit 452.
  • image data and the visual image data in particular is consolidated in a single frame for each data set.
  • the first image data set is consolidated in a first frame and the second image data set is consolidated in a second frame.
  • Figure 5C shows the first frame 550.
  • regions are indicated where data of the image data cube 500 is mapped to.
  • Adjacent regions of the first frame 550 comprise data of adjacent sides of the image data cube 500. This is preferably done using a so-called stitching algorithm.
  • the left camera units of the left camera module 152 have a viewing angle larger than 90 degrees. This results in an overlap of data captured by adjacent left camera units. This overlap in data may be used for stitching image data acquired by the left camera units into the first frame 550. This results in the first frame 550 as a left image frame comprising substantially omnidirectional image data.
  • the first frame 550 is obtained in step 308; its right counterpart is obtained in step 310.
  • the first frame 550 and its right counterpart are provided by the stitching module 116.
  • thus a frame is provided comprising compound image data: the first frame 550, composed of the images acquired by the multitude of left camera units.
  • the first frame 550 comprises image data from a left point of view, acquired at a specific position.
  • the frames obtained by stitching are stored in the second buffer 118 for transmittal to the encoder module 120.
  • This procedure of providing a compound image applies to image data acquired at one specific moment, allowing the first frame 550 to be obtained without requiring additional data acquired before or after the data for the first frame 550 was acquired. This enables real-time processing of the image data.
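  • As an illustration of this consolidation step, the sketch below packs six equally sized face images into one compound frame. The 3x2 grid and the face ordering are assumptions; the text only requires that all faces of the image data cube 500 end up in a single rectangular frame such as the first frame 550.
```python
import numpy as np

def pack_cube_faces(faces):
    """Pack six equally sized cube-face images into one compound frame.

    `faces` is a dict keyed 1..6 as in Figures 5A-5C (1-4: sides, 5: top,
    6: bottom); the 3x2 grid used here is an assumed layout.
    """
    h, w, c = faces[1].shape
    frame = np.zeros((2 * h, 3 * w, c), dtype=faces[1].dtype)
    order = [1, 2, 3, 4, 5, 6]                     # face number per grid cell
    for cell, face_no in enumerate(order):
        row, col = divmod(cell, 3)
        frame[row * h:(row + 1) * h, col * w:(col + 1) * w] = faces[face_no]
    return frame

# Example: six dummy 512x512 RGB faces for the left camera module
faces = {n: np.zeros((512, 512, 3), dtype=np.uint8) for n in range(1, 7)}
first_frame_550 = pack_cube_faces(faces)           # shape (1024, 1536, 3)
```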
  • for the video stream comprising omnidirectional data, multiple frames may be formed as discussed above by mapping data acquired by camera units as shown in Figure 5C.
  • the frames are encoded by the encoder module 120.
  • the encoding may comprise compressing, encrypting, other, or a combination thereof.
  • the compression may be inter-frame, for example according to an MPEG encoding algorithm, or intra-frame, for example according to the JPEG encoding algorithm.
  • the left compound frame is encoded in step 312 and the right compound frame is encoded in step 314.
  • the encoded video stream may be incorporated in a suitable container, like a Matroska container.
  • the audio data is preferably provided in six pairs of audio streams. Alternatively, only one instead of two audio streams is acquired per camera unit. In that case, six audio streams are acquired.
  • the audio data may be compressed and provided with each encoded video stream.
  • the encoded video streams may be further processed for transfer of the video data.
  • the further processing may comprise embedding the video data together with audio data in a transport stream for further transport according to the DVB protocol.
  • the transport stream is sent out to another device in step 320 by the data output unit 124. Having sent out the transport stream, the procedure ends in a terminator 320.
  • Embedding the video and audio streams - the visual and audio data - in a transport stream is optional. If the data is directly sent to another device further embedding may not be required. If the data were to be sent as part of a broadcasting service, embedding in a DVB stream would be advantageous.
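  • As a minimal sketch of the container step, the snippet below muxes two separately encoded compound video streams into a single Matroska container by calling ffmpeg. The file names are hypothetical and the patent does not prescribe ffmpeg or any particular codec.
```python
import subprocess

# Hypothetical inputs: the left and right compound video streams are assumed
# to have been encoded already (e.g. as H.264 elementary streams).
cmd = [
    "ffmpeg",
    "-i", "left_compound.h264",    # encoded left frames (step 312)
    "-i", "right_compound.h264",   # encoded right frames (step 314)
    "-map", "0:v", "-map", "1:v",  # keep both video streams
    "-c", "copy",                  # no re-encoding, only muxing
    "stereo_omnidirectional.mkv",  # Matroska container
]
subprocess.run(cmd, check=True)
```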
  • Figure 6 shows a second flowchart 600 depicting a procedure for reconstructing a three dimensional view of a scenery, based on data captured earlier and received.
  • the data used as input for the procedure depicted by the second flowchart 600 may be provided in accordance with the procedure depicted by the first flowchart 300 or a similar procedure.
  • the procedure may be executed by the image decoding device 200 depicted by Figure 2.
  • the list below provides a short summary of the components of the second flowchart 600.
  • the procedure starts in a terminator 602 and continues with reception of left frame data in step 604 and reception of right frame data in step 606.
  • the left frame data and the right frame data may be received directly via the data receiving module 212.
  • the left frame data and the right frame data are obtained from a transport stream obtained by the data receiving module 212.
  • the left frame data is decoded to obtain a left frame comprising omnidirectional data from a left point of view.
  • the right frame data is decoded to obtain a right frame comprising omnidirectional data from a right point of view.
  • the decoding may comprise decompression, decryption, other, or a combination thereof.
  • the left frame and the right frame are preferably organised as depicted by Figure 5C.
  • in step 612, data comprised by the left frame is mapped to spherical coordinates for obtaining a left spherical image.
  • Data of the left frame is in a preferred embodiment provided in a format as depicted by Figure 5C, indicating how omnidirectional data is mapped to a rectangular frame.
  • Figure 7A shows a spherical coordinate system 700 with a sphere 710 drawn in it. Referring to the first frame 550 as shown in Figure 5C and the image data cube 500 as shown by Figure 5A, the data indicated by number 5 is mapped to the top of the sphere, the data indicated by number 6 is mapped to the bottom of the sphere and the data indicated by numbers 1, 2, 3 and 4 is mapped around the side of the sphere.
  • the data is mapped such that an observer positioned in the centre of the sphere 710 would observe the image data at the inner side of the sphere; the observer would observe the scenery of which image data is captured by the left camera module 152 and/or the right camera module 154.
  • a right spherical image is generated by mapping data comprised by the right frame data to spherical coordinates in a similar way. If the data comprised by the first frame 550 is compressed or encoded otherwise, the data comprised by the left frame 550 is decoded and may be further rendered prior to mapping to spherical coordinates. Such rendering is not to be confused with rendering data for display by a display device.
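  • A minimal sketch of the mapping between a viewing direction and the cube-face layout of Figure 5C is given below. The sign conventions and the assignment of azimuth ranges to side faces are assumptions; only the numbering 1-4 for the sides, 5 for the top and 6 for the bottom is taken from the text.
```python
import numpy as np

def direction_to_face_uv(azimuth, elevation):
    """Map a viewing direction (radians) to (face number, u, v) on the cube.

    Face numbering follows Figures 5A-5C: 1-4 around the sides, 5 top,
    6 bottom; which side face covers which azimuth range is an assumption.
    """
    x = np.cos(elevation) * np.cos(azimuth)
    y = np.cos(elevation) * np.sin(azimuth)
    z = np.sin(elevation)
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:                      # top or bottom face
        face = 5 if z > 0 else 6
        u, v = x / az, y / az
    elif ax >= ay:                                 # two opposite side faces
        face = 1 if x > 0 else 3
        u, v = y / ax, z / ax
    else:                                          # the other two side faces
        face = 2 if y > 0 else 4
        u, v = x / ay, z / ay
    return face, (u + 1) / 2, (v + 1) / 2          # remap uv to [0, 1]
```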
  • observer data is obtained by the image decoding device 200.
  • the observer data comprises data on a position of a virtual observer relative to image data received.
  • the observer data may comprise a viewing direction, preferably indicated by an azimuth angle and an elevation angle, a distance between a left observation point and a right observation point, a viewing width angle, a position of the left observation point and/or the right observation point relative to the spherical coordinate system 700, a centre position of an observer which is preferably in the middle of the left observation point and the right observation point, other data or a combination thereof.
  • the left observation point and the right observation point may be considered as a left eye and a right eye of a virtual observer.
  • the observer data may be comprised by data received earlier by means of the data receiving module. Alternatively or additionally, the observer data is obtained from a position sensing module 260.
  • the position sensing module 260 is in this embodiment comprised by the personal display device 250. This is advantageous as the observer data is used for providing a view to a wearer of the personal display device 250 depending on a position of the personal display device 250.
  • the observer data is obtained from the position sensing module 260 by means of an auxiliary input module 230. Certain observer data may also be pre-determined.
  • in step 618, information on a left observer position 722 is derived from the observer data.
  • in step 620, information on a right observer position 724 is derived from the observer data.
  • the positions are preferably indicated as locations in the spherical coordinate system 700. Data on the left observer position 722 and the right observer position 724 may be directly derived as such from the observer data.
  • the observer data provides information on a centre point of an observer, distance between the left observer position 722 and the right observer position 724 - the interpupillary distance or IPD - and an inclination of a line through the left observer point and the right observer point. From this information, the left observer point and the right observer point may be derived as well.
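  • A sketch of how the left and right observer positions can be derived from the centre point, the IPD and the viewing direction is given below. Placing the eye baseline perpendicular to the viewing direction and tilting it by the inclination angle is an assumption consistent with the human-like view described further on.
```python
import numpy as np

def observer_positions(centre, ipd, azimuth, inclination=0.0):
    """Derive the left observer position 722 and right observer position 724
    (steps 618 and 620) from the observer centre point, the interpupillary
    distance (IPD), the viewing azimuth and the inclination of the line
    through both eyes (angles in radians)."""
    centre = np.asarray(centre, dtype=float)
    view = np.array([np.cos(azimuth), np.sin(azimuth), 0.0])   # horizontal viewing direction
    up = np.array([0.0, 0.0, 1.0])
    right_dir = np.cross(view, up)                             # unit vector towards the right eye
    # tilt the eye baseline by the inclination angle around the viewing direction
    right_dir = np.cos(inclination) * right_dir + np.sin(inclination) * np.cross(view, right_dir)
    left_pos = centre - 0.5 * ipd * right_dir
    right_pos = centre + 0.5 * ipd * right_dir
    return left_pos, right_pos

# Example: observer at the sphere centre, 64 mm IPD, looking along the x-axis
left_722, right_724 = observer_positions((0.0, 0.0, 0.0), 0.064, azimuth=0.0)
```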
  • some observer data may be pre-determined.
  • the pre-determined data may be fixed in the system or received by means of user input.
  • a pre-determined IPD may be received.
  • a smaller IPD may be received to enable a viewer to perceive displayed image as being smaller than in real life.
  • a larger IPD may be received to enable a viewer to perceive displayed image as being larger than in real life.
  • the observation data may also comprise azimuth and elevation angle data.
  • a left view vector 732 and a right view vector 734 are determined.
  • the left view vector 732 starts from the left observer position 722 and the right view vector 734 starts from the right observer position 724.
  • the view vectors indicate a viewing direction.
  • the view vectors are parallel.
  • the view vectors may be provided under an angle. Parallel view vectors would indicate a human-like view, whereas view vectors having been provided under an angle would result in generating for example a deer-like view.
  • the left view has a left viewing angle related to it and the right view has a right viewing angle related to it.
  • view data is determined as a part of image data mapped to the sphere 700 that coincides with a cone defined by the observer position as the apex, the viewing direction and the viewing angle.
  • for the left view, data comprised by the left spherical image is used and for the right view, data comprised by the right spherical image is used.
  • left view data is defined by a left view contour 742 on the left spherical image and right view data is defined by a right view contour 744 on the right spherical image. This allows for determination of the left view in step 622 by the view determining module 220 and determination of the right view in step 624 by the view determining module 220.
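  • The cone test behind the view contours 742 and 744 can be sketched as follows; the implementation is illustrative only and not taken from the patent.
```python
import numpy as np

def in_view_cone(point_on_sphere, observer_pos, view_vector, viewing_angle):
    """Return True if a point of the spherical image lies inside the view
    cone with the observer position as its apex.

    `viewing_angle` is the full opening angle of the cone in radians.
    """
    d = np.asarray(point_on_sphere, dtype=float) - np.asarray(observer_pos, dtype=float)
    d /= np.linalg.norm(d)
    v = np.asarray(view_vector, dtype=float)
    v /= np.linalg.norm(v)
    return float(np.dot(d, v)) >= np.cos(viewing_angle / 2.0)
```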
  • Figure 7A only shows the sphere 700.
  • the sphere 700 indicates the left spherical image as well as the right spherical image.
  • Figure 7B shows a top view of the left spherical image, shown by a dash-dot circle 712 and of the right spherical image, shown by a dotted circle 714.
  • Figure 7B also shows a dash-dot triangle 752 indicating a left view and a dotted triangle 754 indicating a right view. Both circles are provided with the same centre point.
  • the left observation point 722 is provided left from the centre of the dash-dot circle 712 indicating the left spherical image.
  • the right observation point is provided right from the centre of the dotted circle 714 indicating the right spherical image.
  • the procedure continues by generating left view data in step 626 and right view data in step 628.
  • Data in the left view contour 742 is captured and provided in a left view renderable image object.
  • Data in the right view contour 744 is captured and provided in a right view renderable image object, of which data may be further rendered for display on a display device.
  • the left view renderable image object and the right view renderable image object are rendered for display in step 630 by means of the rendering module 222.
  • the rendered image data is subsequently provided to the personal display device 250 for display by the right display 252 for displaying a right image to a right eye of a user and by the left display 254 for displaying a left image to a left eye of a user.
  • the procedure ends in a terminator 634.
  • data is provided to a projection device for projection on a screen or to a single display. In these scenarios, both left view data and right view data reach both eyes of a viewer of the data.
  • Preferably, left view data only reaches the left eye and right view data only reaches the right eye.
  • This may be enabled by providing left view data with left marker data and the right view data with right marker data.
  • Marker data may be added to the view data by means of the view determining module 220.
  • marker data may be applied by means of polarisation filters upon display of the data.
  • the rendered data with the marker data may be projected on one or more rectangular screens.
  • data may be projected on a dome-shaped screen.
  • the dome-shaped screen may have the shape of a hemisphere or even a full sphere.
  • mapping steps and the view determining steps may be replaced by a single selection step.
  • observer data is used for generating view data from the first frame 550.
  • This operation may be performed by combining a mapping algorithm with a view determination algorithm employed for determining view data from a spherical image, with the observer data as input.
  • the mapping to spherical coordinates and with that mapping, rendering of image data is performed by a first data handling module.
  • the first data handling module provides the image data - for left and right - mapped to spherical components to a second data handling module, together with observer position data and/or other observer data for the left and right mapped image data.
  • the observer position for determining left and right views is preferably placed off-centre of the spherical images. For the left spherical image, a left observer position is placed left from the centre, viewed in the viewing direction. For the right spherical image, a right observer position is placed right from the centre, viewed in the viewing direction.
  • the second data handling module subsequently determines view data, based on the mapped image data and the observer data, the latter including the observer position data.
  • the observer position data only comprises a centre position, a viewing direction and, optionally, an inter-pupillary distance (IPD).
  • the embodiment discussed directly above may be further implemented with the Oculus Rift serving as the second data handling module implemented on a computer and providing the two displays as depicted in Figure 2.
  • the first data handling module for mapping the first frame 550 to a spherical image and for providing the observer data to the second module, is one aspect provided here.
  • the first data handling module does not require any sub-module for determining a left view and a right view.
  • an image data mapping module for mapping the first frame 550 to a spherical image is preferred to be comprised by such first data handling module.
  • the various aspects and embodiments thereof relate to coding of stereoscopic omnidirectional data in a container that may be conveniently used for further coding and transmission by means of legacy technology.
  • the container may comprise image data acquired by means of multiple cameras, located at substantially the same location, of which the camera views cover substantially a full omnidirectional view. From data in the containers thus received at another side, omnidirectional views may be created for a left observation point and a right observation point, for example a pair of eyes.
  • Image spheres may be constructed based on data in the containers and a virtual viewpoint may be presented near the centres of the spheres. Alternatively, data in the containers may be mapped directly to images to be shown.
  • Observation data comprising the position of the observation points may be derived by means of a position sensor.
  • the invention may also be embodied with fewer components than provided in the embodiments described here, wherein one component carries out multiple functions.
  • the invention may also be embodied using more elements than depicted in Figure 1 , wherein functions carried out by one component in the embodiment provided are distributed over multiple components.
  • An important aspect of the invention is that the multiple image capturing devices each capture images forming an omnidirectional image data set from a certain view point of the scene to be captured and that said different omnidirectional image data sets are further mapped forming a full omnidirectional view in one process step.
  • the multiple image capturing devices need to be properly synchronized.
  • the image coding device 100 of Figure 1 comprises a power-controlling module 801. Such an embodiment is depicted in Figure 8A.
  • each image capturing device in the array can be provided with a discrete dedicated power source supply.
  • a central power-controlling module 801 in the system or device 100 facilitates the controlled power supply to each of the connected (here five pairs of) left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154, either as an additional power source supply or as an alternative for each device's own dedicated power source supply.
  • left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are for example mounted to a camera rig 400 as depicted in Figures 4A and 4B.
  • the central power-controlling module 801 is arranged in controlling and distributing power to the individual image capturing devices 402-412-422-432-152 and 404-414-424-434-154 in the array via power supply lines 802.
  • the power-controlling module 801 itself can be battery-based, wall- socket-supplied (via an external supply line 804 and wall-socket supply 803) or a combination thereof.
  • the image coding device 100 of Figure 1 may comprise a signal pulse generation module 811.
  • Said pulse generation module 811 generates a synchronization pulse 813, which functions as a generator lock or "genlock", synchronizing all (here five pairs of) left and right image capture devices (either complete cameras or parts thereof) 402-412-422-432-152 and 404-414-424-434-154 in capturing an image in perfect synchronization.
  • the synchronization pulse 813 is transferred via pulse transfer lines 812 to all left and right image capture devices and serves as an activation signal for image capturing.
  • the generator locking of the devices assures an optimal image capturing, resulting in a high-quality merging of the different captured data images during post-processing.
  • the genlock synchronisation for all connected image capturing devices is performed by a separate signal pulse generation module 811 of the image coding device 100.
  • the image capturing devices can be set in a MASTER-SLAVE arrangement, where one of the image capturing devices (for example device 152) in the array functions as a "MASTER"-device, to which the other SLAVE devices synchronize.
  • module 811 is absent and the MASTER device 152 generates the genlock pulse 813 which is transferred from said MASTER device 152 via the pulse transfer lines 812 to all other SLAVE denoted left and right image capture devices 402-412-422-432 and 404-414-424-434-154, respectively.
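  • The genlock idea can be illustrated in software with a barrier that releases all capture loops at a fixed frame interval; a real rig would use a hardware trigger line, and capture_image() below is a hypothetical stand-in for the actual sensor trigger.
```python
import threading
import time

# Software analogy of the genlock pulse 813: a master loop releases all
# capture threads once per frame interval.
N_CAMERAS, N_FRAMES, FPS = 10, 100, 25
pulse = threading.Barrier(N_CAMERAS + 1)            # all cameras plus the master

def capture_image(camera_id, frame_no):
    pass                                             # stub: trigger the real sensor here

def slave(camera_id):
    for frame_no in range(N_FRAMES):
        pulse.wait()                                 # block until the next sync pulse
        capture_image(camera_id, frame_no)

def master():
    for _ in range(N_FRAMES):
        time.sleep(1.0 / FPS)
        pulse.wait()                                 # releasing the barrier acts as the pulse

threads = [threading.Thread(target=slave, args=(i,)) for i in range(N_CAMERAS)]
for t in threads:
    t.start()
master()
for t in threads:
    t.join()
```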
  • in Figure 8B, the (here five pairs of, but not limited to five pairs of) left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are for example mounted to a camera rig 400 as depicted in Figures 4A and 4B.
  • the image coding device 100 of Figure 1 may comprise a start/stop module for controlling (starting and stopping) the image capturing of all image capturing devices in the array.
  • Said start/stop module can interact together with the "genlock" signal pulse generation module 811 of Figure 8B thus adding a further functionality to the system, in particular in regard to the simultaneously starting/image capturing/stopping of the image capturing devices.
  • if the start/stop module is operated independently from the "genlock" module, all captured data images have to be synchronized in post-processing.
  • the image coding device 100 of Figure 1 may comprise a monitoring module 821 for monitoring the current operational status of each image capturing device 402-412-422-432-152 and 404-414-424-434-154 in the array. Such embodiment is depicted in Figure 8C.
  • the monitoring functionality of the monitoring module 821 is denoted with the signal arrow 823 pointing inwards to the module 821.
  • the monitoring module 821 is capable of performing certain fault-checks and command protocols.
  • a command protocol could comprise a reboot script that is sent out from the module 821 via the command signal 822 to the failing image capturing device to make it resume the proper capturing state.
  • left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are for example mounted to a camera rig 400 as depicted in Figures 4A and 4B.
  • the image coding device 100 of Figure 1 may comprise a so-called capture-modes module.
  • the image capturing device having multiple capture-modes can be set in one of said capture-modes, which defines the current mode of image capturing operation.
  • a capture-modes module in the image coding device 100 is arranged in verifying - during operation of the device 100 and whilst performing the steps of the method according to the invention - the current or actual status of the capture-mode settings of the respective image capturing device (or even of each of the devices) prior to capturing data images, and is capable of adjusting (or re-adjusting) the capture-mode settings when required to ensure that all image capturing devices are set to the same capture-mode settings and thus capture the data images based on the same parameter settings.
  • Such an embodiment can have the same configuration and rig mounting as the embodiment shown in Figure 8C.
  • reference numeral 821 of Figure 8C denotes the capture-modes module which is capable of verifying (denoted by the inward directed signal arrow 823) the current or actual status of the capture-mode settings of the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154. Adjustment or re-adjustment of the capture-mode settings takes place by generating the suitable setting signals in the capture- modes module 821 and transmitting those signals via signal lines 822.
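  • A minimal sketch of such a verification pass is given below; the reference settings and the get_settings()/apply_settings() device hooks are hypothetical and not part of the patent.
```python
# Every device is compared against a reference capture-mode and re-adjusted
# where it deviates, so that all devices capture with the same parameters.
REFERENCE_MODE = {"frame_rate": 25, "resolution": (1920, 1080),
                  "shutter_speed": 1 / 50, "colour_saturation": 1.0}

def enforce_capture_mode(devices, reference=REFERENCE_MODE):
    for device in devices:
        current = device.get_settings()              # hypothetical device hook
        if current != reference:
            device.apply_settings(reference)         # hypothetical device hook
```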
  • the image coding device 100 implements a central docking module 831 to centrally offload the captured image data to a location outside the device 100 for further (post)processing, as opposed to extracting the image data from each individual module separately.
  • the central docking module 831 can be implemented as a standard USB-module to which the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are connected via signal lines 832.
  • the central docking USB module 831 is arranged in obtaining the captured image data from the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 (denoted with the inward directed signal arrow 833).
  • the central docking USB module 831 can be provided with a local memory module for storing the obtained captured image data and/or comprising a multiplex module (MUX) 834 for establishing a data connection with an external data interface (not shown, can be a wired or a wireless data connection) for transferring the captured image data from the central docking USB module 831 for post processing.
  • the (re)adjusting of the capture-mode settings pertains to, but is not limited to, adjusting the frame rate, the resolution, the shutter-speed, the colour-control (colour-saturation), and the light-management settings.
  • the image coding device 100 according to the invention is by default configured for stereoscopic image data capture.
  • the device is modular based, and can be reconfigured to support monoscopic capturing as well.
  • the practical implementation of this monoscopic capturing setting is that a smaller dataset is generated, allowing for quick testing of a specific shot or camera- setup.
  • the image coding device 100 is arranged in depth-mapping for stereoscopic convergence calculation based on an offset between a camera-pair.
  • the data images being captured in the array are being used to calculate the best stitching values based on post-shoot determined depth mapping of captured action.
  • the offset between the left and right data images being captured, as well as light-information and movement-information captured over time can be used to determine the distance between a subject and the focal point of the lenses of the image capturing devices. This distance can in turn be used to improve the quality of the stitching result to be performed by the stitching algorithm, and/or to speed up the processing time.
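  • One way to read this is the standard pinhole stereo relation Z = f * B / d, sketched below; the patent itself does not prescribe a specific depth-mapping formula.
```python
def depth_from_disparity(disparity_px, baseline_m, focal_length_px):
    """Textbook stereo depth: Z = focal length * baseline / disparity.

    Offered as an illustration of how the offset between the left and right
    images yields subject distance; not the patent's own algorithm.
    """
    if disparity_px <= 0:
        return float("inf")            # no measurable offset: subject at infinity
    return focal_length_px * baseline_m / disparity_px

# Example: 6.5 cm camera-pair offset, 1000 px focal length, 20 px disparity
print(depth_from_disparity(20, 0.065, 1000))   # ~3.25 m
```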
  • the image coding device 100 is arranged in determining a post-calculated depth of field. This is relevant for image capturing setups in which light-field cameras, also called plenoptic cameras, are implemented. For those setups, the image coding device 100 will be able to dynamically alter the depth of field in the reproduction of the captured image data material.
  • a combination of the above feature together with an eye-tracking feature in display devices enables real-time depth-of-field adjustment. By implementing a user-facing camera, either inside a VR-headset or as an external camera device, a position of a user's eye and eye pupil can be tracked.
  • the image coding device 100 according to the invention is in this arrangement arranged in measuring the position of both pupils of the user, in particular the interpupillary distance as well as the horizontal and vertical orientation of the pupils. Based on these measured pupil position data the image coding device 100 is arranged in determining the correct point of visual convergence for said user. This data is used by the image coding device 100 to manipulate the image data captured with the afore-mentioned post- calculated depth of field technique.
  • the image coding device 100 is arranged in calculating and displaying a depth-of-field which matches the user, which is an improvement compared to having a more common image displayed with a depth of field set to infinity.
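  • A simple symmetric-vergence approximation of the point of visual convergence is sketched below; it is an illustration only, not the calculation used by the patent.
```python
import math

def convergence_distance(ipd_m, left_inward_angle, right_inward_angle):
    """Estimate the distance of the point of visual convergence from the
    measured inward rotation of each pupil (radians), assuming both eyes
    fixate the same point straight ahead."""
    mean_angle = 0.5 * (left_inward_angle + right_inward_angle)
    if mean_angle <= 0:
        return float("inf")                        # parallel gaze: far focus
    return (ipd_m / 2.0) / math.tan(mean_angle)

# Example: 64 mm IPD, both pupils rotated 1.8 degrees inward -> roughly 1 m
print(convergence_distance(0.064, math.radians(1.8), math.radians(1.8)))
```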
  • Yet another embodiment of the camera rig of Figures 4A and 4B is disclosed in Figures 9A-9C.
  • Figure 9A relates to the mounting unit 450 of Figure 4A and 4B for mounting the omnidirectional stereoscopic camera module 400 to a tripod or a similar device envisaged for the same purpose.
  • the left and right image capture devices (either complete cameras or parts thereof), which are denoted with reference numerals 402-412-422-432-442-152 and 404-414-424-434-444-154 respectively in Figures 4A and 4B, and which are mounted to the camera rig 400, are all oriented horizontally relative to the ground level. This ensures - during operation of the camera rig - an optimally consistent horizon, which helps with establishing a proper horizontally consistent captured image dataset with all the image capturing devices mounted to the camera rig 400.
  • Known devices in this field of technology generally exhibit a diagonally oriented configuration relative to the ground level.
  • the mounting point for a tripod or other supporting system is positioned directly underneath the camera rig.
  • a drawback of such mounted orientation is that, in the resulting stitched image a black circle or other "patch" is used to cover the "hole", that is the area underneath the mounting configuration, where the tripod is positioned.
  • in a monoscopic set-up, the tripod-area can be covered in the overlapping images captured from the various devices in the camera rig and thus obscured. With a stereoscopic system, this is much harder to accomplish, and in many cases not possible.
  • the image capturing device 100 is mounted to a pentagon shaped camera rig 400.
  • the base structure of the pentagon shaped camera rig 400 forms a support plane or support face for the several image capturing devices, denoted with reference numerals 402-412-422-432-152 and 404-414-424-434-154.
  • the camera rig 400 can be constructed as any regular polygon, that is equiangular (all inner angles between adjacent sides are equal in measure) and equilateral (all sides have the same length).
  • each side of the regular polygon serves as a view point of the omnidirectional stereoscopic 360° scene to be captured.
  • Each side of the regular polygon shaped camera rig serves to accommodate a left image capturing device for capturing left images forming a substantially omnidirectional image data set representing the scene from the left point of view of that side, as well as a right image capturing device for capturing right images forming a substantially omnidirectional image data set representing the scene from the right point of view.
  • the five pentagon sides each accommodate a pair of a left image capturing device 402-412-422-432-152 and a right image capturing device 404-414-424-434-154 (in total five pairs of devices 152-154; 402-404; 412-414; 422-424; and 432-434).
  • configurations in which the camera rig has a hexagon (6 sides), heptagon (7 sides), octagon (8 sides), decagon (10 sides), etc. shape are also feasible, implementing six, seven, eight, ten and more pairs of left and right image capturing devices (in total 12, 14, 16, 20, etc.).
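  • The geometry of such a regular-polygon rig can be sketched as below, placing one camera pair at the midpoint of each side; the midpoint placement and the rig radius are assumptions, since the text only requires equiangular, equilateral sides.
```python
import math

def rig_viewpoints(n_sides, circumradius_m):
    """Outward-facing viewpoint per side of a regular n-gon camera rig.

    Returns (x, y, azimuth) for the midpoint of each side; each side would
    carry one left/right camera pair.
    """
    apothem = circumradius_m * math.cos(math.pi / n_sides)   # centre-to-side distance
    views = []
    for i in range(n_sides):
        azimuth = 2.0 * math.pi * i / n_sides                # outward normal of side i
        views.append((apothem * math.cos(azimuth),
                      apothem * math.sin(azimuth),
                      azimuth))
    return views

# Pentagon rig as in Figure 9A: five camera pairs, 72 degrees apart
for x, y, az in rig_viewpoints(5, 0.15):
    print(f"pair at ({x:.3f}, {y:.3f}) m, facing {math.degrees(az):.0f} degrees")
```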
  • a camera rig 400 is in this embodiment of a (rectangular) cuboid shape consisting of multiple side walls 400a.
  • Each side wall 400a has multiple openings 400' and studs 400" which function as mounting means for mounting the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 in a manner as shown in Figures 4A and 4B.
  • the camera rig 400 can also be constructed as a cube having side walls 400a of a similar size and dimension.
  • the camera rig 400 is of a lightweight build, as each side wall 400a is provided with an opening 400"', thereby obtaining a weight reduction.
  • the camera rig 400 as shown in Figures 9A-9C has an open light weight structure, which improves handling and an easy set up. Also the light weight will not adversely affect the operation of the image capturing device 100.
  • the camera rig 400 comprises a mounting unit 450 connected to the rig 400 at a (cube) corner, which extends - when the rig 400 is connected to a stand, tripod or other base - at a diagonal angle relative to the horizontal orientation of the rig 400. This angle falls between the multiple capture (viewing) areas of the various image capturing devices 402-412-422-432-152 and 404-414-424-434-154 mounted in the rig.
  • the mounting unit 450 has an elongated body element 456, which connects via a support part 457 to the rig 400.
  • The elongated body element 456 has a longitudinal axis with an angled orientation with respect to a horizontal plane (formed by a longitudinal axis and a transversal axis) of the camera rig 400. This is shown in Figures 9B and 9C.
  • the elongated body element 456 is provided with an inner bore or opening 455 for accommodating a mating element of a stand, tripod or other base.
  • the inner bore 455 has a specific inner geometry which interacts (mates) with a mating element of the stand, tripod or other base having a corresponding outer geometry.
  • the mounting unit 450 (the elongated body element 456) is directed under an angle α with respect to a (horizontally oriented) longitudinal axis of the camera rig 400. As shown in Figure 9B the mounting unit 450 is orientated in the same orientation as the longitudinal axis of the rig 400 but skewed under an angle α. As shown in Figure 9C the mounting unit 450 (the elongated body element 456) is also directed under an angle β with respect to a (also horizontally oriented) transversal axis (perpendicular to the longitudinal axis) of the camera rig 400. As shown in Figure 9C the mounting unit 450 is orientated in the same orientation as the transversal axis of the rig 400 but skewed under an angle β.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The various aspects and embodiments thereof relate to coding of stereoscopic omnidirectional data in a container that may be conveniently used for further coding and transmission by means of legacy technology. The container may comprise image data acquired by means of multiple cameras, located at substantially the same location, of which the camera views cover substantially a full omnidirectional view. From data in the containers thus received at another side, omnidirectional views may be created for a left observation point and a right observation point, for example a pair of eyes. Image spheres may be constructed based on data in the containers and a virtual viewpoint may be presented near the centres of the spheres. Alternatively, data in the containers may be mapped directly to images to be shown. Observation data comprising the position of the observation points may be derived by means of a position sensor.

Description

ENCODING AND DECODING OF THREE-DIMENSIONAL IMAGE DATA
TECHNICAL FIELD
The various aspects relate to encoding and decoding of three-dimensional image data, for rendering and display of data enabling a user to perceive a three-dimensional view of a scenery.
BACKGROUND
Presenting a three-dimensional view of a scenery to a user is known for example in cinemas, where people are presented with spectacles with polarised lenses. This allows a right view to be delivered to the right eye and the left view to the left eye. Data for delivery is acquired by a two-way camera system, used for filming.
Presentation may also be provided by means of a two-display system that may be worn over the head of a user. By sensing movements of the head, the image provided may be different, in accordance with a new position of the head.
In the International patent application no. WO2012/166593 a system and method for capturing video of a real-world scene is disclosed enabling the creation of a navigable, panoramic three-dimensional virtual reality environment. Two optical elements are used, each for imaging different viewpoints of the real-world scene, and combining them in a stereoscopic view. The optical elements may be provided in the form of parabolic reflectors, or elements spaced equidistant around a circle, however such optical set-ups are not capable of generating an omnidirectional image.
In US patent application no. 2013/0201296A1 a multi-camera head is disclosed, comprising a head frame and a plurality of stereo cameras mounted to the head frame and arranged around an axis. On top of the head frame a stereo camera is mounted. US patent application no. 2007/0201296A1 relates to a memory arrangement including an interface configured to transmit data in the form of data packets according to a predefined protocol.
SUMMARY OF THE INVENTION
It would be advantageous to provide a method of providing stereoscopic image data for constructing a stereoscopic image of a scene, being an actual scene, based on omnidirectional data acquired by means of cameras. A method of constructing a stereoscopic view of the scene, based on the stereoscopic image data provided, would be appreciated as well.
A first aspect provides a method of providing stereoscopic image data for constructing a stereoscopic image of a scene. The method comprises receiving a first multitude of left images from at least one left image capturing device, the captured left images forming a substantially omnidirectional image data set representing the scene from a left point of view and receiving a second multitude of right captured images from at least one right image capturing device, the captured right images forming a substantially omnidirectional image data set representing the scene from a right point of view. The left captured images are mapped in a left frame comprising left compound image data and the right images are mapped in a right frame comprising right compound image data. The left frame and the right frame are communicated.
A second aspect provides a method of constructing a stereoscopic view of a scene. The method comprises receiving a left frame comprising data representing a substantially omnidirectional image data set representing the scene from a left point of view, receiving a right frame comprising data representing a substantially omnidirectional image data set representing the scene from a right point of view and receiving virtual observer data comprising data on a virtual observer position relative to the scene. Based on the virtual observer position, left view data comprised by the left frame is determined and based on the virtual observer position, right view data comprised by the right frame is determined. The left view data and the right view data as stereoscopic view data of the scene are provided to a display arrangement.
A third aspect provides a computer programme product comprising computer executable code enabling a computer programmed with the computer executable code to perform the method according to the first aspect.
A fourth aspect provides a computer programme product comprising computer executable code enabling a computer programmed with the computer executable code to perform the method according to the second aspect.
A fifth aspect provides a device for providing stereoscopic image data for constructing a stereoscopic image of a scene. The device comprises a data input module arranged to receive a first multitude of left images from at least one left image capturing device, the captured left images forming a substantially omnidirectional image data set representing the scene from a left point of view and to receive a second multitude of right captured images from at least one right image capturing device, the captured right images forming a substantially omnidirectional image data set representing the scene from a right point of view. The device further comprises a processing unit arranged to map the left captured images in a left frame comprising left compound image data and map the right images in a right frame comprising right compound image data. The device also comprises a data communication module arranged to communicate the left frame and the right frame.
A sixth aspect provides a device for constructing a stereoscopic view of a scene. The device comprises a data input module arranged to receive a left frame comprising data representing a substantially omnidirectional image data set representing the scene from a left point of view, receive a right frame comprising data representing a substantially omnidirectional image data set representing the scene from a right point of view and receive virtual observer data comprising data on a virtual observer position relative to the scene. The device further comprises a processing unit arranged to, based on the virtual observer position, determine left view data comprised by the left frame, and, based on the virtual observer position, determine right view data comprised by the right frame. The device also comprises a data communication module arranged to provide the left view data and the right view data as stereoscopic view data of the scene to a display arrangement.
BRIEF DESCRIPTION OF THE FIGURES
The various aspects and embodiments thereof will now be discussed in conjunction with Figures. In the Figures,
Figure 1 : shows a device for encoding image data;
Figure 2: shows a device for decoding image data and constructing three-dimensional view data;
Figure 3: shows a first flowchart;
Figure 4A: shows a camera rig;
Figure 4B: shows a three-dimensional view of the camera rig;
Figure 5A: shows a view mapping cube;
Figure 5B: shows a plan of the view mapping cube;
Figure 5C: shows a frame comprising three-dimensional image data;
Figure 6: shows a second flowchart;
Figure 7A: shows an image sphere;
Figure 7B: shows a top view of the image sphere;
Figures 8A-8D: show other embodiments of a device mounted to a camera rig according to the invention; and
Figures 9A-9C: show other embodiments of a camera rig according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 shows an image coding device 100. The image coding device 100 is coupled to a left image capturing device (camera module) 152 and a right image capturing device (camera module) 154. The image coding device 100 comprises a data input unit 112, a first buffer 114, a stitching module 116, a second buffer 118, an encoder module 120, a third buffer 122, a data output unit 124, an encoding processing module 102 and an encoding memory module 104.
The encoding processing module 102 is arranged for controlling the various components of the image coding device 100. In another embodiment, all functions carried out by the various modules depicted in Figure 1 are carried out by an encoding processing module specifically programmed to carry out the specific functions. The encoding processing module 102 is coupled to the encoding memory module 104. The encoding memory module 104 is arranged for storing data received by means of the data input unit 112, data processed by the various components of the image coding device 100 and computer executable code enabling the various components of the image coding device 100 to execute various methods as discussed below.
The functionality of the various components will be further elucidated in conjunction with flowcharts that will be discussed below. Optional intermediate buffers are provided between various components to enable smooth transfer of data.
Figure 2 shows an image decoding device 200. The image decoding device 200 is coupled to a personal display device 250 comprising a right display 252 for displaying a right image to a right eye of a user and a left display 254 for displaying a left image to a left eye of a user.
The image decoding device 200 comprises a data receiving module 212, a fourth buffer 214, a decoding module 216, a data mapping module 218, a view determining module 220, a rendering module 222 and a data output module 224. The image decoding device further comprises a decoding processing module 202 and a decoding memory module 204.
The decoding processing module 202 is arranged for controlling the various components of the image decoding device 200. In another embodiment, all functions carried out by the various modules depicted in Figure 2 are carried out by a decoding processing module specifically programmed to carry out the specific functions. The decoding processing module 202 is coupled to the decoding memory module 204. The decoding memory module 204 is arranged for storing data received by means of the data receiving module 212, data processed by the various components of the image decoding device 200 and computer executable code enabling the various components of the image decoding device 200 to execute various methods as discussed below.
Figure 3 shows a first flowchart 300 for encoding image data. The method depicted by the first flowchart 300 may be executed by the image coding device 100. The list below provides a short summary of the components of the first flowchart 300.
302 start procedure
304 Receive left image data
306 receive right image data
308 stitch left image data
310 stitch right image data
312 encode left image data
314 encode right image data
316 prepare image data for transfer
318 send image data
320 end procedure
The process starts in a terminator 302. This is preferably combined with or performed after initialization of the image coding device 100. Subsequently, the image coding device 100 receives a left image from the left image capturing device or left camera module 152 in step 304 and a right image from the right image capturing device or right camera module 154 in step 306. The images are received by means of the data input unit 112 and subsequently stored in the first buffer 114 for transmission to the stitching module 116. The left image capturing device or left camera module 152 and the right image capturing device or right camera module 154 are arranged to provide substantially omnidirectional image data of a scenery. Omnidirectional may be interpreted as 360 degrees around a camera module, with a pre-determined angle defining an upper limit and a lower limit. Such angle may have a value of 90 degrees. Alternatively, omnidirectional may be interpreted as 360 degrees around a camera module, with no upper and lower limit. This means the pre-determined viewing angle is 180 degrees. Such omnidirectional image acquisition is also known as 360-360 or full-spherical data acquisition.
To obtain omnidirectional image data of a scenery, the left image capturing device or left camera module 152 and the right image capturing device or right camera module 154 preferably comprise six or more cameras each. It will be clear that for each left or right image capturing device (camera module) any image sensing or image capturing device can be implemented, such as a camera module (e.g. light-field cameras, also called plenoptic cameras, CCD cameras, night vision cameras, EBCMOS cameras, etc.) or a photo sensor, all capable of capturing images in wavelength domains both visible and invisible to the human eye.
And in an even more preferred embodiment, the left camera module 152 and the right camera module 154 are integrated. This is depicted by Figure 4A and Figure 4B. Figure 4A shows a side view of an omnidirectional stereoscopic camera module 400. Figure 4B shows a three-dimensional view of the omnidirectional stereoscopic camera module 400. The omnidirectional stereoscopic camera unit 400 comprises a first left camera 402, a second left camera unit 412, a third left camera unit 422, a fourth left camera unit 432, a fifth left camera unit 442 and a sixth left camera module as the left camera module 152. The omnidirectional stereoscopic camera unit 400 further comprises a first right camera 404, a second right camera unit 414, a third right camera unit 424, a fourth right camera unit 434, a fifth right camera unit 444 and a sixth right camera module as the right camera module 154. The omnidirectional stereoscopic camera module 400 comprises a mounting unit 450 for mounting the omnidirectional stereoscopic camera module 400 to a tripod or a similar device envisaged for the same purpose.
In this embodiment, the camera modules comprise six cameras each. Alternatively, each camera module comprises only one camera. Single omnidirectional cameras, however, are exotic devices. For 360 degree data acquisition, catadioptric cameras are an option. When omnidirectional is truly 360-360 omnidirectional, preferably at least two cameras are used for the left camera module 152 and two for the right camera module 154. In such a scenario, fish eye cameras with a viewing angle of over 180 degrees may be used. But in this preferred embodiment, six cameras are used per camera module, with a substantially square angle between adjacent cameras. A viewing angle of ninety degrees per camera is in theory sufficient for capturing an omnidirectional image, though a somewhat larger viewing angle is preferred to create some overlap. This theory, however, departs from an assumption that images are captured from one and the same position, with focus points of the cameras being located at one and the same position. Therefore, in practice, a slightly larger viewing angle of the cameras is preferred when using the set-up as depicted by Figure 4A and Figure 4B. When using cameras with a viewing angle of 120 degrees, using four cameras is sufficient, where each camera is positioned under an angle of 90 degrees in relation to each adjacent camera.
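Purely as an illustration of this coverage reasoning - not as part of the disclosed method - the short Python sketch below checks whether N equally spaced cameras on a horizontal ring cover a full 360 degrees with some stitching overlap; the 10 degree default overlap margin is an assumed example value.

    def ring_coverage_ok(num_cameras, camera_fov_deg, overlap_deg=10.0):
        # N equally spaced cameras are 360/N degrees apart; each camera must see
        # at least that far plus an overlap margin so adjacent images can be stitched.
        required_fov = 360.0 / num_cameras + overlap_deg
        return camera_fov_deg >= required_fov

    print(ring_coverage_ok(4, 90.0))         # False: 90 degree cameras leave no overlap
    print(ring_coverage_ok(4, 100.0))        # True: roughly 10 degrees of overlap per seam
    print(ring_coverage_ok(3, 120.0, 0.0))   # True: three 120 degree cameras just cover the ring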
If the omnidirectional image data is acquired by multiple camera units, a first image data set is formed for all left image data and a second image data set is formed for all right image data. Along with the acquisition of image data - visual and audio data - further data may be acquired as well. For example, location data may be acquired by means of a GPS (Global Positioning System) data reception unit.
Per side - left or right - six images are acquired from each camera module. In this embodiment, image data comprises visual data and may comprise audio data as well. Each of the cameras of the camera modules may be equipped with a microphone for capturing audio data. If that is the case, the audio data is coupled to visual data acquired by the same camera. For a significant amount of the processing steps discussed in this description, audio data is processed in a way similar to visual data. Where this is not the case, it is explained in further detail below with respect to audio data.
Figure 5A shows an image data cube 500. Figure 5B shows a plan of the image data cube 500. The image data cube visualises how image data is captured. Assume the image data cube 500 is acquired for the left view, depicting the first image data set. At the top, indicated by number 5, data is acquired by the fifth left camera unit 442. At the front, indicated by number 1, data is acquired by the first left camera unit 402. Number 2 corresponds to the third left camera unit 422, number 3 to the fourth left camera unit 432, number 4 to the second left camera unit 412, number 5 to the fifth left camera unit 442 and number 6 to the sixth left camera unit 452.
Most data transfer protocols and multimedia transfer protocols in particular are suitable for transfer of visual data in frames. To efficiently transfer the image data acquired, image data and the visual image data in particular is consolidated in a single frame for each data set. The first image data set is consolidated in a first frame and the second image data set is consolidated in a second frame.
Figure 5C shows the first frame 550. In the first frame 550, regions are indicated where data of the image data cube 500 is mapped to. Adjacent regions of the first frame 550 comprise data of adjacent sides of the image data cube 500. This is preferably done using a so-called stitching algorithm. As indicated above, the left camera units of the left camera module 152 have a viewing angle larger than 90 degrees. This results in an overlap of data captured by adjacent left camera units. This overlap in data may be used for stitching image data acquired by the left camera units into the first frame 550. This results in the first frame 550 as a left image frame comprising substantially omnidirectional image data.
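As a minimal sketch of how six equally sized face images of the image data cube 500 could be packed into one compound frame - the four side faces across the middle, one face above and one below, in line with the layout of Figure 5C - consider the Python fragment below; the exact column chosen for the top and bottom faces is an assumption made for illustration only.

    import numpy as np

    def pack_cube_faces(faces, side):
        # faces: list of six numpy arrays of shape (side, side, 3), ordered as the
        # sides 1-4 of the image data cube followed by the top (5) and bottom (6).
        frame = np.zeros((3 * side, 4 * side, 3), dtype=faces[0].dtype)
        for i in range(4):
            # faces 1-4 fill the middle row of the compound frame
            frame[side:2 * side, i * side:(i + 1) * side] = faces[i]
        frame[0:side, 0:side] = faces[4]             # top face above the middle row
        frame[2 * side:3 * side, 0:side] = faces[5]  # bottom face below the middle row
        return frame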
The first frame 550 is obtained in step 308; its right counterpart is obtained in step 310. The first frame 550 and its right counterpart are provided by the stitching module 116. In this way, a frame is provided comprising compound image data, with the images acquired by the multitude of left camera units stitched into the first frame 550. Thus, the first frame 550 comprises image data from a left point of view, acquired at a specific position. The frames obtained by stitching are stored in the second buffer 118 for transmittal to the encoder module 120.
This procedure of providing a compound image applies to image data acquired at one specific moment, allowing the first frame 550 to be obtained without requiring additional data acquired before or after the data for the first frame 550 was acquired. This enables real-time processing of the image data.
If a video stream is to be provided, the video stream comprising omnidirectional data, multiple frames may be formed as discussed above by mapping data acquired by camera units as shown in Figure 5C. Subsequently, the frames are encoded by the encoder module 120. The encoding may comprise compressing, encrypting, other, or a combination thereof. The compression may be inter-frame, for example according to an MPEG encoding algorithm, or intra-frame, for example according to the JPEG encoding algorithm. The left compound frame is encoded in step 312 and the right compound frame is encoded in step 314. The encoded video stream may be incorporated in a suitable container, like a Matroska container.
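Purely as a hedged illustration - the tool, codec and options below are assumptions and not part of this disclosure - a stitched left and right frame sequence could be compressed inter-frame and wrapped in a Matroska container by driving ffmpeg from Python:

    import subprocess

    # Encode each stitched image sequence into an H.264 stream inside a Matroska
    # container; the file naming and frame rate are example values only.
    for side in ("left", "right"):
        subprocess.run([
            "ffmpeg", "-framerate", "30", "-i", f"{side}_%05d.png",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", f"{side}_frames.mkv",
        ], check=True)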
The audio data is preferably provided in six pairs of audio streams. Alternatively, only one instead of two audio streams is acquired per camera unit. In that case, six audio streams are acquired. The audio data may be compressed and provided with each encoded video stream. In step 316, the encoded video streams may be further processed for transfer of the video data. The further processing may comprise embedding the video data together with audio data in a transport stream for further transport according to the DVB protocol. Subsequently, in step 318, the transport stream is sent out to another device by the data output unit 124. Having sent out the transport stream, the procedure ends in a terminator 320. Embedding the video and audio streams - the visual and audio data - in a transport stream is optional. If the data is directly sent to another device, further embedding may not be required. If the data were to be sent as part of a broadcasting service, embedding in a DVB stream would be advantageous.
Figure 6 shows a second flowchart 600 depicting a procedure for reconstructing a three-dimensional view of a scenery, based on data captured earlier and received. The data used as input for the procedure depicted by the second flowchart 600 may be provided in accordance with the procedure depicted by the first flowchart 300 or a similar procedure. The procedure may be executed by the image decoding device 200 depicted by Figure 2. The list below provides a short summary of the components of the second flowchart 600.
602 start procedure
604 receive left frame
606 receive right frame
608 decode left frame
610 decode right frame
612 map left frame to spherical coordinates
614 map right frame to spherical coordinates
616 receive observer data
618 receive left observer position
620 receive right observer position
622 determine left view
624 determine right view
626 generate left view data
628 generate right view data
630 render view data
632 provide rendered data
634 end
The procedure starts in a terminator 602 and continues with reception of left frame data in step 604 and reception of right frame data in step 606. The left frame data and the right frame data may be received directly via the data receiving module 212. Alternatively, the left frame data and the right frame data are obtained from a transport stream obtained by the data receiving module 212.
In step 608, the left frame data is decoded to obtain a left frame comprising omnidirectional data from a left point of view. Likewise, in step 610, the right frame data is decoded to obtain a right frame comprising omnidirectional data from a right point of view. The decoding may comprise decompression, decryption, other, or a combination thereof. The left frame and the right frame are preferably organised as depicted by Figure 5C.
In step 612, data comprised by the left frame is mapped to spherical coordinates for obtaining a left spherical image. Data of the left frame is in a preferred embodiment provided in a format as depicted by Figure 5C, indicating how omnidirectional data is mapped to a rectangular frame. Figure 7A shows a spherical coordinate system 700 with a sphere 710 drawn in it. Referring to the first frame 550 as shown in Figure 5C and the image data cube 500 as shown by Figure 5A, the data indicated by number 5 is mapped to the top of the sphere, the data indicated by number 6 is mapped to the bottom of the sphere and the data indicated by numbers 1, 2, 3 and 4 is mapped around the side of the sphere. The data is mapped such that an observer positioned in the centre of the sphere 710 would observe the image data at the inner side of the sphere; the observer would thus observe the scenery of which image data was captured by the left camera module 152 and/or the right camera module 154. Likewise, a right spherical image is generated by mapping data comprised by the right frame data to spherical coordinates in a similar way. If the data comprised by the first frame 550 is compressed or encoded otherwise, the data comprised by the left frame 550 is decoded and may be further rendered prior to mapping to spherical coordinates. Such rendering is not to be confused with rendering data for display by a display device.
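A minimal Python sketch of the underlying lookup - given a direction on the sphere, decide which side of the image data cube 500 supplies the pixel and where on that side - is shown below; the axis convention and the pairing of face numbers with axes are assumptions made for illustration only.

    import numpy as np

    def direction_to_face_uv(azimuth, elevation):
        # Convert the viewing direction (radians) to a unit vector.
        x = np.cos(elevation) * np.cos(azimuth)
        y = np.cos(elevation) * np.sin(azimuth)
        z = np.sin(elevation)
        ax, ay, az = abs(x), abs(y), abs(z)
        # The dominant axis selects the cube face; the remaining two components,
        # divided by the dominant one, give coordinates on that face.
        if az >= ax and az >= ay:
            face = 5 if z > 0 else 6      # top or bottom face
            u, v = x / az, y / az
        elif ax >= ay:
            face = 1 if x > 0 else 3      # front or back face (assumed numbering)
            u, v = y / ax, z / ax
        else:
            face = 2 if y > 0 else 4      # right or left face (assumed numbering)
            u, v = x / ay, z / ay
        return face, (u + 1) / 2, (v + 1) / 2   # (u, v) rescaled to [0, 1]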
In step 616, observer data is obtained by the image decoding device 200. The observer data comprises data on a position of a virtual observer relative to image data received. The observer data may comprise a viewing direction, preferably indicated by an azimuth angle and an elevation angle, a distance between a left observation point and a right observation point, a viewing width angle, a position of the left observation point and/or the right observation point relative to the spherical coordinate system 700, a centre position of an observer which is preferably in the middle of the left observation point and the right observation point, other data or a combination thereof.
The left observation point and the right observation point may be considered as a left eye and a right eye of a virtual observer. The observer data may be comprised by data received earlier by means of the data receiving module. Alternatively or additionally, the observer data is obtained from a position sensing module 260. The position sensing module 260 is in this embodiment comprised by the personal display device 250. This is advantageous as the observer data is used for providing a view to a wearer of the personal display device 250 depending on a position of the personal display device 250. The observer data is obtained from the position sensing module 260 by means of an auxiliary input module 230. Certain observer data may also be pre-determined.
In step 618, information on a left observer position 722 is derived from the observer data. In step 620, information on a right observer position 724 is derived from the observer data. The positions are preferably indicated as locations in the spherical coordinate system 700. Data on the left observer position 722 and the right observer position 724 may be directly derived as such from the observer data. Alternatively, the observer data provides information on a centre point of an observer, a distance between the left observer position 722 and the right observer position 724 - the interpupillary distance or IPD - and an inclination of a line through the left observer point and the right observer point. From this information, the left observer point and the right observer point may be derived as well. As discussed, some observer data may be pre-determined. The pre-determined data may be fixed in the system or received by means of user input. For example, a pre-determined IPD may be received. A smaller IPD may be received to enable a viewer to perceive a displayed image as being smaller than in real life. And a larger IPD may be received to enable a viewer to perceive a displayed image as being larger than in real life.
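A minimal sketch of this derivation - placing the left and right observation points on a line through the centre point, separated by the IPD and perpendicular to the viewing direction - is given below; a level, human-like head orientation with the z-axis as up is assumed here.

    import numpy as np

    def observer_points(centre, ipd, viewing_direction, up=(0.0, 0.0, 1.0)):
        # Normalise the viewing direction and derive the rightward direction.
        centre = np.asarray(centre, dtype=float)
        forward = np.asarray(viewing_direction, dtype=float)
        forward /= np.linalg.norm(forward)
        rightward = np.cross(forward, np.asarray(up, dtype=float))
        rightward /= np.linalg.norm(rightward)
        # The two observation points sit half an IPD to either side of the centre.
        left_point = centre - rightward * (ipd / 2.0)
        right_point = centre + rightward * (ipd / 2.0)
        return left_point, right_point

    # Example: 6.4 cm IPD, looking along the x-axis from the origin.
    print(observer_points((0, 0, 0), 0.064, (1, 0, 0)))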
As indicated, the observation data may also comprise azimuth and elevation angle data. With this data, a left view vector 732 and a right view vector 734 are determined. The left view vector 732 starts from the left observer position 722 and the right view vector 734 starts from the right observer position 724. The view vectors indicate a viewing direction. Preferably, the view vectors are parallel. Alternatively, the view vectors may be provided under an angle. Parallel view vectors would indicate a human-like view, whereas view vectors provided under an angle would result in, for example, a deer-like view.
The left view has a left viewing angle related to it and the right view has a right viewing angle related to it. With an observer position, a viewing direction and a viewing angle, view data is determined as the part of the image data mapped to the sphere 710 that coincides with a cone defined by the observer position as the apex, the viewing direction and the viewing angle. For the left view, data comprised by the left spherical image is used and for the right view, data comprised by the right spherical image is used. In this way, left view data is defined by a left view contour 742 on the left spherical image and right view data is defined by a right view contour 744 on the right spherical image. This allows for determination of the left view in step 622 by the view determining module 220 and determination of the right view in step 624 by the view determining module 220.
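The cone test itself reduces to a simple angular comparison, as in the Python sketch below for a single point on the image sphere; treating the viewing width angle as the full opening angle of the cone is an assumption for illustration.

    import numpy as np

    def in_view_cone(sphere_point, observer_position, view_direction, view_angle_deg):
        # Angle between the cone axis and the direction from the observer to the point.
        to_point = np.asarray(sphere_point, dtype=float) - np.asarray(observer_position, dtype=float)
        to_point /= np.linalg.norm(to_point)
        axis = np.asarray(view_direction, dtype=float)
        axis /= np.linalg.norm(axis)
        angle = np.degrees(np.arccos(np.clip(np.dot(to_point, axis), -1.0, 1.0)))
        # Inside the cone when the angle stays within half the viewing width angle.
        return angle <= view_angle_deg / 2.0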
For the avoidance of doubt, Figure 7A only shows the sphere 710. The sphere 710 indicates the left spherical image as well as the right spherical image. To elucidate this further, Figure 7B shows a top view of the left spherical image, shown by a dash-dot circle 712, and of the right spherical image, shown by a dotted circle 714. Figure 7B also shows a dash-dot triangle 752 indicating a left view and a dotted triangle 754 indicating a right view. Both circles are provided with the same centre point. The left observation point 722 is provided left from the centre of the dash-dot circle 712 indicating the left spherical image. The right observation point is provided right from the centre of the dotted circle 714 indicating the right spherical image.
Having determined the views by means of the view determining module 220, the procedure continues by generating left view data in step 626 and right view data in step 628. Data in the left view contour 742 is captured and provided in a left view renderable image object. Data in the right view contour 744 is captured and provided in a right view renderable image object, of which data may be further rendered for display on a display device. These steps of generating the view data may be executed by the view determining module 220 as well.
Subsequently, the left view renderable image object and the right view renderable image object are rendered for display in step 630 by means of the rendering module 222. The rendered image data is subsequently provided to the personal display device 250 for display by the right display 252 for displaying a right image to a right eye of a user and by the left display 254 for displaying a left image to a left eye of a user. Subsequently, the procedure ends in a terminator 634. Alternatively, data is provided to a projection device for projection on a screen or to a single display. In these scenarios, both left view data and right view data reaches both eyes of a viewer of the data. To still provide the viewer with a three-dimensional viewing experience, other means may be employed to have left view data only reach the left eye and right view data only reach the right eye. This may be enabled by providing left view data with left marker data and the right view data with right marker data. Marker data may be added to the view data by means of the view determining module 220.
Alternatively or additionally, marker data may be applied by means of polarisation filters upon display of the data. The rendered data with the marker data may be projected on one or more rectangular screens. Alternatively, data may be projected on a dome-shaped screen. The dome-shaped screen may have the shape of a hemisphere or even a full sphere. When projecting on a hemisphere and in particular on a sphere, the procedure as discussed in conjunction with the second flowchart 600 may be employed as well, with a viewing angle of 360 degrees.
As an alternative to the procedure discussed in conjunction with the second flowchart 600, the mapping steps and the view determining steps may be replaced by a single selection step. In this single selection step, observer data is used for generating view data directly from the first frame 550. This operation may be performed by combining a mapping algorithm with a view determination algorithm employed for determining view data from a spherical image, with the observer data as input.
Various parts of the procedure of the second flowchart 600 may be carried out by different components. This may require that steps are carried out in a different order. In a further embodiment, the mapping to spherical coordinates and, with that mapping, the rendering of image data are performed by a first data handling module. The first data handling module provides the image data - for left and right - mapped to spherical coordinates to a second data handling module, together with observer position data and/or other observer data for the left and right mapped image data. As discussed above, the observer position for determining left and right views is preferably placed off-centre of the spherical images. For the left spherical image, a left observer position is placed left from the centre, viewed in the viewing direction. For the right spherical image, a right observer position is placed right from the centre, viewed in the viewing direction.
The second data handling module subsequently determines view data, based on the mapped image data and the observer data, the latter including the observer position data. Alternatively, the observer position data only comprises a centre position, a viewing direction and, optionally, an inter-pupillary distance (IPD). The right observation point and the left observation point may be determined as discussed above by the second module.
The embodiment discussed directly above may be further implemented with the Oculus Rift serving as the second data handling module implemented on a computer and providing the two displays as depicted in Figure 2. The first data handling module, for mapping the first frame 550 to a spherical image and for providing the observer data to the second module, is one aspect provided here. In such a scenario, the first data handling module does not require any sub-module for determining a left view and a right view. However, an image data mapping module for mapping the first frame 550 to a spherical image is preferably comprised by such a first data handling module.
The various aspects and embodiments thereof relate to coding of stereoscopic omnidirectional data in a container that may be conveniently used for further coding and transmission by means of legacy technology. The container may comprise image data acquired by means of multiple cameras, located at substantially the same location, of which the camera views cover substantially a full omnidirectional view. From data in the containers thus received at another side, omnidirectional views may be created for a left observation point and a right observation point, for example a pair of eyes. Image spheres may be constructed based on data in the containers and a virtual viewpoint may be presented near the centres of the spheres. Alternatively, data in the containers may be mapped directly to images to be shown. Observation data comprising the position of the observation points may be derived by means of a position sensor.
Expressions such as "comprise", "include", "incorporate", "contain", "is" and "have" are to be construed in a non-exclusive manner when interpreting the description and its associated claims, namely construed to allow for other items or components which are not explicitly defined also to be present. Reference to the singular is also to be construed to be a reference to the plural and vice versa. When data is being referred to as audio-visual data, it can represent audio only, video only or still pictures only or a combination thereof, unless specifically indicated otherwise in the description of the embodiments.
In the description above, it will be understood that when an element such as layer, region or substrate is referred to as being "on", "onto" or "connected to" another element, the element is either directly on or connected to the other element, or intervening elements may also be present.
Furthermore, the invention may also be embodied with fewer components than provided in the embodiments described here, wherein one component carries out multiple functions. Just as well may the invention be embodied using more elements than depicted in Figure 1, wherein functions carried out by one component in the embodiment provided are distributed over multiple components.
An important aspect of the invention is that the multiple image capturing devices each capture images forming an omnidirectional image data set from a certain view point of the scene to be captured and that said different omnidirectional image data sets are further mapped forming a full omnidirectional view in one process step.
In order to allow said array of individual image capturing devices (such as complete image capturing devices as outlined previously in the detailed specification, or as separate image sensor/lens arrays) to capture proper individual image data sets for each individual view point, which are then combined to form the omnidirectional stereoscopic image, the multiple image capturing devices need to be properly synchronized.
Traditional setups, such as wired (or wireless) systems that simulate a "start" or "record" signal, are generally not perfectly synchronized, resulting in an offset between the individual devices amounting to one or more frames, or one or more milliseconds.
According to the invention, the image coding device 100 of Figure 1 comprises a power-controlling module 801. Such an embodiment is depicted in Figure 8A. Usually, each image capturing device in the array can be provided with a discrete dedicated power supply. A central power-controlling module 801 in the system or device 100 facilitates the controlled power supply to each of the connected (here five pairs of) left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154, either as an additional power supply or as an alternative for each device's own dedicated power supply. The (here five pairs of, but not limited to five pairs of) left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are for example mounted to a camera rig 400 as depicted in Figures 4A and 4B.
The central power-controlling module 801 is arranged for controlling and distributing power to the individual image capturing devices 402-412-422-432-152 and 404-414-424-434-154 in the array via power supply lines 802. The power-controlling module 801 itself can be battery-based, wall-socket-supplied (via an external supply line 804 and wall-socket supply 803) or a combination thereof.
In yet another advantageous embodiment of the invention, the image coding device 100 of Figure 1 may comprise a signal pulse generation module 811. Such an embodiment is depicted in Figure 8B. Said pulse generation module 811 generates a synchronization pulse 813, which functions as a generator lock or "genlock", synchronizing all (here five pairs of) left and right image capture devices (either complete cameras or parts thereof) 402-412-422-432-152 and 404-414-424-434-154 in capturing an image in perfect synchronization. The synchronization pulse 813 is transferred via pulse transfer lines 812 to all left and right image capture devices and serves as an activation signal for image capturing. The generator locking of the devices assures an optimal image capturing, resulting in a high-quality merging of the different captured data images during post-processing.
Although in a preferred embodiment the genlock synchronisation for all connected image capturing devices is performed by a separate signal pulse generation module 811 of the image coding device 100, in another embodiment the image capturing devices can be set in a MASTER-SLAVE arrangement, where one of the image capturing devices (for example device 152) in the array functions as a "MASTER" device, to which the other SLAVE devices synchronize. In this embodiment module 811 is absent and the MASTER device 152 generates the genlock pulse 813, which is transferred from said MASTER device 152 via the pulse transfer lines 812 to all other SLAVE denoted left and right image capture devices 402-412-422-432 and 404-414-424-434-154, respectively.
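As a purely conceptual software analogy of this generator locking - the actual genlock is an electrical pulse on the transfer lines 812, not code - the Python sketch below shows how one pulse source can make every capture device trigger on the same tick.

    import threading
    import time

    class Genlock:
        # One pulse source (master device or module 811) increments a tick and wakes all devices.
        def __init__(self):
            self._cond = threading.Condition()
            self._tick = 0

        def pulse(self):
            with self._cond:
                self._tick += 1
                self._cond.notify_all()

        def wait_for_pulse(self, last_tick):
            with self._cond:
                self._cond.wait_for(lambda: self._tick > last_tick)
                return self._tick

    def camera(name, genlock, num_frames):
        tick = 0
        for _ in range(num_frames):
            tick = genlock.wait_for_pulse(tick)
            print(f"{name}: capture frame {tick}")

    genlock = Genlock()
    cams = [threading.Thread(target=camera, args=(f"cam{i}", genlock, 3)) for i in range(4)]
    for c in cams:
        c.start()
    for _ in range(3):
        time.sleep(1 / 30)     # one pulse per frame interval
        genlock.pulse()
    for c in cams:
        c.join()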
Also in Figure 8B the (here five pairs of, but not limited to five pairs of) left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are for example mounted to a camera rig 400 as depicted in Figures 4A and 4B.
Alternatively, the image coding device 100 of Figure 1 may comprise a start/stop module for controlling (starting and stopping) the image capturing of all image capturing devices in the array. Said start/stop module can interact with the "genlock" signal pulse generation module 811 of Figure 8B, thus adding a further functionality to the system, in particular in regard to the simultaneous starting/image capturing/stopping of the image capturing devices.
In the embodiment where the start/stop module is operated independently from the "genlock" module, all captured data images have to be synchronized in post-processing.
In yet another embodiment of the invention, the image coding device 100 of Figure 1 may comprise a monitoring module 821 for monitoring the current operational status of each image capturing device 402-412-422-432-152 and 404-414-424-434-154 in the array. Such an embodiment is depicted in Figure 8C. The monitoring functionality of the monitoring module 821 is denoted with the signal arrow 823 pointing inwards to the module 821. In case of a malfunction, for example when one image capturing device in the array stops recording (capturing images) or does not start recording (for example upon a "genlock" pulse), the monitoring module 821 is capable of performing certain fault-checks and command protocols. A command protocol could comprise a reboot script that is sent out from the module 821 via the command signal 822 to the failing image capturing device so that it resumes the proper capturing state.
Also in Figure 8C the (here five pairs of, but not limited to five pairs of) left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are for example mounted to a camera rig 400 as depicted in Figures 4A and 4B.
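A minimal sketch of such a monitoring loop follows; the status field and the "reboot" command are hypothetical names introduced only to illustrate the fault-check and command protocol described above.

    def monitor_devices(devices):
        # Poll each image capturing device (inward status signal 823); when one is
        # not recording, send it a reboot command (outward command signal 822) so
        # that it can resume the proper capturing state.
        for device in devices:
            status = device.get_status()            # hypothetical device API
            if not status.get("recording", False):
                device.send_command("reboot")       # hypothetical command name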
In yet another advantageous embodiment of the invention, the image coding device 100 of Figure 1 may comprise a so-called capture-modes module. Depending on the type of image capturing device being used in the present invention, one or more or even each image capturing device being used can have multiple capture-modes. An image capturing device having multiple capture-modes can be set in one of said capture-modes, which defines the current mode of image capturing operation.
In such an embodiment, a capture-modes module in the image coding device 100 is arranged for verifying - during operation of the device 100 and whilst performing the steps of the method according to the invention - the current or actual status of the capture-mode settings of the respective image capturing device (or even of each of the devices) prior to capturing data images, and is capable of adjusting (or re-adjusting) the capture-mode settings when required to ensure that all image capturing devices are set to the same capture-mode settings and thus capture the data images based on the same parameter settings.
Such an embodiment can have the same configuration and rig mounting as the embodiment shown in Figure 8C. In this capture-modes embodiment reference numeral 821 of Figure 8C denotes the capture-modes module which is capable of verifying (denoted by the inward directed signal arrow 823) the current or actual status of the capture-mode settings of the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154. Adjustment or re-adjustment of the capture-mode settings takes place by generating the suitable setting signals in the capture-modes module 821 and transmitting those signals via signal lines 822.
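A minimal sketch of this verification and re-adjustment step is shown below; the settings keys and the device methods used here are assumptions introduced for illustration only.

    def align_capture_modes(devices, reference_settings):
        # Compare each device's capture-mode settings (frame rate, resolution,
        # shutter speed, colour saturation, light management) against a reference
        # and re-adjust only the deviating parameters.
        for device in devices:
            current = device.read_capture_mode()                 # verification (arrow 823)
            deviating = {key: value for key, value in reference_settings.items()
                         if current.get(key) != value}
            if deviating:
                device.apply_capture_mode(deviating)             # adjustment (lines 822)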
Yet another embodiment is denoted in Figure 8D. In this embodiment the image coding device 100 implements a central docking module 831 to centrally offload the captured image data to a location outside the device 100 for further (post)processing, as opposed to extracting the image data from each individual module separately. The central docking module 831 can be implemented as a standard USB-module to which the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 are connected via signal lines 832.
The central docking USB module 831 is arranged for obtaining the captured image data from the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 (denoted with the inward directed signal arrow 833). The central docking USB module 831 can be provided with a local memory module for storing the obtained captured image data and/or a multiplex module (MUX) 834 for establishing a data connection with an external data interface (not shown; can be a wired or a wireless data connection) for transferring the captured image data from the central docking USB module 831 for post-processing.
The (re)adjusting of the capture-mode settings pertains to, but is not limited to, adjusting the frame rate, the resolution, the shutter speed, the colour control (colour saturation), and the light-management settings. The image coding device 100 according to the invention is by default configured for stereoscopic image data capture. The device is modular, and can be reconfigured to support monoscopic capturing as well. The practical implication of this monoscopic capturing setting is that a smaller dataset is generated, allowing for quick testing of a specific shot or camera set-up.
The image coding device 100 according to the invention is arranged for depth-mapping for stereoscopic convergence calculation based on an offset between a camera pair. The data images being captured in the array are used to calculate the best stitching values based on post-shoot determined depth mapping of the captured action. The offset between the left and right data images being captured, as well as light information and movement information captured over time, can be used to determine the distance between a subject and the focal point of the lenses of the image capturing devices. This distance can in turn be used to improve the quality of the stitching result to be performed by the stitching algorithm, and/or to speed up the processing time.
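The underlying relation is the classic stereo triangulation formula Z = f * B / d; the minimal Python sketch below assumes rectified images and a pinhole camera model, neither of which this disclosure prescribes.

    def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
        # Distance to the subject from the camera-pair baseline and the horizontal
        # offset (disparity) of the subject between the left and right images.
        if disparity_px <= 0:
            raise ValueError("disparity must be positive for a finite depth")
        return focal_length_px * baseline_m / disparity_px

    # Example: 1000 px focal length, 6.5 cm baseline, 20 px disparity -> 3.25 m.
    print(depth_from_disparity(1000.0, 0.065, 20.0))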
Similarly, the image coding device 100 according to the invention is arranged for determining a post-calculated depth of field. This is relevant for image capturing setups in which light-field cameras, also called plenoptic cameras, are implemented. For those setups, the image coding device 100 will be able to dynamically alter the depth of field in the reproduction of the captured image data material.
Alternatively, the above feature may be combined with an eye-tracking feature in display devices for real-time depth of field. By implementing a user-facing camera, either inside a VR headset or as an external camera device, the position of a user's eye and eye pupil can be tracked. The image coding device 100 according to the invention is in this arrangement arranged for measuring the position of both pupils of the user, in particular the interpupillary distance as well as the horizontal and vertical orientation of the pupils. Based on these measured pupil position data, the image coding device 100 is arranged for determining the correct point of visual convergence for said user. This data is used by the image coding device 100 to manipulate the image data captured with the afore-mentioned post-calculated depth of field technique.
With this feature, the image coding device 100 according to the invention is arranged for calculating and displaying a depth of field which matches the user, which is an improvement compared to having a more common image displayed with a depth of field set to infinity.
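As an illustrative sketch of how a convergence point could be estimated from the measured pupil data - the symmetric, level-gaze geometry assumed below is not prescribed by this disclosure:

    import math

    def convergence_distance(ipd_m, left_gaze_deg, right_gaze_deg):
        # Distance to the point of visual convergence from the inter-pupillary
        # distance and the inward rotation of each eye (gaze angles measured from
        # straight ahead; symmetric and level gaze assumed).
        vergence = math.radians(left_gaze_deg + right_gaze_deg)
        if vergence <= 0:
            return math.inf                     # parallel gaze: convergence at infinity
        return (ipd_m / 2.0) / math.tan(vergence / 2.0)

    # Example: 6.4 cm IPD, each eye rotated 1.8 degrees inward -> roughly 1 m away.
    print(convergence_distance(0.064, 1.8, 1.8))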
Yet another embodiment of the camera rig of Figures 4A and 4B is disclosed in Figures 9A-9C. In particular, Figure 9A relates to the mounting unit 450 of Figures 4A and 4B for mounting the omnidirectional stereoscopic camera module 400 to a tripod or a similar device envisaged for the same purpose.
The left and right image capture devices (either complete cameras or parts thereof), which are denoted with reference numerals 402-412-422-432-442-152 and 404-414-424-434-444-154 respectively in Figures 4A and 4B and are mounted to the camera rig 400, are all oriented horizontally relative to the ground level. This ensures - during operation of the camera rig - an optimally consistent horizon, which helps with establishing a proper horizontally consistent captured image dataset with all the image capturing devices mounted to the camera rig 400.
This in turn assists in a consistently stitched image in the individual spherical compositions, and in consistency between the stereoscopic pair.
Known devices in this field of technology generally exhibit a diagonally oriented configuration relative to the ground level. However, in those known configurations the mounting point for a tripod or other supporting system is positioned directly underneath the camera rig. A drawback of such a mounted orientation is that, in the resulting stitched image, a black circle or other "patch" is used to cover the "hole", that is, the area underneath the mounting configuration where the tripod is positioned. In some known monoscopic camera rigs, the tripod area can be covered by the overlapping images captured from the various devices in the camera rig and thus obscured. With a stereoscopic system, this is much harder to accomplish, and in many cases not possible.
In all embodiments shown in Figures 8A-8D, the image capturing device 100 is mounted to a pentagon shaped camera rig 400. The base structure of the pentagon shaped camera rig 400 forms a support plane or support face for the several image capturing devices, denoted with reference numerals 402-412-422-432-152 and 404-414-424-434-154.
However, it is noted that this five-sided polygon or pentagon embodiment is not limiting for the invention. In fact, the camera rig 400 can be constructed as any regular polygon, that is, equiangular (all inner angles between adjacent sides are equal in measure) and equilateral (all sides have the same length). In any regular polygon shaped camera rig, each side of the regular polygon serves as a view point of the omnidirectional stereoscopic 360° scene to be captured.
Each side of the regular polygon shaped camera rig serves to accommodate a left image capturing device for capturing left images forming a substantially omnidirectional image data set representing the scene from the left point of view, as well as a right image capturing device for capturing right images forming a substantially omnidirectional image data set representing the scene from the right point of view.
Hence, in the embodiment of a pentagon shaped camera rig 400 of Figures 8A-8D, the five pentagon sides each accommodate a pair of a left image capturing device 402-412-422-432-152 and a right image capturing device 404-414-424-434-154 (in total five pairs of devices 152-154; 402-404; 412-414; 422-424; and 432-434).
It will be clear that other embodiments, where the camera rig has a hexagon (6 sides), heptagon (7 sides), octagon (8 sides), decagon (10 sides), etc. configuration, are also feasible, implementing six, seven, eight, ten and more pairs of left and right image capturing devices (in total 12, 14, 16, 20, etc.).
As shown in Figures 9A-9C, the camera rig 400 is in this embodiment of a (rectangular) cuboid shape consisting of multiple side walls 400a. Each side wall 400a has multiple openings 400' and studs 400" which function as mounting means for mounting the respective left and right image capturing devices 402-412-422-432-152 and 404-414-424-434-154 in a manner as shown in Figures 4A and 4B. The camera rig 400 can also be constructed as a cube having side walls 400a of a similar size and dimension.
Preferably, the camera rig 400 is of a lightweight build, as each side wall 400a is provided with an opening 400"', thereby obtaining a weight reduction. The camera rig 400 as shown in Figures 9A-9C has an open lightweight structure, which improves handling and allows an easy set-up. Also, the light weight will not adversely affect the operation of the image capturing device 100.
The camera rig 400 comprises a mounting unit 450 connected to the rig 400 at a (cube) corner, which extends - when the rig 400 is connected to a stand, tripod or other base - at a diagonal angle relative to the horizontal orientation of the rig 400. This angle falls between the multiple capture (viewing) areas of the various image capturing devices 402-412-422-432-152 and 404-414-424-434-154 mounted in the rig.
As shown in Figures 9A-9C, the mounting unit 450 has an elongated body element 456, which connects via a support part 457 to the rig 400. The elongated body element 456 has a longitudinal axis with an angled orientation with respect to a horizontal plane (formed by a longitudinal axis and a transversal axis) of the camera rig 400. This is shown in Figures 9B and 9C.
The elongated body element 456 is provided with an inner bore or opening 455 for accommodating a mating element of a stand, tripod or other base. The inner bore 455 has a specific inner geometry which interacts (mates) with a mating element of the stand, tripod or other base having a corresponding outer geometry.
In Figure 9B the mounting unit 450 (the elongated body element 456) is directed under an angle α with respect to a (horizontally oriented) longitudinal axis of the camera rig 400. As shown in Figure 9B, the mounting unit 450 is orientated in the same orientation as the longitudinal axis of the rig 400 but skewed under an angle α. As shown in Figure 9C, the mounting unit 450 (the elongated body element 456) is also directed under an angle β with respect to a (also horizontally oriented) transversal axis (perpendicular to the longitudinal axis) of the camera rig 400. As shown in Figure 9C, the mounting unit 450 is orientated in the same orientation as the transversal axis of the rig 400 but skewed under an angle β.
The result of this skewed orientation of the mounting unit 450 relative to the camera rig 400 is that there is an unobstructed view directly beneath the rig 400, effectively resulting in a full immersive spherical image, without a "hole" in the bottom of the stitched stereoscopic spherical image.
Concerning all possible functionalities of the image capturing device 100 described above (for example with reference to Figures 1-3 and 6 and Figures 8A-8D), it is to be noted that the embodiments described can be implemented in a combined configuration. Hence, configurations of the device 100 implementing any combination of the central power-controlling module 801, the genlock module 811, the monitoring module 821, the capture-modes module, the central docking USB module 831, the start/stop module, etc. are envisaged and are considered embodiments of an apparatus or device 100 for providing stereoscopic image data for constructing a stereoscopic image of a scene according to the invention.
It is also noted that all these configurations or embodiments concerning the image capturing device 100 can also be implemented on any type of regular polygon camera rig, such as the pentagon embodiment shown in Figures 8A-8D, or on a cube or (rectangular) cuboid shaped camera rig of Figures 4A-4B and 9A-9C. A person skilled in the art will readily appreciate that various parameters disclosed in the description may be modified and that various embodiments disclosed and/or claimed may be combined without departing from the scope of the invention.
It is stipulated that the reference signs in the claims do not limit the scope of the claims, but are merely inserted to enhance the legibility of the claims.

Claims

1. Method of providing stereoscopic image data for constructing a stereoscopic image of a scene, the method comprising:
- Receiving a first multitude of left images from at least one left image capturing device, the captured left images forming a substantially omnidirectional image data set representing the scene from a left point of view;
Receiving a second multitude of right captured images from at least one right image capturing device, the captured right images forming a substantially omnidirectional image data set representing the scene from a right point of view;
Mapping the left captured images in a left frame comprising left compound image data;
- Mapping the right images in a right frame comprising right compound image data; and
Communicating the left frame and the right frame.
2. Method according to claim 1 , wherein:
At least a part of the left captured images representing data from adjacent views of the scene are mapped in the left frame adjacent to one another; and
At least a part of the right captured images representing data from adjacent views of the scene are mapped in the right frame adjacent to one another.
3. Method according to any of the preceding claims, wherein:
The first multitude is six and the second multitude is six;
The left frame and the right frame are rectangular frames having a width and a height, the width being larger than the height;
Four left images are mapped over the width of the left frame at substantially the middle of the frame;
One left image is mapped above the four left images mapped at the middle of the frame; and
One left image is mapped below the four left images mapped at the middle of the frame.
4. Method of constructing a stereoscopic view of a scene, the method comprising:
Receiving a left frame comprising data representing a substantially omnidirectional image data set representing the scene from a left point of view;
- Receiving a right frame comprising data representing a substantially omnidirectional image data set representing the scene from a right point of view;
Receiving virtual observer data comprising data on a virtual observer position relative to the scene;
- Based on the virtual observer position, determining left view data comprised by the left frame;
Based on the virtual observer position, determining right view data comprised by the right frame;
Providing the left view data and the right view data as stereoscopic view data of the scene to a display arrangement.
5. Method according to claim 4, further comprising:
Based on the data on the virtual observer position, defining a left observation point relative to the data comprised by the left frame and a right observation point relative to the data comprised by the right frame; and
- Based on the left observation point, determining the left view data comprised by the left frame; and
Based on the right observation point, determining the right view data comprised by the right frame.
6. Method according to claim 5, wherein:
The data on the virtual observer position comprises a centre point of observation relative to the data comprised by the right frame and the data comprised by the left frame;
The left frame and the right frame have equal sizes;
- A first distance between the centre point and the left observation point is equal to a second distance between the centre point and the right observation point; and
the left observation point and the right observation point are positioned such that the centre point, the left observation point and the right observation point are positioned on one line.
7. Method according to claim 6, further comprising:
Mapping the data comprised by the right frame to spherical coordinates;
Mapping the data comprised by the left frame to the same spherical coordinates as to which the data comprised by the right frame is mapped;
Defining the centre point in the centre of the spherical coordinates;
wherein the data on the virtual observer position comprises direction data defining a viewing direction indicating a direction relative from the centre point towards image data mapped to spherical coordinates.
8. Method according to any of the claims 4 to 7, wherein the virtual observer data further comprises a viewing width angle and the right view data and the left view data are also determined based on the viewing width angle.
9. Method according to any of the claims 4 to 8, wherein the display arrangement comprises a left display module and a right display module, the method further comprising providing the left view data to the left display module and the right view data to the right display module.
10. Method according to any of the claims 4 to 8, further comprising:
- processing the left view data for providing the left view data with first marker data;
- processing the right view data for providing the right view data with second marker data;
merging the processed left view data and the processed right view data in a single data set for display.
11. Computer programme product comprising computer executable code enabling a computer programmed with the computer executable code to perform the method according to any of the claims 1 to 3.
12. Computer programme product comprising computer executable code enabling a computer programmed with the computer executable code to perform the method according to any of the claims 4 to 10.
13. Device for providing stereoscopic image data for constructing a stereoscopic image of a scene, the device comprising:
A data input module arranged to:
Receive a first multitude of left images from at least one left image capturing device, the captured left images forming a substantially omnidirectional image data set representing the scene from a left point of view;
Receive a second multitude of right captured images from at least one right image capturing device, the captured right images forming a substantially omnidirectional image data set representing the scene from a right point of view;
A processing unit arranged to:
Map the left captured images in a left frame comprising left compound image data;
- Map the right images in a right frame comprising right compound image data; and
A data communication module arranged to communicate the left frame and the right frame.
14. Device for constructing a stereoscopic view of a scene, the device comprising:
A data input module arranged to:
Receive a left frame comprising data representing a substantially omnidirectional image data set representing the scene from a left point of view;
Receive a right frame comprising data representing a substantially omnidirectional image data set representing the scene from a right point of view;
Receive virtual observer data comprising data on a virtual observer position relative to the scene;
A processing unit arranged to:
Based on the virtual observer position, determine left view data comprised by the left frame;
Based on the virtual observer position, determine right view data comprised by the right frame; and
A data communication module arranged to provide the left view data and the right view data as stereoscopic view data of the scene to a display arrangement.
EP15715495.6A 2014-03-18 2015-03-18 Encoding and decoding of three-dimensional image data Withdrawn EP3120541A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2012462A NL2012462B1 (en) 2014-03-18 2014-03-18 Encoding and decoding of three-dimensional image data.
PCT/NL2015/050174 WO2015142174A1 (en) 2014-03-18 2015-03-18 Encoding and decoding of three-dimensional image data

Publications (1)

Publication Number Publication Date
EP3120541A1 true EP3120541A1 (en) 2017-01-25

Family

ID=50514024

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15715495.6A Withdrawn EP3120541A1 (en) 2014-03-18 2015-03-18 Encoding and decoding of three-dimensional image data

Country Status (3)

Country Link
EP (1) EP3120541A1 (en)
NL (1) NL2012462B1 (en)
WO (1) WO2015142174A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10178325B2 (en) 2015-01-19 2019-01-08 Oy Vulcan Vision Corporation Method and system for managing video of camera setup having multiple cameras
EP3151554A1 (en) 2015-09-30 2017-04-05 Calay Venture S.a.r.l. Presence camera
WO2017075614A1 (en) * 2015-10-29 2017-05-04 Oy Vulcan Vision Corporation Video imaging an area of interest using networked cameras
US9934615B2 (en) 2016-04-06 2018-04-03 Facebook, Inc. Transition between binocular and monocular views

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006110584A2 (en) * 2005-04-07 2006-10-19 Axis Engineering Technologies, Inc. Stereoscopic wide field of view imaging system
DE102006009026A1 (en) 2006-02-27 2007-08-30 Infineon Technologies Ag Memory arrangement for computer system, comprises two packet processing devices present for coding or decoding the packets, where different memory bank access devices are assigned to two packet processing devices
WO2012166593A2 (en) * 2011-05-27 2012-12-06 Thomas Seidl System and method for creating a navigable, panoramic three-dimensional virtual reality environment having ultra-wide field of view
US20130201296A1 (en) * 2011-07-26 2013-08-08 Mitchell Weiss Multi-camera head

Also Published As

Publication number Publication date
NL2012462A (en) 2015-12-08
NL2012462B1 (en) 2015-12-15
WO2015142174A1 (en) 2015-09-24

Similar Documents

Publication Publication Date Title
US12003692B2 (en) Systems, methods and apparatus for compressing video content
US10757423B2 (en) Apparatus and methods for compressing video content using adaptive projection selection
US10645369B2 (en) Stereo viewing
US20170227841A1 (en) Camera devices with a large field of view for stereo imaging
US20160253839A1 (en) Methods and apparatus for making environmental measurements and/or using such measurements in 3d image rendering
JP2017505565A (en) Multi-plane video generation method and system
US10735709B2 (en) Methods and apparatus for capturing, processing and/or communicating images
WO2014162324A1 (en) Spherical omnidirectional video-shooting system
US20210185299A1 (en) A multi-camera device and a calibration method
WO2015142174A1 (en) Encoding and decoding of three-dimensional image data
WO2023056803A1 (en) Holographic presentation method and apparatus
WO2017092369A1 (en) Head-mounted device, three-dimensional video call system and three-dimensional video call implementation method
JP6916896B2 (en) Information processing device and image generation method
JP2019216344A (en) Whole-sky stereoscopic image display device and program of the same, whole-sky stereoscopic image capturing device, and whole-sky stereoscopic video system

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20161018

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20181002