WO2015115946A1 - Methods for encoding and decoding three-dimensional video content - Google Patents

Methods for encoding and decoding three-dimensional video content

Info

Publication number
WO2015115946A1
Authority
WO
WIPO (PCT)
Prior art keywords
view
camera
encoded
video content
focus plane
Prior art date
Application number
PCT/SE2014/050118
Other languages
French (fr)
Inventor
Julien Michot
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget L M Ericsson (Publ) filed Critical Telefonaktiebolaget L M Ericsson (Publ)
Priority to PCT/SE2014/050118 priority Critical patent/WO2015115946A1/en
Publication of WO2015115946A1 publication Critical patent/WO2015115946A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H04N13/172 Image signals comprising non-image signal components, e.g. headers or format information
    • H04N13/178 Metadata, e.g. disparity information
    • H04N13/194 Transmission of image signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/179 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a scene or a shot
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to methods for encoding and decoding three-dimensional (3D) video content.
  • the invention also relates to an encoder, a decoder and a computer program product configured to implement methods for encoding and decoding three-dimensional video content.
  • Three-dimensional video technology continues to grow in popularity, and 3D technology capabilities in the entertainment and communications industries in particular have evolved rapidly in recent years.
  • 3D technology provides an observer with an impression of depth in a compound image, causing parts of the image to appear to project out in front of a display screen, into what is known as observer space, while other parts of the image appear to project backwards into the space behind the screen, into what is known as CRT space.
  • the term 3D is usually used to refer to a stereoscopic experience, in which an observer's eyes are provided with two slightly different images of a scene, which images are fused in the observer's brain to create the impression of depth. This effect is known as binocular parallax and provides an excellent 3D experience to a stationary observer, usually requiring the use of glasses or other filtering elements that enable the different images to be shown to the left and right eyes of an observer.
  • a new generation of "auto-stereoscopic" displays allows a user to experience three-dimensional video without glasses. These displays project slightly different views of a scene in different directions, as illustrated in Figure 1. A viewer located a suitable distance in front of the display will see slightly different pictures of the same scene in their left and right eyes, creating a perception of depth. In order to achieve smooth parallax and to enable a change of viewpoint as users move in front of the screen, a number of views (typically between 7 and 28 views) are generated. Auto-stereoscopic functionality is enabled by capturing or digitally generating a scene using many different cameras which observe a scene from different angles or viewpoints. These cameras generate what is known as multiview video.
  • multiview video can be relatively efficiently encoded for transmission, for example using multiview coding (MVC) or MV-HEVC, by exploiting both the temporal and spatial similarities that exist between different views.
  • even so, the transmission cost for multiview video remains prohibitively high.
  • current technologies therefore only actually transmit a subset of the captured or generated views, typically 2 or 3 key views.
  • depth or disparity maps are used to recreate the missing data.
  • virtual views can be generated at any arbitrary viewing position using view synthesis processes such as depth image-based rendering (DIBR). These viewing positions are sometimes known as virtual cameras, and may be located between the transmitted key views (interpolated) or outside the range covered by the key views (extrapolated).
  • the ability to generate views at more or less arbitrary positions means that the depth perception of a viewer may be changed or adjusted, and may be matched to the size of the display screen on which the video will be shown.
  • a depth map is simply a greyscale image of a scene in which each pixel indicates the distance between the corresponding point in the scene and the optical centre of the capturing camera.
  • a disparity map is an intensity image conveying the apparent shift of a pixel which results from moving from one viewpoint to another. Depth and disparity are mathematically related, and the link between them can be appreciated by considering that the closer an object is to a capturing camera, the greater will be the apparent positional shift resulting from a change in viewpoint.
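  • as an illustration of this relationship (an illustration only, not part of the original disclosure): for a rectified 1D linear camera pair, the standard pinhole model gives a disparity of f · B / Z pixels for a point at depth Z, where f is the focal length in pixels and B the baseline. A minimal sketch, with hypothetical numbers:

    def depth_to_disparity(depth_z, focal_px, baseline):
        # Standard rectified-stereo relation: disparity (pixels) = f * B / Z.
        # Illustrative assumption: a 1D linear (rectified) camera setup, focal length
        # in pixels, baseline and depth in the same length unit.
        return focal_px * baseline / depth_z

    # An object at 2 m with a 6.5 cm baseline and a 1000 px focal length shifts by
    # 32.5 pixels between the views; at 4 m it shifts by only 16.25 pixels, showing
    # that closer objects exhibit the larger apparent positional shift.
    print(depth_to_disparity(2.0, 1000.0, 0.065))   # 32.5
    print(depth_to_disparity(4.0, 1000.0, 0.065))   # 16.25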
  • a key advantage of depth and disparity maps is that they contain large smooth surfaces of constant grey levels, making them comparatively easy to compress for transmission using current video coding technology.
  • a 3D point cloud can be reconstructed from a depth map using the 3D camera parameters of the capturing camera. These parameters include the matrix K for a pinhole camera model, which contains the camera focal lengths, principal point, etc. The back-projection can be written as Q = d · K⁻¹ · q, where:
  • q is a 2D point (expressed in the camera coordinate frame, in homogeneous coordinates);
  • d is the point's associated depth (measured by a sensor, for example);
  • Q is the corresponding 3D point in a 3D coordinate frame.
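  • a minimal sketch of this back-projection, assuming a pinhole model with intrinsic matrix K and hypothetical parameter values (not taken from the original disclosure):

    import numpy as np

    def depth_map_to_point_cloud(depth, K):
        # Back-project a depth map into a 3D point cloud: Q = d * K^-1 * q,
        # where each pixel q = (u, v, 1) is in homogeneous image coordinates.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        q = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x N pixels
        rays = np.linalg.inv(K) @ q                                   # back-projected rays
        points = rays * depth.reshape(1, -1)                          # scale each ray by its depth d
        return points.T.reshape(h, w, 3)                              # 3D points in the camera frame

    # Hypothetical intrinsics: focal lengths 1000 px, principal point (640, 360).
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    cloud = depth_map_to_point_cloud(np.full((720, 1280), 2.0), K)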
  • a depth map can be measured by specialized cameras, including structured-light or time-of-flight (ToF) cameras, where the depth is correlated with the deformation of a projected light pattern or with the round-trip time of a pulse of light.
  • a principal limitation of these depth sensors is the depth range they can measure: objects that are too close to or too far away from the device will have no depth information.
  • Capturing camera parameters are also required to conduct DIBR view synthesis, and these parameters are usually divided into two groups.
  • the first group is internal camera parameters, representing the optical characteristics of the camera for the image taken. This includes the focal length, the coordinates of the image's principal point and the lens distortions.
  • the second group, the external camera parameters, represents the camera position and the direction of its optical axis in the chosen real-world coordinates (conveying the position of the cameras relative to each other and to the objects in the scene). Both internal and external camera parameters are required in view synthesis processes based on the use of depth information (such as DIBR).
  • in multi-view video coding (MVC), an extension of advanced video coding (AVC), camera parameters can be conveyed to the decoder in supplementary enhancement information (SEI) messages.
  • Multiview acquisition information SEI message syntax (Table 1, excerpt):
        multiview_acquisition_info( payloadSize ) {        C    Descriptor
            num_views_minus1                                    ue(v)
            intrinsic_param_flag                           5    u(1)
            extrinsic_param_flag                           5    u(1)
            if ( intrinsic_param_flag ) {
                ...
  • Table 1. The camera parameters in Table 1 are sent in floating point representation, which offers high precision and supports a high dynamic range of parameter values.
  • Tables 2 and 3 show an example Sequence Parameter Set (SPS) message for sending Znear and Zfar values associated with a depth map:
  • nonlinear_depth_representation_model[ i ] specifies the piecewise linear segments for mapping of decoded luma sample values of depth views to a scale that is uniformly quantized in terms of disparity.
  • DepthLUT[ x ] = Clip3( 0, 255, Round( ( ( x − x1 ) * ( y2 − y1 ) ) ÷ ( x2 − x1 ) + y1 ) )
  • DepthLUT[ dS ], for all decoded luma sample values dS of depth views in the range of 0 to 255, inclusive, represents disparity that is uniformly quantized into the range of 0 to 255, inclusive.
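  • a minimal sketch of how such a look-up table could be built, assuming the piecewise linear segment end points (x1, y1) and (x2, y2) are already available (the signalling of the break points via nonlinear_depth_representation_model[ i ] is not reproduced here):

    def clip3(lo, hi, x):
        # Clip3 operation as conventionally defined in video coding specifications.
        return max(lo, min(hi, x))

    def build_depth_lut(segments):
        # Build DepthLUT[x] for x in 0..255 from piecewise linear segments, each sample
        # mapped with DepthLUT[x] = Clip3(0, 255, Round(((x - x1) * (y2 - y1)) / (x2 - x1) + y1)).
        lut = [0] * 256
        for (x1, y1), (x2, y2) in segments:
            for x in range(x1, x2 + 1):
                lut[x] = clip3(0, 255, round((x - x1) * (y2 - y1) / (x2 - x1) + y1))
        return lut

    # Hypothetical two-segment mapping of decoded luma to uniformly quantized disparity.
    lut = build_depth_lut([((0, 0), (128, 200)), ((128, 200), (255, 255))])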
  • the present specification discusses the case of a 1D linear camera arrangement with cameras pointing in directions parallel to each other.
  • the camera optical axes are parallel to the z axis, and the camera centres have the same y and z coordinates, with only the x coordinate changing from camera to camera.
  • This is a common camera setup for stereoscopic and "3D multiview" video.
  • the so-called "toed-in" or general case camera setup, in which the cameras are not aligned, can be converted to the 1D linear camera setup by the rectification process.
  • the distance between two cameras in stereo/3D setup is usually called the baseline (or the baseline distance).
  • the baseline is often approximately equal to the distance between the human eyes (normally about 6.5 centimeters) in order to achieve natural depth perception when showing these left and right pictures on a stereo screen.
  • Other baseline values may be chosen depending on the scene characteristics, camera parameters and the intended stereo effects.
  • the present specification refers to a baseline as the distance between the cameras for the left and the right views in the units of the external (extrinsic) camera coordinates.
  • the baseline is the distance between the virtual (or real) cameras used to obtain the views for the stereo-pair.
  • the baseline is considered as the distance between two cameras that the left and the right eyes of a spectator see when watching the video on the multiview screen at the preferred viewing distance.
  • the views (cameras) seen by the left and the right eyes of the viewer may not be consecutive views. However, this information is known to the display manufacturer and can be used in the view synthesis process.
  • the baseline is not therefore the distance between the two closest generated views, as it may be greater than this distance, depending upon the particular setup being used.
  • JCT-3V is an ongoing standardization effort in which multiview texture (ordinary 2D videos) and depth maps are compressed and transmitted using the future MV-HEVC or 3D-HEVC codecs.
  • View synthesis techniques such as DIBR thus address many of the problems associated with providing multiview three-dimensional video content.
  • view synthesis can encounter difficulties in rendering views in which part of the content is blurred.
  • the depth of field (DoF) of an image corresponds to the distance between the nearest (D_N) and farthest (D_F) objects in a scene that appear acceptably sharp.
  • Acceptably sharp may be defined with reference to criteria relating to the capturing or display equipment, and may for example comprise all areas of an image where the extent of blur for an image point is less than the pixel diameter of the capturing or display equipment.
  • DoF is thus defined as D_F − D_N.
  • a small DoF corresponds to an image in which significant parts of the foreground and/or background image texture are blurred.
  • the focus plane s is the depth at which the content of the image is sharpest.
  • FIGS. 2a and 2b illustrate synthesis results for a blurred image texture in which the colours (y axis) of object f and object e at different locations along the x axis have blurred into each other.
  • a method of encoding three-dimensional video content comprising encoding at least one view of the video content and at least part of a depth representation associated with the view, and defining a camera focal length for the encoded view.
  • the method further comprises selecting at least one reference parameter for a focus plane of the encoded view, selecting at least one reference parameter for a camera f-Number for the encoded view, and transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
  • the node may be a decoder.
  • the depth representation may be a depth or disparity map.
  • the depth representation may be a dense (comprising a matrix) or sparse (comprising sets) representation.
  • the representation may be deduced from a 3D model or a previously reconstructed depth map projected onto the camera view or may be estimated from multiple camera views.
  • defining a camera focal length may comprise extracting a focal length of a camera used to capture the encoded view. In other examples, defining a camera focal length may comprise defining a focal length of a virtual camera used to digitally generate the encoded view.
  • the reference parameter for a focus plane of the encoded view may comprise a location of a focus plane for the encoded view.
  • the location of the focus plane may comprise the actual location of the focus plane in captured video content.
  • the location of the focus plane may comprise a selected location for the focus plane for captured or digitally generated video content.
  • the reference parameter for a focus plane of the encoded view may comprise a look-up table index corresponding to a focus plane of the encoded view.
  • the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. The distance may permit calculation of the focus plane location for the encoded view.
  • the reference parameter for a focus plane of the encoded view may comprise the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
  • the acceptable focus level may correspond to a measured focus level or to a desired focus level, for example in digitally generated content.
  • the reference parameter for a focus plane of the video content may comprise look-up table indexes corresponding to the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
  • the nearest and farthest in-focus depths may allow estimation of a location of the focus plane for the encoded view.
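  • one way such an estimation could be performed, assuming the standard thin-lens depth-of-field relation 1/D_N + 1/D_F = 2/s (an illustrative assumption, not a method mandated by this disclosure):

    def estimate_focus_plane(d_near, d_far):
        # Estimate the focus plane s from the nearest and farthest acceptably sharp depths
        # using the thin-lens relation 1/D_N + 1/D_F = 2/s, i.e. s is their harmonic mean.
        return 2.0 * d_near * d_far / (d_near + d_far)

    # If objects between 1.5 m and 3.0 m appear acceptably sharp, s is estimated at 2.0 m.
    print(estimate_focus_plane(1.5, 3.0))   # 2.0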
  • more than one view of the video content may be encoded.
  • an identification of the view to which the look-up table index applies may be included as part of the reference parameter for a focus plane of the video content.
  • the reference parameter for a camera f- Number for the encoded view may comprise a camera f-Number for the encoded view.
  • the camera f-Number may be a camera f-Number of a capturing camera of the video content. In other examples, the camera f-Number may be a selected camera f-Number for captured or digitally generated video content.
  • the reference parameter for a camera f- Number for the encoded view may comprise a camera aperture diameter.
  • the camera aperture diameter may allow calculation of the camera f-Number.
  • transmitting the selected reference parameters may comprise transmitting at least one of the selected reference parameters in floating point representation.
  • transmitting the selected reference parameters may comprises transmitting at least one of the selected reference parameters in unsigned integer representation.
  • the selected reference parameters for focus plane and camera f-Number may correspond to a first depth of field, and the method may further comprise selecting at least one additional reference parameter for a focus plane of the encoded view, selecting at least one additional reference parameter for a camera f-Number for the encoded view, and transmitting the selected additional reference parameters with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
  • the video content may comprise captured video content, and the selected reference parameters for focus plane and camera f-Number may correspond to an actual focus plane and camera f-Number of a capturing camera.
  • the selected reference parameters for focus plane and camera f-Number may correspond to a selected focus plane and camera f-Number for one of a capturing camera or a virtual camera.
  • the method may further comprise selecting a shutter speed for the video content and transmitting the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
  • the shutter speed may be the actual speed for captured video content. In other examples, the shutter speed may be a selected speed for captured or for digitally generated content. According to an embodiment of the invention, the method may further comprise selecting a shutter shape for the video content and transmitting a parameter representing the shutter shape with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation. In some examples, the shutter shape may be the actual shape for captured video content. In other examples, the shutter shape may be a selected shape for captured or for digitally generated content.
  • transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a supplementary enhancement information, SEI, message.
  • transmitting the selected reference parameters to the node may comprise including the selected reference parameters in the multiview_acquisition_info SEI message.
  • transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a dedicated SEI message.
  • a method for decoding three-dimensional video content comprising receiving: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the method further comprises synthesising at least one view of the video content, dimensioning a blur filter according to the received focal length and reference parameters and applying the dimensioned blur filter to the synthesised view.
  • synthesising at least one view may comprise at least partially decoding the received encoded view
  • the received reference parameters may comprise focus plane and f-Number values.
  • the received reference parameters may comprise other values permitting calculation of the focus plane and f-Number values, for example a distance between a recording surface and an optical system of a camera and a camera aperture diameter.
  • the method may comprise receiving a plurality of reference parameters for focus plane and camera f-Number of the encoded view, and the method may further comprise selecting a reference parameter for a focus plane of the encoded view and a reference parameter for a camera f-Number of an encoded view.
  • the reference parameters may be selected according to at least one of display or viewing conditions for the three-dimensional video content.
  • dimensioning a blur filter according to the received focal length and reference parameters may comprise at least one of: calculating a focus plane from the received reference parameter for a focus plane; and calculating a camera f-Number from the received reference parameter for a camera f- Number.
  • synthesising at least one view of the video content may comprise applying a Depth Image Based Rendering, DIBR, process to the encoded view of the video content.
  • DIBR Depth Image Based Rendering
  • an inpainting process may be conducted as part of the DIBR process, for example if the synthesized view has disocclusions.
  • the method may further comprise receiving a parameter representing shutter shape for the video content and applying the shutter shape outline corresponding to the received parameter to the dimensioned blur filter.
  • the method may further comprise receiving a shutter speed for the video content and dimensioning a motion blur filter according to the received shutter speed.
  • applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
  • dimensioning a motion blur filter may comprise calculating a direction and a length of motion blur.
  • motion and blur filters may be applied separately.
  • the method may further comprise estimating a motion blur direction from a motion model.
  • the received encoded view may include blur, and the method may further comprise sharpening the received encoded view according to the received focal length and reference parameters before synthesising the at least one view of the video content.
  • sharpening may comprise applying a deblurring filter dimensioned according to the received focal length and parameters, for example using a Wiener deconvolution.
  • the method may further comprise applying a sharpening process to a depth map of the received view before synthesising the at least one view of the video content.
  • the sharpening process may comprise applying at least one median filter to the depth map.
  • an encoder configured for encoding three-dimensional video content
  • the encoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the encoder is operative to encode at least one view of the video content and at least part of a depth representation associated with the view, define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view, select at least one reference parameter for a camera f-Number for the encoded view and transmit the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
  • the node may in some examples be a decoder.
  • the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a location of a focus plane for the encoded view. According to another embodiment of the invention, the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a look-up table index corresponding to a focus plane of the encoded view.
  • the encoder may be further operative to select a reference parameter for a camera f-Number for the encoded view which comprises a camera f-Number for the encoded view.
  • the encoder may be further operative to transmit at least one of the selected reference parameters in floating point representation.
  • the encoder may be further operative to select a shutter speed for the video content and transmit the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
  • a decoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the decoder is operative to receive: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view and at least one reference parameter for a camera f-Number for the encoded view.
  • the decoder is further operative to synthesise at least one view of the video content, dimension a blur filter according to the received focal length and reference parameters and apply the dimensioned blur filter to the synthesised view.
  • the decoder may be further operative to receive a shutter speed for the video content and dimension a motion blur filter according to the received shutter speed.
  • applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
  • Figure 1 is a representation of multiview 3D display
  • Figures 2a and 2b are graphs illustrating the result of view synthesis with blurred texture and depth map (Figure 2a) and with blurred texture only (Figure 2b);
  • Figure 3 is a flow chart illustrating process steps in a method for encoding three-dimensional video content;
  • Figure 4 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 3.
  • Figure 5 is a block diagram illustrating functional units in an encoder
  • Figure 6 is a block diagram illustrating another embodiment of encoder
  • Figure 7 is a flow chart illustrating process steps in a method for decoding three-dimensional video content;
  • Figure 8 is a block diagram illustrating functional units in a decoder
  • Figure 9 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 7;
  • Figure 10 is a graph illustrating motion blur path estimation;
  • Figure 11 illustrates synthesis results obtained using the method of Figure 9;
  • Figure 12 is a block diagram illustrating another embodiment of decoder;
  • Figure 13 is a block diagram illustrating another embodiment of decoder.
  • aspects of the present invention address the issues of synthesising views of three-dimensional video content containing blur by transmitting to the decoder information allowing for the dimensioning of a blur filter, which may be applied as a post-processing step following view synthesis.
  • the transmitted parameters allow for the creation of a filter that generates a desired amount of blur in desired regions of the image.
  • the transmitted parameters may allow for the accurate recreation of the original blur in synthesised views.
  • the transmitted parameters may be selected by a creator of the content to ensure that synthesised views are blurred according to the creator's intentions.
  • aspects of the present invention may thus allow a creator or supplier of three-dimensional video content to maintain control over how the content is displayed.
  • Figure 3 illustrates steps in a method 100 for encoding three-dimensional video content according to an embodiment of the present invention.
  • the method comprises encoding at least one view of the video content and at least part of a depth representation associated with the view.
  • the method then comprises, at step 120, defining a camera focal length for the encoded view.
  • the method further comprises, at step 130, selecting at least one reference parameter for a focus plane of the encoded view, and at step 140 selecting at least one reference parameter for a camera f- Number for the encoded view.
  • the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
  • Step 110 of the method comprises encoding one or more views of the three-dimensional video content.
  • the at least part of a depth representation associated with the view, and which is encoded with the view may for example be a depth map for the view or may take other forms.
  • At least part of the depth representation is encoded with the view to enable view synthesis to be conducted at a decoder.
  • Step 120 of the method may comprise extracting the camera focal length of a capturing camera for the encoded view, or may comprise defining a focal length for the encoded view.
  • Step 130 of the method comprises selecting at least one reference parameter for a focus plane of the encoded view.
  • the reference parameter may take a range of different forms according to different embodiments of the invention.
  • the reference parameter for a focus plane of the encoded view may comprise the location of the focus plane s.
  • the units of s may be the same as the units used for the external camera parameters: baseline, Znear and Zfar. However, in some circumstances it may be more practical to send the value of s in standard length units of metres or feet.
  • the reference parameter for a focus plane of the encoded view may comprise a LUT (Look-up table) index for the focus plane in the encoded view.
  • a view identifier may also be transmitted as part of the reference parameter, indicating which view the LUT index applies to.
  • the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. In the case of a charge coupled device, this distance is known as the CCD distance g. A corresponding distance may be defined for other camera types.
  • the focus plane s may be calculated from the CCD distance g and the focal length f using the thin-lens equation 1/f = 1/g + 1/s, which gives s = f · g / ( g − f ).
  • the reference parameter for a focus plane of the encoded view may comprise the depths D_N and D_F of the nearest and farthest objects in a scene that appear acceptably sharp. Acceptably sharp may be defined with reference to the capturing or display conditions.
  • Step 140 of the method comprises selecting at least one reference parameter for a camera f-Number for the encoded view.
  • the reference parameter for the camera f-Number for the encoded view may be the camera f-Number.
  • the reference parameter for the camera f-Number may be a camera aperture diameter d.
  • the camera f-Number N may then be calculated using the following equation:
  • N = f / d (Equation 6), where f is the camera focal length.
  • the unit of measurement of the camera aperture diameter can be the same as the units of the focus plane. However it may be more practical to send the aperture diameter in units of millimeters or inches. It is useful to ensure that the aperture diameter as selected and transmitted is measured in the same units as the actual aperture diameter.
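  • the two derivations mentioned above, the focus plane from the CCD distance and the f-Number from the aperture diameter, can be sketched as follows; the numerical values are hypothetical:

    def focus_plane_from_ccd_distance(g, f):
        # Thin-lens relation 1/f = 1/g + 1/s  =>  s = f * g / (g - f).
        # g is the recording-surface (CCD) distance and f the focal length,
        # both in the same length unit, with g > f.
        return f * g / (g - f)

    def f_number_from_aperture(f, d):
        # Equation 6: N = f / d, with f and d expressed in the same unit.
        return f / d

    # Hypothetical example: 50 mm focal length, recording surface 51 mm behind the
    # optical system, 25 mm aperture diameter.
    s = focus_plane_from_ccd_distance(g=0.051, f=0.050)   # focus plane at 2.55 m
    n = f_number_from_aperture(f=50.0, d=25.0)            # f-Number of 2.0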
  • Step 170 of the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, focal length and selected parameters to a node, which may for example be a decoder.
  • Transmission of the selected reference parameters may be conducted in a number of different formats according to different embodiments of the invention.
  • the selected reference parameters are transmitted in a dedicated SEI message.
  • the selected reference parameters are transmitted as part of the standard multiview_acquisition_info SEI message.
  • the selected parameters may be sent as part of other SEI messages or any other message signalling to a decoder.
  • the parameters may be sent in floating point representation, in unsigned integer representation or in another appropriate format.
  • the following examples illustrate different transmission methods for different combinations of reference parameters.
  • Example 1: Focus plane s and camera f-Number N are transmitted in a dedicated SEI message in floating point representation (in the same format as is used for sending camera parameters in the multiview_acquisition_info message in MVC):
        focus_info( payloadSize ) {                        C    Descriptor
            prec_focus_plane                               5    ue(v)
            prec_aperture_f_number                         5    ue(v)
            exponent_focus_plane                           5    u(6)
            mantissa_focus_plane                           5    u(v)
            exponent_aperture_f_number                     5    u(6)
            mantissa_aperture_f_number                     5    u(v)
        }
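  • the exponent/mantissa fields above follow the general idea of splitting a value into an exponent and a quantized mantissa; the sketch below illustrates that idea only and is not the exact bit-level syntax of the SEI message:

    import math

    def to_exponent_mantissa(value, mantissa_bits):
        # Split a positive float into an exponent and an integer mantissa.
        # Generic illustration only, NOT the normative multiview_acquisition_info encoding.
        frac, exp = math.frexp(value)                 # value = frac * 2**exp, 0.5 <= frac < 1
        mantissa = round(frac * (1 << mantissa_bits))
        return exp, mantissa

    def from_exponent_mantissa(exp, mantissa, mantissa_bits):
        # Inverse of to_exponent_mantissa, up to quantization error.
        return (mantissa / (1 << mantissa_bits)) * (2.0 ** exp)

    exp, man = to_exponent_mantissa(2.55, mantissa_bits=16)   # e.g. a focus plane of 2.55 m
    print(from_exponent_mantissa(exp, man, 16))               # approximately 2.55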
  • Example 2: The reference parameter for the focus plane is the LUT index focus_plane_depth corresponding to the focus plane of an encoded view. Multiple views are encoded for transmission, so the reference parameter for focus plane also comprises a view identifier focus_plane_view_id to indicate which view the LUT index applies to:
  • in an alternative version of this example, only a single view may be sent, in which case there is no advantage to sending a view identifier, as the index may be assumed to apply to the single view that is transmitted.
  • Example 3: Focus plane s is sent in floating point representation and the camera aperture diameter d is sent in floating point representation:
  • the aperture f-Number N may then be calculated at the decoder using equation 6 above.
  • Example 4: Focus plane s is sent in floating point representation and the camera aperture diameter is sent in unsigned integer representation:
  • Example 5: Focus plane s and camera aperture f-Number N are sent in floating point representation in the multiview_acquisition_info message:
        multiview_acquisition_info( payloadSize ) {        C    Descriptor
            num_views_minus1                                    ue(v)
            intrinsic_param_flag                           5    u(1)
            extrinsic_param_flag                           5    u(1)
            aperture_f_number_flag                         5    u(1)
            focus_plane_flag                               5    u(1)
            if ( intrinsic_param_flag ) {
                intrinsic_params_equal                     5    u(1)
                prec_focal_length                          5    ue(v)
                prec_principal_point                       5    ue(v)
                prec_skew_factor                           5    ue(v)
                if( intrinsic_params_equal )
                    num_of_param_sets = num_views_minus1 + 1
                ...
  • Example 6: The camera f-Number reference parameter is sent using any of the above discussed methods, and D_N and D_F are sent using floating point representation:
        focus_plane_info( payloadSize ) {                  C    Descriptor
            prec_focus_plane                               5    ue(v)
            prec_aperture_f_number                         5    ue(v)
            exponent_focus_plane                           5    u(6)
            mantissa_focus_plane                           5    u(v)
            exponent_aperture_f_number                     5    u(6)
            mantissa_aperture_f_number                     5    u(v)
            exponent_nearest_plane_focused                 5    u(6)
            mantissa_nearest_plane_focused                 5    u(v)
            exponent_farthest_plane_focused                5    u(6)
            mantissa_farthest_plane_focused                5    u(v)
        }
  • Example 7: CCD distance g is sent in floating point representation and camera f-Number is sent in floating point representation:
  • more than one pair of reference parameters for focus plane and camera f-Number may be selected and transmitted.
  • a single reference parameter for each of focus plane and f-Number corresponds to a particular depth of field in a synthesised view.
  • the selected reference parameters may correspond to actual values for the focus plane and camera f-Number in the encoded view, for example in the case of captured three-dimensional video content.
  • the focus plane may be provided, for example by an autofocus system or by a user.
  • the focus plane may be estimated using a software-based autofocus technique (which may involve estimating the amount of blur and taking the depth of the sharpest areas as the focus plane).
  • the aperture f-number may also be provided by the lens, by the camera or by the user. It is generally more complicated to estimate the camera f-Number using only the image.
  • the reference parameters may be selected for the view in order to create a desired blur effect.
  • Figure 4 illustrates possible additional steps in the method 100, which may be conducted during the encoding of three-dimensional video content.
  • the method 100 may also comprise a step 150 of selecting a shutter speed for the video content, and step 160 of selecting a shutter shape for video content.
  • the selected shutter speed and a parameter representing the selected shutter shape may then be transmitted with the selected reference parameters, focal length and encoded view and depth representation in a modified step 170B.
  • Camera shutter speed (or exposure time) may be used at the decoder side to create camera motion blur. Details of this procedure are discussed below with reference to Figures 9 and 10.
  • the selected shutter speed may be the actual shutter speed of a capturing camera for the video content, or may be a selected shutter speed for example of a virtual camera applied to digitally generated content.
  • the motion blur created using the transmitted shutter speed may thus recreate actual motion blur or introduce a desired motion blur.
  • shutter speed may be sent using floating point representation in an SEI message:
  • the unit of the shutter speed may be seconds, milliseconds or other time units.
  • the inverse of the shutter speed may be sent using unsigned integer representation:
  • the unit of this inverse shutter speed may be 1/seconds.
  • Camera shutter shape may be used at the decoding side in the dimensioning of a blur filter to accurately represent blur from a capturing or virtual camera.
  • the shape corresponds to the actual shape of a camera shutter and may be a perfect disk, triangle, square, hexagon, octagon, etc.
  • shutter shape may be included with focus plane s and camera f- Number N as follows:
  • the parameter shutter_shape_type has no unit but may correspond to a predetermined set of values, for example mapping integer values to shapes such as a disk, triangle, square, hexagon or octagon.
  • shutter_shape_type defines the shape (outline) of a blur filter to be used in the blurring process at the decoder side, as discussed below.
  • three bits may be used to signal shutter shape, supporting up to 8 different shutter shapes. Other examples using fewer or more bits may be envisaged.
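  • a purely hypothetical mapping of a 3-bit shutter_shape_type value to filter outlines (the actual value assignment is not specified in this text):

    # Hypothetical shutter_shape_type mapping, for illustration only.
    SHUTTER_SHAPES = {
        0: "gaussian",   # no explicit aperture shape: fall back to a Gaussian kernel
        1: "disk",
        2: "triangle",
        3: "square",
        4: "hexagon",
        5: "octagon",
    }

    def shutter_shape_name(shutter_shape_type):
        # Return the blur-filter outline to use for a decoded shutter_shape_type value.
        return SHUTTER_SHAPES.get(shutter_shape_type, "disk")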
  • Figure 5 illustrates functional units of an encoder 200 in accordance with an embodiment of the invention.
  • the encoder 200 may execute the steps of the method 100, for example according to computer readable instructions received from a computer program.
  • the encoder 200 comprises an encoding unit 210, a selection unit 230 and a transmission unit 270. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
  • the encoding unit 210 is configured to encode at least one view of the video content and at least part of a depth representation associated with the view, for example a depth map.
  • the selection unit 230 is configured to define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view and select at least one reference parameter for a camera f-Number for the encoded view.
  • the transmission unit 270 is configured to transmit the encoded at least one view and at least part of an associated depth representation, focal length and selected reference parameters to a node.
  • the selection unit 230 may also be configured to select a shutter shape and shutter speed for the encoded view, and the transmission unit 270 may be configured to transmit the shutter speed and a parameter representing the shutter shape with the encoded view and other elements.
  • Figure 6 illustrates another embodiment of encoder 300.
  • the encoder 300 comprises a processor 380 and a memory 390.
  • the memory 390 contains instructions executable by the processor 380 such that the encoder 300 is operative to conduct the steps of the methods of Figures 3 and 4 described above.
  • Figure 7 illustrates steps in a method 400 for decoding three-dimensional video content in accordance with an embodiment of the present invention.
  • the method comprises receiving at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the method then comprises, at step 430, synthesising at least one view of the video content.
  • the method comprises dimensioning a blur filter according to the received focal length and reference parameters and finally at step 450, the method comprises applying the dimensioned blur filter to the synthesised view.
  • the encoded view and depth representation, focal length and reference parameters may be received at step 410 from an encoder operating according to the methods of Figures 3 and 4.
  • Synthesising at least one view of the video content at step 430 may comprise at least partially decoding the received view or views and associated depth representation, and then running a view synthesis process, such as DIBR to synthesise the required view or views.
  • Step 440 of the method then comprises dimensioning a blur filter according to the received reference parameters, and step 450 comprises applying the dimensioned blur filter to the synthesised view.
  • the Depth of Field of an image in a view corresponds to the distance between the nearest D N and farthest D F objects in a scene that appear acceptably sharp. Outside the Depth of Field, an image is blurred to a greater or lesser extent.
  • Dimensioning a blur filter using received reference parameters, and applying the dimensioned blur filter to a synthesized view allows the creation of blur in parts of the synthesized view to create a Depth of Field in the synthesized view that corresponds to the received parameters.
  • the received parameters may correspond to capture conditions, resulting in a depth of field in the synthesized view that matches the depth of field in the original captured views.
  • the received parameters may have been selected to ensure the creation of a particular Depth of Field desired by a creator of the original content.
  • the process of dimensioning a blur filter according to step 440 of the method, and applying the blur filter according to step 450, is discussed below.
  • the camera f-Number N can be approximately calculated as N = f / d (Equation 9), where f is the focal length and d is the aperture diameter.
  • the amount of blur in an image is characterized by a blur diameter b, such that a pixel in the image appears as a blur disc of diameter b.
  • a blur filter F(Z) may then be applied to each pixel of the image I by a convolution of the form J = I * F(Z).
  • the diameter b of the filter is calculated for each pixel according to the pixel's depth within the image.
  • the resulting output image J has a depth of field according to the parameters used to dimension the blur filter F(Z).
  • the blur filter may be a Gaussian filter or may have a specific shape (such as a disk, triangle, square, hexagon etc.) in order to mimic a specific aperture shape. It can be seen from the above discussion that a blur filter can be dimensioned using Equation 11 from the focal length f, focus plane s and camera f-Number N.
  • the focal length f and reference parameters for the focus plane s and f-Number N are received in step 410 of the method of Figure 7, and used in step 440 to dimension the blur filter which is then applied to the synthesised view in step 450.
  • the resulting Depth of Field in the synthesised view corresponds to the received parameters, thus ensuring that the synthesised view appears as desired by the creator of the original content.
  • the depth of field in the synthesized view may match that of the original content, or may be imposed on the content. If multiple pairs of parameters are received, the most appropriate pairs for the display and viewing conditions may be selected, ensuring that the synthesised view appears as best suited to the display and viewing conditions.
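  • a minimal sketch of this dimensioning and blurring step, using the standard thin-lens circle-of-confusion formula b = (f² / N) · |Z − s| / (Z · (s − f)) as a stand-in for Equation 11 (which is not reproduced in this text), and a layered Gaussian approximation as one possible implementation choice:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blur_diameter(z, f, s, n, pixel_size):
        # Per-pixel blur (circle of confusion) diameter in pixels for depth z, focal
        # length f, focus plane s and f-Number n; f, s, z and pixel_size share one unit.
        return (f * f / n) * np.abs(z - s) / (z * (s - f)) / pixel_size

    def apply_depth_blur(image, depth, f, s, n, pixel_size, levels=8):
        # Blur a grayscale image (H, W) as a function of its depth map (H, W): the image
        # is blurred at a few discrete strengths and each pixel takes the level closest
        # to its computed blur diameter; in-focus pixels (level 0) keep their values.
        image = np.asarray(image, dtype=float)
        b = blur_diameter(depth, f, s, n, pixel_size)
        b_max = float(b.max()) if b.max() > 0 else 1.0
        out = image.copy()
        for level in range(1, levels + 1):
            sigma = level * b_max / levels / 2.0           # rough diameter-to-sigma conversion
            blurred = gaussian_filter(image, sigma=sigma)
            mask = np.round(b / b_max * levels) == level   # pixels whose blur matches this level
            out[mask] = blurred[mask]
        return out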
  • Figure 8 illustrates functional units of a decoder 600 in accordance with an embodiment of the invention.
  • the decoder 600 may execute the steps of the method 400, for example according to computer readable instructions received from a computer program.
  • the decoder 600 comprises a receiving unit 610, a synthesis unit 630 and a filter unit 660.
  • the filter unit comprises a dimensioning sub unit 640 and an application sub unit 650. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
  • the receiving unit 610 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the synthesis unit 630 is configured to synthesise at least one view of the video content.
  • the dimensioning sub unit 640 of the filter unit 660 is configured to dimension a blur filter according to the received focal length and reference parameters.
  • the application sub unit 650 of the filter unit 660 is configured to apply the dimensioned blur filter to the synthesised view.
  • Figure 9 illustrates additional steps that may be conducted as part of the method 400 for decoding three-dimensional video content.
  • the method may for example be conducted in a decoder such as the decoders 700, 800 illustrated in Figures 12 and 13 and discussed in further detail below.
  • the decoder receives at least one encoded view of the video content and associated depth representation, as well as camera focal length and reference parameters for focus plane and camera f-Number for the encoded view.
  • the decoder may also receive a camera shutter speed and a parameter representative of shutter shape.
  • in step 512, the decoder checks whether multiple pairs of reference parameters for focus plane and f-Number have been received. Multiple pairs may be sent by an encoder if different Depth of Field options are available. If multiple pairs of reference parameters have been received (Yes at step 512), the decoder then proceeds to select the most appropriate pair for the display and viewing conditions at step 514. This selection step may be automated or may be made with user input.
  • the decoder may adjust a received or selected reference parameter according to viewing conditions or to known user requirements.
  • the decoder proceeds to check, in step 516, whether the reference parameters received are the focus plane location and camera f-Number. If the reference parameters are the focus plane location and camera f-Number, then the decoder proceeds directly to step 520. If this is not the case (No at step 516), the decoder proceeds to calculate the focus plane location and/or camera f-Number from the received parameters at step 518. This may involve performing a LUT operation or a calculation, as discussed in further detail above. With the focus plane location and camera f-Number available, the decoder then proceeds to check, at step 520, whether blur is present in the image texture of the received encoded view.
  • the methods described herein may be used to recreate blur that was present in originally captured video content, or to impose blur onto all in focus video content, which content may have been physically captured or digitally generated.
  • if blur is present in the received texture (Yes at step 520), the decoder proceeds to step 522, in which the received reference parameters are used to sharpen the received view.
  • this sharpening process comprises using the received reference parameters to calculate the blur diameters for pixels in the view. From these diameters, an approximation of the point spread function for each pixel can be generated, allowing the application of a deblurring filter to the received view.
  • the deblurring filter may for example be a Wiener deconvolution.
  • motion blur parameters may also be calculated in order to sharpen the image for motion blur. Motion blur is discussed in further detail below.
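  • a minimal sketch of such a deblurring step, assuming a single dominant blur diameter for the view and using the Wiener deconvolution available in scikit-image (one possible implementation choice, not the only one):

    import numpy as np
    from skimage.restoration import wiener

    def disk_psf(diameter):
        # Approximate point spread function of a defocus blur with the given pixel diameter.
        radius = max(1, int(round(diameter / 2)))
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        psf = (x * x + y * y <= radius * radius).astype(float)
        return psf / psf.sum()

    def deblur_texture(image, blur_diameter_px, balance=0.1):
        # Sharpen a received grayscale texture (float values in [0, 1]) by Wiener
        # deconvolution; in practice the blur diameter would be derived per region from
        # the received focal length, focus plane and f-Number as described above.
        return wiener(image, disk_psf(blur_diameter_px), balance)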
  • the decoder then proceeds to check, in step 524, whether blur is present in the depth representation received, for example in a received depth map. If blur is present in the received depth map or other depth representation (Yes at step 524), the decoder sharpens the depth map at step 526. This may involve applying a plurality of median filters to the original depth map in order to remove smoothed edges from the depth map. Other methods for sharpening the depth map may be considered.
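  • a minimal sketch of such a depth map sharpening step, with illustrative parameter values:

    from scipy.ndimage import median_filter

    def sharpen_depth_map(depth, passes=3, size=3):
        # Apply a plurality of median filters to the depth map: median filtering
        # suppresses the intermediate values created by smoothing while preserving
        # step edges, so repeated passes tend to restore sharper depth discontinuities.
        out = depth
        for _ in range(passes):
            out = median_filter(out, size=size)
        return out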
  • the decoder proceeds to synthesise at least one view of the video content at step 530. This may for example comprise running a DIBR process. Once the at least one view has been synthesised, the decoder proceeds to step 540 in which a blur filter is dimensioned according to the received focal length and reference parameters. This dimensioning process is described in detail above with respect to Figure 7. If a parameter representing shutter shape has been received then this may also be used in the dimensioning of the blur filter. The decoder then proceeds, in steps 542, 544 and 546 to address motion blur. Motion blur is described in greater detail below but in brief, the decoder assesses whether camera parameters for a frame t and frame t-1 are available in step 542.
  • if these parameters are not available (No at step 542), the decoder proceeds to estimate a motion blur direction from a motion model at step 544. Using either this estimation or the available camera parameters, the decoder then dimensions a motion blur filter at step 546, calculating the motion blur direction and length. In step 548 the decoder combines the motion blur filter and the dimensioned blur filter from steps 546 and 540 before, in step 550, applying the combined blur filter to the synthesised view.
  • motion blur can occur in captured video content, for example when a camera is moving or zooming rapidly.
  • Transmitting the camera shutter speed as discussed with reference to Figure 4 allows the recreation of camera motion blur in synthesised views, or the imposition of camera motion blur where it is desired to introduce this blur onto images where it is not already present.
  • a first step is to determine the direction of the camera motion blur. By taking the depth map of frame t and projecting it onto the camera frames t and t−1, thus using different camera parameters, it is possible to determine the path that each pixel takes.
  • the camera projection matrices may be computed from the camera parameters such as translation_param and orientation_param given in the multiview_acquisition_info SEI message illustrated in Table 1. If the camera projection matrices P^(t−1) and P^t are found to be identical, then no motion blur is present. If the camera parameters are not available for frame t and/or frame t−1, a motion model may be used in order to predict the missing camera parameters.
  • constant speed models may for instance be used:
        T^t = T^(t−1) + V^(t−1) / fps
        R^t = R^(t−1) * AngleAxisToRotationMatrix( W^(t−1) / fps )
        f^t = f^(t−1) + VF^(t−1) / fps
    where:
  • V^(t−1) is the estimated translational velocity vector (m/s if T is in metres and 1/fps in seconds);
  • W^(t−1) is the estimated angular velocity vector (expressed here as an angle axis whose norm corresponds to the angular speed in rad/s);
  • VF^(t−1) is the estimated focal length speed (px/s if f is in pixels and 1/fps in seconds). Other motion models may be envisaged.
  • the amount of motion blur may then be calculated using the shutter speed ss which, as noted above, may be an actual shutter speed for the captured content or may be a selected shutter speed; together with the motion direction, this defines the motion blur point spread function (PSF).
  • the dimensioned motion blur filter may be applied, for example according to the following equation:
        J( q' ) = ( 1 / N ) * Σ_{i = 1..N} I( q' + [ x(i), y(i) ] )
    where:
  • J is the blurred image;
  • I is the original image;
  • N is the number of local pixels used (typically the norm of D_motion);
  • x(i) and y(i) are respectively the x and y components of the discrete segment D_motion starting from q'.
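  • a minimal sketch of this averaging along the motion path, simplified to a single global path for the whole image (whereas in general the path varies per pixel):

    import numpy as np

    def apply_motion_blur(image, path):
        # Average the image along a discrete motion segment D_motion given as a list of
        # integer (dx, dy) offsets, approximating camera motion blur over the exposure.
        acc = np.zeros_like(image, dtype=float)
        for dx, dy in path:
            acc += np.roll(image, shift=(dy, dx), axis=(0, 1))   # shift along the path
        return acc / len(path)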
  • d is the effective aperture diameter
• L_max is the maximum intensity (typically 255 for each image channel)
  • k and p are two constants.
  • the image I may then be blurred with the following equation:
  • the dimensioned blur and motion blur filters may be applied consecutively or may be combined before application. It may be that better final results are obtained using a combined motion blur and blur filter.
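As a sketch of the combined option, and since both filters are linear point spread functions, the two kernels may be convolved once and the result applied in a single pass (Python; the construction of the individual kernels is assumed to have been done elsewhere):

import numpy as np
from scipy.signal import convolve2d, fftconvolve

def combine_and_apply(image, blur_kernel, motion_kernel):
    # The PSF of two linear filters applied in series is the convolution of
    # their kernels; normalising preserves overall image brightness.
    combined = convolve2d(blur_kernel, motion_kernel)
    combined /= combined.sum()
    channels = [fftconvolve(image[..., c], combined, mode="same")
                for c in range(image.shape[-1])]
    return np.dstack(channels)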
  • Figure 11 illustrates synthesis results using the method of Figure 9.
• An all-in-focus texture and depth map is received, or obtained through the sharpening procedures, before view synthesis and blurring are applied.
• in case A, an inpainter is used in order to fill the disoccluded area and then the blur filter is applied.
• in case B, no inpainter is required, and the blur filter is applied after DIBR.
• Figure 12 illustrates functional units of a decoder 700 in accordance with another embodiment of the invention.
  • the decoder 700 may execute the steps of the method 500, for example according to computer readable instructions received from a computer program.
• the decoder 700 comprises a receiving unit 710, an analysis and calculation unit 715, a sharpening unit 720, a synthesis unit 730 and a filter unit 760.
• the filter unit 760 comprises a blur dimensioning sub unit 740, a motion dimensioning sub unit 746, a combining sub unit 748 and an application sub unit 750. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
  • the receiving unit 710 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the receiving unit is also configured to receive a shutter speed and a parameter for shutter shape.
• the analysis and calculation unit 715 is configured to conduct steps 512 to 518 of the method of Figure 9, checking for multiple pairs of reference parameters and selecting an appropriate pair, and calculating the focus plane and f-Number from the respective reference parameters, if such calculation is necessary.
  • the sharpening unit 720 is configured to check for blur in the received image texture and depth representation, and to sharpen the image texture and/or depth representation if blur is present.
  • the synthesis unit 730 is configured to synthesise at least one view of the video content.
  • the blur dimensioning sub unit 740 of the filter unit 760 is configured to dimension a blur filter according to the received focal length and reference parameters.
• the motion dimensioning sub unit 746 is configured to dimension a motion blur filter according to a received shutter speed and parameters extracted from the encoded view, as discussed above.
  • the combination sub unit 748 is configured to combine the dimensioned blur and motion blur filters.
  • the application sub unit 750 is configured to apply the combined filter to the synthesised view or views.
  • Figure 13 illustrates another embodiment of decoder 800.
  • the decoder 800 comprises a processor 880 and a memory 890.
  • the memory 890 contains instructions executable by the processor 880 such that the decoder 800 is operative to conduct the steps of the method of Figure 9 described above.
  • the method of the present invention may be implemented in hardware, or as software modules running on one or more processors. The method may also be carried out according to the instructions of a computer program, and the present invention also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein.
  • a computer program embodying the invention may be stored on a computer-readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

Abstract

A method of encoding three-dimensional video content is disclosed. The method comprises encoding at least one view of the video content and at least part of a depth representation associated with the view (110), and defining a camera focal length for the encoded view (120). The method further comprises selecting at least one reference parameter for a focus plane of the encoded view (130), selecting at least one reference parameter for a camera f-Number for the encoded view (140), and transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node (170). Further, a method for decoding three-dimensional video content, an encoder, a decoder, and a corresponding computer program product are also disclosed.

Description

METHODS FOR ENCODING AND DECODING THREE-DIMENSIONAL VIDEO
CONTENT
Technical Field
The present invention relates to methods for encoding and decoding three-dimensional (3D) video content. The invention also relates to an encoder, a decoder and a computer program product configured to implement methods for encoding and decoding three-dimensional video content.
Background
Three-dimensional video technology continues to grow in popularity, and 3D technology capabilities in the entertainment and communications industries in particular have evolved rapidly in recent years.
3D technology provides an observer with an impression of depth in a compound image, causing parts of the image to appear to project out in front of a display screen, into what is known as observer space, while other parts of the image appear to project backwards into the space behind the screen, into what is known as CRT space. The term 3D is usually used to refer to a stereoscopic experience, in which an observer's eyes are provided with two slightly different images of a scene, which images are fused in the observer's brain to create the impression of depth. This effect is known as binocular parallax and provides an excellent 3D experience to a stationary observer, usually requiring the use of glasses or other filtering elements that enable the different images to be shown to the left and right eyes of an observer.
A new generation of "auto-stereoscopic" displays allows a user to experience three- dimensional video without glasses. These displays project slightly different views of a scene in different directions, as illustrated in Figure 1. A viewer located a suitable distance in front of the display will see slightly different pictures of the same scene in their left and right eyes, creating a perception of depth. In order to achieve smooth parallax and to enable a change of viewpoint as users move in front of the screen, a number of views (typically between 7 and 28 views) are generated. Auto-stereoscopic functionality is enabled by capturing or digitally generating a scene using many different cameras which observe a scene from different angles or viewpoints. These cameras generate what is known as multiview video.
Multiview video can be relatively efficiently encoded for transmission by exploiting both temporal and spatial similarities that exist in different views. However, even with multiview coding (MVC, MV-HEVC), the transmission cost for multiview video remains prohibitively high. To address this, current technologies only actually transmit a subset of key captured or generated multiple views, typically between 2 and 3 of the available views. To compensate for the missing information, depth or disparity maps are used to recreate the missing data. From the multiview video and depth/disparity information, virtual views can be generated at any arbitrary viewing position using view synthesis processes. These viewing positions are sometimes known as virtual cameras, and may be located between the transmitted key views (interpolated) or outside the range covered by the key views (extrapolated). In addition to the coding efficiency offered by view synthesis, the ability to generate views at more or less arbitrary positions means that the depth perception of a viewer may be changed or adjusted and depth perception may be matched to the size of the display screen on which the video will be shown. Many view synthesis techniques exist in the literature, depth image-based rendering (DIBR) being one of the most prominent.
A depth map, as used in DIBR, is simply a greyscale image of a scene in which each pixel indicates the distance between the corresponding pixel in a video object and the capturing camera optical centre. A disparity map is an intensity image conveying the apparent shift of a pixel which results from moving from one viewpoint to another. Depth and disparity are mathematically related, and the link between them can be appreciated by considering that the closer an object is to a capturing camera, the greater will be the apparent positional shift resulting from a change in viewpoint. A key advantage of depth and disparity maps is that they contain large smooth surfaces of constant grey levels, making them comparatively easy to compress for transmission using current video coding technology.
A 3D point cloud can be reconstructed from a depth map using the 3D camera parameters of the capturing camera. These parameters include the matrix K for a pinhole camera model, which contains the camera focal lengths, principal point, etc. The 3D point cloud can be reconstructed as follows: Q = d*(KR)^-1 * q - R^-1 * T (Equation 1)
Where q is a 2D point (expressed in the camera coordinate frame, in homogeneous coordinates), d is the point's associated depth (measured by a sensor for example) and Q is the corresponding 3D point in a 3D coordinate frame. R is the camera orientation and T the camera translation. K, R and T are combined in the camera projection matrix P by the relation P = K * [R T].
A 3D point Q(Qx,Qy,Qz) may thus be projected onto an image at the 2D location q of homogeneous coordinates (qx,qy,1) by the equation: q*d = P * Q (Equation 2)
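By way of a worked example, Equations 1 and 2 may be implemented as follows (Python/NumPy sketch; variable names follow the notation above):

import numpy as np

def backproject(q, d, K, R, T):
    # Equation 1: Q = d * (K R)^-1 * q - R^-1 * T, with q = (qx, qy, 1).
    q_h = np.array([q[0], q[1], 1.0])
    return d * np.linalg.solve(K @ R, q_h) - np.linalg.solve(R, np.asarray(T))

def project(Q, K, R, T):
    # Equation 2: q * d = P * Q with P = K * [R T]; returns (qx, qy) and depth d.
    P = K @ np.hstack([R, np.asarray(T).reshape(3, 1)])
    qd = P @ np.append(Q, 1.0)
    return qd[:2] / qd[2], qd[2]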
A depth map can be measured by specialized cameras, including structured-light or time-of-flight (ToF) cameras, where the depth is correlated with the deformation of a projected light pattern or with the round-trip time of a pulse of light. A principal limitation of these depth sensors is the depth range they can measure: objects that are too close to or too far away from the device will have no depth information.
It will be appreciated from the above discussion that in order to conduct DIBR view synthesis, a number of parameters need to be signalled to the device or programme module that performs the view synthesis. Among those parameters are "Znear" and "Zfar", which represent the closest and the farthest depth values in the depth maps for the video frame under consideration. These values are needed in order to map the quantized depth map samples to the real depth values that they represent.
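The exact sample-to-depth mapping depends on the depth representation that is signalled; as an illustration of why Znear and Zfar are needed, a commonly used convention in which the depth samples are uniformly quantized in inverse depth may be sketched as follows (Python; the convention chosen here is an assumption, not part of the signalling discussed below):

def depth_sample_to_z(v, z_near, z_far, bit_depth=8):
    # Map a quantized depth sample v (0 .. 2^bit_depth - 1) to a metric depth,
    # assuming uniform quantization in 1/z between Znear and Zfar.
    v_max = (1 << bit_depth) - 1
    inv_z = (v / v_max) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z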
Capturing camera parameters are also required to conduct DIBR view synthesis, and these parameters are usually divided into two groups. The first group is internal camera parameters, representing the optical characteristics of the camera for the image taken. This includes the focal length, the coordinates of the image's principal point and the lens distortions. The second group, or external camera parameters, represent the camera position and the direction of its optical axis in the chosen real world coordinates (conveying the position of the cameras relative to each other and to the objects in the scene). Both internal and external camera parameters are required in view synthesis processes based on usage of the depth information (such as DIBR). There exist standardized methods for sending Znear, Zfar and camera parameters to the decoding module performing view synthesis. One of these methods is defined in the multi-view video coding (MVC) standard, which is defined in the annex H of the well-known advanced video coding (AVC) standard, also known as H.264. The scope of MVC covers joint coding of stereo or multiple views representing the scene from several viewpoints. The MVC standard also covers sending the camera parameters information to the decoder. The camera parameters are sent as a supplementary enhancement information (SEI) message. The syntax of this SEI message is shown in Table 1 below:
Table 1: Multiview acquisition information SEI message syntax

multiview_acquisition_info( payloadSize ) {            C  Descriptor
    num_views_minus1                                      ue(v)
    intrinsic_param_flag                               5  u(1)
    extrinsic_param_flag                               5  u(1)
    if( intrinsic_param_flag ) {
        intrinsic_params_equal                         5  u(1)
        prec_focal_length                              5  ue(v)
        prec_principal_point                           5  ue(v)
        prec_skew_factor                               5  ue(v)
        if( intrinsic_params_equal )
            num_of_param_sets = 1
        else
            num_of_param_sets = num_views_minus1 + 1
        for( i = 0; i < num_of_param_sets; i++ ) {
            sign_focal_length_x[ i ]                   5  u(1)
            exponent_focal_length_x[ i ]               5  u(6)
            mantissa_focal_length_x[ i ]               5  u(v)
            sign_focal_length_y[ i ]                   5  u(1)
            exponent_focal_length_y[ i ]               5  u(6)
            mantissa_focal_length_y[ i ]               5  u(v)
            sign_principal_point_x[ i ]                5  u(1)
            exponent_principal_point_x[ i ]            5  u(6)
            mantissa_principal_point_x[ i ]            5  u(v)
            sign_principal_point_y[ i ]                5  u(1)
            exponent_principal_point_y[ i ]            5  u(6)
            mantissa_principal_point_y[ i ]            5  u(v)
            sign_skew_factor[ i ]                      5  u(1)
            exponent_skew_factor[ i ]                  5  u(6)
            mantissa_skew_factor[ i ]                  5  u(v)
        }
    }
    if( extrinsic_param_flag ) {
        prec_rotation_param                            5  ue(v)
        prec_translation_param                         5  ue(v)
        for( i = 0; i <= num_views_minus1; i++ ) {
            for( j = 1; j <= 3; j++ ) {  /* row */
                for( k = 1; k <= 3; k++ ) {  /* column */
                    sign_r[ i ][ j ][ k ]              5  u(1)
                    exponent_r[ i ][ j ][ k ]          5  u(6)
                    mantissa_r[ i ][ j ][ k ]          5  u(v)
                }
                sign_t[ i ][ j ]                       5  u(1)
                exponent_t[ i ][ j ]                   5  u(6)
                mantissa_t[ i ][ j ]                   5  u(v)
            }
        }
    }
}
The camera parameters in Table 1 are sent in floating point representation, which offers high precision as well as supporting a high dynamic range of parameters. Tables 2 and 3 show an example Sequence Parameter Set (SPS) message for sending Znear and Zfar values associated with a depth map:
Table 2: Sequence parameter set 3D-AVC extension syntax

seq_parameter_set_3davc_extension( ) {                 C  Descriptor
    if( NumDepthViews > 0 ) {
        3dv_acquisition_idc                            0  ue(v)
        for( i = 0; i < NumDepthViews; i++ )
            view_id_3dv[ i ]                           0  ue(v)
        if( 3dv_acquisition_idc ) {
            depth_ranges( NumDepthViews, 2, 0 )
            vsp_param( NumDepthViews, 2, 0 )
        }
        reduced_resolution_flag                        0  u(1)
        slice_header_prediction_flag                   0  u(1)
        seq_view_synthesis_flag                        0  u(1)
        nonlinear_depth_representation_num             0  ue(v)
        for( i = 1; i <= nonlinear_depth_representation_num; i++ )
            nonlinear_depth_representation_model[ i ]  0  ue(v)
    }
    alc_sps_enable_flag                                0  u(1)
    enable_rle_skip_flag                               0  u(1)
}
Table 3. Depth ranges syntax
nonlinear_depth_representation_model[ i ] specifies the piecewise linear segments for mapping of decoded luma sample values of depth views to a scale that is uniformly quantized in terms of disparity.
Variable DepthLUT[ i ] for i in the range of 0 to 255, inclusive, is specified as follows:

depth_nonlinear_representation_model[ 0 ] = 0
depth_nonlinear_representation_model[ depth_nonlinear_representation_num + 1 ] = 0
for( k = 0; k <= depth_nonlinear_representation_num_minus1 + 1; ++k ) {
    pos1 = ( 255 * k ) / ( depth_nonlinear_representation_num_minus1 + 2 )
    dev1 = depth_nonlinear_representation_model[ k ]
    pos2 = ( 255 * ( k + 1 ) ) / ( depth_nonlinear_representation_num_minus1 + 2 )
    dev2 = depth_nonlinear_representation_model[ k + 1 ]
    x1 = pos1 - dev1
    y1 = pos1 + dev1
    x2 = pos2 - dev2
    y2 = pos2 + dev2
    for( x = max( x1, 0 ); x <= min( x2, 255 ); ++x )
        DepthLUT[ x ] = Clip3( 0, 255, Round( ( ( x - x1 ) * ( y2 - y1 ) ) ÷ ( x2 - x1 ) + y1 ) )
}
When depth_representation_type is equal to 3, DepthLUT[ dS ] for all decoded luma sample values dS of depth views in the range of 0 to 255, inclusive, represents disparity that is uniformly quantized into the range of 0 to 255, inclusive.
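A direct transcription of the DepthLUT derivation above into Python is given below for illustration; `model` is assumed to be depth_nonlinear_representation_model with the two zero boundary entries already set, and num_minus1 is depth_nonlinear_representation_num_minus1:

import numpy as np

def build_depth_lut(model, num_minus1):
    lut = np.zeros(256, dtype=np.int32)   # every entry is overwritten by the loop below
    for k in range(num_minus1 + 2):
        pos1 = (255 * k) // (num_minus1 + 2)
        dev1 = model[k]
        pos2 = (255 * (k + 1)) // (num_minus1 + 2)
        dev2 = model[k + 1]
        x1, y1 = pos1 - dev1, pos1 + dev1
        x2, y2 = pos2 - dev2, pos2 + dev2
        for x in range(max(x1, 0), min(x2, 255) + 1):
            value = round(((x - x1) * (y2 - y1)) / (x2 - x1) + y1)
            lut[x] = int(np.clip(value, 0, 255))   # Clip3( 0, 255, Round( ... ) )
    return lut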
The present specification discusses the case of a 1D linear camera arrangement with cameras pointing in directions parallel to each other. The cameras point along the z axis and the camera centers have the same y and z coordinates, with only the x coordinate changing from camera to camera. This is a common camera setup for stereoscopic and "3D multiview" video. The so-called "toed-in" or general case camera setup, in which the cameras are not aligned, can be converted to the 1D linear camera setup by the rectification process. The distance between two cameras in a stereo/3D setup is usually called the baseline (or the baseline distance). In a stereo camera setup, the baseline is often approximately equal to the distance between the human eyes (normally about 6.5 centimeters) in order to achieve natural depth perception when showing these left and right pictures on a stereo screen. Other baseline values may be chosen depending on the scene characteristics, camera parameters and the intended stereo effects.
The present specification refers to a baseline as the distance between the cameras for the left and the right views in the units of the external (extrinsic) camera coordinates. In the case of a stereo screen, the baseline is the distance between the virtual (or real) cameras used to obtain the views for the stereo-pair. In the case of a multi-view screen, the baseline is considered as the distance between two cameras that the left and the right eyes of a spectator see when watching the video on the multiview screen at the preferred viewing distance. In case of a multi-view screen, the views (cameras) seen by the left and the right eyes of the viewer may not be consecutive views. However, this information is known to the display manufacturer and can be used in the view synthesis process. The baseline is not therefore the distance between the two closest generated views, as it may be greater than this distance, depending upon the particular setup being used.
JCT-3V is an ongoing standardization effort in which multiview texture (normal 2D videos) and depth maps are compressed and transmitted using the future MV-HEVC or 3D-HEVC codecs.
View synthesis techniques such as DIBR thus address many of the problems associated with providing multiview three-dimensional video content. However, view synthesis can encounter difficulties in rendering views in which part of the content is blurred.
The depth of field (DoF) of an image corresponds to the distance between the nearest DN and farthest DF objects in a scene that appear acceptably sharp. Acceptably sharp may be defined with reference to criteria relating to the capturing or display equipment, and may for example comprise all areas of an image where the extent of blur for an image point is less than the pixel diameter of the capturing or display equipment. DoF is thus defined as DF - DN. A small DoF corresponds to an image in which significant parts of the foreground and/or background image texture are blurred. Encompassed within the DoF, between DF and DN, is the focus plane s, which is the depth at which the content of the image is sharpest. Current techniques for synthesising views of an image in which part of the texture is blurred involve blurring the depth map with the same amount of blur as appears in the image texture, and then conducting a normal DIBR process. These techniques work relatively well when only the far background of the image is out of focus (i.e. outside the DoF). However, current techniques do not work properly for images having a small DoF, or blurred content close to the camera. Figures 2a and 2b illustrate synthesis results for a blurred image texture in which the colours (y axis) of object f and object e at different locations along the x axis have blurred into each other. As seen in Figure 2b, if the depth map is not blurred, a distinct color leap (case B) or repetitive color (case A) will appear in the synthesized image. As seen in Figure 2a, if the depth map is blurred, a wider (case A) or more constricted (case B) blur will be generated in the synthesized image, thus generating an unnatural DoF effect. In extreme rendering conditions, with a virtual camera located relatively far away from the real reference camera, in case A the blur will act as a linear smoothing and in case B it will act as a sharp edge, resulting in an unwanted artifact.
Summary
It is an aim of the present invention to provide a method, apparatus and computer program product which obviate or reduce at least one or more of the disadvantages mentioned above.
According to a first aspect of the present invention, there is provided a method of encoding three-dimensional video content, the method comprising encoding at least one view of the video content and at least part of a depth representation associated with the view, and defining a camera focal length for the encoded view. The method further comprises selecting at least one reference parameter for a focus plane of the encoded view, selecting at least one reference parameter for a camera f-Number for the encoded view, and transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node. In some examples, the node may be a decoder.
In some examples, the depth representation may be a depth or disparity map. In other examples, the depth representation may be a dense (comprising a matrix) or sparse (comprising sets) representation. The representation may be deduced from a 3D model or a previously reconstructed depth map projected onto the camera view or may be estimated from multiple camera views.
In some examples, defining a camera focal length may comprise extracting a focal length of a camera used to capture the encoded view. In other examples, defining a camera focal length may comprise defining a focal length of a virtual camera used to digitally generate the encoded view.
According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise a location of a focus plane for the encoded view. In some examples, the location of the focus plane may comprise the actual location of the focus plane in captured video content. In other examples, the location of the focus plane may comprise a selected location for the focus plane for captured or digitally generated video content.
According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise a look-up table index corresponding to a focus plane of the encoded view.
In some examples, more than one view of the video content may be encoded. In such examples, an identification of the view to which the look-up table index applies may be included as part of the reference parameter for a focus plane of the video content. According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. The distance may permit calculation of the focus plane location for the encoded view. According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level. The acceptable focus level may correspond to a measured focus level or to a desired focus level, for example in digitally generated content.
According to an embodiment of the invention, the reference parameter for a focus plane of the video content may comprise look-up table indexes corresponding to the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
The nearest and farthest in-focus depths may allow estimation of a location of the focus plane for the encoded view. In some examples, more than one view of the video content may be encoded. In such examples, an identification of the view to which the look-up table index applies may be included as part of the reference parameter for a focus plane of the video content. According to an embodiment of the invention, the reference parameter for a camera f- Number for the encoded view may comprise a camera f-Number for the encoded view.
In some examples, the camera f-Number may be a camera f-Number of a capturing camera of the video content. In other examples, the camera f-Number may be a selected camera f-Number for captured or digitally generated video content.
According to an embodiment of the invention, the reference parameter for a camera f- Number for the encoded view may comprise a camera aperture diameter. The camera aperture diameter may allow calculation of the camera f-Number.
According to an embodiment of the invention, transmitting the selected reference parameters may comprise transmitting at least one of the selected reference parameters in floating point representation.
According to another embodiment of the invention, transmitting the selected reference parameters may comprise transmitting at least one of the selected reference parameters in unsigned integer representation. According to an embodiment of the invention, the selected reference parameters for focus plane and camera f-Number may correspond to a first depth of focus, and the method may further comprise selecting at least one additional reference parameter for a focus plane of the encoded view, selecting at least one additional reference parameter for a camera f-Number for the encoded view, and transmitting the selected additional reference parameters with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
According to an embodiment of the invention, the video content may comprise captured video content, and the selected reference parameters for focus plane and camera f- Number may correspond to an actual focus plane and camera f-Number of a capturing camera.
According to an embodiment of the invention, the selected reference parameters for focus plane and camera f-Number may correspond to a selected focus plane and camera f-Number for one of a capturing camera or a virtual camera. According to an embodiment of the invention, the method may further comprise selecting a shutter speed for the video content and transmitting the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
In some examples, the shutter speed may be the actual speed for captured video content. In other examples, the shutter speed may be a selected speed for captured or for digitally generated content. According to an embodiment of the invention, the method may further comprise selecting a shutter shape for the video content and transmitting a parameter representing the shutter shape with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation. In some examples, the shutter shape may be the actual shape for captured video content. In other examples, the shutter shape may be a selected shape for captured or for digitally generated content.
According to an embodiment of the invention, transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a supplementary enhancement information, SEI, message.
According to an embodiment of the invention, transmitting the selected reference parameters to the node may comprise including the selected reference parameters in the multiview_acquisition_info SEI message.
According to another embodiment of the invention, transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a dedicated SEI message.
According to another aspect of the present invention, there is provided a method for decoding three-dimensional video content, comprising receiving: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The method further comprises synthesising at least one view of the video content, dimensioning a blur filter according to the received focal length and reference parameters and applying the dimensioned blur filter to the synthesised view. In some examples, synthesising at least one view may comprise at least partially decoding the received encoded view
In some examples, the received reference parameters may comprise focus plane and f-Number values. In other examples, the received reference parameters may comprise other values permitting calculation of the focus plane and f-Number values, for example a distance between a recording surface and an optical system of a camera and a camera aperture diameter.
In some examples, the method may comprise receiving a plurality of reference parameters for focus plane and camera f-Number of the encoded view, and the method may further comprise selecting a reference parameter for a focus plane of the encoded view and a reference parameter for a camera f-Number of an encoded view. The reference parameters may be selected according to at least one of display or viewing conditions for the three-dimensional video content.
According to an embodiment of the invention, dimensioning a blur filter according to the received focal length and reference parameters may comprise at least one of: calculating a focus plane from the received reference parameter for a focus plane; and calculating a camera f-Number from the received reference parameter for a camera f- Number.
According to an embodiment of the invention, synthesising at least one view of the video content may comprise applying a Depth Image Based Rendering, DIBR, process to the encoded view of the video content. In some examples, an inpainting process may be conducted as part of the DIBR process, for example if the synthesized view has disocclusions.
According to an embodiment of the invention, the method may further comprise receiving a parameter representing shutter shape for the video content and applying the shutter shape outline corresponding to the received parameter to the dimensioned blur filter. According to another embodiment of the invention, the method may further comprise receiving a shutter speed for the video content and dimensioning a motion blur filter according to the received shutter speed. According to such an embodiment, applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
In some examples, dimensioning a motion blur filter may comprise calculating a direction and a length of motion blur. In some embodiments of the invention, motion and blur filters may be applied separately.
According to some embodiments, the method may further comprise estimating a motion blur direction from a motion model.
According to an embodiment of the invention, the received encoded view may include blur, and the method may further comprise sharpening the received encoded view according to the received focal length and reference parameters before synthesising the at least one view of the video content.
In some examples, sharpening may comprise applying a deblurring filter dimensioned according to the received focal length and parameters, for example using a Wiener deconvolution. According to an embodiment of the invention, the method may further comprise applying a sharpening process to a depth map of the received view before synthesising the at least one view of the video content. In some examples, the sharpening process may comprise applying at least one median filter to the depth map. According to another aspect of the present invention, there is provided a computer program product configured, when run on a computer, to execute a method according to any one of the preceding claims.
According to another aspect of the present invention, there is provided an encoder configured for encoding three-dimensional video content, the encoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the encoder is operative to encode at least one view of the video content and at least part of a depth representation associated with the view, define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view, select at least one reference parameter for a camera f-Number for the encoded view and transmit the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node. The node may in some examples be a decoder.
According to an embodiment of the invention, the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a location of a focus plane for the encoded view. According to another embodiment of the invention, the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a look-up table index corresponding to a focus plane of the encoded view.
According to another embodiment of the invention, the encoder may be further operative to select a reference parameter for a camera f-Number for the encoded view which comprises a camera f-Number for the encoded view.
According to another embodiment of the invention, the encoder may be further operative to transmit at least one of the selected reference parameters in floating point representation.
According to an embodiment of the invention, the encoder may be further operative to select a shutter speed for the video content and transmit the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
According to another aspect of the present invention, there is provided a decoder, the decoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the decoder is operative to receive: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view and at least one reference parameter for a camera f-Number for the encoded view. The decoder is further operative to synthesise at least one view of the video content, dimension a blur filter according to the received focal length and reference parameters and apply the dimensioned blur filter to the synthesised view.
According to an embodiment of the invention, the decoder may be further operative to receive a shutter speed for the video content and dimension a motion blur filter according to the received shutter speed. According to this embodiment, applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
Brief description of the drawings
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the following drawings in which: Figure 1 is a representation of multiview 3D display;
Figures 2a and 2b are graphs illustrating the result of view synthesis with blurred texture and depth map (Figure 2a) and with blurred texture only (Figure 2b); Figure 3 is a flow chart illustrating process steps in a method for encoding three- dimensional video content;
Figure 4 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 3.
Figure 5 is a block diagram illustrating functional units in an encoder;
Figure 6 is a block diagram illustrating another embodiment of encoder; Figure 7 is a flow chart illustrating process steps in a method for decoding three- dimensional video content; Figure 8 is a block diagram illustrating functional units in a decoder;
Figure 9 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 7;
Figure 10 is a graph illustrating motion blur path estimation;
Figure 11 is a graph illustrating the result of view synthesis with all-in-focus texture and depth maps.
Figure 12 is a block diagram illustrating another embodiment of decoder; and Figure 13 is a block diagram illustrating another embodiment of decoder.
Detailed Description
Aspects of the present invention address the issues of synthesising views of three-dimensional video content containing blur by transmitting to the decoder information allowing for the dimensioning of a blur filter, which may be applied as a post processing step following view synthesis. The transmitted parameters allow for the creation of a filter that generates a desired amount of blur in desired regions of the image. For original video content containing blur, the transmitted parameters may allow for the accurate recreation of the original blur in synthesised views. For original video content that is all in focus or has been deblurred, the transmitted parameters may be selected by a creator of the content to ensure that synthesised views are blurred according to the creator's intentions. Aspects of the present invention may thus allow a creator or supplier of three-dimensional video content to maintain control over how the content is displayed.
Figure 3 illustrates steps in a method 100 for encoding three-dimensional video content according to an embodiment of the present invention. In a first step 110, the method comprises encoding at least one view of the video content and at least part of a depth representation associated with the view. The method then comprises, at step 120, defining a camera focal length for the encoded view. The method further comprises, at step 130, selecting at least one reference parameter for a focus plane of the encoded view, and at step 140 selecting at least one reference parameter for a camera f-Number for the encoded view. Finally, at step 170, the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
Step 110 of the method comprises encoding one or more views of the three-dimensional video content. The at least part of a depth representation associated with the view, and which is encoded with the view, may for example be a depth map for the view or may take other forms. At least part of the depth representation is encoded with the view to enable view synthesis to be conducted at a decoder.
Step 120 of the method may comprise extracting the camera focal length of a capturing camera for the encoded view, or may comprise defining a focal length for the encoded view. For example, in the case of digitally generated three-dimensional video content, there is no physical capturing camera, and the focal length of a virtual capturing camera may be defined according to the requirements of the creator of the content. Step 130 of the method comprises selecting at least one reference parameter for a focus plane of the encoded view. The reference parameter may take a range of different forms according to different embodiments of the invention. In one embodiment the reference parameter for a focus plane of the encoded view may comprise the location of the focus plane s. The units of s may be the same as the units used for the external camera parameters: baseline, Znear and Zfar. However in some circumstances it may be more practical to send the value of s in standard length units of metres or feet.
In another embodiment, the reference parameter for a focus plane of the encoded view may comprise a LUT (Look-up table) index for the focus plane in the encoded view. According to this embodiment, the value for the focus plane s may then be determined by performing the LUT operation: s = DepthLUT[received index value]. As discussed above, in some examples of the invention, more than one view may be encoded for transmission. In such cases, a view identifier may also be transmitted as part of the reference parameter, indicating which view the LUT index applies to. In another embodiment, the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. In the case of a charge coupled device, this distance is known as the CCD distance g. A corresponding distance may be defined for other camera types. The focus plane s may be calculated from the CCD distance g using the equation:
1/f = 1/g + 1/s (Equation 3) Similar equations may be developed for equivalent distances in other camera types. The CCD distance g or equivalent may be sent in the same units as are used for the Znear and Zfar depth values. Alternatively, the CCD distance g or equivalent may be sent in units of millimetres or inches. It is useful to ensure that the parameter that is sent for the distance is measured in the same units as the actual distance.
In another embodiment, the reference parameter for a focus plane of the encoded view may comprise the depths DN and DF of the nearest and farthest objects in a scene that appear acceptably sharp. Acceptably sharp may be defined with reference to the capturing or display conditions. The focus plane s may be approximated from DN and DF using the equations: s = DN * H / (H - DN) (Equation 4) or s = DF * H / (H + DF) (Equation 5)
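For illustration, the focus plane may be recovered from these reference parameters with small helpers such as the following (Python), implementing Equations 3, 4 and 5 directly; H is the hyperfocal distance discussed later in this description:

def focus_plane_from_g(f, g):
    # Equation 3: 1/f = 1/g + 1/s  =>  s = 1 / (1/f - 1/g)
    return 1.0 / (1.0 / f - 1.0 / g)

def focus_plane_from_dn(dn, h):
    # Equation 4: s = DN * H / (H - DN)
    return dn * h / (h - dn)

def focus_plane_from_df(df, h):
    # Equation 5: s = DF * H / (H + DF)
    return df * h / (h + df)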
In another embodiment, the reference parameter for a focus plane of the encoded view may comprise LUT (Look-up table) indexes for DN and DF in the encoded view. In the case of multiple encoded views, a view identifier may also be included.
Step 140 of the method comprises selecting at least one reference parameter for a camera f-Number for the encoded view. In one embodiment, the reference parameter for the camera f-Number for the encoded view may be the camera f-Number. In another embodiment, the reference parameter for the camera f-Number may be a camera aperture diameter d. The camera f-Number N may then be calculated using the following equation:
N = f / d (Equation 6) where f is the camera focal length. The unit of measurement of the camera aperture diameter can be the same as the units of the focus plane. However it may be more practical to send the aperture diameter in units of millimeters or inches. It is useful to ensure that the aperture diameter as selected and transmitted is measured in the same units as the actual aperture diameter.
Step 170 of the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, focal length and selected parameters to a node, which may for example be a decoder.
Transmission of the selected reference parameters may be conducted in a number of different formats according to different embodiments of the invention. In one embodiment, the selected reference parameters are transmitted in a dedicated SEI message. In another embodiment, the selected reference parameters are transmitted as part of the standard multiview_acquisition_info SEI message. In other embodiments, the selected parameters may be sent as part of other SEI messages or any other message signalling to a decoder. The parameters may be sent in floating point representation, in unsigned integer representation or in another appropriate format.
It will be appreciated that various combinations of the above discussed embodiments may be contemplated. The following examples illustrate different transmission methods for different combinations of reference parameters.
Example 1
Focus plane s and camera f-Number N are transmitted in a dedicated SEI message in floating point representation (in the same format as is used in sending camera parameters in the multiview_acquisition_info message in MVC):

focus_info( payloadSize ) {                 C  Descriptor
    prec_focus_plane                        5  ue(v)
    prec_aperture_f_number                  5  ue(v)
    exponent_focus_plane                    5  u(6)
    mantissa_focus_plane                    5  u(v)
    exponent_aperture_f_number              5  u(6)
    mantissa_aperture_f_number              5  u(v)
}
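For illustration only, the payload of Example 1 may be serialised with a simple bit writer such as the one sketched below (Python), using unsigned Exp-Golomb coding for the ue(v) fields and fixed-length writes for the u(n) fields. The decomposition of a floating point value into precision, exponent and mantissa fields is not reproduced here; the function takes those already-derived integers as inputs, and the mantissa lengths are whatever u(v) lengths the precision fields imply:

class BitWriter:
    def __init__(self):
        self.bits = []

    def u(self, value, n):
        # u(n): write value as an n-bit unsigned integer, most significant bit first.
        self.bits += [(value >> i) & 1 for i in range(n - 1, -1, -1)]

    def ue(self, value):
        # ue(v): unsigned Exp-Golomb code, as used for the prec_* fields.
        code = value + 1
        n = code.bit_length()
        self.bits += [0] * (n - 1)
        self.u(code, n)

def write_focus_info(bw, prec_fp, prec_fn, exp_fp, man_fp, man_fp_bits, exp_fn, man_fn, man_fn_bits):
    bw.ue(prec_fp)               # prec_focus_plane
    bw.ue(prec_fn)               # prec_aperture_f_number
    bw.u(exp_fp, 6)              # exponent_focus_plane
    bw.u(man_fp, man_fp_bits)    # mantissa_focus_plane, u(v)
    bw.u(exp_fn, 6)              # exponent_aperture_f_number
    bw.u(man_fn, man_fn_bits)    # mantissa_aperture_f_number, u(v)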
Example 2
Camera f-Number is transmitted using floating point representation (in the same format that is used in sending camera parameters in the multiview_acquisition_info message in MVC). The reference parameter for the focus plane is the LUT index focus_plane_depth corresponding to the focus plane of an encoded view. Multiple views are encoded for transmission so the reference parameter for focus plane also comprises a view identifier focus_plane_view_id to indicate which view the LUT index applies to:
The focus plane may then be found at the decoder by the operation s = DepthLUT[focus_plane_depth], where DepthLUT is computed for the view identified by focus_plane_view_id.
In an alternative version of Example 2, only a single view may be sent, in which case there is no advantage to sending a view identifier as the index may be assumed to apply to the single view that is transmitted. In this case, the look up table index may be added directly into the multiview_acquisition_info message, in the loop "for( i = 0; i <= num_views_minus1 ; i++) { focus_plane_depth [i]}".
Example 3
Focus plane s is sent in floating point representation and the camera aperture diameter d is sent in floating point representation:
The aperture f-Number N may then be calculated at the decoder using equation 6 above.
Example 4
Focus plane s is sent in floating point representation and the camera aperture diameter is sent in unsigned integer representation:
Example 5
Focus plane s and camera aperture f-Number N are sent in floating point representation in the multiview_acquisition_info message:

multiview_acquisition_info( payloadSize ) {            C  Descriptor
    num_views_minus1                                      ue(v)
    intrinsic_param_flag                               5  u(1)
    extrinsic_param_flag                               5  u(1)
    aperture_f_number_flag                             5  u(1)
    focus_plane_flag                                   5  u(1)
    if( intrinsic_param_flag ) {
        intrinsic_params_equal                         5  u(1)
        prec_focal_length                              5  ue(v)
        prec_principal_point                           5  ue(v)
        prec_skew_factor                               5  ue(v)
        if( intrinsic_params_equal )
            num_of_param_sets = 1
        else
            num_of_param_sets = num_views_minus1 + 1
        for( i = 0; i < num_of_param_sets; i++ ) {
            sign_focal_length_x[ i ]                   5  u(1)
            exponent_focal_length_x[ i ]               5  u(6)
            mantissa_focal_length_x[ i ]               5  u(v)
            sign_focal_length_y[ i ]                   5  u(1)
            exponent_focal_length_y[ i ]               5  u(6)
            mantissa_focal_length_y[ i ]               5  u(v)
            sign_principal_point_x[ i ]                5  u(1)
            exponent_principal_point_x[ i ]            5  u(6)
            mantissa_principal_point_x[ i ]            5  u(v)
            sign_principal_point_y[ i ]                5  u(1)
            exponent_principal_point_y[ i ]            5  u(6)
            mantissa_principal_point_y[ i ]            5  u(v)
            sign_skew_factor[ i ]                      5  u(1)
            exponent_skew_factor[ i ]                  5  u(6)
            mantissa_skew_factor[ i ]                  5  u(v)
        }
    }
    if( extrinsic_param_flag ) {
        prec_rotation_param                            5  ue(v)
        prec_translation_param                         5  ue(v)
        for( i = 0; i <= num_views_minus1; i++ ) {
            for( j = 1; j <= 3; j++ ) {  /* row */
                for( k = 1; k <= 3; k++ ) {  /* column */
                    sign_r[ i ][ j ][ k ]              5  u(1)
                    exponent_r[ i ][ j ][ k ]          5  u(6)
                    mantissa_r[ i ][ j ][ k ]          5  u(v)
                }
                sign_t[ i ][ j ]                       5  u(1)
                exponent_t[ i ][ j ]                   5  u(6)
                mantissa_t[ i ][ j ]                   5  u(v)
            }
        }
    }
    if( aperture_f_number_flag ) {
        prec_aperture_f_number                         5  ue(v)
        exponent_aperture_f_number                     5  u(6)
        mantissa_aperture_f_number                     5  u(v)
    }
    if( focus_plane_flag ) {
        prec_focus_plane                               5  ue(v)
        exponent_focus_plane                           5  u(6)
        mantissa_focus_plane                           5  u(v)
    }
}
Example 6
The camera f-Number reference parameter is sent using any of the above discussed methods and DN and DF are sent using floating point representation:

focus_plane_info( payloadSize ) {                      C  Descriptor
    prec_focus_plane                                   5  ue(v)
    prec_aperture_f_number                             5  ue(v)
    exponent_focus_plane                               5  u(6)
    mantissa_focus_plane                               5  u(v)
    exponent_aperture_f_number                         5  u(6)
    mantissa_aperture_f_number                         5  u(v)
    exponent_nearest_plane_focused                     5  u(6)
    mantissa_nearest_plane_focused                     5  u(v)
    exponent_farthest_plane_focused                    5  u(6)
    mantissa_farthest_plane_focused                    5  u(v)
}
Or using unsigned integer representation:
Example 7
CCD distance g is sent in floating point representation and camera f-Number is sent in floating point representation:
It will be appreciated that the above discussed examples are merely illustrative of possible combinations of reference parameters and transmission methods. Other combinations may be envisaged. In further examples, more than one pair of reference parameters for focus plane and camera f-Number may be selected and transmitted. A single reference parameter for each of focus plane and f-Number corresponds to a particular Depth of Focus in a synthesised view. In some examples, it may be desirable to send parameters corresponding to a range of different Depths of Field in a single SEI message. A decision may then be made on the decoding side as to which Depth of Field, and hence which pair of reference parameters, is most appropriate for the display and viewing conditions:
The selected reference parameters may correspond to actual values for the focus plane and camera f-Number in the encoded view, for example in the case of captured three-dimensional video content. In such cases the focus plane may be provided, for example by an autofocus system or by a user. Alternatively, the focus plane may be estimated using a software-based autofocus technique (which may involve estimating the amount of blur and taking the depth of the sharpest areas as the focus plane). In such cases the aperture f-number may also be provided by the lens, by the camera or by the user. It is generally more complicated to estimate the camera f-Number using only the image. In other examples, the reference parameters may be selected for the view in order to create a desired blur effect. This may be the case for example in fully digitally generated content but may also be the case for captured video content. For example, when video content is captured all in focus but it is desired for artistic or other reasons to introduce blur, parameters may be selected to ensure generation of blur to the extent and in the regions of the image required. Figure 4 illustrates possible additional steps in the method 100, which may be conducted during the encoding of three-dimensional video content. With reference to Figure 4, the method 100 may also comprise a step 150 of selecting a shutter speed for the video content, and step 160 of selecting a shutter shape for video content. The selected shutter speed and a parameter representing the selected shutter shape may then be transmitted with the selected reference parameters, focal length and encoded view and depth representation in a modified step 170B.
Camera shutter speed (or exposure time) may be used at the decoder side to create camera motion blur. Detail of this procedure is discussed below with reference to Figures 9 and 10. The selected shutter speed may be the actual shutter speed of a capturing camera for the video content, or may be a selected shutter speed for example of a virtual camera applied to digitally generated content. The motion blur created using the transmitted shutter speed may thus recreate actual motion blur or introduce a desired motion blur. In one example, shutter speed may be sent using floating point representation in an SEI message:
The unit of the shutter speed may be seconds, milliseconds or other time units.
In another example, the inverse of the shutter speed may be sent using unsigned integer representation:
The unit of this inverse shutter speed may be 1/seconds.
Camera shutter shape (or Bokeh shape) may be used at the decoding side in the dimensioning of a blur filter to accurately represent blur from a capturing or virtual camera. The shape corresponds to the actual shape of a camera shutter and may be a perfect disk, triangle, square, hexagon, octagon, etc.
In one example, shutter shape may be included with focus plane s and camera f-Number N as follows:
The parameter shutter_shape_type has no unit but may correspond to a predetermined set of values, for example:
0 for SHUTTER_SHAPE_DISK,
1 for SHUTTER_SHAPE_TRIANGLE
2 for SHUTTER_SHAPE_SQUARE
3 for SHUTTER_SHAPE_HEXAGON etc.
The value of shutter_shape_type defines the shape (outline) of a blur filter to be used in the blurring process at the decoder side, as discussed below. In one example, three bits may be used to signal shutter shape, supporting up to 8 different shutter shapes. Other examples using fewer or more bits may be envisaged.
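As an illustrative sketch, a decoder might turn shutter_shape_type and a blur diameter b (in pixels) into a normalised blur kernel as below (Python). Only the mapping of type values to shapes follows the list above; the disk and regular-polygon rasterisation is an assumed realisation:

import numpy as np

SHAPE_SIDES = {1: 3, 2: 4, 3: 6}   # SHUTTER_SHAPE_TRIANGLE, _SQUARE, _HEXAGON

def shutter_kernel(shape_type, b):
    r = max(b / 2.0, 0.5)
    size = int(np.ceil(2 * r)) | 1                    # odd kernel size
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    if shape_type == 0:                               # SHUTTER_SHAPE_DISK
        mask = (x ** 2 + y ** 2) <= r ** 2
    else:                                             # regular polygon outline
        sides = SHAPE_SIDES.get(shape_type, 6)
        theta = np.arctan2(y, x)
        # Distance from the kernel centre to the polygon edge in direction theta.
        edge = r * np.cos(np.pi / sides) / np.cos((theta % (2 * np.pi / sides)) - np.pi / sides)
        mask = np.hypot(x, y) <= edge
    kernel = mask.astype(np.float32)
    return kernel / kernel.sum()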
Figure 5 illustrates functional units of an encoder 200 in accordance with an embodiment of the invention. The encoder 200 may execute the steps of the method 100, for example according to computer readable instructions received from a computer program.
With reference to Figure 5, the encoder 200 comprises an encoding unit 210, a selection unit 230 and a transmission unit 270. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
The encoding unit 210 is configured to encode at least one view of the video content and at least part of a depth representation associated with the view, for example a depth map. The selection unit 230 is configured to define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view and select at least one reference parameter for a camera f-Number for the encoded view. The transmission unit 270 is configured to transmit the encoded at least one view and at least part of an associated depth representation, focal length and selected reference parameters to a node. In some embodiments, the selection unit 230 may also be configured to select a shutter shape and shutter speed for the encoded view, and the transmission unit 270 may be configured to transmit the shutter speed and a parameter representing the shutter shape with the encoded view and other elements.
Figure 6 illustrates another embodiment of encoder 300. The encoder 300 comprises a processor 380 and a memory 390. The memory 390 contains instructions executable by the processor 380 such that the encoder 300 is operative to conduct the steps of the methods of Figures 3 and 4 described above.
Figure 7 illustrates steps in a method 400 for decoding three-dimensional video content in accordance with an embodiment of the present invention. In a first step 410 the method comprises receiving at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The method then comprises, at step 430, synthesising at least one view of the video content. At step 440, the method comprises dimensioning a blur filter according to the received focal length and reference parameters and finally at step 450, the method comprises applying the dimensioned blur filter to the synthesised view.
The encoded view and depth representation, focal length and reference parameters may be received at step 410 from an encoder operating according to the methods of Figures 3 and 4. Synthesising at least one view of the video content at step 430 may comprise at least partially decoding the received view or views and associated depth representation, and then running a view synthesis process, such as DIBR to synthesise the required view or views.
Step 440 of the method then comprises dimensioning a blur filter according to the received reference parameters, and step 450 comprises applying the dimensioned blur filter to the synthesised view. As discussed above, the Depth of Field of an image in a view corresponds to the distance between the nearest DN and farthest DF objects in a scene that appear acceptably sharp. Outside the Depth of Field, an image is blurred to a greater or lesser extent. Dimensioning a blur filter using received reference parameters, and applying the dimensioned blur filter to a synthesized view, allows the creation of blur in parts of the synthesized view to create a Depth of Field in the synthesized view that corresponds to the received parameters. The received parameters may correspond to capture conditions, resulting in a Depth of Field in the synthesized view that matches the Depth of Field in the original captured views. Alternatively, the received parameters may have been selected to ensure the creation of a particular Depth of Field desired by a creator of the original content. The process of dimensioning a blur filter according to step 440 of the method, and applying the blur filter according to step 450, is discussed below.
The Depth of Field is given by the difference between the far and near limits of acceptable sharpness, DDoF = DF - DN, and can be approximately calculated with:

DDoF ≈ 2Hs^2 / (H^2 - s^2), for s < H (Equation 7)

Where s is the focus plane and H is the hyperfocal distance of the lens. The hyperfocal distance is defined as:

H = f^2 / (N * c) (Equation 8)

Where f is the focal length, N the aperture f-Number and c the circle of confusion; c is typically considered to be equal to the pixel size of the optical sensor. As discussed above, the camera f-Number N is defined as:

N = f / d (Equation 9)
Where f is the focal length and d is the aperture diameter. For a certain image depth, the amount of blur in an image is characterized by a blur diameter b, such that a pixel in the image appears as a blur disc of diameter b.
If a subject is at distance s and the foreground or background is at distance D, the distance xd between the subject and the foreground or background is given by:

xd = |D - s| (Equation 10)
The blur disc diameter b of a detail at distance xd from the subject can be expressed as a function of the subject magnification ms according to:

b = (f * ms / N) * xd / (s ± xd) = d * ms * xd / (s ± xd) (Equation 11)

The minus sign applies to a foreground object, the plus sign applies to a background object, and ms is defined as:

ms = f / (s - f) (Equation 12)
When a depth map Z associated with an all-in-focus image I is provided, all depth values (D) are known. A blur filter F(Z) may then be applied to each pixel of the image I by the following convolution:
J = I * F(D) (Equation 13)
The diameter b of the filter is calculated for each pixel according to the pixel's depth within the image. The resulting output image J has a Depth of Field according to the parameters used to dimension the blur filter F(Z). The blur filter may be a Gaussian filter or may have a specific shape (such as a disk, triangle, square, hexagon etc.) in order to mimic a specific aperture shape. It can be seen from the above discussion that a blur filter can be dimensioned using Equation 11 from the focal length f, focus plane s and camera f-Number N. The focal length f and reference parameters for the focus plane s and f-Number N are received in step 410 of the method of Figure 7, and used in step 440 to dimension the blur filter which is then applied to the synthesised view in step 450. The resulting Depth of Field in the synthesised view corresponds to the received parameters, ensuring that the synthesised view appears as desired by the creator of the original content. The Depth of Field in the synthesised view may match that of the original content, or may be imposed on the content. If multiple pairs of parameters are received, the most appropriate pair for the display and viewing conditions may be selected, ensuring that the synthesised view is best suited to the display and viewing conditions.
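To make the use of Equations 10 to 13 concrete, the following sketch dimensions a blur disc diameter per pixel from the focal length f, focus plane s, f-Number N and depth map, and applies a depth-dependent Gaussian blur by quantising the blur diameters into a small number of layers. The layer quantisation, the diameter-to-sigma mapping and the pixel_pitch parameter are illustrative assumptions of this sketch rather than requirements of the method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_diameter_px(depth, f, N, s, pixel_pitch):
    """Equations 10-12: blur disc diameter, in pixels, for every depth value.
    depth, f, s and pixel_pitch are in metres; N is the camera f-Number."""
    ms = f / (s - f)                                   # subject magnification (Equation 12)
    xd = np.abs(depth - s)                             # distance from the focus plane (Equation 10)
    denom = s + np.sign(depth - s) * xd                # s + xd (background) or s - xd (foreground)
    b = (f * ms / N) * xd / np.maximum(denom, 1e-6)    # blur disc diameter on the sensor (Equation 11)
    return b / pixel_pitch                             # metres on the sensor -> pixels

def apply_depth_blur(image, depth, f, N, s, pixel_pitch, n_layers=8):
    """Equation 13 in spirit: approximate the per-pixel blur by blending a few
    uniformly blurred layers, each pixel taking the layer matching its blur
    diameter. 'image' is single channel; colour images go per channel."""
    sigma_px = blur_diameter_px(depth, f, N, s, pixel_pitch) / 2.0   # crude diameter -> sigma mapping
    levels = np.linspace(0.0, max(sigma_px.max(), 1e-3), n_layers)
    layer_idx = np.clip(np.digitize(sigma_px, levels) - 1, 0, n_layers - 1)
    out = np.zeros_like(image, dtype=float)
    for i, sigma in enumerate(levels):
        blurred = gaussian_filter(image.astype(float), sigma=sigma)
        out[layer_idx == i] = blurred[layer_idx == i]
    return out
```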
Figure 8 illustrates functional units of a decoder 600 in accordance with an embodiment of the invention. The decoder 600 may execute the steps of the method 400, for example according to computer readable instructions received from a computer program.
With reference to Figure 8, the decoder 600 comprises a receiving unit 610, a synthesis unit 630 and a filter unit 660. The filter unit comprises a dimensioning sub unit 640 and an application sub unit 650. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
The receiving unit 610 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The synthesis unit 630 is configured to synthesise at least one view of the video content. The dimensioning sub unit 640 of the filter unit 660 is configured to dimension a blur filter according to the received focal length and reference parameters. The application sub unit 650 of the filter unit 660 is configured to apply the dimensioned blur filter to the synthesised view.
Figure 9 illustrates additional steps that may be conducted as part of the method 400 for decoding three-dimensional video content. The method may for example be conducted in a decoder such as the decoders 700, 800 illustrated in Figures 12 and 13 and discussed in further detail below. With reference to Figure 9, in a first step 510, the decoder receives at least one encoded view of the video content and associated depth representation, as well as camera focal length and reference parameters for focus plane and camera f-Number for the encoded view. The decoder may also receive a camera shutter speed and a parameter representative of shutter shape.
In step 512, the decoder checks whether multiple pairs of reference parameters for focus plane and f-Number have been received. Multiple pairs may be sent by an encoder if different Depth of Field options are available. If multiple pairs of reference parameters have been received (Yes at step 512), the decoder then proceeds to select the most appropriate pair for the display and viewing conditions at step 514. This selection step may be automated or may be made with user input.
In some examples (not illustrated), the decoder may adjust a received or selected reference parameter according to viewing conditions or to known user requirements.
Having selected the appropriate pair of reference parameters, or if only one parameter for each of focus plane and f-Number has been received, the decoder proceeds to check, in step 516, whether the reference parameters received are the focus plane location and camera f-Number. If the reference parameters are the focus plane location and camera f-Number, then the decoder proceeds directly to step 520. If this is not the case (No at step 516), the decoder proceeds to calculate either or both of the focus plane location and/or camera f-Number from the received parameters at step 518. This may involve performing a LUT operation or a calculation, as discussed in further detail above. With the focus plane location and camera f-Number available, the decoder then proceeds to check, at step 520, whether blur is present in the image texture in the received encoded view.
As discussed above, the methods described herein may be used to recreate blur that was present in originally captured video content, or to impose blur onto all-in-focus video content, which content may have been physically captured or digitally generated. In the case of original video content containing blur, it can be advantageous to sharpen the received view before synthesising views. If the received view contains blur (Yes at step 520), the decoder therefore proceeds to step 522, in which the received reference parameters are used to sharpen the received view. In one example, this sharpening process comprises using the received reference parameters to calculate the blur diameters for pixels in the view. From these diameters, an approximation of the point spread function for each pixel can be generated, allowing the application of a deblurring filter to the received view. The deblurring filter may for example be a Wiener deconvolution. In some examples, motion blur parameters may also be calculated in order to sharpen the image for motion blur. Motion blur is discussed in further detail below.
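A minimal sketch of this kind of sharpening step is given below, assuming a single estimated point spread function per image (a disc of the computed blur diameter) and a classical frequency-domain Wiener deconvolution implemented directly with NumPy FFTs; a real implementation would handle a spatially varying PSF, which is omitted here for brevity.

```python
import numpy as np

def disc_psf(diameter_px):
    """Disc-shaped point spread function approximating a circular aperture
    of blur diameter b (in pixels)."""
    r = max(int(np.ceil(diameter_px / 2.0)), 1)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return (x ** 2 + y ** 2 <= r ** 2).astype(float)

def wiener_deconvolve(blurred, psf, noise_to_signal=1e-2):
    """Classical Wiener deconvolution of a single-channel image given an
    estimated PSF: F_hat = conj(H) / (|H|^2 + NSR) * G in the frequency domain."""
    h, w = blurred.shape
    ph, pw = psf.shape
    psf_pad = np.zeros((h, w))
    psf_pad[:ph, :pw] = psf / psf.sum()
    psf_pad = np.roll(psf_pad, (-(ph // 2), -(pw // 2)), axis=(0, 1))   # centre the PSF at the origin
    H = np.fft.fft2(psf_pad)
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal) * G
    return np.real(np.fft.ifft2(F_hat))
```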
Having sharpened the image texture, or if the received texture is fully in focus, the decoder then proceeds to check, in step 524, whether blur is present in the depth representation received, for example in a received depth map. If blur is present in the received depth map or other depth representation (Yes at step 524), the decoder sharpens the depth map at step 526. This may involve applying a plurality of median filters to the original depth map in order to remove smoothed edges from the depth map. Other methods for sharpening the depth map may be considered.
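A possible realisation of this depth map sharpening, assuming SciPy's median filter and illustrative window sizes, is sketched below.

```python
import numpy as np
from scipy.ndimage import median_filter

def sharpen_depth_map(depth_map, sizes=(3, 5, 7), passes=2):
    """Apply a plurality of median filters to a depth map in order to remove
    smoothed edges; the window sizes and number of passes are illustrative."""
    sharpened = np.asarray(depth_map, dtype=float)
    for _ in range(passes):
        for size in sizes:
            sharpened = median_filter(sharpened, size=size)
    return sharpened
```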
Once a sharp depth map and image texture are available, the decoder proceeds to synthesise at least one view of the video content at step 530. This may for example comprise running a DIBR process. Once the at least one view has been synthesised, the decoder proceeds to step 540 in which a blur filter is dimensioned according to the received focal length and reference parameters. This dimensioning process is described in detail above with respect to Figure 7. If a parameter representing shutter shape has been received then this may also be used in the dimensioning of the blur filter. The decoder then proceeds, in steps 542, 544 and 546, to address motion blur. Motion blur is described in greater detail below but, in brief, the decoder assesses whether camera parameters for a frame t and frame t-1 are available in step 542. If such parameters are not available (No at step 542), the decoder proceeds to estimate a motion blur direction from a motion model at step 544. Using either the estimation or the available camera parameters, the decoder then dimensions a motion blur filter at step 546, calculating motion blur direction and length. In step 548 the decoder combines the motion blur filter and dimensioned blur filter from steps 546 and 540 before, in step 550, applying the combined blur filter to the synthesised view.

Referring again to steps 542 to 546 of Figure 9, motion blur can occur in captured video content, for example when a camera is moving or zooming rapidly. Transmitting the camera shutter speed as discussed with reference to Figure 4 allows the recreation of camera motion blur in synthesised views, or the imposition of camera motion blur where it is desired to introduce this blur onto images where it is not already present. When a camera shutter speed is received at the decoder, indicating that motion blur may be applied, a first step is to determine the direction of the camera motion blur. By taking the depth map of frame t and projecting it onto the camera frames t and t-1, thus using different camera parameters, it is possible to determine the path that each pixel is taking. As illustrated in Figure 10, using Equation 2 (with Zs = d and q^t and q^(t-1) expressed in homogeneous coordinates with the last coordinate being 1):

q^(t-1) = P^(t-1) * Q^(t-1) / Zs^(t-1) and q^t = P^t * Q^t / Zs^t
For a static scene, or slowly moving content, the 3D point Q^(t-1) may be approximated to be equal to Q^t (and Zs^(t-1) = Zs^t). This approximation allows an approximation of the path v of the current pixel q^t generated by the camera motion over time, v being equal to:
v = q^(t-1) - q^t

The camera projection matrices may be computed from the camera parameters such as translation_param and orientation_param given in the multiview_acquisition_info SEI message illustrated in Table 1. If the camera projection matrices P^(t-1) and P^t are found to be identical, then no motion blur is present. If the camera parameters are not available for frame t and/or frame t-1, a motion model may be used in order to predict the missing camera parameters. One example involves using constant speed models, for instance:
T^t = T^(t-1) + V^(t-1) / fps,

R^t = R^(t-1) * AngleAxisToRotationMatrix(W^(t-1) / fps),

f^t = f^(t-1) + VF^(t-1) / fps,
t-1 being the last known camera parameter data, and V, W and VF the estimated translation speed, rotation speed and focal length speed (zoom). V is the estimated translational velocity vector (m/s if T is in metres and 1/fps in seconds). W is the estimated angular velocity vector (expressed here as an angle axis where the norm corresponds to the angular speed in rad/s). VF is the estimated focal length speed (px/s if f is in pixels and 1/fps in seconds). Other motion models may be envisaged.
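The constant speed model above may be realised, for example, as in the following sketch, where AngleAxisToRotationMatrix is implemented with Rodrigues' formula; the helper names and the NumPy implementation are illustrative.

```python
import numpy as np

def angle_axis_to_rotation_matrix(w):
    """Rodrigues' formula: angle-axis vector w (norm = angle in radians) -> 3x3 rotation."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def predict_camera(T_prev, R_prev, f_prev, V, W, VF, fps):
    """Constant speed motion model: predict translation T, rotation R and
    focal length f for frame t from frame t-1 and the estimated velocities."""
    T = T_prev + V / fps
    R = R_prev @ angle_axis_to_rotation_matrix(W / fps)
    f = f_prev + VF / fps          # focal length (zoom) speed assumed constant
    return T, R, f
```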
With the motion blur direction established, the amount of motion blur may be calculated. Having received the shutter speed ss (which, as noted, may be an actual shutter speed for the captured content or may be a selected shutter speed), and by retrieving the video framerate fps and the depth map for the received video view, it is possible to calculate the Point Spread Function (PSF) approximation for each pixel and synthesize a motion blur due to the camera motion. The direction and length of the motion blur (per pixel) Dmotion to apply can be defined using the following formula:
Dmotion = ss * fps * v (Equation 14)
If the shutter speed equals 1/fps, then the pixel will be spread over the whole path v; otherwise it is spread over just a part of it, as illustrated in Figure 10.
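The computation of the per-pixel path v and of Dmotion (Equation 14) may be sketched as follows, assuming 3x4 projection matrices P^(t-1) and P^t derived from the signalled camera parameters and a 3D point Q obtained by back-projecting the pixel with its depth value; the helper names are illustrative.

```python
import numpy as np

def project(P, Q):
    """Project a 3D point Q (length-3) with a 3x4 projection matrix P,
    returning 2D pixel coordinates after the homogeneous division."""
    q = P @ np.append(Q, 1.0)
    return q[:2] / q[2]

def motion_blur_vector(P_prev, P_curr, Q, shutter_speed, fps):
    """Per-pixel motion blur direction and length (Equation 14):
    v = q_(t-1) - q_t and Dmotion = ss * fps * v."""
    v = project(P_prev, Q) - project(P_curr, Q)   # pixel path over one frame interval
    return shutter_speed * fps * v                # only the exposed fraction of the path
```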
With the direction and length of the motion blur established, the dimensioned motion blur filter may be applied, for example according to the following equation:
J[m, n] = (1/N) * Σ_i I[x(i), y(i)], i = 0, ..., N-1 (Equation 15)

Where J is the blurred image, I the original image, N the number of local pixels used (typically the norm of Dmotion), and x(i) and y(i) are respectively the x and y components of the discrete segment Dmotion starting from q^t:

[x(i), y(i)] = q^t + (i/(N-1)) * Dmotion

L(t) is the intensity model of an image pixel and can be modelled as a function of the effective aperture diameter, the maximum intensity and two constants (Equation 16).
Where d is the effective aperture diameter, Lmax is the maximum intensity (typically 255 for each image channel), and k and p are two constants. The above description provides one example of the application of motion blur; other equations may be used. For instance, a 2D kernel K of size NxM may be constructed from Dmotion with the following formula:
K[k, l] = 1 if there exists an i such that [x(i), y(i)] - q^t + [N/2, M/2] = [k, l], and K[k, l] = 0 otherwise.
The image I may then be blurred with the following equation:
J = (I * K) / S

With S being the sum of all the kernel weights:

S = Σ_k Σ_l K[k, l], k = 0, ..., N-1, l = 0, ..., M-1
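The kernel construction and the normalised convolution above may be sketched as follows; the window sizing is a slight variant chosen so that the segment always fits, and the single-channel assumption is for brevity.

```python
import numpy as np
from scipy.signal import convolve2d

def motion_blur_kernel(d_motion):
    """2D kernel with ones along the discrete segment Dmotion, centred in the
    window, following the construction of K described above."""
    dx, dy = float(d_motion[0]), float(d_motion[1])
    N = 2 * int(np.ceil(abs(dx))) + 1                 # window height
    M = 2 * int(np.ceil(abs(dy))) + 1                 # window width
    K = np.zeros((N, M))
    n_steps = max(int(np.ceil(np.hypot(dx, dy))), 1)  # samples along the segment
    for i in range(n_steps + 1):
        t = i / n_steps
        K[int(round(t * dx)) + N // 2, int(round(t * dy)) + M // 2] = 1.0
    return K

def apply_motion_blur(image, K):
    """Normalised convolution J = (I * K) / S, S being the sum of the kernel weights."""
    S = K.sum()
    return convolve2d(image, K / S, mode='same', boundary='symm')
```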
It will be appreciated that the dimensioned blur and motion blur filters may be applied consecutively or may be combined before application. It may be that better final results are obtained using a combined motion blur and blur filter.
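Where both filters are expressed as 2D kernels, combining them before application amounts to convolving the two kernels, as in the sketch below; for the spatially varying case this would be done per pixel or per depth layer.

```python
import numpy as np
from scipy.signal import convolve2d

def combine_kernels(defocus_kernel, motion_kernel):
    """Combine a defocus blur kernel and a motion blur kernel into a single
    filter: convolving the kernels is equivalent to applying them in turn."""
    combined = convolve2d(defocus_kernel, motion_kernel, mode='full')
    return combined / combined.sum()     # keep the combined filter normalised
```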
Figure 11 illustrates synthesis results using the method of Figure 9. An all-in-focus texture and depth map are received or obtained through sharpening procedures before view synthesis and blurring are applied. In case A, where a disocclusion is present, an inpainter is used in order to fill the disoccluded area and then the blur filter is applied. In case B, no inpainter is required, and the blur filter is applied after DIBR. It can be seen from a comparison of Figures 2 and 11 that, using the method of Figure 9, it is possible to recreate in a synthesized image the exact blur characteristics that were observed in the captured image. The synthesized image is thus a correct match for the captured image, providing a smooth and realistic three-dimensional experience for a viewer.

Figure 12 illustrates functional units of another embodiment of decoder 700 in accordance with an embodiment of the invention. The decoder 700 may execute the steps of the method 500, for example according to computer readable instructions received from a computer program.
With reference to Figure 12, the decoder 700 comprises a receiving unit 710, an analysis and calculation unit 715, a sharpening unit 720, a synthesis unit 730 and a filter unit 760. The filter unit 760 comprises a blur dimensioning sub unit 740, a motion dimensioning sub unit 746, a combining sub unit 748 and an application sub unit 750. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
The receiving unit 710 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The receiving unit is also configured to receive a shutter speed and a parameter for shutter shape. The analysis and calculation unit 715 is configured to conduct steps 512 to 518 of the method of Figure 9, checking for multiple pairs of reference parameters and selecting an appropriate pair, and calculating the focus plane and f-Number from the respective reference parameters, if such calculation is necessary. The sharpening unit 720 is configured to check for blur in the received image texture and depth representation, and to sharpen the image texture and/or depth representation if blur is present. The synthesis unit 730 is configured to synthesise at least one view of the video content. The blur dimensioning sub unit 740 of the filter unit 760 is configured to dimension a blur filter according to the received focal length and reference parameters. The motion dimensioning sub unit 746 is configured to dimension a motion blur filter according to a received shutter speed and parameters extracted from the encoded view as discussed above. The combining sub unit 748 is configured to combine the dimensioned blur and motion blur filters. The application sub unit 750 is configured to apply the combined filter to the synthesised view or views.
Figure 13 illustrates another embodiment of decoder 800. The decoder 800 comprises a processor 880 and a memory 890. The memory 890 contains instructions executable by the processor 880 such that the decoder 800 is operative to conduct the steps of the method of Figure 9 described above.

The method of the present invention may be implemented in hardware, or as software modules running on one or more processors. The method may also be carried out according to the instructions of a computer program, and the present invention also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the invention may be stored on a computer-readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.

Claims

1. A method of encoding three-dimensional video content, the method comprising: encoding at least one view of the video content and at least part of a depth representation associated with the view;
defining a camera focal length for the encoded view;
selecting at least one reference parameter for a focus plane of the encoded view; selecting at least one reference parameter for a camera f-Number for the encoded view; and
transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
2. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises a location of a focus plane for the encoded view.
3. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises a look-up table index corresponding to a focus plane of the encoded view.
4. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises a distance between a recording surface and an optical system of a camera for the encoded view.
5. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
6. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the video content comprises look-up table indexes corresponding to the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
7. A method as claimed in any one of the preceding claims, wherein the reference parameter for a camera f-Number for the encoded view comprises a camera f-Number for the encoded view.
8. A method as claimed in any one of claims 1 to 6, wherein the reference parameter for a camera f-Number for the encoded view comprises a camera aperture diameter.
9. A method as claimed in any one of the preceding claims, wherein the selected reference parameters for focus plane and camera f-Number correspond to a first depth of focus, and wherein the method further comprises:
selecting at least one additional reference parameter for a focus plane of the encoded view;
selecting at least one additional reference parameter for a camera f-Number for the encoded view; and
transmitting the selected additional reference parameters with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
10. A method as claimed in any one of the preceding claims, wherein the video content comprises captured video content, and wherein the selected reference parameters for focus plane and camera f-Number correspond to an actual focus plane and camera f-Number of a capturing camera.
11. A method as claimed in any one of claims 1 to 9, wherein the selected reference parameters for focus plane and camera f-Number correspond to a selected focus plane and camera f-Number for one of a capturing camera or a virtual camera.
12. A method as claimed in any one of the preceding claims, further comprising: selecting a shutter speed for the video content; and
transmitting the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
13. A method as claimed in any one of the preceding claims, further comprising: selecting a shutter shape for the video content; and transmitting a parameter representing the shutter shape with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
14. A method as claimed in any one of the preceding claims, wherein transmitting the selected reference parameters to the node comprises including the selected reference parameters in a supplementary enhancement information, SEI, message.
15. A method for decoding three-dimensional video content, comprising:
receiving:
at least one encoded view of the video content and at least part of a depth representation associated with the view,
a camera focal length for the encoded view;
at least one reference parameter for a focus plane of the encoded view; and
at least one reference parameter for a camera f-Number for the encoded view;
synthesising at least one view of the video content;
dimensioning a blur filter according to the received focal length and reference parameters; and
applying the dimensioned blur filter to the synthesised view.
16. A method as claimed in claim 15, wherein dimensioning a blur filter according to the received focal length and reference parameters comprises at least one of:
calculating a focus plane from the received reference parameter for a focus plane; and
calculating a camera f-Number from the received reference parameter for a camera f-Number.
17. A method as claimed in claim 15 or 16, wherein synthesising at least one view of the video content comprises applying a Depth Image Based Rendering, DIBR, process to the encoded view of the video content.
18. A method as claimed in any one of claims 15 to 17, further comprising:
receiving a parameter representing shutter shape for the video content; and applying the shutter shape outline corresponding to the received parameter to the dimensioned blur filter.
19. A method as claimed in any one of claims 15 to 18, further comprising:
receiving a shutter speed for the video content; and
dimensioning a motion blur filter according to the received shutter speed;
wherein applying the dimensioned blur filter to the synthesised view comprises:
combining the dimensioned blur filter with the dimensioned motion blur filter; and applying the combined blur filter to the synthesised view.
20. A method as claimed in claim 19, further comprising estimating a motion blur direction from a motion model.
21. A method as claimed in any one of claims 15 to 20, wherein the received encoded view includes blur, and wherein the method further comprises:
sharpening the received encoded view according to the received focal length and reference parameters before synthesising the at least one view of the video content.
22. A method as claimed in claim 21, further comprising applying a sharpening process to a depth map of the received view before synthesising the at least one view of the video content.
23. A computer program product configured, when run on a computer, to execute a method according to any one of the preceding claims.
24. An encoder configured for encoding three-dimensional video content, the encoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the encoder is operative to:
encode at least one view of the video content and at least part of a depth representation associated with the view;
define a camera focal length for the encoded view;
select at least one reference parameter for a focus plane of the encoded view; select at least one reference parameter for a camera f-Number for the encoded view; and
transmit the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number of the encoded view to a node.
25. An encoder as claimed in claim 24, wherein the encoder is further operative to select a reference parameter for a focus plane of the encoded view which comprises a location of a focus plane for the encoded view.
26. An encoder as claimed in claim 24, wherein the encoder is further operative to select a reference parameter for a focus plane of the encoded view which comprises a look-up table index corresponding to a focus plane of the encoded view.
27. An encoder as claimed in any one of claims 24 to 26, wherein the encoder is further operative to select a reference parameter for a camera f-Number for the encoded view which comprises a camera f-Number for the encoded view.
28. An encoder as claimed in any one of claims 24 to 27, wherein the encoder is further operative to transmit at least one of the selected reference parameters in floating point representation.
29. An encoder as claimed in any one of claims 24 to 28, wherein the encoder is further operative to:
select a shutter speed for the video content; and
transmit the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
30. A decoder, the decoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the decoder is operative to:
receive:
at least one encoded view of the video content and at least part of a depth representation associated with the view,
a camera focal length for the encoded view;
at least one reference parameter for a focus plane of the encoded view; and
at least one reference parameter for a camera f-Number for the encoded view; synthesise at least one view of the video content;
dimension a blur filter according to the received focal length and reference parameters; and
apply the dimensioned blur filter to the synthesised view.
31. A decoder as claimed in claim 30, wherein the decoder is further operative to: receive a shutter speed for the video content; and
dimension a motion blur filter according to the received shutter speed;
wherein applying the dimensioned blur filter to the synthesised view comprises:
combining the dimensioned blur filter with the dimensioned motion blur filter; and applying the combined blur filter to the synthesised view.
PCT/SE2014/050118 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content WO2015115946A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2014/050118 WO2015115946A1 (en) 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2014/050118 WO2015115946A1 (en) 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content

Publications (1)

Publication Number Publication Date
WO2015115946A1 true WO2015115946A1 (en) 2015-08-06

Family

ID=50193565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2014/050118 WO2015115946A1 (en) 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content

Country Status (1)

Country Link
WO (1) WO2015115946A1 (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040247175A1 (en) * 2003-06-03 2004-12-09 Konica Minolta Photo Imaging, Inc. Image processing method, image capturing apparatus, image processing apparatus and image recording apparatus
US20080043095A1 (en) * 2006-04-04 2008-02-21 Anthony Vetro Method and System for Acquiring, Encoding, Decoding and Displaying 3D Light Fields
EP2360930A1 (en) * 2008-12-18 2011-08-24 LG Electronics Inc. Method for 3d image signal processing and image display for implementing the same
US20120050474A1 (en) * 2009-01-19 2012-03-01 Sharp Laboratories Of America, Inc. Stereoscopic dynamic range image sequence
WO2010087955A1 (en) * 2009-01-30 2010-08-05 Thomson Licensing Coding of depth maps
EP2582135A2 (en) * 2010-06-11 2013-04-17 Samsung Electronics Co., Ltd 3d video encoding/decoding apparatus and 3d video encoding/decoding method using depth transition data
US20130195350A1 (en) * 2011-03-29 2013-08-01 Kabushiki Kaisha Toshiba Image encoding device, image encoding method, image decoding device, image decoding method, and computer program product
WO2014011103A1 (en) * 2012-07-10 2014-01-16 Telefonaktiebolaget L M Ericsson (Publ) Methods and arrangements for supporting view synthesis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Joint Draft 7.0 on Multiview Video Coding", 27. JVT MEETING; 6-4-2008 - 10-4-2008; GENEVA, ; (JOINT VIDEO TEAM OFISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ),, no. JVT-AA209, 6 June 2008 (2008-06-06), XP030007391, ISSN: 0000-0063 *
RAPPORTEUR Q6/16: "H.264 (V9) Advanced video coding for generic audiovisual services (Rev.): Input draft (for consent)", ITU-T SG16 MEETING; 28-10-2013 - 8-11-2013; GENEVA,, no. T13-SG16-131028-TD-WP3-0099, 5 November 2013 (2013-11-05), XP030100669 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3185560A1 (en) * 2015-12-23 2017-06-28 Thomson Licensing System and method for encoding and decoding information representative of a bokeh model to be applied to an all-in-focus light-field content
JP7277372B2 (en) 2017-10-27 2023-05-18 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 3D model encoding device, 3D model decoding device, 3D model encoding method, and 3D model decoding method
WO2019082958A1 (en) * 2017-10-27 2019-05-02 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Three-dimensional model encoding device, three-dimensional model decoding device, three-dimensional model encoding method, and three-dimensional model decoding method
JPWO2019082958A1 (en) * 2017-10-27 2020-11-12 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America 3D model coding device, 3D model decoding device, 3D model coding method, and 3D model decoding method
CN117014611A (en) * 2019-03-11 2023-11-07 杜比实验室特许公司 Frame rate scalable video coding
CN116668696A (en) * 2019-03-11 2023-08-29 杜比实验室特许公司 Frame rate scalable video coding
US11323728B2 (en) 2019-03-11 2022-05-03 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
CN111971964B (en) * 2019-03-11 2022-06-03 杜比实验室特许公司 Frame rate scalable video coding
EP4064706A1 (en) * 2019-03-11 2022-09-28 Dolby Laboratories Licensing Corporation Signalling of information related to shutter angle
US11523127B2 (en) 2019-03-11 2022-12-06 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
EP4300975A3 (en) * 2019-03-11 2024-03-27 Dolby Laboratories Licensing Corporation Signalling of information related to shutter angle
US11582472B2 (en) 2019-03-11 2023-02-14 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
CN111971964A (en) * 2019-03-11 2020-11-20 杜比实验室特许公司 Frame rate scalable video coding
US10999585B2 (en) 2019-03-11 2021-05-04 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
EP4236325A3 (en) * 2019-03-11 2023-10-11 Dolby Laboratories Licensing Corporation Signalling of information related to shutter angle
WO2020185853A3 (en) * 2019-03-11 2020-10-29 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
US11818372B2 (en) 2019-03-11 2023-11-14 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
US11871015B2 (en) 2019-03-11 2024-01-09 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
JP7411727B2 (en) 2019-03-11 2024-01-11 ドルビー ラボラトリーズ ライセンシング コーポレイション Frame rate scalable video encoding
US11936888B1 (en) 2019-03-11 2024-03-19 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
CN117014611B (en) * 2019-03-11 2024-03-15 杜比实验室特许公司 Frame rate scalable video coding
US11877000B2 (en) 2019-08-06 2024-01-16 Dolby Laboratories Licensing Corporation Canvas size scalable video coding
WO2023274129A1 (en) * 2021-06-28 2023-01-05 Beijing Bytedance Network Technology Co., Ltd. Enhanced signaling of depth representation information supplemental enhancement information

Similar Documents

Publication Publication Date Title
EP2005757B1 (en) Efficient encoding of multiple views
US9525858B2 (en) Depth or disparity map upscaling
JP6027034B2 (en) 3D image error improving method and apparatus
De Silva et al. Display dependent preprocessing of depth maps based on just noticeable depth difference modeling
EP1978755A2 (en) Method and system for acquiring, encoding, decoding and displaying 3D light fields
EP1978754A2 (en) Method and system for processing light field of three dimensional scene
EP2532166B1 (en) Method, apparatus and computer program for selecting a stereoscopic imaging viewpoint pair
KR20170140187A (en) Method for fully parallax compression optical field synthesis using depth information
JP2013527646A5 (en)
JP2014056466A (en) Image processing device and method
Schmeing et al. Depth image based rendering: A faithful approach for the disocclusion problem
WO2015115946A1 (en) Methods for encoding and decoding three-dimensional video content
Salahieh et al. Light Field Retargeting from Plenoptic Camera to Integral Display
JP5931062B2 (en) Stereoscopic image processing apparatus, stereoscopic image processing method, and program
EP2822279B1 (en) Autostereo tapestry representation
Knorr et al. From 2D-to stereo-to multi-view video
KR101289269B1 (en) An apparatus and method for displaying image data in image system
KR101920113B1 (en) Arbitrary View Image Generation Method and System
TWI536832B (en) System, methods and software product for embedding stereo imagery
Al-Obaidi et al. Influence of depth map fidelity on virtual view quality
Fatima et al. Quality assessment of 3D synthesized images based on structural and textural distortion
Aflaki et al. Unpaired multiview video plus depth compression
KR101336955B1 (en) Method and system for generating multi-view image
Cho et al. Enhancing depth accuracy on the region of interest in a scene for depth image based rendering
BR112021007522A2 (en) image generator apparatus, image generation method and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14707856

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14707856

Country of ref document: EP

Kind code of ref document: A1