WO2015115946A1 - Methods for encoding and decoding three-dimensional video content - Google Patents

Methods for encoding and decoding three-dimensional video content

Info

Publication number
WO2015115946A1
Authority
WO
WIPO (PCT)
Prior art keywords
view
camera
encoded
video content
focus plane
Prior art date
Application number
PCT/SE2014/050118
Other languages
French (fr)
Inventor
Julien Michot
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget L M Ericsson (Publ) filed Critical Telefonaktiebolaget L M Ericsson (Publ)
Priority to PCT/SE2014/050118 priority Critical patent/WO2015115946A1/en
Publication of WO2015115946A1 publication Critical patent/WO2015115946A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H04N13/172 Image signals comprising non-image signal components, e.g. headers or format information
    • H04N13/178 Metadata, e.g. disparity information
    • H04N13/194 Transmission of image signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/179 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a scene or a shot
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to methods for encoding and decoding three-dimensional (3D) video content.
  • the invention also relates to an encoder, a decoder and a computer program product configured to implement methods for encoding and decoding three-dimensional video content.
  • Three-dimensional video technology continues to grow in popularity, and 3D technology capabilities in the entertainment and communications industries in particular have evolved rapidly in recent years.
  • 3D technology provides an observer with an impression of depth in a compound image, causing parts of the image to appear to project out in front of a display screen, into what is known as observer space, while other parts of the image appear to project backwards into the space behind the screen, into what is known as CRT space.
  • the term 3D is usually used to refer to a stereoscopic experience, in which an observer's eyes are provided with two slightly different images of a scene, which images are fused in the observer's brain to create the impression of depth. This effect is known as binocular parallax and provides an excellent 3D experience to a stationary observer, usually requiring the use of glasses or other filtering elements that enable the different images to be shown to the left and right eyes of an observer.
  • a new generation of "auto-stereoscopic" displays allows a user to experience three-dimensional video without glasses. These displays project slightly different views of a scene in different directions, as illustrated in Figure 1. A viewer located a suitable distance in front of the display will see slightly different pictures of the same scene in their left and right eyes, creating a perception of depth. In order to achieve smooth parallax and to enable a change of viewpoint as users move in front of the screen, a number of views (typically between 7 and 28 views) are generated. Auto-stereoscopic functionality is enabled by capturing or digitally generating a scene using many different cameras which observe a scene from different angles or viewpoints. These cameras generate what is known as multiview video.
  • multiview video can be relatively efficiently encoded for transmission, for example using multiview coding (MVC) or MV-HEVC, by exploiting both the temporal and spatial similarities that exist between different views.
  • even so, the transmission cost for multiview video remains prohibitively high.
  • current technologies therefore only actually transmit a subset of the captured or generated views, typically 2 or 3 key views.
  • depth or disparity maps are used to recreate the missing data.
  • virtual views can be generated at any arbitrary viewing position using view synthesis processes such as depth image-based rendering (DIBR). These viewing positions are sometimes known as virtual cameras, and may be located between the transmitted key views (interpolated) or outside the range covered by the key views (extrapolated).
  • the ability to generate views at more or less arbitrary positions means that the depth perception of a viewer may be changed or adjusted, and may be matched to the size of the display screen on which the video will be shown.
  • a depth map is simply a greyscale image of a scene in which each pixel indicates the distance between the corresponding point in the scene and the optical centre of the capturing camera.
  • a disparity map is an intensity image conveying the apparent shift of a pixel which results from moving from one viewpoint to another. Depth and disparity are mathematically related, and the link between them can be appreciated by considering that the closer an object is to a capturing camera, the greater will be the apparent positional shift resulting from a change in viewpoint.
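  • as an illustration of this relationship (an illustration only, not part of the original disclosure): for a rectified 1D linear camera pair, the standard pinhole model gives a disparity of f · B / Z pixels for a point at depth Z, where f is the focal length in pixels and B the baseline. A minimal sketch, with hypothetical numbers:

    def depth_to_disparity(depth_z, focal_px, baseline):
        # Standard rectified-stereo relation: disparity (pixels) = f * B / Z.
        # Illustrative assumption: a 1D linear (rectified) camera setup, focal length
        # in pixels, baseline and depth in the same length unit.
        return focal_px * baseline / depth_z

    # An object at 2 m with a 6.5 cm baseline and a 1000 px focal length shifts by
    # 32.5 pixels between the views; at 4 m it shifts by only 16.25 pixels, showing
    # that closer objects exhibit the larger apparent positional shift.
    print(depth_to_disparity(2.0, 1000.0, 0.065))   # 32.5
    print(depth_to_disparity(4.0, 1000.0, 0.065))   # 16.25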
  • a key advantage of depth and disparity maps is that they contain large smooth surfaces of constant grey levels, making them comparatively easy to compress for transmission using current video coding technology.
  • a 3D point cloud can be reconstructed from a depth map using the 3D camera parameters of the capturing camera. These parameters include the matrix K for a pinhole camera model, which contains the camera focal lengths, principal point, etc. The back-projection can be written as Q = d · K⁻¹ · q, where:
  • q is a 2D point (expressed in the camera coordinate frame, in homogeneous coordinates);
  • d is the point's associated depth (measured by a sensor, for example);
  • Q is the corresponding 3D point in a 3D coordinate frame.
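  • a minimal sketch of this back-projection, assuming a pinhole model with intrinsic matrix K and hypothetical parameter values (not taken from the original disclosure):

    import numpy as np

    def depth_map_to_point_cloud(depth, K):
        # Back-project a depth map into a 3D point cloud: Q = d * K^-1 * q,
        # where each pixel q = (u, v, 1) is in homogeneous image coordinates.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        q = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x N pixels
        rays = np.linalg.inv(K) @ q                                   # back-projected rays
        points = rays * depth.reshape(1, -1)                          # scale each ray by its depth d
        return points.T.reshape(h, w, 3)                              # 3D points in the camera frame

    # Hypothetical intrinsics: focal lengths 1000 px, principal point (640, 360).
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    cloud = depth_map_to_point_cloud(np.full((720, 1280), 2.0), K)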
  • a depth map can be measured by specialized cameras, including structured-light or time-of-flight (ToF) cameras, where the depth is correlated with the deformation of a projected light pattern or with the round-trip time of a pulse of light.
  • a principal limitation of these depth sensors is the depth range they can measure: objects that are too close to or too far away from the device will have no depth information.
  • Capturing camera parameters are also required to conduct DIBR view synthesis, and these parameters are usually divided into two groups.
  • the first group is internal camera parameters, representing the optical characteristics of the camera for the image taken. This includes the focal length, the coordinates of the image's principal point and the lens distortions.
  • the second group, the external camera parameters, represents the camera position and the direction of its optical axis in the chosen real-world coordinates (conveying the position of the cameras relative to each other and to the objects in the scene). Both internal and external camera parameters are required in view synthesis processes based on the use of depth information (such as DIBR).
  • in multi-view video coding (MVC), an extension of advanced video coding (AVC), camera parameters can be conveyed to the decoder in supplementary enhancement information (SEI) messages.
  • Multiview acquisition information SEI message syntax (Table 1, excerpt):
        multiview_acquisition_info( payloadSize ) {        C    Descriptor
            num_views_minus1                                    ue(v)
            intrinsic_param_flag                           5    u(1)
            extrinsic_param_flag                           5    u(1)
            if ( intrinsic_param_flag ) {
                ...
  • Table 1. The camera parameters in Table 1 are sent in floating point representation, which offers high precision and supports a high dynamic range of parameter values.
  • Tables 2 and 3 show an example Sequence Parameter Set (SPS) message for sending Znear and Zfar values associated with a depth map:
  • nonlinear_depth_representation_model[ i ] specifies the piecewise linear segments for mapping of decoded luma sample values of depth views to a scale that is uniformly quantized in terms of disparity.
  • DepthLUT[ x ] = Clip3( 0, 255, Round( ( ( x − x1 ) * ( y2 − y1 ) ) ÷ ( x2 − x1 ) + y1 ) )
  • DepthLUT[ dS ], for all decoded luma sample values dS of depth views in the range of 0 to 255, inclusive, represents disparity that is uniformly quantized into the range of 0 to 255, inclusive.
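  • a minimal sketch of how such a look-up table could be built, assuming the piecewise linear segment end points (x1, y1) and (x2, y2) are already available (the signalling of the break points via nonlinear_depth_representation_model[ i ] is not reproduced here):

    def clip3(lo, hi, x):
        # Clip3 operation as conventionally defined in video coding specifications.
        return max(lo, min(hi, x))

    def build_depth_lut(segments):
        # Build DepthLUT[x] for x in 0..255 from piecewise linear segments, each sample
        # mapped with DepthLUT[x] = Clip3(0, 255, Round(((x - x1) * (y2 - y1)) / (x2 - x1) + y1)).
        lut = [0] * 256
        for (x1, y1), (x2, y2) in segments:
            for x in range(x1, x2 + 1):
                lut[x] = clip3(0, 255, round((x - x1) * (y2 - y1) / (x2 - x1) + y1))
        return lut

    # Hypothetical two-segment mapping of decoded luma to uniformly quantized disparity.
    lut = build_depth_lut([((0, 0), (128, 200)), ((128, 200), (255, 255))])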
  • the present specification discusses the case of a 1D linear camera arrangement with cameras pointing in directions parallel to each other.
  • the camera optical axes are parallel to the z axis, and the camera centres have the same y and z coordinates, with only the x coordinate changing from camera to camera.
  • This is a common camera setup for stereoscopic and "3D multiview" video.
  • the so-called "toed-in" or general case camera setup, in which the cameras are not aligned, can be converted to the 1D linear camera setup by the rectification process.
  • the distance between two cameras in stereo/3D setup is usually called the baseline (or the baseline distance).
  • the baseline is often approximately equal to the distance between the human eyes (normally about 6.5 centimeters) in order to achieve natural depth perception when showing these left and right pictures on a stereo screen.
  • Other baseline values may be chosen depending on the scene characteristics, camera parameters and the intended stereo effects.
  • the present specification refers to a baseline as the distance between the cameras for the left and the right views in the units of the external (extrinsic) camera coordinates.
  • the baseline is the distance between the virtual (or real) cameras used to obtain the views for the stereo-pair.
  • the baseline is considered as the distance between two cameras that the left and the right eyes of a spectator see when watching the video on the multiview screen at the preferred viewing distance.
  • the views (cameras) seen by the left and the right eyes of the viewer may not be consecutive views. However, this information is known to the display manufacturer and can be used in the view synthesis process.
  • the baseline is not therefore the distance between the two closest generated views, as it may be greater than this distance, depending upon the particular setup being used.
  • JCT-3V is an ongoing standardization effort in which multiview texture (ordinary 2D videos) and depth maps are compressed and transmitted using the future MV-HEVC or 3D-HEVC codecs.
  • View synthesis techniques such as DIBR thus address many of the problems associated with providing multiview three-dimensional video content.
  • view synthesis can encounter difficulties in rendering views in which part of the content is blurred.
  • the depth of field (DoF) of an image corresponds to the distance between the nearest (D_N) and farthest (D_F) objects in a scene that appear acceptably sharp.
  • Acceptably sharp may be defined with reference to criteria relating to the capturing or display equipment, and may for example comprise all areas of an image where the extent of blur for an image point is less than the pixel diameter of the capturing or display equipment.
  • DoF is thus defined as D_F − D_N.
  • a small DoF corresponds to an image in which significant parts of the foreground and/or background image texture are blurred.
  • the focus plane s is the depth at which the content of the image is sharpest.
  • FIGS. 2a and 2b illustrate synthesis results for a blurred image texture in which the colours (y axis) of object f and object e at different locations along the x axis have blurred into each other.
  • a method of encoding three-dimensional video content comprising encoding at least one view of the video content and at least part of a depth representation associated with the view, and defining a camera focal length for the encoded view.
  • the method further comprises selecting at least one reference parameter for a focus plane of the encoded view, selecting at least one reference parameter for a camera f-Number for the encoded view, and transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
  • the node may be a decoder.
  • the depth representation may be a depth or disparity map.
  • the depth representation may be a dense (comprising a matrix) or sparse (comprising sets) representation.
  • the representation may be deduced from a 3D model or a previously reconstructed depth map projected onto the camera view or may be estimated from multiple camera views.
  • defining a camera focal length may comprise extracting a focal length of a camera used to capture the encoded view. In other examples, defining a camera focal length may comprise defining a focal length of a virtual camera used to digitally generate the encoded view.
  • the reference parameter for a focus plane of the encoded view may comprise a location of a focus plane for the encoded view.
  • the location of the focus plane may comprise the actual location of the focus plane in captured video content.
  • the location of the focus plane may comprise a selected location for the focus plane for captured or digitally generated video content.
  • the reference parameter for a focus plane of the encoded view may comprise a look-up table index corresponding to a focus plane of the encoded view.
  • the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. The distance may permit calculation of the focus plane location for the encoded view.
  • the reference parameter for a focus plane of the encoded view may comprise the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
  • the acceptable focus level may correspond to a measured focus level or to a desired focus level, for example in digitally generated content.
  • the reference parameter for a focus plane of the video content may comprise look-up table indexes corresponding to the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
  • the nearest and farthest in-focus depths may allow estimation of a location of the focus plane for the encoded view.
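  • one way such an estimation could be performed, assuming the standard thin-lens depth-of-field relation 1/D_N + 1/D_F = 2/s (an illustrative assumption, not a method mandated by this disclosure):

    def estimate_focus_plane(d_near, d_far):
        # Estimate the focus plane s from the nearest and farthest acceptably sharp depths
        # using the thin-lens relation 1/D_N + 1/D_F = 2/s, i.e. s is their harmonic mean.
        return 2.0 * d_near * d_far / (d_near + d_far)

    # If objects between 1.5 m and 3.0 m appear acceptably sharp, s is estimated at 2.0 m.
    print(estimate_focus_plane(1.5, 3.0))   # 2.0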
  • more than one view of the video content may be encoded.
  • an identification of the view to which the look-up table index applies may be included as part of the reference parameter for a focus plane of the video content.
  • the reference parameter for a camera f- Number for the encoded view may comprise a camera f-Number for the encoded view.
  • the camera f-Number may be a camera f-Number of a capturing camera of the video content. In other examples, the camera f-Number may be a selected camera f-Number for captured or digitally generated video content.
  • the reference parameter for a camera f- Number for the encoded view may comprise a camera aperture diameter.
  • the camera aperture diameter may allow calculation of the camera f-Number.
  • transmitting the selected reference parameters may comprise transmitting at least one of the selected reference parameters in floating point representation.
  • transmitting the selected reference parameters may comprises transmitting at least one of the selected reference parameters in unsigned integer representation.
  • the selected reference parameters for focus plane and camera f-Number may correspond to a first depth of field, and the method may further comprise selecting at least one additional reference parameter for a focus plane of the encoded view, selecting at least one additional reference parameter for a camera f-Number for the encoded view, and transmitting the selected additional reference parameters with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
  • the video content may comprise captured video content, and the selected reference parameters for focus plane and camera f-Number may correspond to an actual focus plane and camera f-Number of a capturing camera.
  • the selected reference parameters for focus plane and camera f-Number may correspond to a selected focus plane and camera f-Number for one of a capturing camera or a virtual camera.
  • the method may further comprise selecting a shutter speed for the video content and transmitting the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
  • the shutter speed may be the actual speed for captured video content. In other examples, the shutter speed may be a selected speed for captured or for digitally generated content. According to an embodiment of the invention, the method may further comprise selecting a shutter shape for the video content and transmitting a parameter representing the shutter shape with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation. In some examples, the shutter shape may be the actual shape for captured video content. In other examples, the shutter shape may be a selected shape for captured or for digitally generated content.
  • transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a supplementary enhancement information, SEI, message.
  • transmitting the selected reference parameters to the node may comprise including the selected reference parameters in the multiview_acquisition_info SEI message.
  • transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a dedicated SEI message.
  • a method for decoding three-dimensional video content comprising receiving: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the method further comprises synthesising at least one view of the video content, dimensioning a blur filter according to the received focal length and reference parameters and applying the dimensioned blur filter to the synthesised view.
  • synthesising at least one view may comprise at least partially decoding the received encoded view
  • the received reference parameters may comprise focus plane and f-Number values.
  • the received reference parameters may comprise other values permitting calculation of the focus plane and f-Number values, for example a distance between a recording surface and an optical system of a camera and a camera aperture diameter.
  • the method may comprise receiving a plurality of reference parameters for focus plane and camera f-Number of the encoded view, and the method may further comprise selecting a reference parameter for a focus plane of the encoded view and a reference parameter for a camera f-Number of an encoded view.
  • the reference parameters may be selected according to at least one of display or viewing conditions for the three-dimensional video content.
  • dimensioning a blur filter according to the received focal length and reference parameters may comprise at least one of: calculating a focus plane from the received reference parameter for a focus plane; and calculating a camera f-Number from the received reference parameter for a camera f- Number.
  • synthesising at least one view of the video content may comprise applying a Depth Image Based Rendering, DIBR, process to the encoded view of the video content.
  • DIBR Depth Image Based Rendering
  • an inpainting process may be conducted as part of the DIBR process, for example if the synthesized view has disocclusions.
  • the method may further comprise receiving a parameter representing shutter shape for the video content and applying the shutter shape outline corresponding to the received parameter to the dimensioned blur filter.
  • the method may further comprise receiving a shutter speed for the video content and dimensioning a motion blur filter according to the received shutter speed.
  • applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
  • dimensioning a motion blur filter may comprise calculating a direction and a length of motion blur.
  • motion and blur filters may be applied separately.
  • the method may further comprise estimating a motion blur direction from a motion model.
  • the received encoded view may include blur, and the method may further comprise sharpening the received encoded view according to the received focal length and reference parameters before synthesising the at least one view of the video content.
  • sharpening may comprise applying a deblurring filter dimensioned according to the received focal length and parameters, for example using a Wiener deconvolution.
  • the method may further comprise applying a sharpening process to a depth map of the received view before synthesising the at least one view of the video content.
  • the sharpening process may comprise applying at least one median filter to the depth map.
  • an encoder configured for encoding three-dimensional video content
  • the encoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the encoder is operative to encode at least one view of the video content and at least part of a depth representation associated with the view, define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view, select at least one reference parameter for a camera f-Number for the encoded view and transmit the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
  • the node may in some examples be a decoder.
  • the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a location of a focus plane for the encoded view. According to another embodiment of the invention, the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a look-up table index corresponding to a focus plane of the encoded view.
  • the encoder may be further operative to select a reference parameter for a camera f-Number for the encoded view which comprises a camera f-Number for the encoded view.
  • the encoder may be further operative to transmit at least one of the selected reference parameters in floating point representation.
  • the encoder may be further operative to select a shutter speed for the video content and transmit the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
  • a decoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the decoder is operative to receive: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view and at least one reference parameter for a camera f-Number for the encoded view.
  • the decoder is further operative to synthesise at least one view of the video content, dimension a blur filter according to the received focal length and reference parameters and apply the dimensioned blur filter to the synthesised view.
  • the decoder may be further operative to receive a shutter speed for the video content and dimension a motion blur filter according to the received shutter speed.
  • applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
  • Figure 1 is a representation of multiview 3D display
  • Figures 2a and 2b are graphs illustrating the result of view synthesis with blurred texture and depth map (Figure 2a) and with blurred texture only (Figure 2b);
  • Figure 3 is a flow chart illustrating process steps in a method for encoding three-dimensional video content;
  • Figure 4 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 3.
  • Figure 5 is a block diagram illustrating functional units in an encoder
  • Figure 6 is a block diagram illustrating another embodiment of encoder
  • Figure 7 is a flow chart illustrating process steps in a method for decoding three-dimensional video content;
  • Figure 8 is a block diagram illustrating functional units in a decoder
  • Figure 9 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 7;
  • Figure 10 is a graph illustrating motion blur path estimation;
  • Figure 11 illustrates synthesis results obtained using the method of Figure 9;
  • Figure 12 is a block diagram illustrating another embodiment of decoder;
  • Figure 13 is a block diagram illustrating another embodiment of decoder.
  • aspects of the present invention address the issues of synthesising views of three-dimensional video content containing blur by transmitting to the decoder information allowing for the dimensioning of a blur filter, which may be applied as a post-processing step following view synthesis.
  • the transmitted parameters allow for the creation of a filter that generates a desired amount of blur in desired regions of the image.
  • the transmitted parameters may allow for the accurate recreation of the original blur in synthesised views.
  • the transmitted parameters may be selected by a creator of the content to ensure that synthesised views are blurred according to the creator's intentions.
  • aspects of the present invention may thus allow a creator or supplier of three-dimensional video content to maintain control over how the content is displayed.
  • Figure 3 illustrates steps in a method 100 for encoding three-dimensional video content according to an embodiment of the present invention.
  • the method comprises encoding at least one view of the video content and at least part of a depth representation associated with the view.
  • the method then comprises, at step 120, defining a camera focal length for the encoded view.
  • the method further comprises, at step 130, selecting at least one reference parameter for a focus plane of the encoded view, and at step 140 selecting at least one reference parameter for a camera f- Number for the encoded view.
  • the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
  • Step 110 of the method comprises encoding one or more views of the three-dimensional video content.
  • the at least part of a depth representation associated with the view, and which is encoded with the view may for example be a depth map for the view or may take other forms.
  • At least part of the depth representation is encoded with the view to enable view synthesis to be conducted at a decoder.
  • Step 120 of the method may comprise extracting the camera focal length of a capturing camera for the encoded view, or may comprise defining a focal length for the encoded view.
  • Step 130 of the method comprises selecting at least one reference parameter for a focus plane of the encoded view.
  • the reference parameter may take a range of different forms according to different embodiments of the invention.
  • the reference parameter for a focus plane of the encoded view may comprise the location of the focus plane s.
  • the units of s may be the same as the units used for the external camera parameters: baseline, Znear and Zfar. However, in some circumstances it may be more practical to send the value of s in standard length units of metres or feet.
  • the reference parameter for a focus plane of the encoded view may comprise a LUT (Look-up table) index for the focus plane in the encoded view.
  • a view identifier may also be transmitted as part of the reference parameter, indicating which view the LUT index applies to.
  • the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. In the case of a charge coupled device, this distance is known as the CCD distance g. A corresponding distance may be defined for other camera types.
  • the focus plane s may be calculated from the CCD distance g and the focal length f using the thin-lens equation 1/f = 1/g + 1/s, which gives s = f · g / ( g − f ).
  • the reference parameter for a focus plane of the encoded view may comprise the depths D_N and D_F of the nearest and farthest objects in a scene that appear acceptably sharp. Acceptably sharp may be defined with reference to the capturing or display conditions.
  • Step 140 of the method comprises selecting at least one reference parameter for a camera f-Number for the encoded view.
  • the reference parameter for the camera f-Number for the encoded view may be the camera f-Number.
  • the reference parameter for the camera f-Number may be a camera aperture diameter d.
  • the camera f-Number N may then be calculated using the following equation:
  • N = f / d (Equation 6), where f is the camera focal length.
  • the unit of measurement of the camera aperture diameter can be the same as the units of the focus plane. However it may be more practical to send the aperture diameter in units of millimeters or inches. It is useful to ensure that the aperture diameter as selected and transmitted is measured in the same units as the actual aperture diameter.
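  • the two derivations mentioned above, the focus plane from the CCD distance and the f-Number from the aperture diameter, can be sketched as follows; the numerical values are hypothetical:

    def focus_plane_from_ccd_distance(g, f):
        # Thin-lens relation 1/f = 1/g + 1/s  =>  s = f * g / (g - f).
        # g is the recording-surface (CCD) distance and f the focal length,
        # both in the same length unit, with g > f.
        return f * g / (g - f)

    def f_number_from_aperture(f, d):
        # Equation 6: N = f / d, with f and d expressed in the same unit.
        return f / d

    # Hypothetical example: 50 mm focal length, recording surface 51 mm behind the
    # optical system, 25 mm aperture diameter.
    s = focus_plane_from_ccd_distance(g=0.051, f=0.050)   # focus plane at 2.55 m
    n = f_number_from_aperture(f=50.0, d=25.0)            # f-Number of 2.0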
  • Step 170 of the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, focal length and selected parameters to a node, which may for example be a decoder.
  • Transmission of the selected reference parameters may be conducted in a number of different formats according to different embodiments of the invention.
  • the selected reference parameters are transmitted in a dedicated SEI message.
  • the selected reference parameters are transmitted as part of the standard multiview_acquisition_info SEI message.
  • the selected parameters may be sent as part of other SEI messages or any other message signalling to a decoder.
  • the parameters may be sent in floating point representation, in unsigned integer representation or in another appropriate format.
  • the following examples illustrate different transmission methods for different combinations of reference parameters.
  • Example 1: Focus plane s and camera f-Number N are transmitted in a dedicated SEI message in floating point representation (in the same format as is used for sending camera parameters in the multiview_acquisition_info message in MVC):
        focus_info( payloadSize ) {                        C    Descriptor
            prec_focus_plane                               5    ue(v)
            prec_aperture_f_number                         5    ue(v)
            exponent_focus_plane                           5    u(6)
            mantissa_focus_plane                           5    u(v)
            exponent_aperture_f_number                     5    u(6)
            mantissa_aperture_f_number                     5    u(v)
        }
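  • the exponent/mantissa fields above follow the general idea of splitting a value into an exponent and a quantized mantissa; the sketch below illustrates that idea only and is not the exact bit-level syntax of the SEI message:

    import math

    def to_exponent_mantissa(value, mantissa_bits):
        # Split a positive float into an exponent and an integer mantissa.
        # Generic illustration only, NOT the normative multiview_acquisition_info encoding.
        frac, exp = math.frexp(value)                 # value = frac * 2**exp, 0.5 <= frac < 1
        mantissa = round(frac * (1 << mantissa_bits))
        return exp, mantissa

    def from_exponent_mantissa(exp, mantissa, mantissa_bits):
        # Inverse of to_exponent_mantissa, up to quantization error.
        return (mantissa / (1 << mantissa_bits)) * (2.0 ** exp)

    exp, man = to_exponent_mantissa(2.55, mantissa_bits=16)   # e.g. a focus plane of 2.55 m
    print(from_exponent_mantissa(exp, man, 16))               # approximately 2.55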
  • Example 2: The reference parameter for the focus plane is the LUT index focus_plane_depth corresponding to the focus plane of an encoded view. Multiple views are encoded for transmission, so the reference parameter for focus plane also comprises a view identifier focus_plane_view_id to indicate which view the LUT index applies to:
  • in an alternative version of this example, only a single view may be sent, in which case there is no advantage to sending a view identifier, as the index may be assumed to apply to the single view that is transmitted.
  • Example 3: Focus plane s is sent in floating point representation and the camera aperture diameter d is sent in floating point representation:
  • the aperture f-Number N may then be calculated at the decoder using equation 6 above.
  • Example 4: Focus plane s is sent in floating point representation and the camera aperture diameter is sent in unsigned integer representation:
  • Example 5: Focus plane s and camera aperture f-Number N are sent in floating point representation in the multiview_acquisition_info message:
        multiview_acquisition_info( payloadSize ) {        C    Descriptor
            num_views_minus1                                    ue(v)
            intrinsic_param_flag                           5    u(1)
            extrinsic_param_flag                           5    u(1)
            aperture_f_number_flag                         5    u(1)
            focus_plane_flag                               5    u(1)
            if ( intrinsic_param_flag ) {
                intrinsic_params_equal                     5    u(1)
                prec_focal_length                          5    ue(v)
                prec_principal_point                       5    ue(v)
                prec_skew_factor                           5    ue(v)
                if( intrinsic_params_equal )
                    num_of_param_sets = num_views_minus1 + 1
                ...
  • Example 6: The camera f-Number reference parameter is sent using any of the above discussed methods, and D_N and D_F are sent using floating point representation:
        focus_plane_info( payloadSize ) {                  C    Descriptor
            prec_focus_plane                               5    ue(v)
            prec_aperture_f_number                         5    ue(v)
            exponent_focus_plane                           5    u(6)
            mantissa_focus_plane                           5    u(v)
            exponent_aperture_f_number                     5    u(6)
            mantissa_aperture_f_number                     5    u(v)
            exponent_nearest_plane_focused                 5    u(6)
            mantissa_nearest_plane_focused                 5    u(v)
            exponent_farthest_plane_focused                5    u(6)
            mantissa_farthest_plane_focused                5    u(v)
        }
  • Example 7: CCD distance g is sent in floating point representation and camera f-Number is sent in floating point representation:
  • more than one pair of reference parameters for focus plane and camera f-Number may be selected and transmitted.
  • a single reference parameter for each of focus plane and f-Number corresponds to a particular depth of field in a synthesised view.
  • the selected reference parameters may correspond to actual values for the focus plane and camera f-Number in the encoded view, for example in the case of captured three-dimensional video content.
  • the focus plane may be provided, for example by an autofocus system or by a user.
  • the focus plane may be estimated using a software-based autofocus technique (which may involve estimating the amount of blur and taking the depth of the sharpest areas as the focus plane).
  • the aperture f-number may also be provided by the lens, by the camera or by the user. It is generally more complicated to estimate the camera f-Number using only the image.
  • the reference parameters may be selected for the view in order to create a desired blur effect.
  • Figure 4 illustrates possible additional steps in the method 100, which may be conducted during the encoding of three-dimensional video content.
  • the method 100 may also comprise a step 150 of selecting a shutter speed for the video content, and step 160 of selecting a shutter shape for video content.
  • the selected shutter speed and a parameter representing the selected shutter shape may then be transmitted with the selected reference parameters, focal length and encoded view and depth representation in a modified step 170B.
  • Camera shutter speed (or exposure time) may be used at the decoder side to create camera motion blur. Details of this procedure are discussed below with reference to Figures 9 and 10.
  • the selected shutter speed may be the actual shutter speed of a capturing camera for the video content, or may be a selected shutter speed for example of a virtual camera applied to digitally generated content.
  • the motion blur created using the transmitted shutter speed may thus recreate actual motion blur or introduce a desired motion blur.
  • shutter speed may be sent using floating point representation in an SEI message:
  • the unit of the shutter speed may be seconds, milliseconds or other time units.
  • the inverse of the shutter speed may be sent using unsigned integer representation:
  • the unit of this inverse shutter speed may be 1/seconds.
  • Camera shutter shape may be used at the decoding side in the dimensioning of a blur filter to accurately represent blur from a capturing or virtual camera.
  • the shape corresponds to the actual shape of a camera shutter and may be a perfect disk, triangle, square, hexagon, octagon, etc.
  • shutter shape may be included with focus plane s and camera f- Number N as follows:
  • the parameter shutter_shape_type has no unit but may correspond to a predetermined set of values, for example mapping integer values to shapes such as a disk, triangle, square, hexagon or octagon.
  • shutter_shape_type defines the shape (outline) of a blur filter to be used in the blurring process at the decoder side, as discussed below.
  • three bits may be used to signal shutter shape, supporting up to 8 different shutter shapes. Other examples using fewer or more bits may be envisaged.
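  • a purely hypothetical mapping of a 3-bit shutter_shape_type value to filter outlines (the actual value assignment is not specified in this text):

    # Hypothetical shutter_shape_type mapping, for illustration only.
    SHUTTER_SHAPES = {
        0: "gaussian",   # no explicit aperture shape: fall back to a Gaussian kernel
        1: "disk",
        2: "triangle",
        3: "square",
        4: "hexagon",
        5: "octagon",
    }

    def shutter_shape_name(shutter_shape_type):
        # Return the blur-filter outline to use for a decoded shutter_shape_type value.
        return SHUTTER_SHAPES.get(shutter_shape_type, "disk")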
  • Figure 5 illustrates functional units of an encoder 200 in accordance with an embodiment of the invention.
  • the encoder 200 may execute the steps of the method 100, for example according to computer readable instructions received from a computer program.
  • the encoder 200 comprises an encoding unit 210, a selection unit 230 and a transmission unit 270. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
  • the encoding unit 210 is configured to encode at least one view of the video content and at least part of a depth representation associated with the view, for example a depth map.
  • the selection unit 230 is configured to define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view and select at least one reference parameter for a camera f-Number for the encoded view.
  • the transmission unit 270 is configured to transmit the encoded at least one view and at least part of an associated depth representation, focal length and selected reference parameters to a node.
  • the selection unit 230 may also be configured to select a shutter shape and shutter speed for the encoded view, and the transmission unit 270 may be configured to transmit the shutter speed and a parameter representing the shutter shape with the encoded view and other elements.
  • Figure 6 illustrates another embodiment of encoder 300.
  • the encoder 300 comprises a processor 380 and a memory 390.
  • the memory 390 contains instructions executable by the processor 380 such that the encoder 300 is operative to conduct the steps of the methods of Figures 3 and 4 described above.
  • Figure 7 illustrates steps in a method 400 for decoding three-dimensional video content in accordance with an embodiment of the present invention.
  • the method comprises receiving at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the method then comprises, at step 430, synthesising at least one view of the video content.
  • the method comprises dimensioning a blur filter according to the received focal length and reference parameters and finally at step 450, the method comprises applying the dimensioned blur filter to the synthesised view.
  • the encoded view and depth representation, focal length and reference parameters may be received at step 410 from an encoder operating according to the methods of Figures 3 and 4.
  • Synthesising at least one view of the video content at step 430 may comprise at least partially decoding the received view or views and associated depth representation, and then running a view synthesis process, such as DIBR to synthesise the required view or views.
  • Step 440 of the method then comprises dimensioning a blur filter according to the received reference parameters, and step 450 comprises applying the dimensioned blur filter to the synthesised view.
  • the Depth of Field of an image in a view corresponds to the distance between the nearest D N and farthest D F objects in a scene that appear acceptably sharp. Outside the Depth of Field, an image is blurred to a greater or lesser extent.
  • Dimensioning a blur filter using received reference parameters, and applying the dimensioned blur filter to a synthesized view allows the creation of blur in parts of the synthesized view to create a Depth of Field in the synthesized view that corresponds to the received parameters.
  • the received parameters may correspond to capture conditions, resulting in a depth of field in the synthesized view that matches the depth of field in the original captured views.
  • the received parameters may have been selected to ensure the creation of a particular Depth of Field desired by a creator of the original content.
  • the process of dimensioning a blur filter according to step 440 of the method, and applying the blur filter according to step 450, is discussed below.
  • the camera f-Number N can be approximately calculated as N = f / d (Equation 9), where f is the focal length and d is the aperture diameter.
  • the amount of blur in an image is characterized by a blur diameter b, such that a pixel in the image appears as a blur disc of diameter b.
  • a blur filter F(Z) may then be applied to each pixel of the image I by a convolution of the form J = I * F(Z).
  • the diameter b of the filter is calculated for each pixel according to the pixel's depth within the image.
  • the resulting output image J has a depth of field according to the parameters used to dimension the blur filter F(Z).
  • the blur filter may be a Gaussian filter or may have a specific shape (such as a disk, triangle, square, hexagon etc.) in order to mimic a specific aperture shape. It can be seen from the above discussion that a blur filter can be dimensioned using Equation 11 from the focal length f, focus plane s and camera f-Number N.
  • the focal length f and reference parameters for the focus plane s and f-Number N are received in step 410 of the method of Figure 7, and used in step 440 to dimension the blur filter which is then applied to the synthesised view in step 450.
  • the resulting Depth of Field in the synthesised view corresponds to the received parameters, thus ensuring that the synthesised view appears as desired by the creator of the original content.
  • the depth of field in the synthesized view may match that of the original content, or may be imposed on the content. If multiple pairs of parameters are received, the most appropriate pairs for the display and viewing conditions may be selected, ensuring that the synthesised view appears as best suited to the display and viewing conditions.
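  • a minimal sketch of this dimensioning and blurring step, using the standard thin-lens circle-of-confusion formula b = (f² / N) · |Z − s| / (Z · (s − f)) as a stand-in for Equation 11 (which is not reproduced in this text), and a layered Gaussian approximation as one possible implementation choice:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blur_diameter(z, f, s, n, pixel_size):
        # Per-pixel blur (circle of confusion) diameter in pixels for depth z, focal
        # length f, focus plane s and f-Number n; f, s, z and pixel_size share one unit.
        return (f * f / n) * np.abs(z - s) / (z * (s - f)) / pixel_size

    def apply_depth_blur(image, depth, f, s, n, pixel_size, levels=8):
        # Blur a grayscale image (H, W) as a function of its depth map (H, W): the image
        # is blurred at a few discrete strengths and each pixel takes the level closest
        # to its computed blur diameter; in-focus pixels (level 0) keep their values.
        image = np.asarray(image, dtype=float)
        b = blur_diameter(depth, f, s, n, pixel_size)
        b_max = float(b.max()) if b.max() > 0 else 1.0
        out = image.copy()
        for level in range(1, levels + 1):
            sigma = level * b_max / levels / 2.0           # rough diameter-to-sigma conversion
            blurred = gaussian_filter(image, sigma=sigma)
            mask = np.round(b / b_max * levels) == level   # pixels whose blur matches this level
            out[mask] = blurred[mask]
        return out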
  • Figure 8 illustrates functional units of a decoder 600 in accordance with an embodiment of the invention.
  • the decoder 600 may execute the steps of the method 400, for example according to computer readable instructions received from a computer program.
  • the decoder 600 comprises a receiving unit 610, a synthesis unit 630 and a filter unit 660.
  • the filter unit comprises a dimensioning sub unit 640 and an application sub unit 650. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
  • the receiving unit 610 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the synthesis unit 630 is configured to synthesise at least one view of the video content.
  • the dimensioning sub unit 640 of the filter unit 660 is configured to dimension a blur filter according to the received focal length and reference parameters.
  • the application sub unit 650 of the filter unit 660 is configured to apply the dimensioned blur filter to the synthesised view.
  • Figure 9 illustrates additional steps that may be conducted as part of the method 400 for decoding three-dimensional video content.
  • the method may for example be conducted in a decoder such as the decoders 700, 800 illustrated in Figures 12 and 13 and discussed in further detail below.
  • the decoder receives at least one encoded view of the video content and associated depth representation, as well as camera focal length and reference parameters for focus plane and camera f-Number for the encoded view.
  • the decoder may also receive a camera shutter speed and a parameter representative of shutter shape.
  • in step 512, the decoder checks whether multiple pairs of reference parameters for focus plane and f-Number have been received. Multiple pairs may be sent by an encoder if different Depth of Field options are available. If multiple pairs of reference parameters have been received (Yes at step 512), the decoder then proceeds to select the most appropriate pair for the display and viewing conditions at step 514. This selection step may be automated or may be made with user input.
  • the decoder may adjust a received or selected reference parameter according to viewing conditions or to known user requirements.
  • the decoder proceeds to check, in step 516, whether the reference parameters received are the focus plane location and camera f-Number. If the reference parameters are the focus plane location and camera f-Number, then the decoder proceeds directly to step 520. If this is not the case (No at step 516), the decoder proceeds to calculate the focus plane location and/or camera f-Number from the received parameters at step 518. This may involve performing a LUT operation or a calculation, as discussed in further detail above. With the focus plane location and camera f-Number available, the decoder then proceeds to check, at step 520, whether blur is present in the image texture of the received encoded view.
  • the methods described herein may be used to recreate blur that was present in originally captured video content, or to impose blur onto all in focus video content, which content may have been physically captured or digitally generated.
  • if blur is present in the received texture (Yes at step 520), the decoder proceeds to step 522, in which the received reference parameters are used to sharpen the received view.
  • this sharpening process comprises using the received reference parameters to calculate the blur diameters for pixels in the view. From these diameters, an approximation of the point spread function for each pixel can be generated, allowing the application of a deblurring filter to the received view.
  • the deblurring filter may for example be a Wiener deconvolution.
  • motion blur parameters may also be calculated in order to sharpen the image for motion blur. Motion blur is discussed in further detail below.
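  • a minimal sketch of such a deblurring step, assuming a single dominant blur diameter for the view and using the Wiener deconvolution available in scikit-image (one possible implementation choice, not the only one):

    import numpy as np
    from skimage.restoration import wiener

    def disk_psf(diameter):
        # Approximate point spread function of a defocus blur with the given pixel diameter.
        radius = max(1, int(round(diameter / 2)))
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        psf = (x * x + y * y <= radius * radius).astype(float)
        return psf / psf.sum()

    def deblur_texture(image, blur_diameter_px, balance=0.1):
        # Sharpen a received grayscale texture (float values in [0, 1]) by Wiener
        # deconvolution; in practice the blur diameter would be derived per region from
        # the received focal length, focus plane and f-Number as described above.
        return wiener(image, disk_psf(blur_diameter_px), balance)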
  • the decoder then proceeds to check, in step 524, whether blur is present in the depth representation received, for example in a received depth map. If blur is present in the received depth map or other depth representation (Yes at step 524), the decoder sharpens the depth map at step 526. This may involve applying a plurality of median filters to the original depth map in order to remove smoothed edges from the depth map. Other methods for sharpening the depth map may be considered.
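  • a minimal sketch of such a depth map sharpening step, with illustrative parameter values:

    from scipy.ndimage import median_filter

    def sharpen_depth_map(depth, passes=3, size=3):
        # Apply a plurality of median filters to the depth map: median filtering
        # suppresses the intermediate values created by smoothing while preserving
        # step edges, so repeated passes tend to restore sharper depth discontinuities.
        out = depth
        for _ in range(passes):
            out = median_filter(out, size=size)
        return out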
  • the decoder proceeds to synthesise at least one view of the video content at step 530. This may for example comprise running a DIBR process. Once the at least one view has been synthesised, the decoder proceeds to step 540 in which a blur filter is dimensioned according to the received focal length and reference parameters. This dimensioning process is described in detail above with respect to Figure 7. If a parameter representing shutter shape has been received then this may also be used in the dimensioning of the blur filter. The decoder then proceeds, in steps 542, 544 and 546 to address motion blur. Motion blur is described in greater detail below but in brief, the decoder assesses whether camera parameters for a frame t and frame t-1 are available in step 542.
  • if these parameters are not available (No at step 542), the decoder proceeds to estimate a motion blur direction from a motion model at step 544. Using either this estimation or the available camera parameters, the decoder then dimensions a motion blur filter at step 546, calculating the motion blur direction and length. In step 548 the decoder combines the motion blur filter and the dimensioned blur filter from steps 546 and 540 before, in step 550, applying the combined blur filter to the synthesised view.
  • motion blur can occur in captured video content, for example when a camera is moving or zooming rapidly.
  • Transmitting the camera shutter speed as discussed with reference to Figure 4 allows the recreation of camera motion blur in synthesised views, or the imposition of camera motion blur where it is desired to introduce this blur onto images where it is not already present.
  • a first step is to determine the direction of the camera motion blur. By taking the depth map of frame t and projecting it onto the camera frames t and t−1, thus using different camera parameters, it is possible to determine the path that each pixel takes.
  • the camera projection matrices may be computed from the camera parameters such as translation_param and orientation_param given in the multiview_acquisition_info SEI message illustrated in Table 1. If the camera projection matrices P^(t−1) and P^t are found to be identical, then no motion blur is present. If the camera parameters are not available for frame t and/or frame t−1, a motion model may be used in order to predict the missing camera parameters.
  • constant speed models may for instance be used:
        T^t = T^(t−1) + V^(t−1) / fps
        R^t = R^(t−1) * AngleAxisToRotationMatrix( W^(t−1) / fps )
        f^t = f^(t−1) + VF^(t−1) / fps
    where:
  • V^(t−1) is the estimated translational velocity vector (m/s if T is in metres and 1/fps in seconds);
  • W^(t−1) is the estimated angular velocity vector (expressed here as an angle axis whose norm corresponds to the angular speed in rad/s);
  • VF^(t−1) is the estimated focal length speed (px/s if f is in pixels and 1/fps in seconds). Other motion models may be envisaged.
  • the amount of motion blur may then be calculated using the shutter speed ss which, as noted above, may be an actual shutter speed for the captured content or may be a selected shutter speed; together with the motion direction, this defines the motion blur point spread function (PSF).
  • the dimensioned motion blur filter may be applied, for example according to the following equation:
        J( q' ) = ( 1 / N ) * Σ_{i = 1..N} I( q' + [ x(i), y(i) ] )
    where:
  • J is the blurred image;
  • I is the original image;
  • N is the number of local pixels used (typically the norm of D_motion);
  • x(i) and y(i) are respectively the x and y components of the discrete segment D_motion starting from q'.
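  • a minimal sketch of this averaging along the motion path, simplified to a single global path for the whole image (whereas in general the path varies per pixel):

    import numpy as np

    def apply_motion_blur(image, path):
        # Average the image along a discrete motion segment D_motion given as a list of
        # integer (dx, dy) offsets, approximating camera motion blur over the exposure.
        acc = np.zeros_like(image, dtype=float)
        for dx, dy in path:
            acc += np.roll(image, shift=(dy, dx), axis=(0, 1))   # shift along the path
        return acc / len(path)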
  • d is the effective aperture diameter
• L_max is the maximum intensity (typically 255 for each image channel)
  • k and p are two constants.
  • the image I may then be blurred with the following equation:
  • the dimensioned blur and motion blur filters may be applied consecutively or may be combined before application. It may be that better final results are obtained using a combined motion blur and blur filter.
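As a sketch of the combined option, and since both filters are linear point spread functions, the two kernels may be convolved once and the result applied in a single pass (Python; the construction of the individual kernels is assumed to have been done elsewhere):

import numpy as np
from scipy.signal import convolve2d, fftconvolve

def combine_and_apply(image, blur_kernel, motion_kernel):
    # The PSF of two linear filters applied in series is the convolution of
    # their kernels; normalising preserves overall image brightness.
    combined = convolve2d(blur_kernel, motion_kernel)
    combined /= combined.sum()
    channels = [fftconvolve(image[..., c], combined, mode="same")
                for c in range(image.shape[-1])]
    return np.dstack(channels)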
  • Figure 11 illustrates synthesis results using the method of Figure 9.
• An all-in-focus texture and depth map is received, or obtained through the sharpening procedures, before view synthesis and blurring are applied.
• in case A, an inpainter is used in order to fill the disoccluded area and then the blur filter is applied.
• in case B, no inpainter is required, and the blur filter is applied after DIBR.
• Figure 12 illustrates functional units of a decoder 700 in accordance with another embodiment of the invention.
  • the decoder 700 may execute the steps of the method 500, for example according to computer readable instructions received from a computer program.
• the decoder 700 comprises a receiving unit 710, an analysis and calculation unit 715, a sharpening unit 720, a synthesis unit 730 and a filter unit 760.
• the filter unit 760 comprises a blur dimensioning sub unit 740, a motion dimensioning sub unit 746, a combining sub unit 748 and an application sub unit 750. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
  • the receiving unit 710 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view.
  • the receiving unit is also configured to receive a shutter speed and a parameter for shutter shape.
• the analysis and calculation unit 715 is configured to conduct steps 512 to 518 of the method of Figure 9, checking for multiple pairs of reference parameters and selecting an appropriate pair, and calculating the focus plane and f-Number from the respective reference parameters, if such calculation is necessary.
  • the sharpening unit 720 is configured to check for blur in the received image texture and depth representation, and to sharpen the image texture and/or depth representation if blur is present.
  • the synthesis unit 730 is configured to synthesise at least one view of the video content.
  • the blur dimensioning sub unit 740 of the filter unit 760 is configured to dimension a blur filter according to the received focal length and reference parameters.
• the motion dimensioning sub unit 746 is configured to dimension a motion blur filter according to a received shutter speed and parameters extracted from the encoded view, as discussed above.
  • the combination sub unit 748 is configured to combine the dimensioned blur and motion blur filters.
  • the application sub unit 750 is configured to apply the combined filter to the synthesised view or views.
  • Figure 13 illustrates another embodiment of decoder 800.
  • the decoder 800 comprises a processor 880 and a memory 890.
  • the memory 890 contains instructions executable by the processor 880 such that the decoder 800 is operative to conduct the steps of the method of Figure 9 described above.
  • the method of the present invention may be implemented in hardware, or as software modules running on one or more processors. The method may also be carried out according to the instructions of a computer program, and the present invention also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein.
  • a computer program embodying the invention may be stored on a computer-readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

Abstract

A method of encoding three-dimensional video content is disclosed. The method comprises encoding at least one view of the video content and at least part of a depth representation associated with the view (110), and defining a camera focal length for the encoded view (120). The method further comprises selecting at least one reference parameter for a focus plane of the encoded view (130), selecting at least one reference parameter for a camera f-Number for the encoded view (140), and transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node (170). Further, a method for decoding three-dimensional video content, an encoder, a decoder, and a corresponding computer program product are also disclosed.

Description

METHODS FOR ENCODING AND DECODING THREE-DIMENSIONAL VIDEO
CONTENT
Technical Field
The present invention relates to methods for encoding and decoding three-dimensional (3D) video content. The invention also relates to an encoder, a decoder and a computer program product configured to implement methods for encoding and decoding three-dimensional video content.
Background
Three-dimensional video technology continues to grow in popularity, and 3D technology capabilities in the entertainment and communications industries in particular have evolved rapidly in recent years.
3D technology provides an observer with an impression of depth in a compound image, causing parts of the image to appear to project out in front of a display screen, into what is known as observer space, while other parts of the image appear to project backwards into the space behind the screen, into what is known as CRT space. The term 3D is usually used to refer to a stereoscopic experience, in which an observer's eyes are provided with two slightly different images of a scene, which images are fused in the observer's brain to create the impression of depth. This effect is known as binocular parallax and provides an excellent 3D experience to a stationary observer, usually requiring the use of glasses or other filtering elements that enable the different images to be shown to the left and right eyes of an observer.
A new generation of "auto-stereoscopic" displays allows a user to experience three- dimensional video without glasses. These displays project slightly different views of a scene in different directions, as illustrated in Figure 1. A viewer located a suitable distance in front of the display will see slightly different pictures of the same scene in their left and right eyes, creating a perception of depth. In order to achieve smooth parallax and to enable a change of viewpoint as users move in front of the screen, a number of views (typically between 7 and 28 views) are generated. Auto-stereoscopic functionality is enabled by capturing or digitally generating a scene using many different cameras which observe a scene from different angles or viewpoints. These cameras generate what is known as multiview video.
Multiview video can be relatively efficiently encoded for transmission by exploiting both temporal and spatial similarities that exist in different views. However, even with multiview coding (MVC, MV-HEVC), the transmission cost for multiview video remains prohibitively high. To address this, current technologies only actually transmit a subset of key captured or generated multiple views, typically between 2 and 3 of the available views. To compensate for the missing information, depth or disparity maps are used to recreate the missing data. From the multiview video and depth/disparity information, virtual views can be generated at any arbitrary viewing position using view synthesis processes. These viewing positions are sometimes known as virtual cameras, and may be located between the transmitted key views (interpolated) or outside the range covered by the key views (extrapolated). In addition to the coding efficiency offered by view synthesis, the ability to generate views at more or less arbitrary positions means that the depth perception of a viewer may be changed or adjusted and depth perception may be matched to the size of the display screen on which the video will be shown. Many view synthesis techniques exist in the literature, depth image-based rendering (DIBR) being one of the most prominent.
A depth map, as used in DIBR, is simply a greyscale image of a scene in which each pixel indicates the distance between the corresponding pixel in a video object and the capturing camera optical centre. A disparity map is an intensity image conveying the apparent shift of a pixel which results from moving from one viewpoint to another. Depth and disparity are mathematically related, and the link between them can be appreciated by considering that the closer an object is to a capturing camera, the greater will be the apparent positional shift resulting from a change in viewpoint. A key advantage of depth and disparity maps is that they contain large smooth surfaces of constant grey levels, making them comparatively easy to compress for transmission using current video coding technology.
A 3D point cloud can be reconstructed from a depth map using the 3D camera parameters of the capturing camera. These parameters include the matrix K for a pinhole camera model, which contains the camera focal lengths, principal point, etc. The 3D point cloud can be reconstructed as follows: Q = d*(KR)^-1 * q - R^-1 * T (Equation 1)
Where q is a 2D point (expressed in the camera coordinate frame, in homogeneous coordinates), d is the point's associated depth (measured by a sensor for example) and Q is the corresponding 3D point in a 3D coordinate frame. R is the camera orientation and T the camera translation. K, R and T are combined in the camera projection matrix P by the relation P = K * [R T].
A 3D point Q(Qx,Qy,Qz) may thus be projected onto an image at the 2D location q of homogeneous coordinates (qx,qy,1) by the equation: q*d = P * Q (Equation 2)
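By way of a worked example, Equations 1 and 2 may be implemented as follows (Python/NumPy sketch; variable names follow the notation above):

import numpy as np

def backproject(q, d, K, R, T):
    # Equation 1: Q = d * (K R)^-1 * q - R^-1 * T, with q = (qx, qy, 1).
    q_h = np.array([q[0], q[1], 1.0])
    return d * np.linalg.solve(K @ R, q_h) - np.linalg.solve(R, np.asarray(T))

def project(Q, K, R, T):
    # Equation 2: q * d = P * Q with P = K * [R T]; returns (qx, qy) and depth d.
    P = K @ np.hstack([R, np.asarray(T).reshape(3, 1)])
    qd = P @ np.append(Q, 1.0)
    return qd[:2] / qd[2], qd[2]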
A depth map can be measured by specialized cameras, including structured-light or time-of-flight (ToF) cameras, where the depth is correlated with the deformation of a projected light pattern or with the round-trip time of a pulse of light. A principal limitation of these depth sensors is the depth range they can measure: objects that are too close to or too far away from the device will have no depth information.
It will be appreciated from the above discussion that in order to conduct DIBR view synthesis, a number of parameters need to be signalled to the device or programme module that performs the view synthesis. Among those parameters are "Znear" and "Zfar", which represent the closest and the farthest depth values in the depth maps for the video frame under consideration. These values are needed in order to map the quantized depth map samples to the real depth values that they represent.
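The exact sample-to-depth mapping depends on the depth representation that is signalled; as an illustration of why Znear and Zfar are needed, a commonly used convention in which the depth samples are uniformly quantized in inverse depth may be sketched as follows (Python; the convention chosen here is an assumption, not part of the signalling discussed below):

def depth_sample_to_z(v, z_near, z_far, bit_depth=8):
    # Map a quantized depth sample v (0 .. 2^bit_depth - 1) to a metric depth,
    # assuming uniform quantization in 1/z between Znear and Zfar.
    v_max = (1 << bit_depth) - 1
    inv_z = (v / v_max) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z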
Capturing camera parameters are also required to conduct DIBR view synthesis, and these parameters are usually divided into two groups. The first group is internal camera parameters, representing the optical characteristics of the camera for the image taken. This includes the focal length, the coordinates of the image's principal point and the lens distortions. The second group, or external camera parameters, represent the camera position and the direction of its optical axis in the chosen real world coordinates (conveying the position of the cameras relative to each other and to the objects in the scene). Both internal and external camera parameters are required in view synthesis processes based on usage of the depth information (such as DIBR). There exist standardized methods for sending Znear, Zfar and camera parameters to the decoding module performing view synthesis. One of these methods is defined in the multi-view video coding (MVC) standard, which is defined in the annex H of the well-known advanced video coding (AVC) standard, also known as H.264. The scope of MVC covers joint coding of stereo or multiple views representing the scene from several viewpoints. The MVC standard also covers sending the camera parameters information to the decoder. The camera parameters are sent as a supplementary enhancement information (SEI) message. The syntax of this SEI message is shown in Table 1 below:
Table 1: Multiview acquisition information SEI message syntax

multiview_acquisition_info( payloadSize ) {            C  Descriptor
    num_views_minus1                                      ue(v)
    intrinsic_param_flag                               5  u(1)
    extrinsic_param_flag                               5  u(1)
    if( intrinsic_param_flag ) {
        intrinsic_params_equal                         5  u(1)
        prec_focal_length                              5  ue(v)
        prec_principal_point                           5  ue(v)
        prec_skew_factor                               5  ue(v)
        if( intrinsic_params_equal )
            num_of_param_sets = 1
        else
            num_of_param_sets = num_views_minus1 + 1
        for( i = 0; i < num_of_param_sets; i++ ) {
            sign_focal_length_x[ i ]                   5  u(1)
            exponent_focal_length_x[ i ]               5  u(6)
            mantissa_focal_length_x[ i ]               5  u(v)
            sign_focal_length_y[ i ]                   5  u(1)
            exponent_focal_length_y[ i ]               5  u(6)
            mantissa_focal_length_y[ i ]               5  u(v)
            sign_principal_point_x[ i ]                5  u(1)
            exponent_principal_point_x[ i ]            5  u(6)
            mantissa_principal_point_x[ i ]            5  u(v)
            sign_principal_point_y[ i ]                5  u(1)
            exponent_principal_point_y[ i ]            5  u(6)
            mantissa_principal_point_y[ i ]            5  u(v)
            sign_skew_factor[ i ]                      5  u(1)
            exponent_skew_factor[ i ]                  5  u(6)
            mantissa_skew_factor[ i ]                  5  u(v)
        }
    }
    if( extrinsic_param_flag ) {
        prec_rotation_param                            5  ue(v)
        prec_translation_param                         5  ue(v)
        for( i = 0; i <= num_views_minus1; i++ ) {
            for( j = 1; j <= 3; j++ ) {  /* row */
                for( k = 1; k <= 3; k++ ) {  /* column */
                    sign_r[ i ][ j ][ k ]              5  u(1)
                    exponent_r[ i ][ j ][ k ]          5  u(6)
                    mantissa_r[ i ][ j ][ k ]          5  u(v)
                }
                sign_t[ i ][ j ]                       5  u(1)
                exponent_t[ i ][ j ]                   5  u(6)
                mantissa_t[ i ][ j ]                   5  u(v)
            }
        }
    }
}
The camera parameters in Table 1 are sent in floating point representation, which offers high precision as well as supporting a high dynamic range of parameters. Tables 2 and 3 show an example Sequence Parameter Set (SPS) message for sending Znear and Zfar values associated with a depth map:
Table 2: Sequence parameter set 3D-AVC extension syntax

seq_parameter_set_3davc_extension( ) {                 C  Descriptor
    if( NumDepthViews > 0 ) {
        3dv_acquisition_idc                            0  ue(v)
        for( i = 0; i < NumDepthViews; i++ )
            view_id_3dv[ i ]                           0  ue(v)
        if( 3dv_acquisition_idc ) {
            depth_ranges( NumDepthViews, 2, 0 )
            vsp_param( NumDepthViews, 2, 0 )
        }
        reduced_resolution_flag                        0  u(1)
        slice_header_prediction_flag                   0  u(1)
        seq_view_synthesis_flag                        0  u(1)
        nonlinear_depth_representation_num             0  ue(v)
        for( i = 1; i <= nonlinear_depth_representation_num; i++ )
            nonlinear_depth_representation_model[ i ]  0  ue(v)
    }
    alc_sps_enable_flag                                0  u(1)
    enable_rle_skip_flag                               0  u(1)
}
Table 3. Depth ranges syntax
nonlinear_depth_representation_model[ i ] specifies the piecewise linear segments for mapping of decoded luma sample values of depth views to a scale that is uniformly quantized in terms of disparity.
Variable DepthLUT[ i ] for i in the range of 0 to 255, inclusive, is specified as follows:

depth_nonlinear_representation_model[ 0 ] = 0
depth_nonlinear_representation_model[ depth_nonlinear_representation_num + 1 ] = 0
for( k = 0; k <= depth_nonlinear_representation_num_minus1 + 1; ++k ) {
    pos1 = ( 255 * k ) / ( depth_nonlinear_representation_num_minus1 + 2 )
    dev1 = depth_nonlinear_representation_model[ k ]
    pos2 = ( 255 * ( k + 1 ) ) / ( depth_nonlinear_representation_num_minus1 + 2 )
    dev2 = depth_nonlinear_representation_model[ k + 1 ]
    x1 = pos1 - dev1
    y1 = pos1 + dev1
    x2 = pos2 - dev2
    y2 = pos2 + dev2
    for( x = max( x1, 0 ); x <= min( x2, 255 ); ++x )
        DepthLUT[ x ] = Clip3( 0, 255, Round( ( ( x - x1 ) * ( y2 - y1 ) ) ÷ ( x2 - x1 ) + y1 ) )
}
When depth_representation_type is equal to 3, DepthLUT[ dS ] for all decoded luma sample values dS of depth views in the range of 0 to 255, inclusive, represents disparity that is uniformly quantized into the range of 0 to 255, inclusive.
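A direct transcription of the DepthLUT derivation above into Python is given below for illustration; `model` is assumed to be depth_nonlinear_representation_model with the two zero boundary entries already set, and num_minus1 is depth_nonlinear_representation_num_minus1:

import numpy as np

def build_depth_lut(model, num_minus1):
    lut = np.zeros(256, dtype=np.int32)   # every entry is overwritten by the loop below
    for k in range(num_minus1 + 2):
        pos1 = (255 * k) // (num_minus1 + 2)
        dev1 = model[k]
        pos2 = (255 * (k + 1)) // (num_minus1 + 2)
        dev2 = model[k + 1]
        x1, y1 = pos1 - dev1, pos1 + dev1
        x2, y2 = pos2 - dev2, pos2 + dev2
        for x in range(max(x1, 0), min(x2, 255) + 1):
            value = round(((x - x1) * (y2 - y1)) / (x2 - x1) + y1)
            lut[x] = int(np.clip(value, 0, 255))   # Clip3( 0, 255, Round( ... ) )
    return lut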
The present specification discusses the case of a 1D linear camera arrangement with cameras pointing in directions parallel to each other. The cameras point along the z axis and the camera centers have the same y and z coordinates, with only the x coordinate changing from camera to camera. This is a common camera setup for stereoscopic and "3D multiview" video. The so-called "toed-in" or general case camera setup, in which the cameras are not aligned, can be converted to the 1D linear camera setup by the rectification process. The distance between two cameras in a stereo/3D setup is usually called the baseline (or the baseline distance). In a stereo camera setup, the baseline is often approximately equal to the distance between the human eyes (normally about 6.5 centimeters) in order to achieve natural depth perception when showing these left and right pictures on a stereo screen. Other baseline values may be chosen depending on the scene characteristics, camera parameters and the intended stereo effects.
The present specification refers to a baseline as the distance between the cameras for the left and the right views in the units of the external (extrinsic) camera coordinates. In the case of a stereo screen, the baseline is the distance between the virtual (or real) cameras used to obtain the views for the stereo-pair. In the case of a multi-view screen, the baseline is considered as the distance between two cameras that the left and the right eyes of a spectator see when watching the video on the multiview screen at the preferred viewing distance. In case of a multi-view screen, the views (cameras) seen by the left and the right eyes of the viewer may not be consecutive views. However, this information is known to the display manufacturer and can be used in the view synthesis process. The baseline is not therefore the distance between the two closest generated views, as it may be greater than this distance, depending upon the particular setup being used.
JCT-3V is an ongoing standardization effort in which multiview texture (normal 2D videos) and depth maps are compressed and transmitted using the future MV-HEVC or 3D-HEVC codecs.
View synthesis techniques such as DIBR thus address many of the problems associated with providing multiview three-dimensional video content. However, view synthesis can encounter difficulties in rendering views in which part of the content is blurred.
The depth of field (DoF) of an image corresponds to the distance between the nearest DN and farthest DF objects in a scene that appear acceptably sharp. Acceptably sharp may be defined with reference to criteria relating to the capturing or display equipment, and may for example comprise all areas of an image where the extent of blur for an image point is less than the pixel diameter of the capturing or display equipment. DoF is thus defined as DF - DN. A small DoF corresponds to an image in which significant parts of the foreground and/or background image texture are blurred. Encompassed within the DoF, between DF and DN, is the focus plane s, which is the depth at which the content of the image is sharpest. Current techniques for synthesising views of an image in which part of the texture is blurred involve blurring the depth map with the same amount of blur as appears in the image texture, and then conducting a normal DIBR process. These techniques work relatively well when only the far background of the image is out of focus (i.e. outside the DoF). However, current techniques do not work properly for images having a small DoF, or blurred content close to the camera. Figures 2a and 2b illustrate synthesis results for a blurred image texture in which the colours (y axis) of object f and object e at different locations along the x axis have blurred into each other. As seen in Figure 2b, if the depth map is not blurred, a distinct color leap (case B) or repetitive color (case A) will appear in the synthesized image. As seen in Figure 2a, if the depth map is blurred, a wider (case A) or more constricted (case B) blur will be generated in the synthesized image, thus generating an unnatural DoF effect. In extreme rendering conditions, with a virtual camera located relatively far away from the real reference camera, in case A the blur will act as a linear smoothing and in case B it will act as a sharp edge, resulting in an unwanted artifact.
Summary
It is an aim of the present invention to provide a method, apparatus and computer program product which obviate or reduce at least one or more of the disadvantages mentioned above.
According to a first aspect of the present invention, there is provided a method of encoding three-dimensional video content, the method comprising encoding at least one view of the video content and at least part of a depth representation associated with the view, and defining a camera focal length for the encoded view. The method further comprises selecting at least one reference parameter for a focus plane of the encoded view, selecting at least one reference parameter for a camera f-Number for the encoded view, and transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node. In some examples, the node may be a decoder.
In some examples, the depth representation may be a depth or disparity map. In other examples, the depth representation may be a dense (comprising a matrix) or sparse (comprising sets) representation. The representation may be deduced from a 3D model or a previously reconstructed depth map projected onto the camera view or may be estimated from multiple camera views.
In some examples, defining a camera focal length may comprise extracting a focal length of a camera used to capture the encoded view. In other examples, defining a camera focal length may comprise defining a focal length of a virtual camera used to digitally generate the encoded view.
According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise a location of a focus plane for the encoded view. In some examples, the location of the focus plane may comprise the actual location of the focus plane in captured video content. In other examples, the location of the focus plane may comprise a selected location for the focus plane for captured or digitally generated video content.
According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise a look-up table index corresponding to a focus plane of the encoded view.
In some examples, more than one view of the video content may be encoded. In such examples, an identification of the view to which the look-up table index applies may be included as part of the reference parameter for a focus plane of the video content. According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. The distance may permit calculation of the focus plane location for the encoded view. According to an embodiment of the invention, the reference parameter for a focus plane of the encoded view may comprise the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level. The acceptable focus level may correspond to a measured focus level or to a desired focus level, for example in digitally generated content.
According to an embodiment of the invention, the reference parameter for a focus plane of the video content may comprise look-up table indexes corresponding to the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
The nearest and farthest in-focus depths may allow estimation of a location of the focus plane for the encoded view. In some examples, more than one view of the video content may be encoded. In such examples, an identification of the view to which the look-up table index applies may be included as part of the reference parameter for a focus plane of the video content. According to an embodiment of the invention, the reference parameter for a camera f- Number for the encoded view may comprise a camera f-Number for the encoded view.
In some examples, the camera f-Number may be a camera f-Number of a capturing camera of the video content. In other examples, the camera f-Number may be a selected camera f-Number for captured or digitally generated video content.
According to an embodiment of the invention, the reference parameter for a camera f- Number for the encoded view may comprise a camera aperture diameter. The camera aperture diameter may allow calculation of the camera f-Number.
According to an embodiment of the invention, transmitting the selected reference parameters may comprise transmitting at least one of the selected reference parameters in floating point representation.
According to another embodiment of the invention, transmitting the selected reference parameters may comprise transmitting at least one of the selected reference parameters in unsigned integer representation. According to an embodiment of the invention, the selected reference parameters for focus plane and camera f-Number may correspond to a first depth of focus, and the method may further comprise selecting at least one additional reference parameter for a focus plane of the encoded view, selecting at least one additional reference parameter for a camera f-Number for the encoded view, and transmitting the selected additional reference parameters with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
According to an embodiment of the invention, the video content may comprise captured video content, and the selected reference parameters for focus plane and camera f- Number may correspond to an actual focus plane and camera f-Number of a capturing camera.
According to an embodiment of the invention, the selected reference parameters for focus plane and camera f-Number may correspond to a selected focus plane and camera f-Number for one of a capturing camera or a virtual camera. According to an embodiment of the invention, the method may further comprise selecting a shutter speed for the video content and transmitting the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
In some examples, the shutter speed may be the actual speed for captured video content. In other examples, the shutter speed may be a selected speed for captured or for digitally generated content. According to an embodiment of the invention, the method may further comprise selecting a shutter shape for the video content and transmitting a parameter representing the shutter shape with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation. In some examples, the shutter shape may be the actual shape for captured video content. In other examples, the shutter shape may be a selected shape for captured or for digitally generated content.
According to an embodiment of the invention, transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a supplementary enhancement information, SEI, message.
According to an embodiment of the invention, transmitting the selected reference parameters to the node may comprise including the selected reference parameters in the multiview_acquisition_info SEI message.
According to another embodiment of the invention, transmitting the selected reference parameters to the node may comprise including the selected reference parameters in a dedicated SEI message.
According to another aspect of the present invention, there is provided a method for decoding three-dimensional video content, comprising receiving: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The method further comprises synthesising at least one view of the video content, dimensioning a blur filter according to the received focal length and reference parameters and applying the dimensioned blur filter to the synthesised view. In some examples, synthesising at least one view may comprise at least partially decoding the received encoded view
In some examples, the received reference parameters may comprise focus plane and f-Number values. In other examples, the received reference parameters may comprise other values permitting calculation of the focus plane and f-Number values, for example a distance between a recording surface and an optical system of a camera and a camera aperture diameter.
In some examples, the method may comprise receiving a plurality of reference parameters for focus plane and camera f-Number of the encoded view, and the method may further comprise selecting a reference parameter for a focus plane of the encoded view and a reference parameter for a camera f-Number of an encoded view. The reference parameters may be selected according to at least one of display or viewing conditions for the three-dimensional video content.
According to an embodiment of the invention, dimensioning a blur filter according to the received focal length and reference parameters may comprise at least one of: calculating a focus plane from the received reference parameter for a focus plane; and calculating a camera f-Number from the received reference parameter for a camera f- Number.
According to an embodiment of the invention, synthesising at least one view of the video content may comprise applying a Depth Image Based Rendering, DIBR, process to the encoded view of the video content. In some examples, an inpainting process may be conducted as part of the DIBR process, for example if the synthesized view has disocclusions.
According to an embodiment of the invention, the method may further comprise receiving a parameter representing shutter shape for the video content and applying the shutter shape outline corresponding to the received parameter to the dimensioned blur filter. According to another embodiment of the invention, the method may further comprise receiving a shutter speed for the video content and dimensioning a motion blur filter according to the received shutter speed. According to such an embodiment, applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
In some examples, dimensioning a motion blur filter may comprise calculating a direction and a length of motion blur. In some embodiments of the invention, motion and blur filters may be applied separately.
According to some embodiments, the method may further comprise estimating a motion blur direction from a motion model.
According to an embodiment of the invention, the received encoded view may include blur, and the method may further comprise sharpening the received encoded view according to the received focal length and reference parameters before synthesising the at least one view of the video content.
In some examples, sharpening may comprise applying a deblurring filter dimensioned according to the received focal length and parameters, for example using a Wiener deconvolution. According to an embodiment of the invention, the method may further comprise applying a sharpening process to a depth map of the received view before synthesising the at least one view of the video content. In some examples, the sharpening process may comprise applying at least one median filter to the depth map. According to another aspect of the present invention, there is provided a computer program product configured, when run on a computer, to execute a method according to any one of the preceding claims.
According to another aspect of the present invention, there is provided an encoder configured for encoding three-dimensional video content, the encoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the encoder is operative to encode at least one view of the video content and at least part of a depth representation associated with the view, define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view, select at least one reference parameter for a camera f-Number for the encoded view and transmit the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node. The node may in some examples be a decoder.
According to an embodiment of the invention, the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a location of a focus plane for the encoded view. According to another embodiment of the invention, the encoder may be further operative to select a reference parameter for a focus plane of the encoded view which comprises a look-up table index corresponding to a focus plane of the encoded view.
According to another embodiment of the invention, the encoder may be further operative to select a reference parameter for a camera f-Number for the encoded view which comprises a camera f-Number for the encoded view.
According to another embodiment of the invention, the encoder may be further operative to transmit at least one of the selected reference parameters in floating point representation.
According to an embodiment of the invention, the encoder may be further operative to select a shutter speed for the video content and transmit the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
According to another aspect of the present invention, there is provided a decoder, the decoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the decoder is operative to receive: at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view and at least one reference parameter for a camera f-Number for the encoded view. The decoder is further operative to synthesise at least one view of the video content, dimension a blur filter according to the received focal length and reference parameters and apply the dimensioned blur filter to the synthesised view.
According to an embodiment of the invention, the decoder may be further operative to receive a shutter speed for the video content and dimension a motion blur filter according to the received shutter speed. According to this embodiment, applying the dimensioned blur filter to the synthesised view may comprise combining the dimensioned blur filter with the dimensioned motion blur filter and applying the combined blur filter to the synthesised view.
Brief description of the drawings
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the following drawings in which: Figure 1 is a representation of multiview 3D display;
Figures 2a and 2b are graphs illustrating the result of view synthesis with blurred texture and depth map (Figure 2a) and with blurred texture only (Figure 2b); Figure 3 is a flow chart illustrating process steps in a method for encoding three- dimensional video content;
Figure 4 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 3.
Figure 5 is a block diagram illustrating functional units in an encoder;
Figure 6 is a block diagram illustrating another embodiment of encoder; Figure 7 is a flow chart illustrating process steps in a method for decoding three- dimensional video content; Figure 8 is a block diagram illustrating functional units in a decoder;
Figure 9 is a flow chart illustrating additional steps that may be conducted as part of the method of Figure 7;
Figure 10 is a graph illustrating motion blur path estimation;
Figure 11 is a graph illustrating the result of view synthesis with all-in-focus texture and depth maps.
Figure 12 is a block diagram illustrating another embodiment of decoder; and Figure 13 is a block diagram illustrating another embodiment of decoder.
Detailed Description
Aspects of the present invention address the issues of synthesising views of three-dimensional video content containing blur by transmitting to the decoder information allowing for the dimensioning of a blur filter, which may be applied as a post processing step following view synthesis. The transmitted parameters allow for the creation of a filter that generates a desired amount of blur in desired regions of the image. For original video content containing blur, the transmitted parameters may allow for the accurate recreation of the original blur in synthesised views. For original video content that is all in focus or has been deblurred, the transmitted parameters may be selected by a creator of the content to ensure that synthesised views are blurred according to the creator's intentions. Aspects of the present invention may thus allow a creator or supplier of three-dimensional video content to maintain control over how the content is displayed.
Figure 3 illustrates steps in a method 100 for encoding three-dimensional video content according to an embodiment of the present invention. In a first step 110, the method comprises encoding at least one view of the video content and at least part of a depth representation associated with the view. The method then comprises, at step 120, defining a camera focal length for the encoded view. The method further comprises, at step 130, selecting at least one reference parameter for a focus plane of the encoded view, and at step 140 selecting at least one reference parameter for a camera f-Number for the encoded view. Finally, at step 170, the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
Step 110 of the method comprises encoding one or more views of the three-dimensional video content. The at least part of a depth representation associated with the view, and which is encoded with the view, may for example be a depth map for the view or may take other forms. At least part of the depth representation is encoded with the view to enable view synthesis to be conducted at a decoder.
Step 120 of the method may comprise extracting the camera focal length of a capturing camera for the encoded view, or may comprise defining a focal length for the encoded view. For example, in the case of digitally generated three-dimensional video content, there is no physical capturing camera, and the focal length of a virtual capturing camera may be defined according to the requirements of the creator of the content. Step 130 of the method comprises selecting at least one reference parameter for a focus plane of the encoded view. The reference parameter may take a range of different forms according to different embodiments of the invention. In one embodiment the reference parameter for a focus plane of the encoded view may comprise the location of the focus plane s. The units of s may be the same as the units used for the external camera parameters: baseline, Znear and Zfar. However in some circumstances it may be more practical to send the value of s in standard length units of metres or feet.
In another embodiment, the reference parameter for a focus plane of the encoded view may comprise a LUT (Look-up table) index for the focus plane in the encoded view. According to this embodiment, the value for the focus plane s may then be determined by performing the LUT operation: s = DepthLUT[received index value]. As discussed above, in some examples of the invention, more than one view may be encoded for transmission. In such cases, a view identifier may also be transmitted as part of the reference parameter, indicating which view the LUT index applies to. In another embodiment, the reference parameter for a focus plane of the encoded view may comprise a distance between a recording surface and an optical system of a camera for the encoded view. In the case of a charge coupled device, this distance is known as the CCD distance g. A corresponding distance may be defined for other camera types. The focus plane s may be calculated from the CCD distance g using the equation:
1/f = 1/g + 1/s (Equation 3) Similar equations may be developed for equivalent distances in other camera types. The CCD distance g or equivalent may be sent in the same units as are used for the Znear and Zfar depth values. Alternatively, the CCD distance g or equivalent may be sent in units of millimetres or inches. It is useful to ensure that the parameter that is sent for the distance is measured in the same units as the actual distance.
In another embodiment, the reference parameter for a focus plane of the encoded view may comprise the depths DN and DF of the nearest and farthest objects in a scene that appear acceptably sharp. Acceptably sharp may be defined with reference to the capturing or display conditions. The focus plane s may be approximated from DN and DF using the equations: s = DN * H / (H - DN) (Equation 4) or s = DF * H / (H + DF) (Equation 5)
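For illustration, the focus plane may be recovered from these reference parameters with small helpers such as the following (Python), implementing Equations 3, 4 and 5 directly; H is the hyperfocal distance discussed later in this description:

def focus_plane_from_g(f, g):
    # Equation 3: 1/f = 1/g + 1/s  =>  s = 1 / (1/f - 1/g)
    return 1.0 / (1.0 / f - 1.0 / g)

def focus_plane_from_dn(dn, h):
    # Equation 4: s = DN * H / (H - DN)
    return dn * h / (h - dn)

def focus_plane_from_df(df, h):
    # Equation 5: s = DF * H / (H + DF)
    return df * h / (h + df)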
In another embodiment, the reference parameter for a focus plane of the encoded view may comprise LUT (Look-up table) indexes for DN and DF in the encoded view. In the case of multiple encoded views, a view identifier may also be included.
Step 140 of the method comprises selecting at least one reference parameter for a camera f-Number for the encoded view. In one embodiment, the reference parameter for the camera f-Number for the encoded view may be the camera f-Number. In another embodiment, the reference parameter for the camera f-Number may be a camera aperture diameter d. The camera f-Number N may then be calculated using the following equation:
N = f / d (Equation 6) where f is the camera focal length. The unit of measurement of the camera aperture diameter can be the same as the units of the focus plane. However it may be more practical to send the aperture diameter in units of millimeters or inches. It is useful to ensure that the aperture diameter as selected and transmitted is measured in the same units as the actual aperture diameter.
Step 170 of the method comprises transmitting the encoded at least one view and at least part of an associated depth representation, focal length and selected parameters to a node, which may for example be a decoder.
Transmission of the selected reference parameters may be conducted in a number of different formats according to different embodiments of the invention. In one embodiment, the selected reference parameters are transmitted in a dedicated SEI message. In another embodiment, the selected reference parameters are transmitted as part of the standard multiview_acquisition_info SEI message. In other embodiments, the selected parameters may be sent as part of other SEI messages or any other message signalling to a decoder. The parameters may be sent in floating point representation, in unsigned integer representation or in another appropriate format.
It will be appreciated that various combinations of the above discussed embodiments may be contemplated. The following examples illustrate different transmission methods for different combinations of reference parameters.
Example 1
Focus plane s and camera f-Number N are transmitted in a dedicated SEI message in floating point representation (in the same format as is used in sending camera parameters in the multiview_acquisition_info message in MVC):

focus_info( payloadSize ) {                 C  Descriptor
    prec_focus_plane                        5  ue(v)
    prec_aperture_f_number                  5  ue(v)
    exponent_focus_plane                    5  u(6)
    mantissa_focus_plane                    5  u(v)
    exponent_aperture_f_number              5  u(6)
    mantissa_aperture_f_number              5  u(v)
}
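For illustration only, the payload of Example 1 may be serialised with a simple bit writer such as the one sketched below (Python), using unsigned Exp-Golomb coding for the ue(v) fields and fixed-length writes for the u(n) fields. The decomposition of a floating point value into precision, exponent and mantissa fields is not reproduced here; the function takes those already-derived integers as inputs, and the mantissa lengths are whatever u(v) lengths the precision fields imply:

class BitWriter:
    def __init__(self):
        self.bits = []

    def u(self, value, n):
        # u(n): write value as an n-bit unsigned integer, most significant bit first.
        self.bits += [(value >> i) & 1 for i in range(n - 1, -1, -1)]

    def ue(self, value):
        # ue(v): unsigned Exp-Golomb code, as used for the prec_* fields.
        code = value + 1
        n = code.bit_length()
        self.bits += [0] * (n - 1)
        self.u(code, n)

def write_focus_info(bw, prec_fp, prec_fn, exp_fp, man_fp, man_fp_bits, exp_fn, man_fn, man_fn_bits):
    bw.ue(prec_fp)               # prec_focus_plane
    bw.ue(prec_fn)               # prec_aperture_f_number
    bw.u(exp_fp, 6)              # exponent_focus_plane
    bw.u(man_fp, man_fp_bits)    # mantissa_focus_plane, u(v)
    bw.u(exp_fn, 6)              # exponent_aperture_f_number
    bw.u(man_fn, man_fn_bits)    # mantissa_aperture_f_number, u(v)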
Example 2
Camera f-Number is transmitted using floating point representation (in the same format that is used in sending camera parameters in the multiview_acquisition_info message in MVC). The reference parameter for the focus plane is the LUT index focus_plane_depth corresponding to the focus plane of an encoded view. Multiple views are encoded for transmission so the reference parameter for focus plane also comprises a view identifier focus_plane_view_id to indicate which view the LUT index applies to:
The focus plane may then be found at the decoder by the operation s = DepthLUT[focus_plane_depth], where DepthLUT is computed for the view identified by focus_plane_view_id.
In an alternative version of Example 2, only a single view may be sent, in which case there is no advantage to sending a view identifier as the index may be assumed to apply to the single view that is transmitted. In this case, the look up table index may be added directly into the multiview_acquisition_info message, in the loop "for( i = 0; i <= num_views_minus1 ; i++) { focus_plane_depth [i]}".
Example 3
Focus plane s is sent in floating point representation and the camera aperture diameter d is sent in floating point representation:
The aperture f-Number N may then be calculated at the decoder using equation 6 above.
Example 4
Focus plane s is sent in floating point representation and the camera aperture diameter is sent in unsigned integer representation:
Example 5
Focus plane s and camera aperture f-Number N are sent in floating point representation in the multiview_acquisition_info message:

multiview_acquisition_info( payloadSize ) {            C  Descriptor
    num_views_minus1                                      ue(v)
    intrinsic_param_flag                               5  u(1)
    extrinsic_param_flag                               5  u(1)
    aperture_f_number_flag                             5  u(1)
    focus_plane_flag                                   5  u(1)
    if( intrinsic_param_flag ) {
        intrinsic_params_equal                         5  u(1)
        prec_focal_length                              5  ue(v)
        prec_principal_point                           5  ue(v)
        prec_skew_factor                               5  ue(v)
        if( intrinsic_params_equal )
            num_of_param_sets = 1
        else
            num_of_param_sets = num_views_minus1 + 1
        for( i = 0; i < num_of_param_sets; i++ ) {
            sign_focal_length_x[ i ]                   5  u(1)
            exponent_focal_length_x[ i ]               5  u(6)
            mantissa_focal_length_x[ i ]               5  u(v)
            sign_focal_length_y[ i ]                   5  u(1)
            exponent_focal_length_y[ i ]               5  u(6)
            mantissa_focal_length_y[ i ]               5  u(v)
            sign_principal_point_x[ i ]                5  u(1)
            exponent_principal_point_x[ i ]            5  u(6)
            mantissa_principal_point_x[ i ]            5  u(v)
            sign_principal_point_y[ i ]                5  u(1)
            exponent_principal_point_y[ i ]            5  u(6)
            mantissa_principal_point_y[ i ]            5  u(v)
            sign_skew_factor[ i ]                      5  u(1)
            exponent_skew_factor[ i ]                  5  u(6)
            mantissa_skew_factor[ i ]                  5  u(v)
        }
    }
    if( extrinsic_param_flag ) {
        prec_rotation_param                            5  ue(v)
        prec_translation_param                         5  ue(v)
        for( i = 0; i <= num_views_minus1; i++ ) {
            for( j = 1; j <= 3; j++ ) {  /* row */
                for( k = 1; k <= 3; k++ ) {  /* column */
                    sign_r[ i ][ j ][ k ]              5  u(1)
                    exponent_r[ i ][ j ][ k ]          5  u(6)
                    mantissa_r[ i ][ j ][ k ]          5  u(v)
                }
                sign_t[ i ][ j ]                       5  u(1)
                exponent_t[ i ][ j ]                   5  u(6)
                mantissa_t[ i ][ j ]                   5  u(v)
            }
        }
    }
    if( aperture_f_number_flag ) {
        prec_aperture_f_number                         5  ue(v)
        exponent_aperture_f_number                     5  u(6)
        mantissa_aperture_f_number                     5  u(v)
    }
    if( focus_plane_flag ) {
        prec_focus_plane                               5  ue(v)
        exponent_focus_plane                           5  u(6)
        mantissa_focus_plane                           5  u(v)
    }
}
Example 6
The camera f-Number reference parameter is sent using any of the above discussed methods and DN and DF are sent using floating point representation:

focus_plane_info( payloadSize ) {                      C  Descriptor
    prec_focus_plane                                   5  ue(v)
    prec_aperture_f_number                             5  ue(v)
    exponent_focus_plane                               5  u(6)
    mantissa_focus_plane                               5  u(v)
    exponent_aperture_f_number                         5  u(6)
    mantissa_aperture_f_number                         5  u(v)
    exponent_nearest_plane_focused                     5  u(6)
    mantissa_nearest_plane_focused                     5  u(v)
    exponent_farthest_plane_focused                    5  u(6)
    mantissa_farthest_plane_focused                    5  u(v)
}
Or using unsigned integer representation:
Example 7
CCD distance g is sent in floating point representation and camera f-Number is sent in floating point representation:
It will be appreciated that the above discussed examples are merely illustrative of possible combinations of reference parameters and transmission methods. Other combinations may be envisaged. In further examples, more than one pair of reference parameters for focus plane and camera f-Number may be selected and transmitted. A single reference parameter for each of focus plane and f-Number corresponds to a particular Depth of Focus in a synthesised view. In some examples, it may be desirable to send parameters corresponding to a range of different Depths of Field in a single SEI message. A decision may then be made on the decoding side as to which Depth of Field, and hence which pair of reference parameters, is most appropriate for the display and viewing conditions:
The selected reference parameters may correspond to actual values for the focus plane and camera f-Number in the encoded view, for example in the case of captured three-dimensional video content. In such cases the focus plane may be provided, for example by an autofocus system or by a user. Alternatively, the focus plane may be estimated using a software-based autofocus technique (which may involve estimating the amount of blur and taking the depth of the sharpest areas as the focus plane). In such cases the aperture f-number may also be provided by the lens, by the camera or by the user. It is generally more complicated to estimate the camera f-Number using only the image. In other examples, the reference parameters may be selected for the view in order to create a desired blur effect. This may be the case for example in fully digitally generated content but may also be the case for captured video content. For example, when video content is captured all in focus but it is desired for artistic or other reasons to introduce blur, parameters may be selected to ensure generation of blur to the extent and in the regions of the image required. Figure 4 illustrates possible additional steps in the method 100, which may be conducted during the encoding of three-dimensional video content. With reference to Figure 4, the method 100 may also comprise a step 150 of selecting a shutter speed for the video content, and step 160 of selecting a shutter shape for video content. The selected shutter speed and a parameter representing the selected shutter shape may then be transmitted with the selected reference parameters, focal length and encoded view and depth representation in a modified step 170B.
Camera shutter speed (or exposure time) may be used at the decoder side to create camera motion blur. Detail of this procedure is discussed below with reference to Figures 9 and 10. The selected shutter speed may be the actual shutter speed of a capturing camera for the video content, or may be a selected shutter speed for example of a virtual camera applied to digitally generated content. The motion blur created using the transmitted shutter speed may thus recreate actual motion blur or introduce a desired motion blur. In one example, shutter speed may be sent using floating point representation in an SEI message:
The unit of the shutter speed may be seconds, milliseconds or other time units.
In another example, the inverse of the shutter speed may be sent using unsigned integer representation:
The unit of this inverse shutter speed may be 1/seconds.
Camera shutter shape (or Bokeh shape) may be used at the decoding side in the dimensioning of a blur filter to accurately represent blur from a capturing or virtual camera. The shape corresponds to the actual shape of a camera shutter and may be a perfect disk, triangle, square, hexagon, octagon, etc.
In one example, shutter shape may be included with focus plane s and camera f-Number N as follows:
The parameter shutter_shape_type has no unit but may correspond to a predetermined set of values, for example:
0 for SHUTTER_SHAPE_DISK,
1 for SHUTTER_SHAPE_TRIANGLE
2 for SHUTTER_SHAPE_SQUARE
3 for SHUTTER_SHAPE_HEXAGON etc.
The value of shutter_shape_type defines the shape (outline) of a blur filter to be used in the blurring process at the decoder side, as discussed below. In one example, three bits may be used to signal shutter shape, supporting up to 8 different shutter shapes. Other examples using fewer or more bits may be envisaged.
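As an illustrative sketch, a decoder might turn shutter_shape_type and a blur diameter b (in pixels) into a normalised blur kernel as below (Python). Only the mapping of type values to shapes follows the list above; the disk and regular-polygon rasterisation is an assumed realisation:

import numpy as np

SHAPE_SIDES = {1: 3, 2: 4, 3: 6}   # SHUTTER_SHAPE_TRIANGLE, _SQUARE, _HEXAGON

def shutter_kernel(shape_type, b):
    r = max(b / 2.0, 0.5)
    size = int(np.ceil(2 * r)) | 1                    # odd kernel size
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    if shape_type == 0:                               # SHUTTER_SHAPE_DISK
        mask = (x ** 2 + y ** 2) <= r ** 2
    else:                                             # regular polygon outline
        sides = SHAPE_SIDES.get(shape_type, 6)
        theta = np.arctan2(y, x)
        # Distance from the kernel centre to the polygon edge in direction theta.
        edge = r * np.cos(np.pi / sides) / np.cos((theta % (2 * np.pi / sides)) - np.pi / sides)
        mask = np.hypot(x, y) <= edge
    kernel = mask.astype(np.float32)
    return kernel / kernel.sum()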
Figure 5 illustrates functional units of an encoder 200 in accordance with an embodiment of the invention. The encoder 200 may execute the steps of the method 100, for example according to computer readable instructions received from a computer program.
With reference to Figure 5, the encoder 200 comprises an encoding unit 210, a selection unit 230 and a transmission unit 270. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
The encoding unit 210 is configured to encode at least one view of the video content and at least part of a depth representation associated with the view, for example a depth map. The selection unit 230 is configured to define a camera focal length for the encoded view, select at least one reference parameter for a focus plane of the encoded view and select at least one reference parameter for a camera f-Number for the encoded view. The transmission unit 270 is configured to transmit the encoded at least one view and at least part of an associated depth representation, focal length and selected reference parameters to a node. In some embodiments, the selection unit 230 may also be configured to select a shutter shape and shutter speed for the encoded view, and the transmission unit 270 may be configured to transmit the shutter speed and a parameter representing the shutter shape with the encoded view and other elements.
Figure 6 illustrates another embodiment of encoder 300. The encoder 300 comprises a processor 380 and a memory 390. The memory 390 contains instructions executable by the processor 380 such that the encoder 300 is operative to conduct the steps of the methods of Figures 3 and 4 described above.
Figure 7 illustrates steps in a method 400 for decoding three-dimensional video content in accordance with an embodiment of the present invention. In a first step 410 the method comprises receiving at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The method then comprises, at step 430, synthesising at least one view of the video content. At step 440, the method comprises dimensioning a blur filter according to the received focal length and reference parameters and finally at step 450, the method comprises applying the dimensioned blur filter to the synthesised view.
The encoded view and depth representation, focal length and reference parameters may be received at step 410 from an encoder operating according to the methods of Figures 3 and 4. Synthesising at least one view of the video content at step 430 may comprise at least partially decoding the received view or views and associated depth representation, and then running a view synthesis process, such as DIBR to synthesise the required view or views.
Step 440 of the method then comprises dimensioning a blur filter according to the received reference parameters, and step 450 comprises applying the dimensioned blur filter to the synthesised view. As discussed above, the Depth of Field of an image in a view corresponds to the distance between the nearest DN and farthest DF objects in a scene that appear acceptably sharp. Outside the Depth of Field, an image is blurred to a greater or lesser extent. Dimensioning a blur filter using received reference parameters, and applying the dimensioned blur filter to a synthesized view, allows the creation of blur in parts of the synthesized view to create a Depth of Field in the synthesized view that corresponds to the received parameters. The received parameters may correspond to capture conditions, resulting in a Depth of Field in the synthesized view that matches the Depth of Field in the original captured views. Alternatively, the received parameters may have been selected to ensure the creation of a particular Depth of Field desired by a creator of the original content. The process of dimensioning a blur filter according to step 440 of the method, and applying the blur filter according to step 450, is discussed below.
The Depth of Field is given by the difference between the far and near limits of acceptable sharpness, DDoF = DF - DN, and can be approximately calculated with:

DDoF ≈ 2Hs^2 / (H^2 - s^2), for s < H (Equation 7)

Where s is the focus plane and H is the hyperfocal distance of the lens. The hyperfocal distance is defined as:

H = f^2 / (N * c) (Equation 8)

Where f is the focal length, N the aperture f-Number and c the circle of confusion; c is typically considered to be equal to the pixel size of the optical sensor. As discussed above, the camera f-Number N is defined as:

N = f / d (Equation 9)
Where f is the focal length and d is the aperture diameter. For a certain image depth, the amount of blur in an image is characterized by a blur diameter b, such that a pixel in the image appears as a blur disc of diameter b.
If a subject is at distance s and the foreground or background is at distance D, the distance xd between the subject and the foreground or background is given by:

xd = |D - s| (Equation 10)
The blur disc diameter b of a detail at distance xd from the subject can be expressed as a function of the subject magnification ms according to:

b = (f * ms / N) * xd / (s ± xd) = d * ms * xd / (s ± xd) (Equation 11)

The minus sign applies to a foreground object, the plus sign applies to a background object, and ms is defined as:

ms = f / (s - f) (Equation 12)
When a depth map Z associated with an all-in-focus image I is provided, all depth values (D) are known. A blur filter F(Z) may then be applied to each pixel of the image I by the following convolution:
J = I * F(D) (Equation 13)
The diameter b of the filter is calculated for each pixel according to the pixel's depth within the image. The resulting output image J has a Depth of Field according to the parameters used to dimension the blur filter F(Z). The blur filter may be a Gaussian filter or may have a specific shape (such as a disk, triangle, square, hexagon etc.) in order to mimic a specific aperture shape. It can be seen from the above discussion that a blur filter can be dimensioned using Equation 11 from the focal length f, focus plane s and camera f-Number N. The focal length f and reference parameters for the focus plane s and f-Number N are received in step 410 of the method of Figure 7, and used in step 440 to dimension the blur filter which is then applied to the synthesised view in step 450. The resulting Depth of Field in the synthesised view corresponds to the received parameters, ensuring that the synthesised view appears as desired by the creator of the original content. The Depth of Field in the synthesised view may match that of the original content, or may be imposed on the content. If multiple pairs of parameters are received, the most appropriate pair for the display and viewing conditions may be selected, ensuring that the synthesised view is best suited to the display and viewing conditions.
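To make the use of Equations 10 to 13 concrete, the following sketch dimensions a blur disc diameter per pixel from the focal length f, focus plane s, f-Number N and depth map, and applies a depth-dependent Gaussian blur by quantising the blur diameters into a small number of layers. The layer quantisation, the diameter-to-sigma mapping and the pixel_pitch parameter are illustrative assumptions of this sketch rather than requirements of the method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_diameter_px(depth, f, N, s, pixel_pitch):
    """Equations 10-12: blur disc diameter, in pixels, for every depth value.
    depth, f, s and pixel_pitch are in metres; N is the camera f-Number."""
    ms = f / (s - f)                                   # subject magnification (Equation 12)
    xd = np.abs(depth - s)                             # distance from the focus plane (Equation 10)
    denom = s + np.sign(depth - s) * xd                # s + xd (background) or s - xd (foreground)
    b = (f * ms / N) * xd / np.maximum(denom, 1e-6)    # blur disc diameter on the sensor (Equation 11)
    return b / pixel_pitch                             # metres on the sensor -> pixels

def apply_depth_blur(image, depth, f, N, s, pixel_pitch, n_layers=8):
    """Equation 13 in spirit: approximate the per-pixel blur by blending a few
    uniformly blurred layers, each pixel taking the layer matching its blur
    diameter. 'image' is single channel; colour images go per channel."""
    sigma_px = blur_diameter_px(depth, f, N, s, pixel_pitch) / 2.0   # crude diameter -> sigma mapping
    levels = np.linspace(0.0, max(sigma_px.max(), 1e-3), n_layers)
    layer_idx = np.clip(np.digitize(sigma_px, levels) - 1, 0, n_layers - 1)
    out = np.zeros_like(image, dtype=float)
    for i, sigma in enumerate(levels):
        blurred = gaussian_filter(image.astype(float), sigma=sigma)
        out[layer_idx == i] = blurred[layer_idx == i]
    return out
```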
Figure 8 illustrates functional units of a decoder 600 in accordance with an embodiment of the invention. The decoder 600 may execute the steps of the method 400, for example according to computer readable instructions received from a computer program.
With reference to Figure 8, the decoder 600 comprises a receiving unit 610, a synthesis unit 630 and a filter unit 660. The filter unit comprises a dimensioning sub unit 640 and an application sub unit 650. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
The receiving unit 610 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The synthesis unit 630 is configured to synthesise at least one view of the video content. The dimensioning sub unit 640 of the filter unit 660 is configured to dimension a blur filter according to the received focal length and reference parameters. The application sub unit 650 of the filter unit 660 is configured to apply the dimensioned blur filter to the synthesised view.
Figure 9 illustrates additional steps that may be conducted as part of the method 400 for decoding three-dimensional video content. The method may for example be conducted in a decoder such as the decoders 700, 800 illustrated in Figures 12 and 13 and discussed in further detail below. With reference to Figure 9, in a first step 510, the decoder receives at least one encoded view of the video content and associated depth representation, as well as camera focal length and reference parameters for focus plane and camera f-Number for the encoded view. The decoder may also receive a camera shutter speed and a parameter representative of shutter shape.
In step 512, the decoder checks whether multiple pairs of reference parameters for focus plane and f-Number have been received. Multiple pairs may be sent by an encoder if different Depth of Field options are available. If multiple pairs of reference parameters have been received (Yes at step 512), the decoder then proceeds to select the most appropriate pair for the display and viewing conditions at step 514. This selection step may be automated or may be made with user input.
In some examples (not illustrated), the decoder may adjust a received or selected reference parameter according to viewing conditions or to known user requirements.
Having selected the appropriate pair of reference parameters, or if only one parameter for each of focus plane and f-Number has been received, the decoder proceeds to check, in step 516, whether the reference parameters received are the focus plane location and camera f-Number. If the reference parameters are the focus plane location and camera f-Number, then the decoder proceeds directly to step 520. If this is not the case (No at step 516), the decoder proceeds to calculate either or both of the focus plane location and/or camera f-Number from the received parameters at step 518. This may involve performing a LUT operation or a calculation, as discussed in further detail above. With the focus plane location and camera f-Number available, the decoder then proceeds to check, at step 520, whether blur is present in the image texture in the received encoded view.
As discussed above, the methods described herein may be used to recreate blur that was present in originally captured video content, or to impose blur onto all-in-focus video content, which content may have been physically captured or digitally generated. In the case of original video content containing blur, it can be advantageous to sharpen the received view before synthesising views. If the received view contains blur (Yes at step 520), the decoder therefore proceeds to step 522, in which the received reference parameters are used to sharpen the received view. In one example, this sharpening process comprises using the received reference parameters to calculate the blur diameters for pixels in the view. From these diameters, an approximation of the point spread function for each pixel can be generated, allowing the application of a deblurring filter to the received view. The deblurring filter may for example be a Wiener deconvolution. In some examples, motion blur parameters may also be calculated in order to sharpen the image for motion blur. Motion blur is discussed in further detail below.
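A minimal sketch of this kind of sharpening step is given below, assuming a single estimated point spread function per image (a disc of the computed blur diameter) and a classical frequency-domain Wiener deconvolution implemented directly with NumPy FFTs; a real implementation would handle a spatially varying PSF, which is omitted here for brevity.

```python
import numpy as np

def disc_psf(diameter_px):
    """Disc-shaped point spread function approximating a circular aperture
    of blur diameter b (in pixels)."""
    r = max(int(np.ceil(diameter_px / 2.0)), 1)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return (x ** 2 + y ** 2 <= r ** 2).astype(float)

def wiener_deconvolve(blurred, psf, noise_to_signal=1e-2):
    """Classical Wiener deconvolution of a single-channel image given an
    estimated PSF: F_hat = conj(H) / (|H|^2 + NSR) * G in the frequency domain."""
    h, w = blurred.shape
    ph, pw = psf.shape
    psf_pad = np.zeros((h, w))
    psf_pad[:ph, :pw] = psf / psf.sum()
    psf_pad = np.roll(psf_pad, (-(ph // 2), -(pw // 2)), axis=(0, 1))   # centre the PSF at the origin
    H = np.fft.fft2(psf_pad)
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal) * G
    return np.real(np.fft.ifft2(F_hat))
```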
Having sharpened the image texture, or if the received texture is fully in focus, the decoder then proceeds to check, in step 524, whether blur is present in the depth representation received, for example in a received depth map. If blur is present in the received depth map or other depth representation (Yes at step 524), the decoder sharpens the depth map at step 526. This may involve applying a plurality of median filters to the original depth map in order to remove smoothed edges from the depth map. Other methods for sharpening the depth map may be considered.
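A possible realisation of this depth map sharpening, assuming SciPy's median filter and illustrative window sizes, is sketched below.

```python
import numpy as np
from scipy.ndimage import median_filter

def sharpen_depth_map(depth_map, sizes=(3, 5, 7), passes=2):
    """Apply a plurality of median filters to a depth map in order to remove
    smoothed edges; the window sizes and number of passes are illustrative."""
    sharpened = np.asarray(depth_map, dtype=float)
    for _ in range(passes):
        for size in sizes:
            sharpened = median_filter(sharpened, size=size)
    return sharpened
```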
Once a sharp depth map and image texture are available, the decoder proceeds to synthesise at least one view of the video content at step 530. This may for example comprise running a DIBR process. Once the at least one view has been synthesised, the decoder proceeds to step 540 in which a blur filter is dimensioned according to the received focal length and reference parameters. This dimensioning process is described in detail above with respect to Figure 7. If a parameter representing shutter shape has been received then this may also be used in the dimensioning of the blur filter. The decoder then proceeds, in steps 542, 544 and 546, to address motion blur. Motion blur is described in greater detail below but, in brief, the decoder assesses whether camera parameters for a frame t and frame t-1 are available in step 542. If such parameters are not available (No at step 542), the decoder proceeds to estimate a motion blur direction from a motion model at step 544. Using either the estimation or the available camera parameters, the decoder then dimensions a motion blur filter at step 546, calculating motion blur direction and length. In step 548 the decoder combines the motion blur filter and dimensioned blur filter from steps 546 and 540 before, in step 550, applying the combined blur filter to the synthesised view.

Referring again to steps 542 to 546 of Figure 9, motion blur can occur in captured video content, for example when a camera is moving or zooming rapidly. Transmitting the camera shutter speed as discussed with reference to Figure 4 allows the recreation of camera motion blur in synthesised views, or the imposition of camera motion blur where it is desired to introduce this blur onto images where it is not already present. When a camera shutter speed is received at the decoder, indicating that motion blur may be applied, a first step is to determine the direction of the camera motion blur. By taking the depth map of frame t and projecting it onto the camera frames t and t-1, thus using different camera parameters, it is possible to determine the path that each pixel is taking. As illustrated in Figure 10, using Equation 2 (with Zs = d and q^t and q^(t-1) expressed in homogeneous coordinates with the last coordinate being 1):

q^(t-1) = P^(t-1) * Q^(t-1) / Zs^(t-1) and q^t = P^t * Q^t / Zs^t
For a static scene, or slowly moving content, the 3D point Q^(t-1) may be approximated to be equal to Q^t (and Zs^(t-1) = Zs^t). This approximation allows an approximation of the path v of the current pixel q^t generated by the camera motion over time, v being equal to:
v = q^(t-1) - q^t

The camera projection matrices may be computed from the camera parameters such as translation_param and orientation_param given in the multiview_acquisition_info SEI message illustrated in Table 1. If the camera projection matrices P^(t-1) and P^t are found to be identical, then no motion blur is present. If the camera parameters are not available for frame t and/or frame t-1, a motion model may be used in order to predict the missing camera parameters. One example involves using constant speed models, for instance:
T^t = T^(t-1) + V^(t-1) / fps,

R^t = R^(t-1) * AngleAxisToRotationMatrix(W^(t-1) / fps),

f^t = f^(t-1) + VF^(t-1) / fps,
t-1 being the last known camera parameter data, and V, W and VF the estimated translation speed, rotation speed and focal length speed (zoom). V is the estimated translational velocity vector (m/s if T is in metres and 1/fps in seconds). W is the estimated angular velocity vector (expressed here as an angle axis where the norm corresponds to the angular speed in rad/s). VF is the estimated focal length speed (px/s if f is in pixels and 1/fps in seconds). Other motion models may be envisaged.
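The constant speed model above may be realised, for example, as in the following sketch, where AngleAxisToRotationMatrix is implemented with Rodrigues' formula; the helper names and the NumPy implementation are illustrative.

```python
import numpy as np

def angle_axis_to_rotation_matrix(w):
    """Rodrigues' formula: angle-axis vector w (norm = angle in radians) -> 3x3 rotation."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def predict_camera(T_prev, R_prev, f_prev, V, W, VF, fps):
    """Constant speed motion model: predict translation T, rotation R and
    focal length f for frame t from frame t-1 and the estimated velocities."""
    T = T_prev + V / fps
    R = R_prev @ angle_axis_to_rotation_matrix(W / fps)
    f = f_prev + VF / fps          # focal length (zoom) speed assumed constant
    return T, R, f
```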
With the motion blur direction established, the amount of motion blur may be calculated. Having received the shutter speed ss (which, as noted, may be an actual shutter speed for the captured content or may be a selected shutter speed), and by retrieving the video framerate fps and the depth map for the received video view, it is possible to calculate the Point Spread Function (PSF) approximation for each pixel and synthesize a motion blur due to the camera motion. The direction and length of the motion blur (per pixel) Dmotion to apply can be defined using the following formula:
Dmotion = ss * fps * v (Equation 14)
If the shutter speed equals 1/fps, then the pixel will be spread over the whole path v; otherwise it is spread over just a part of it, as illustrated in Figure 10.
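The computation of the per-pixel path v and of Dmotion (Equation 14) may be sketched as follows, assuming 3x4 projection matrices P^(t-1) and P^t derived from the signalled camera parameters and a 3D point Q obtained by back-projecting the pixel with its depth value; the helper names are illustrative.

```python
import numpy as np

def project(P, Q):
    """Project a 3D point Q (length-3) with a 3x4 projection matrix P,
    returning 2D pixel coordinates after the homogeneous division."""
    q = P @ np.append(Q, 1.0)
    return q[:2] / q[2]

def motion_blur_vector(P_prev, P_curr, Q, shutter_speed, fps):
    """Per-pixel motion blur direction and length (Equation 14):
    v = q_(t-1) - q_t and Dmotion = ss * fps * v."""
    v = project(P_prev, Q) - project(P_curr, Q)   # pixel path over one frame interval
    return shutter_speed * fps * v                # only the exposed fraction of the path
```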
With the direction and length of the motion blur established, the dimensioned motion blur filter may be applied, for example according to the following equation:
J[m, n] = (1/N) * Σ_i I[x(i), y(i)], i = 0, ..., N-1 (Equation 15)

Where J is the blurred image, I the original image, N the number of local pixels used (typically the norm of Dmotion), and x(i) and y(i) are respectively the x and y components of the discrete segment Dmotion starting from q^t:

[x(i), y(i)] = q^t + (i/(N-1)) * Dmotion

L(t) is the intensity model of an image pixel and can be modelled as a function of the effective aperture diameter, the maximum intensity and two constants (Equation 16).
Where d is the effective aperture diameter, Lmax is the maximum intensity (typically 255 for each image channel), and k and p are two constants. The above description provides one example of the application of motion blur; other equations may be used. For instance, a 2D kernel K of size NxM may be constructed from Dmotion with the following formula:
K[k, l] = 1 if there exists an i such that [x(i), y(i)] - q^t + [N/2, M/2] = [k, l], and K[k, l] = 0 otherwise.
The image I may then be blurred with the following equation:
J = (I * K) / S

With S being the sum of all the kernel weights:

S = Σ_k Σ_l K[k, l], k = 0, ..., N-1, l = 0, ..., M-1
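The kernel construction and the normalised convolution above may be sketched as follows; the window sizing is a slight variant chosen so that the segment always fits, and the single-channel assumption is for brevity.

```python
import numpy as np
from scipy.signal import convolve2d

def motion_blur_kernel(d_motion):
    """2D kernel with ones along the discrete segment Dmotion, centred in the
    window, following the construction of K described above."""
    dx, dy = float(d_motion[0]), float(d_motion[1])
    N = 2 * int(np.ceil(abs(dx))) + 1                 # window height
    M = 2 * int(np.ceil(abs(dy))) + 1                 # window width
    K = np.zeros((N, M))
    n_steps = max(int(np.ceil(np.hypot(dx, dy))), 1)  # samples along the segment
    for i in range(n_steps + 1):
        t = i / n_steps
        K[int(round(t * dx)) + N // 2, int(round(t * dy)) + M // 2] = 1.0
    return K

def apply_motion_blur(image, K):
    """Normalised convolution J = (I * K) / S, S being the sum of the kernel weights."""
    S = K.sum()
    return convolve2d(image, K / S, mode='same', boundary='symm')
```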
It will be appreciated that the dimensioned blur and motion blur filters may be applied consecutively or may be combined before application. It may be that better final results are obtained using a combined motion blur and blur filter.
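Where both filters are expressed as 2D kernels, combining them before application amounts to convolving the two kernels, as in the sketch below; for the spatially varying case this would be done per pixel or per depth layer.

```python
import numpy as np
from scipy.signal import convolve2d

def combine_kernels(defocus_kernel, motion_kernel):
    """Combine a defocus blur kernel and a motion blur kernel into a single
    filter: convolving the kernels is equivalent to applying them in turn."""
    combined = convolve2d(defocus_kernel, motion_kernel, mode='full')
    return combined / combined.sum()     # keep the combined filter normalised
```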
Figure 11 illustrates synthesis results using the method of Figure 9. An all-in-focus texture and depth map are received or obtained through sharpening procedures before view synthesis and blurring are applied. In case A, where a disocclusion is present, an inpainter is used in order to fill the disoccluded area and then the blur filter is applied. In case B, no inpainter is required, and the blur filter is applied after DIBR. It can be seen from a comparison of Figures 2 and 11 that, using the method of Figure 9, it is possible to recreate in a synthesized image the exact blur characteristics that were observed in the captured image. The synthesized image is thus a correct match for the captured image, providing a smooth and realistic three-dimensional experience for a viewer.

Figure 12 illustrates functional units of another embodiment of decoder 700 in accordance with an embodiment of the invention. The decoder 700 may execute the steps of the method 500, for example according to computer readable instructions received from a computer program.
With reference to Figure 12, the decoder 700 comprises a receiving unit 710, an analysis and calculation unit 715, a sharpening unit 720, a synthesis unit 730 and a filter unit 760. The filter unit 760 comprises a blur dimensioning sub unit 740, a motion dimensioning sub unit 746, a combining sub unit 748 and an application sub unit 750. It will be understood that the units of the apparatus are functional units, and may be realised in any appropriate combination of hardware and/or software.
The receiving unit 710 is configured to receive at least one encoded view of the video content and at least part of a depth representation associated with the view, a camera focal length for the encoded view, at least one reference parameter for a focus plane of the encoded view, and at least one reference parameter for a camera f-Number for the encoded view. The receiving unit is also configured to receive a shutter speed and a parameter for shutter shape. The analysis and calculation unit 715 is configured to conduct steps 512 to 518 of the method of Figure 9, checking for multiple pairs of reference parameters and selecting an appropriate pair, and calculating the focus plane and f-Number from the respective reference parameters, if such calculation is necessary. The sharpening unit 720 is configured to check for blur in the received image texture and depth representation, and to sharpen the image texture and/or depth representation if blur is present. The synthesis unit 730 is configured to synthesise at least one view of the video content. The blur dimensioning sub unit 740 of the filter unit 760 is configured to dimension a blur filter according to the received focal length and reference parameters. The motion dimensioning sub unit 746 is configured to dimension a motion blur filter according to a received shutter speed and parameters extracted from the encoded view as discussed above. The combining sub unit 748 is configured to combine the dimensioned blur and motion blur filters. The application sub unit 750 is configured to apply the combined filter to the synthesised view or views.
Figure 13 illustrates another embodiment of decoder 800. The decoder 800 comprises a processor 880 and a memory 890. The memory 890 contains instructions executable by the processor 880 such that the decoder 800 is operative to conduct the steps of the method of Figure 9 described above.

The method of the present invention may be implemented in hardware, or as software modules running on one or more processors. The method may also be carried out according to the instructions of a computer program, and the present invention also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the invention may be stored on a computer-readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.

Claims

1. A method of encoding three-dimensional video content, the method comprising: encoding at least one view of the video content and at least part of a depth representation associated with the view;
defining a camera focal length for the encoded view;
selecting at least one reference parameter for a focus plane of the encoded view; selecting at least one reference parameter for a camera f-Number for the encoded view; and
transmitting the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number for the encoded view to a node.
2. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises a location of a focus plane for the encoded view.
3. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises a look-up table index corresponding to a focus plane of the encoded view.
4. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises a distance between a recording surface and an optical system of a camera for the encoded view.
5. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the encoded view comprises the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
6. A method as claimed in claim 1, wherein the reference parameter for a focus plane of the video content comprises look-up table indexes corresponding to the distances of the nearest and farthest objects in a scene of the encoded view that fulfil criteria for an acceptable focus level.
7. A method as claimed in any one of the preceding claims, wherein the reference parameter for a camera f-Number for the encoded view comprises a camera f-Number for the encoded view.
8. A method as claimed in any one of claims 1 to 6, wherein the reference parameter for a camera f-Number for the encoded view comprises a camera aperture diameter.
9. A method as claimed in any one of the preceding claims, wherein the selected reference parameters for focus plane and camera f-Number correspond to a first depth of focus, and wherein the method further comprises:
selecting at least one additional reference parameter for a focus plane of the encoded view;
selecting at least one additional reference parameter for a camera f-Number for the encoded view; and
transmitting the selected additional reference parameters with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
10. A method as claimed in any one of the preceding claims, wherein the video content comprises captured video content, and wherein the selected reference parameters for focus plane and camera f-Number correspond to an actual focus plane and camera f-Number of a capturing camera.
11. A method as claimed in any one of claims 1 to 9, wherein the selected reference parameters for focus plane and camera f-Number correspond to a selected focus plane and camera f-Number for one of a capturing camera or a virtual camera.
12. A method as claimed in any one of the preceding claims, further comprising: selecting a shutter speed for the video content; and
transmitting the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
13. A method as claimed in any one of the preceding claims, further comprising: selecting a shutter shape for the video content; and transmitting a parameter representing the shutter shape with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
14. A method as claimed in any one of the preceding claims, wherein transmitting the selected reference parameters to the node comprises including the selected reference parameters in a supplementary enhancement information, SEI, message.
15. A method for decoding three-dimensional video content, comprising:
receiving:
at least one encoded view of the video content and at least part of a depth representation associated with the view,
a camera focal length for the encoded view;
at least one reference parameter for a focus plane of the encoded view; and
at least one reference parameter for a camera f-Number for the encoded view;
synthesising at least one view of the video content;
dimensioning a blur filter according to the received focal length and reference parameters; and
applying the dimensioned blur filter to the synthesised view.
16. A method as claimed in claim 15, wherein dimensioning a blur filter according to the received focal length and reference parameters comprises at least one of:
calculating a focus plane from the received reference parameter for a focus plane; and
calculating a camera f-Number from the received reference parameter for a camera f-Number.
17. A method as claimed in claim 15 or 16, wherein synthesising at least one view of the video content comprises applying a Depth Image Based Rendering, DIBR, process to the encoded view of the video content.
18. A method as claimed in any one of claims 15 to 17, further comprising:
receiving a parameter representing shutter shape for the video content; and applying the shutter shape outline corresponding to the received parameter to the dimensioned blur filter.
19. A method as claimed in any one of claims 15 to 18, further comprising:
receiving a shutter speed for the video content; and
dimensioning a motion blur filter according to the received shutter speed;
wherein applying the dimensioned blur filter to the synthesised view comprises:
combining the dimensioned blur filter with the dimensioned motion blur filter; and applying the combined blur filter to the synthesised view.
20. A method as claimed in claim 19, further comprising estimating a motion blur direction from a motion model.
21. A method as claimed in any one of claims 15 to 20, wherein the received encoded view includes blur, and wherein the method further comprises:
sharpening the received encoded view according to the received focal length and reference parameters before synthesising the at least one view of the video content.
22. A method as claimed in claim 21, further comprising applying a sharpening process to a depth map of the received view before synthesising the at least one view of the video content.
23. A computer program product configured, when run on a computer, to execute a method according to any one of the preceding claims.
24. An encoder configured for encoding three-dimensional video content, the encoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the encoder is operative to:
encode at least one view of the video content and at least part of a depth representation associated with the view;
define a camera focal length for the encoded view;
select at least one reference parameter for a focus plane of the encoded view; select at least one reference parameter for a camera f-Number for the encoded view; and
transmit the encoded at least one view and at least part of an associated depth representation, the defined focal length, the selected reference parameter for a focus plane of the encoded view and the selected reference parameter for a camera f-Number of the encoded view to a node.
25. An encoder as claimed in claim 24, wherein the encoder is further operative to select a reference parameter for a focus plane of the encoded view which comprises a location of a focus plane for the encoded view.
26. An encoder as claimed in claim 24, wherein the encoder is further operative to select a reference parameter for a focus plane of the encoded view which comprises a look-up table index corresponding to a focus plane of the encoded view.
27. An encoder as claimed in any one of claims 24 to 26, wherein the encoder is further operative to select a reference parameter for a camera f-Number for the encoded view which comprises a camera f-Number for the encoded view.
28. An encoder as claimed in any one of claims 24 to 27, wherein the encoder is further operative to transmit at least one of the selected reference parameters in floating point representation.
29. An encoder as claimed in any one of claims 24 to 28, wherein the encoder is further operative to:
select a shutter speed for the video content; and
transmit the shutter speed with the selected reference parameters, focal length and encoded at least one view and at least part of an associated depth representation.
30. A decoder, the decoder comprising a processor and a memory, the memory containing instructions executable by the processor whereby the decoder is operative to:
receive:
at least one encoded view of the video content and at least part of a depth representation associated with the view,
a camera focal length for the encoded view;
at least one reference parameter for a focus plane of the encoded view; and
at least one reference parameter for a camera f-Number for the encoded view; synthesise at least one view of the video content;
dimension a blur filter according to the received focal length and reference parameters; and
apply the dimensioned blur filter to the synthesised view.
31. A decoder as claimed in claim 30, wherein the decoder is further operative to: receive a shutter speed for the video content; and
dimension a motion blur filter according to the received shutter speed;
wherein applying the dimensioned blur filter to the synthesised view comprises:
combining the dimensioned blur filter with the dimensioned motion blur filter; and applying the combined blur filter to the synthesised view.
PCT/SE2014/050118 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content WO2015115946A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2014/050118 WO2015115946A1 (en) 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2014/050118 WO2015115946A1 (en) 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content

Publications (1)

Publication Number Publication Date
WO2015115946A1 true WO2015115946A1 (en) 2015-08-06

Family

ID=50193565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2014/050118 WO2015115946A1 (en) 2014-01-30 2014-01-30 Methods for encoding and decoding three-dimensional video content

Country Status (1)

Country Link
WO (1) WO2015115946A1 (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040247175A1 (en) * 2003-06-03 2004-12-09 Konica Minolta Photo Imaging, Inc. Image processing method, image capturing apparatus, image processing apparatus and image recording apparatus
US20080043095A1 (en) * 2006-04-04 2008-02-21 Anthony Vetro Method and System for Acquiring, Encoding, Decoding and Displaying 3D Light Fields
EP2360930A1 (en) * 2008-12-18 2011-08-24 LG Electronics Inc. Method for 3d image signal processing and image display for implementing the same
US20120050474A1 (en) * 2009-01-19 2012-03-01 Sharp Laboratories Of America, Inc. Stereoscopic dynamic range image sequence
WO2010087955A1 (en) * 2009-01-30 2010-08-05 Thomson Licensing Coding of depth maps
EP2582135A2 (en) * 2010-06-11 2013-04-17 Samsung Electronics Co., Ltd 3d video encoding/decoding apparatus and 3d video encoding/decoding method using depth transition data
US20130195350A1 (en) * 2011-03-29 2013-08-01 Kabushiki Kaisha Toshiba Image encoding device, image encoding method, image decoding device, image decoding method, and computer program product
WO2014011103A1 (en) * 2012-07-10 2014-01-16 Telefonaktiebolaget L M Ericsson (Publ) Methods and arrangements for supporting view synthesis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Joint Draft 7.0 on Multiview Video Coding", 27. JVT MEETING; 6-4-2008 - 10-4-2008; GENEVA, ; (JOINT VIDEO TEAM OFISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ),, no. JVT-AA209, 6 June 2008 (2008-06-06), XP030007391, ISSN: 0000-0063 *
RAPPORTEUR Q6/16: "H.264 (V9) Advanced video coding for generic audiovisual services (Rev.): Input draft (for consent)", ITU-T SG16 MEETING; 28-10-2013 - 8-11-2013; GENEVA,, no. T13-SG16-131028-TD-WP3-0099, 5 November 2013 (2013-11-05), XP030100669 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3185560A1 (en) * 2015-12-23 2017-06-28 Thomson Licensing System and method for encoding and decoding information representative of a bokeh model to be applied to an all-in-focus light-field content
JP7277372B2 (en) 2017-10-27 2023-05-18 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 3D model encoding device, 3D model decoding device, 3D model encoding method, and 3D model decoding method
WO2019082958A1 (en) * 2017-10-27 2019-05-02 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Three-dimensional model encoding device, three-dimensional model decoding device, three-dimensional model encoding method, and three-dimensional model decoding method
JPWO2019082958A1 (en) * 2017-10-27 2020-11-12 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America 3D model coding device, 3D model decoding device, 3D model coding method, and 3D model decoding method
CN117014611A (en) * 2019-03-11 2023-11-07 杜比实验室特许公司 Frame rate scalable video coding
CN116668696A (en) * 2019-03-11 2023-08-29 杜比实验室特许公司 Frame rate scalable video coding
US11323728B2 (en) 2019-03-11 2022-05-03 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
CN111971964B (en) * 2019-03-11 2022-06-03 杜比实验室特许公司 Frame rate scalable video coding
EP4064706A1 (en) * 2019-03-11 2022-09-28 Dolby Laboratories Licensing Corporation Signalling of information related to shutter angle
US11523127B2 (en) 2019-03-11 2022-12-06 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
EP4300975A3 (en) * 2019-03-11 2024-03-27 Dolby Laboratories Licensing Corporation Signalling of information related to shutter angle
US11582472B2 (en) 2019-03-11 2023-02-14 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
CN111971964A (en) * 2019-03-11 2020-11-20 杜比实验室特许公司 Frame rate scalable video coding
US10999585B2 (en) 2019-03-11 2021-05-04 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
EP4236325A3 (en) * 2019-03-11 2023-10-11 Dolby Laboratories Licensing Corporation Signalling of information related to shutter angle
WO2020185853A3 (en) * 2019-03-11 2020-10-29 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
US11818372B2 (en) 2019-03-11 2023-11-14 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
US11871015B2 (en) 2019-03-11 2024-01-09 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
JP7411727B2 (en) 2019-03-11 2024-01-11 ドルビー ラボラトリーズ ライセンシング コーポレイション Frame rate scalable video encoding
US11936888B1 (en) 2019-03-11 2024-03-19 Dolby Laboratories Licensing Corporation Frame-rate scalable video coding
CN117014611B (en) * 2019-03-11 2024-03-15 杜比实验室特许公司 Frame rate scalable video coding
US11877000B2 (en) 2019-08-06 2024-01-16 Dolby Laboratories Licensing Corporation Canvas size scalable video coding
WO2023274129A1 (en) * 2021-06-28 2023-01-05 Beijing Bytedance Network Technology Co., Ltd. Enhanced signaling of depth representation information supplemental enhancement information

Similar Documents

Publication Publication Date Title
EP2005757B1 (en) Efficient encoding of multiple views
US9525858B2 (en) Depth or disparity map upscaling
JP6027034B2 (en) 3D image error improving method and apparatus
De Silva et al. Display dependent preprocessing of depth maps based on just noticeable depth difference modeling
EP1978755A2 (en) Method and system for acquiring, encoding, decoding and displaying 3D light fields
EP1978754A2 (en) Method and system for processing light field of three dimensional scene
EP2532166B1 (en) Method, apparatus and computer program for selecting a stereoscopic imaging viewpoint pair
KR20170140187A (en) Method for fully parallax compression optical field synthesis using depth information
JP2013527646A5 (en)
JP2014056466A (en) Image processing device and method
Schmeing et al. Depth image based rendering: A faithful approach for the disocclusion problem
WO2015115946A1 (en) Methods for encoding and decoding three-dimensional video content
Salahieh et al. Light Field Retargeting from Plenoptic Camera to Integral Display
JP5931062B2 (en) Stereoscopic image processing apparatus, stereoscopic image processing method, and program
EP2822279B1 (en) Autostereo tapestry representation
Knorr et al. From 2D-to stereo-to multi-view video
KR101289269B1 (en) An apparatus and method for displaying image data in image system
KR101920113B1 (en) Arbitrary View Image Generation Method and System
TWI536832B (en) System, methods and software product for embedding stereo imagery
Al-Obaidi et al. Influence of depth map fidelity on virtual view quality
Fatima et al. Quality assessment of 3D synthesized images based on structural and textural distortion
Aflaki et al. Unpaired multiview video plus depth compression
KR101336955B1 (en) Method and system for generating multi-view image
Cho et al. Enhancing depth accuracy on the region of interest in a scene for depth image based rendering
BR112021007522A2 (en) image generator apparatus, image generation method and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14707856

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14707856

Country of ref document: EP

Kind code of ref document: A1