WO2015097118A1 - Method and device for encoding a high-dynamic range image into a bitstream and/or decoding a bitstream representing a high-dynamic range image - Google Patents

Method and device for encoding a high-dynamic range image into a bitstream and/or decoding a bitstream representing a high-dynamic range image

Info

Publication number
WO2015097118A1
WO2015097118A1 PCT/EP2014/078940 EP2014078940W
Authority
WO
WIPO (PCT)
Prior art keywords
image
bitstream
decoded
encoding
illumination map
Prior art date
Application number
PCT/EP2014/078940
Other languages
French (fr)
Inventor
Yannick Olivier
Sebastien Lasserre
Fabrice Leleannec
David Touze
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2015097118A1 publication Critical patent/WO2015097118A1/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 - Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G - ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2320/00 - Control of display operating conditions
    • G09G2320/06 - Adjustment of display parameters
    • G09G2320/0626 - Adjustment of display parameters for control of overall brightness
    • G09G2320/0646 - Modulation of illumination source brightness and image signal correlated to each other
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G - ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2340/00 - Aspects of display data processing
    • G09G2340/02 - Handling of images in compressed format, e.g. JPEG, MPEG

Definitions

  • the present invention generally relates to image/video encoding and decoding.
  • the technical field of the present invention is related to encoding of an image whose pixels values belong to a high-dynamic range, and decoding a bitstream representing a high-dynamic range image.
  • LDR images are images whose luminance values are represented with a limited number of bits (most often 8 or 10). This limited representation does not allow correct rendering of small signal variations, in particular in dark and bright luminance ranges.
  • In high-dynamic range images (HDR images), the signal representation is extended in order to maintain a high accuracy of the signal over its entire range.
  • pixel values are usually represented in floating-point format (either 32-bit or 16-bit for each component, namely float or half-float), the most popular format being openEXR half-float format (16-bit per RGB component, i.e. 48 bits per pixel) or in integers with a long representation, typically at least 16 bits.
  • a typical approach for encoding an HDR image is to reduce the dynamic range of the image in order to encode the image by means of a traditional encoding scheme (initially configured to encode LDR images).
  • a tone-mapping operator is applied to the input HDR image and the tone-mapped image is then encoded by means of a traditional 8-10 bit depth encoding scheme such as JPEG/JPEG2000, or MPEG-2, H.264/AVC for video ("Advanced video coding for generic audiovisual services", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.264, Telecommunication Standardization Sector of ITU, January 2012).
  • an inverse tone-mapping operator is applied to the decoded image and a residual image is calculated between the input image and the decoded and inverse-tone-mapped image.
  • the residual image is encoded by means of a second traditional 8-10 bit-depth encoder scheme.
  • This first approach is backward compatible in the sense that a low dynamic range image may be decoded and displayed by means of a traditional apparatus.
  • this first approach uses two encoding schemes and limits the dynamic range of the input image to twice the dynamic range of a traditional encoding scheme (16-20 bits). Moreover, such an approach sometimes leads to a low dynamic range image that is weakly correlated with the input HDR image, which results in low coding performance for the image.
  • an illumination map is determined from the input HDR image.
  • a residual image is then obtained from the image and the illumination map and both the illumination map and the residual image are directly encoded.
  • This specific approach for encoding an input HDR image is not backward compatible with a traditional apparatus that is not able to decode and/or display a high-dynamic range image.
  • encoding into the bitstream a signalization data indicating that the bitstream comprises the illumination map.
  • the method further comprises:
  • the illumination map is encoded as an auxiliary picture whose syntax conforms either to the H264/AVC or HEVC standard.
  • Auxiliary pictures have been defined in the H264/AVC or HEVC standard in addition to the so-called "primary coded picture", which actually correspond to the main stream (main video) of the content.
  • Auxiliary pictures usually enable the transport of additional image information such as alpha compositing, chroma enhancement information or depth information for 3D applications.
  • the residual image is encoded as a primary picture whose syntax conforms either to the H264/AVC or HEVC standard.
  • auxiliary data, i.e. the illumination map
  • the decoding of the auxiliary data, which takes place before display, conforms to the HEVC specification and is thus used as is, in its already specified form.
  • the illumination map is a backlight image and the residual image is obtained by dividing the image by a decoded version of the backlight image.
  • the residual image is tone-mapped before encoding.
  • This provides a viewable residual image, i.e. a residual image that renders the tone-mapped scene artistically reasonably well and consistently compared to the original scene in the image.
  • This method is thus backward compatible because the viewable residual image may be decoded and/or displayed by a traditional apparatus that is not able to handle high dynamic range.
  • a legacy (non-HDR) H264/AVC or HEVC decoder can simply drop the illumination maps (which are not recognized by this legacy decoder), and only decode the residual images.
  • encoding a high dynamic range image by means of such method leads to an efficient encoding scheme because the tone-mapped residual image (low dynamic range image), which is highly spatially correlated (and temporally correlated with other images of a same sequence of images), and the backlight image are encoded separately.
  • a coding gain is thus reached because of the high compression rate of the tone-mapped residual image and of the small amount of data needed to encode the backlight image.
  • tone-mapping the residual image comprises either a gamma correction or an SLog correction according to the pixel values of the residual image.
  • the method further comprises scaling of the residual image before encoding.
  • the method further comprises clipping the residual image before encoding.
  • the resulting residual image is represented with a limited number of bits, which allows the use of a traditional encoding/decoding scheme for encoding it.
  • the encoding/decoding scheme is backward compatible with existing infrastructure (codec, displays, distribution channels, etc.) because only the residual image, which has a low dynamic range, typically 8-10 bits, may be transmitted over such infrastructure to display a low dynamic range version of the image.
  • the small bitstream, which contains the backlight data, may be carried in a side container over a dedicated infrastructure to distribute the original version of the image (i.e. a HDR image).
  • the invention relates to a method for decoding a bitstream representing an image comprising:
  • detecting in the bitstream if a signalization data indicates that the bitstream comprises data related to an illumination map determined from the image;
  • the decoded residual image is obtained by decoding the bitstream at least partially.
  • the signalization data is detected from high level syntax elements and its usage is completed by an SEI message.
  • the bitstream comprises a primary picture and an auxiliary picture whose syntax conforms with the standard H264/AVC or HEVC and wherein the primary picture represents the residual image and the auxiliary picture represents the illumination map.
  • the decoded illumination map is a backlight image and wherein the decoded image is obtained by multiplying the decoded residual image by the backlight image.
  • the decoded residual image is inverse-tone-mapped before multiplying the decoded residual image by the backlight image.
  • the illumination map is a low-spatial-frequency version of the luminance component of the image to be encoded
  • the residual image is obtained by calculating the difference between the luminance component of the image and a decoded version of the encoded low-spatial-frequency version.
  • the invention relates to a bitstream representing an image, characterized in that it comprises a signalization data indicating that it represents an illumination map determined from the image.
  • the invention relates to a device for encoding an image and a device for decoding a bitstream which implements the above methods.
  • FIG. 1 shows a block diagram of the steps of a method for encoding an image into a bitstream in accordance with an embodiment of the invention
  • - Fig. 2 represents a block diagram of a method for decoding a bitstream F representing an image in accordance with an embodiment of the invention
  • - Fig. 3 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention
  • FIG. 4 shows a block diagram of the sub-steps of the step 1020 in accordance with an embodiment of the invention
  • FIG. 5 shows a block diagram of the sub-steps of the step 1020 in accordance with an embodiment of the invention
  • FIG. 6 shows a block diagram of the sub-steps of the step 1020 in accordance with an embodiment of the invention
  • FIG. 7 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention
  • FIG. 8 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention.
  • FIG. 9 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention.
  • FIG. 10 shows an example of an architecture of a device in accordance with an embodiment of the invention.
  • FIG. 11 shows two remote devices communicating over a communication network in accordance with an embodiment of the invention.
5. Detailed description of preferred embodiments of the invention.
  • each block represents a circuit element, module, or portion of code which comprises one or more executable instructions for implementing the specified logical function(s).
  • the function(s) noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.
  • the invention is described for encoding/decoding an image but extends to the encoding/decoding of a sequence of images (video) because each image of the sequence is sequentially encoded/decoded as described below.
  • Fig. 1 shows a block diagram of the steps of a method for encoding an image into a bitstream in accordance with an embodiment of the invention.
  • a module PRP determines an illumination map IM from the image I to be encoded.
  • An illumination map gathers illumination data relative to the pixels of the image I to be encoded.
  • the illumination map may comprise a triplet of illumination values for each pixel of the image, each value of a triplet being an illumination value for a color component value of a pixel.
  • the illumination map is described as being either a backlight image or a low-frequency version of the luminance component of the image to be encoded but the invention is not limited to any specific representation of illumination values relative to an image to be encoded.
  • an encoder ENC1 encodes the illumination map IM into a bitstream F.
  • a module SM encodes into the bitstream F a signalization data SD indicating that the bitstream F comprises the illumination map IM.
  • the signalization data SD further comprises parameters related to the illumination map structure, and parameters related to the process to be applied to the decoded illumination map to reconstruct the image I.
  • the illumination map structure parameters comprise at least an image spatial resolution and an image sample bit-depth.
  • an encoder ENC2 encodes into the bitstream F a residual image RI determined from the image I and the illumination map IM.
  • the signalization data SD is also adapted to synchronize the residual image with the illumination map in order, for example, to obtain a decoded image.
  • the bitstream F comprises the signalization data SD, the illumination map IM and, according to an embodiment, the residual image RI.
  • the bitstream F may be stored on a local or remote memory and/or transmitted through a communication interface (e.g. to a bus or over a communication network or a broadcast network).
  • Fig. 2 represents a block diagram of a method for decoding a bitstream F representing an image in accordance with an embodiment of the invention.
  • the bitstream F may be obtained by the method of encoding an image as described in relation with Fig. 1 .
  • a module SMD detects in the bitstream F if a signalization data SD indicates that the bitstream F comprises data related to an illumination map determined from the image to be decoded.
  • a decoder DEC1 obtains the illumination map DIM by decoding the bitstream F at least partially. Potentially, parameters are also obtained by decoding the bitstream F.
  • these parameters are obtained from an SEI message in the bitstream F.
  • the signalization data is detected from high level syntax elements and its usage is completed by an SEI message.
  • a decoder DEC2 obtains a decoded residual image DRI from a memory or, according to an embodiment, by decoding the bitstream F at least partially.
  • a module POP obtains a decoded image / from the decoded residual image DRI and the decoded illumination map DIM.
  • the encoder ENC1 is configured to encode the illumination map IM as an auxiliary picture whose syntax is defined by the standard H264/AVC or HEVC (B. Bross, W.J. Han, G. J. Sullivan, J.R. Ohm, T. Wiegand JCTVC-K1003, "High Efficiency Video Coding (HEVC) text specification draft 9," Oct 2012), and the decoder DEC1 is configured to obtain (step 2100) an illumination map DIM from an auxiliary picture whose syntax is defined by the standard H264/AVC or HEVC.
  • the bitstream F then comprises an auxiliary picture whose syntax conforms with the standard H264/AVC or HEVC and the auxiliary picture represents the illumination map IM.
  • An auxiliary picture may be implemented by specifying new Video Coding Layer (VCL) NAL unit type(s) as done in H.264/AVC (cf recommendation ITU-T H.264, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, 03/2009).
  • VCL Video Coding Layer
  • an auxiliary picture corresponds to the NAL Unit type 19, as shown in the table below.
  • the syntax and decoding process of an auxiliary picture are identical to the syntax and decoding process of a primary (non-auxiliary) coded picture. In other words, the decoding of the data related to the auxiliary pictures uses the same decoding syntax and engines as the decoding of the primary coded pictures.
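
To make this backward-compatibility behaviour concrete, the following sketch routes NAL units either to the common picture-decoding engine or drops them. It is hypothetical Python: the NAL unit type value 19 for auxiliary coded pictures is taken from the text above, while the NalUnit container, the primary slice types and decode_picture are placeholders, not a real codec API.

```python
from dataclasses import dataclass

AUX_PICTURE_NUT = 19           # auxiliary coded picture (e.g. the illumination map), per the table above
PRIMARY_PICTURE_NUTS = {1, 5}  # assumed values standing in for primary coded slices

@dataclass
class NalUnit:
    nal_unit_type: int
    payload: bytes

def decode_picture(payload: bytes):
    """Placeholder for the common slice/picture decoding engine (same for primary and auxiliary)."""
    ...

def decode_access_unit(nal_units, hdr_capable: bool):
    residual, illumination = None, None
    for nalu in nal_units:
        if nalu.nal_unit_type in PRIMARY_PICTURE_NUTS:
            residual = decode_picture(nalu.payload)          # residual (LDR) image
        elif nalu.nal_unit_type == AUX_PICTURE_NUT:
            if hdr_capable:
                illumination = decode_picture(nalu.payload)  # same syntax and decoding process
            # a legacy decoder simply drops this NAL unit and keeps the residual only
    return residual, illumination
```
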
  • an auxiliary coded picture may be implemented as a specific layer in a scalable encoding.
  • This is what is recommended for HEVC, as explained in document JCTVC-O0041 (REXT/MV-HEVC/SHVC HLS: auxiliary picture layers) and specified in the most recent specification of the scalable extension of HEVC, document JCTVC-O1008 (High Efficiency Video Coding (HEVC) scalable extension Draft 4, November 2013).
  • HEVC High efficiency video coding
  • the scalability type of the enhancement scalable layer indicates that it corresponds to auxiliary coded pictures, using a syntax element scalability_mask_flag corresponding to the 'scalability mask index' as shown in the table below.
  • a parameter 'AuxId', indicating the type of the auxiliary picture, is derived from the ScalabilityId parameter, which is itself deduced from a syntax element dimension_id as explained in section F.7.4.3.1.1 (Video parameter set extension semantics) of document JCTVC-O1008.
  • the encoder ENC2 is configured to encode the residual image Rl as a primary picture whose syntax conforms either to the H264/AVC or HEVC standard
  • the decoder DEC2 is configured to obtain (step 2200) a decoded residual image from a primary picture whose syntax is defined by the standard H264/AVC or HEVC.
  • the primary coded and auxiliary coded pictures commonly use the syntax and decoding process specified in the AVC or HEVC specifications. There is no specific decoding process related to each type of picture. The main differences relate to high level syntax (e.g. NAL Unit Type, or scalability_mask_flag, as explained above).
  • a specific value of the parameter AuxId may be defined to indicate the nature of the auxiliary picture.
  • a specific value can be defined for the illumination map, as shown in the table below.
  • AuxId equal to 1 corresponds to an alpha channel
  • AuxId equal to 2 corresponds to a depth map
  • AuxId equal to 3 corresponds to an illumination map.
  • the value of AuxId for the illumination map is the same as for the alpha channel.
  • the usage of an alpha channel is partly similar to the usage of a backlight channel, since it consists in multiplying and scaling the input primary picture by the alpha map, with a final clipping operation to guarantee that the signal stays within the min and max signal limits.
  • the alpha channel concept can therefore be simply adapted to be used for the backlight map. The difference can come from different scaling and clipping values, signaled in the accompanying SEI message.
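
As an illustration of the AuxId values quoted above and of the alpha-channel-like processing of a backlight map, the sketch below multiplies the primary picture by the auxiliary picture, scales it and clips it. The scale and clip values are placeholders for what would, in practice, be signaled in the SEI message; none of the function names below come from a real decoder API.

```python
import numpy as np

AUX_ALPHA, AUX_DEPTH, AUX_ILLUMINATION = 1, 2, 3  # AuxId values quoted in the text

def apply_auxiliary(primary: np.ndarray, aux: np.ndarray, aux_id: int,
                    scale: float = 1.0, lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """Combine a decoded primary picture with a decoded auxiliary picture (sketch)."""
    if aux_id in (AUX_ALPHA, AUX_ILLUMINATION):
        # multiply and scale the primary picture by the map, then clip to the signal limits
        return np.clip(primary * aux * scale, lo, hi)
    # depth maps (AuxId == 2) are not combined with the samples here
    return primary
```
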
  • the bitstream F then further comprises a primary picture whose syntax conforms with the standard H264/AVC or HEVC, and the primary picture represents the residual image RI.
  • the signalization data SD is carried by an SEI message with a syntax conforming to JCTVC-O1008. The signalization data SD is then detected in step 2000 from the SEI message, according to this embodiment.
  • the SEI message may be used to carry parameters and/or indication for obtaining a decoded image from a decoded version of the auxiliary picture and a decoded version of the primary picture (decoded residual image).
  • the syntax to signal an SEI message is based on the syntax given by JCTVC-O0041/JCTVC-F0031:
  • the parameters and/or indication for obtaining a decoded image may be one of the following list:
  • picture samples topology, e.g. regular samples topology or quincunx samples topology
  • auxiliary_hdr_channel_info( payloadSize ) syntax table (fragment): hdr_illumination_picture_color_format, descriptor ue(v)
  • the parameter hdr_illumination_picture_color_format specifies the colour format of the illumination map: 4:2:0, 4:2:2 or 4:4:4 colour format, or a single sample array composing a picture in monochrome format.
  • one bit-depth is signaled for the luma component and one for the chroma components.
  • the parameter hdr_illumination_picture_width specifies the horizontal size of the illumination map
  • the parameter hdr_illumination_picture_height specifies the vertical size of the illumination map
  • the parameter hdr_illumination_picture_scaling_type specifies a scaling process of the illumination map to get a full definition image.
  • the parameter hdr_shape_function_size_x specifies the width of a shape function (ψ) used to determine a backlight image
  • the parameter hdr_shape_function_size_y specifies the height of a shape function used to determine a backlight image
  • the parameter hdr_shape_function [cy][cx] gives the value of a scaling filter coefficient at the position (cy, cx)
  • the parameter hdr_ldr_gamma_slog_parameters specifies the parameters of an inverse tone-mapping
  • the parameter hdr_ldr_scaling_factor specifies a scaling process of the residual image to get a full definition image.
  • the default value is equal to 120.
  • the width and height of the shape function depend on the following:
  • one shape function may be defined per block size.
  • one shape function may be defined for the luma component, and one for the chroma components.
  • one shape function may be defined per luma value range. For instance, for luma being between 0 to 127, a first shape function applies, with a large width and height. For luma being between 128 to 255, a second shape function applies, with a smaller width and height in order to limit the propagation of large values to the neighboring areas.
  • the parameters specified in the SEI message remain valid until a new SEI message is received; in that case, the previous parameter values are overwritten by the new ones.
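
For readability only, the SEI parameters listed above can be pictured as one record. The field names follow the text; the types, default values and the grouping into a single structure are editorial assumptions rather than normative syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AuxiliaryHdrChannelInfo:
    hdr_illumination_picture_color_format: int = 1   # monochrome, 4:2:0, 4:2:2 or 4:4:4 (assumed coding)
    hdr_illumination_picture_width: int = 0          # horizontal size of the illumination map
    hdr_illumination_picture_height: int = 0         # vertical size of the illumination map
    hdr_illumination_picture_scaling_type: int = 0   # up-scaling process to get a full definition image
    hdr_shape_function_size_x: int = 0               # width of the shape function psi
    hdr_shape_function_size_y: int = 0               # height of the shape function psi
    hdr_shape_function: List[List[float]] = field(default_factory=list)       # coefficient at (cy, cx)
    hdr_ldr_gamma_slog_parameters: List[float] = field(default_factory=list)  # inverse tone-mapping parameters
    hdr_ldr_scaling_factor: int = 120                # default value quoted above
```
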
  • Fig. 3 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention.
  • a module IC obtains the luminance component L and potentially at least one color component C(i) of the image I to be encoded.
  • the luminance component L is obtained, for instance in the 709 gamut, by a linear combination which is given by: L = 0.2126·R + 0.7152·G + 0.0722·B
  • a module BAM determines a backlight image Bal from the luminance component L of the image I.
  • the backlight image Bal is the illumination map IM according to this embodiment of the step 1000.
  • a module BI determines a backlight image Ba as a weighted linear combination of shape functions ψ_i, given by: Ba = Σ_i a_i · ψ_i   (1)
  • determining a backlight image Ba from a luminance component L consists in finding optimal weighting coefficients (and potentially also optimal shape functions if not known beforehand) in order that the backlight image Ba fits the luminance component L.
  • There are many well-known methods to find the weighting coefficients a_i. For example, one may use a least mean square method to minimize the mean square error between the backlight image Ba and the luminance component L (a sketch of such a fit is given below).
  • the invention is not limited to any specific method to obtain the backlight image Ba.
  • shape functions may be the true physical response of a display backlight (made of LEDs for instance, each shape function then corresponding to the response of one LED) or may be a pure mathematical construction in order to fit the luminance component at best.
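
A minimal sketch of the least-mean-square fit mentioned above: the shape functions ψ_i (whether LED responses or a purely mathematical basis) are stacked as columns of a matrix and the weights a_i are obtained by linear least squares. The Rec. 709 luma weights are standard; everything else (function names, the way the shapes are supplied) is an illustrative assumption.

```python
import numpy as np

def luma_709(rgb):
    """Luminance as the Rec. 709 linear combination of the R, G, B components."""
    return 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]

def fit_backlight(L, shapes):
    """Least-squares fit of Ba = sum_i a_i * psi_i to the luminance component L.

    `shapes` is a list of shape functions psi_i, each given at the same
    resolution as L; the fit minimizes the mean square error between Ba and L.
    """
    A = np.stack([psi.ravel() for psi in shapes], axis=1)  # one column per psi_i
    a, *_ = np.linalg.lstsq(A, L.ravel(), rcond=None)      # weighting coefficients a_i
    Ba = (A @ a).reshape(L.shape)                          # backlight image of equation (1)
    return a, Ba
```
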
  • the backlight image Bal, output from step 1020, is the backlight image Ba given by equation (1).
  • a module BM modulates the backlight image Ba (given by equation (1)) with a mean luminance value L_mean of the image I obtained by means of a module HL.
  • the backlight image Bal output from step 1020, is the modulated backlight image.
  • the module HL is configured to calculate the mean luminance value L_mean over the whole luminance component L.
  • the module HL is configured to calculate the mean luminance value L_mean by:
  • This last embodiment is advantageous because it avoids the mean luminance value L_mean being influenced by a few pixels with extremely high values, which usually leads to very annoying temporal mean-brightness instability when the image I belongs to a sequence of images.
  • the invention is not limited to a specific embodiment for calculating the mean luminance value L_mean.
  • a module N normalizes the backlight image Ba (given by equation (1)) by its mean value E(Ba) such that one gets a mid-gray-at-one backlight image Ba_gray for the image (or for all images if the image I belongs to a sequence of images): Ba_gray = Ba / E(Ba)
  • the module BM is configured to modulate the mid-gray-at-one backlight image Ba_gray with the low-spatial-frequency version L_lf of the image L, by using the following relation, with cst_mod being a modulation coefficient and α being another modulation coefficient less than 1, typically 1/3.
  • the backlight image Bal, output from step 1020, is the modulated backlight image Ba_mod given by equation (2).
  • the modulation coefficient cst_mod is tuned to get a good-looking brightness for the residual image and highly depends on the process used to obtain the backlight image. For example, cst_mod ≈ 1.7 for a backlight image obtained by least mean squares.
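
The normalization and modulation steps above can be read as follows; since the exact relation (equation (2)) is not reproduced in this text, the combination below, scaling the mid-gray-at-one backlight by cst_mod times a power α of the mean luminance, is only one plausible form, with the example values cst_mod ≈ 1.7 and α = 1/3 taken from the description.

```python
import numpy as np

def modulate_backlight(Ba, L, cst_mod=1.7, alpha=1.0 / 3.0):
    """Hedged sketch of the mid-gray-at-one normalization followed by modulation."""
    Ba_gray = Ba / np.mean(Ba)        # mid-gray-at-one backlight, Ba_gray = Ba / E(Ba)
    L_mean = np.mean(L)               # mean luminance of the image
    return cst_mod * (L_mean ** alpha) * Ba_gray  # assumed form of the modulated backlight Ba_mod
```
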
  • In step 1100 the data needed to determine the backlight image Bal, output from step 1020, are encoded by means of the encoder ENC1 and added into the bitstream F. According to an embodiment, these data are embedded in an SEI message as explained before.
  • the data to be encoded are limited to the weighting coefficients a_i or ã_i when known non-adaptive shape functions are used, but the shape functions ψ_i may also be a priori unknown and then encoded in the bitstream F, for instance in the case of a somewhat optimal mathematical construction for better fitting. So, all the weighting coefficients a_i or ã_i (and potentially the shape functions ψ_i) are encoded in the bitstream F.
  • the weighting coefficients a_i or ã_i are quantized before being encoded in order to reduce the size of the bitstream F.
  • a residual image Res is calculated by dividing the image by a decoded version Ba of the backlight image.
  • the luminance component L and potentially each colour component C(i) of the image I, obtained from the module IC, are divided by the decoded version Ba of the backlight image. This division is done pixel by pixel.
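
A one-line illustration of the pixel-by-pixel division above; the small epsilon guarding against division by zero is an editorial addition, not something the text specifies.

```python
import numpy as np

def residual_image(component, Ba_hat, eps=1e-6):
    """Res = component / decoded backlight, computed pixel by pixel.

    `component` is the luminance L or a colour component C(i) of the image I,
    `Ba_hat` the decoded version Ba of the backlight image at the same resolution.
    """
    return component / np.maximum(Ba_hat, eps)
```
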
  • the decoded version Ba of the backlight image is processed before obtaining the residual image Res.
  • the process applied to the decoded version Ba of the backlight image may, for instance, be used to generate a processed backlight image of the same resolution as its corresponding residual image.
  • the term 'decoded version Ba of the backlight image' will be used interchangeably to denote the processed or non-processed decoded version Ba of the backlight image.
  • the said processed decoded version Ba of the backlight image is obtained from the decoded version Ba of the backlight image using parameters signaled in the said signalization data SD.
  • the decoded version Ba of the backlight image is obtained by decoding at least partially the bitstream F by means of the decoder DEC1 .
  • Some data needed to obtain the backlight image, output of step 1020, have been encoded (step 1100) and are then obtained by at least partially decoding the bitstream F.
  • the weighting coefficients â_i (and potentially the shape functions ψ_i) are then obtained as output of step 2100.
  • a module BAG generates a decoded version Ba of the backlight image from the weighting coefficients â_i and either some known non-adaptive shape functions or the shape functions ψ_i by: Ba = Σ_i â_i · ψ_i
  • a module TMO tone-maps the residual image Res in order to get a viewable residual image Res_v.
  • the residual image Res may not be viewable because its dynamic range is too high and because a decoded version of this residual image Res shows too visible artifacts. Tone-mapping the residual image remedies at least one of these drawbacks.
  • the invention is not limited to any specific tone-mapping operator.
  • tone-mapping operator shall be reversible.
  • the tone-mapping operator defined by Boitard may be used (Boitard, R., Bouatouch, K., Cozot, R., Thoreau, D., & Gruson, A. (2012). Temporal coherency for video tone mapping. In A. M. J. van Eijk, C. C. Davis, S. M. Hammel, & A. K. Majumdar (Eds.), Proc. SPIE 8499, Applications of Digital Image Processing (p. 84990D-84990D-10)).
  • the encoder ENC2 is configured to encode the viewable residual image Res_v in the bitstream F.
  • tone-mapping the residual image comprises either a gamma correction or an SLog correction according to the pixel values of the residual image.
  • the viewable residual image Res_v is then given, for example, by: Res_v = Res^γ
  • with γ being a coefficient of a gamma curve equal, for example, to 1/2.4.
  • the viewable residual image Res v is given, for example, by:
  • a, b and c are coefficients of an SLog curve determined such that 0 and 1 are invariant, and the derivative of the SLog curve is continuous at 1 when prolonged by a gamma curve below 1.
  • a, b and c are functions of the parameter γ.
  • the parameter γ of the gamma-SLog curve is encoded in the bitstream F.
  • the module TMO applies either the gamma correction or the SLog correction according to the pixel values of the residual image Res.
  • the viewable residual image Res_v usually has a mean value more or less close to 1 depending on the brightness of the image I, making the use of the above gamma-SLog combination particularly efficient.
  • a module SCA scales the viewable residual image Res_v before encoding (step 1300) by multiplying each component of the viewable residual image Res_v by a scaling factor cst_scaling. The resulting residual image Res_s is then given by: Res_s = cst_scaling · Res_v
  • the scaling factor cst_scaling is defined to map the values of the viewable residual image Res_v between 0 and the maximum value 2^N - 1, where N is the number of bits allowed as input for the coding by the encoder ENC2.
  • the encoder ENC2 is configured to encode the residual image Res_s.
  • In step 1060, a module CLI clips the viewable residual image Res_v before encoding to limit its dynamic range to a targeted dynamic range TDR which is defined, for example, according to the capabilities of the encoder ENC2.
  • the resulting residual image Res_c is given, for example, by: Res_c = min(2^N, Res_v)
  • the invention is not limited to such clipping (min(.)) but extends to any kind of clipping.
  • the encoder ENC2 is configured to encode the residual image Res_c.
  • the scaling and clipping embodiments lead to a residual image Res_sc given by:
  • Res_sc = min(2^N, cst_scaling · Res_v), or
  • Res_sc = min(2^N, cst_scaling · Res_s), according to the embodiments of the method.
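
The three residual-conditioning steps described above (gamma/SLog tone mapping selected by pixel value, scaling towards the encoder's N-bit range, clipping) are sketched below. The switch point at 1 follows the description; the a·ln(Res + b) + c form of the SLog branch and its default coefficients are assumptions, since the exact formulas are not reproduced in this text.

```python
import numpy as np

def tone_map_residual(Res, gamma=1.0 / 2.4, a=0.45, b=0.12, c=0.9):
    """Gamma correction below 1, SLog-style correction above 1 (sketch).

    a, b and c stand in for SLog coefficients that would in practice be chosen
    so that 0 and 1 are invariant and the curve joins the gamma branch smoothly at 1.
    """
    Res = np.maximum(Res, 0.0)
    return np.where(Res <= 1.0, Res ** gamma, a * np.log(Res + b) + c)

def scale_and_clip(Res_v, N=10, cst_scaling=1.0):
    """Scale the viewable residual towards [0, 2**N - 1] and clip it to that range."""
    return np.clip(cst_scaling * Res_v, 0.0, 2 ** N - 1)
```
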
  • the encoder ENC2 is configured to encode the residual image Res_sc.
  • the tone-mapping and scaling of the viewable residual image Res_v is a parametric process.
  • the parameters may be fixed or not; in the latter case they may be encoded in the bitstream F by means of the encoder ENC1.
  • the constant value γ of the gamma correction and the scaling factor cst_scaling may be parameters which are encoded in the bitstream F.
  • At least one parameter among a, cst_scaling and γ is embedded in an SEI message as explained before.
  • the residual image RI is either the viewable residual image Res_v, or Res_s, or Res_c.
  • Fig. 7 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention.
  • a backlight image B̂a (a decoded illumination map) is obtained by at least partially decoding the bitstream F by means of the decoder DEC1.
  • the bitstream F may have been stored locally or received from a communication network.
  • a decoded residual image R̂es is obtained by an at least partial decoding of the bitstream F by means of a decoder DEC2.
  • the decoded residual image R̂es is viewable by a traditional apparatus.
  • a decoded image Î is obtained by multiplying the decoded residual image R̂es by the backlight image B̂a.
  • the backlight image B̂a is processed before obtaining the decoded image Î.
  • the process applied to the backlight image B̂a may, for instance, be used to generate a processed backlight image of the same resolution as its corresponding decoded residual image R̂es.
  • the term 'backlight image B̂a' will be used interchangeably to denote the processed or non-processed backlight image B̂a.
  • the said processed backlight image B̂a is obtained from the backlight image B̂a using parameters signaled in the said signalization data SD.
  • the parameters γ and/or cst_scaling are also obtained either from a local memory or by an at least partial decoding of the bitstream BF by means of the decoder DEC1.
  • a module ISCA applies an inverse scaling to the decoded residual image R̂es by dividing the decoded residual image R̂es by the parameter cst_scaling.
  • In step 2320, a module ITMO applies an inverse tone-mapping to the decoded residual image R̂es by means of the parameter γ.
  • the parameter γ defines a gamma curve and the inverse tone-mapping simply consists in finding, from the gamma curve, the values which correspond to the pixel values of the decoded residual image R̂es.
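
Putting the decoding sub-steps of Fig. 7 together: inverse scaling by cst_scaling, inverse tone mapping through the gamma curve, then pixel-wise multiplication by the decoded backlight. Treating the whole range with the gamma inverse (and omitting the SLog branch) is a simplification for illustration, not the complete method.

```python
import numpy as np

def reconstruct_hdr(Res_hat, Ba_hat, cst_scaling, gamma=1.0 / 2.4):
    """Sketch of step 2300: Res_hat is the decoded residual image and Ba_hat the
    decoded backlight image (illumination map), both at the same resolution."""
    Res = Res_hat / cst_scaling                   # inverse scaling (module ISCA)
    Res = np.maximum(Res, 0.0) ** (1.0 / gamma)   # inverse tone-mapping via the gamma curve (module ITMO, step 2320)
    return Res * Ba_hat                           # multiply by the backlight to get the decoded image
```
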
  • Fig. 8 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention.
  • the image I to be encoded is split into multiple image block B and each image block B is considered as follows.
  • a module IC obtains each component of the image block B to be encoded.
  • the image block B comprises a luminance component L and potentially at least one colour component C(i) with i an index which identifies a colour component of the image block B.
  • the components of the image block B belong to a perceptual space, usually a 3D space, i.e. the image block B comprises a luminance component L and potentially at least one colour component C(i), for example two called C1 and C2 in what follows.
  • a perceptual space has a metric d((L, C1, C2), (L', C1', C2')) whose values are representative of, preferably proportional to, the differences between the visual perceptions of two points of said perceptual space.
  • the metric d((L, C1, C2), (L', C1', C2')) is defined such that a perceptual threshold ΔE_0 (also referred to as the JND, Just Noticeable Difference) exists below which a human being is not able to perceive a visual difference between two colours of the perceptual space, i.e. d((L, C1, C2), (L', C1', C2')) < ΔE_0.
  • this perceptual threshold is independent of the two points (L, C1, C2) and (L', C1', C2') of the perceptual space.
  • the metric may be calculated on a pixel base.
  • the image I comprises components belonging to a non perceptual space such as (R,G,B) for example
  • a perceptual transform is applied to the image I in order to obtain a luminance component L and potentially two colours components C1 and C2 which belong to the perceptual space.
  • Such a perceptual transform is defined from the lighting conditions of the display and depends on the initial colour space.
  • the image I is first transformed to the well-known linear space (X, Y, Z) (an inverse gamma correction may potentially be needed) and the resulting image is then transformed according to reference lighting conditions of the display of a decoded version of the encoded image, which are here a 3D vector of values (X_n, Y_n, Z_n) in the (X, Y, Z) space.
  • such a perceptual transform is defined as follows when the perceptual space LabCIE1976 is selected:
  • f is a conversion function for example given by:
  • d((L*, a*, b*), (L*', a*', b*'))² = (ΔL*)² + (Δa*)² + (Δb*)² < (ΔE_0)², with ΔL* being the difference between the luminance components of the two colours (L*, a*, b*) and (L*', a*', b*'), and Δa* (respectively Δb*) being the difference between the colour components of these two colours.
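
As a small worked example of the Lab metric and the just-noticeable-difference test above; the formula is the plain Euclidean distance written out in the bullet, and ΔE_0 = 1 is only a commonly used JND value, not a number taken from this text.

```python
import math

def delta_e_1976(lab1, lab2):
    """Euclidean distance between two (L*, a*, b*) colours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

def visually_indistinguishable(lab1, lab2, delta_e0=1.0):
    """True when the difference stays below the perceptual threshold ΔE_0."""
    return delta_e_1976(lab1, lab2) < delta_e0

# Two near-identical greys: the difference is well below the assumed JND.
print(visually_indistinguishable((50.0, 0.0, 0.0), (50.2, 0.1, -0.1)))  # True
```
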
  • a perceptual transform is defined as follows:
  • the following Euclidean metric may be defined on the perceptual space Lu*v*: d((L*, u*, v*), (L*', u*', v*'))² = (ΔL*)² + (Δu*)² + (Δv*)²
  • the invention is not limited to the perceptual space LabCIE1976 but may be extended to any type of perceptual space such as LabCIE1994 or LabCIE2000, which are the same Lab space but with a different metric to measure the perceptual distance, or any other Euclidean perceptual space for instance.
  • Other examples are LMS spaces and IPT spaces.
  • a condition is that the metric shall be defined on these perceptual spaces such that the metric is preferably proportional to the perceived difference; as a consequence, a homogeneous maximal perceptual threshold ΔE_0 exists below which a human being is not able to perceive a visual difference between two colours of the perceptual space.
  • a module LF obtains a low-spatial-frequency version L_lf of the luminance component L of the image I.
  • the low-spatial-frequency version L_lf of the luminance component L of the image I is the illumination map IM according to this embodiment of the step 1000.
  • the module LF is configured to calculate the low-spatial-frequency version L_lf per block by assigning to each pixel of a block a mean value computed by averaging the pixel values of the block.
  • the invention is not limited to a specific embodiment for computing a low-spatial-frequency version of the image I and any low-pass filtering or down-sampling of the luminance component of the image I may be used.
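
A minimal sketch of the per-block low-spatial-frequency version described above: every pixel of a block is replaced by the mean of its block. The block size of 16 is only an illustrative choice.

```python
import numpy as np

def block_mean_lowpass(L, block=16):
    """Assign to each pixel of a block the mean value of that block (L_lf)."""
    h, w = L.shape
    L_lf = np.empty_like(L, dtype=np.float64)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = L[y:y + block, x:x + block]
            L_lf[y:y + block, x:x + block] = tile.mean()
    return L_lf
```
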
  • In step 1100 the encoder ENC1 is configured to encode the low-spatial-frequency version L_lf into the bitstream F.
  • a differential image Diff is obtained.
  • the differential image Diff comprises a differential luminance component L_r which is obtained by calculating the difference between the luminance component L and a decoded version of the encoded low-spatial-frequency version L_lf.
  • a module ASSO is configured to associate each colour component of the image I with the differential luminance component L_r in order to get a differential image Diff.
  • the image I comprises two colour components C1 and C2.
  • the colour components C1 and C2 are associated with the differential luminance component L_r in order to get a differential image Diff comprising three components (L_r, C1, C2).
  • the encoder ENC2 is configured to encode the differential image Diff into the bitstream F.
  • the encoder ENC1 and/or ENC2 comprises an entropy encoding.
  • the coding precision of the encoder ENC2 depends on a perceptual threshold ΔE defining an upper bound of the metric in the perceptual space and enabling a control of the visual losses in a displayed decoded version of the image.
  • the perceptual threshold ΔE is determined according to reference lighting conditions of the display of a decoded version of the encoded image and the decoded version L̂_lf of the low-spatial-frequency version L_lf.
  • the brightness of the low-spatial-frequency version L_lf is not constant over the image but changes locally.
  • When the low-spatial-frequency version L_lf is calculated per block by assigning to each pixel of a block a mean value computed by averaging the pixel values of the block, the perceptual threshold ΔE is constant over each block, but the mean values of two blocks of the image may be different. Consequently, the perceptual threshold ΔE changes locally according to the brightness values of the image.
  • the local changes of the perceptual threshold ΔE are not limited to block-based changes but may extend to any zones defined over the image by means of any operator, such as a segmentation operator based for example on the brightness values of the image.
  • In step 2100, the decoded version L̂_lf of the low-spatial-frequency version L_lf is obtained by decoding the output of step 1100, i.e. by decoding at least partially the bitstream F, by means of the decoder DEC1.
  • a decoder DEC1 implements inverse operations compared to the operations of the encoder ENC1 (step 1100).
  • the perceptual threshold ΔE is determined from the ratio of the brightness value of the decoded version of the low-spatial-frequency version L_lf over the maximal environmental brightness value Y_n.
  • the perceptual threshold ΔE is then given by:
  • ΔE_enc is chosen close to ΔE_0 for visually lossless encoding and greater than ΔE_0 for an encoding with a control of the visual losses in a decoded version of the encoded image.
  • the reference lighting conditions of the display of a decoded version of the encoded image (X_n, Y_n, Z_n), which have a local character, may be replaced by global reference lighting conditions of the display of a decoded version of the encoded image defined by
  • this replacement is equivalent to the choice of the perceptual threshold ΔE (4) because the encoding with a precision equal to ΔE of a colour component a*' in the colour space LabCIE1976, which is given by
  • the perceptual threshold ΔE is given by
  • the perceptual threshold ΔE is then given by
  • E_min is set to about 1/5. This is due to a contrast masking effect of the dark local brightness of the decoded version L̂_lf of the low-spatial-frequency version L_lf by the maximal environmental brightness value Y_n.
  • a threshold TH is applied to the component(s) of the differential image Diff in order to limit the dynamic range of each of its components to a targeted dynamic range TDR.
  • each component of a differential image is normalized by means of the perceptual threshold ΔE, and the normalized differential image is then encoded at a constant encoding precision.
  • the precision of the encoding is thus a function of the perceptual threshold ΔE, which changes locally and which is the optimal precision assuming that the perceptual space is ideal.
  • a precision of 1 for the encoding of the normalized differential image ensures that the differential image is encoded to a precision of ΔE, as required.
  • the normalization of a component of a differential image by means of the perceptual threshold ΔE is the division of this component by a value which is a function of the perceptual threshold ΔE.
  • a component C of the differential image, including both the differential luminance component and potentially each colour component, is then normalized, for example, as follows to get a normalized component C_N: C_N = C / ΔE^α, with α being a value equal, for example, to 0.5 or 1.
  • At least one parameter of the encoding of a differential image depends on the perceptual threshold ΔE.
  • a quantization parameter QP of such an encoding depends on the perceptual threshold ΔE.
  • a parameter QP exists in image/video coders like H.264/AVC and HEVC and may be defined locally for each coding block.
  • an encoding with locally (block by block) varying precision through the differential image is performed by choosing the local QP ensuring a coding precision of the perceptual threshold ΔE for each block.
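
The normalization and re-normalization described above amount to dividing, then multiplying, each component of the differential image by ΔE^α, with ΔE constant over a block but varying across the image. The sketch assumes ΔE is already available as a block-constant map; how ΔE is derived from the decoded low-spatial-frequency version and the reference lighting conditions is not reproduced here.

```python
import numpy as np

def normalize_diff(C, delta_e, alpha=0.5):
    """Encoder side: C_N = C / dE**alpha before constant-precision encoding."""
    return C / (delta_e ** alpha)

def renormalize_diff(C_N, delta_e, alpha=0.5):
    """Decoder side: C = C_N * dE**alpha after constant-precision decoding."""
    return C_N * (delta_e ** alpha)
```
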
  • Fig. 9 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention.
  • the bitstream F represents an image I which comprises a luminance component and potentially at least one colour component.
  • the component(s) of the image I belong to a perceptual colour space as described above.
  • a decoded version L̂_lf of the low-spatial-frequency version of the luminance component of the image I is obtained by decoding at least partially the bitstream F, by means of a decoder DEC1.
  • In step 2200, a decoded version of a differential image Diff is obtained by an at least partial decoding of the bitstream F by means of the decoder DEC2.
  • the decoded version of a differential image Diff comprises a differential luminance component L_r, which represents the difference between a luminance component L of the image I and the decoded version L̂_lf of the low-spatial-frequency version of the luminance component of the image I.
  • the decoded version of a differential image Diff comprises the differential luminance component L_r, which represents the difference between a luminance component L of the image I and the decoded version L̂_lf of the low-spatial-frequency version of the luminance component of the image I, and each of said at least one colour component of the image I.
  • In step 2350, the decoded version of a differential image Diff and the decoded version L̂_lf of the low-spatial-frequency version of the luminance component of the image are added together to get the decoded image Î.
  • the decoding precision of the decoder DEC2 depends on a perceptual threshold ΔE defining an upper bound of a metric in a perceptual space described above and enabling a control of the visual losses in a displayed decoded version of the image.
  • the precision of the decoding is thus a function of the perceptual threshold ΔE, which changes locally.
  • the perceptual threshold ΔE is determined, according to an embodiment, according to reference lighting conditions of the display of a decoded version of the encoded image (the same as those used for encoding) and the decoded version L̂_lf of the low-spatial-frequency version of the luminance component of the image I.
  • the differential image is decoded at a constant precision and each component of the decoded version of the differential image Diff is re-normalized by means of the perceptual threshold ΔE.
  • the re-normalization is the multiplication by a value which is a function of the perceptual threshold ΔE.
  • each component C_N of the decoded version of the differential image is re-normalized, for example, as follows: C = C_N · ΔE^α, with α being a value equal, for example, to 0.5 or 1.
  • a module IIC is configured to apply an inverse perceptual transform to the decoded image Î, output from step 2250.
  • the estimate of the decoded image Î is transformed to the well-known space (X, Y, Z):
  • X = X_n · f⁻¹((L* + 16)/116 + a*/500), Y = Y_n · f⁻¹((L* + 16)/116), Z = Z_n · f⁻¹((L* + 16)/116 - b*/200)
  • the image in the space (X, Y, Z) is inverse transformed to get the decoded image Î in the initial space, such as the (R,G,B) space.
  • data of the bitstream F are also at least partially entropy-decoded.
  • the decoders DEC1, respectively DEC2, are configured to decode data which have been encoded by the encoders ENC1, respectively ENC2.
  • the encoders ENC1 and/or ENC2 are not limited to a specific encoder (decoder), but when an entropy encoder (decoder) is required, an entropy encoder such as a Huffman coder, an arithmetic coder or a context-adaptive coder like CABAC used in H.264/AVC or HEVC is advantageous.
  • the encoders ENC1 and ENC2 are not limited to a specific encoder, which may be, for example, a lossy image/video coder like JPEG, JPEG2000, MPEG-2, H.264/AVC or HEVC.
  • the modules are functional units which may or may not be in relation with distinguishable physical units. For example, these modules, or some of them, may be brought together in a unique component or circuit, or contribute to functionalities of a software. A contrario, some modules may potentially be composed of separate physical entities.
  • the apparatus which are compatible with the invention are implemented using either pure hardware, for example using dedicated hardware such as an ASIC, an FPGA or VLSI (respectively « Application Specific Integrated Circuit », « Field-Programmable Gate Array », « Very Large Scale Integration »), or from several integrated electronic components embedded in a device, or from a blend of hardware and software components.
  • Fig. 10 represents an exemplary architecture of a device 1000 which may be configured to implement a method described in relation with Fig. 1-9.
  • Device 1000 comprises following elements that are linked together by a data and address bus 1001 :
  • microprocessor 1002 which is, for example, a DSP (or Digital Signal Processor);
  • RAM or Random Access Memory
  • the battery 1006 is external to the device.
  • the word « register » used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data).
  • ROM 1003 comprises at least a program and parameters. The algorithm of the methods according to the invention is stored in the ROM 1003. When switched on, the CPU 1002 uploads the program into the RAM and executes the corresponding instructions.
  • RAM 1004 comprises, in a register, the program executed by the CPU 1002 and uploaded after switch on of the device 1000, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • PDAs portable/personal digital assistants
  • the device further comprises means for obtaining reference lighting conditions of the display of a decoded version of the encoded image such as a maximal environmental brightness value Y n .
  • the device comprises a display 1007 and the means for obtaining reference lighting conditions of the display of a decoded version of the encoded image are configured to determine such reference lighting conditions of the display of a decoded version of the encoded image from some characteristics of the display 1007 or from lighting conditions around the display 1007 which are captured by the apparatus.
  • the means for obtaining a maximal environmental brightness value Y n are a sensor attached to the display and which measures the environmental conditions.
  • a photodiode or the like may be used to this purpose.
  • the image I is obtained from a source.
  • the source belongs to a set comprising:
  • a local memory e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk ;
  • a storage interface (1005) e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
  • a communication interface (1005) e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
  • a wireline interface for example a bus interface, a wide area network interface, a local area network interface
  • a wireless interface such as an IEEE 802.11 interface or a Bluetooth® interface
  • an image capturing circuit, e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or a CMOS (or Complementary Metal-Oxide-Semiconductor) sensor.
  • a sensor such as, for example, a CCD (or Charge-Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor)
  • the decoded image Î is sent to a destination; specifically, the destination belongs to a set comprising:
  • a local memory e.g. a video memory or a RAM, a flash memory, a hard disk ;
  • a storage interface (1005) e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
  • a communication interface (1005) e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, an HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as an IEEE 802.11 interface, WiFi® or a Bluetooth® interface); and
  • a wireline interface, for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface or an HDMI (High Definition Multimedia Interface) interface
  • a wireless interface such as an IEEE 802.11 interface, WiFi® or a Bluetooth® interface
  • the bitstream F is sent to a destination.
  • the bitstream F is stored in a local or remote memory, e.g. a video memory (1004) or a RAM (1004), a hard disk (1003).
  • the bitstream is sent to a storage interface (1005), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (1005), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.
  • the bitstream BF and/or F is obtained from a source.
  • the bitstream is read from a local memory, e.g. a video memory (1004), a RAM (1004), a ROM (1003), a flash memory (1003) or a hard disk (1003).
  • the bitstream is received from a storage interface (1005), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (1005), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.
  • device 1000 being configured to implement an encoding method described in relation with Fig. 1 ,3-6 and 8, belongs to a set comprising:
  • device 1000 being configured to implement a decoding method described in relation with Fig. 2, 7 and 9, belongs to a set comprising:
  • the device A comprises means which are configured to implement a method for encoding an image as described in relation with the Fig. 1 and the device B comprises means which are configured to implement a method for decoding as described in relation with Fig. 2.
  • the network is a broadcast network, adapted to broadcast still images or video images from device A to decoding devices including the device B.
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications.
  • equipment examples include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM").
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention generally relates to a method for encoding an image into a bitstream. The method is characterized in that it comprises: - encoding (1100) into the bitstream an illumination map determined (1000) from the image; and - encoding (1200) into the bitstream a signalization data indicating that the bitstream comprises the illumination map. The invention also relates to a method and device for decoding a bitstream, and to the bitstream itself.

Description

Method and device for encoding a high-dynamic range image into a bitstream and/or decoding a bitstream representing a high-dynamic range image.
1. Field of invention.
The present invention generally relates to image/video encoding and decoding. In particular, the technical field of the present invention is related to encoding of an image whose pixel values belong to a high-dynamic range, and decoding a bitstream representing a high-dynamic range image.
2. Technical background.
The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Low-Dynamic-Range images (LDR images) are images whose luminance values are represented with a limited number of bits (most often 8 or 10). This limited representation does not allow correct rendering of small signal variations, in particular in dark and bright luminance ranges. In high-dynamic-range images (HDR images), the signal representation is extended in order to maintain a high accuracy of the signal over its entire range. In HDR images, pixel values are usually represented in floating-point format (either 32-bit or 16-bit for each component, namely float or half-float), the most popular format being the openEXR half-float format (16 bits per RGB component, i.e. 48 bits per pixel), or in integers with a long representation, typically at least 16 bits. A typical approach for encoding an HDR image is to reduce the dynamic range of the image in order to encode the image by means of a traditional encoding scheme (initially configured to encode LDR images).
According to a first approach, a tone-mapping operator is applied to the input HDR image and the tone-mapped image is then encoded by means of a traditional 8-10 bit depth encoding scheme such as JPEG/JPEG2000 or MPEG-2, H.264/AVC for video ("Advanced video coding for generic audiovisual Services", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.264, Telecommunication Standardization Sector of ITU, January 2012). Then, an inverse tone-mapping operator is applied to the decoded image and a residual image is calculated between the input image and the decoded and inverse-tone-mapped image. Finally, the residual image is encoded by means of a second traditional 8-10 bit depth encoding scheme.
This first approach is backward compatible in the sense that a low dynamic range image may be decoded and displayed by means of a traditional apparatus.
However, this first approach uses two encoding schemes and limits the dynamic range of the input image to twice the dynamic range of a traditional encoding scheme (16-20 bits). Moreover, such an approach sometimes leads to a low dynamic range image with a weaker correlation with the input HDR image, which lowers the coding performance for the image.
According to a second approach, an illumination map is determined from the input HDR image. A residual image is then obtained from the image and the illumination map and both the illumination map and the residual image are directly encoded.
This specific approach for encoding an input HDR image is not backward compatible with a traditional apparatus that is not able to decode and/or display a high-dynamic range image.
Moreover, this approach cannot be used in a usual communication infrastructure in which an HDR image is transmitted between two remote devices, because the traditional transport means of such an infrastructure are not adapted to carry illumination maps.
3. Summary of the invention.
The invention sets out to remedy some of the drawbacks of the prior art with a method for encoding an image into a bitstream comprising:
- encoding into the bitstream an illumination map determined from the image; and
- encoding into the bitstream a signalization data indicating that the bitstream comprises the illumination map.
According to an embodiment of the method, the method further comprises:
- encoding into the bitstream a residual image determined from the image and the illumination map.
According to an embodiment, the illumination map is encoded as an auxiliary picture whose syntax conforms either to the H264/AVC or HEVC standard.
Auxiliary pictures have been defined in the H264/AVC or HEVC standard in addition to the so-called "primary coded picture", which corresponds to the main stream (main video) of the content. Auxiliary pictures usually enable the transport of additional image information such as alpha compositing, chroma enhancement information or depth information for 3D applications.
According to an embodiment, the residual image is encoded as a primary picture whose syntax conforms either to the H264/AVC or HEVC standard.
This makes it possible to obtain a bitstream representing an HDR image which is fully compliant with either the H264/AVC or the HEVC standard: the auxiliary data (i.e. the illumination map) is transmitted in the same order as the primary pictures' coding order. The decoding method of the auxiliary data, which takes place before the display, conforms to the HEVC specification and is thus used as is, in its already specified form.
According to an embodiment, the illumination map is a backlight image and the residual image is obtained by dividing the image by a decoded version of the backlight image.
According to an embodiment, the residual image is tone-mapped before encoding.
This provides a viewable residual image, i.e. a residual image that renders the tone-mapped scene artistically reasonably well and consistently compared to the original scene in the image. This method is thus backward compatible because the viewable residual image may be decoded and/or displayed by a traditional apparatus that is not able to handle high dynamic range. A legacy (non-HDR) H264/AVC or HEVC decoder can simply drop the illumination maps (which are not recognized by this legacy decoder) and only decode the residual images.
Moreover, encoding a high dynamic range image by means of such method leads to an efficient encoding scheme because the tone-mapped residual image (low dynamic range image), which is highly spatially correlated (and temporally correlated with other images of a same sequence of images), and the backlight image are encoded separately. A coding gain is thus reached because of the high compression rate of the tone-mapped residual image and of the little amount of data to encode the backlight image.
According to an embodiment, tone-mapping the residual image comprises either a gamma correction or a SLog correction according to the pixel values of the residual image.
Gamma and SLog corrections, such that there is no loss of dark and bright information, lead to the reconstruction of an HDR image, from the residual image and the backlight image, with high precision. Moreover, gamma and SLog corrections avoid flat clipped areas in both the reconstructed HDR image and the viewable residual image. According to an embodiment, the method further comprises scaling the residual image before encoding.
This puts the mean gray of an image obtained from the residual image at an adequate value for both viewing and coding.
According to an embodiment, the method further comprises clipping the residual image before encoding.
Clipping the residual image ensures a limited number of bits and allows the use of a traditional encoding/decoding scheme for encoding it. Also, the encoding/decoding scheme is backward compatible with existing infrastructure (codec, displays, distribution channels, etc.) because only the residual image, which has a low dynamic range, typically 8-10 bits, may be transmitted over such infrastructure to display a low dynamic range version of the image. The small bitstream, which contains the backlight data, may be carried in a side container over a dedicated infrastructure to distribute the original version of the image (i.e. a HDR image).
According to another of its aspects, the invention relates to a method for decoding a bitstream representing an image comprising:
- detecting in the bitstream if a signalization data indicates that the bitstream comprises data related to an illumination map determined from the image;
- obtaining a decoded illumination map by decoding the bitstream at least partially; and
- obtaining a decoded image from a decoded residual image and the decoded illumination map.
According to an embodiment, the decoded residual image is obtained by decoding the bitstream at least partially.
According to an embodiment, the signalization data is detected from high level syntax elements and its usage is completed by an SEI message.
According to an embodiment, the bitstream comprises a primary picture and an auxiliary picture whose syntax conforms with the standard H264/AVC or HEVC and wherein the primary picture represents the residual image and the auxiliary picture represents the illumination map. According to an embodiment, the decoded illumination map is a backlight image and wherein the decoded image is obtained by multiplying the decoded residual image by the backlight image.
According to an embodiment, the decoded residual image is inverse-tone-mapped before multiplying the decoded residual image by the backlight image.
According to an embodiment of the method, the illumination map is a low-spatial-frequency version of the luminance component of the image to be encoded, and the residual image is obtained by calculating the difference between the luminance component of the image and a decoded version of the encoded low-spatial-frequency version.
According to another of its aspects, the invention relates to a bitstream representing an image, characterized in that it comprises a signalization data indicating that it represents an illumination map determined from the image.
According to another of its aspects, the invention relates to a device for encoding an image and a device for decoding a bitstream which implements the above methods.
The specific nature of the invention as well as other objects, advantages, features and uses of the invention will become evident from the following description of a preferred embodiment taken in conjunction with the accompanying drawings.
4. List of figures. The embodiments will be described with reference to the following figures:
- Fig. 1 shows a block diagram of the steps of a method for encoding an image into a bitstream in accordance with an embodiment of the invention;
- Fig. 2 represents a block diagram of a method for decoding a bitstream F representing an image in accordance with an embodiment of the invention;
- Fig. 3 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention;
- Fig. 4 shows a block diagram of the sub-steps of the step 1020 in accordance with an embodiment of the invention;
- Fig. 5 shows a block diagram of the sub-steps of the step 1020 in accordance with an embodiment of the invention;
- Fig. 6 shows a block diagram of the sub-steps of the step 1020 in accordance with an embodiment of the invention;
- Fig. 7 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention;
- Fig. 8 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention;
- Fig. 9 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention;
- Fig. 10 shows an example of an architecture of a device in accordance with an embodiment of the invention; and
- Fig. 11 shows two remote devices communicating over a communication network in accordance with an embodiment of the invention.
5. Detailed description of preferred embodiments of the invention.
The present invention will be described more fully hereinafter with reference to the accompanying figures, in which embodiments of the invention are shown. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein. Accordingly, while the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the invention to the particular forms disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the claims. Like numbers refer to like elements throughout the description of the figures.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises", "comprising," "includes" and/or "including" when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Moreover, when an element is referred to as being "responsive" or "connected" to another element, it can be directly responsive or connected to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly responsive" or "directly connected" to another element, there are no intervening elements present. As used herein the term "and/or" includes any and all combinations of one or more of the associated listed items and may be abbreviated as "/".
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element without departing from the teachings of the disclosure.
Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
Some embodiments are described with regard to block diagrams and operational flowcharts in which each block represents a circuit element, module, or portion of code which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in other implementations, the function(s) noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.
Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" or "according to an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
While not explicitly described, the present embodiments and variants may be employed in any combination or sub-combination.
The invention is described for encoding/decoding an image but extends to the encoding/decoding of a sequence of images (video) because each image of the sequence is sequentially encoded/decoded as described below.
Fig. 1 shows a block diagram of the steps of a method for encoding an image into a bitstream in accordance with an embodiment of the invention.
In step 1000, a module PRP determines an illumination map IM from the image I to be encoded.
An illumination map gathers illumination data relative to the pixels of the image I to be encoded. For example, the illumination map may comprise a triplet of illumination values for each pixel of the image, each value of a triplet being an illumination value for a color component value of a pixel. In what follows, the illumination map is described as being either a backlight image or a low-frequency version of the luminance component of the image to be encoded but the invention is not limited to any specific representation of illumination values relative to an image to be encoded. In step 1100, an encoder ENC1 encodes the illumination map IM into a bitstream F.
In step 1200, a module SM encodes into the bitstream F a signalization data SD indicating that the bitstream F comprises the illumination map IM.
According to an embodiment, the signalization data SD further comprises parameters related to the illumination map structure, and parameters related to the process to be applied to the decoded illumination map to reconstruct the image I.
According to an embodiment, the illumination map structure parameters comprise at least an image spatial resolution and an image sample bit depth.
According to an embodiment of the method, in step 1300, an encoder ENC2 encodes into the bitstream F a residual image Rl determined from the image I and the illumination map IM.
Then, according to this embodiment, the signalization data SD is also adapted to synchronize the residual image with the illumination map in order, for example, to obtain a decoded image.
The bitstream F comprises the signalization data SD, the illumination map IM and, according to an embodiment, the residual image Rl. The bitstream F may be stored on a local or remote memory and/or transmitted through a communication interface (e.g. to a bus or over a communication network or a broadcast network).
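For illustration only, a minimal Python sketch of this encoding flow is given below; the dictionary layout, the helper name determine_illumination_map and the crude per-image mean used as illumination map are assumptions, and the actual encoders ENC1/ENC2 (H264/AVC or HEVC) are not modelled:

import numpy as np

def determine_illumination_map(image):                        # step 1000 (module PRP)
    # Crude stand-in: a constant map equal to the mean luminance of the image.
    luma = 0.2127 * image[..., 0] + 0.7152 * image[..., 1] + 0.0722 * image[..., 2]
    return np.full_like(luma, luma.mean())

def encode_image(image):
    im = determine_illumination_map(image)                     # step 1000
    bitstream = {"IM": im}                                     # step 1100 (encoder ENC1)
    bitstream["SD"] = {"has_illumination_map": True,           # step 1200 (module SM)
                       "width": im.shape[1], "height": im.shape[0]}
    bitstream["RI"] = image / np.maximum(im[..., None], 1e-6)  # step 1300 (encoder ENC2)
    return bitstream

hdr = np.random.rand(4, 4, 3).astype(np.float32) * 4000.0      # toy HDR image
print(sorted(encode_image(hdr).keys()))                        # ['IM', 'RI', 'SD']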
Fig. 2 represents a block diagram of a method for decoding a bitstream F representing an image in accordance with an embodiment of the invention. The bitstream F may be obtained by the method of encoding an image as described in relation with Fig. 1 .
In step 2000, a module SMD detects in the bitstream F if a signalization data SD indicates that the bitstream F comprises data related to an illumination map determined from the image to be decoded.
In that case, in step 2100, a decoder DEC1 obtains the illumination map DIM by decoding the bitstream F at least partially. Potentially, parameters are also obtained by decoding the bitstream F.
According to an embodiment, these parameters are obtained from an SEI message in the bitstream F.
According to a variant of the embodiment, the signalization data is detected from high level syntax elements and its usage is completed by an SEI message.
In step 2200, a decoder DEC2 obtains a decoded residual image DRI from a memory or, according to an embodiment, by decoding the bitstream F at least partially.
In step 2300, a module POP obtains a decoded image Î from the decoded residual image DRI and the decoded illumination map DIM.
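Correspondingly, a minimal sketch of the decoding flow, assuming the same toy dictionary layout as in the encoding sketch above (the actual decoders DEC1/DEC2 are not modelled):

import numpy as np

def decode_bitstream(bitstream):
    sd = bitstream.get("SD", {})
    if not sd.get("has_illumination_map", False):    # step 2000 (module SMD)
        return bitstream["RI"]                        # legacy path: residual image only
    dim = bitstream["IM"]                             # step 2100 (decoder DEC1)
    dri = bitstream["RI"]                             # step 2200 (decoder DEC2)
    return dri * dim[..., None]                       # step 2300 (module POP)

toy = {"SD": {"has_illumination_map": True},
       "IM": np.full((2, 2), 1000.0),
       "RI": np.full((2, 2, 3), 0.5)}
print(decode_bitstream(toy)[0, 0])                    # [500. 500. 500.]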
According to an embodiment of step 1100, the encoder ENC1 is configured to encode the illumination map IM as an auxiliary picture whose syntax is defined by the standard H264/AVC or HEVC (B. Bross, W.J. Han, G. J. Sullivan, J.R. Ohm, T. Wiegand, JCTVC-K1003, "High Efficiency Video Coding (HEVC) text specification draft 9," Oct 2012), and the decoder DEC1 is configured to obtain (step 2100) an illumination map DIM from an auxiliary picture whose syntax is defined by the standard H264/AVC or HEVC.
The bitstream F then comprises an auxiliary picture whose syntax conforms with the standard H264/AVC or HEVC and the auxiliary picture represents the illumination map IM.
An auxiliary picture may be implemented by specifying new Video Coding Layer (VCL) NAL unit type(s) as done in H.264/AVC (cf. Recommendation ITU-T H.264, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, 03/2009). In H.264/AVC, an auxiliary picture corresponds to NAL unit type 19, as shown in the table below. The syntax and decoding process of an auxiliary picture are exactly identical to the syntax and decoding process of a primary (non-auxiliary) coded picture. In other words, the decoding of the data related to the auxiliary pictures uses the same decoding syntax and engines as the decoding of the primary coded pictures.
[Table: H.264/AVC NAL unit types; the auxiliary coded picture corresponds to NAL unit type 19.]
Alternatively, an auxiliary coded picture may be implemented as a specific layer in a scalable encoding. This is what is recommended for HEVC, as explained in document JCTVC-O0041: REXT/MV-HEVC/SHVC HLS: auxiliary picture layers, and specified in the most recent specification of the scalable extension of HEVC, document JCTVC-O1008, High efficiency video coding (HEVC) scalable extension Draft 4, November 2013. In this case, it is signaled in the high level syntax of the bitstream (namely in the Video Parameter Set - VPS) that an enhancement scalable layer is added to the base layer which comprises the primary coded pictures (corresponding in the invention to the residual pictures). The scalability type of the enhancement scalable layer indicates that it corresponds to auxiliary coded pictures, using a syntax element scalability_mask_flag corresponding to the 'scalability mask index' as shown in the table below.

scalability mask index    Scalability dimension       ScalabilityId mapping
0                         Reserved
1                         Multiview                   View Order Index
2                         spatial/SNR scalability     DependencyId
3                         Auxiliary                   AuxId
4-15                      Reserved
In addition, a parameter 'AuxId', indicating the type of the auxiliary picture, is derived from the ScalabilityId parameter, which is itself deduced from a syntax element dimension_id as explained in section F.7.4.3.1.1: Video parameter set extension semantics of document JCTVC-O1008.
According to an embodiment of step 1300, the encoder ENC2 is configured to encode the residual image Rl as a primary picture whose syntax conforms either to the H264/AVC or HEVC standard, and the decoder DEC2 is configured to obtain (step 2200) a decoded residual image from a primary picture whose syntax is defined by the standard H264/AVC or HEVC. The primary coded and auxiliary coded pictures commonly use the syntax and decoding process specified in the AVC or HEVC specifications. There is no specific decoding process related to each type of picture. The main differences relate to high level syntax (e.g. NAL Unit Type, or scalability_mask_flag, as explained above).
In this configuration, according to an embodiment, a specific value of the parameter AuxId may be defined to indicate the nature of the auxiliary picture. In particular, a specific value can be defined for the illumination map, as shown in the table below. As an example, AuxId equal to 1 corresponds to an alpha channel, AuxId equal to 2 corresponds to a depth map, and AuxId equal to 3 corresponds to an illumination map.
In another embodiment, the value of AuxId for the illumination map is the same as for the alpha channel. Indeed, the usage of the alpha channel is partly similar to the usage of a backlight channel, since it consists in multiplying and scaling the input primary picture by the alpha map, with a final clipping operation to guarantee that the signal stays inside the min and max signal limits. The alpha channel concept can therefore simply be adapted to be used for the backlight map. The difference can come from different scaling and clipping values, signaled in the accompanying SEI message.
Table F-1 - Mapping of AuxId to the type of auxiliary pictures
AuxId    Type of auxiliary picture
1        Alpha channel
2        Depth map
3        Illumination map
The bitstream F then further comprises a primary picture whose syntax conforms with the standard H264/AVC or HEVC and the primary picture represents the residual image Rl.
According to an embodiment of the step 1200, the signalization data SD is carried by an SEI message indicated according to a syntax conforming with JCTVC-O1008. Then, according to this embodiment, the signalization data SD is detected in step 2000 from the SEI message.
The SEI message may be used to carry parameters and/or indication for obtaining a decoded image from a decoded version of the auxiliary picture and a decoded version of the primary picture (decoded residual image).
As an example, the syntax to signal an SEI message (sei_payload) is based on the syntax given by JCTVC-O0041/JCTVC-F0031:

sei_payload( payloadType, payloadSize ) {                                Descriptor
   if( nal_unit_type = = PREFIX_SEI_NUT )
      else if( payloadType = = 138 )
         auxiliary_hdr_channel_info( payloadSize )
      else
         reserved_sei_message( payloadSize )
   else /* nal_unit_type = = SUFFIX_SEI_NUT */
      else if( payloadType = = 138 )
         auxiliary_hdr_channel_info( payloadSize )
      else
         reserved_sei_message( payloadSize )
   if( more_data_in_payload( ) ) {
      if( payload_extension_present( ) )
         reserved_payload_extension_data                                 u(v)
      payload_bit_equal_to_one /* equal to 1 */                          f(1)
      while( !byte_aligned( ) )
         payload_bit_equal_to_zero /* equal to 0 */                      f(1)
   }
}
Table 1
For example, the parameters and/or indication for obtaining a decoded image may be one of the following:
• Color format
• Bit depth of the samples (may be different for the different color components)
• Picture size
• Picture sample topology (e.g. regular sample topology or quincunx sample topology)
• Scaling factor cstscaling
• Parameter γ of a gamma-Slog curve
• Parameters of a backlight image (ai, ψi), where ψi correspond, for instance, to shape functions, and ai to weighting parameters associated with each of the ψi functions
• Reconstruction mode for obtaining a decoded image
• Minimum and maximum clipping values of the reconstructed (HDR) image
Using the HEVC standard, the syntax of the SEI message relative to these parameters is given, for example, by:

auxiliary_hdr_channel_info( payloadSize ) {                              Descriptor
   hdr_illumination_picture_color_format                                 ue(v)
   hdr_illumination_picture_bit_depth_minus8                             ue(v)
   hdr_illumination_picture_width                                        ue(v)
   hdr_illumination_picture_height                                       ue(v)
   hdr_illumination_picture_scaling_type                                 ue(v)
   if( hdr_illumination_picture_scaling_type == 1 )
      hdr_illumination_shape_function_id                                 ue(v)
   if( hdr_illumination_picture_scaling_type == 2 ) {
      hdr_shape_function_size_x                                          ue(v)
      hdr_shape_function_size_y                                          ue(v)
      for( cy = 0; cy < hdr_shape_function_size_y; cy++ )
         for( cx = 0; cx < hdr_shape_function_size_x; cx++ )
            hdr_shape_function[ cy ][ cx ]                               ue(v)
   }
   hdr_ldr_scaling_factor                                                ue(v)
   hdr_ldr_gamma_slog_parameters                                         ue(v)
   hdr_reconstruction_mode                                               ue(v)
}
Table 2
In table 2, the parameter hdr_illumination_picture_color_format specifies the color format of the illumination map: 4:2:0, 4:2:2 or 4:4:4 colour format, or monochrome format (a single sample array composing the picture).
[Table: values of hdr_illumination_picture_color_format]
In table 2, the parameter hdr_illumination_picture_bit_depth_minus8 specifies the bit depth of the illumination map: hdr_illumination_picture_bit_depth = 8 + hdr_illumination_picture_bit_depth_minus8.
According to an embodiment, one bit-depth is signaled for the luma component and one for the chroma components.
In table 2, the parameter hdr_illumination_picture_width specifies the horizontal size of the illumination map, the parameter hdr_illumination_picture_height specifies the vertical size of the illumination map, and the parameter hdr_illumination_picture_scaling_type specifies a scaling process of the illumination map to get a full definition image.
Table 3 [Table: values of hdr_illumination_picture_scaling_type]
In table 3, the parameter hdr_shape_function_size_x specifies the width of a shape function (ψi) used to determine a backlight image, the parameter hdr_shape_function_size_y specifies the height of a shape function used to determine a backlight image, the parameter hdr_shape_function[cy][cx] gives the value of a scaling filter coefficient at the position (cy, cx), the parameter hdr_ldr_gamma_slog_parameters specifies the parameters of an inverse tone-mapping, and the parameter hdr_ldr_scaling_factor specifies a scaling process of the residual image to get a full definition image. The default value is equal to 120.
According to an embodiment, the width and height of the shape function depend on:
• its position in the picture
• the value of the auxiliary picture to which it applies
• the size of the block to which it applies
• the color component to which it applies
For instance, one shape function may be defined per block size. In another example, one shape function may be defined for the luma component, and one for the chroma components.
In another example, one shape function may be defined per luma value range. For instance, for luma between 0 and 127, a first shape function applies, with a large width and height. For luma between 128 and 255, a second shape function applies, with a smaller width and height in order to limit the propagation of large values to the neighboring areas.
These parameters are discussed in details in what follows.
It may be noted that the parameters specified in the SEI message maintain their validity until a new SEI message is received. In that case, the previous parameter values are overwritten by the new ones.
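For illustration only, the semantics of Table 2 could be mirrored on the decoder side by a small parameter container such as the hypothetical Python structure below; the field names follow the SEI syntax elements, the default values are assumptions, and the ue(v) parsing of the actual payload bits is not modelled:

from dataclasses import dataclass, field
from typing import List

@dataclass
class AuxiliaryHdrChannelInfo:
    color_format: int = 1            # hdr_illumination_picture_color_format
    bit_depth_minus8: int = 0        # bit depth = 8 + bit_depth_minus8
    width: int = 0                   # hdr_illumination_picture_width
    height: int = 0                  # hdr_illumination_picture_height
    scaling_type: int = 0            # hdr_illumination_picture_scaling_type
    shape_function: List[List[int]] = field(default_factory=list)  # hdr_shape_function[cy][cx]
    ldr_scaling_factor: int = 120    # hdr_ldr_scaling_factor (default value 120)
    gamma_slog_parameters: int = 0   # hdr_ldr_gamma_slog_parameters
    reconstruction_mode: int = 0     # hdr_reconstruction_mode

    @property
    def bit_depth(self) -> int:
        return 8 + self.bit_depth_minus8

params = AuxiliaryHdrChannelInfo(width=1920, height=1080, scaling_type=2)
print(params.bit_depth, params.ldr_scaling_factor)   # 8 120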
Fig. 3 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention.
In step 1010, a module IC obtains the luminance component L and potentially at least one color component C(i) of the image I to be encoded.
For example, when the image I belongs to the color space (Χ,Υ,Ζ), the luminance component L is obtained by a transform f(.) of the component Y, e.g. L=f(Y).
When the image I belongs to the color space (R,G,B), the luminance component L is obtained, for instance in the 709 gamut, by a linear combination which is given by:
L = 0.2127·R + 0.7152·G + 0.0722·B
In step 1020, a module BAM determines a backlight image Bal from the luminance component L of the image I.
The backlight image Bal is the illumination map IM according to this embodiment of the step 1000.
According to an embodiment of the step 1020, illustrated in Fig. 4, a module BI determines a backlight image Ba as being a weighted linear combination of shape functions ψi given by:
Ba = Σi ai·ψi   (1)
with ai being weighting coefficients. Thus, determining a backlight image Ba from a luminance component L consists in finding optimal weighting coefficients (and potentially also optimal shape functions if not known beforehand) in order that the backlight image Ba fits the luminance component L.
There are many well-known methods to find the weighting coefficients ai. For example, one may use a least mean square method to minimize the mean square error between the backlight image Ba and the luminance component L.
The invention is not limited to any specific method to obtain the backlight image Ba.
It may be noted that the shape functions may be the true physical response of a display backlight (made of LED's for instance, each shape function then corresponding to the response of one LED) or may be a pure mathematical construction in order to fit the luminance component at best.
According to this embodiment, the backlight image Bal, output from step 1020, is the backlight image Ba given by equation (1 ).
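As an illustration of one possible way to obtain the weighting coefficients ai, the sketch below fits the backlight image to a toy luminance component by least squares, which the text mentions as one option; the Gaussian shape functions ψi used here are only an assumption:

import numpy as np

H, W, grid = 32, 32, 4
ys, xs = np.mgrid[0:H, 0:W]
centers = [(H * (i + 0.5) / grid, W * (j + 0.5) / grid)
           for i in range(grid) for j in range(grid)]
psi = np.stack([np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * (H / grid) ** 2))
                for cy, cx in centers])                  # shape functions psi_i
L = np.random.rand(H, W) * 100.0                         # toy luminance component

A = psi.reshape(len(centers), -1).T                      # one column per psi_i
a, *_ = np.linalg.lstsq(A, L.ravel(), rcond=None)        # weighting coefficients ai
Ba = (A @ a).reshape(H, W)                               # backlight image Ba
print(float(np.abs(Ba - L).mean()))                      # mean fitting error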
According to an embodiment of the step 1020, illustrated in Fig. 5, a module BM modulates the backlight image Ba (given by equation (1)) with a mean luminance value Lmean of the image I obtained by means of a module HL.
According to this embodiment, the backlight image Bal, output from step 1020, is the modulated backlight image.
According to an embodiment, the module HL is configured to calculate the mean luminance value Lmean over the whole luminance component L.
According to an embodiment, the module HL is configured to calculate the mean luminance value Lmean by
Lmean = E(L^β)^(1/β)
with β being a coefficient less than 1 and E(·) the mathematical expectation value (mean), here taken over the luminance component L.
This last embodiment is advantageous because it avoids the mean luminance value Lmean being influenced by a few pixels with extremely high values, which usually leads to very annoying temporal mean brightness instability when the image I belongs to a sequence of images.
The invention is not limited to a specific embodiment for calculating the mean luminance value Lmean.
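A small numerical illustration of this robust mean (β = 0.3 is an arbitrary illustrative value; the only constraint stated above is β < 1):

import numpy as np

# Lmean = E(L**beta) ** (1/beta): with beta < 1, a handful of extreme pixels
# pulls the result far less than it pulls the ordinary arithmetic mean.
L = np.concatenate([np.full(990, 50.0), np.full(10, 10000.0)])  # 1% extreme pixels
beta = 0.3
lmean = np.mean(L ** beta) ** (1.0 / beta)
print(round(L.mean(), 1), round(lmean, 1))   # ~149.5 (arithmetic) vs ~57 (robust)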
According to a variant of this embodiment, illustrated in Fig. 6, a module N normalizes the backlight image Ba (given by equation (1)) by its mean value E(Ba) such that one gets a mid-gray-at-one backlight image Bagray for the image (or for all images if the image I belongs to a sequence of images):
Bagray = Ba / E(Ba)
Then, the module BM is configured to modulate the mid-gray-at-one backlight image Bagray with the mean luminance value Lmean of the image, by using the following relation:
Bamod ≈ cstmod · Lmean^α · Bagray   (2)
with cstmod being a modulation coefficient and α being another modulation coefficient less than 1, typically 1/3.
According to this variant, the backlight image Bal, output from step 1020, is the modulated backlight image Bamod given by equation (2).
It may be noted that the modulation coefficient cstmod is tuned to get a good looking brightness for the residual image and highly depends on the process used to obtain the backlight image. For example, cstmod ≈ 1.7 for a backlight image obtained by least mean squares.
Practically, by linearity, all operations to modulate the backlight image apply to the backlight coefficients ai as a correcting factor which transforms the coefficients ai into new coefficients ãi such that one gets
Bamod = Σi ãi·ψi
According to this embodiment of the step 1000, in step 1100, the data needed to determine the backlight image Bal, output from step 1020, are encoded by means of the encoder ENC1 and added in the bitstream F. According to an embodiment, these data are embedded in an SEI message as explained before.
For example, the data to be encoded are limited to the weighting coefficients ai or ãi when known non-adaptive shape functions are used, but the shape functions ψi may also be a priori unknown and then encoded in the bitstream F, for instance in the case of a somewhat optimal mathematical construction for better fitting. So, all the weighting coefficients ai or ãi (and potentially the shape functions ψi) are encoded in the bitstream F.
Advantageously, the weighting coefficients ai or ãi are quantized before being encoded in order to reduce the size of the bitstream F.
In step 1030, a residual image Res is calculated by dividing the image by a decoded version Ba of the backlight image.
It is advantageous to use the decoded version Ba of the backlight image to ensure the same backlight image on both encoder and decoder side, thus leading to a better precision of a final decoded image Î.
More precisely, the luminance component L and potentially each colour component C(i) of the image I, obtained from the module IC, is divided by the decoded version Ba of the backlight image. This division is done pixel per pixel.
For example, when the components R, G or B of the image I are expressed in the color space (R,G,B), the components Rres, Gres and Bres are obtained as follows:
Rres = R/Ba, Gres = G/Ba, Bres = B/Ba
For example, when the components X, Y or Z of the image I are expressed in the color space (X,Y,Z), the components Xres, Yres and Zres are obtained as follows:
Xres = X/Ba, Yres = Y/Ba, Zres = Z/Ba
According to a variant of step 1030, the decoded version Ba of the backlight image is processed before obtaining the residual image Res.
The process applied to the decoded version Ba of the backlight image may, for instance, be used to generate a processed backlight image of same resolution as its corresponding residual image. In what follows, the term 'decoded version Ba of the backlight image' will be used indifferently to represent the processed or non-processed decoded version Ba of the backlight image.
According to an embodiment, the said processed decoded version Ba of the backlight image is obtained from the decoded version Ba of the backlight image using parameters signaled in the said signalization data SD.
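A minimal sketch of producing such a processed backlight at the residual resolution, assuming a simple nearest-neighbour upsampling (the actual scaling process and its parameters are those signaled in the signalization data SD):

import numpy as np

# Upsample a coarse decoded backlight to the resolution of the residual image
# (nearest-neighbour replication; the real process is driven by SD parameters).
coarse_ba = np.array([[1.0, 2.0],
                      [3.0, 4.0]])                      # decoded backlight, 2x2
scale_y, scale_x = 4, 4                                  # residual / backlight resolution ratio
processed_ba = np.kron(coarse_ba, np.ones((scale_y, scale_x)))
print(processed_ba.shape)                                # (8, 8): same grid as the residual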
According to an embodiment, in step 2100, the decoded version Ba of the backlight image is obtained by decoding at least partially the bitstream F by means of the decoder DEC1 .
As explained before, some data needed to obtain the backlight image, output of step 1020, have been encoded (step 1100) and are then obtained by at least partially decoding the bitstream F.
Following the example given above, the weighting coefficients ãi (and potentially the shape functions ψi) are then obtained as output of step 2100.
Then, in step 1070, a module BAG generates a decoded version Ba of the backlight image from the weighting coefficients ãi and either some known non-adaptive shape functions or the shape functions ψi by:
Ba = Σi ãi·ψi
In step 1040, a module TMO tone-maps the residual image Res in order to get a viewable residual image Resv.
It may appear that the residual image Res is not viewable because its dynamic range is too high and because a decoded version of this residual image Res shows too visible artifacts. Tone-mapping the residual image remedies at least one of these drawbacks.
The invention is not limited to any specific tone-mapping operator.
The single condition is that the tone-mapping operator shall be reversible.
For example, the tone-mapping operator defined by Boitard may be used (Boitard, R., Bouatouch, K., Cozot, R., Thoreau, D., & Gruson, A. (2012). Temporal coherency for video tone mapping. In A. M. J. van Eijk, C. C. Davis, S. M. Hammel, & A. K. Majumdar (Eds.), Proc. SPIE 8499, Applications of Digital Image Processing (p. 84990D-84990D-10)). According to this embodiment of the step 1000, in step 1300, the encoder ENC2 is configured to encode the viewable residual image Resv in the bitstream F.
According to an embodiment of the step 1040, tone mapping the residual image comprises either a gamma correction or a SLog correction according to the pixel values of the residual image.
The viewable residual image Resv is then given, for example, by:
Resv = A · Res^γ
with A being a constant value and γ being a coefficient of a gamma curve equal, for example, to 1/2.4.
Alternatively, the viewable residual image Resv is given, for example, by:
Resv = a·ln(Res + b) + c
with a, b and c being coefficients of an SLog curve determined such that 0 and 1 are invariant, and the derivative of the SLog curve is continuous in 1 when prolonged by a gamma curve below 1. Thus, a, b and c are functions of the parameter γ.
According to an embodiment, the parameter γ of the gamma-Slog curve is encoded in the bitstream F.
Applying a gamma correction on the residual image Res pulls up the dark regions but does not lower the highlights enough to avoid burning of bright pixels.
Applying an SLog correction on the residual image Res lowers the highlights enough but does not pull up the dark regions.
Then, according to an embodiment of the step 1040, the module TMO applies either the gamma correction or the SLog correction according to the pixel values of the residual image Res.
For example, when the pixel value of the residual image Res is below a threshold (equal to 1 ), then the gamma correction is applied and otherwise the SLog correction is applied. By construction, the viewable residual image Resv usually has a mean value more or less close to 1 depending on the brightness of the image I, making the use of the above gamma-Slog combination particularly efficient.
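A small sketch of this pixel-wise gamma/SLog switch; the constants A, a, b and c below are illustrative placeholders chosen so that the two branches roughly join at Res = 1, not the exact values derived from γ in the method:

import numpy as np

def tone_map_residual(res, gamma=1.0 / 2.4, A=1.0, a=0.433, b=0.04, c=0.983):
    # Gamma correction below the threshold 1, SLog correction above it.
    res = np.asarray(res, dtype=np.float64)
    gamma_part = A * np.power(np.maximum(res, 0.0), gamma)   # Resv = A * Res**gamma
    slog_part = a * np.log(res + b) + c                       # Resv = a*ln(Res + b) + c
    return np.where(res < 1.0, gamma_part, slog_part)

print(tone_map_residual([0.1, 0.5, 1.0, 4.0]).round(3))       # dark pixels pulled up, highlights compressed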
According to an embodiment of the method, in step 1050, a module SCA scales the viewable residual image Resv before encoding (step 1300) by multiplying each component of the viewable residual image Resv by a scaling factor cstscaling. The resulting residual image Ress is then given by
Ress = cstscaling · Resv
In a specific embodiment, the scaling factor cstscaling is defined to map the values of the viewable residual image Resv from 0 to the maximum value 2^N − 1, where N is the number of bits allowed as input for the coding by the encoder ENC2.
This is naturally obtained by mapping the value 1 (which is roughly the mean value of the viewable residual image Resv) to the mid-gray value 2^(N−1). Thus, for a viewable residual image Resv with a standard number of bits N=8, a scaling factor equal to 120 is a very consistent value because it is very close to the neutral gray at 2^7 = 128.
According to this embodiment of the method, in step 1300, the encoder ENC2 is configured to encode the residual image Ress.
According to an embodiment of the method, in step 1060, a module CLI clips the viewable residual image Resv before encoding to limit its dynamic range to a targeted dynamic range TDR which is defined, for example, according to the capabilities of the encoder ENC2.
According to this last embodiment, the resulting residual image Resc is given, for example, by:
Resc = max(2^N, Resv)
or
Resc = max(2^N, Ress)
according to the embodiments of the method.
The invention is not limited to such clipping (max(.)) but extends to any kind of clipping.
According to this embodiment of the method, in step 1300, the encoder ENC2 is configured to encode the residual image Resc. Combining the scaling and clipping embodiments leads to a residual image Ressc given by:
Ressc = max(2^N, cstscaling·Resv)
or by
Ressc = max(2^N, cstscaling·Ress)
according to the embodiments of the method.
According to this embodiment of the method, in step 1300, the encoder ENC2 is configured to encode the residual image Ressc.
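A short sketch of the scaling and clipping steps for an N-bit encoder input; the clipping is written here as a plain upper bound at 2^N − 1, which is one possible choice since the method is not limited to a particular kind of clipping:

import numpy as np

def scale_and_clip(res_v, n_bits=8, cst_scaling=120.0):
    # Scale the viewable residual, then limit it to the N-bit range of ENC2.
    res_s = cst_scaling * np.asarray(res_v, dtype=np.float64)   # Ress = cstscaling * Resv
    return np.clip(res_s, 0.0, 2 ** n_bits - 1)

# the value 1 (roughly the mean of Resv) maps close to the mid-gray 2**(N-1) = 128
print(scale_and_clip([0.2, 1.0, 3.0]))   # [ 24. 120. 255.]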
The tone-mapping and scaling of the viewable residual image Resv is a parametric process. The parameters may be fixed or not, and in the latter case they may be encoded in the bitstream F by means of the encoder ENC1.
According to an embodiment of the method, the constant value γ of the gamma correction and the scaling factor cstscaling may be parameters which are encoded in the bitstream F.
It may be noted that the choice of the parameters α, cstmod, cstscaling, γ gives room for the choice of the tone-mapping which suits the content the best, following the taste of an expert in post-production and color grading.
On the other hand, universal parameters may be defined in order to be acceptable for all of a large variety of images. Then, no parameters are encoded in the bitstream F.
According to an embodiment, at least one parameter among α, cstscaling and γ is embedded in an SEI message as explained before.
According to this embodiment of the step 1000, the residual image Rl is either the viewable residual image Resv or Ress or Resc.
Fig. 7 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention.
As explained above, in steps 2100 and 1070, a backlight image Ba~ (a decoded illumination map) is obtained by at least partially decoding the bitstream F by means of the decoder DEC1.
The bitstream F may have been stored locally or received from a communication network. In step 2200, a decoded residual image Res~ is obtained by an at least partial decoding of the bitstream F by means of a decoder DEC2.
As explained below, the decoded residual image Res~ is viewable by a traditional apparatus.
In step 2340, a decoded image Î is obtained by multiplying the decoded residual image Res~ by the backlight image Ba~.
According to a variant of this embodiment, the backlight image Ba~ is processed before obtaining the decoded image Î.
The process applied to the backlight image Ba~ may, for instance, be used to generate a processed backlight image of the same resolution as its corresponding decoded residual image Res~. In what follows, the term 'backlight image Ba~' will be used indifferently to represent the processed or non-processed backlight image Ba~.
According to an embodiment, the said processed backlight image Ba~ is obtained from the backlight image Ba~ using parameters signaled in the said signalization data SD.
According to an embodiment of step 2100, the parameters γ and/or cstscaling are also obtained either from a local memory or by an at least partial decoding of the bitstream BF by means of the decoder DEC1.
According to the method, in step 2310, a module ISCA applies an inverse scaling to the decoded residual image Res~ by dividing the decoded residual image Res~ by the parameter cstscaling.
In step 2320, a module ITMO applies an inverse tone-mapping to the decoded residual image Res~ by means of the parameter γ.
For example, the parameter γ defines a gamma curve and the inverse-tone-mapping is just to find, from the gamma curve, the values which correspond to the pixel values of the decoded residual image Res~ .
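Putting the steps of Fig. 7 together, a minimal sketch of the reconstruction is given below (inverse scaling, inverse tone-mapping restricted here to the gamma branch for simplicity, then multiplication by the backlight image); the numerical values are purely illustrative:

import numpy as np

def reconstruct(res_dec, ba_dec, cst_scaling=120.0, gamma=1.0 / 2.4):
    # ISCA inverse scaling, ITMO inverse gamma, then multiplication by Ba~.
    res = np.asarray(res_dec, dtype=np.float64) / cst_scaling    # step 2310: inverse scaling
    res = np.power(np.maximum(res, 0.0), 1.0 / gamma)            # step 2320: inverse gamma
    return res * np.asarray(ba_dec, dtype=np.float64)            # step 2340: I = Res~ * Ba~

res_dec = np.array([[24.0, 120.0], [200.0, 255.0]])              # decoded residual (8-bit range)
ba_dec = np.array([[500.0, 500.0], [2000.0, 2000.0]])            # decoded backlight values
print(reconstruct(res_dec, ba_dec).round(1))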
Fig. 8 shows a block diagram of the sub-steps of the step 1000 in accordance with an embodiment of the invention.
According to this embodiment, the image I to be encoded is split into multiple image blocks B and each image block B is considered as follows. In step 1080, a module IC obtains each component of the image block B to be encoded. The image block B comprises a luminance component L and potentially at least one colour component C(i), with i an index which identifies a colour component of the image block B. The components of the image block B belong to a perceptual space, usually a 3D space, i.e. the image block B comprises a luminance component L and potentially at least one colour component C(i), for example two, called C1 and C2 in what follows.
But the invention is limited neither to a grey image (no colour component) nor to an image with one, two or more colour components. When a grey level image is encoded as described below, the parts of the description which refer to the colour components do not apply.
A perceptual space has a metric d((L, C1, C2), (L', C1', C2')) whose values are representative of, preferably proportional to, the differences between the visual perceptions of two points of said perceptual space.
Mathematically speaking, the metric d((L,C1,C2),(L',C1',C2')) is defined such that a perceptual threshold ΔE0 (also referred to as the JND, Just Noticeable Difference) exists below which a human being is not able to perceive a visual difference between two colours of the perceptual space, i.e.
d((L,C1,C2),(L',C1',C2')) < ΔE0   (3)
and this perceptual threshold is independent of the two points (L,C1,C2) and (L',C1',C2') of the perceptual space.
Thus, encoding an image whose components belong to a perceptual space in order that the metric d of equation (3) stays below the perceptual threshold ΔE0 ensures that the displayed decoded version of the image is visually lossless.
According to an embodiment, the metric may be calculated on a pixel base.
It may be noted that, in practice, it is easier to control the three following inequalities individually:
d(L, L') < ΔE0^L, d(C1, C1') < ΔE0^C1 and d(C2, C2') < ΔE0^C2
It may be noted that, if the equation (3) is fulfilled with a perceptual threshold greater than ΔE0, it is said, in what follows, that the encoded image is visually controlled, i.e. the visual losses in a displayed decoded version of this image are controlled.
When the image I comprises components belonging to a non-perceptual space such as (R,G,B) for example, a perceptual transform is applied to the image I in order to obtain a luminance component L and potentially two colour components C1 and C2 which belong to the perceptual space.
Such a perceptual transform is defined from the lighting conditions of the display and depends on the initial colour space.
For example, assuming the initial space is the (R,G,B) colour space, the image I is first transformed into the well-known linear space (X, Y, Z) (an inverse gamma correction may potentially be needed), and the resulting image is then transformed using reference lighting conditions of the display of a decoded version of the encoded image, which are here a 3D vector of values (Xn, Yn, Zn) in the (X,Y,Z) space.
Consequently, for example, such a perceptual transform is defined as follows when the perceptual space LabCIE1976 is selected:
L* = 116f(Y/Yn) - 16
a* = 500(f(X/Xn) - f(Y/Yn))
b* = 200(f(Y/Yn) - f(Z/Zn))
where f is a conversion function, for example, given by:
f(r) = r^(1/3) if r > (6/29)^3
f(r) = (1/3)·(29/6)^2·r + 4/29 otherwise
The following metric may be defined on the perceptual space LabCIE1976:
d((L*, a*, b*), (L*', a*', b*'))² = (ΔL*)² + (Δa*)² + (Δb*)² < (ΔE0)²
with ΔL* being the difference between the luminance components of the two colours (L*, a*, b*) and (L*', a*', b*') and Δa* (respectively Δb*) being the difference between the colour components of these two colours. According to another example, when the perceptual space Lu*v* is selected, a perceptual transform is defined as follows:
u* = 13L·(u' − u'white) and v* = 13L·(v' − v'white)
where
u' = 4X / (X + 15Y + 3Z), v' = 9Y / (X + 15Y + 3Z), and
u'white = 4Xn / (Xn + 15Yn + 3Zn), v'white = 9Yn / (Xn + 15Yn + 3Zn).
The following Euclidean metric may be defined on the perceptual space Lu*v*:
d((L*,u*,v*),(L*',u*',v*'))² = (ΔL*)² + (Δu*)² + (Δv*)²
with ΔL* being the difference between the luminance components of the two colours (L*,u*,v*) and (L*',u*',v*') and Δu* (respectively Δv*) being the difference between the colour components of these two colours.
The invention is not limited to the perceptual space LabCIE1976 but may be extended to any type of perceptual space such as LabCIE1994 or LabCIE2000, which are the same Lab space but with a different metric to measure the perceptual distance, or any other Euclidean perceptual space for instance. Other examples are LMS spaces and IPT spaces. A condition is that the metric shall be defined on these perceptual spaces in order that the metric is preferably proportional to the perception difference; as a consequence, a homogeneous maximal perceptual threshold ΔE0 exists below which a human being is not able to perceive a visual difference between two colours of the perceptual space.
In step 1090, a module LF obtains a low-spatial-frequency version Llf of the luminance component L of the image I.
The low-spatial-frequency version Llf of the luminance component L of the image I is the illumination map IM according to this embodiment of the step 1000.
According to an embodiment, the module LF is configured to calculate the low-spatial-frequency version Llf per block by assigning to each pixel of a block a mean value computed by averaging the pixel values of the block. The invention is not limited to a specific embodiment for computing a low-spatial-frequency version of the image I and any low-pass filtering or down-sampling of the luminance component of the image I may be used.
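A minimal sketch of this block-wise computation of Llf (the block size of 8 is an arbitrary example):

import numpy as np

def block_mean(luma, block=8):
    # Every pixel of a block receives the mean value of that block.
    h, w = luma.shape
    llf = np.empty_like(luma)
    for y in range(0, h, block):
        for x in range(0, w, block):
            llf[y:y + block, x:x + block] = luma[y:y + block, x:x + block].mean()
    return llf

L = np.random.rand(16, 16) * 1000.0
llf = block_mean(L, block=8)
print(llf[0, 0] == llf[0, 7], llf[0, 0] == llf[0, 8])   # True within a block, False across blocks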
In step 1100, the encoder ENC1 is configured to encode the low-spatial-frequency version Llf into the bitstream F.
In step 1110, a differential image Diff is obtained. The differential image Diff comprises a differential luminance component Lr which is obtained by calculating the difference between the luminance component L and a decoded version of the encoded low-spatial-frequency version Llf.
Potentially, in step 1120, a module ASSO is configured to associate each colour component of the image I with the differential luminance component Lr in order to get a differential image Diff. According to the example, the image I comprises two colour components C1 and C2. Then, the colour components C1 and C2 are associated with the differential luminance component Lr in order to get a differential image Diff comprising three components (Lr, C1, C2).
In step 1300, the encoder ENC2 is configured to encode the differential image Diff into the bitstream F.
Potentially, in step 1100 and/or 1300, the encoder ENC1 and/or ENC2 comprises an entropy encoding.
The coding precision of the encoder ENC2 depends on a perceptual threshold ΔE defining an upper bound of the metric in the perceptual space and enabling a control of the visual losses in a displayed decoded version of the image.
In step 1130, the perceptual threshold ΔE is determined according to reference lighting conditions of the display of a decoded version of the encoded image and the decoded version of the low-spatial-frequency version Llf.
The brightness of the low-spatial-frequency version Llf is not constant over the image but changes locally. For example, if the low-spatial-frequency version Llf is calculated per block by assigning to each pixel of a block a mean value computed by averaging the pixel values of the block, the perceptual threshold ΔE is constant over each block but the mean values of two blocks of the image may be different. Consequently, the perceptual threshold ΔE changes locally according to the brightness values of the image.
The local changes of the perceptual threshold ΔE are not limited to block-based changes but may extend to any zones defined over the image by means of any operator, such as a segmentation operator based for example on the brightness values of the image.
In step 2100, the decoded version of the low-spatial-frequency version Llf is obtained by decoding the output from step 1100, i.e. by decoding at least partially the bitstream F, by means of the decoder DEC1. Such a decoder DEC1 implements inverse operations compared to the operations of the encoder ENC1 (step 1100).
According to an embodiment of the step 1130, assuming, during the display of the image, a potential increase of the lighting up to a maximal environmental brightness value Yn, the perceptual threshold ΔE is determined from the ratio of the brightness value Ylf of the decoded version of the low-spatial-frequency version Llf over the maximal environmental brightness value Yn.
According to an embodiment of the step 1130, when the coding degradation over the maximal environmental brightness value is forbidden, the perceptual threshold ΔE is then given by:
ΔE = ΔEenc · (Ylf / Yn)^(1/3)   (4)
with (Xn, Yn, Zn) being reference lighting conditions of the display of a decoded version of the encoded image, Ylf being a value which represents the brightness of the decoded version of the low-spatial-frequency version Llf, and ΔEenc being a perceptual encoding parameter. Typically, ΔEenc is chosen close to ΔE0 for visually lossless encoding and greater than ΔE0 for an encoding with a control of the visual losses in a decoded version of the encoded image. Thus, using such a perceptual threshold ΔE allows adapting the encoding to the environmental lighting conditions of the display.
Alternatively, the reference lighting conditions of the display of a decoded version of the encoded image (Xn, Yn, Zn) may be replaced by reference lighting conditions (X'n, Y'n, Z'n), which have a local character, defined for example by (X'n, Y'n, Z'n) = (Xn, Yn, Zn) · Ylf / Yn.
From a coding point of view (color coding), this replacement is equivalent to the choice of the perceptual threshold ΔE (4), because the encoding with a precision equal to ΔE of a color component a* in the color space LabCIE1976, which is given by
a* = 500(f(X/Xn) − f(Y/Yn)) ≈ 500((X/Xn)^(1/3) − (Y/Yn)^(1/3))
is equivalent to the encoding with a precision equal to ΔEenc of the color component a*' which is given by
a*' = 500(f(X/X'n) − f(Y/Y'n)) ≈ 500((X/X'n)^(1/3) − (Y/Y'n)^(1/3))
The same remark applies to the other component b*. Therefore, instead of changing the perceptual space locally, one just adapts the threshold from ΔEenc to ΔE.
According to an embodiment of the step 1130, to avoid a sub-coding of the parts of the image having high brightness values, the perceptual threshold ΔE is given by
ΔE = ΔEenc·min((Yj/Yn)^(1/3), Emax)
where an upper bound ΔEenc·Emax is set; typically, Emax is set to 1. This last equation means that the brightness of the decoded version of the low-spatial-frequency version Llf is never taken bigger than the maximal environmental brightness value Yn.
On the other hand, in order to avoid an over-coding of the parts of the image having very low brightness values, the perceptual threshold ΔE is then given by
ΔE = ΔEenc·max((Yj/Yn)^(1/3), Emin)
where a lower bound ΔEenc·Emin is set; typically, Emin is set to about 1/5. This is due to a contrast masking effect of the dark local brightness of the decoded version L̂lf of the low-spatial-frequency version Llf by the maximal environmental brightness value Yn.
A combination of both bounds is simply obtained by
ΔE = ΔEenc·min(max((Yj/Yn)^(1/3), Emin), Emax)
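For illustration only, a minimal Python sketch of this bounded threshold, evaluating ΔE = ΔEenc·min(max((Yj/Yn)^(1/3), Emin), Emax) over a map of local brightness values Yj; the function name, the NumPy usage and the default bounds (Emax = 1, Emin = 1/5 as suggested above) are assumptions of the example:

import numpy as np

def perceptual_threshold(yj, yn, delta_e_enc, e_min=0.2, e_max=1.0):
    """Bounded threshold: delta_e_enc * min(max((Yj/Yn)^(1/3), Emin), Emax)."""
    ratio = np.cbrt(np.asarray(yj, dtype=np.float64) / float(yn))
    return delta_e_enc * np.clip(ratio, e_min, e_max)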
According to a variant of the method, at step 1140, a threshold TH is applied to the component(s) of the differential image Diff in order to limit the dynamic range of each of its components to a targeted dynamic range TDR.
According to an embodiment of the step 1300, each component of a differential image is normalized by means of the perceptual threshold ΔE, and the normalized differential image is then encoded at a constant encoding precision.
The precision of the encoding is thus a function of the perceptual threshold ΔE, which changes locally and which is the optimal precision assuming that the perceptual space is ideal. By doing so, an encoding precision of 1 for the normalized differential image ensures that the differential image is encoded with a precision of ΔE, as required.
According to an embodiment of the step 1300, the normalization of a component of a differential image by means of the perceptual threshold ΔE is the division of this component by a value which is a function of the perceptual threshold ΔE.
Mathematically speaking, a component C of the differential image, including both the differential luminance component and potentially each colour component, is then normalized, for example, as follows to get a normalized component CN:
CN = C / ΔE^α
with α being a value equal, for example, to 0.5 or 1.
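For illustration only, a minimal Python sketch of this normalization, assuming a per-pixel (or per-block) ΔE map and NumPy broadcasting; the helper name is an assumption of the example. The corresponding inverse operation at the decoder is the multiplication described in relation with the step 2200 below.

import numpy as np

def normalize_component(c, delta_e, alpha=0.5):
    """CN = C / delta_e**alpha; delta_e may be a scalar or a per-pixel map."""
    return np.asarray(c, dtype=np.float64) / (np.asarray(delta_e, dtype=np.float64) ** alpha)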
According to another embodiment of the step 1300, at least one parameter of the encoding of a differential image depends on the perceptual threshold ΔE.
For example, the quantization parameter QP of such an encoding depends on the perceptual threshold ΔE. Such a parameter QP exists in image/video coders like h264/AVC and HEVC and may be defined locally for each coding block. In this example, an encoding with a locally (block by block) varying precision through the differential image is performed by choosing, for each block, the local QP ensuring a coding precision of the perceptual threshold ΔE.
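As a hedged illustration only (an assumption of this example, not a step of the described method), one may exploit the fact that in h264/AVC and HEVC the quantization step roughly doubles every 6 QP, with Qstep ≈ 2^((QP−4)/6), and pick for each block the largest QP whose step does not exceed the local target ΔE:

import math

def qp_for_threshold(delta_e, qp_min=0, qp_max=51):
    """Largest QP whose approximate step 2**((QP-4)/6) does not exceed the local target delta_e."""
    qp = math.floor(4.0 + 6.0 * math.log2(max(delta_e, 1e-6)))
    return int(min(max(qp, qp_min), qp_max))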
Fig. 9 shows a block diagram of the sub-steps of the step 2300 in accordance with an embodiment of the invention.
It may be noted that, in what follows, a bitstream F is considered which represents an image I comprising a luminance component and potentially at least one colour component. The component(s) of the image I belong to a perceptual colour space as described above.
In step 2100, as described above, a decoded version L̂lf of the low-spatial-frequency version of the luminance component of the image I is obtained by decoding at least partially the bitstream F, by means of a decoder DEC1.
In step 2200, a decoded version of a differential image Diff is obtained by an at least partial decoding of the bitstream F by means of the decoder DEC2.
Thus, when the image I represented by the bitstream F is a grey level image, the decoded version of a differential image Diff comprises a differential luminance component Lr, which represents the difference between a luminance component L of the image I and the decoded version L̂lf of the low-spatial-frequency version of the luminance component of the image I. When the image I represented by the bitstream F is a colour image, i.e. an image having a luminance component L and at least one colour component, the decoded version of a differential image Diff comprises the differential luminance component Lr, which represents the difference between the luminance component L of the image I and the decoded version L̂lf of the low-spatial-frequency version of the luminance component of the image I, and each of said at least one colour component of the image I.
In step 2350, the decoded version of a differential image Diff and the decoded version L̂lf of the low-spatial-frequency version of the luminance component of the image are added together to get the decoded image Î.
The decoding precision of the decoder DEC2 depends on a perceptual threshold ΔE defining an upper bound of a metric in a perceptual space described above and enabling a control of the visual losses in a displayed decoded version of the image. The precision of the decoding is thus a function of the perceptual threshold ΔE, which changes locally.
As described above in relation with the step 1130, the perceptual threshold ΔE is determined, according to an embodiment, according to reference lighting conditions of the display of a decoded version of the encoded image (the same as those used for encoding) and the decoded version L̂lf of the low-spatial-frequency version of the luminance component of the image I.
According to an embodiment of the step 2200, when each component of a differential image has been normalized by means of the perceptual threshold ΔE, the differential image is decoded at a constant precision and each component of the decoded version of the differential image Diff is re-normalized by means of the perceptual threshold ΔE.
According to an embodiment of the step 2200, the re-normalization is the multiplication by a value which is a function of the perceptual threshold ΔE.
Mathematically speaking, each component CN of the decoded version of the differential image is re-normalized, for example, as follows:
C = CN·ΔE^α
with α being a value equal, for example, to 0.5 or 1.
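For illustration only, a small decoder-side Python sketch combining this re-normalization with the addition of the step 2350; the function name and the NumPy usage are assumptions of the example:

import numpy as np

def reconstruct_luma(lr_norm_dec, llf_dec, delta_e, alpha=0.5):
    """Re-normalize the decoded differential luminance (C = CN * delta_e**alpha) and add Llf back."""
    lr = np.asarray(lr_norm_dec, dtype=np.float64) * (np.asarray(delta_e, dtype=np.float64) ** alpha)
    return lr + np.asarray(llf_dec, dtype=np.float64)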
According to a variant, in step 2360, a module IIC is configured to apply an inverse perceptual transform to the decoded image Î output from step 2250. For example, the estimate of the decoded image Î is transformed to the well-known space (X, Y, Z).
When the perceptual space LabCIE1976 is selected, the inverse perceptual transform is given by:
X = Xn·f^(-1)((L* + 16)/116 + a*/500)
Y = Yn·f^(-1)((L* + 16)/116)
Z = Zn·f^(-1)((L* + 16)/116 − b*/200)
When the perceptual space Luv is selected, the inverse perceptual transform is given by:
X = 9Y·u' / (4v')
Y = Yn·f^(-1)((L* + 16)/116)
Z = 3Y·(4 − u') / (4v') − 5Y
Potentially, the image in the space (X, Y, Z) is inverse transformed to get the decoded image in the initial space, such as the (R, G, B) space.
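For illustration only, a per-pixel Python sketch of the LabCIE1976 inverse transform written out above; the standard CIE linear branch of f^(-1) and the D65 reference white used as default are assumptions of the example, the text above only fixing the reference conditions (Xn, Yn, Zn):

def lab_to_xyz(L, a, b, white=(95.047, 100.0, 108.883)):
    """Inverse CIE 1976 Lab transform towards (X, Y, Z) for one pixel."""
    Xn, Yn, Zn = white
    def f_inv(t):
        d = 6.0 / 29.0
        return t ** 3 if t > d else 3.0 * d * d * (t - 4.0 / 29.0)
    fy = (L + 16.0) / 116.0
    return (Xn * f_inv(fy + a / 500.0),
            Yn * f_inv(fy),
            Zn * f_inv(fy - b / 200.0))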
Potentially, during the steps 2100 and/or 2200, data of the bitstream F are also at least partially entropy-decoded.
The decoder DEC1, respectively DEC2, is configured to decode data which have been encoded by the encoder ENC1, respectively ENC2.
The encoders ENC1 and/or ENC2 (and decoders DEC1 and/or DEC2) are not limited to a specific encoder (decoder), but when an entropy encoder (decoder) is required, an entropy encoder such as a Huffman coder, an arithmetic coder or a context-adaptive coder like CABAC used in h264/AVC or HEVC is advantageous.
The encoders ENC1 and ENC2 (and decoders DEC1 and DEC2) are not limited to a specific encoder, which may be, for example, a lossy image/video coder like JPEG, JPEG2000, MPEG2, h264/AVC or HEVC.
On Fig. 1-9, the modules are functional units, which may or may not be in relation with distinguishable physical units. For example, these modules or some of them may be brought together in a unique component or circuit, or contribute to functionalities of a software. Conversely, some modules may potentially be composed of separate physical entities. The apparatus which are compatible with the invention are implemented using either pure hardware, for example using dedicated hardware such as an ASIC, an FPGA or VLSI, respectively « Application Specific Integrated Circuit », « Field-Programmable Gate Array », « Very Large Scale Integration », or from several integrated electronic components embedded in a device, or from a blend of hardware and software components.
Fig. 10 represents an exemplary architecture of a device 1000 which may be configured to implement a method described in relation with Fig. 1-9.
Device 1000 comprises following elements that are linked together by a data and address bus 1001 :
- a microprocessor 1002 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
- a ROM (or Read Only Memory) 1003;
- a RAM (or Random Access Memory) 1004;
- an I/O interface 1005 for reception of data to transmit, from an application; and
- a battery 1006.
According to a variant, the battery 1006 is external to the device. Each of these elements of Fig. 10 is well known by those skilled in the art and will not be described further. In each of the mentioned memories, the word « register » used in the specification can correspond to an area of small capacity (a few bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data). The ROM 1003 comprises at least a program and parameters. The algorithms of the methods according to the invention are stored in the ROM 1003. When switched on, the CPU 1002 loads the program into the RAM and executes the corresponding instructions. The RAM 1004 comprises, in a register, the program executed by the CPU 1002 and uploaded after switch-on of the device 1000, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
According to an embodiment, the device further comprises means for obtaining reference lighting conditions of the display of a decoded version of the encoded image such as a maximal environmental brightness value Yn.
According to an embodiment, the device comprises a display 1007 and the means for obtaining reference lighting conditions of the display of a decoded version of the encoded image are configured to determine such reference lighting conditions of the display of a decoded version of the encoded image from some characteristics of the display 1007 or from lighting conditions around the display 1007 which are captured by the apparatus.
For instance, the means for obtaining a maximal environmental brightness value Yn are a sensor attached to the display which measures the environmental conditions. A photodiode or the like may be used for this purpose.
According to a specific embodiment of encoding or encoder, the image I is obtained from a source. For example, the source belongs to a set comprising:
- a local memory (1003 or 1004), e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk ;
- a storage interface (1005), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (1005), e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
- an image capturing circuit (e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor)).
According to different embodiments of the decoding or decoder, the decoded image Î is sent to a destination; specifically, the destination belongs to a set comprising:
- a local memory (1003 or 1004), e.g. a video memory or a RAM, a flash memory, a hard disk ;
- a storage interface (1005), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (1005), e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, an HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as an IEEE 802.11 interface, WiFi® or a Bluetooth® interface); and
- a display.
According to different embodiments of encoding or encoder, the bitstream F is sent to a destination. As an example, the bitstream F is stored in a local or remote memory, e.g. a video memory (1004) or a RAM (1004), a hard disk (1003). In a variant, the bitstream is sent to a storage interface (1005), e.g. an interface with a mass storage, a flash memory, a ROM, an optical disc or a magnetic support, and/or transmitted over a communication interface (1005), e.g. an interface to a point-to-point link, a communication bus, a point-to-multipoint link or a broadcast network.
According to different embodiments of decoding or decoder, the bitstream BF and/or F is obtained from a source. Exemplarily, the bitstream is read from a local memory, e.g. a video memory (1004), a RAM (1004), a ROM (1003), a flash memory (1003) or a hard disk (1003). In a variant, the bitstream is received from a storage interface (1005), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (1005), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.
According to different embodiments, the device 1000, being configured to implement an encoding method described in relation with Fig. 1, 3-6 and 8, belongs to a set comprising:
- a mobile device ;
- a communication device ;
- a game device ;
- a tablet (or tablet computer) ;
- a laptop ;
- a still image camera;
- a video camera ;
- an encoding chip;
- a still image server ; and
- a video server (e.g. a broadcast server, a video-on-demand server or a web server). According to different embodiments, device 1000 being configured to implement a decoding method described in relation with Fig. 2, 7 and 9, belongs to a set comprising:
- a mobile device ;
- a communication device ;
- a game device ;
- a set top box;
- a TV set;
- a tablet (or tablet computer) ;
- a laptop ;
- a display and
- a decoding chip.
According to an embodiment illustrated in Fig. 11, in a transmission context between two remote devices A and B over a communication network NET, the device A comprises means which are configured to implement a method for encoding an image as described in relation with Fig. 1, and the device B comprises means which are configured to implement a method for decoding as described in relation with Fig. 2.
According to a variant of the invention, the network is a broadcast network, adapted to broadcast still images or video images from device A to decoding devices including the device B.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.

Claims

1. Method for encoding an image into a bitstream, characterized in that it comprises:
- encoding (1 100) into the bitstream an illumination map determined (1000) from the image; and
- encoding (1200) into the bitstream a signalization data indicating that the bitstream comprises the illumination map.
2. Method according to the claim 1, wherein it further comprises:
- encoding (1300) into the bitstream a residual image determined from the image and the illumination map.
3. Method according to the claim 1 or 2, wherein the illumination map is encoded as an auxiliary picture whose syntax conforms either to the H264/AVC or HEVC standard.
4. Method according to the claim 2 or 3, wherein the residual image is encoded as a primary picture whose syntax conforms either to the H264/AVC or HEVC standard.
5. Method according to one of the claims 1 to 4, wherein the illumination map is a backlight image and the residual image is obtained by dividing the image by a decoded version of the backlight image.
6. Method according to the claim 5, wherein the residual image is tone-mapped before encoding.
7. Method according to the claim 6, wherein tone-mapping the residual image comprises either a gamma correction or a SLog correction according to the pixel values of the residual image.
8. Method according to one of the claims 1 to 7, wherein the method further comprises scaling of the residual image before encoding.
9. Method according to one of the claims 1 to 8, wherein the method further comprises clipping the residual image before encoding.
10. Method for decoding a bitstream representing an image, characterized in that it comprises:
- detecting (2000) in the bitstream if a signalization data indicates that the bitstream comprises data related to an illumination map determined from the image to be decoded;
- obtaining (2100) a decoded illumination map by decoding the bitstream at least partially; and
- obtaining (2300) a decoded image from a decoded residual image and the decoded illumination map.
11. Method according to the claim 10, wherein the decoded residual image is obtained by decoding the bitstream at least partially.
12. Method according to the claim 10 or 11, wherein the signalization data is detected from high level syntax elements and its usage is completed by an SEI message.
13. Method according to one of the claims 10 to 12, wherein the bitstream comprises a primary picture and an auxiliary picture whose syntax conforms with the standard H264/AVC or HEVC and wherein the primary picture represents the residual image and the auxiliary picture represents the illumination map.
14. Method according to one of the claims 10 to 13, wherein the decoded illumination map is a backlight image and wherein the decoded image is obtained by multiplying the decoded residual image by the backlight image.
15. Method according to one of the claims 10 to 14, wherein the decoded residual image is inverse-tone-mapped before multiplying the decoded residual image by the backlight image.
16. Method according to one of the claims 1 to 4, wherein the illumination map is a low-spatial-frequency version of the luminance component of the image to be encoded, and the residual image is obtained by calculating the difference between the luminance component of the image and a decoded version of the encoded low-spatial-frequency version.
17. Device for encoding an image comprising means for:
- encoding (ENC1 ) into the bitstream an illumination map determined (1000) from the image; and
- encoding (ENC2) into the bitstream a signalization data indicating that the bitstream comprises the illumination map.
18. Device for decoding a bitstream representing an image, characterized in that it comprises:
- detecting (SMD) in the bitstream if a signalization data indicates that the bitstream represents an illumination map determined from the image;
- obtaining (DEC1 ) a decoded illumination map by decoding the bitstream at least partially; and
- obtaining (DEC2) a decoded image from a decoded residual image and the decoded illumination map.
19. Bitstream representing an image, characterized in that it comprises a signalization data indicating that it represents an illumination map determined from the image.
PCT/EP2014/078940 2013-12-27 2014-12-22 Method and device for encoding a high-dynamic range image into a bitstream and/or decoding a bitstream representing a high-dynamic range image WO2015097118A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP13306885 2013-12-27
EP13306885.8 2013-12-27

Publications (1)

Publication Number Publication Date
WO2015097118A1 (en) 2015-07-02

Family

ID=49955860

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2014/078940 WO2015097118A1 (en) 2013-12-27 2014-12-22 Method and device for encoding a high-dynamic range image into a bitstream and/or decoding a bitstream representing a high-dynamic range image

Country Status (2)

Country Link
TW (1) TW201540052A (en)
WO (1) WO2015097118A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3185561A1 (en) 2015-12-23 2017-06-28 THOMSON Licensing Methods and devices for encoding and decoding frames with a high dynamic range, and corresponding signal and computer program
US10250893B2 (en) 2015-06-15 2019-04-02 Interdigital Vc Holdings, Inc. Method and device for encoding both a high-dynamic range frame and an imposed low-dynamic range frame
WO2020081126A1 (en) * 2018-10-19 2020-04-23 Gopro, Inc. Tone mapping and tone control integrations for image processing
US11006151B2 (en) 2015-06-30 2021-05-11 Interdigital Madison Patent Holdings Sas Method and device for encoding both a HDR picture and a SDR picture obtained from said HDR picture using color mapping functions
US11178412B2 (en) 2015-01-30 2021-11-16 Interdigital Vc Holdings, Inc. Method and apparatus of encoding and decoding a color picture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100046612A1 (en) * 2008-08-25 2010-02-25 Microsoft Corporation Conversion operations in scalable video encoding and decoding
EP2375383A2 (en) * 2004-04-23 2011-10-12 Dolby Laboratories Licensing Corporation Encoding, decoding and representing high dynamic range images
WO2011163114A1 (en) * 2010-06-21 2011-12-29 Dolby Laboratories Licensing Corporation Displaying images on local-dimming displays

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2375383A2 (en) * 2004-04-23 2011-10-12 Dolby Laboratories Licensing Corporation Encoding, decoding and representing high dynamic range images
US20100046612A1 (en) * 2008-08-25 2010-02-25 Microsoft Corporation Conversion operations in scalable video encoding and decoding
WO2011163114A1 (en) * 2010-06-21 2011-12-29 Dolby Laboratories Licensing Corporation Displaying images on local-dimming displays

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
CHOI B ET AL: "MV-HEVC/SHVC HLS: Carriage of auxiliary pictures", 6. JCT-3V MEETING; 25-10-2013 - 1-11-2013; GENEVA; (THE JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSION DEVELOPMENT OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JCT2/, no. JCT3V-F0048-v3, 21 October 2013 (2013-10-21), XP030131449 *
DAVID TOUZÉ ET AL: "HDR Video Coding based on Local LDR Quantization", HDRI2014 -SECOND INTERNATIONAL CONFERENCE AND SME WORKSHOP ON HDR IMAGING, 4 March 2014 (2014-03-04), XP055112158, Retrieved from the Internet <URL:http://people.irisa.fr/Ronan.Boitard/articles/2014/HDR Video Coding based on Local LDR Quantization.pdf> [retrieved on 20140404] *
FIRAS HASSAN ET AL: "High throughput JPEG2000 compatible encoder for high dynamic range images", IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 12 October 2008 (2008-10-12), pages 1424 - 1427, XP031374279, ISBN: 978-1-4244-1765-0 *
RAFAL MANTIUK: "Multidimensional retargeting: Tone Mapping", ACM SIGGRAPH ASIA 2011 COURSES: MULTIDIMENSIONAL IMAGE RETARGETING, 1 December 2011 (2011-12-01), pages 1 - 75, XP055112802, Retrieved from the Internet <URL:http://vcg.isti.cnr.it/Publications/2011/BAADEGMM11/talk1_tone_mapping.pdf> [retrieved on 20140408] *
SEGALL A ET AL: "Tone Mapping SEI Message: New results", 21. JVT MEETING; 78. MPEG MEETING; 20-10-2006 - 27-10-2006; HANGZHOU,CN; (JOINT VIDEO TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVT-U041, 17 October 2006 (2006-10-17), XP030006687, ISSN: 0000-0407 *
TAKAO JINNO ET AL: "High Contrast HDR Video Tone Mapping Based on Gamma Curves", IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS, COMMUNICATIONS AND COMPUTER SCIENCES, vol. E94A, no. 2, 1 February 2011 (2011-02-01), pages 525 - 532, XP001560934, ISSN: 0916-8508, [retrieved on 20110201], DOI: 10.1587/TRANSFUN.E94.A.525 *
TAKAO JINNO ET AL: "New local tone mapping and two-layer coding for HDR images", 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 25 March 2012 (2012-03-25) - 30 March 2012 (2012-03-30), pages 765 - 768, XP032227239, ISBN: 978-1-4673-0045-2, DOI: 10.1109/ICASSP.2012.6287996 *
TOUZE DAVID ET AL: "High dynamic range video distribution using existing video codecs", 2013 PICTURE CODING SYMPOSIUM (PCS), IEEE, 8 December 2013 (2013-12-08), pages 349 - 352, XP032566992, DOI: 10.1109/PCS.2013.6737755 *
YASIR SALIH ET AL: "Tone mapping of HDR images: A review", IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT AND ADVANCED SYSTEMS, 12 June 2012 (2012-06-12), pages 368 - 373, XP032238666, ISBN: 978-1-4577-1968-4, DOI: 10.1109/ICIAS.2012.6306220 *
YEU-HORNG SHIAU ET AL: "High Dynamic Range Image Rendering with Order-Statistics Filter", IEEE INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING, 25 August 2012 (2012-08-25), pages 352 - 355, XP032327353, ISBN: 978-1-4673-2138-9, DOI: 10.1109/ICGEC.2012.100 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11178412B2 (en) 2015-01-30 2021-11-16 Interdigital Vc Holdings, Inc. Method and apparatus of encoding and decoding a color picture
US10250893B2 (en) 2015-06-15 2019-04-02 Interdigital Vc Holdings, Inc. Method and device for encoding both a high-dynamic range frame and an imposed low-dynamic range frame
US11006151B2 (en) 2015-06-30 2021-05-11 Interdigital Madison Patent Holdings Sas Method and device for encoding both a HDR picture and a SDR picture obtained from said HDR picture using color mapping functions
EP3185561A1 (en) 2015-12-23 2017-06-28 THOMSON Licensing Methods and devices for encoding and decoding frames with a high dynamic range, and corresponding signal and computer program
WO2020081126A1 (en) * 2018-10-19 2020-04-23 Gopro, Inc. Tone mapping and tone control integrations for image processing
US11941789B2 (en) 2018-10-19 2024-03-26 Gopro, Inc. Tone mapping and tone control integrations for image processing

Also Published As

Publication number Publication date
TW201540052A (en) 2015-10-16

Similar Documents

Publication Publication Date Title
CN107736024B (en) Method and apparatus for video processing
US10735755B2 (en) Adaptive perceptual mapping and signaling for video coding
KR102367205B1 (en) Method and device for encoding both a hdr picture and a sdr picture obtained from said hdr picture using color mapping functions
JP7053722B2 (en) A method and apparatus for signaling a picture / video format of an LDR picture within a bitstream and a picture / video format of a decoded HDR picture obtained from this LDR picture and an illumination picture.
US9924178B2 (en) Method and device for encoding a high-dynamic range image and/or decoding a bitstream
US20160316215A1 (en) Scalable video coding system with parameter signaling
CN112042202B (en) Decoded picture buffer management and dynamic range adjustment
WO2015097118A1 (en) Method and device for encoding a high-dynamic range image into a bitstream and/or decoding a bitstream representing a high-dynamic range image
EP3107301A1 (en) Method and device for encoding both a high-dynamic range frame and an imposed low-dynamic range frame
TWI765903B (en) Video coding tools for in-loop sample processing
WO2019203973A1 (en) Method and device for encoding an image or video with optimized compression efficiency preserving image or video fidelity
WO2015193113A1 (en) Method and device for signaling in a bitstream a picture/video format of an ldr picture and a picture/video format of a decoded hdr picture obtained from said ldr picture and an illumination picture.
KR20190059006A (en) Method and device for reconstructing a display adapted hdr image
WO2015097135A1 (en) Method and device for encoding a high-dynamic range image
WO2015097126A1 (en) Method and device for encoding a high-dynamic range image and/or decoding a bitstream
EP3113494A1 (en) Method and device for encoding a high-dynamic range image
EP3099073A1 (en) Method and device of encoding/decoding a hdr and a sdr picture in/from a scalable bitstream
EP3272120A1 (en) Adaptive perceptual mapping and signaling for video coding
WO2015091323A1 (en) Method and device for encoding a high-dynamic range image
WO2015193114A1 (en) Method and device for signaling in a bitstream a picture/video format of an ldr picture and a picture/video format of a decoded hdr picture obtained from said ldr picture and an illumination picture
WO2015097124A1 (en) Method and device for encoding a high-dynamic range image and/or decoding a bitstream
WO2015097131A1 (en) Method and device for encoding a high-dynamic range image
WO2015097129A1 (en) Method and device for encoding a high-dynamic range image
WO2015097134A1 (en) Method and device for encoding a high-dynamic range image and/or decoding a bitstream
EP3272124A1 (en) Scalable video coding system with parameter signaling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14815392

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14815392

Country of ref document: EP

Kind code of ref document: A1