EP3219108A1 - Method for encoding, video processor, method for decoding, video decoder - Google Patents

Method for encoding, video processor, method for decoding, video decoder

Info

Publication number
EP3219108A1
Authority
EP
European Patent Office
Prior art keywords
pixel
video
merging
signal
overlay
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP15788431.3A
Other languages
German (de)
English (en)
French (fr)
Inventor
Wiebe De Haan
Leon Maria Van De Kerkhof
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips NV
Publication of EP3219108A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 - Structure of client; Structure of client peripherals
    • H04N 21/426 - Internal components of the client; Characteristics thereof
    • H04N 21/42653 - Internal components of the client; Characteristics thereof for processing graphics
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/98 - Adaptive-dynamic-range coding [ADRC]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 - End-user applications
    • H04N 21/485 - End-user interface for client configuration
    • H04N 21/4854 - End-user interface for client configuration for modifying image parameters, e.g. image brightness, contrast
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 - Structuring of content, e.g. decomposing content into time segments

Definitions

  • the invention relates to encoding a video image, in particular a high dynamic range image, and corresponding technical systems and methods to convey the necessary coded image information to a receiving side, and decoders to decode the coded images, and ultimately make them available for display.
  • Capturing HDR images requires a camera which can capture an increased dynamic range of at least 11 stops, but preferably above 16 stops.
  • Current cameras of e.g. ARRI, RED and Sony are achieving about 14 stops.
  • Some HDR cameras use a slow and fast exposure and combine those in a single HDR image, other cameras use beam splitting towards two or more sensors of different sensitivity.
  • Display units are currently being developed that are able to provide a high brightness level and a very high contrast between dark parts of the image and bright parts of the image.
  • video information may be enhanced by providing adapted video information, e.g. taking into account the higher brightness and the HDR contrast range.
  • the traditional video information is called low dynamic range [LDR] video in this document.
  • LDR video information may be displayed on an HDR display unit in HDR display mode for improved contrast.
  • a more compelling image is achieved when the video information itself is generated in an HDR video format, e.g. exploiting the enhanced dynamic range for better visual effects or for improving visibility of textures in bright or dark areas while avoiding visual banding.
  • movie directors can locally enhance the experience, by e.g. emphasizing explosions, and/or improve visibility in bright or dark scenes/areas.
  • An HDR image may be an image which encodes the textures of an HDR scene (which may typically contain both very bright and dark regions) with sufficient information for high quality encoding of the color textures of the various captured objects in the scene, such that a visually good quality rendering of the HDR scene can be done on an HDR display with high peak brightness, like e.g. 5000 nit.
  • a typical HDR image comprises brightly colored parts or parts strongly illuminated compared to the average illumination.
  • night scenes become more and more important. In contrast with day scenes, in which sun and sky illuminate each point in the same way, at night there may be only a few light sources, which light the scene in a quadratically reducing manner.
  • HDR video (or even still image) encoding has only recently been researched.
  • a simpler single image approach is disclosed in WO2011/107905 and WO2012/153224.
  • This approach is based upon parametric encoding.
  • this approach also addresses displays with other peak brightnesses and dynamic ranges. Since there will also be displays of e.g. 500 or 100 nit, rather than leaving it blindly to the receiving side how to change the encoded high dynamic range image into some reasonable image by auto-conversion, color processing functions are co-encoded which specify how to arrive at an appropriate image for the specific properties of the display, starting from the encoded HDR image. This process then results in an image optimized for that specific display, one that a content creator could agree with.
  • the HDR image is to be displayed on a display.
  • future HDR displays will have different peak brightness levels depending on technology, design choices, cost considerations, market factors, etc.
  • the video signal received by the display will usually be graded for a specific reference display, which may not correspond to the characteristic of the display on which the video signal is to be presented.
  • the display receiving the HDR signal tries to adapt the video signal to match its own characteristics, including peak brightness level. If the receiver/display has no knowledge about the characteristics of the video signal and/or the grading that was applied, the resulting picture might not be in line with the artistic intent or might simply look bad.
  • dynamic range adaptation parameters/instructions may be and preferably are included with the video or conveyed otherwise to the display to provide processing information for optimizing the picture quality for the peak brightness level and other characteristics of the display on which the signal is displayed.
  • the adaptation parameters may operate on the whole picture area or may be constrained to certain areas of the picture.
  • the HDR display may on its own adapt the incoming HDR signal for instance if it knows the characteristics for the incoming signal, for instance if a standard has been used.
  • the display therefore adapts the incoming e.g. HDR signal.
  • the incoming signal could also be an LDR signal which is then displayed in HDR mode (note that this LDR signal may, although it is by itself suitable for direct display on an LDR display, implicitly be an encoding of an HDR image look, because it contains all necessary pixel color data which can be functionally mapped to a HDR image by co-encoded functions).
  • the display performs a dynamic range adaptation on the incoming signal for adjusting it to its characteristics (e.g. peak intensity, black level) before displaying it.
  • the display applies a mapping function which maps the incoming HDR (or LDR) data onto a set of HDR data which best (or at least better, or at least such is the intention) fits the capabilities of the display, such as e.g. the black level and peak brightness level of the display.
  • the so adapted HDR data is used for displaying the image on the display.
  • the mapping can be an upgrading of the image wherein the dynamic range of the displayed image is larger than the dynamic range of the original image as well as a downgrading of the image wherein the dynamic range is smaller than the dynamic range of the original image.
  • the video signal may be provided to the home in various ways, including through broadcast, through the internet or via packaged media. It may for instance be received by a set top box (STB) or via another video processing system as a compressed stream.
  • the set top box decodes the video and subsequently sends it as baseband video to the television set.
  • the coded video is stored on a storage medium, e.g. a DVD/Blu-ray disc or a Flash drive.
  • the playback device is a media player (MP).
  • the separate box is connected with the TV through a standard interface (e.g. HDMI, Display Port, or a wireless video interface).
  • Abbreviations used: STB (set top box), MP (media player), PG (graphics), BD-J (java engine).
  • the dynamic range adaptation may be on the basis of parameters that are sent along with the video, based upon an analysis of the image in the TV, based upon information sent along with the video signal, or any other method.
  • the dynamic range adaptation applies to the underlying video, not for the areas that contain the graphics overlays.
  • the dynamic range adaptation may change at certain instances (e.g. when the scene is changing), while subtitles or a menu overlay may be fixed during the change. This may e.g. result in unwanted changes in the appearance of the graphics at scene boundaries.
  • an improved approach for adapting video would be advantageous and in particular an approach allowing increased flexibility, improved dynamic range adaptation, improved perceived image quality, improved overlay and/or video image presentation (in particular when changing dynamic range) and/or improved performance would be advantageous.
  • the Invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
  • an apparatus for decoding an HDR video signal merged from more than one source signal comprising: an input for receiving the HDR video signal, a reader for reading at least one or more least significant bits for one or more color components of the HDR video signal for a pixel and generating one or more values from the read least significant bits, the one or more values indicating a merging property for the pixel, the merging property being indicative of a property of a merging in the HDR video signal for that pixel of one or more overlay signals with a video image signal; and an adapter for adapting the HDR video signal, and wherein the adapter is arranged to adapt a pixel value in dependence on the one or more values.
  • the invention may provide improved performance in many embodiments. In many systems, it may provide improved rendering of images and video sequences comprising overlay on displays with different dynamic ranges.
  • the apparatus may be a video decoder, and specifically may be comprised in a display, such as in a television or monitor.
  • the adapter may be arranged to perform a dynamic range adaptation on the pixel in dependence on the one or more values.
  • the more than one source signals may include (or consist in) the one or more overlay signals and the video image signal.
  • the source signals may be video signals which at the encoding side are combined into a single video signal (the HDR video signal).
  • the apparatus comprises an input for a signal including information on which least significant bits to read and how to convert them to the one or more values.
  • the apparatus is arranged to receive data indicative of an encoding of the one or more values in the HDR video signal, and the reader is arranged to determine the one or more values in response to the data indicative of the encoding.
  • the adapter is arranged to perform a dynamic range adaptation on image pixels of the HDR video signal.
  • the adapter is arranged to adapt a mapping from an input dynamic range of the HDR video signal to an output dynamic range for the pixel in dependence on the one or more values.
  • the one or more values is indicative of a percentual contribution to the pixel from the video image signal relative to a percentual contribution from one or more overlay signals; and the adapter is arranged to apply a different mapping for different percentual contributions.
  • percentual may indicate a percentage/ proportion/ ratio for a contribution of the pixel value originating from a given source.
  • the apparatus comprises: an estimator for splitting the HDR video signal into a plurality of estimated signals, based on an estimate of a contribution of the video image signal and one or more overlay signals to pixels of the HDR video signal; a mixer for remixing the plurality of estimated signals following the adaptation and wherein the adaptor is arranged to separately adapt at least one of the plurality of estimated signals.
  • the contribution(s) may be indicated by the one or more values.
  • the adaptation may be different for at least two of the estimated signals.
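  • As a purely illustrative sketch (not the patent's normative algorithm) of this split/adapt/remix idea, assuming the merging was an alpha blend whose weight has been recovered from the LSBs; the fixed overlay estimate and names are assumptions:

      # Split the merged value into estimated video and overlay parts, adapt only the
      # estimated video part, then remix.
      OVERLAY_EST = 3600  # assumed 12-bit value of a typical (subtitle) overlay pixel

      def split_adapt_remix(merged, a, adapt):
          """merged: 12-bit merged value; a: overlay weight in [0, 1]; adapt: mapping."""
          if a >= 1.0:
              return merged                                   # pure overlay: untouched
          video_est = max(0.0, (merged - a * OVERLAY_EST) / (1.0 - a))  # estimated video
          video_adapted = adapt(video_est)                    # adapt the video estimate
          return int(round(a * OVERLAY_EST + (1.0 - a) * video_adapted))  # remix
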
  • the merging property is indicative of a merging type.
  • the merging type may specifically indicate whether the merging of the pixel is one from a group of possible merging operations including at least one of: the merging includes a contribution only from the video image signal; the merging includes a contribution only from one overlay signal; the merging includes a contribution from both the video image signal and from at least one overlay signal.
  • the merging property for a pixel is indicative of an amount of merging of the video image signal and the one or more overlay signals for said pixel.
  • the amount of merging may reflect a weight of at least one of the one or more overlay signals relative to a weight of the input video image signal.
  • the merging may be a weighted summation
  • the adapter is arranged to, prior to adaptation, split the HDR video signal into more than one estimated signals estimating at least some of the one or more overlay signals and the video image signal based on an estimate of contribution of the at least some of the one or more overlay signals and the video image signal to a pixel value of the HDR video signal, where after at least one of the estimated signals is color transformed to adapt its luminance, and the more than one estimated signals are remixed following adaptation.
  • the merging property may be indicative of the pixel comprising first overlay content, the first overlay content originating from at least one of an overlay signal comprising locally generated graphic content or an overlay signal comprising a second video image signal which includes merged overlay content.
  • the adapter is arranged to adapt pixels within a region for which the merging property is indicative of the pixel comprising first overlay content to have output luminances within a predetermined range.
  • the predetermined range may e.g. for a display be a preset range reflecting the dynamic range of the display.
  • the range may have an upper limit of, say, 10% of the peak brightness, and a lower limit of e.g. the larger of 1% of peak brightness and 1 nit.
  • the predetermined range may be determined by the viewer etc.
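  • A minimal sketch of such a constraint, using the example limits just mentioned (10% of peak as upper limit, the larger of 1% of peak and 1 nit as lower limit); values are in nits:

      def clamp_overlay_luminance(lum_nits, peak_nits):
          """Keep an overlay pixel's output luminance within the predetermined range."""
          upper = 0.10 * peak_nits                # e.g. 10% of the display's peak
          lower = max(0.01 * peak_nits, 1.0)      # the larger of 1% of peak and 1 nit
          return min(max(lum_nits, lower), upper)

      # e.g. on a 1000 nit display, overlay luminances end up within [10, 100] nits
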
  • a method of decoding an HDR video signal merged from more than one source signal comprising: receiving the HDR video signal, reading at least one or more least significant bits for one or more color components of the HDR video signal for a pixel and generating one or more values from the read one or more least significant bits, the one or more values indicating a merging property for the pixel, the merging property being indicative of a property of a merging in the HDR video signal of one or more overlay signals with a video image signal for that pixel; and adapting a pixel value in dependence on the one or more values.
  • an apparatus for encoding a video signal comprising: a merger for merging an input HDR video image signal with one or more overlay signals to form a merged video signal, a processor for generating one or more values indicating for a pixel in the merged video signal a merging property indicative of a property of the merging for that pixel; and an encoder for encoding for said pixel said one or more values in one or more least significant bits of one or more color components of a pixel value for the pixel in the merged video signal.
  • the merging property is indicative of at least one of a merging type for said pixel and of an amount of merging of the input HDR video image signal and the one or more overlay signals.
  • the encoder is arranged to provide to the merged video signal an information signal comprising information on a property of the encoding of the one or more values in the one or more least significant bits.
  • a method of encoding a video signal comprising: merging an input HDR video image signal with one or more overlay signals to form a merged video signal, generating one or more values indicating for a pixel in the merged video signal a merging property indicative of a property of the merging for that pixel; and encoding for said pixel said one or more values in one or more least significant bits of one or more color components of a pixel value for the pixel in the merged video signal.
  • a method of encoding may comprise adding to a video image signal one or more overlay signals to form a merged video signal, generating one or more values indicating for a pixel a merging type and/or one or more merging parameters in the merged video signal and encoding for said pixel said one or more values in one or more least significant bits from one or more color components of the merged video signal
  • a video processor may comprise a merger for merging a video image signal and one or more overlays, typically graphics signals, to form a merged video signal, and an image encoder for generating or receiving one or more values indicating the merging type and/or one or more merging parameters for a pixel and for encoding for said pixel said value or values in one or more least significant bits from one or more color components of the merged video signal .
  • a value indicating merging type may be considered a value indicating from which signals the pixel in the merged video signal is composed. For instance, it may indicate whether the pixel is pure video, pure overlay, a mix of video and overlay, or possibly also what type of overlay is present in the mixture.
  • at least one value encoded in one or more LSBs indicates the merging type of the pixel.
  • such merging type information is sufficient to indicate the merging type of a pixel, for instance to be able to distinguish between pure video pixels and other pixels.
  • at least one of the values indicates merging type.
  • the one or more merging parameters may provide information on the amount of merging of signals, e.g. it may indicate the contribution of the individual signals to the merged signal for that pixel.
  • the amount of merging indicates for instance for pixels of mixed merging type whether it is for instance a 50%-50% mixture of video and overlay, or a 25%-75% mixture of overlay and video etc.
  • merging parameters indicative of the mixing ratio may be used. For instance if 6 LSBs, 3 in a first color component and 3 in another color component, are used where the 3 LSBs in the first component indicate the percentage of video mixed in and the other 3 LSBs in the other component indicate the amount of overlay mixed in, then the combination of the two triplets LSBs provide information on both type and merging amount. If the 3 LSB in the first component are all zero, then the pixel is pure overlay, if the 3 LSBs in the second component are all zero, then the pixel is pure video. If both the LSBs in the first and second component are non-zero, then the pixel is of mixed type (a mixing/ merging of video and overlay) and the amount of merging can be read from said LSBs.
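  • The 3+3 LSB example above could be sketched as follows; which components carry which triplet, and the quantisation to eight levels, are illustrative assumptions:

      def encode_mix(comp_video, comp_overlay, video_frac, overlay_frac):
          """Write the video and overlay fractions (0..1) into 3 LSBs of two components."""
          comp_video = (comp_video & ~0x7) | round(video_frac * 7)
          comp_overlay = (comp_overlay & ~0x7) | round(overlay_frac * 7)
          return comp_video, comp_overlay

      def decode_mix(comp_video, comp_overlay):
          v, o = comp_video & 0x7, comp_overlay & 0x7
          if v == 0:
              return "pure overlay", 0.0, o / 7   # no video contribution at all
          if o == 0:
              return "pure video", v / 7, 0.0     # no overlay contribution at all
          return "mixed", v / 7, o / 7            # both non-zero: mixed pixel
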
  • a method for decoding comprises receiving a video signal, reading one or more of the least significant bits of one or more of the color components, generating from said least significant bits one or more values and subjecting the received video image signal to an adaptation prior to display, wherein the adaptation is dependent on the generated value or values.
  • the video decoder comprises an input for receiving a video signal, a reader for reading at least one or more least significant bits for one or more of color components of the video signal for a pixel and generating one or more values from the read least significant bits and an adapter for adapting the video, wherein the adapter is arranged for adapting a pixel value in dependence on the generated value or values.
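  • A minimal decoder-side sketch of these steps (reading LSBs, deriving a value, adapting accordingly), assuming a single indicator bit in the LSB of the blue component; the encoding actually used would be known from a standard or from accompanying metadata:

      def decode_and_adapt(rgb, dra):
          """rgb: 12-bit (r, g, b) tuple; dra: the display's dynamic range adaptation."""
          r, g, b = rgb
          value = b & 0x1                     # read the merging value from the blue LSB
          if value == 0:                      # 0: pure video -> apply full adaptation
              return tuple(dra(c) for c in rgb)
          return rgb                          # 1: overlay/mixed -> left unchanged here
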
  • the display receives the data from a video processing system (VPS).
  • a set top box and a Blu-ray disc player, in which the video signal may have been merged with overlays, are examples of video processing systems.
  • VPS is a consumer-side device.
  • the approach is not restricted to such at home devices.
  • the VPS could be a remote, i.e. not at home, device or system.
  • the VPS could be a box at a TV station making some mix of a program + subtitles, or, maybe even an insertion/recoding from an intermediate (e.g. a cable station adding interesting further text from a commercial company "buy this car").
  • the user could for instance choose a subtitle language wherein the language subtitles are merged not at the end user (e.g. at the home), but at the source, i.e. the signal being sent to the home user is a merged signal.
  • the VPS would then be a remote system.
  • the approach is not restricted by the physical location of the merger or the algorithm by which video and overlays are merged, the important aspect is that they are merged.
  • the merging does not necessarily have to take place in a single device or method step, for instance, an intermediate adds interesting text ("buy this car") on one part of the screen, while the TV station provides subtitles at another part of the screen. In that case two merging actions have taken place, or equivalently a distributed multi step merging has been performed.
  • Subtitles and other overlays do not form part of the original HDR video and often do not need boosting, or at least not to the same amount as the video itself.
  • the dynamic range adaptation parameters (and in a more general sense, any manipulation or change to the HDR video data to adapt the video data to the display, by whatever functional manipulation) often only apply to the underlying video, not for the areas that contain the graphics overlays.
  • the dynamic range adaptation parameters may change at certain instances (e.g. when the scene is changing), while subtitles or a menu overlay may be fixed during the change.
  • the display will (and in fact cannot do otherwise) process every pixel of the received merged video independent of whether a pixel forms part of an overlay or not.
  • the merger and the merging process is, as seen from the decoder side, a black box. There is an output, and there may and may have been some signals merged, but whether this has happened and how is unknown and cannot become known.
  • the least significant bit or bits of a component of the video signal are used, and in a sense sacrificed (although in some technical implementations some bits may still be available for coding further per-pixel information in addition to the video pixel color data), to provide information on whether and, in many examples, to which amount a pixel is formed by or part of an overlay (such as a subtitle or a menu) and/or by the original video.
  • This allows a display receiving the video signal generated by the method or device of the current approach to treat e.g. overlays or parts of the image that are a mix of the original video and overlays differently from parts of the image that do not comprise an overlay but only comprise the original video.
  • insight may be obtained on a pixel by pixel basis on the merging that has taken place in the merger.
  • a single least significant bit of one component is filled with an indicative value.
  • a single bit may be 0 or 1. This allows the receiving side to distinguish on a pixel by pixel basis which pixels belong to the original video and which pixels belong to an overlay or a mix of video and overlay. In this simple arrangement, the above problems of e.g. subtitles co-fluctuating with lamps can be mitigated.
  • the display is provided with information that allows it to distinguish an overlay, for instance subtitles, from video and to dynamically adapt the image rendition of the video, while keeping the subtitles for instance fixed or adapting them in a different manner than the video pixels.
  • the dynamic range adaptation may be applied to pixels belonging to the original video, while no or a fixed dynamic range adaptation is applied to pixels belonging to an overlay.
  • a color component may be for instance one of the RGB components of the signal or, if the signal is in the YUV format, one of the YUV components, e.g. the luminance Y of the pixel.
  • more than one least significant bit is filled with an indication value.
  • Dynamic range adaptation may for instance be applied to pixels that are fully original video, no dynamic range adaptation being applied to pixels that are fully overlay, and varying degrees of dynamic range adaptation being applied to pixels that are part original video, part overlay, dependent on the ratio of video to overlay.
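  • A sketch of such graded adaptation, where the video/overlay ratio recovered from the LSBs controls how strongly the display's dynamic range mapping is applied (the linear blend is an assumption; other weightings are possible):

      def graded_adaptation(luma_in, video_ratio, dra):
          """video_ratio in [0, 1]: 1 = pure video (full DRA), 0 = pure overlay (no DRA)."""
          return video_ratio * dra(luma_in) + (1.0 - video_ratio) * luma_in
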
  • the sacrifice of the least significant bits typically has little or no effect on the image. Often it is no sacrifice at all since the LSBs do not comprise highly visible image information.
  • the least significant bits (LSBs) to be used to convey the video/overlay information may be selected from the various components, while taking into account the bit depth of the original video.
  • the number of bits used per pixel for at least one color component on the interface is more than the number of bits in the original video (e.g. 12, or 10). In that case there may be no impact on the video quality at all, if only little information on the merging situation needs to be communicated. In many cases, one can simply use the additional bandwidth of the interface channel.
  • not more than 8 bits, divided over the 3 or more components of a pixel are filled with data indicative of the contribution of an overlay to the pixel values. Sacrificing more than 8 bits would typically not greatly improve the positive effect while starting to have an increased negative effect on image rendition and as stated, if the number of bits used in the merged signal is more than the original video no bits need to be sacrificed at all. In some embodiments, 6 or 4 indication bits may also communicate most useful information. In some embodiments, the number of bits used in the merged signal may exceed the number of bits in the video signal. In the following, the acronym LSB will also be used for "least significant bit".
  • the number of LSBs indicating a merging type and/or one or more merging parameters in the merged video signal may be variable, for instance dependent on the merging type.
  • a coding length signal may be provided so that at the decoding side the length of coding can be known.
  • the variable length coding can be provided by the merging type indication.
  • Information on the encoding of the data indicative of the merging property may be provided in LSBs separate from LSBs that provide information on the amount of merging, e.g. the transparency of overlays or a mixing proportion.
  • information is supplied to or in the signal, for instance in the form of metadata or flags, indicating the encoding of the values that are indicative of the merging property.
  • information may indicate the LSB or LSBs that are used for providing information on the merging, such as which signals were merged and/or how they were merged, and how they can be read.
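  • Such side information could, purely as an illustration, take a form like the descriptor below; the field names are invented for this sketch and do not correspond to any standardised format:

      lsb_descriptor = {
          "components": {"B": 1},         # one LSB of the blue component is used
          "semantics": "overlay_flag",    # 0 = pure video, 1 = overlay or mixed
          "valid_from_frame": 0,          # the scheme may change during the video
      }

      def read_merging_value(rgb, descriptor):
          """Extract the merging value from a pixel according to the descriptor."""
          index = {"R": 0, "G": 1, "B": 2}
          value = 0
          for comp, nbits in descriptor["components"].items():
              value = (value << nbits) | (rgb[index[comp]] & ((1 << nbits) - 1))
          return value
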
  • Fig. 1 illustrates an example of a merging of video signals.
  • Fig. 2 illustrates an example of a video processing system in accordance with some embodiments of the invention.
  • Fig. 3 illustrates an example of a video processing system in accordance with some embodiments of the invention.
  • Figs. 4 to 7 illustrate examples of some embodiments of an encoding in a video processing system in accordance with some embodiments of the invention.
  • Fig. 8 schematically illustrates a problem of prior art methods and systems.
  • Fig. 9 illustrates examples of merging video and overlays.
  • Fig. 10 illustrates some examples of dynamic range adaptation.
  • Fig. 11 illustrates an example of a system in accordance with some embodiments.
  • Fig. 12 illustrates some examples of dynamic range adaptation.
  • Fig. 13 illustrates examples of aspects of some embodiments of the invention.
  • Fig. 14 illustrates examples of aspects of some embodiments of the invention.
  • Figs. 15 and 16 show examples of a combination of encoding in a VPS and decoding in a TV in accordance with some embodiments of the invention.
  • Fig. 17 illustrates a display comprising functionality in accordance with some embodiments of the invention.
  • Figs. 18 and 19 illustrate examples of aspects of some embodiments of the invention.
  • Figs. 20 and 21 illustrate some examples of generating indications of a merging operation.
  • Fig. 22 illustrates examples of possible dynamic range mappings.

DETAILED DESCRIPTION OF THE DRAWINGS
  • In Fig. 1, an example of an aspect of an exemplary embodiment of the invention relating to the encoding side is illustrated.
  • an input video image signal 2 for the original video is merged with overlays 3 and 4 in merger 5 of the VPS 1 to provide a merged video signal 6, comprising a merging of the original video and the overlays.
  • the merger 5 provides information to encoder part 7.
  • the input video image signal 2 is an HDR image signal in the sense that it provides a representation of an HDR signal.
  • the input video image signal 2 may in itself be an image that is intended to be presented on an HDR display, i.e. the color grading/ tone mapping may be adapted to be directly rendered on a display with a maximum brightness no less than typically 500 (or 1000) nits.
  • the input video image signal 2 may provide an image representation that is intended to be displayed directly on an LDR display, i.e. on a display with a maximum brightness below (typically) 500 nits, but which has a direct (known) mapping or translation to an HDR image intended for display on an HDR display.
  • the input video image signal 2 may be an LDR image representation mapped from an HDR image representation of an image using an HDR-to-LDR mapping.
  • Such an image is still an HDR image, as the original HDR image can be regenerated by applying the inverse mapping (which may further be included with the HDR image).
  • overlays may be provided as HDR representations or may be LDR representations that can be combined with the HDR or LDR representation of the HDR image signal.
  • the merged video signal 6 is similarly an HDR image signal and indeed with the input signal being a (direct or indirect) HDR image signal, a simple merging of the overlay values with the input values will result in the merged video signal 6 corresponding to the input video image signal 2.
  • the original input video image signal is a (direct or indirect) HDR image signal.
  • the VPS comprises an encoder 7 for encoding the merged video signal 6.
  • merging information is provided by merger 5 on the merge (e.g. whether a pixel is video or overlay and to what extent) for pixels.
  • the merging information provides information on a merging property indicative of a property of the adding of the one or more overlay signals 3, 4 to the input video image signal 2 for that pixel.
  • the merging property may specifically be indicative of the source of the pixel in the merged video signal 6 (i.e. from which signals the pixel in the merged video signal is composed) and/ or of the amount of merging of signals.
  • the merging information can for instance be an indication of the fact that signals are merged, e.g. "this pixel is video plus subtitle" and/or be transparency data for the different components of merged signals, for instance it may be an indication that a merged subtitle has a transparency of, say, 50% or 25%.
  • Fig. 1 shows two possible types of overlays, subtitling 3 and/or menu 4 being merged with the video in merger 5.
  • Other types of overlays may of course be used in other embodiments, such as for instance PIP (picture in picture), a logo, or advertisement overlays etc.
  • Such overlays may additionally or alternatively be merged with the video in merger 5.
  • embodiments of the described methods/apparatuses can be used in various scenarios where there is only one simple kind of special pixels, e.g. simple subtitles of only few possible opaque colors, or can be used in more complex systems, where various types of graphics or video mix may be present at the same time.
  • the different types may e.g. be discriminated by the situation-characterizing codifications described below.
  • the individual pixels can in many embodiments be one of several merging types.
  • it can be a pure type which comprises contributions from only one source, for instance it can be a pure video or pure overlay
  • the individual pixels can alternatively be of a mixed or merged type, wherein the pixel is made up from contributions of more than one source, for instance 50% video and 50% subtitles.
  • the weight may for example be determined by the transparency of the overlay.
  • Fig. 2 illustrates the encoding performed by the encoder 7.
  • the encoder 7 is provided with the merging information A' via signal MIS, where the merging information is indicative of a merging property which reflects a property of the merging in merger 5.
  • the encoder generates a value A for (typically) each pixel indicating, for a given pixel in the merged video signal, a merging property indicative of a property of the merging of one or more overlay signals 3, 4 to the input video image signal 2 for that pixel.
  • the value A for a pixel may specifically be indicative of the merging type and/or one or more merging parameters for that pixel as determined from the received information A'.
  • the encoder 7 may generate a different value A to encode from a received value A'.
  • each pixel of the merged image is indicated to be either video or to be a subtitle or menu.
  • each pixel is indicated to be one of two possible merging types.
  • the encoder 7 fills one (or more) of the least significant bits of one of the components of the HDR signal with the value A it has generated on the basis of the information provided by the merger, or which it has received directly from the merger 5.
  • the merged video signal 6 is a three color component video signal and is encoded as such. Specifically, it uses three color components R, G and B which in the specific example are indicated as each having 12 bits, i.e. each color component value is represented by 12 bits.
  • the filling is specifically depicted as being performed in a separate encoder 7, but a person skilled in the art will appreciate that this may be performed by other functional units, and that in particular the encoder can form a part of the merger 5, which may in some embodiments perform the two functions.
  • each pixel can only be either video or it can be subtitle or menu, so it suffices to use for the value A a simple binary representation of 0 or 1, where 0 may denote that the pixel is pure (unaffected) video and 1 denotes that the pixel represents a subtitle or menu (or e.g. that it represents any special graphics object).
  • This can be represented by a single bit and thus for that pixel the value may be provided in one of the LSBs.
  • the blue channel is used, because human vision is less sensitive to that information, and therefore the perceptual impact of introducing a small error in the LSB is reduced.
  • other color channels may be used. For example, for a YUV encoding, we could e.g. use one of the U or V components mutatis mutandis.
  • the merging property may be determined in response to a comparison of the merged video signal 6 to one or more of the input signals to the merger 5. For instance, if for a pixel the input video image signal 2 is compared to the merged video signal 6 and the two signals are found to be the same, then it can be assumed that said pixel is pure video and no merging of the input image signal with an overlay has taken place.
  • the value of A for said pixel may then be set to e.g. 0, indicating "pixel is pure video". If the input video signal 2 and the merged video signal 6 differ, then merging has taken place. The value of A for said pixel may then be set to e.g. 1, indicating "pixel is of mixed type".
  • Such a comparison scheme provides information on merging that does not come from the merger 5 directly but from a comparison of signals prior to and after merging. Such a comparison can be made in a comparator that may be part of or coupled to the merger 5 or the encoder 7. More complex comparisons may also be made. Of course, any combination of these two possibilities may also be used. For instance, the merger 5 may in the signal MIS provide a coarse indication (e.g. whether the pixel represents pure video, i.e. the input video image signal 2 without any contribution from an overlay, or not), and if that indication shows "not pure video", then a comparison is performed between one or more of the incoming signals and the merged signal to determine more details on the merging.
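  • As a purely illustrative, non-normative sketch of this comparison-based approach at the encoding side (the choice of the blue LSB as carrier is an assumption, as is the per-pixel equality test):

      def derive_and_embed_flag(input_rgb, merged_rgb):
          """Compare a pixel before/after merging and embed the result in the blue LSB."""
          a = 0 if input_rgb == merged_rgb else 1   # 0: pure video, 1: mixed/overlay
          r, g, b = merged_rgb
          return r, g, (b & ~0x1) | a               # sacrifice the blue LSB for A
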
  • the determined merging information is then encoded on a pixel per pixel basis in one or more LSBs.
  • the type of overlay may also be encoded, e.g. it may be encoded whether the overlay is a subtitle or a menu.
  • more than two merging types are possible and more than one LSB is used for encoding the merging type.
  • the value A may take more than two values and accordingly it is represented by more than a single bit, and thus communicated in a plurality of LSBs.
  • Fig. 3 illustrates such an example where in each color component the least significant bit is used. This provides three information bits, which can signal a maximum of eight merging types; one possible assignment is sketched below.
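  • One possible, purely illustrative assignment of those eight codes to merging types (the patent does not prescribe this particular table):

      MERGING_TYPES = {
          0b000: "pure video",
          0b001: "subtitle only",
          0b010: "menu only",
          0b011: "video + subtitle mix",
          0b100: "video + menu mix",
          0b101: "picture-in-picture",
          0b110: "logo / advertisement",
          0b111: "reserved",
      }

      def read_merging_type(rgb):
          r, g, b = rgb                                      # one LSB per component
          return MERGING_TYPES[((r & 1) << 2) | ((g & 1) << 1) | (b & 1)]
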
  • Fig. 4 to 6 show further embodiments of video processing controlled for an input stream of video information.
  • Fig. 4 shows an embodiment with only LDR video data representing an HDR image being provided.
  • the LDR image data 43 is graded to be directly shown on an LDR display.
  • the source of the image data may have optimized the dynamic range for a, say, 100 nits display, and thus if the image is presented on an LDR display (with a maximum brightness of 100 nits), it can be rendered directly without any dynamic range adaptation or further color grading.
  • the LDR image data still provides a representation of an HDR image as the data allows a direct generation of an original HDR image by applying a dynamic range mapping.
  • the receiving end can re-generate the original HDR image.
  • an HDR image may be subjected to a color (specifically luminance) grading to generate an output image suitable or optimized for presentation on an LDR display.
  • This grading may provide a dynamic range mapping or function which specifically may be a homogenous one to one (reversible) function, e.g. directly relating an input luminance to an output luminance.
  • an output image is generated which may be presented directly on an LDR display.
  • the LDR image data still represents the HDR image, and specifically provides a representation of the HDR image from which the original HDR can be generated by applying the reverse dynamic range mapping or function.
  • the image is represented by image data that allows direct presentation on an LDR display, it still provides a representation of the HDR image (e.g., it can be considered as a specific encoding of the HDR image).
  • the LDR image data of the example is actually a(n LDR) representation of an HDR image.
  • the received LDR image data is linked to or associated with an original HDR image.
  • HDR images for HDR displays can be derived by boosting the LDR images to provide HDR representations.
  • the video processing system of Fig. 4 will in the following be considered to be a BD player.
  • the BD player 41 receives an input stream of video information, e.g. a BD data stream 42.
  • the stream comprises both LDR video data 43 and graphics data 44 for generating graphics (or alternatively or additionally graphics data could come from another place, e.g. it could be graphics generated in the player itself, or received over a network connection such as the internet, etc.).
  • the LDR video data 43 (representing an HDR image) is processed in LDR decoder LDR-dec 45 providing a decoded signal 46 (e.g. linear RGB colors per pixel, derived from DCT transformed data according to an MPEG or similar video encoding standard which was used for storage).
  • the graphics data 44 are processed in graphic decoder GR-dec 47 and constitutes an overlay 48 that is used in overlay processor OVR 49 to overlay the video signal to generate the merged image display signal 50, e.g. an HDMI signal, i.e. a signal to be sent over an HDMI interface (or any other video signal communication system).
  • This overlay processor merges the video with one or more overlays such as subtitles or menus and is therefore also called a merger within the framework of the invention.
  • the OVR reacts for instance to a signal sent by a remote controller with which the viewer can choose e.g. whether or not to use subtitles and if so, in which language, and/or to start a menu etc.
  • the HDMI signal 50 is the merged signal that is to be received by a display device and which results in an image being displayed on the display of the display device, wherein the image displayed on the display may have subtitles and/or menu parts.
  • in the encoder 51, one or more of the least significant bits of the merged signal are filled with merging information, specifically information on whether and/or to what extent the individual pixel represents video or overlay.
  • the information in the respective LSBs can be read at the decoder side and thus informs the decoder of e.g. the merging type of the merged pixel (e.g. what the constituents of the pixel are, i.e. which signals were merged, or indeed whether signals were merged at all), and/or of merging parameters (e.g. indicating the merging ratio between the merged signals).
  • the information in the LSBs may be values indicating what was merged (e.g. merging type information) and/or how the input signals were merged (e.g. the amount of merging for, for instance, pixels that are of mixed or merged type). As explained above, such information can be based on information provided by the merger 49 and/or on comparing signals before and after merging.
  • Fig. 5 shows an example of a video processing which is controlled by graphics processing control data and a display mode for an input stream of video information which includes both LDR and HDR video data.
  • the system is similar to that of Fig. 4.
  • an additional feature is that the BD data stream 42 has both LDR video data 43 aimed at direct presentation on an LDR display as well as HDR video data 52 representing additional HDR data.
  • the HDR video data 52 may for example provide information on the mapping used to generate the LDR image data from the original HDR image, e.g. by directly providing the dynamic range mapping or the inverse dynamic range mapping function.
  • the HDR data may directly provide a full or partial HDR representation of the original HDR image (or a different HDR representation of the original HDR image (e.g. related to a different maximum brightness level).
  • the input video image signal 2 may be considered to be an HDR video image signal by virtue of the HDR image data with the LDR image data potentially not being a representation of an HDR image.
  • the HDR video data 52 may in some embodiments specifically define e.g. only the color mapping functions to transform the LDR representation on the disc to an HDR representation. However, in other embodiments, the data could directly be HDR images in a dual-image encoding system, or parts of images (e.g. provided only for high brightness regions of the LDR images, etc.).
  • the video processing system includes an HDR decoder HDR-dec 54 for decoding the HDR signal. In some embodiments, either or both of the decoders may be used e.g. dependent on whether an LDR or a HDR image is to be sent over the HDMI interface. Alternatively, in some embodiments or scenarios, the signal may only comprise an HDR signal.
  • the graphic data (information on the overlays) are part of the incoming signal. It will be appreciated that it is in many embodiments possible that the graphic data is generated within the VPS.
  • the video signal and one or more overlays are merged and this is not limited to graphics data or overlays being provided in any specific form or from any specific source.
  • the merging provides for a signal wherein the video and the one or more overlays are merged thereby providing for at least some pixels a merged pixel value.
  • the merging may be a selection merging for each pixel.
  • the pixel value of one of the input signals i.e. of either the input video data or of one of the overlays, is selected for each pixel.
  • the merger may select the output pixel values as the pixel values of the subtitle overlay input.
  • the merger may select the pixel values of the input video image.
  • a merged image is generated comprising the subtitle overlay pixel values in the subtitle regions and the image pixel values in the remaining parts of the image.
  • pixel values may e.g. be generated by combining pixel values from the input image and one or more of the overlays.
  • a weighted summation may be made between the pixel values of the input image and pixel values for the subtitle overlay.
  • the weight for the subtitle overlay pixel values may e.g. reflect a transparency level of the subtitle.
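  • A minimal sketch of such a weighted (alpha) merge, with the weight reflecting the overlay's transparency (the tuple representation and value range are assumptions, not prescribed by the text):

      def merge_pixel(video_rgb, overlay_rgb, alpha):
          """alpha in [0, 1]: 0 = fully transparent overlay, 1 = fully opaque overlay."""
          return tuple(int(round(alpha * o + (1.0 - alpha) * v))
                       for v, o in zip(video_rgb, overlay_rgb))
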
  • the overlay information need not be provided as full or partial images comprising a set of pixel values but may be provided in any suitable form, such as e.g. as a set of letters from which pixel values can be determined based on a stored graphical representation of each letter.
  • Fig. 6 shows another exemplary embodiment.
  • the example corresponds to that described with respect to FIG. 4 and 5 but with the further additional feature that the BD signal 42 also comprises information 55 for an HDR display on how to dynamically adapt the signal 50.
  • This information may e.g. be in the form of metadata provided with the video signal, where the metadata may encode e.g. luminance boosting functions for changing the luminances of pixel colors of e.g. the LDR image.
  • it may encode a dynamic range mapping which maps from an HDR image to an LDR image.
  • Such an approach may be suitable for a scenario where the HDMI images are HDR images and may allow the signal to be presented on an LDR display by this mapping the HDMI image to a suitable LDR representation.
  • the video processing system may then pass this information on to the display.
  • Fig. 7 illustrates yet a further exemplary embodiment.
  • the video processing system 41 further generates a signal 56 which comprises information on which least significant bits are used and what kind of information can be obtained from these bits.
  • data is provided which describes how the merging information is encoded in the merged video signal.
  • for a given video, e.g. a movie, commercial, YouTube video, etc., it may be indicated which system of encoding is used for providing the merging information.
  • the same e.g. three LSBs can then in various scenarios be used to encode different aspects (e.g. if only simple subtitles are used, a simple mixing scheme may be communicated indicating whether individual pixels are subtitle or image pixels. If e.g. complex graphics in a menu rectangle is employed, the encoding may possibly reflect properties of the menu background etc.)
  • dynamic schemes may be used. E.g. the codification/encoding scheme for BD disc menus may be communicated, but if the user during the movie accesses apparatus menus, a different codification/encoding scheme may be communicated which is more appropriate for the apparatus menus. For a given time (or e.g. until other information is provided) this new encoding scheme may be used.
  • the signal 56 may also provide changing information: for instance, in some proprietary video a "1" in the least significant bit may mean "opaque subtitle", but at the beginning of said video signal (or e.g. half way through the video) the signal 56 may indicate that the "merging type 1" indication means something else for that video, or from that point on in the video.
  • signal 56 may provide information on which bits are used for providing merging information and/or on how the bits are used for providing this merging information. It may specifically provide information on how the information may change in time, e.g. from scene to scene or from video to video.
  • a menu to be added to an image is partially transparent so that the image may be partly seen through the menu (the image "shines" through the menu).
  • the encoder 7 may encode a value A representing the transparency of the menu or of the mix of menu and video in a number of LSBs, and in the specific example in the three LSBs. The amount of merging per pixel can then be indicated in Value A encoded in the three LSBs.
  • the video information may also be in other formats, such as for instance a YUV format, or an RGBE format, or formats where layers are used or where four colors are used etc.
  • digital representation of the values provides for components wherein the values are expressed in a number of bits, and for HDR the number of bits is typically relatively large (typically 12 or more).
  • the binary representations of the values includes a number of bits of which the least significant bit(s) are referred to as LSB(s).
  • the LSBs that have the least visible effect are used for encoding of the merging information.
  • the encoding in many embodiments may indicate not just whether an overlay is present or not, but also which type of overlay and/or e.g. the transparency of the overlay.
  • Which of the LSBs that are used for encoding may be predetermined and may e.g. be a standard. In such a case the decoder will know how the merging data is encoded and no additional information needs to be added to the video signal. In some embodiments, there may be more than one possible way of encoding merging information in the LSBs and the BD player/ VPS may add information on the applied encoding. For instance, metadata or a flag may be included detailing which LSBs are used for encoding which information.
  • this information can then be read and the relevant LSBs can be decoded accordingly.
  • the type of method used may e.g. be dynamically indicated as a value in one or more of the LSBs, as metadata or flag.
  • One option would be to use the LSBs of pixels e.g. from the left of the top line of the video. The perceptual impact of this is likely to be acceptable and indeed in many embodiments would be insignificant.
  • Fig. 8 illustrates an example of how a prior art decoder or video display driver may process a merged video signal.
  • the merged video signal may be a conventional merged video signal or may be a merged video signal provided by a video processing system as previously described. Thus, it may be the previously described merged video signal 6 which comprises merging information in LSBs. In the prior art video decoder processor of FIG. 8, this merging property information will simply be ignored.
  • the example also illustrates that there is backward compatibility for the described approach.
  • the merged image 6 is to be displayed on an HDR display and the video decoder includes a dynamic range adaptation to modify the dynamic range to be suitable for an HDR display.
  • the decoder at the display has no functionality for reading the LSBs and it cannot generate any merging information or specifically the value A. This may result in the following scenario.
  • An HDR image is an image which encodes the textures of an HDR scene (which may typically contain both very bright and dark regions).
  • a typical HDR image comprises brightly colored parts or parts strongly illuminated compared to the average illumination.
  • the display receiving the HDR signal tries to improve the video signal to match its own characteristics, including e.g. the peak brightness level. To do so, dynamic range adaptation is performed. E.g. if the dynamic range of the display is somewhat lower than that of the encoded HDR images, the processing may non-linearly downgrade the luminances, e.g. mostly lower the luminances of the brightest objects while keeping those of the darker objects constant, and vice versa if the display is brighter (e.g. a 5000 nit image to be optimized for a 10000 nit peak brightness display).
  • an HDR image for a 3000 nit display can be calculated from a 100 nit graded input image as starting image.
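  • A minimal sketch of such a re-mapping between peak brightness levels; in practice the mapping functions would typically be supplied with the video (e.g. via signal 55), and the simple power-law form and default parameters here are only illustrative assumptions:

      def remap_luminance(lum_nits, src_peak=100.0, dst_peak=3000.0, gamma=2.0):
          """Map a luminance in [0, src_peak] nits onto [0, dst_peak] nits."""
          norm = max(0.0, min(1.0, lum_nits / src_peak))
          return dst_peak * (norm ** gamma)   # stretches highlights more than dark areas
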
  • dynamic range adaptation parameters/instructions such as for instance via signal 55 as shown in Fig.7, may be and preferably are included with the video or conveyed otherwise to the display to provide processing information for optimizing the picture quality for the peak brightness level and other characteristics of the display on which the signal is displayed.
  • the dynamic range adaptation parameters/instructions may be static, i.e. be valid for a whole program, or dynamic, i.e. change from frame to frame or (typically) from scene to scene.
  • the adaptation parameters may operate on the whole picture area or may be constrained to certain areas of the picture.
  • dynamic range adaptation is performed on every pixel in the same manner, i.e. ignoring that some pixels are of a different type, like a mix with graphics.
  • the adaptation would then be valid only for one type of pixel, namely typically the video only pixels.
  • the image 6 is subjected to a dynamic range adaptation DRA in dynamic range adapter 71 for providing an adapted image 6a, and the adapted image 6a is displayed on the display of an HDR display device.
  • DRA dynamic range adaptation
  • the data are in the form of standard data and can be adapted and then displayed.
• in the example of Fig. 8, one or more of the LSBs of the components of the signal are filled with information on the overlays.
  • the received video data is used directly without considering the merging information contained in the LSBs (or indeed without the video decoder having any knowledge of this information being encoded in the LSBs).
• the least significant bits are typically of little or no significance, so the display will tend to provide a "normal image" when this is rendered.
  • the degradation or error resulting from the inclusion of merging is likely to be insignificant in many embodiments and scenarios.
  • the method of encoding the merging information may in many embodiments be achieved while maintaining a system that is backwards compatible. If more bits are available in the interface than needed for the original video the introduction of merging property data can be achieved without any negative effect as otherwise unused bits can be utilized. This may provide backwards compatibility while allowing a new decoder to make use of the additional merging information to e.g. improve the rendering of the e.g. subtitles (as will be described later).
• the overlay (specifically subtitles and/or a menu) will in the prior art approach of FIG. 8 be subjected to one and the same dynamic range adaptation DRA as the video image itself. Thus, all parts are treated equally. There is no other way of doing so in known methods and devices. In FIG. 8, this is indicated by having all lines and text thickened evenly.
  • subtitles may due to the dynamic range adaptation be different from the type of subtitles that viewers are used to.
  • subtitles tend to often become too bright and/or may start oscillating in brightness when processed as in the system of FIG. 8.
  • FIG. 9 illustrates an example of an apparatus for processing a merged video signal in accordance with some embodiments of the invention.
• the apparatus may also perform dynamic range adaptation, which specifically may provide an adaptation from an input luminance range to an output luminance range.
  • the dynamic range adaptation may adapt a received image from a dynamic range corresponding to a first maximum brightness or white level (e.g. given in nits) to a dynamic range corresponding to a second maximum brightness or white level (e.g. given in nits).
  • the dynamic range adaptation may for example be from image data referenced to an LDR dynamic range to image data referenced to an HDR dynamic range.
  • the dynamic range adaptation may be from an LDR range to an HDR range.
• the dynamic range adaptation may be from an HDR range to an LDR range. In yet other embodiments, the dynamic range adaptation may e.g. be from an HDR range to another HDR range. In yet other embodiments, the dynamic range adaptation may e.g. be from an LDR range to another LDR range.
  • a merged video signal 6 is provided by a video processing system as previously described.
  • the merged video signal 6 comprises merging information in LSBs of one or more color components for at least some pixels.
  • an input receives a merged HDR video signal.
  • the video signal is an HDR video signal in that it provides a representation of an underlying HDR image/ video sequence.
• the actual video data may be referenced to an LDR dynamic range and may specifically be an LDR image generated by a mapping or color grading from an HDR image.
• from this representation the original HDR image can be generated, and thus the received video signal is inherently a representation of an HDR image/video.
  • the system further comprises a reader 72 which is arranged to read at least one or more least significant bits for one or more color components of the video signal for a given pixel.
  • the reader 72 is then arranged to generate one or more values A from the read least significant bits where the one or more values A indicates a merging property for the given pixel.
  • the merging property is indicative of a property of a merging in the HDR video signal of one or more overlay signals 3, 4 with a video image signal 2 for that pixel.
  • the value A for a given pixel is indicative of the merging operation performed by the merger in the encoder for that pixel.
• the reader 72 is coupled to an adapter 71 which is arranged to adapt the video and specifically is arranged to perform a dynamic range adaptation to images of the video signal. Further, the adapter 71 is arranged to adapt the pixel values in dependence on the generated value or values (A). Thus, the adapter 71 receives pixel values that are referenced to a given input dynamic range (e.g. an LDR range of, say, 100 nits) and generates new pixel values that are referenced to a different output dynamic range (e.g. an HDR range of, say, 1000 nits). However, the adapter 71 does not apply the same dynamic range mapping or function to all pixels but rather modifies the dynamic range mapping/function depending on the received merging properties for the pixels. For example, a different mapping may be applied to pure video pixels than to subtitle pixels (as indicated by the values A).
  • the adapter 71 thus generates an output signal which is referenced to a different dynamic range than the received merged video signal 6. This signal may then be fed to a display suitable for rendering this range. For example, a received LDR image may be converted to an HDR image and rendered on an HDR display.
  • a substantially improved adaptation can be performed resulting in display images that are perceived to be of much higher quality.
• the brightness of subtitles may be reduced to more suitable levels without compromising the rendering of the underlying image and/or the flickering of graphic elements may be reduced.
  • the dynamic range adaptation DRA is thus made dependent on the value A.
  • the value A is read in the reader 72 of the decoder and a signal 73 indicating this value A (either directly or after conversion to another form) is provided to dynamic range adapter 71, which then adapts the pixel values in dependence on this value A.
  • the dynamic range adapter 71 may in some embodiments be provided signal 55 (see Fig. 7) thereby providing information on how to perform the dynamic adaptation, i.e. which luminance or color mapping functions to apply to the input image pixel values to obtain the output image of desired dynamic range.
  • the apparatus may use its own method to do the dynamic range adaptation, i.e. the dynamic range adaptation and the dependency on the value A may be determined in the apparatus without any specific adaptation information being received from an external source (except for the merging property values A).
• the decoder apparatus of FIG. 9 may be provided with a signal 56 that includes information on the encoding of the merging property values in the color component.
  • the signal 56 may provide information on which bit comprises what information.
  • signal 56 may e.g. include information indicating: "The least significant bit of the blue color component is used to encode merging property information with "0" indicating that the pixel is a video pixel and "1" indicating that the pixel is an overlay pixel.”
• the adapter 71 may be arranged to perform the adaptation in response to these values. Specifically, pixels that have a value of "1" in the LSB of the blue color component are not subjected to a dynamic range change, or at least are not subjected to the same dynamic range adaptation as video pixels. Consequently, the pixels indicated by a "1" in the LSB of the blue channel do not have their luminance e.g. boosted (or it is e.g. boosted differently). In Fig. 9 this is indicated by those parts of the image on the screen that are video pixels, i.e. the pixels having a value of "0" in the LSB of the blue component, having thickened lines, while those parts that have a "1" in the LSB of the blue component (in this example subtitles and menus) have thinner lines.
• the menus and the subtitles are then displayed in a non-boosted manner. This may substantially improve the perceived image quality in many embodiments and scenarios.
  • the reader 72 is arranged to generate information from LSBs values of the color components.
  • the reader 72 thus reads the relevant LSBs, possibly guided by a signal 56 indicating to the reader 72 which LSBs to read and/or how to interpret them.
  • the reader 72 then generates one or more merging property values A from said LSBs. These values are used in adapter 71 to steer or control the adaptation.
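• A minimal sketch of such a reader is given below (Python, illustrative only). The lsb_map argument stands in for the information that signal 56 may carry about which LSBs of which components hold the merging bits; the names are hypothetical.

    def read_merging_value(pixel, lsb_map):
        # pixel: sequence of integer color components, e.g. (Y, Cb, Cr) or (R, G, B)
        # lsb_map: list of (component_index, bit_position) pairs, ordered from
        #          least to most significant bit of the merging value A
        a = 0
        for out_bit, (comp, bit) in enumerate(lsb_map):
            a |= ((pixel[comp] >> bit) & 1) << out_bit
        return a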
  • the described approaches may provide improved performance and in particular an improved user experience in many scenarios.
  • the approach may in particular provide an improved rendering of combined video (comprising both underlying images as well as overlay) on displays which require dynamic range adaptation to be performed.
• a substantially improved rendering of images on an HDR display can be achieved based on an input video that is optimized for an LDR display.
  • improved performance can be achieved for many embodiments where dynamic range adaptation is used to increase the dynamic range.
• a substantially improved rendering of images on an LDR display can be achieved based on an input video that is optimized for an HDR display.
  • improved performance can be achieved for many embodiments where dynamic range adaptation is used to decrease the dynamic range.
• a code allocation function or electro-optical transfer function provides a mapping between the (digital) values and a corresponding light output, i.e. the code allocation function for a given image/range provides a map from HDR linear luminance values to suitable quantized luma codes.
  • HDR linear luminance values are often represented as e.g. floating point values with a relatively high number of bits per value (e.g. 16 bits).
  • the quantized luma codes typically represent luma values by a relatively low number of bits (e.g. 8 bits), and often as integer values.
• the difference between LDR and HDR is not just the size of the dynamic range. Rather, the relative distribution of intensities in most scenes is also substantially different for LDR and HDR representations.
  • HDR images/video typically have a different intensity distribution than the conventional (LDR) images/video.
• the peak-to-average luminance ratio of high-dynamic-range image data is typically much higher than for LDR image data.
• conventionally applied code allocation curves or EOTFs tend to be sub-optimal for HDR data.
• if a conventional LDR mapping from HDR luminance values to encoded luma values is used, a significant image degradation typically occurs.
• most of the image content can only be represented by a few code values as a large number of codes are reserved for the increased brightness range, which is however typically only used for a few very bright image objects.
• luma/luminance domains are typically specified using their log curves or code allocation functions/EOTFs. Examples of this are the curves used for sRGB or ITU Rec. 709 logarithmic data.
  • HDR images/video typically have a different brightness (e.g. when defined as display rendered luminance) distribution than current standard dynamic range images.
  • the current video content distribution typically peaks around 20% of peak brightness (which means that the luma codes are nicely spread around the half of e.g. 255 values)
• HDR content may oftentimes peak around a much lower percentage, e.g. 1%, of peak brightness (data of at least the darker regions of the HDR images spread around the code at 1/100th of the code maximum).
• most of the relevant HDR content will be contained in only a few of the 8-bit or 10-bit video levels when it is encoded using current standard log curves. This will lead to severe and unacceptable quantization artifacts in the preview image, thus preventing the colorist from color grading/correcting HDR images.
• the code allocation functions which map linear light luminances (as they are to be seen upon display rendering) to actual technical codes, or vice versa, have however conventionally largely been based upon LDR models (like gamma 2.2), which were optimal only for LDR displays with a peak brightness of around 100 nit or cd/m2 (henceforth both the terms nit and cd/m2 will be used). If such approaches are used for an HDR display (e.g. with a peak brightness of 5000 nit) one risks seeing artefacts, such as banding in the darker parts of the video (e.g. banding in a dark blue sky, especially for fades).
  • a suitable code allocation curve should be used such that a sufficient number of quantization levels is assigned to the most important video data.
  • code allocation functions have been developed wherein a non- uniform quantization has been applied. This may specifically be performed by applying a non-linear function (luma code mapping/ tone mapping function) to the input luminance values followed by a linear quantization.
  • a non-linear function luminance values followed by a linear quantization.
  • the defined functions in many scenarios provide a suboptimal result. For example, applying a code allocation function to HDR images in order to e.g. allow these to be processed by LDR circuits with a relatively low number of bits per value (typically 8 bits) tends to result in suboptimal conversion of the HDR image and specifically in the image values being concentrated around a few quantization levels/ codes.
  • a dynamic range adaptation may be seen as a conversion from one code allocation function associated with one dynamic range/ maximum brightness to another code allocation function associated with another dynamic range/ maximum brightness.
  • the codes representing the image before and after the dynamic range adaptation may have the same or a different number of bits.
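• Conceptually, such a conversion may be sketched as follows (Python, illustrative only; the function arguments stand for whichever EOTF/OETF and tone mapping curve apply in a given system):

    def adapt_code(code_in, eotf_in, tone_map, oetf_out):
        # dynamic range adaptation seen as a change of code allocation function
        lum_in = eotf_in(code_in)    # input luma code -> linear luminance (nits)
        lum_out = tone_map(lum_in)   # remap luminance to the target dynamic range
        return oetf_out(lum_out)     # linear luminance -> output luma code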
  • the issues may be illustrated by considering the scenario for an exemplary image (see FIG. 10) illustrating a night scene comprising a dark monster 1001 hiding in shadows next to an averagely lit up house 1003 with some bright streetlights 1005 in front. Further an average grey or dim car 1007 may be present.
  • FIG. 10 illustrates three representations of this image.
• the first is the actual brightness (in nits) of the scene in real life (as captured by an HDR camera which in the example can capture brightness up to 5000 nits, or typically it may represent a 5000 nit reference range master HDR grading of an original camera capture).
  • the captured data is typically captured at a high resolution and is typically reduced to a relatively low number of values.
  • 1024 luma codes are available to represent the range up to 5000 nits.
• a non-linear code allocation function 1009 (EOTF) is used, specifically the OETF defined in SMPTE2084 (which is a log-gamma shaped function).
  • more codes are allocated to dark areas and fewer to bright areas.
  • this results in the distance in code values between the monster 1001 and the car 1007 being larger than the difference between the house 1003 and the car 1007 despite the opposite being true for the brightness in the real scene (when measured in nits).
  • the difference in codes between the bright street light 1005 and the house 1003 is reduced.
• the OETF of SMPTE2084 (henceforth referred to as SMPTE2084) has been designed for a dynamic range (maximum brightness) of 1000 nits, and thus if the image is displayed on a 1000 nit display, the SMPTE2084 curve can be used to decode the received luma values directly.
  • data may thus also be provided (e.g. on a Bluray Disc) which is related to an EOTF that is associated with a different dynamic range.
  • FIG. 10 illustrates an alternative EOTF which is suitable for HDR image encoding when an LDR image is needed for direct rendering (namely the mapping between the third and first axis, i.e. the combination of the two successive mappings).
  • an even higher number of codes are allocated to dark values with fewer being provided for bright values.
  • the specific example may for example be used to provide an SDR graded look (i.e. suitable for LDR presentation) using a conventional gamma 2.2 EOTF.
  • Such a representation may for example be used directly by a legacy LDR display.
  • the EOTF may vary in time (between different frames) and/or spatially (between different areas of the image).
  • a spatially differentiated approach may be used where more than one EOTF may be provided for an image.
  • the image may be divided into a few regions (e.g. one corresponding to a dark region and one corresponding to a brighter region of the image), and an EOTF may be provided for each region.
  • This may allow the EOTF to be optimized for the different characteristics of different parts of the image, and it may provide improved rendering for some images (such as images including both very dark and very bright regions).
  • an EOTF may be used which has a very large number of codes allocated to dark values. This may locally increase the resolution in the dark range thereby providing improved differentiation (e.g. allowing the monster to be seen despite the whole region being dark).
  • a different EOTF may be provided for a region that has been identified as being brighter.
  • a different EOTF may be provided for the brighter region around the car. This EOTF which is used for e.g. the brighter region comprising the car may have fewer codes allocated to dark values and more codes to the midrange. Again this may improve differentiation and reduce banding etc. Using this approach thus allows improved representation of the scene as a whole (with e.g. reduced quantization error).
  • the EOTF may in some cases be adapted between frames such that e.g. it provides a higher number of codes to a dark range for dark images than for bright images.
  • the merged video signal 6 may be represented by luma codes corresponding to one of the described EOTFs. It should also be noted that if e.g. the video image signal comprises luma codes based on an EOTF linked to an SDR range, it is still possible to recreate the original HDR values from the luma codes and that accordingly the image is still a representation of an HDR image.
• if a display receives a video signal in accordance with an EOTF associated with a specific dynamic range, it may be desirable to perform a dynamic range conversion if the dynamic range does not match that of the display.
  • the dynamic range adaptation should typically be non-linear (although possibly piecewise linear). For example, dark levels should typically not be increased in brightness despite the dynamic range being significantly increased. In other words, dark signals are often not compressed significantly in a color grading for LDR displays. Mid-level brightness levels should however typically be boosted somewhat in brightness although typically it is important that the boosting is not too substantial as this may create an artificial look with too many bright areas.
• An example of a dynamic range mapping is illustrated in FIG. 12.
  • a straight linear mapping is provided for all brightness values below a threshold and a different linear mapping is provided for all brightness values above the threshold.
  • the approach may result in a relatively modest (or no) boosting for dark and medium brightness levels whereas very bright areas are boosted substantially more. This effect reflects both typical brightness conditions in the real world as well as the perception of the human visual system.
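• A sketch of such a two-segment mapping is given below (Python); the threshold and gain values are purely illustrative and are not taken from the figure.

    def piecewise_linear_map(lum_in, threshold=500.0, gain_low=1.0, gain_high=9.0):
        # keep dark/mid levels essentially unchanged, boost only the bright range
        if lum_in <= threshold:
            return gain_low * lum_in
        return gain_low * threshold + gain_high * (lum_in - threshold)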
  • the dynamic range adaptation is performed at the display. This may allow the dynamic range adaptation to be adapted to the specific characteristics of the individual display and may allow the display to be used with a large variety of sources.
• overlay graphics (e.g. a display menu) are typically rendered at a suitable brightness determined locally and taking the dynamic range of the display into account.
  • the display simply receives an image comprising both the original image and the graphics.
• the display then performs range adaptation resulting in the unpleasant rendering of the overlay graphics. For example, if subtitles are added by a BD player, these will often be presented at an uncomfortable brightness on an HDR display.
  • the issue may be particularly problematic for images where the graphics is blended with the original image, such as e.g. where semi-transparent subtitles are used.
  • the semi-transparent overlay may in this case increase the brightness of the corresponding pixels, e.g. resulting in the brightness exceeding that of the threshold of the dynamic range mapping of FIG. 12.
  • a high brightness boost may accordingly be provided.
  • This may result in perceived distortion to the underlying image in addition to typically the overlay being perceived as too bright.
  • a standard dynamic range adaptation may not only result in too bright subtitles but also in the face appearing brighter for the area in which the subtitles are shown than for the rest of the face. Thus, an unnatural look results.
  • the issue may be even further exacerbated when an adaptive EOTF is used.
  • a different EOTF may be used for dark areas than for bright areas.
• applying a fixed dynamic range adaptation to such a scenario, e.g. with a fixed mapping of input luma values to output luma values, may then give inconsistent results. For example, the dynamic range adaptation may map a luma value of 723 (e.g. related to a 1000 nit range) to a value of, say, 617 (e.g. related to a 5000 nit range).
• this value may for a dark area EOTF be interpreted to correspond to, say, 110 nits but for a bright area EOTF be interpreted to correspond to, say, 150 nits.
  • the EOTF named TM1 may be an EOTF provided for a first frame
  • the EOTF named TM2 may be an EOTF received for a second frame.
  • the two EOTFs thus represent two different dynamic range mappings which are applied to different frames of a video sequence.
  • TM2 is a dynamic range mapping (EOTF and also sometimes referred to as a tone mapping) for received video pixel luminances (or lumas) at a first time moment.
• the mapping is actually meant for the underlying video content (i.e. it has been designed to provide a desirable look when video is presented). However, if a pixel comprises overlay this results in the pixel having a brighter luminance (1-alpha)*vid(x,y) + alpha*graph(x,y).
• the dynamic range mapping at least provides for the graphics/overlay regions not to suddenly become very bright. If instead TM1 were used, then the overlay parts would become very bright as the additional contribution from the overlay pushes the input luminance to a level where it is boosted very aggressively by TM1.
• the presentation of the overlay pixels (e.g. subtitles and menus) will change between different frames. This would be seen by the viewer as e.g. white menu text oscillating between dark and bright values merely because it was mixed with video, which may be undesired for that region.
• some embodiments of the current approach allow the detection of such pixels, and then the rendering at a limited luminance Lim_vid, e.g. always 20% brighter for graphics pixels than the received input luminance (1-alpha)*vid(x,y) + alpha*graph(x,y); or e.g. 50% brighter for the text in a menu, and 10% brighter for the background graphics pixels in the menu rectangle.
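• A minimal sketch of this idea (Python, with a purely illustrative boost factor and limit, not values prescribed by the approach):

    def merged_luminance(vid, graph, alpha):
        # standard alpha blend performed at the merger
        return (1 - alpha) * vid + alpha * graph

    def limited_graphics_luminance(mixed_lum, boost=1.2, limit_nits=200.0):
        # render a detected graphics pixel only modestly brighter than the
        # received mixed luminance, capped at a fixed limit
        return min(mixed_lum * boost, limit_nits)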
• the display communicates data relating to the display to the external device, e.g. whether the display is an LDR or HDR display.
  • the external device adapts the properties of the graphics overlay before the merging and then provides the merged image to the display which proceeds to display the image as it would an image without any graphics overlay.
  • the display simply treats all images the same way but the external device may prior to the merging adapt the graphics overlay based on display property data received from the display.
  • merging may be performed without considering specific characteristics of the display, i.e. the external device may not have any information of the specific display being used. No adaptation of the individual components prior to the merging is performed.
  • the external device provides merging property information to the display which may be used by the display to perform an adaptable dynamic range adaptation to re-target the image to be displayed with the specific dynamic range of the display.
  • a feed forward structure is applied wherein the source external device (e.g. a BD player) provides additional information to the display which may control or steer the dynamic range adaptation of the display.
• the approach provides a number of advantages. Firstly, in many embodiments, it allows a substantially improved image to be generated with improved rendering of graphic overlays and/or with the image being more accurately adapted to the specific display.
  • EOTFs directly reflecting the properties of the individual display can be used.
  • the dynamic range adaptation may in many embodiments be individually optimized for respectively the image content and the overlay content.
  • a video dynamic range mapping may be applied which reflects an optimized (in some way) mapping of image content to the specific dynamic range (and other characteristic of the display).
  • a graphics dynamic range mapping may be applied to provide a desired rendering of graphics overlays.
  • the approach may allow an individually optimized dynamic range adaptation for respectively video/ image content and for graphics overlays.
• the approach may in particular allow the control over the dynamic range adaptation to remain with the display, i.e. the display comprises the functionality for adapting an input dynamic range to the dynamic range of the display. This may be highly advantageous in many scenarios.
  • the display may for example set the graphics to a predetermined brightness (e.g. set by the user of the display).
  • the dynamic range adaptation of televisions may often be better than those of cheap BD players.
  • the described approach allows for the dynamic range adaptation to be performed further downstream from where the merging is performed rather than requiring any adaptation to be before merging. This allows such improved dynamic range adaptation to be utilized also for merged images comprising overlays.
  • the approach thus allows a downstream dynamic range adaptation to be adapted to the individual pixels, and specifically allows it to be adapted to the specific merging property of the individual pixel. Indeed, in some embodiments, individually optimized adaptation of image and graphics content can be achieved downstream of the merging.
  • the merging property indicated by the value A may reflect whether the corresponding pixel is an image/ video pixel or whether it is an overlay graphics pixel.
  • a pixel may be considered to either correspond to an underlying image/ video pixel or it may be an overlay pixel.
  • the adapter 71 may be arranged to apply a first dynamic range mapping if the pixel is indicated to be a video pixel and a different second dynamic range mapping if the pixel is indicated to be an overlay pixel.
  • the first dynamic range mapping may map an incoming LDR dynamic range to an HDR dynamic range by providing a very aggressive boost for high brightness levels.
  • a mapping corresponding to that of FIG. 12 may be used for the first dynamic range mapping.
  • the second dynamic range mapping may be a simple linear function that does not provide any additional boost to high brightness levels.
  • the second dynamic range mapping may be a simple identity mapping resulting in the brightness of the overlay pixels being exactly the same whether presented on an LDR display or an HDR display.
  • an HDR scene representation may be achieved with e.g. bright light sources such as street lights or the sun being shown at very high HDR brightness levels whereas overlay such as subtitles are still shown at normal brightness levels.
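• A per-pixel selection of the two mappings may be sketched as follows (Python, illustrative; the convention that A equal to 0 means a video pixel is an assumption, since other value assignments are equally possible):

    def adapt_pixel(lum_in, a, video_map, overlay_map=lambda lum: lum):
        # video pixels get the (possibly aggressive) video mapping,
        # overlay pixels get e.g. an identity mapping so they keep their brightness
        return video_map(lum_in) if a == 0 else overlay_map(lum_in)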
  • a merging property may be generated which e.g. may indicate that the pixel is an overlay pixel.
• the source of the overlay may either be e.g. a local graphics generator, such as a generator generating subtitles or a menu, or the overlay may be received as part of a second video image signal where this signal may comprise overlay content, such as e.g. graphics or subtitles.
  • the adapter 71 may be arranged to adapt the pixels within a region for which it is indicated that the pixels are overlay content such that the luminance values are restricted to a given predetermined range. This may ensure that the overlay is presented within a reasonable brightness range regardless of the display brightness capabilities.
• the predetermined range may e.g. for a given display be a preset range reflecting the dynamic range of the display.
• the range may have an upper limit of, say, 10% of the peak brightness, and a lower limit of e.g. the larger of 1% of peak brightness and 1 nit.
  • the predetermined range may be determined by the viewer etc.
  • the approach may readily be extended to more than two categories with a separate dynamic range mapping being selected for each category.
  • the value A for a pixel may indicate whether the pixel is a video pixel, a subtitle pixel or a menu pixel, and one of three possible dynamic range mappings may be selected accordingly.
  • the graphics category is thus further subdivided into a plurality of subcategories indicating a specific graphics type.
  • the approach is not limited to a hard allocation of a given pixel into a specific category being indicated by the value A.
• the value A may indicate a relative weight of graphics relative to image for a given pixel.
• the adapter 71 may then select the first dynamic range mapping if this is below, say, 50% and the second dynamic range mapping if it is above, say, 50%.
  • the adapter 71 may be arranged to reduce a boosting of brightness values for pixels for which the value A is indicative of the pixel belonging to an overlay relative to a boosting of brightness values for pixels for which the value A is indicative of the pixel belonging to video/ image signal.
• the reduction may be for higher brightness values which specifically may be brightness values above a threshold (e.g. of 50%, 60%, 70% or 80% of the maximum brightness level of the input signal to the adapter 71).
  • the adapter 71 may be arranged to restrict a brightness level for pixels indicated to be overlay pixels to a limit brightness level below the maximum possible brightness level for the output of the dynamic range adaptation.
  • the dynamic range adaptation may perform a dynamic range mapping from an input dynamic range to an output dynamic range.
  • the output dynamic range may have a maximum possible brightness level.
  • the dynamic range mapping is restricted to a maximum or limit brightness level that is lower than this maximum level.
  • the limit brightness level for an overlay pixel is thus lower than for a non-overlay (video) pixel and in many embodiments there may be no brightness limit for video pixels (except for the maximum value of the output dynamic range).
  • the same dynamic range mapping may be applied to all pixels independently of the merging property (i.e. independently of the value A) up to a given brightness level. However, for values above this brightness level, different dynamic range mapping may be applied for different values of the merging property. As a specific example, the same dynamic range mapping may be applied to all pixels except that the maximum output brightness level is limited to a maximum value for overlay pixels.
• Such an approach may for example provide an efficient mapping that provides pleasing results and both allows e.g. grey graphics to be supported for an HDR display while ensuring that bright graphics do not become too grey.
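• A sketch of this variant (Python; the limit value is purely illustrative):

    def adapt_with_overlay_limit(lum_in, is_overlay, video_map, overlay_limit_nits=300.0):
        # apply the same mapping to every pixel, but clip the result for
        # overlay pixels to a level below the maximum of the output range
        lum_out = video_map(lum_in)
        if is_overlay:
            lum_out = min(lum_out, overlay_limit_nits)
        return lum_out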
• the adapter 71 may be arranged to allocate substantially the same brightness level to pixels that are indicated to be overlay pixels. For example, a normal dynamic range mapping may be applied to video pixels whereas graphic pixels are simply given a predetermined value. Such an approach may result in a very pleasing result with not only a suitable (for the display) presentation of graphics being achieved but also this presentation is ensured to be stable. For example, such an approach would be insensitive to adaptive changes in the applied EOTF and would prevent fluctuations across the image or between frames.
  • Fig. 11 illustrates a method in accordance with some exemplary embodiments of the invention.
• in step 101 the merged image data (the merged video signal 6) is received.
  • One or more of the LSBs are then read in step 102 by a reader 72.
  • a value A indicative of a merging property for a merging performed at the encoder side is then generated in step 103.
• in step 104 dynamic range adaptation is performed on the image, in dependence on the generated value A.
• the resulting image is then displayed in step 105.
• steps 102 and 103 may be performed in a single step. It will also be appreciated that the method may comprise further steps; for instance metadata or flags may be read.
• signal 56 may be read at an input and the retrieved information may be used to control the operation in step 102 and/or 103, e.g. it may control which values A are to be extracted from the color components. It may also control operation in step 104, i.e. how the adaptation is to be performed in step 104.
• metadata may comprise information indicating which LSBs comprise which information.
  • the generated value A may be determined directly as the value for the relevant least significant bits. It may in other embodiments be a value derived from the appropriate LSBs.
  • the reader 72 may read a value which is in the range from 0 to 3 (including 3) for a two bit indication and from 0 to 7, including 7, for a three bit indicator.
• the receiving system may then determine whether a two bit or a three bit indicator is used, and it may use this to generate e.g. a transparency value (25%, 37.5%, 50%, etc.). This value A may then be used in step 104.
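• One possible interpretation of such an indicator is a uniform mapping of the n-bit code to a transparency fraction, as sketched below (Python). This specific mapping (code divided by 2^n, giving e.g. 37.5% for code 3 of a three bit indicator) is an assumption; other mappings may be defined instead.

    def transparency_from_code(code, num_bits):
        # convert an n-bit merging indicator into a transparency fraction
        return code / float(2 ** num_bits)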
  • the video signal may comprise specific metadata or flags, e.g. indicating how the merging property is encoded.
• if a frame or scene does not comprise any overlay, the encoder may add such information to the video signal.
  • the decoder may then read this information and based on this it may e.g. ignore the following frame, scene or movie as it knows this will not contain any overlays.
• the decoder then knows that in the next frame, scene or movie it can bypass the reading and generating steps and go directly to step 104 to perform the dynamic adaptation. It can then also use all bits of the signal as video information, thereby enhancing the video quality.
• a merged video signal 6 is provided, generated by an encoder 51 as described with reference to FIGs. 4 to 7.
  • a value A of "1" is used to indicate that a pixel is a video pixel and a value of "0" is used to indicate that a pixel is an overlay pixel.
• the decoder receives the merged video signal 6 and reader 72 (see Fig. 9) reads the value of A.
• in step 104 dynamic adaptation is performed. For most pixels, this will lead to a change in value.
  • a variation on this scheme may be:
  • Pure video is indicated by a value A of "0"
  • a menu pixel is indicated by a value of "1”
  • a subtitle pixel is indicated by a value of "2”.
• if the value A is "0", the pixel is video and a dynamic range adaptation is performed, leading for many pixels to a change in value.
• the display may thus scale the amount of dynamic range modification depending on the transparency level of the pixel.
  • the scaling may in some embodiments be non-linear, e.g. if more emphasis is to be given to the video or the overlay.
  • the scaling may be on a logarithmic scale.
  • the adaptation is dependent on the value A for a pixel, i.e. on a pixel per pixel basis.
• the decoder may analyse all or a group of values A, or the distribution of the values A over the image or a part of the image, and modify the dynamic adaptation for some (or all) pixels in dependence on an analysis of the values for A in the image. For instance:
• if part of the image is a menu, the image as a whole may be displayed in LDR.
  • the display may apply the transformations in such a manner that the menu stays around some (LDR) luminance values, e.g. such that it is not brighter than luminance Yx and maybe not darker than Yy.
• Even if such a relatively simple strategy may not always calculate the exact required values for mixing colors, it does tend to create more realistic menu colors which oscillate less wildly.
• the encoder may provide (in signal 56 or in a separate signal) some statistical information about e.g. the images in a scene or shot. In the simplest form, it may for instance indicate whether or not there is any overlay in an image or even in the entire video or movie, and whether or not there is a menu part (and e.g. some statistical parameters regarding the menu part can be communicated, aiding the processing of such a part).
  • Fig. 12 schematically illustrates an exemplary dynamic range adaptation.
• “Dynamic range adaptation” may be considered a shorthand form for any type of dynamic range adapting color transformation, i.e. wherein a signal is color transformed to adapt its luminance from one range, for instance a maximum brightness level of Y nit, to another range, for instance a maximum brightness level of X nit.
  • a color transformation need not (but may) include a chroma transform or change.
  • the horizontal axis represents the luminance value for the incoming signal, which may for instance be graded for a display with a maximum brightness level of 2000 nit.
• the incoming signal is to be displayed on a display which can provide a maximum of 5000 nits.
  • the incoming signal is in this example graded for a maximum luminance value of 2000 nits, and it must be mapped, i.e. adapted, to the higher dynamic range of the display.
  • FIG.12 provides the luminance values for the display (OL, output luminance values) as a function of the luminance values for the incoming signal (IL, incoming luminance values).
  • the relation between the two is given by line 110.
  • Using the dynamic range adaptation/ mapping illustrated in FIG.12 will result in darker parts (luminance values near the lower end of the scale) being rendered the same on all displays, as illustrated in part 110a of line 110, while bright lights (luminances near the upper end of the scale) are boosted, i.e. increased in luminance, as illustrated by part 110b of line 110.
  • the gradient of part 110a may specifically be one whereas it is higher for part 110b.
  • the higher range of the display is used for boosting bright objects, such as lights.
• This example is for a situation wherein the signal graded for 2000 nit is "upgraded" to a display that can provide up to 5000 nit luminance.
• Other examples include: input graded for 5000 nit with output of either 100 nit (legacy TV) or around 1000 nit (early HDR TV); or input graded for 100 nit with output around 1000 nit.
  • a dynamic range adaptation may be performed, and thus upgrading (increasing dynamic range) as well as downgrading (reducing dynamic range) can be performed.
  • Fig. 13 provides an example of a dynamic range mapping in which the value A may be 0 for a pixel of the video and 1 for a subtitle.
• the graphic pixels may be subjected to a fixed dynamic range mapping (but not as much as the video) whereas video pixels are subjected to a variable dynamic range mapping (different from scene to scene).
  • the dynamic range mapping applied to the video pixels may also be more complex, for instance the curve may be steeper and more complex.
  • Fig. 14 provides a further example.
  • the display has a table providing information on the intensity and thus the component values that are considered to provide the best visibility for subtitles.
  • the pixel values are then set to predetermined values based on a table look-up (the table providing a fixed value considered best suited for subtitles on this display).
• the pixels of the image are subjected to an adaptation prior to the image being displayed, the adaptation parameters (i.e. how the adaptation is performed) being dependent on the value A.
• the value A may generally be seen as any combination of bits that is used for conveying information on the merging and specifically on whether or not, and/or to what extent, a pixel is or comprises an overlay. This can be implemented in various ways. For example, in the above example two bits of one of the color components (which will be referred to as component I) may be used as follows:
  • At least one value in one or more of the LSBs may indicate a type of overlay.
  • This may again be an example of a mixing type indication.
• the transparency of the subtitle and/or menu may then be given in for instance two of the least significant bits for two other components (which for simplicity we denote II and III). If the two least significant bits of component I have a value of "3" this indicates that the pixel is not a pure type pixel but rather is a mixed type pixel. For example, it could be 50% video and 50% subtitle so that the subtitle then has a transparency T of 50% (or it could e.g. be 50% video and 50% menu).
• the reader 72 may then continue to read the two least significant bits for the two other components II and III. For a first of the other components, for instance component II, the reader 72 reads the two least significant bits and determines a value of 0.
• in another example, the reader 72 would read a value of 2 in the two least significant bits of the first component and a value of 0 in the other.
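• The decoding of such a scheme may be sketched as follows (Python, illustrative only; the dictionary keys and the interpretation of the component II/III bits are hypothetical):

    def decode_pixel_merging(pixel):
        # pixel: (component_I, component_II, component_III) integer values
        type_code = pixel[0] & 0b11          # two LSBs of component I give the pixel type
        if type_code != 3:
            return {"type": type_code}       # pure pixel: no further merging data encoded
        return {"type": type_code,
                "param_1": pixel[1] & 0b11,  # e.g. transparency code in component II
                "param_2": pixel[2] & 0b11}  # e.g. further merging detail in component III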
• a value of "1" indicates overlay, or a mix of overlay and video. In a first example, only this information is used.
  • This scheme embodies an embodiment which often allows a low complexity implementation using only a single bit while providing an efficient use of the method.
  • the only distinction made is between video and anything else, whether this is an overlay, or a mix of overlay and video, and independent of the nature of the overlay, be it subtitle, menu, logo, advertisement etc.
  • the transparency of the overlay is given in, for instance, two of the least significant bits in another component.
  • This scheme embodies an embodiment which often allows low complexity implementation using a minimal number of bits while providing an efficient use of the method.
• the number of LSBs of the components indicating pixel type (i.e. whether the pixel is video, overlay, type of overlay and/or a mix of these) and merging parameters is variable depending on the merging type of the pixel. E.g., when the pixel is only video the number of LSBs is lower than when the pixel is a mix.
  • the two least significant bits of components II and III need not comprise mixing parameters, since there is no mixing.
  • These two least significant bits can be used for providing more details for the pixel luminance values, i.e. they can be used to increase the resolution of the value.
• the number of LSBs used for values indicating merging type and merging parameters may vary, possibly also depending on the merging type.
  • the signal may be provided with a coding length signal.
  • One simple approach for providing a coding length signal may be to use the value that also indicates the merging type. In that case, the coding length signal is given by the merging type. If the signal is a pure signal, thus of "pure type", then there is no need for information on the amount of merging.
• for pixels of mixed type (for instance pixels that are formed by mixing subtitles with video), additional information on the amount of merging is needed (for instance whether it is a 50%-50% mixture or a 25%-75% mixture), so the number of LSBs needed for providing all information on the merging is less for pure pixels than for these more complex situations.
• the decoder reads the LSBs in which the pixel merging type is indicated, and then proceeds dependent on the value in said LSBs. If the merging type indicates the pixel to be of a mixed type, the LSBs indicating the merging parameters are read. The dynamic adaptation is performed on basis of the read information. If the merging type indicates the pixel to be only video, there are no LSBs indicating merging parameters but instead said LSBs will comprise video information (luminance values for instance) and the decoder will interpret them as such.
  • a flag may indicate that in the coming scene, instead of only using merging type indication, there will also be provided in 3 LSBs of a certain color component, further (fine) details on the amount of merging. That again means that a variable length coding signal is provided to the decoding side.
  • a signal is provided which may signal to the decoding side when the encoder switches from one codification scheme using a certain number of LSB to convey merging information to the decoding side to another codification scheme which uses a smaller or larger number of LSBs.
  • Such a signal may also constitute a coding length signal.
  • the merging indications i.e. the indications of the one or more merging properties may be inserted in one or more LSBs of color component values, such as into an R, G or B value.
  • these values may subsequently be encoded using a lossless encoding format (or not be encoded but transmitted directly in raw form) and thus in such embodiments, it can be assumed that the received values correspond directly to the transmitted values, i.e. a decoder can assume that the received bits are identical to the ones transmitted.
  • the merging data can accordingly typically be assumed to be correct at the decoder and can accordingly be used directly.
  • the color component value may be encoded using a lossy encoding.
• the received LSBs indicating the merging property may still be assumed to be correct, i.e. the received values may directly be used to determine the values A, and the dynamic range adaptation may be adapted accordingly. This may sometimes lead to errors but in many embodiments this may be acceptable.
  • error correcting coding of the merging data may be used. This may require additional bits.
• a bit indicating whether a pixel is a video or overlay pixel may be copied to the LSB of each of the R, G and B values.
  • the decoder may decode the three bits and select the decoded bit by a majority decision.
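• A sketch of such a majority decision (Python):

    def majority_overlay_bit(r, g, b):
        # the overlay flag was copied into the LSB of R, G and B;
        # take a majority vote over the three received bits
        ones = (r & 1) + (g & 1) + (b & 1)
        return 1 if ones >= 2 else 0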
  • spatial or temporal filtering may be applied followed by a hard decision.
• a spatial filter may be applied to received data bits and the final bit value may be determined by comparing the filtered output value to a threshold.
  • This may exploit the fact that overlay is typically provided in blocks and that individual pixels are rarely graphic content unless a relatively large number of pixels in the neighborhood are also graphic content.
  • the approach may for example prevent that a single pixel within a menu or subtitle is erroneously detected to not be an overlay pixel and accordingly is boosted to high brightness.
• the approach may reduce the risk of e.g. extremely bright individual pixels within a subtitle or menu (or dark pixels within a very bright image object).
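• A sketch of such spatial filtering followed by a hard decision (Python; the window radius and threshold are illustrative):

    def filtered_overlay_flag(flags, x, y, radius=1, threshold=0.5):
        # flags: 2D list of received 0/1 overlay bits; isolated bit errors
        # inside a subtitle/menu block are suppressed by averaging a window
        h, w = len(flags), len(flags[0])
        window = [flags[j][i]
                  for j in range(max(0, y - radius), min(h, y + radius + 1))
                  for i in range(max(0, x - radius), min(w, x + radius + 1))]
        return 1 if sum(window) / len(window) > threshold else 0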
  • the dynamic range adaptation for a given pixel may be dependent on the value(s) A for a plurality of pixels in a neighborhood of the given pixel. This may for example be achieved by the dynamic range mapping used by the dynamic range adaptation being a function of a plurality of values A (of neighbor pixels) or e.g. by applying a spatial filter as described above.
• the dynamic range adaptation may be arranged to restrict the difference in the dynamic range adaptation between neighbouring pixels. For example, rather than a hard decision of whether to select a first or second dynamic range mapping depending on whether the pixel is designated a video or overlay pixel, the actual dynamic range mapping may be determined as a weighted combination of the two dynamic range mappings with the weights being restricted to only vary by a given amount between neighboring pixels.
  • the adapter 71 may be arranged to determine a suitable dynamic range mapping for a plurality of pixels.
• the dynamic range mapping may be applied in groups of four or 16 pixels with the selected dynamic range mapping being dependent on the merging property for all pixels. For example, if more pixels within a block are indicated to be video pixels than overlay pixels, then a video dynamic range mapping is applied and otherwise an overlay dynamic range mapping is applied.
• the inclusion of the bits indicating the merging property is performed following at least some of the video encoding.
• perceptually lossy video encoding is first performed (e.g. based on a spatial frequency transform) followed by a lossless encoding of the resulting bits (e.g. using run length coding).
  • the LSBs of the lossy video encoding output may be substituted by bits indicating the merging property. This may allow efficient encoding without the risk of the encoding introducing errors to the merging property information bits.
  • At least one of the values A indicates the degree of merging of video and one or more overlays.
• the adaptation parameters, i.e. how the image pixels are adapted before display, are dependent on more than one parameter provided in more than one of the LSBs in one or more of the color components, such as e.g. in the two least significant bits of three components.
  • a video/overlay pixel indication (A) is encoded in one or more of the least significant bits of one or more of the pixel color components in the video signal.
  • the video signal is transmitted over the interface between the VPS and a display.
  • the display applies dynamic range adaptation to the image(s) of the video signal. This adaptation is performed in dependence on the video/overlay pixel indication (A).
  • the approach may be used in various configurations and using various (color component) formats, such as e.g. (non/limiting) RGB 4:4:4, YCbCr 4:4:4, YCbCr 4:2:2, YCbCr 4:2:0.
  • the number of bits available for each color component may vary between different embodiments, e.g. there may typically be 10, 12, 16 bits per component. Most commonly 10 or 12 bits are used for video signals although 16 bits may have some use, albeit predominantly for luminance values. 8 bit values are also possible in many systems but are typically considered to be too low for HDR (it is more typically used for legacy equipment, e.g. for 8 bit MPEG video).
• an apparatus in accordance with an example of e.g. one of FIGs. 1-7 may be arranged to communicate the merged video signal 6 in accordance with a video signal format.
  • the apparatus may communicate the merged video signal 6 over an HDMI cable, i.e. in accordance with an HDMI format.
  • the number of bits used to represent the color component values of the image may be smaller than a number of bits allocated to each color component by the video signal format.
  • the bits of the color component values may be allocated to some of the bits allocated to the component values by the video format and bits indicative of the one or more values (A) indicating the merging property for the pixel may be provided (inserted/ embedded) into some bits allocated to the color component values by the video signal format but not used by the color component values.
• the values indicative of the merging property may thus be communicated without affecting the actual image information.
  • no degradation or artefacts need be introduced in order to support the additional functionality.
• if the source video has a bit-depth of 10 bits and the merged video is transmitted from the VPS to the display in a 12-bit mode, 2 LSBs per component can be used to convey the video/overlay information without any degradation being introduced.
• if the pixel configuration is RGB or YCbCr 4:4:4 with 12 bits per component, 6 bits per pixel are available. If the configuration is YCbCr 4:2:2, it may be taken into account that the CbCr values are shared among 2 pixels, leaving 4 bits per pixel for the video/overlay indicator.
• the least significant bit of the Cb or Cr component could e.g. be used to indicate that at least one of the T values (transparency values) of the merger input pixels has a value different from zero.
  • this LSB could be used to indicate that both values of T were below a certain threshold.
  • multiple bits of the output of the merger are used, e.g. the LSB of Cb and the LSB of Cr, or the LSBs of all three components may be used.
• Multiple bits could be used to differentiate between various merging levels. With two bits four levels can be distinguished, e.g. one value (1) could indicate no merging, another value (2) could indicate 75% video and 25% overlay, a third value (3) could indicate 50% video and 50% overlay, and a fourth value (4) could indicate 100% overlay. With more bits, a more precise indication of the contribution of the graphics overlay in the output levels can be achieved.
• the number of bits per component in an interconnect format is typically 8, 10, 12 or 16 bits. If the video source is coded with 10 bits per component, it is thus possible to use 6 bits (12-bit output) or even 18 bits (16-bit output) per pixel to transmit merging information, such as the transparency level that was locally applied for the pixel.
  • One option would be to apply the 12-bit output mode and steal 3 LSB bits from each of Cb and Cr components and 2 bits from the Y-component. In that way, 8 bits would be available to transmit the merging level.
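• One possible packing of such an 8-bit merging level is sketched below (Python; the exact bit allocation across Y, Cb and Cr is an assumption for illustration):

    def pack_merging_level(y, cb, cr, level):
        # level: 8-bit merging level; 2 bits go to Y, 3 bits each to Cb and Cr
        y  = (y  & ~0b11)  | (level & 0b11)           # level bits 0-1 -> Y LSBs
        cb = (cb & ~0b111) | ((level >> 2) & 0b111)   # level bits 2-4 -> Cb LSBs
        cr = (cr & ~0b111) | ((level >> 5) & 0b111)   # level bits 5-7 -> Cr LSBs
        return y, cb, cr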
  • the RGB output mode could be used in a similar way.
  • the spatial resolution of each component is equal.
  • This is called YCbCr 4:4:4 or RGB 4:4:4.
  • subsampling of the color components is applied.
  • the color components are spatially subsampled by a factor of 2 in the horizontal direction only.
  • the color components are spatially subsampled by a factor of 2 in both directions. It may in some embodiments be beneficial to keep the full resolution for the merging indication bits. Therefore, for each LSB that is used to indicate merging information, a clear assignment may be provided as to which pixel location the bit applies.
  • the LSBs of Cb components may relate to graphics overlay for the odd pixels, while the LSBs of Cr components may relate to even pixels.
  • Another option to reduce the number of bits needed is to transmit a merging bitmap (providing indications of the merging property, i.e. A values) at a lower resolution than the video resolution.
  • the merging bitmap resolution may be for instance 960x540.
  • the LSBs of the Cb and the Cr components could be used to indicate the merging level for pixel locations that apply for the Cb and Cr components.
  • one or more A values may apply to a plurality of pixels, and specifically may apply to a group or area of pixels.
  • Signaling of which merging indication bit configuration is used across the interface may be indicated in a metadata channel also used for the dynamic range adaptation data.
  • signal 55 providing dynamic range adaptation parameters and signal 56, providing information on which LSBs of which components are used for indicating the merging property (often whether and/or to what extent a pixel is a mix of video and one or more overlays) may be done in the same metadata channel.
  • the signal 56 may also provide some statistical or general information such as for instance whether or not any subtitles are used.
  • the merging property indication bits may be used to decide on a per pixel basis if, and possibly also to what extent, the dynamic range adaptation is applied. If there is only one bit per pixel (indicating that the pixel is an overlay pixel or video pixel), the dynamic range adaptation intended for the original video could switch between full (no overlay) or limited (overlay). Limited could mean that no adaptation is applied at all, as in the example of FIG.8, or only to a certain extent. If there are multiple bits indicating e.g. a merging level value per pixel, the display may apply dynamic range adaptation by scaling the amount of adaptation depending on the merging level. The scaling may have a linear relation with the merging level or a non-linear function may be applied to optimize the perceived picture quality.
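  • The per-pixel scaling described above could, for example, be realized as in the following sketch, assuming a generic tone-mapping function for pure video pixels and a merging level normalized to the range 0..1; the linear weighting is only one possible choice.

```python
def adapt_pixel_luminance(luma_in: float, merging_level: float, full_adaptation) -> float:
    """Scale the dynamic range adaptation by the merging level of the pixel.

    luma_in:         input luminance of the merged pixel
    merging_level:   0.0 = pure video, 1.0 = pure overlay
    full_adaptation: callable implementing the adaptation intended for the
                     original video (e.g. a tone-mapping curve from signal 55)
    """
    adapted = full_adaptation(luma_in)
    weight = 1.0 - merging_level   # linear scaling; a non-linear function may
                                   # be used to optimize perceived quality
    return weight * adapted + merging_level * luma_in
```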
  • Figs. 15 and 16 show embodiments of a combination of encoding in a VPS and decoding in a TV.
  • an incoming signal 42 is received by the VPS 41.
  • the vertical axis illustrates the peak brightness PB or peak luminance (white point) for which the signal is graded, i.e. for which the images of the received video signal have been graded, e.g. by a manual color grading.
  • the signal 42 is in the example an LDR signal graded for a PB of 100 nit, and is for instance derived from an original HDR signal graded for 5000 nit.
  • although the signal is an LDR optimized image, it also represents an HDR image and thus is an HDR image representation.
  • a signal 55 is provided (as in figure 6) providing information on how to upgrade the signal to higher values for the peak brightness, or in other words how to apply dynamic range adaptation to the images of the video signal 42.
  • the signal 55 is passed on by the VPS 41.
  • a merged signal 50 is produced in which one or more LSBs of one or more color components comprise information on a merging property, and specifically on the type and merging parameters for a pixel.
  • signal 56 is provided, which provides instructions on which LSBs are filled with which merging information, such as the type and merging parameters for a merged pixel.
  • the signal provides information on the encoding of the merging property.
  • the horizontal axis illustrates various steps in manipulation of a signal and the components that are used in such steps.
  • the signals 50, 55 and 56 are received at an input of a TV 141.
  • the TV 141 comprises a decoder 142 and a dynamic range adapter 143.
  • the decoder is instructed via the signal 56 of which LSBs of which components comprise which information, i.e. it is informed of how the merging property is encoded.
  • the decoder decodes the information on said LSBs and then sends the information to the dynamic range adapter 143, e.g. specifically telling the dynamic range adapter the type of the pixel (i.e. whether it is video and/or overlay or a mix) and, when appropriate what the mix is.
  • the dynamic range adapter 143 is provided with information from signal 55, which enables the adapter to upgrade the incoming LDR signal.
  • depending on the merging type, i.e. whether a pixel is video and/or overlay or a mix, the dynamic range adapter 143 dynamically adapts the pixels wherein, as an example, an overlay pixel is kept at a 100 nit grading, a pixel belonging only to video is adapted to a 5000 nit dynamic range, while a mixed pixel (i.e. comprising both video and overlay) is adapted to a 1500 nit dynamic range; a sketch of this per-type adaptation is given below, after the Fig. 15 discussion.
  • the dynamic range adaptation is schematically indicated in Fig. 15 with the arrow U for upgrading.
  • the net result for a video pixel is denoted by the letter V, for an overlay by O, for a mix by M, and for a legacy TV, which has no means for performing dynamic range adaptation, by L.
  • the pixel values so generated are sent to a display 144 for being displayed.
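  • A sketch of the per-type adaptation used in the Fig. 15 example (overlay kept at 100 nit, video upgraded to 5000 nit, mixed pixels to 1500 nit) is given below; the upgrade function itself is assumed to be derived from signal 55, and all names are illustrative.

```python
TARGET_PEAK_NIT = {"overlay": 100.0, "video": 5000.0, "mix": 1500.0}


def upgrade_pixel(luma_ldr: float, pixel_type: str, upgrade) -> float:
    """Apply the dynamic range adaptation U of Fig. 15 to one pixel.

    pixel_type: 'video', 'overlay' or 'mix', as decoded from the LSBs.
    upgrade:    callable (assumed derived from signal 55) mapping
                (LDR luminance, target peak brightness in nit) -> adapted luminance.
    """
    return upgrade(luma_ldr, TARGET_PEAK_NIT[pixel_type])
```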
  • Fig. 16 illustrates a variation on the set-up of Fig. 15.
  • the incoming signal 42 is an HDR signal which in the specific example is graded for a maximum
  • the signal 55 comprises information on how to grade, i.e. perform a dynamic range adaptation, this signal 42 to higher peak brightness, such as for instance a 5000 nit peak brightness. It also includes information on how to grade the signal to lower peak brightness, for instance to a peak brightness of 100 nit.
  • The difference with respect to Fig. 15 is that in the example of Fig. 16, the decoder plus dynamic range adapter may increase (upgrade) as well as decrease (downgrade) the dynamic range for a pixel. For that reason the arrow U, for Up, of Fig. 15 is denoted UD, for Up or Down, in Fig. 16.
  • the VPS (or a device in between the VPS and the legacy TV) provides a signal 150 which is derived from the merged signal after a dynamic range adaptation to reduce the dynamic range. This downgrading is denoted by the arrow D.
  • the VPS 41 uses the information in signal 55 for the dynamic range adaptation of the output signal L. This signal L is then fed to the input of a legacy TV for display on a legacy TV display.
  • the incoming signal may in other examples also be e.g. a signal graded for the highest possible peak brightness, (e.g. 5000 nit), in which case the display may perform dynamic range adaptation to reduce the range to a specific peak brightness of the display (being below 5000 nit).
  • Fig. 17 shows more in detail an embodiment of a display device, such as a TV, in accordance with some embodiments of the invention.
  • the merged signal 50 is provided to the dynamic range adapter 143 and to reader 142.
  • the reader 142 reads one or more LSBs and provides a signal 145 based on the reading of the content of said LSBs.
  • This signal comprises the values A indicating the merging property, such as specifically the merging type and/or merging parameters for the pixel. This constitutes a decoding of the information that was put in the signal 50 by the encoder 51 in the VPS 41 of for example the system of Fig. 7.
  • the reader 142 may be provided (as in the specific example) with a signal 56 informing the reader 142 of which LSBs contain which information, and in which manner the encoding has been performed, or in other words providing the decoding scheme.
  • the reader 142 may be provided with a signal indicating how the merging property has been encoded in the signal.
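  • A minimal sketch of the reader 142 follows; it is assumed here that signal 56 can be represented as a simple descriptor stating, per component and in order, how many LSBs carry merging information. This descriptor layout is an illustrative assumption, not a format defined by the text.

```python
def read_merging_property(pixel: dict, scheme: dict) -> int:
    """Reconstruct the merging value A from the LSBs of one pixel.

    pixel:  component values, e.g. {'Y': 3071, 'Cb': 2048, 'Cr': 2053}
    scheme: hypothetical decoding scheme derived from signal 56, e.g.
            {'Cb': 3, 'Cr': 3, 'Y': 2} meaning the 3 LSBs of Cb hold the
            lowest bits of A, then the 3 LSBs of Cr, then the 2 LSBs of Y.
    """
    value, shift = 0, 0
    for component, nbits in scheme.items():   # insertion order defines bit order
        value |= (pixel[component] & ((1 << nbits) - 1)) << shift
        shift += nbits
    return value
```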
  • the dynamic range adapter thus receives merged signal 50 and signal 145 providing information on the merging property, such as the merging type and/or merging parameters of pixels.
  • the dynamic range adapter 143 is also provided with signal 55, which indicates parameters for the dynamic range adaptation, i.e. it may provide information on how to adapt the dynamic range.
  • the information from signals 50, 55 and 56 is used for dynamic range adaptation. If e.g. encoding formats or dynamic range adaptation approaches are known in advance, one or more of the signals 55 and/or 56 may not be utilized.
  • signal 56 need not be generated, sent and received, since at both the encoder and decoder side a known standard is followed.
  • Figs. 18 and 19 illustrate a further embodiment of the invention.
  • a linear or non-linear interpolation based on provided mixing parameters, as given in examples above, is one possibility for dynamic range adaptation for incoming pixels that are comprised of a mix of video and graphics, such as for instance subtitles.
  • a pixel is composed of video as well as graphics, such as a subtitle
  • a mixed signal is provided, but it is not unambiguously known which part of the pixel values for the different colors in the merged signal was originally video and which part was originally subtitle.
  • the mixing parameters for instance indicating that 25% is subtitle and 75% is video, provide some guidance, but a more accurate determination may be useful.
  • an estimation is made of the contribution to the pixel values of the graphics (such as subtitles) and video in various colors.
  • the mixing parameters are known or at least it is known that there is a mix or a possible mix.
  • the video may be red and the subtitles may be green.
  • the mix between overlay and video may overall be 50% of each for the pixel, but the mix ratio in the individual colors may vary substantially from this (as the chromas of the video and overlay may be very different for the pixel).
  • an estimate can be made of the contribution of the graphics and of the video to the signal for said pixel.
  • Some mixing parameters may be provided in the LSBs of some of the color components, and graphics such as subtitles typically have a more or less constant color and intensity and possibly even size and form. This may allow an estimation of the individual contributions.
  • the incoming (for instance LDR) mixed signal is used to provide, by an estimator using the incoming mixing parameters and an analysis of a pixel and the surrounding pixels, an estimate of the graphics contribution and an estimate for the video contribution to the incoming signal.
  • the video signal is for a white spot and the subtitle is green
  • Using a neighboring pixel that is 50% video and 50% subtitle, one can estimate the contribution of the subtitle in the various color components by comparing the mentioned pixel values.
  • The fact that subtitles are often of more or less constant color and intensity, and often of more or less standard size and form, can of course be used in the analysis.
  • the data for said pixel and for a number of surrounding pixels are for instance compared.
  • a single neighboring pixel can be used for the analysis, but using more than one of the surrounding pixels in the analysis typically yields better results.
  • Subtracting the estimated subtitle contribution from the signal provides for the estimated video contribution to the merged signal and vice versa.
  • a dynamic range adaptation may then be applied individually to the individual estimated signals.
  • the dynamic range adaptation may also use some extrapolation as described in previous examples.
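  • A sketch of such an estimation step is given below; it assumes the merger used a simple per-pixel alpha blend (merged = (1-a)*video + a*graphics), that the mixing parameter a is known from the LSBs, and that a nearby pixel flagged as pure video approximates the local video color. These assumptions are illustrative; more elaborate co-estimation models may be used.

```python
def estimate_contributions(merged, alpha, neighbour_video):
    """Estimate the separate video and graphics contributions of a mixed pixel.

    Assumes merged = (1 - alpha) * video + alpha * graphics per component,
    and that a neighbouring pixel identified as pure video is a reasonable
    estimate of the local video color (subtitles tend to have a roughly
    constant color, so the residual is attributed to the graphics).

    merged, neighbour_video: per-component tuples, e.g. (Y, Cb, Cr)
    alpha: mixing parameter for this pixel (0 = pure video, 1 = pure graphics)
    """
    video_est = neighbour_video
    graphics_est = tuple(
        (m - (1.0 - alpha) * v) / alpha if alpha > 0 else 0.0
        for m, v in zip(merged, video_est)
    )
    return video_est, graphics_est
```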
  • Fig. 18 illustrates an example of such an embodiment:
  • An incoming mixed LDR signal is provided to an estimator 146 indicated by EST in Fig. 18.
  • This estimator is also provided with information signal 145 by reader 142, so it knows where there are mixed pixels and, in embodiments, also has an indication of the amount of mixing; the estimator 146 may further have some general information available.
  • the estimator analyses the data of the pixels and surrounding pixels to provide a graph estimation signal 147 and a video estimation signal 148.
  • the signal 147 is an Yuv_graph_estimate signal, i.e. a Yuv signal giving an estimate of the subtitle contribution alone in the incoming LDR signal.
  • Signal 148 is an Yuv_video_estimate signal, i.e. a Yuv signal giving an estimate of the video contribution alone in the incoming LDR signal.
  • Signal 55 may provide instructions on how to perform the dynamic range adaptation. At least one of the signals 147 and 148 is adapted, but often and preferably both.
  • the adapted signals are remixed in mixer 149, in Fig. 18 schematically indicated by the + sign.
  • the estimated subtitle signal 147 can for instance be boosted somewhat less than the estimated video signal 148; in the remixed signal the subtitles are then less prominently visible.
  • the resulting signal for display 144 may for instance be T_2*Yuv_graph_estimate + (1-T_2)*K*Yuv_video_estimate, where K is a boosting factor for the Yuv video estimate.
  • the video estimate signal is boosted, while the overlay estimate signal is maintained at its LDR value.
  • the adapted signals are then remixed with a remixing parameter T_2.
  • the remixing parameter T_2 may be provided by signal 55.
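  • The remixing stage of Fig. 18 could be sketched as below, implementing the formula quoted above; the remixing parameter T_2 and the boost factor K are assumed to be supplied via signal 55, and all names are illustrative.

```python
def remix(yuv_graph_estimate, yuv_video_estimate, t2: float, k: float):
    """Remix the adapted estimates: T_2 * graph + (1 - T_2) * K * video.

    yuv_graph_estimate, yuv_video_estimate: per-component tuples (Y, u, v)
    t2: remixing parameter (e.g. provided by signal 55)
    k:  boost factor applied to the video estimate
    """
    return tuple(t2 * g + (1.0 - t2) * k * v
                 for g, v in zip(yuv_graph_estimate, yuv_video_estimate))
```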
  • the decoder comprises an estimator which estimates the contribution in different colors of overlay and video based upon an analysis of a pixel and its surrounding pixels.
  • Information on the contributions to the various colors may, in embodiments, also be provided in LSBs of the incoming signal 50 itself.
  • one LSB in one component may indicate whether a pixel is video or some sort of mix, and 3 bits in each component may indicate the
  • the reader 142 provides information on the contributions to the various colors in signal 145 and the estimator 146 can simply use the provided data directly to generate an estimate of the contributions of respectively video and overlay.
  • the estimator may in this case not need to perform an analysis using data of a pixel and surrounding pixels, rather all information may be supplied in the signal 145, read by reader 142 from LSBs of the incoming signal 50.
  • the resulting signals may still be estimates (or at least have some quantisation error), since the number of bits available is inherently limited, and thus the signals leaving the estimator can be considered estimates of the original merging.
  • Using an analysis based on the surroundings of a pixel may in some embodiments and applications be preferred, since fewer LSBs are needed for providing merging information and a higher accuracy can often be obtained.
  • Fig. 19 shows a detail of the display device of Fig. 18.
  • the decoder is shown, with the incoming signals: 50, the merged signal; 55, the information on dynamic range adaptation; 56, the information on which LSBs are used and how; and the outgoing signal.
  • the decoder is thus arranged to estimate the original overlay and video signals that were merged at the encoder. Dynamic range adaptation may then be applied individually to these estimated signals, and specifically the dynamic range mapping applied to the estimated video signal (the images) may be different to that applied to the estimated overlay signals.
  • a plurality of overlay signals may be estimated, i.e. the merging may be (assumed to be) performed with a plurality of input overlay signals being combined into the merged video signal 6 (together with the video signal).
  • the decoder may estimate a plurality of overlay signals and may perform different dynamic range adaptations to the different overlay signals. For example, for a white overlay signal, no brightness boost may be applied whereas a relatively small brightness boost is applied to e.g. a green overlay. A different and e.g. more aggressive dynamic range adaptation may be applied to the video signal.
  • the decoder accordingly seeks to reverse the merging performed by the merger of the encoder. Indeed, the decoder seeks to generate estimates of the original signals, i.e. of the video and overlay signals. If this is achieved, dynamic range mappings that are optimal for the individual content type for the specific display can then be applied. Following such individual dynamic range mapping (optimization), a merged dynamic range adapted video signal can be generated and presented.
  • the pixels that are merged pixels are identified, and e.g. the degree of merging/mixing is determined.
  • the original constituents of the signal are reconstructed (e.g. assuming a low frequency characteristic of the video, or using more complicated co-estimation models), or more precisely they are estimated. Knowing what was merged, and knowing the original grading and the grading of the display, allows for a further improved adaptation.
  • the value A indicating the merging property may be referred to as the Graphics Indicator bit, which is generated and transmitted for each output pixel.
  • This bit may be embedded in the video output signal.
  • e.g. in the lowest one of the luma bits, i.e. the least significant bit.
  • a bit value of "1" could indicate graphics, and "0" could indicate normal video (i.e. only video in the merging).
  • the approach may be used when in the HDMV mode.
  • the Graphics Indicator bit IG may be generated for each pixel as a function of a1 and a2 applicable for that pixel. IG may be set to 1b if any of a1 and a2 has a value greater than 0.06; otherwise IG may be set to 0b. See also FIG. 20.
  • the Graphics Indicator may be generated as illustrated in FIG. 21.
  • the Graphics Indicator bit may be set to 1b for all pixel locations for which the Background plane pixel is set in the interim video data. For all other pixel locations the Graphics Indicator bit may be generated in the same way as it is generated in HDMV mode, except that in the BD-J graphics case a2 is extracted directly from the pixel data.
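  • The Graphics Indicator bit generation described above could be sketched as follows; the 0.06 threshold is taken from the text, while the function signature and the way the BD-J background-plane case is passed in are illustrative assumptions.

```python
def graphics_indicator(a1: float, a2: float,
                       bdj_mode: bool = False,
                       background_plane_pixel: bool = False) -> int:
    """Generate the per-pixel Graphics Indicator bit IG.

    HDMV mode: IG = 1 if either mixing value a1 or a2 exceeds 0.06, else 0.
    BD-J mode (sketch): IG is additionally forced to 1 for pixel locations
    where the Background plane pixel is set in the interim video data.
    """
    if bdj_mode and background_plane_pixel:
        return 1
    return 1 if (a1 > 0.06 or a2 > 0.06) else 0
```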
  • HDR high definition digital versatile disc
  • where a TV is mentioned, this may be any display device that comprises a display; it could be a screen of a home computer, or of a home video system, or of a tablet, or any portable display device.
  • HDR devices are often used at home, but this is not to be considered a restriction for the invention.
  • HDR displays may be used in many devices of various types.
  • graphics may be used to indicate a general type of overlay such as subtitles or menu or other overlays.
  • An overlay may be any additional signal that is merged in a merger with an image video signal.
  • color should not be interpreted to only refer to chroma values or properties but rather may also include luminance, or indeed may only refer to luminance.
  • a color grading may be a luminance grading only where chroma is not considered.
  • color grading/ tone mapping/ dynamic range adaptation may be considered to be equivalent (as indeed is in accordance with their use in the field).
  • the algorithmic components disclosed in this text may (entirely or in part) be realized in practice as hardware (e.g. parts of an application specific IC) or as software running on a special digital signal processor, or a generic processor, etc.
  • "Arrangement", "system", or similar words are also intended to be used in the broadest sense, so they may comprise or be formed by, inter alia, a single apparatus, a part of an apparatus, a collection of (parts of) cooperating apparatuses, etc.
  • a computer program product version of the present embodiments as denotation should be understood to encompass any physical realization of a collection of commands enabling a generic or special purpose processor, after a series of loading steps (which may include intermediate conversion steps, such as translation to an intermediate language, and a final processor language) to enter the commands into the processor, and to execute any of the characteristic functions of an invention.
  • the computer program product may be realized as data on a carrier such as e.g.
  • There may be provided a method for encoding a video signal comprising adding to an input video image signal (2) one or more overlay signals (3, 4) to form a merged video signal (6, 50), generating one or more values (A) indicating for a pixel in the merged video signal a merging type and/or one or more merging parameters, and encoding for said pixel said one or more values (A) in one or more least significant bits from one or more color components of the merged video signal (6, 50).
  • At least one of said one or more values (A) indicates the merging type of said pixel.
  • At least one of said one or more values provides a merging parameter indicating the amount of merging of the input video signal and one or more overlay signals for said pixel.
  • a single least significant bit is used for indicating whether the pixel is video, or overlay signal or a merge of video and one or more overlay signals.
  • the number of least significant bits indicating the merging type and/or one or more merging parameters in the merged video signal is variable, and indicated by a coding length signal.
  • an information signal (56) is provided comprising information on which least significant bits are used for indicating the merging type and/or one or more merging parameters in the merged video signal (50) for said pixel and/or how said least significant bits are used to indicate a codification method.
  • a video processor for encoding a video signal comprising a merger (5, 51) for merging an input video image signal (2, 46, 48, 54) and one or more overlay signals (3, 4) to form a merged video signal (6, 50), and an image encoder (7, 51) for generating or receiving one or more values (A) indicating for a pixel in the merged video signal the merging type and/or one or more merging parameters and for encoding for said pixel said one or more values (A) in one or more least significant bits from one or more color components of the merged video signal (6, 50).
  • the encoder is arranged for encoding at least one least significant bit with a value indicating a merging type for said pixel.
  • the encoder is arranged for encoding at least one value providing a merging parameter indicating the amount of merging of video and one or more overlays.
  • the encoder is arranged for providing to the merged video signal an information signal (56) with information on which least significant bits are used for indicating the merging type and/or one or more merging parameters in the merged signal for said pixel and how.
  • the merger (5) is arranged for providing a merging information signal (MIS) to the encoder (7).
  • MIS merging information signal
  • the video processor may be comprised in a set-top box or BD player.
  • a method for decoding a video signal wherein a video signal merged from more than one signal is received, for a pixel one or more of the least significant bits of one or more of the color components of the video signal are read and from said least significant bits one or more values (A) are generated and wherein said pixel of the received video image signal is subjected to an adaptation prior to display, wherein the adaptation is dependent on the generated value (A) or values.
  • the adaptation comprises a step of applying a dynamic range adapting color transformation.
  • At least one of the values (A) indicates a merging type for the pixel and the adaptation prior to display is dependent on the merging type of the pixel.
  • At least one of the values (A) represents whether or not a pixel is video or overlay and/or a mixture of video and overlay.
  • at least one of the values indicates an amount of merging of an image video signal and one or more overlay signals and the adaptation prior to display is dependent on the amount of merging.
  • a single least significant bit is read to obtain the value (A).
  • the video signal is split into more than one estimated signal estimating the more than one signals prior to the merge, based on an estimate of contribution of video and overlay to a pixel value of the signal, whereafter at least one of the signals is color transformed to adapt its luminance, and the adapted signals are remixed.
  • the splitting of a signal for a pixel is based on an analysis of the data for said pixel and data for a number of surrounding pixels which are identified as non-mixed video pixels.
  • a video decoder for decoding a video signal merged from more than one signal comprising an input for receiving a video signal, a reader (72) for reading at least one or more least significant bits for one or more color components of the video signal for a pixel and generating one or more values (A) from the read least significant bits and an adapter (71, 143) for adapting the video, and wherein the adapter is arranged for adapting a pixel value in dependence on the generated value or values (A).
  • the video decoder comprises an input for a signal (56) with information on which least significant bits to read and how to convert them to the values (A).
  • the adapter is arranged for performing a dynamic range adaptation on the image pixels.
  • the reader is adapted to read a single least significant bit to obtain the value (A).
  • the video decoder may comprise an estimator (146) for splitting the video signal into more than one estimated signal (147, 148), based on an estimate of the contribution of image video and overlay to a pixel value of the signal, at least one of the signals being adapted, and a mixer (149) for remixing the adapted signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Image Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
EP15788431.3A 2014-11-10 2015-11-04 Method for encoding, video processor, method for decoding, video decoder Withdrawn EP3219108A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP14192484 2014-11-10
PCT/EP2015/075651 WO2016074999A1 (en) 2014-11-10 2015-11-04 Method for encoding, video processor, method for decoding, video decoder

Publications (1)

Publication Number Publication Date
EP3219108A1 true EP3219108A1 (en) 2017-09-20

Family

ID=51868876

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15788431.3A Withdrawn EP3219108A1 (en) 2014-11-10 2015-11-04 Method for encoding, video processor, method for decoding, video decoder

Country Status (9)

Country Link
US (1) US10567826B2 (pt)
EP (1) EP3219108A1 (pt)
JP (1) JP6698081B2 (pt)
CN (1) CN107113470B (pt)
BR (1) BR112017009528A2 (pt)
MX (1) MX2017005983A (pt)
RU (1) RU2689411C2 (pt)
WO (1) WO2016074999A1 (pt)
ZA (1) ZA201703974B (pt)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2825699T3 (es) * 2014-12-11 2021-05-17 Koninklijke Philips Nv Optimización e imágenes de alto rango dinámico para pantallas particulares
US10122928B2 (en) * 2015-09-09 2018-11-06 Red.Com, Llc Motion video output for multiple displays
JP6831389B2 (ja) * 2015-11-24 2021-02-17 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. 複数のhdr画像ソースの処理
US10750173B2 (en) * 2016-03-07 2020-08-18 Koninklijke Philips N.V. Encoding and decoding HDR videos
ES2951773T3 (es) * 2016-03-18 2023-10-24 Koninklijke Philips Nv Codificación y decodificación de vídeos HDR
TWI764898B (zh) * 2016-08-03 2022-05-21 日商新力股份有限公司 資訊處理裝置、資訊處理方法及程式
JP6729170B2 (ja) * 2016-08-23 2020-07-22 沖電気工業株式会社 画像処理システム及び画像復号装置
KR102554379B1 (ko) 2016-10-31 2023-07-11 엘지디스플레이 주식회사 하이 다이나믹 레인지 영상 처리 방법 및 영상 처리 모듈과 그를 이용한 표시 장치
WO2018147196A1 (ja) * 2017-02-09 2018-08-16 シャープ株式会社 表示装置、テレビジョン受像機、映像処理方法、バックライト制御方法、受信装置、映像信号生成装置、送信装置、映像信号伝送システム、受信方法、プログラム、制御プログラム、及び記録媒体
JP6381701B2 (ja) * 2017-02-09 2018-08-29 シャープ株式会社 受信装置、テレビジョン受像機、映像信号生成装置、送信装置、映像信号伝送システム、受信方法、プログラム、及び記録媒体
US10845865B2 (en) * 2017-04-21 2020-11-24 Intel Corporation Reducing power consumption when transferring frames from graphics processors to display panels
EP3644618A4 (en) * 2017-06-21 2020-05-27 Panasonic Intellectual Property Management Co., Ltd. IMAGE DISPLAY SYSTEM AND IMAGE DISPLAY METHOD
US10939158B2 (en) * 2017-06-23 2021-03-02 Samsung Electronics Co., Ltd. Electronic apparatus, display apparatus and control method thereof
CN109691117B (zh) * 2017-07-07 2022-05-27 松下知识产权经营株式会社 影像处理系统及影像处理方法
CN109691115B (zh) 2017-07-14 2022-02-08 松下知识产权经营株式会社 影像显示装置及影像显示方法
US10504263B2 (en) * 2017-08-01 2019-12-10 Samsung Electronics Co., Ltd. Adaptive high dynamic range (HDR) tone mapping with overlay indication
EP3451677A1 (en) * 2017-09-05 2019-03-06 Koninklijke Philips N.V. Graphics-safe hdr image luminance re-grading
US10817983B1 (en) * 2017-09-28 2020-10-27 Apple Inc. Method and device for combining real and virtual images
WO2019069482A1 (ja) 2017-10-06 2019-04-11 パナソニックIpマネジメント株式会社 映像表示システム及び映像表示方法
US10856040B2 (en) * 2017-10-31 2020-12-01 Avago Technologies International Sales Pte. Limited Video rendering system
KR102523672B1 (ko) * 2017-11-14 2023-04-20 삼성전자주식회사 디스플레이장치 및 그 제어방법과 기록매체
CN108205320A (zh) * 2017-12-18 2018-06-26 深圳市奇虎智能科技有限公司 地图数据处理方法及装置
EP3525463A1 (en) * 2018-02-13 2019-08-14 Koninklijke Philips N.V. System for handling multiple hdr video formats
KR102546990B1 (ko) 2018-03-16 2023-06-23 엘지전자 주식회사 신호 처리 장치 및 이를 구비하는 영상표시장치
KR102154530B1 (ko) * 2018-06-06 2020-09-21 엘지전자 주식회사 360 비디오 시스템에서 오버레이 미디어 처리 방법 및 그 장치
US11012657B2 (en) * 2018-06-08 2021-05-18 Lg Electronics Inc. Method for processing overlay in 360-degree video system and apparatus for the same
EP3588953A1 (en) * 2018-06-26 2020-01-01 InterDigital CE Patent Holdings Method for producing an output image comprising displayable non-linear output data and at least one semantic element
US10832389B2 (en) 2018-12-13 2020-11-10 Ati Technologies Ulc Method and system for improved visibility in blended layers for high dynamic range displays
JP7256663B2 (ja) * 2019-03-26 2023-04-12 キヤノン株式会社 画像出力装置およびその制御方法
JP7083319B2 (ja) * 2019-04-16 2022-06-10 株式会社ソニー・インタラクティブエンタテインメント 画像生成装置および画像生成方法
JP2022543864A (ja) 2019-08-05 2022-10-14 ホップラ リミテッド メディア再生機器にコンテンツを提供するための方法およびシステム
EP4068196A1 (en) * 2021-03-30 2022-10-05 Beijing Xiaomi Mobile Software Co., Ltd. High dynamic range tone mapping

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100841436B1 (ko) * 2002-08-08 2008-06-25 삼성전자주식회사 영상 기록/재생 장치 및 그 기억장치 제어방법
US7142723B2 (en) * 2003-07-18 2006-11-28 Microsoft Corporation System and process for generating high dynamic range images from multiple exposures of a moving scene
US7158668B2 (en) * 2003-08-01 2007-01-02 Microsoft Corporation Image processing using linear light values and other image processing improvements
US8218625B2 (en) 2004-04-23 2012-07-10 Dolby Laboratories Licensing Corporation Encoding, decoding and representing high dynamic range images
KR101545009B1 (ko) * 2007-12-20 2015-08-18 코닌클리케 필립스 엔.브이. 스트레오스코픽 렌더링을 위한 이미지 인코딩 방법
EP2539895B1 (en) * 2010-02-22 2014-04-09 Dolby Laboratories Licensing Corporation Video display with rendering control using metadata embedded in the bitstream.
PL2543181T3 (pl) 2010-03-03 2016-03-31 Koninklijke Philips Nv Urządzenia i sposoby definiowania systemu kolorów
TWI690211B (zh) * 2011-04-15 2020-04-01 美商杜比實驗室特許公司 高動態範圍影像的解碼方法、其處理器非暫態可讀取媒體及電腦程式產品
JP6009538B2 (ja) 2011-04-28 2016-10-19 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Hdr画像を符号化及び復号するための装置及び方法
MX2013012976A (es) 2011-05-10 2013-12-06 Koninkl Philips Nv Generacion y procesamiento de señal de imagen de intervalo dinamico alto.
MX340266B (es) 2011-06-14 2016-07-04 Koninklijke Philips Nv Procesamiento de graficos para video de intervalo dinamico alto.
JP5911643B2 (ja) 2012-10-08 2016-04-27 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. 色制約付きの輝度変更画像処理
CN105009567B (zh) * 2013-02-21 2018-06-08 杜比实验室特许公司 用于合成叠加图形的外观映射的系统和方法
US9967599B2 (en) * 2013-04-23 2018-05-08 Dolby Laboratories Licensing Corporation Transmitting display management metadata over HDMI
US8872969B1 (en) * 2013-09-03 2014-10-28 Nvidia Corporation Dynamic relative adjustment of a color parameter of at least a portion of a video frame/image and/or a color parameter of at least a portion of a subtitle associated therewith prior to rendering thereof on a display unit
US9538155B2 (en) 2013-12-04 2017-01-03 Dolby Laboratories Licensing Corporation Decoding and display of high dynamic range video
WO2016020189A1 (en) 2014-08-08 2016-02-11 Koninklijke Philips N.V. Methods and apparatuses for encoding hdr images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None *
See also references of WO2016074999A1 *

Also Published As

Publication number Publication date
JP6698081B2 (ja) 2020-05-27
CN107113470A (zh) 2017-08-29
MX2017005983A (es) 2017-06-29
RU2017120282A (ru) 2018-12-13
US20180278985A1 (en) 2018-09-27
WO2016074999A1 (en) 2016-05-19
CN107113470B (zh) 2021-07-13
BR112017009528A2 (pt) 2018-02-06
ZA201703974B (en) 2019-06-26
RU2689411C2 (ru) 2019-05-28
RU2017120282A3 (pt) 2019-03-28
US10567826B2 (en) 2020-02-18
JP2018503283A (ja) 2018-02-01

Similar Documents

Publication Publication Date Title
US10567826B2 (en) Method for encoding, video processor, method for decoding, video decoder
US11423523B2 (en) Apparatus and method for dynamic range transforming of images
EP3381179B1 (en) Handling multiple hdr image sources
KR102135841B1 (ko) 높은 다이내믹 레인지 이미지 신호의 생성 및 처리
JP5992997B2 (ja) 映像符号化信号を発生する方法及び装置
US9288489B2 (en) Apparatuses and methods for HDR image encoding and decoding
US20170347113A1 (en) Local dynamic range adjustment color processing

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20170612

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20180327

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: KONINKLIJKE PHILIPS N.V.

APBT Appeal procedure closed

Free format text: ORIGINAL CODE: EPIDOSNNOA9E

APAM Information on closure of appeal procedure modified

Free format text: ORIGINAL CODE: EPIDOSCNOA9E

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20231115