CN113748426A - Content aware PQ range analyzer and tone mapping in real-time feeds - Google Patents

Content aware PQ range analyzer and tone mapping in real-time feeds

Info

Publication number
CN113748426A
CN113748426A (application CN202080031250.5A)
Authority
CN
China
Prior art keywords
image
image processing
luminance
content type
dynamic range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080031250.5A
Other languages
Chinese (zh)
Other versions
CN113748426B (en)
Inventor
A. Zandifar
J. E. Crenshaw
C. M. Vasco
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp filed Critical Dolby Laboratories Licensing Corp
Publication of CN113748426A publication Critical patent/CN113748426A/en
Application granted granted Critical
Publication of CN113748426B publication Critical patent/CN113748426B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14Digital output to display device ; Cooperation and interconnection of the display device with other functional units
    • G06T5/92
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/02Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
    • G09G5/026Control of mixing and/or overlay of colours in general
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0125Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level one of the standards being a high definition standard
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/10Recognition assisted with metadata
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2320/00Control of display operating conditions
    • G09G2320/02Improving the quality of display appearance
    • G09G2320/0238Improving the black level
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2320/00Control of display operating conditions
    • G09G2320/06Adjustment of display parameters
    • G09G2320/066Adjustment of display parameters for control of contrast
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2340/00Aspects of display data processing
    • G09G2340/06Colour space transformation
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2370/00Aspects of data communication
    • G09G2370/02Networking aspects
    • G09G2370/022Centralised management of display operation, e.g. in a server instead of locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/46Colour picture communication systems
    • H04N1/56Processing of colour picture signals
    • H04N1/60Colour correction or control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Image Processing (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)
  • Computing Systems (AREA)

Abstract

An image processing system includes: an input configured to receive an image signal, the image signal comprising a plurality of frames of image data; and a processor configured to automatically determine an image classification based on at least one frame of the plurality of frames and dynamically generate mapping metadata based on the image classification. The processor includes: determination circuitry configured to determine a content type of the image signal; segmentation circuitry configured to segment the image data into a plurality of feature item regions based on the content type; and extraction circuitry configured to extract at least one image aspect value for respective ones of the plurality of feature item regions.

Description

Content aware PQ range analyzer and tone mapping in real-time feeds
Cross Reference to Related Applications
This application claims priority from U.S. provisional patent application serial No. 62/838,518, filed on April 25, 2019, and EP patent application serial No. 19171057.3, filed on April 25, 2019, each of which is incorporated herein by reference in its entirety.
Technical Field
The present application relates generally to images. More particularly, the present application relates to content-aware PQ range analysis and tone mapping of real-time feeds.
Background
As used herein, the term "dynamic range" may relate to the ability of the human visual system to perceive a range of intensities (e.g., luminance, brightness, etc.) in an image; for example, the range is from darkest black ("dark") to brightest white ("high light"). In this sense, the dynamic range is related to the "scene-referenced" strength. The dynamic range may also relate to the ability of the display device to adequately or appropriately render an intensity range of a particular breadth (break). In this sense, dynamic range refers to "display-referenced" intensity. Unless a particular meaning is explicitly specified anywhere in the description herein as having a particular meaning, it is to be inferred that the term can be used in either sense (e.g., interchangeably).
As used herein, the term "high dynamic range" (HDR) relates to a dynamic range span of about 14-15 orders of magnitude across the human visual system. In practice, the dynamic range over which humans can simultaneously perceive a wide breadth in the intensity range may be relatively truncated with respect to HDR. As used herein, the terms "extended dynamic range" (EDR) or "visual dynamic range" (VDR) may relate, individually or interchangeably, to a dynamic range that is simultaneously perceivable by the human visual system. As used herein, EDR may be related to a dynamic range spanning five to six orders of magnitude. Thus, while EDR may be somewhat narrow relative to the real scene reference HDR, EDR nonetheless represents a wide dynamic range breadth and may also be referred to as HDR.
In practice, an image includes one or more color components (e.g., luminance Y and chrominance Cb and Cr), where each color component is represented with a precision of n bits per pixel (e.g., n = 8). Using linear luminance coding, images where n ≤ 8 (e.g., color 24-bit JPEG images) are considered standard dynamic range images, while images where n > 8 may be considered enhanced dynamic range images. EDR and HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
Most consumer desktop displays support luminance of 200 to 300 cd/m² ("nits"). Most consumer high-definition televisions ("HDTVs") range from 300 to 1,000 nits. In contrast to HDR or EDR, such displays thus typify a low dynamic range (LDR), also referred to as standard dynamic range (SDR). As the availability of EDR content grows due to advances in both capture equipment (e.g., cameras) and EDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), EDR content may be color graded and displayed on EDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).
As used herein, the term "display management" includes, but is not limited to, the processing (e.g., tone and gamut mapping) required for a display that maps an input video signal of a first dynamic range (e.g., 1000 nits) to a second dynamic range (e.g., 500 nits).
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Thus, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, problems identified with respect to one or more methods should not be assumed to have been recognized in any prior art based on this section.
Disclosure of Invention
Various aspects of the present disclosure relate to circuits, systems, and methods for image processing, including content-aware PQ range analysis and tone mapping of real-time feeds.
In one exemplary aspect of the present disclosure, there is provided an image processing system including: an input configured to receive an image signal, the image signal comprising a plurality of frames of image data; and a processor configured to automatically determine an image classification based on at least one frame of the plurality of frames and dynamically generate mapping metadata based on the image classification, wherein the processor comprises: determination circuitry configured to determine a content type of the image signal; segmentation circuitry configured to segment the image data into a plurality of feature item regions based on the content type; and extraction circuitry configured to extract at least one image aspect value for respective ones of the plurality of feature item regions.
In another exemplary aspect of the present disclosure, there is provided an image processing method including: receiving an image signal, the image signal including a plurality of frames of image data; automatically determining an image classification based on at least one frame of the plurality of frames, comprising: determining a content type of the image signal, segmenting the image data into a plurality of spatial regions based on the content type, and extracting at least one image aspect value for respective ones of the plurality of spatial regions; and generating a plurality of frames of mapping metadata based on the image classification, wherein respective ones of the plurality of frames of mapping metadata correspond to respective ones of the plurality of frames of image data.
In yet another exemplary aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor of an image processing system, cause the image processing system to perform operations comprising: receiving an image signal, the image signal including a plurality of frames of image data; automatically determining an image classification based on at least one frame of the plurality of frames, comprising: determining a content type of the image signal, segmenting the image data into a plurality of spatial regions based on the content type, and extracting at least one image aspect value for respective ones of the plurality of spatial regions; and dynamically generating mapping metadata based on the image classification on a frame-by-frame basis.
In this manner, various aspects of the present disclosure provide improvements in at least the technical fields of image processing and related technical fields of image capture, encoding, and broadcasting.
Drawings
These and other more detailed and specific features of various aspects of the present disclosure are more fully disclosed in the following description, with reference to the accompanying drawings, in which:
FIG. 1 illustrates a source scene and various rendered scenes in accordance with aspects of the present disclosure;
FIG. 2 illustrates a block diagram of an exemplary broadcast workflow in accordance with various aspects of the present disclosure;
FIG. 3 illustrates a block diagram of an exemplary processing unit in accordance with various aspects of the present disclosure;
FIG. 4 illustrates a process flow of an exemplary processing method in accordance with various aspects of the present disclosure;
FIG. 5 illustrates a process flow of an exemplary classification method in accordance with various aspects of the present disclosure;
FIG. 6 illustrates an exemplary scenario in accordance with various aspects of the present disclosure;
FIG. 7 illustrates another exemplary scenario in accordance with aspects of the present disclosure; and
fig. 8 illustrates another exemplary scenario in accordance with various aspects of the present disclosure.
Detailed Description
In the following description, numerous details are set forth, such as circuit configurations, waveform timing, circuit operations, etc., in order to provide an understanding of one or more aspects of the present disclosure. It will be apparent to those skilled in the art that these specific details are merely exemplary and are not intended to limit the scope of the present application.
The present disclosure can be embodied in various forms, including hardware or circuits controlled by computer-implemented methods, computer program products, computer systems and networks, user interfaces, and application programming interfaces, as well as hardware-implemented methods, signal processing circuits, memory arrays, application-specific integrated circuits, field-programmable gate arrays, and the like. The foregoing summary is intended only to give a general idea of various aspects of the present disclosure and is not intended to limit the scope of the disclosure in any way.
Video capture, analysis, and encoding are described herein. In the following description, numerous details are set forth, such as circuit configurations, timings, circuit operations, etc., in order to provide an understanding of one or more aspects of the present disclosure. It will be apparent to those skilled in the art that these specific details are merely exemplary and are not intended to limit the scope of the present application. For example, in some instances, various aspects of the disclosure may be practiced without these details. In other instances, well-known structures and devices may not be described in great detail in order to avoid unnecessarily occluding, obscuring, or obfuscating the present disclosure.
SUMMARY
Examples described herein relate to image processing, including generating metadata during a real-time broadcast of a video stream. Some examples described herein may be used with a Dolby Vision architecture. Dolby Vision for consumer applications is an end-to-end technology suite that enables the creation and distribution of content preserving a high dynamic range and a wide color gamut. Dolby Vision display management matches the capability of a given television (which may only be capable of displaying SDR images) by using a series of algorithms to map the signal to any Dolby Vision consumer television. When displaying HDR content on an SDR display, the HDR image is mapped to the relatively reduced dynamic range of the display.
FIG. 1 illustrates an example of mapping from a source scene to various rendered scenes. As illustrated in FIG. 1, HDR image 101 depicts a source scene having both darks (e.g., regions at the lower left and upper left of HDR image 101) and highlights (e.g., regions at the upper middle and upper right of HDR image 101). When HDR image 101 is mapped so as to faithfully display the highlights on an SDR display, an underexposed image 102 may be created as the rendered scene. In underexposed image 102 the highlights are accurately reproduced, but detail is reduced or lost in the regions corresponding to the darks. Conversely, when HDR image 101 is mapped so as to faithfully display the darks on an SDR display, an overexposed image 103 may be created as the rendered scene. In overexposed image 103 the darks are accurately reproduced, but the regions corresponding to the highlights may appear washed out. To render a converted image that is neither underexposed nor overexposed, metadata (i.e., data related to the image data) may be utilized to determine which features of HDR image 101 should be considered the focal regions of the image.
FIG. 2 illustrates an example of a broadcast workflow system 200 that includes video capture, production and post-production, and real-time distribution. Video capture may be accomplished by one or more camera groups (camera banks) 210, each including one or more cameras 211. Each camera group 210 may be located at a different physical location to capture different video content. For example, if broadcast workflow system 200 is used for a real-time sports broadcast, a first camera group 210 may be positioned to capture video of the sporting event itself, a second camera group 210 may be positioned to capture video between broadcasts, a third camera group 210 may be positioned to capture video of analysts in a studio, and so on. Each camera group 210 may include any number of cameras 211. Individual cameras 211 may be capable of capturing either HDR video data or SDR video data. Video data captured by a given camera 211 is passed through a corresponding contribution link 212 for further processing.
As illustrated in FIG. 2, video data passed through a contribution link 212 is received at a corresponding input converter 220. Where the video data is HDR video data, input converter 220 may perform an HDR-to-HDR conversion; for example, a conversion from hybrid log-gamma (HLG) or SLog-3 HDR to perceptual quantizer (PQ) HDR, for example as described in Rec. ITU-R BT.2100-1 (06/2017), "Image parameter values for high dynamic range television for use in production and international programme exchange".
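For illustration only, the following minimal Python sketch shows the PQ (SMPTE ST 2084 / Rec. BT.2100) encoding and decoding referred to above. The constants are the published PQ constants, but the function names and the use of NumPy are assumptions; this is not the converter 220 itself.

```python
import numpy as np

# PQ (SMPTE ST 2084 / Rec. BT.2100) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(luminance_nits):
    """Map absolute luminance (cd/m^2) to a normalized PQ code value in [0, 1]."""
    y = np.clip(np.asarray(luminance_nits, dtype=np.float64) / 10000.0, 0.0, 1.0)
    return ((C1 + C2 * y**M1) / (1.0 + C3 * y**M1)) ** M2

def pq_decode(code_value):
    """Invert pq_encode: normalized PQ code value back to absolute luminance in nits."""
    v = np.asarray(code_value, dtype=np.float64) ** (1.0 / M2)
    return 10000.0 * (np.maximum(v - C1, 0.0) / (C2 - C3 * v)) ** (1.0 / M1)

# e.g. a 1000-nit highlight sits at roughly 75% of the PQ code range
print(pq_encode(1000.0))  # ~0.75
```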
Where the video data is SDR video data, input converter 220 may perform an SDR-to-HDR conversion. Although FIG. 2 illustrates an input converter 220 for each contribution link 212, in practice there may be fewer input converters 220. For example, where the video data is HDR video data already using PQ, no conversion may be needed and the input converter 220 may be omitted. In either case, the video data is provided to production switcher 221.
Production switcher 221 receives video data from each camera 211 and provides several outputs, including: a broadcast stream 222, which may correspond to video data received from a selected one of the cameras 211; an output to a quality control (QC) unit 223; an output to a mapping unit 224, which may in turn provide an output to an SDR-capable QC unit 223; an output to a playout server 225; and an output to a file ingest 226 for storage. Data from the file ingest 226 may be subjected to further processing in a post-production unit 227 and then provided to playout server 225. Video data stored in playout server 225 may be utilized for later playback, for example for instant replay or halftime analysis. The output of playout server 225 may include SDR video data (in which case conversion may be performed via another input converter 220), HDR video data, or both.
For real-time distribution, the broadcast stream 222 and/or data from the playout server 225 is received at a router 230. Router 230 provides several outputs, including: one or more outputs (HDR and/or SDR) to QC units 223; one or more HDR distribution streams 231, each to a respective broadcast encoder 232; one or more SDR distribution streams 237 (e.g., SDR simulcasts); and HDR and/or SDR outputs to a mapping unit 238. Each broadcast encoder 232 includes an HDR processing unit (HPU) 233 that receives the HDR distribution stream 231, performs various analyses (as will be described in more detail below), and outputs an HDR video feed 234 and a metadata feed 235. The HDR video feed 234 and the metadata feed 235 are provided to an encoding unit 236 for encoding and broadcast. The SDR distribution stream 237 (if present) may be output directly to encoding unit 236 without generation of the metadata feed 235.
HDR processing
FIG. 3 illustrates an exemplary image processing system in accordance with various aspects of the present disclosure. In particular, FIG. 3 illustrates an HPU 300, which may be an example of HPU 233 illustrated in FIG. 2. HPU 300 includes an input/output (I/O) unit 310, a memory 320, a communication unit 330, a user interface (UI) 340, and a processor 350. The various elements of HPU 300 communicate with one another via a bus 360. I/O unit 310 receives input data 311, which may be an example of the HDR distribution stream 231 illustrated in FIG. 2, and outputs a video feed 312 and a metadata feed 313, which may be examples of the HDR video feed 234 and the metadata feed 235 illustrated in FIG. 2, respectively. Processor 350 includes a determination unit 351, a segmentation unit 352, and an extraction unit 353, each of which will be described in more detail below.
The various components of HPU 300 may be implemented as hardware, software, firmware, or a combination thereof. For example, the various units may be implemented as circuits or circuitry, as software modules in memory or algorithms in a processor, etc., including combinations of circuitry and software modules.
I/O unit 310 may include one or more ports for inputting or outputting data via wires, optical fibers, wireless communication protocols, or a combination thereof. Memory 320 may be a volatile or non-volatile memory unit, including but not limited to read-only memory (ROM) or random-access memory (RAM), a hard disk, flash memory devices, or the like. Communication unit 330 may include circuitry for receiving control signals or other communications from outside HPU 300 via wires, optical fibers, wireless communication protocols, or a combination thereof. UI 340 may include a device or port, such as a mouse, keyboard, touch-screen interface, display, graphical UI (GUI), and so on, for receiving instructions from and/or communicating with a local user.
Various components of HPU 300, including but not limited to processor 350, may be implemented with a computer system, systems configured in electronic circuitry and components, or an integrated circuit (IC) device such as a microcontroller, a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), another configurable or programmable logic device (PLD), a discrete-time or digital signal processor (DSP), an application-specific IC (ASIC), and the like. In one example, the determination unit 351, the segmentation unit 352, and the extraction unit 353 may be implemented as circuitry within processor 350. In another example, the determination unit 351, the segmentation unit 352, and the extraction unit 353 may be implemented as software modules within processor 350. Various ones of the determination unit 351, the segmentation unit 352, and the extraction unit 353 may share circuit components, algorithms, and/or subroutines with one another.
An example of an image processing method implemented by HPU 300 is illustrated in FIGS. 4-5. At step S401, HPU 300 receives an image signal, e.g., via I/O unit 310. The image signal includes a plurality of frames of image data and may correspond to a real-time video feed. At step S402, HPU 300 automatically determines an image classification based on at least one frame of the plurality of frames of image data included in the image signal. This determination may include a series of sub-processes, as illustrated in FIG. 5. For example, at step S501 HPU 300 determines a content type of the image signal, at step S502 HPU 300 segments the image data into a plurality of feature item regions based on the determined content type, and at step S503 HPU 300 extracts at least one image aspect value for respective ones of the plurality of feature item regions. The image classification may be performed by processor 350, such that the content type determination of step S501 may be performed by determination unit 351, the image data segmentation of step S502 may be performed by segmentation unit 352, and the image aspect value extraction of step S503 may be performed by extraction unit 353. As will be understood by the skilled artisan, image classification may generally involve, but is not limited to, assigning (e.g., by labeling or segmentation) images into several (e.g., predefined) categories and/or assigning a single image into several regions (e.g., based on content within the image). Depending on the particular implementation and/or requirements, such assignment or classification may be performed in any suitable manner based on any suitable criteria and/or conditions; for example, it may be based on the content type determined from the respective images. Thus, in the present disclosure, the series of sub-processes S501-S503 may be collectively referred to as an image classification process/algorithm, or simply image classification. Based on the image classification, at step S403 (see FIG. 4) HPU 300 generates mapping metadata for output, e.g., via I/O unit 310.
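As a rough, non-normative illustration of the S401-S503/S403 flow described above, the following Python sketch wires the three sub-processes into a per-frame loop. All names (classify_frame, process_stream, the callables passed in) are hypothetical and merely stand in for the determination, segmentation, and extraction circuitry.

```python
from dataclasses import dataclass, field

@dataclass
class ImageClassification:
    content_type: str                                   # e.g. "beach volleyball", "cricket", "soccer"
    regions: dict = field(default_factory=dict)         # region label -> pixel mask
    aspect_values: dict = field(default_factory=dict)   # region label -> luminance statistics

def classify_frame(frame, determine, segment, extract):
    """S501-S503: determine content type, segment feature item regions, extract values."""
    content_type = determine(frame)                     # determination circuitry (S501)
    regions = segment(frame, content_type)              # segmentation circuitry (S502)
    aspect_values = {name: extract(frame, mask)         # extraction circuitry (S503)
                     for name, mask in regions.items()}
    return ImageClassification(content_type, regions, aspect_values)

def process_stream(frames, determine, segment, extract, build_metadata):
    """S401-S403: receive frames, classify, and emit mapping metadata frame by frame."""
    for frame in frames:
        classification = classify_frame(frame, determine, segment, extract)
        yield frame, build_metadata(classification)
```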
Generation and use of mapping metadata
These methods will be described in more detail with reference to FIGS. 6-8, which illustrate exemplary scenarios. In particular, FIGS. 6-8 illustrate examples of individual frames of image data, which may be frames of the HDR distribution stream 231 and/or the input data 311. FIG. 6 illustrates a frame 600 in which the content type is beach volleyball. FIG. 7 illustrates a frame 700 in which the content type is cricket. FIG. 8 illustrates a frame 800 in which the content type is football (soccer). Although the content types of FIGS. 6-8 relate to real-time sports, the present disclosure is not so limited. For example, the content type may be real-time sports, a movie, a news program, a natural scene, and the like.
Upon receiving a frame (or frames) of image data, such as frame 600, 700, or 800, the image processing system determines an image classification. This may be one example of step S402 illustrated in FIG. 4, and may be performed by HPU 300 illustrated in FIG. 3. In determining the image classification, the image processing system determines the content type, which may be one example of step S501 illustrated in FIG. 5.
The content type may be determined by analyzing various regions of the image frame and determining one or more confidence regions. For example, the image processing system may analyze image frame 600 and determine that a large portion having a relatively beige color is a confidence region 601, and that confidence region 601 likely corresponds to sand. The image processing system may further determine that the top portion of image frame 600 includes a confidence region 602, and that confidence region 602 likely corresponds to a face. Similarly, the image processing system may analyze image frame 700 and determine that a large green portion is a confidence region 701, and that confidence region 701 likely corresponds to grass. The image processing system may also distinguish different shades of the same color. For example, as illustrated in FIG. 8, the image processing system may analyze image frame 800 and determine that a left portion includes one confidence region 801 and a right portion includes another confidence region 802. While the image processing system may determine that both confidence regions 801 and 802 likely correspond to grass, it may distinguish between the shaded grass associated with confidence region 801 and the sunlit grass associated with confidence region 802. Although FIGS. 6-8 illustrate the respective confidence regions as circular, in practice the confidence regions may be elliptical, rectangular, or any other shape.
Based on the confidence regions, the image processing system may generate a ranked or non-ranked list of potential content types. For example, in FIG. 6 the image processing system may determine that image frame 600 shows beach volleyball with an 85% likelihood, beach soccer with a 12% likelihood, beach tennis with a 4% likelihood, and so on. Such a determination may be based on a single frame of image data, a series of consecutive frames of image data, or a series of non-consecutive frames of image data (e.g., every fourth frame). The determination may be performed repeatedly throughout a broadcast, such as every ten frames, every thirty seconds, and so on.
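A minimal sketch of how per-region confidences might be combined into a ranked list of potential content types is given below. The REGION_TO_CONTENT priors and the weighting scheme are invented purely for illustration; an actual analyzer could instead rely on a trained classifier.

```python
# Hypothetical priors linking region evidence to sports content types; a deployed
# analyzer would learn these associations rather than hard-code them.
REGION_TO_CONTENT = {
    "sand":  {"beach volleyball": 0.6, "beach soccer": 0.3, "beach tennis": 0.1},
    "grass": {"cricket": 0.4, "soccer": 0.4, "rugby": 0.2},
    "face":  {},  # faces alone do not discriminate between sports
}

def rank_content_types(confidence_regions):
    """confidence_regions: list of (region_label, confidence), e.g. [("sand", 0.9)].

    Returns content types sorted by accumulated, normalized evidence."""
    scores = {}
    for label, conf in confidence_regions:
        for content_type, weight in REGION_TO_CONTENT.get(label, {}).items():
            scores[content_type] = scores.get(content_type, 0.0) + conf * weight
    total = sum(scores.values()) or 1.0
    return sorted(((t, s / total) for t, s in scores.items()),
                  key=lambda item: item[1], reverse=True)

print(rank_content_types([("sand", 0.9), ("face", 0.7)]))
# -> [('beach volleyball', 0.6), ('beach soccer', 0.3), ('beach tennis', 0.1)]
```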
Once the content type has been determined, the image processing system segments the image data into one or more feature item regions. This may be one example of step S502 illustrated in FIG. 5. The segmentation may be based on the content type itself; for example, the image processing system may determine an ordered set of priority items in the image data for which to search and segment. In the beach volleyball example illustrated in FIG. 6, the image processing system may first search for and segment sand feature item regions, then search for and segment crowd feature item regions based on the presence of multiple faces in close proximity, and so on. Similarly, in the cricket example illustrated in FIG. 7, the image processing system may first search for and segment grass feature item regions, then search for and segment members of the first team based on jersey color, and so on. The segmentation may also be based on color or hue; for example, in the soccer example illustrated in FIG. 8, the image processing system may search for and segment shaded-grass feature item regions, sunlit-grass feature item regions, and so forth. FIG. 8 explicitly illustrates a segmentation in which image frame 800 is segmented into a first feature item region 810 (sunlit grass) and a second feature item region 820 (shaded grass). The segmentation may be based on a single frame of image data, a series of consecutive frames of image data, or a series of non-consecutive frames of image data (e.g., every fourth frame). The segmentation may be performed repeatedly throughout a broadcast, such as every ten frames, every thirty seconds, and so on. In some aspects of the disclosure, the segmentation occurs more frequently than the content type determination. For example, the image processing system may determine the content type every five seconds, while it may segment the image data every half second.
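The following sketch illustrates one plausible way to realize such content-type-driven segmentation for the soccer example, splitting greenish pixels into sunlit and shaded grass by a luminance threshold. The priority lists, thresholds, and function names are assumptions made for illustration, not the patented segmenter.

```python
import numpy as np

# Hypothetical, per-content-type priority lists of feature items to segment first.
PRIORITY_ITEMS = {
    "beach volleyball": ["sand", "crowd"],
    "cricket":          ["grass", "team_jersey"],
    "soccer":           ["sunlit_grass", "shaded_grass"],
}

def segment_soccer_grass(rgb, luma):
    """Illustrative split of a soccer frame into sunlit vs. shaded grass masks.

    rgb: float array (H, W, 3) in [0, 1]; luma: float array (H, W) in nits.
    Greenness plus a median luminance threshold stand in for a learned segmenter."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    greenish = (g > r) & (g > b)                 # crude grass detector (assumption)
    threshold = np.median(luma[greenish]) if greenish.any() else 0.0
    return {
        "sunlit_grass": greenish & (luma >= threshold),
        "shaded_grass": greenish & (luma < threshold),
    }
```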
From the segmented feature item regions, the image processing system may extract at least one image aspect value for respective ones of the feature item regions. This may be one example of step S503 illustrated in FIG. 5. The image aspect value may relate to (but is not limited to) luminance information of the corresponding feature item region. For example, the image aspect values may include, but are not limited to, a luminance maximum, a luminance minimum, a luminance midpoint, a luminance average, a luminance variance, and the like. The image aspect values may be represented visually or in memory as a histogram. The distribution of luminance values may be derived based on the image content (e.g., pixel values, luminance values, chrominance values, Y values, Cb/Cr values, RGB values, etc.), the scene, the gain/offset/power, and so on. In some aspects of the disclosure, the extraction occurs each time a segmentation occurs.
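As an illustration, a per-region extraction of the luminance statistics listed above might look like the following sketch, assuming a luminance plane in nits and a boolean region mask; the dictionary layout is hypothetical.

```python
import numpy as np

def extract_aspect_values(luma_nits, mask, bins=64):
    """Per-region luminance statistics of the kind listed above (max, min, midpoint,
    mean, variance) plus a histogram; luma_nits is an (H, W) array in cd/m^2."""
    values = luma_nits[mask]
    if values.size == 0:
        return None
    hist, edges = np.histogram(values, bins=bins)
    return {
        "max": float(values.max()),
        "min": float(values.min()),
        "mid": float(np.median(values)),
        "mean": float(values.mean()),
        "variance": float(values.var()),
        "histogram": (hist.tolist(), edges.tolist()),
    }
```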
One or more of the routines and subroutines implemented by the image processing system may be executed automatically. For example, HPU 300 may utilize machine learning algorithms such as deep learning. As used herein, deep learning refers to a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and/or transformation. Each successive layer may use the output of the previous layer as its input. Deep learning may be performed in a supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manner. Deep learning may be used to learn multiple levels of representation that correspond to different levels of abstraction, such that the levels form a hierarchy of concepts. Examples of such techniques include the work of D. Tran et al. ("Learning spatiotemporal features with 3D convolutional networks", IEEE International Conference on Computer Vision (ICCV), 2015, pp. 4489-4497) and K. Zhang et al. ("Joint face detection and alignment using multitask cascaded convolutional networks", IEEE Signal Processing Letters 23.10, 2016, pp. 1499-1503).
The results of the image classification, including one or more of the determined content type, feature item regions, and/or image aspect values, may be used to dynamically generate mapping metadata, such as metadata feed 235 illustrated in FIG. 2 and/or metadata feed 313 illustrated in FIG. 3. As will be appreciated by the skilled artisan, any suitable manner of generating the mapping metadata may be used depending on the particular implementation and/or requirements. For example, the generation of the mapping metadata may be performed based on some or all of the determined content type, feature item regions, and/or image aspect values as illustrated above. Furthermore, the mapping metadata may be generated dynamically as the input signal is processed. That is, upon receiving an input image/video signal (e.g., from a live feed), the dynamic generation of mapping metadata may be performed together with the image classification process (in other words, together with the determination of the content type, feature item regions, and/or image aspect values), thereby improving the quality, accuracy, and efficiency of the rendered image/video while reducing or even avoiding unnecessary or undesirable delays (e.g., during a live broadcast). Broadly speaking, the mapping metadata may be generated in such a way that a conversion (e.g., mapping) from an input signal to an output signal is enabled or facilitated. For example, the input signal and the output signal may have different dynamic ranges, in which case the conversion may involve converting data of a first dynamic range (in the input signal) to data of a second dynamic range (in the output signal). In other words, the metadata may be generated to enable or facilitate conversion of the image data from a first dynamic range to a second dynamic range (which may be higher or lower than the first dynamic range). As the skilled person will appreciate, the conversion may include, but is not limited to, tone and/or gamut mapping. The mapping metadata may include several components or parameters used in downstream image processing.

By way of example and not limitation, the present disclosure (and in particular its image classification) may identify a real-time stream as a soccer game. The present disclosure may then decide or select the object priority list to be grass regions and (human) faces. It may then compute the HDR PQ profile characteristics (e.g., mean, variance, etc.) within those regions on a per-list-object basis. Such characteristics may then be used to determine an appropriate tone mapping curve. A typical mapping case would be one in which the HDR feed ranges from 0.001 nits to 1,000 nits and needs to be mapped to an SDR range of 0.005 nits to 100 nits. Furthermore, the SDR may be encoded according to the BT.1886 standard. In addition, the requirements may specify that (human) faces should sit at about 70% of the maximum SDR code value and the grass at 18% of the maximum SDR code value. This is often referred to as 70% IRE and 18% IRE, where IRE refers to the Institute of Radio Engineers (the predecessor professional organization that established television practices). Now, in this example of a soccer game, assume that the present disclosure finds that, in the HDR signal, the faces may be graded to 200 nits and the grass may sit at 40 nits.
An optimization algorithm may then be driven to select the parameters of the tone mapping algorithm such that pixels at 200 nits in HDR are mapped to 70% of the SDR signal and pixels at 40 nits in HDR are mapped to 18% of the SDR signal. It should be apparent to those skilled in the art that additional constraints can be added for the maximum HDR pixel value and the minimum HDR pixel value, so that they too are mapped to appropriate levels in the SDR signal. As illustrated in FIG. 8, the mapping metadata may be used at a display point (e.g., a commercial television owned by an end user) to display a rendered image frame 830 that faithfully reproduces the entire image, including both darks and highlights. In one particular example in which the present disclosure is implemented in a Dolby Vision architecture, the mapping metadata may include L1 parameters, L2/L8 parameters, L3 parameters, L4 parameters, L11 parameters, and the like.
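As a worked illustration of such a constraint-driven fit (and not the actual Dolby Vision tone curve or optimization), the sketch below solves a simple two-parameter power curve so that the two anchors from the example above, 200 nits to 70% and 40 nits to 18%, are hit exactly.

```python
import numpy as np

def fit_power_curve(anchors):
    """Solve code = gain * L^power so that two (hdr_nits, sdr_fraction) anchors are hit.

    anchors: [(L1, f1), (L2, f2)], e.g. the face and grass levels from the example above."""
    (l1, f1), (l2, f2) = anchors
    power = np.log(f1 / f2) / np.log(l1 / l2)
    gain = f1 / (l1 ** power)
    return gain, power

def tone_map(hdr_nits, gain, power):
    """Map HDR luminance to a normalized SDR code-value fraction, clipped to [0, 1]."""
    return np.clip(gain * np.asarray(hdr_nits, dtype=np.float64) ** power, 0.0, 1.0)

# Anchors from the soccer example: face ~200 nits -> 70% IRE, grass ~40 nits -> 18% IRE
gain, power = fit_power_curve([(200.0, 0.70), (40.0, 0.18)])
print(tone_map([40.0, 200.0, 1000.0], gain, power))   # ~[0.18, 0.70, 1.0 (clipped)]
```

In practice additional anchors (e.g., the minimum and maximum HDR pixel values mentioned above) would turn this closed-form solve into the kind of constrained optimization the text describes.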
The L1 metadata provides or describes information about the distribution of luminance values in a source image, a source scene, and the like. As described above, the distribution of luminance values may be derived based on the image content (e.g., pixel values, luminance values, chrominance values, Y values, Cb/Cr values, RGB values, etc.), the scene, and so on. The L1 metadata may include quantities representing minimum ("crush"), mid-tone ("mid"), and maximum ("clip") luminance values representative of one or more scenes in the image data.
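For illustration, an L1-style record could be assembled from the per-region statistics sketched earlier roughly as follows. The field names and the aggregation rule are assumptions for this sketch, not the actual Dolby Vision metadata syntax.

```python
def build_l1_record(aspect_values_by_region):
    """Assemble a minimal L1-style record (min/mid/max luminance) from per-region
    statistics of the form produced by extract_aspect_values above."""
    mins  = [v["min"]  for v in aspect_values_by_region.values()]
    mids  = [v["mid"]  for v in aspect_values_by_region.values()]
    maxes = [v["max"]  for v in aspect_values_by_region.values()]
    return {
        "crush": min(mins),                      # darkest luminance observed
        "mid":   sorted(mids)[len(mids) // 2],   # representative mid-tone
        "clip":  max(maxes),                     # brightest luminance observed
    }
```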
The L2 metadata provides or describes information about video characteristic adjustments that originate from or are traced back to adjustments made by a director, a colorist, a video professional, and the like. The L2 metadata may be based at least in part on processing performed in production and/or post-production, such as processing performed by input converter 220, production switcher 221, QC unit 223, playout server 225, file ingest 226, and/or post-production unit 227 illustrated in FIG. 2. The L8 metadata is similar to the L2 metadata, and in some cases may be equivalent to the L2 metadata (e.g., depending on the corresponding tone curve). The L2 and L8 metadata may be referred to as "trim" parameters and may indicate or relate to the gain/offset/power of the image data. The L2 metadata may correspond to a first reference display having a particular reference dynamic range.
The L3 metadata provides or describes information about video characteristic adjustments that originate from or are traced back to adjustments made by a director, a colorist, a video professional, and the like. In contrast to the L2 metadata, the L3 metadata may correspond to a second reference display having a reference dynamic range different from that of the first reference display. The L3 metadata may include, for example, offsets or adjustments relative to the L1 metadata, including offsets or adjustments to the crush, mid, and/or clip luminance values.
The L4 metadata provides or describes information about global dimming operations. The L4 metadata may be computed by an encoder during pre-processing, and may be computed using RGB primaries. In one example, the L4 metadata may include data specifying a global backlight brightness level of the display panel on a per-frame basis. Other generated metadata, such as the L11 metadata, may provide or describe information to be used to identify the source of the video data, such as cinematic content, computer game content, sports content, and so on. Such metadata may further provide or describe intended picture settings, such as intended white point, sharpness, and the like.
Taken together, the mapping metadata may include conversion data for converting from a first dynamic range to a second dynamic range different from the first dynamic range. In some aspects of the disclosure, the first dynamic range may be higher than the second dynamic range (e.g., a conversion from HDR to SDR). In other aspects of the disclosure, the second dynamic range may be higher than the first dynamic range (e.g., a conversion from SDR to HDR). Referring to FIG. 1, the mapping metadata may be utilized to avoid overexposure or underexposure as in images 102 and 103, respectively. For example, the mapping metadata may be encoded along with the image data itself, for use in tone mapping by a commercial television owned by an end user.
Equivalents, extensions, alternatives, and miscellaneous
With respect to the processes, systems, methods, heuristics, etc. described herein, it should be understood that, although the steps of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes may be practiced with the described steps performed in an order different than the order described herein. It is further understood that certain steps may be performed simultaneously, that other steps may be added, or that certain steps described herein may be omitted. In other words, the description of processes herein is provided for the purpose of illustrating certain embodiments and should in no way be construed so as to limit the claims.
All terms used in the claims are intended to be given their broadest reasonable constructions and their ordinary meanings as understood by those familiar with the art described herein unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as "a," "the," "said," etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
Example aspects related to video capture, analysis, and broadcast are thus described. In the foregoing specification, aspects of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Various examples of the disclosure may employ any one or more of the following Enumerated Example Embodiments (EEEs), which are not claims:
EEE1. An image processing system, comprising: an input configured to receive an image signal, the image signal comprising a plurality of frames of image data; and a processor configured to automatically determine an image classification based on at least one frame of the plurality of frames and dynamically generate mapping metadata based on the image classification, wherein the processor comprises: determination circuitry configured to determine a content type of the image signal; segmentation circuitry configured to segment the image data into a plurality of feature item regions based on the content type; and extraction circuitry configured to extract at least one image aspect value for respective ones of the plurality of feature item regions.
EEE2. The image processing system according to EEE1, wherein the at least one image aspect value includes at least one selected from: a maximum value of luminance, a minimum value of luminance, a midpoint of luminance, an average value of luminance, or a variance of luminance.
EEE3. The image processing system according to EEE1 or EEE2, wherein the image signal is a real-time video feed.
EEE4. The image processing system according to any one of EEE1 to EEE3, further comprising an encoder configured to encode the image signal and the mapping metadata.
EEE5. The image processing system according to any one of EEE1 to EEE4, wherein the mapping metadata includes conversion data for converting from a first dynamic range to a second dynamic range different from the first dynamic range.
EEE6. The image processing system according to EEE5, wherein the first dynamic range is higher than the second dynamic range.
EEE7. An image processing method, comprising: receiving an image signal, the image signal including a plurality of frames of image data; automatically determining an image classification based on at least one frame of the plurality of frames, comprising: determining a content type of the image signal, segmenting the image data into a plurality of spatial regions based on the content type, and extracting at least one image aspect value for respective ones of the plurality of spatial regions; and generating a plurality of frames of mapping metadata based on the image classification, wherein respective ones of the plurality of frames of mapping metadata correspond to respective ones of the plurality of frames of image data.
EEE8. The image processing method according to EEE7, wherein the at least one image aspect value includes at least one selected from: a maximum value of luminance, a minimum value of luminance, a midpoint of luminance, an average value of luminance, or a variance of luminance.
EEE9. The image processing method according to EEE7 or EEE8, wherein the respective feature item regions indicate at least one selected from: a landscape area, a shadow area, a sky area, a face detection area, or a crowd area.
EEE10. The image processing method according to any one of EEE7 to EEE9, wherein the image signal is a real-time video feed.
EEE11. The image processing method according to any one of EEE7 to EEE10, further comprising: encoding the image signal and the mapping metadata into a compressed output signal.
EEE12. The image processing method according to any one of EEE7 to EEE11, wherein the mapping metadata includes conversion data for converting from a first dynamic range to a second dynamic range different from the first dynamic range.
EEE13. The image processing method according to EEE12, wherein the first dynamic range is higher than the second dynamic range.
EEE14. A non-transitory computer-readable medium storing instructions that, when executed by a processor of an image processing system, cause the image processing system to perform operations comprising: receiving an image signal, the image signal including a plurality of frames of image data; automatically determining an image classification based on at least one frame of the plurality of frames, comprising: determining a content type of the image signal, segmenting the image data into a plurality of spatial regions based on the content type, and extracting at least one image aspect value for respective ones of the plurality of spatial regions; and dynamically generating mapping metadata based on the image classification on a frame-by-frame basis.
EEE15. The non-transitory computer-readable medium according to EEE14, wherein the at least one image aspect value comprises at least one selected from: a maximum value of luminance, a minimum value of luminance, a midpoint of luminance, an average value of luminance, or a variance of luminance.
EEE16. The non-transitory computer-readable medium according to EEE14 or EEE15, wherein the respective feature item regions indicate at least one selected from: a landscape area, a shadow area, a sky area, a face detection area, or a crowd area.
EEE17. The non-transitory computer-readable medium according to any one of EEE14 to EEE16, wherein the image signal is a real-time video feed.
EEE18. The non-transitory computer-readable medium according to any one of EEE14 to EEE17, wherein the operations further comprise encoding the image signal and the mapping metadata.
EEE19. The non-transitory computer-readable medium according to any one of EEE14 to EEE18, wherein the mapping metadata includes conversion data for converting between HDR and SDR signals.
EEE20. The non-transitory computer-readable medium according to EEE19, wherein the mapping metadata includes conversion data for converting from an HDR signal to an SDR signal.

Claims (30)

1. An image processing system, comprising:
an input configured to receive an image signal, the image signal comprising a plurality of frames of image data; and
a processor configured to automatically determine an image classification based on at least one frame of the plurality of frames and dynamically generate mapping metadata based on the image classification,
wherein the processor comprises:
determination circuitry configured to determine a content type of the image signal;
segmentation circuitry configured to segment the image data into a plurality of feature item regions based on the content type; and
extraction circuitry configured to extract at least one image aspect value for respective ones of the plurality of feature item regions.
2. The image processing system of claim 1, wherein the determination circuitry is configured to determine the content type by analyzing regions of the frame and determining one or more confidence regions.
3. The image processing system of claim 2, wherein the determination of the content type involves: generating a ranked or non-ranked list of potential content types based on the one or more confidence regions.
4. The image processing system of any of claims 1 to 3, wherein the segmentation of the image data involves: determining, based on the determined content type, an ordered set of priority items in the image data for which to search and segment.
5. The image processing system of any of claims 1 to 4, wherein the at least one image aspect value comprises at least one selected from: a maximum value of luminance, a minimum value of luminance, a midpoint of luminance, an average value of luminance, or a variance of luminance.
6. The image processing system of any of claims 1 to 5, wherein the respective feature item regions indicate at least one selected from: a landscape area, a shadow area, a sky area, a face detection area, or a crowd area.
7. The image processing system of any of claims 1 to 6, wherein the image signal is a real-time video feed.
8. The image processing system of any of claims 1 to 7, further comprising an encoder configured to encode the image signal and the mapping metadata.
9. The image processing system of any of claims 1 to 8, wherein the mapping metadata includes conversion data for converting from a first dynamic range to a second dynamic range different from the first dynamic range.
10. The image processing system of claim 9, wherein the first dynamic range is higher than the second dynamic range.
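As a non-authoritative sketch of the control flow recited in claims 2 to 4, the Python fragment below aggregates per-region confidences into a ranked list of potential content types and then looks up an ordered set of priority items to search for and segment. The content types, the priority table, and all identifiers are hypothetical; they only illustrate the shape of the logic attributed to the determination and segmentation circuitry.

# Hedged sketch: per-region confidences vote for a content type, and the
# winning type selects an ordered set of priority items (claim 4).
from collections import defaultdict

# Hypothetical mapping from content type to the items worth segmenting,
# listed in priority order.
PRIORITY_ITEMS = {
    "sports": ["crowd", "landscape", "sky", "shadow"],
    "news": ["face", "shadow"],
    "nature": ["sky", "landscape", "shadow"],
}

def rank_content_types(region_confidences):
    """region_confidences: iterable of (content_type, confidence) pairs, one per
    analyzed region of the frame. Returns (content_type, total) pairs, highest first."""
    totals = defaultdict(float)
    for content_type, confidence in region_confidences:
        totals[content_type] += confidence
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def items_to_segment(region_confidences):
    ranked = rank_content_types(region_confidences)
    if not ranked:
        return []
    best_type = ranked[0][0]
    return PRIORITY_ITEMS.get(best_type, [])

# Example: three analyzed regions vote mostly for "sports".
print(items_to_segment([("sports", 0.8), ("news", 0.3), ("sports", 0.6)]))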
11. An image processing method, comprising:
receiving an image signal, the image signal comprising a plurality of frames of image data;
automatically determining an image classification based on at least one frame of the plurality of frames, comprising:
determining a content type of the image signal,
segmenting the image data into a plurality of spatial regions based on the content type, and
extracting at least one image aspect value for a respective one of the plurality of spatial regions; and
generating a plurality of frames of mapping metadata based on the image classification, wherein respective ones of the plurality of frames of mapping metadata correspond to respective ones of the plurality of frames of image data.
12. The image processing method of claim 11, wherein the content type is determined by analyzing regions of the frame and determining one or more confidence regions.
13. The image processing method of claim 12, wherein the determination of the content type involves generating a ranked or non-ranked list of potential content types based on the one or more confidence regions.
14. The image processing method of any of claims 11 to 13, wherein the segmenting of the image data involves determining, based on the determined content type, an ordered set of priority items in the image data to search for and segment.
15. The image processing method according to any one of claims 11 to 14, wherein the at least one image aspect value comprises at least one selected from: a maximum value of luminance, a minimum value of luminance, a midpoint of luminance, an average value of luminance, or a variance of luminance.
16. The image processing method according to any one of claims 11 to 15, wherein the respective feature item regions indicate at least one selected from: a landscape area, a shadow area, a sky area, a face detection area, or a crowd area.
17. The image processing method according to any one of claims 11 to 16, wherein the image signal is a real-time video feed.
18. The image processing method of any of claims 11 to 17, further comprising: encoding the image signal and the mapping metadata into a compressed output signal.
19. The image processing method according to any one of claims 11 to 18, wherein the mapping metadata includes conversion data for converting from a first dynamic range to a second dynamic range different from the first dynamic range.
20. The image processing method according to claim 19, wherein the first dynamic range is higher than the second dynamic range.
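The sketch below illustrates, under stated assumptions, the frame-by-frame metadata generation of claims 11 and 19 to 20: one mapping-metadata record is produced per frame of the image signal, and the record carries conversion data (here, a source range, a target range, and a mid-tone anchor) for mapping from a higher first dynamic range to a lower second dynamic range. The curve model and field names are invented for the example and are not the patented mapping.

# Minimal sketch, assuming the conversion data reduces to a few per-frame
# luminance anchors; a real mapping curve would be derived elsewhere.
import numpy as np

def frame_mapping_metadata(luma):
    """Return one hypothetical mapping-metadata record for one frame of HDR luminance."""
    lo, hi = float(luma.min()), float(luma.max())
    mid = float(np.mean(luma))
    return {
        "source_range": (lo, hi),    # dynamic range observed in the HDR frame
        "target_range": (0.0, 1.0),  # assumed SDR output range
        "anchor_mid": mid,           # mid-tone anchor for the down-mapping curve
    }

def metadata_stream(frames):
    """Yield one metadata record per frame (frame-by-frame generation)."""
    for luma in frames:
        yield frame_mapping_metadata(luma)

# Example: two synthetic frames produce two metadata records.
frames = [np.random.rand(540, 960) * 4.0 for _ in range(2)]  # values above 1.0 stand in for HDR highlights
for record in metadata_stream(frames):
    print(record)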
21. A non-transitory computer-readable medium storing instructions that, when executed by a processor of an image processing system, cause the image processing system to perform operations comprising:
receiving an image signal, the image signal comprising a plurality of frames of image data;
automatically determining an image classification based on at least one frame of the plurality of frames, comprising:
determining a content type of the image signal,
segmenting the image data into a plurality of spatial regions based on the content type, and
extracting at least one image aspect value for a respective one of the plurality of spatial regions; and
dynamically generating mapping metadata based on the image classification on a frame-by-frame basis.
22. The non-transitory computer-readable medium of claim 21, wherein the content type is determined by analyzing regions of the frame and determining one or more confidence regions.
23. The non-transitory computer-readable medium of claim 22, wherein the determination of the content type involves generating a ranked or non-ranked list of potential content types based on the one or more confidence regions.
24. The non-transitory computer-readable medium of any of claims 21 to 23, wherein the segmentation of the image data involves determining, based on the determined content type, an ordered set of priority items in the image data to search for and segment.
25. The non-transitory computer-readable medium of any one of claims 21 to 24, wherein the at least one image aspect value comprises at least one selected from: a maximum value of luminance, a minimum value of luminance, a midpoint of luminance, an average value of luminance, or a variance of luminance.
26. The non-transitory computer-readable medium of any one of claims 21 to 25, wherein the respective feature item regions indicate at least one selected from: a landscape area, a shadow area, a sky area, a face detection area, or a crowd area.
27. The non-transitory computer readable medium of any of claims 21-26, wherein the image signal is a real-time video feed.
28. The non-transitory computer-readable medium of any of claims 21 to 27, wherein the operations further comprise encoding the image signal and the mapping metadata.
29. The non-transitory computer-readable medium of any of claims 21-28, wherein the mapping metadata includes conversion data for converting between HDR and SDR signals.
30. The non-transitory computer-readable medium of claim 29, wherein the mapping metadata includes conversion data for converting from an HDR signal to an SDR signal.
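Finally, a hedged sketch of the encoding recited in claims 8, 18, and 28, under the assumption that encoding the image signal and the mapping metadata means pairing a compressed payload per frame with its serialized metadata record. zlib stands in for a real video codec, and the container layout is invented purely for illustration.

# Sketch only: interleave compressed frames with their metadata records.
import json
import zlib
import numpy as np

def encode_with_metadata(frames, metadata_records):
    """Pair each compressed frame payload with its JSON-serialized mapping metadata."""
    stream = []
    for luma, meta in zip(frames, metadata_records):
        payload = zlib.compress(luma.astype(np.float32).tobytes())  # stand-in for a video encoder
        stream.append({"frame": payload, "metadata": json.dumps(meta)})
    return stream

# Example: two synthetic frames, each with a tiny hypothetical metadata record.
frames = [np.random.rand(270, 480) for _ in range(2)]
records = [{"anchor_mid": float(f.mean())} for f in frames]
print(len(encode_with_metadata(frames, records)), "encoded units")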
CN202080031250.5A 2019-04-25 2020-04-20 Content aware PQ range analyzer and tone mapping in real-time feed Active CN113748426B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201962838518P 2019-04-25 2019-04-25
EP19171057 2019-04-25
EP19171057.3 2019-04-25
US62/838,518 2019-04-25
PCT/US2020/029023 WO2020219401A1 (en) 2019-04-25 2020-04-20 Content-aware pq range analyzer and tone mapping in live feeds

Publications (2)

Publication Number Publication Date
CN113748426A true CN113748426A (en) 2021-12-03
CN113748426B CN113748426B (en) 2023-06-09

Family

ID=70482932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080031250.5A Active CN113748426B (en) 2019-04-25 2020-04-20 Content aware PQ range analyzer and tone mapping in real-time feed

Country Status (6)

Country Link
US (1) US20220180635A1 (en)
EP (1) EP3959646B1 (en)
JP (1) JP7092952B2 (en)
CN (1) CN113748426B (en)
ES (1) ES2945657T3 (en)
WO (1) WO2020219401A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11617946B1 (en) 2021-06-29 2023-04-04 Amazon Technologies, Inc. Video game streaming with dynamic range conversion
US11612812B1 (en) * 2021-06-29 2023-03-28 Amazon Technologies, Inc. Video game streaming with dynamic range conversion
US11666823B1 (en) 2021-06-29 2023-06-06 Amazon Technologies, Inc. Video game streaming with dynamic range conversion
US20240104766A1 (en) * 2022-09-23 2024-03-28 Apple Inc. Method and Device for Generating Metadata Estimations based on Metadata Subdivisions

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4556319B2 (en) 2000-10-27 2010-10-06 ソニー株式会社 Image processing apparatus and method, and recording medium
US8346009B2 (en) 2009-06-29 2013-01-01 Thomson Licensing Automatic exposure estimation for HDR images based on image statistics
US9087382B2 (en) 2009-06-29 2015-07-21 Thomson Licensing Zone-based tone mapping
US8737738B2 (en) 2010-02-19 2014-05-27 Thomson Licensing Parameters interpolation for high dynamic range video tone mapping
CN103024300B (en) 2012-12-25 2015-11-25 华为技术有限公司 A kind of method for high dynamic range image display and device
JP6416135B2 (en) * 2013-03-15 2018-10-31 ベンタナ メディカル システムズ, インコーポレイテッド Spectral unmixing
US10812801B2 (en) * 2014-02-25 2020-10-20 Apple Inc. Adaptive transfer function for video encoding and decoding
CA2969038C (en) * 2014-12-03 2023-07-04 Ventana Medical Systems, Inc. Methods, systems, and apparatuses for quantitative analysis of heterogeneous biomarker distribution
US9681182B2 (en) * 2015-11-02 2017-06-13 Disney Enterprises, Inc. Real-time transmission of dynamic range tags in a video broadcast
US10593028B2 (en) * 2015-12-03 2020-03-17 Samsung Electronics Co., Ltd. Method and apparatus for view-dependent tone mapping of virtual reality images
US10242449B2 (en) * 2017-01-04 2019-03-26 Cisco Technology, Inc. Automated generation of pre-labeled training data
WO2018200840A1 (en) * 2017-04-27 2018-11-01 Retinopathy Answer Limited System and method for automated funduscopic image analysis
US10685236B2 (en) * 2018-07-05 2020-06-16 Adobe Inc. Multi-model techniques to generate video metadata

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130259375A1 (en) * 2008-02-15 2013-10-03 Heather Dunlop Systems and Methods for Semantically Classifying and Extracting Shots in Video
US20130148029A1 (en) * 2010-08-25 2013-06-13 Dolby Laboratories Licensing Corporation Extending Image Dynamic Range
CN107211076A (en) * 2015-01-19 2017-09-26 杜比实验室特许公司 The display management of high dynamic range video
CN108141599A (en) * 2015-09-23 2018-06-08 杜比实验室特许公司 Retain texture/noise consistency in Video Codec

Also Published As

Publication number Publication date
CN113748426B (en) 2023-06-09
EP3959646B1 (en) 2023-04-19
JP7092952B2 (en) 2022-06-28
ES2945657T3 (en) 2023-07-05
JP2022524651A (en) 2022-05-09
WO2020219401A1 (en) 2020-10-29
EP3959646A1 (en) 2022-03-02
US20220180635A1 (en) 2022-06-09

Similar Documents

Publication Publication Date Title
CN113748426B (en) Content aware PQ range analyzer and tone mapping in real-time feed
US11183143B2 (en) Transitioning between video priority and graphics priority
US10056042B2 (en) Metadata filtering for display mapping for high dynamic range images
KR101954851B1 (en) Metadata-based image processing method and apparatus
US9679366B2 (en) Guided color grading for extended dynamic range
CN106488141A (en) HDR is to the method for HDR inverse tone mapping (ITM), system and equipment
US10607324B2 (en) Image highlight detection and rendering
KR20120107429A (en) Zone-based tone mapping
KR20070090224A (en) Method of electronic color image saturation processing
KR101985880B1 (en) Display device and control method thereof
US20070242898A1 (en) Image processing apparatus, image processing method, and image processing program
Sazzad et al. Establishment of an efficient color model from existing models for better gamma encoding in image processing
CA2690987C (en) Method and apparatus for chroma key production
JP5084615B2 (en) Image display device
Qian et al. A local tone mapping operator for high dynamic range images
US8456577B2 (en) Method and apparatus for chroma key production
US20190349578A1 (en) Image component delineation in a video test and measurement instrument
CN117876280A (en) Video frame image enhancement method, system and storage medium
CN116309096A (en) Signal lamp image correction method and device, electronic equipment and storage medium
CN115690650A (en) Video processing method, video processing apparatus, electronic device, and storage medium
GB2575162A (en) Image component delineation in a video test and measurement instrument
Qian et al. A new technique to reproduced High-Dynamic-Range images for Low-Dynamic-Range display
EP2172030A1 (en) Method and apparatus for chroma key production

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant