WO2024012227A1 - Image display method, encoding method, and related apparatus applied to an electronic device - Google Patents

Image display method, encoding method, and related apparatus applied to an electronic device

Info

Publication number
WO2024012227A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
parameters
brightness
processed
ratio
Prior art date
Application number
PCT/CN2023/104105
Other languages
English (en)
French (fr)
Inventor
钟顺才
文锦松
翟其彦
周蔚
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024012227A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14: Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 3/00: Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes
    • G09G 3/20: Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes, for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix, no fixed position being assigned to or needed to be assigned to the individual characters or partial characters
    • G09G 3/34: Control arrangements or circuits for such presentation by control of light from an independent source
    • G09G 3/3406: Control of illumination source
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/70: Circuitry for compensating brightness variation in the scene
    • H04N 23/74: Circuitry for compensating brightness variation in the scene by influencing the scene brightness using illuminating means
    • H04N 23/80: Camera processing pipelines; Components thereof

Definitions

  • the present application relates to image processing technology, and in particular, to an image display method, encoding method and related devices applied to electronic equipment.
  • the acquisition end is usually responsible for image and video capture, recording the scene content
  • the encoding end is responsible for encoding and compressing the image
  • the display end is responsible for decoding and reconstructing the image, and adaptively adjusting the screen brightness according to the intensity of the ambient light (i.e. Automatic backlight technology)
  • the collection end and the encoding end can be the same electronic device, or they can be different electronic devices.
  • on the display side, electronic devices such as mobile phones and tablets generally have automatic backlight technology.
  • the main consideration is how comfortable the screen brightness is to the human eye: an optimal comfort zone is defined, with an upper comfort limit (above which the screen is too bright and dazzling) and a lower comfort limit (below which it is too dark to see clearly).
  • the power consumption of the screen is also taken into account: the greater the brightness, the greater the power consumption, so the screen brightness is usually adjusted toward the lower comfort limit.
  • the peak brightness of the screen of electronic devices can reach 1000nit or even higher.
  • under automatic backlight, the backlight brightness of mobile phones is typically set to 100-200 nit, so a large part of the brightness range is left unused and the brightness range of the screen is not fully utilized to achieve the best end-to-end effect experience.
  • This application provides an image display method, encoding method and related devices applied to electronic equipment to fully utilize the brightness range of the screen for image display and achieve the best end-to-end experience.
  • this application provides an image display method applied to electronic devices, including: acquiring an image to be processed; acquiring highlight enhancement data, where the highlight enhancement data includes a high dynamic range layer hdrLayer; acquiring the initial backlight brightness of the electronic device; obtaining the target backlight brightness of the electronic device according to the initial backlight brightness; performing brightness adjustment on the image to be processed according to the hdrLayer to obtain a target image suitable for the target backlight brightness; and displaying the target image at the target backlight brightness.
  • the target backlight brightness of the electronic device is obtained according to its initial backlight brightness, so the backlight of the electronic device is adjusted to fully utilize the brightness range of the screen for image display; at the same time, the areas of the image to be processed that would be distorted by the brightness adjustment are pixel-adjusted in combination with the hdrLayer to obtain a target image suitable for the target backlight brightness, thereby solving the image distortion problem, and the target image is then displayed at the target backlight brightness.
  • displaying the target image at the target backlight brightness achieves the best end-to-end effect experience.
  • the above-mentioned electronic device may be a display-side electronic device (that is, a video decoder), in which displaying the target image may be performed by a display component.
  • the display component may be a display module integrated on the electronic device, such as a touch screen.
  • the display component can also be a display independent of the electronic device, for example a display external to the electronic device, a smart screen, or a projection screen onto which the electronic device projects; this is not specifically limited.
  • the display terminal receives the code stream from the collection terminal and decodes the code stream to obtain the image to be processed.
  • the decoding method used by the display terminal corresponds to the encoding method used by the collection terminal.
  • the decoding method can include standard hybrid video decoding technology, end-to-end decoding network, decoding technology based on machine learning models, etc. The embodiments of this application do not specifically limit the decoding method of the image to be processed.
  • hdrLayer can be a two-dimensional single-channel 8-bit image, used to mark the highlight area in the image to be processed.
  • the resolution of hdrLayer can be equal to the resolution of the image to be processed.
  • the resolution of hdrLayer can also be smaller or larger than the resolution of the image to be processed, and this application does not specifically limit this.
  • hdrLayer can also be presented as a two-dimensional array, a three-dimensional array, or an array of other dimensions, or any other data form that can store multiple parameters. This application does not limit the specific form of hdrLayer.
  • hdrLayer mainly assists the display end in adjusting the brightness of the image to adapt to human eye perception. Therefore, the display end can obtain hdrLayer in the following three ways:
  • One way is to receive the code stream and decode the code stream to obtain hdrLayer.
  • the hdrLayer is generated by the collection end, and then the code stream obtained after encoding the hdrLayer is transmitted to the display end, and the display end only needs to decode the stream to recover the hdrLayer, which can improve the processing efficiency of the display end.
  • Another way is to receive the code stream and decode it to obtain N×M groups of parameters, where each group of parameters includes k parameters.
  • the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the hdrLayer.
  • N and M are both positive integers, N×M>1, k>1.
  • the collection end does not directly generate the hdrLayer, but only obtains the N×M groups of parameters used to generate the hdrLayer, then encodes the N×M groups of parameters and transmits the resulting code stream to the display end.
  • the display end first decodes the stream to recover the N×M groups of parameters and then generates the hdrLayer based on them. This can save bits in the code stream and improve transmission efficiency.
  • A third way is for the display end to generate the N×M groups of parameters itself, where each group of parameters includes k parameters, the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed, N and M are both positive integers, N×M>1, and k>1; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the hdrLayer.
  • In this case the collection end does not need to generate the hdrLayer, nor does it need to obtain the N×M groups of parameters used to generate the hdrLayer.
  • the display end generates the N×M groups of parameters based on the image to be processed and then uses them to generate the hdrLayer. This can further save bits in the code stream and improve transmission efficiency.
  • the image to be processed (original image) can be divided into N×M blocks, and k parameters are obtained for each image block, so that N×M groups of parameters, a total of N×M×k parameters, are obtained.
  • the k parameters of each image block can be expressed as a one-dimensional table.
  • the N×M groups of parameters are obtained through a machine learning model; or, the N×M groups of parameters are obtained based on the histogram of the image to be processed.
  • the original image can be scaled to a smaller resolution, for example 256×256.
  • the thumbnail enters a machine learning model (such as a neural network), and the N×M×k parameters are learned through the neural network.
  • the neural network can include a local branch and a global branch. Convolution, downsampling and channel-expansion operations are applied to the thumbnail and repeated, for example, 4 times (4× downsampling), so the resolution becomes 16×16. The local branch then keeps the resolution at 16×16 and performs further convolutions without downsampling, while the global branch continues downsampling until the resolution becomes 1×1.
  • the N×M groups of parameters are applied to the image to be processed to obtain the hdrLayer. This process is essentially an interpolation process.
  • the numerical range of the N×M×k parameters is 0 to 1, which can also be regarded as 0 to 255.
  • N and M are spatial divisions, dividing the image into N×M blocks, while k is a division of the value range, dividing it into k-1 segments with k fixed points. In practice the input values are continuous and do not fall exactly on the k values, so interpolation is needed in between.
  • interpolation in the spatial domain is two-dimensional and can be called bilinear interpolation, while interpolation in the value range is linear.
  • there are N×M blocks in the spatial domain.
  • each pixel is therefore interpolated from its four adjacent blocks.
  • in the value range, the brightness Y of the input original image is continuous while the k parameters are discrete fixed points, so interpolation is also required between them.
  • taking the value range 0 to 255 as an example, where the output is 255 the hdrLayer is very bright, and where the output is 0 the hdrLayer is very dark.
  • the k values are taken directly from 0 to 255 (see the sketch below).
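  • The following Python sketch illustrates this interpolation under stated assumptions: the grid layout (block centres), the normalization of the input luminance to [0, 1], the rounding to 8 bits and the function name apply_parameter_grid are illustrative choices, not details taken from the patent.

```python
import numpy as np

def apply_parameter_grid(y, grid):
    """Illustrative sketch (not the patent's exact procedure): apply an
    N x M x k parameter grid to a luminance image y (values in [0, 1]) to
    produce an 8-bit hdrLayer-like map.

    grid[i, j] holds k fixed-point outputs (in [0, 1]) for image block (i, j).
    Spatial positions are interpolated bilinearly between the four nearest
    block centres; the continuous input value is interpolated linearly
    between the two nearest of the k fixed points.
    """
    H, W = y.shape
    N, M, k = grid.shape
    out = np.zeros((H, W), dtype=np.float64)

    for r in range(H):
        # Continuous vertical block coordinate (block centres at 0 .. N-1).
        bi = np.clip((r + 0.5) / H * N - 0.5, 0, N - 1)
        i0 = int(np.floor(bi)); wi = bi - i0
        i1 = min(i0 + 1, N - 1)
        for c in range(W):
            bj = np.clip((c + 0.5) / W * M - 0.5, 0, M - 1)
            j0 = int(np.floor(bj)); wj = bj - j0
            j1 = min(j0 + 1, M - 1)

            # Value-range interpolation: y falls between two of the k points.
            v = np.clip(y[r, c], 0.0, 1.0) * (k - 1)
            v0 = int(np.floor(v)); wv = v - v0
            v1 = min(v0 + 1, k - 1)

            def value_interp(i, j):
                return (1 - wv) * grid[i, j, v0] + wv * grid[i, j, v1]

            top = (1 - wj) * value_interp(i0, j0) + wj * value_interp(i0, j1)
            bot = (1 - wj) * value_interp(i1, j0) + wj * value_interp(i1, j1)
            out[r, c] = (1 - wi) * top + wi * bot

    return np.round(out * 255).astype(np.uint8)
```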
  • hdrLayer can also be obtained in other ways, and there is no specific limitation on this.
  • the highlight enhancement data also includes metadata.
  • Metadata can include the dynamic range, maximum brightness, minimum brightness, etc. of the captured scene.
  • the display end can obtain metadata in the following two ways:
  • One method is to receive the code stream and decode the stream to get the metadata.
  • Another method is to receive the code stream, decode the stream to obtain the camera parameters of the captured scene, and calculate the metadata based on the camera parameters.
  • the initial backlight brightness and target backlight brightness of the electronic device are obtained in the following manner:
  • Electronic devices have automatic backlight technology, so the initial backlight brightness of the electronic device can be set according to the ambient light; reference can be made to the relevant backlight technology, which is not described again here.
  • the display can adjust the backlight of the electronic device by combining the brightness information related to the collection scene in the metadata (for example, the dynamic range, maximum brightness, minimum brightness of the collection scene, etc.), including increasing the backlight brightness. Or reduce the backlight brightness.
  • the backlight brightness can be increased to fully utilize the high dynamic range (HDR) of the screen of the electronic device; therefore, the target backlight brightness of the electronic device is higher than its initial backlight brightness.
  • the display end can use the following two methods to obtain the target backlight brightness of the electronic device:
  • One method is to process the initial backlight brightness according to a preset backlight adjustment ratio to obtain the target backlight brightness.
  • the display end can preset a ratio based on historical records, big data analysis, screen attributes of the electronic device, etc., for example a backlight increase ratio (used to increase the backlight brightness, target backlight brightness > initial backlight brightness) or a backlight decrease ratio (used to reduce the backlight brightness, target backlight brightness < initial backlight brightness).
  • the display end can process the initial backlight brightness according to the preset backlight adjustment ratio, for example, multiply the preset backlight adjustment ratio and the initial backlight brightness to obtain the target backlight brightness.
  • the method described above does not constitute a limitation.
  • the embodiment of the present application does not specifically limit the setting method of the preset backlight adjustment ratio or the acquisition method of the target backlight brightness.
  • Another method is to obtain the backlight adjustment ratio based on metadata; process the initial backlight brightness according to the backlight adjustment ratio to obtain the target backlight brightness.
  • the difference from the previous method is that the backlight adjustment ratio is not preset and can be calculated by the display.
  • the backlight adjustment ratio can also be a backlight increase ratio (used to increase backlight brightness, target backlight brightness > initial backlight brightness) or a backlight reduction ratio (used to reduce backlight brightness, target backlight brightness < initial backlight brightness).
  • the display end can obtain the first ratio according to the maximum brightness of the collection scene.
  • the first ratio is the ratio of the human eye's brightness perception to the white diffuse reflection perception of the collection scene; the second ratio can be obtained according to the first ratio.
  • the second ratio is the ratio between the brightness perception of the human eye and the white diffuse reflection perception at the display end.
  • the second ratio is less than or equal to the first ratio; the backlight adjustment ratio is obtained according to the second ratio.
  • P2 = a × P1
  • a represents the preset coefficient, a ≤ 1.
  • the human eye's brightness perception at the display end is the same as that at the collection end.
  • Lmax represents the maximum brightness of the captured scene
  • gainBL represents the backlight adjustment ratio
  • AmbientLum represents the ambient light intensity
  • the display end can process the initial backlight brightness according to the backlight adjustment ratio, for example, multiply the backlight adjustment ratio and the initial backlight brightness to obtain the target backlight brightness.
  • in this way, the target backlight brightness of the electronic device is calculated and the backlight brightness of the electronic device is adjusted to the target backlight brightness, so that the display effect of the image to be processed on the display is consistent with the human eye's brightness perception in the real collection scene.
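  • As a minimal illustration of this step, the sketch below simply multiplies the backlight adjustment ratio by the initial backlight brightness. The cap at the screen's peak brightness and the example numbers are assumptions, and the metadata-based derivation of gainBL (from the maximum scene brightness, the P1/P2 ratios and the ambient light intensity) is not reproduced here.

```python
def target_backlight(initial_nits: float, gain_bl: float, peak_nits: float = 1000.0) -> float:
    """Minimal sketch: target backlight = backlight adjustment ratio (gainBL)
    multiplied by the initial backlight brightness. gainBL may be a preset
    ratio or derived from the metadata as described above; capping at the
    screen's peak brightness is an added assumption, not from the patent text."""
    return min(gain_bl * initial_nits, peak_nits)

# Example (illustrative numbers): raise a 150 nit automatic backlight by a ratio of 4.
print(target_backlight(150.0, 4.0))  # -> 600.0
```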
  • HDR areas in the image to be processed may be distorted by the aforementioned backlight adjustment. For example, when the target backlight brightness is greater than the initial backlight brightness, the backlight brightness of the electronic device is increased and the HDR areas in the image to be processed may become more dazzling.
  • pixel processing of the image to be processed can therefore be performed: the pixel values of some areas are adjusted so that the brightness of these areas is the same as before the backlight adjustment, to avoid glare.
  • the display end obtains the target weight according to hdrLayer. For example, the display end can divide the first pixel value in hdrLayer by the preset threshold to obtain the first weight value of the first pixel value.
  • the first pixel value is any pixel value in the hdrLayer.
  • the target weight includes the first weight value; the brightness of the image to be processed is then adjusted according to the target weight to obtain the target image.
  • pow(1/gainBL,1/2.2) represents the pixel adjustment coefficient
  • pixelSrc represents any pixel value in the image to be processed
  • pixelLow represents the adjusted pixel value of any of the aforementioned pixel values
  • weight represents the target weight
  • pixelOut represents the target pixel value corresponding to any of the aforementioned pixel values.
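  • Putting these quantities together, a hedged sketch of the pixel adjustment might look as follows. The text defines weight = hdrLayer value / preset threshold and the pixel adjustment coefficient pow(1/gainBL, 1/2.2); the exact formula combining pixelSrc, pixelLow and weight into pixelOut is not reproduced in this excerpt, so the linear blend below, and which regions keep the original value versus the compensated value, is an illustrative assumption.

```python
import numpy as np

def adjust_pixels(img_src, hdr_layer, gain_bl, threshold=255.0):
    """Illustrative sketch only. weight = hdrLayer / threshold (per pixel),
    pixelLow = pixelSrc * pow(1/gainBL, 1/2.2) compensates the changed
    backlight, and pixelOut is assumed to be a linear blend of pixelSrc and
    pixelLow controlled by weight."""
    src = img_src.astype(np.float64)                       # pixelSrc, 8-bit values
    weight = np.clip(hdr_layer.astype(np.float64) / threshold, 0.0, 1.0)
    coeff = (1.0 / gain_bl) ** (1.0 / 2.2)                 # pixel adjustment coefficient
    pixel_low = src * coeff                                # backlight-compensated pixel values
    if src.ndim == 3:                                      # broadcast weight over colour channels
        weight = weight[..., None]
    pixel_out = weight * src + (1.0 - weight) * pixel_low  # assumed blend: highlights keep pixelSrc
    return np.clip(pixel_out + 0.5, 0, 255).astype(np.uint8)
```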
  • the embodiment of the present application can also use the hdrLayer as a guide picture or reference picture to obtain a correspondence between the pixel values in the image to be processed and the pixel values in the target image, and then process the pixel values in the image to be processed according to this correspondence to obtain the target image.
  • embodiments of the present application can also use other methods to adjust the pixel values of some areas to obtain the target image, which is not specifically limited.
  • All pixels in the image to be processed can be processed using the above method to obtain the target image.
  • if the resolution of the hdrLayer is smaller or larger than the resolution of the image to be processed, image super-resolution processing or downsampling processing can first be performed on the hdrLayer so that its resolution equals the resolution of the image to be processed, and the above formula is then used to obtain the target image.
  • if the backlight brightness of the electronic device is increased, pixel adjustment based on the hdrLayer can reduce the pixel brightness of the aforementioned areas to avoid glare; if the backlight brightness of the electronic device is reduced, some areas of the image to be processed may become too dark and lose detail, so pixel adjustment based on the hdrLayer can increase the pixel brightness of those areas to avoid loss of detail.
  • this application provides an encoding method, which includes: acquiring an image to be processed; acquiring metadata, where the metadata includes the maximum brightness of the captured scene; and encoding the image to be processed and the metadata to obtain a first code stream.
  • the collection end uses any collection device, such as a camera, to collect multiple frames under different exposure conditions for the same scene, for example a long exposure picture (L (long) frame), a normal exposure picture (N (normal) frame) and a short exposure picture (S (short) frame).
  • the L frame has a longer exposure time, so very dark areas in the scene can be photographed clearly but bright areas are overexposed; the N frame is a normal exposure frame, in which moderately bright areas look fine but very bright areas are overexposed and very dark areas are unclear; the S frame has a shorter exposure time, so very bright areas in the scene are not overexposed but medium-brightness and dark areas are dark and unclear.
  • the high-bit picture combines the L frame, N frame and S frame, retaining the advantages of the multiple frames and eliminating their disadvantages: for example, very bright areas in the scene are not overexposed, medium-brightness areas are properly exposed, and dark areas are also very clear.
  • the high-bit image is then processed through dynamic range compression (DRC) to obtain an 8-bit fused image.
  • DRC dynamic range compression
  • the above-mentioned 8-bit fused image is the image to be processed.
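  • As a rough illustration of this pipeline (not the patent's fusion or DRC algorithm), the sketch below merges three differently exposed 8-bit frames into a higher-dynamic-range linear estimate using generic well-exposedness weights and then compresses it back to an 8-bit fused image with a simple global curve; the relative exposure values are made-up examples.

```python
import numpy as np

def fuse_and_compress(l_frame, n_frame, s_frame, exposures=(4.0, 1.0, 0.25)):
    """Illustrative sketch only: merge long/normal/short exposure frames into a
    high-bit-depth linear image, then compress its dynamic range (DRC) to an
    8-bit fused image. The weighting and the Reinhard-style curve below are
    generic choices, not the patent's fusion or DRC algorithm."""
    frames = [f.astype(np.float64) / 255.0 for f in (l_frame, n_frame, s_frame)]
    acc, wsum = 0.0, 0.0
    for f, t in zip(frames, exposures):
        # Well-exposedness weight: trust pixels that are neither clipped nor dark.
        w = np.exp(-((f - 0.5) ** 2) / (2 * 0.2 ** 2))
        acc += w * (f / t)                    # normalise each frame by its relative exposure
        wsum += w
    hdr = acc / np.maximum(wsum, 1e-6)        # high-bit linear estimate of scene radiance
    ldr = hdr / (1.0 + hdr)                   # simple global DRC (Reinhard curve)
    return np.round(np.clip(ldr, 0, 1) * 255).astype(np.uint8)
```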
  • the collection end can obtain metadata information such as the dynamic range, maximum brightness, and minimum brightness of the collection scene.
  • the collection scene is the scene when the collection end collects the image to be processed (original image). For example, the collection scene is outdoors at noon, outdoors after dark, outdoors on a cloudy day, indoors with lights, etc.
  • the collection end can obtain metadata based on the above L frame, N frame and S frame.
  • the collection end can calculate the metadata based on preset photography parameters.
  • the collection end can encode the two to obtain the first code stream.
  • the encoding method used by the collection end to encode the image to be processed can include standard hybrid video coding technology, an end-to-end encoding network, encoding technology based on machine learning models, etc.
  • the embodiment of the present application does not specifically limit the encoding method of the image to be processed; the metadata can be encoded into a reserved field of the code stream, such as the APPn field of a JPG file, as sketched below.
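  • For instance, a minimal sketch of writing metadata into a JPEG APPn segment could look like this; the choice of APP11, the JSON payload and the field name maxLum are illustrative assumptions, since the text only states that a reserved field such as APPn of a JPG file may be used.

```python
import json
import struct

def insert_appn_metadata(jpeg_bytes: bytes, metadata: dict, n: int = 11) -> bytes:
    """Minimal sketch: embed scene metadata (e.g. maximum/minimum brightness,
    dynamic range) into a JPEG APPn segment right after the SOI marker."""
    assert jpeg_bytes[:2] == b"\xff\xd8", "not a JPEG stream"
    payload = json.dumps(metadata).encode("utf-8")
    # Segment length field counts itself (2 bytes) plus the payload, not the marker.
    segment = bytes([0xFF, 0xE0 + n]) + struct.pack(">H", len(payload) + 2) + payload
    return jpeg_bytes[:2] + segment + jpeg_bytes[2:]

# Example (illustrative field name): metadata with the maximum scene brightness in nit.
# stream = insert_appn_metadata(open("photo.jpg", "rb").read(), {"maxLum": 5000})
```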
  • the collection end can also use other methods to encode metadata, and there are no specific restrictions on this.
  • the embodiment of this application can obtain a high dynamic range layer (hdrLayer).
  • the hdrLayer can be a two-dimensional single-channel 8-bit image, used to mark the highlight area in the image to be processed.
  • the resolution of the hdrLayer can be equal to the resolution of the image to be processed.
  • the resolution of hdrLayer can also be smaller than or larger than the resolution of the image to be processed.
  • the display end can perform image super-resolution processing or downsampling processing on the hdrLayer so that it matches the image to be processed, which can reduce storage space.
  • hdrLayer can also be presented as a two-dimensional array, a three-dimensional array, or an array of other dimensions, or any other data form that can store multiple parameters. This application does not limit the specific form of hdrLayer.
  • the hdrLayer is a grayscale image that can mark the highlight areas of the original image: the larger its value, the greater the brightness of the original image, so the hdrLayer appears brighter in areas where the original image is brighter and darker in areas where the original image is darker.
  • hdrLayer mainly assists the display end to adjust the brightness of the image to adapt to human eye perception. Therefore, the display end needs to obtain hdrLayer.
  • the acquisition end can use the following two methods:
  • One method is to generate N×M groups of parameters, where each group of parameters includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed.
  • N and M are both positive integers, N×M>1, k>1; the N×M groups of parameters are encoded to obtain the second code stream.
  • the function of the above N×M groups of parameters is to generate the hdrLayer. Therefore, to save bits in the code stream, the acquisition end does not need to generate the hdrLayer directly; it instead transmits the second code stream obtained by encoding the N×M groups of parameters to the display end, and the display end decodes the stream to recover the N×M groups of parameters and then generates the hdrLayer based on them, which can improve transmission efficiency.
  • the acquisition end can also generate the N×M groups of parameters, then encode the metadata, the image to be processed and the N×M groups of parameters to obtain the first code stream and the second code stream, and then transmit the first code stream and the second code stream to the display end.
  • the first code stream and the second code stream can be concatenated and merged into one code stream, merged into one code stream in another preset way, or transmitted one by one as separate code streams; this is not specifically limited.
  • The other method is to generate N×M groups of parameters, where each group of parameters includes k parameters.
  • the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed.
  • N and M are both positive integers.
  • the acquisition end can then generate the hdrLayer based on the N×M groups of parameters and transmit the third code stream obtained after encoding the hdrLayer to the display end; the display end decodes the stream to recover the hdrLayer, which can improve the processing efficiency of the display end.
  • the acquisition end can also generate hdrLayer, and then encode the metadata, image to be processed and hdrLayer to obtain the first and third code streams, and then transmit the first and third code streams to the display end.
  • the first code stream and the third code stream can be concatenated and merged into one code stream, merged into one code stream in another preset way, or transmitted one by one as separate code streams; this is not specifically limited.
  • the collection end can also transmit the second code stream and the third code stream to the display end in addition to the first code stream.
  • the first code stream, the second code stream and the third code stream can be concatenated and merged into one code stream, merged into one code stream in a preset manner, or transmitted one by one as separate code streams, with no specific limitation on this.
  • the N×M groups of parameters can be obtained through a machine learning model; or, the N×M groups of parameters can also be obtained based on the histogram of the image to be processed.
  • the method of obtaining the N×M groups of parameters can refer to the relevant description in the first aspect and is not repeated here.
  • the present application provides an image display device applied to electronic equipment, including: an acquisition module configured to acquire an image to be processed, acquire highlight enhancement data including a high dynamic range layer hdrLayer, acquire the initial backlight brightness of the electronic device, and obtain the target backlight brightness of the electronic device according to the initial backlight brightness; an adjustment module configured to adjust the brightness of the image to be processed according to the hdrLayer to obtain a target image suitable for the target backlight brightness; and a display module configured to display the target image at the target backlight brightness.
  • the highlight enhancement data includes a high dynamic range layer hdrLayer
  • the acquisition module is specifically configured to receive a code stream and decode the code stream to obtain the hdrLayer.
  • the acquisition module is specifically configured to receive a code stream and decode it to obtain N×M groups of parameters.
  • each group of parameters includes k parameters, the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed, N and M are both positive integers, N×M>1, and k>1; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the hdrLayer.
  • the acquisition module is specifically configured to generate N×M groups of parameters, where each group of parameters includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed.
  • N and M are both positive integers, N×M>1, k>1; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the hdrLayer.
  • the acquisition module is specifically configured to process the initial backlight brightness according to a preset backlight adjustment ratio to obtain the target backlight brightness.
  • the highlight enhancement data also includes metadata; the acquisition module is specifically configured to obtain the backlight adjustment ratio according to the metadata, and to process the initial backlight brightness according to the backlight adjustment ratio to obtain the target backlight brightness.
  • the adjustment module is specifically configured to obtain a target weight according to the hdrLayer; and adjust the brightness of the image to be processed according to the target weight to obtain the target image.
  • the adjustment module is specifically configured to divide the first pixel value in the hdrLayer by a preset threshold to obtain the first weight value of the first pixel value.
  • the first pixel value is any pixel value in the hdrLayer, and the target weight includes the first weight value.
  • the adjustment module is specifically configured to obtain a pixel adjustment coefficient; obtain an adjusted image according to the pixel adjustment coefficient and the image to be processed; and obtain the target image according to the image to be processed, the adjusted image and the target weight.
  • the N×M groups of parameters are obtained through a machine learning model; or, the N×M groups of parameters are obtained based on the histogram of the image to be processed.
  • the metadata includes the maximum brightness of the collection scene; the acquisition module is specifically configured to obtain a first ratio according to the maximum brightness of the collection scene, where the first ratio is the ratio of the human eye's brightness perception to the white diffuse reflection perception of the collection scene; to obtain a second ratio according to the first ratio, where the second ratio is the ratio of the human eye's brightness perception to the white diffuse reflection perception at the display end and is less than or equal to the first ratio; and to obtain the backlight adjustment ratio according to the second ratio.
  • P1 represents the first ratio
  • Lmax represents the maximum brightness of the collection scene
  • P2 represents the second ratio
  • a represents the preset coefficient, a ≤ 1.
  • the acquisition module is specifically configured to calculate the backlight adjustment ratio according to the following formula:
  • gainBL represents the backlight adjustment ratio
  • AmbientLum represents the ambient light intensity
  • the metadata is obtained by: receiving the code stream and decoding it to obtain the metadata; or receiving the code stream, decoding it to obtain the photographing parameters of the collection scene, and then calculating the metadata based on the photographing parameters.
  • the metadata also includes the minimum brightness of the collection scene and/or the dynamic range of the collection scene.
  • this application provides an encoding device, including: an acquisition module, used to acquire an image to be processed; to acquire metadata, where the metadata includes the maximum brightness of the captured scene; and an encoding module, used to encode the image to be processed. and the metadata are encoded to obtain the first code stream.
  • the acquisition module is further configured to generate N×M groups of parameters, where each group of parameters includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed.
  • N and M are both positive integers, N×M>1, k>1; the encoding module is further configured to encode the N×M groups of parameters to obtain the second code stream.
  • the acquisition module is further configured to generate N×M groups of parameters, where each group of parameters includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed.
  • N and M are both positive integers, N×M>1, k>1; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the high dynamic range layer hdrLayer;
  • the encoding module is also used to encode the hdrLayer to obtain the third code stream.
  • the N×M groups of parameters are obtained through a machine learning model; or, the N×M groups of parameters are obtained based on the histogram of the image to be processed.
  • the acquisition module is specifically configured to obtain the metadata based on long exposure pictures, normal exposure pictures and short exposure pictures; or to calculate the metadata based on preset photography parameters.
  • the metadata also includes the minimum brightness of the collection scene and/or the dynamic range of the collection scene.
  • this application provides a decoder, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any one of the above first aspects.
  • the present application provides an encoder, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any one of the above second aspects.
  • the present application provides a computer-readable storage medium, including a computer program.
  • when the computer program is executed on a computer, it causes the computer to perform the method described in any one of the first to second aspects.
  • the present application provides a computer program product.
  • the computer program product contains instructions which, when run on a computer or processor, cause the computer or processor to implement the method described in any one of the above first to second aspects.
  • the application provides a code stream, which can be stored in a computer-readable storage medium or transmitted in the form of signals such as electromagnetic waves.
  • the code stream includes encoded image data and metadata.
  • the metadata includes the maximum brightness of the captured scene.
  • the collection scene is the scene when the image before encoding is collected.
  • the present application provides an image collection and display system, including a collection-side electronic device and a display-side electronic device.
  • the collection-side electronic device may include the encoder of the above-mentioned sixth aspect
  • the display-side electronic device may include the decoder of the above-mentioned fifth aspect.
  • the present application provides a chip system, characterized in that the chip system includes a logic circuit and an input-output interface, wherein the input-output interface is used to communicate with other communication devices outside the chip system, and the logic circuit is used to perform the method described in any one of the above first to second aspects.
  • the present application provides a computer-readable storage medium on which a code stream to be decoded, or a code stream obtained by encoding, is stored, where the code stream is obtained by the encoding method of the second aspect or any implementation of the second aspect.
  • Figure 1 is a schematic diagram of the image acquisition and display system
  • Figure 2A is a schematic block diagram of an exemplary decoding system 10
  • Figure 2B is an illustration of an example of video coding system 40
  • Figure 3 is a schematic diagram of a video decoding device 300 provided by an embodiment of the present invention.
  • Figure 4 is a simplified block diagram of an apparatus 400 provided by an exemplary embodiment
  • Figure 5 is a flow chart of a process 500 of the image encoding method according to the embodiment of the present application.
  • Figure 6 is a schematic diagram of multi-frame fusion of images
  • Figure 7 is a schematic diagram of the encoding process at the acquisition end
  • Figure 8a and Figure 8b are schematic diagrams of hdrLayer
  • Figure 9 is a schematic diagram of the encoding process at the collection end
  • Figure 10 is a schematic diagram of the encoding process at the collection end
  • Figure 11 is a schematic diagram of the generation of N ⁇ M group parameters
  • Figure 12 is a schematic diagram of value range interpolation
  • Figure 13 is a schematic diagram of spatial domain interpolation
  • Figure 14 is a flow chart of the process 1400 of the image processing method according to the embodiment of the present application.
  • Figure 15 is a schematic diagram of the human eye’s perception of brightness and white diffuse reflection
  • Figure 16 is a schematic diagram of the processing process at the display end
  • Figure 17 is an exemplary structural schematic diagram of the image processing device 1700 according to the embodiment of the present application.
  • Figure 18 is an exemplary structural schematic diagram of the encoding device 1800 according to the embodiment of the present application.
  • At least one (item) refers to one or more, and “plurality” refers to two or more.
  • "And/or" is used to describe the relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the related objects. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items.
  • At least one of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can each be single or multiple.
  • Neural network is a machine learning model.
  • the neural network can be composed of neural units.
  • the neural unit can refer to an arithmetic unit that takes xs and intercept 1 as input.
  • the output of the arithmetic unit can be: h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s·x_s + b)
  • where s = 1, 2, ..., n, and n is a natural number greater than 1
  • Ws is the weight of xs
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a nonlinear function such as ReLU.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Multi-layer perceptron (MLP)
  • MLP is a simple deep neural network (DNN) (different layers are fully connected), also called a multi-layer neural network, which can be understood as a neural network with many hidden layers.
  • DNN deep neural network
  • the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • although DNN looks very complicated, the work of each layer is actually not complicated.
  • the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as W_jk^L. It should be noted that the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
  • CNN Convolutional neural network
  • the convolutional neural network contains a feature extractor composed of convolutional layers and pooling layers.
  • the feature extractor can be regarded as a filter, and the convolution process can be regarded as using a trainable filter to convolve with an input image or convolution feature plane (feature map).
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • the convolution layer can include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract specific features from the image.
  • the size of the weight matrix should be related to the size of the image.
  • the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • the weight matrix extends across the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension, but in most cases, instead of a single weight matrix, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same type, are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolution image.
  • the dimension here can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image.
  • for example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and yet another weight matrix is used to blur unwanted noise in the image, etc.
  • the multiple weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size; the extracted feature maps of the same size are then merged to form the output of the convolution operation.
  • the weight values in these weight matrices require a lot of training in practical applications. Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
  • the initial convolutional layer often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network deepens,
  • the features extracted by subsequent convolutional layers become more and more complex, such as high-level semantic features.
  • Features with higher semantics are more suitable for the problem to be solved.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value in a specific range as the result of max pooling.
  • the operators in the pooling layer should also be related to the size of the image.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer.
  • Each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
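  • A toy example of this behaviour, assuming 2×2 non-overlapping windows (the window size, stride and the NumPy implementation are illustrative choices, not details from the text):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Toy example of pooling: each size x size sub-region of the input is
    reduced to one output pixel (its maximum or average), so the output
    image is smaller than the input image."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]                    # drop any ragged border
    blocks = x.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

img = np.arange(16).reshape(4, 4)
print(pool2d(img, 2, "max"))    # 4x4 input -> 2x2 output of block maxima
print(pool2d(img, 2, "avg"))    # 4x4 input -> 2x2 output of block averages
```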
  • After being processed by the convolutional layers/pooling layers, the convolutional neural network is not yet able to output the required information, because, as mentioned before, the convolutional layers/pooling layers only extract features and reduce the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network needs to use neural network layers to generate an output, or a set of outputs whose number equals the required number of classes. Therefore, the neural network layer can include multiple hidden layers, and the parameters contained in these hidden layers can be pre-trained on relevant training data for a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
  • the output layer of the entire convolutional neural network is also included.
  • This output layer has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error.
  • Recurrent neural networks are used to process sequence data.
  • the layers are fully connected, while the nodes within each layer are unconnected.
  • although this ordinary neural network has solved many difficult problems, it is still unsuitable for many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • the error backpropagation algorithm is also used, but there is one difference: that is, if the RNN is expanded into a network, then the parameters, such as W, are shared; this is not the case with the traditional neural network as shown in the example above.
  • the output of each step not only depends on the network of the current step, but also depends on the status of the network of several previous steps. This learning algorithm is called Back propagation Through Time (BPTT).
  • BPTT Back propagation Through Time
  • during training, the predicted value of the network is compared with the truly desired target value, and the weight vector of each layer of the neural network is updated according to the difference between them (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value".
  • loss function loss function
  • objective function object function
  • the convolutional neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller.
  • BP error back propagation
  • forward propagation of the input signal until the output will produce an error loss
  • the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the super-resolution model, such as the weight matrix.
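  • As a minimal numerical illustration of this loss-driven update (a single linear neuron with a squared-error loss; the learning rate and data are made up, and this is not the patent's training procedure):

```python
import numpy as np

# One neuron: prediction = w . x + b, loss = (prediction - target)^2.
x = np.array([1.0, 2.0])
target = 3.0
w, b, lr = np.array([0.5, -0.5]), 0.0, 0.1

for step in range(3):
    pred = w @ x + b                 # forward propagation
    loss = (pred - target) ** 2      # error loss
    grad = 2 * (pred - target)       # d(loss)/d(pred), propagated back to the parameters
    w -= lr * grad * x               # update weights along the negative gradient
    b -= lr * grad
    print(step, round(loss, 4))      # the loss shrinks toward the desired target value
```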
  • Generative adversarial networks is a deep learning model.
  • the model includes at least two modules: one module is a generative model (Generative Model), and the other module is a discriminative model (Discriminative Model). Through these two modules, they learn from each other to produce better output.
  • Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks.
  • the basic principle of GAN is as follows: Take the GAN that generates pictures as an example. Suppose there are two networks, G (Generator) and D (Discriminator), where G is a network that generates pictures.
  • D is a discriminant network used to judge whether a picture is "real". Its input parameter is x, where x represents a picture, and the output D(x) represents the probability that x is a real picture: if it is 1, the picture is 100% real; if it is 0, the picture cannot be real.
  • the goal of the generative network G is to generate pictures that are as real as possible to deceive the discriminant network D, while the goal of the discriminant network D is to distinguish the pictures generated by G from real pictures as well as possible.
  • G and D constitute a dynamic "game” process, that is, the "confrontation” in the "generative adversarial network".
  • Figure 1 is a schematic diagram of an image collection and display system.
  • the collection end, for example, electronic equipment such as cameras, video cameras, surveillance cameras, etc.
  • the image is encoded and compressed
  • the display end for example, electronic devices such as mobile phones, tablets, and smart screens
  • the display end is responsible for decoding and reconstructing the image, and adaptively adjusts the screen brightness according to the intensity of ambient light (i.e., automatic backlight technology).
  • electronic devices such as mobile phones, tablets, and smart screens basically have automatic backlight technology.
  • the main consideration is how comfortable the screen brightness is to the human eye: an optimal comfort zone is defined, with an upper comfort limit (above which the screen is too bright and dazzling) and a lower comfort limit (below which it is too dark to see clearly).
  • the power consumption of the screen is also taken into account: the greater the brightness, the greater the power consumption, so the screen brightness is usually adjusted toward the lower comfort limit.
  • the peak brightness of the screen of electronic devices can reach 1000nit or even higher.
  • with automatic backlight technology, only the lower part of the screen brightness range is used. For example, under ordinary indoor ambient light the backlight brightness of mobile phones is set to 100-200 nit, so a large part of the brightness range is left unused and the brightness range of the screen is not fully utilized for image display to achieve the best end-to-end experience.
  • the above content exemplarily describes an implementation of the image collection and display system.
  • part or all of the functions of the acquisition end and the display end can also be integrated on the same electronic device; alternatively, image acquisition and image encoding can be implemented by different electronic devices.
  • the method provided by the embodiments of the present application is used by electronic devices with image encoding and/or decoding functions; alternatively, the image decoding function and the display function can also be implemented by different electronic devices, for example via screen projection or an external display screen.
  • the embodiments of this application do not specifically limit the usage scenarios.
  • embodiments of the present application provide an image encoding and processing method to fully utilize the brightness range of the screen for image display to achieve the best end-to-end experience.
  • FIG. 2A is a schematic block diagram of an exemplary decoding system 10.
  • the video encoder 20 (or simply referred to as the encoder 20 ) and the video decoder 30 (or simply referred to as the decoder 30 ) in the decoding system 10 may be used to perform various example solutions described in the embodiments of this application.
  • the decoding system 10 includes a source device 12 for providing encoded image data 21 such as an encoded image to a destination device 14 for decoding the encoded image data 21 .
  • the source device 12 includes an encoder 20 and, additionally or optionally, an image source 16, a preprocessor (or preprocessing unit) 18 such as an image preprocessor, and a communication interface (or communication unit) 22.
  • Image source 16 may include or be any type of image capture device, for example for capturing real-world images, and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images, and/or any combination thereof, e.g., augmented reality (AR) images).
  • the image source may be any type of memory or storage that stores any of the above images.
  • the image (or image data) 17 may also be referred to as the original image (or original image data) 17.
  • the preprocessor 18 is used to receive (original) image data 17 and perform preprocessing on the image data 17 to obtain a preprocessed image (or preprocessed image data) 19 .
  • preprocessing performed by preprocessor 18 may include cropping, color format conversion (eg, from RGB to YCbCr), color grading, or denoising. It can be understood that the preprocessing unit 18 may be an optional component.
  • Video encoder (or encoder) 20 is used to receive pre-processed image data 19 and provide encoded image data 21 (further described below with reference to FIG. 3 and the like).
  • the communication interface 22 in the source device 12 may be used to receive the encoded image data 21 and send the encoded image data 21 (or any other processed version) over the communication channel 13 to another device, such as the destination device 14 or any other device, for storage or direct reconstruction.
  • the destination device 14 includes a decoder 30 and may additionally or optionally include a communication interface (or communication unit) 28, a post-processor (or post-processing unit) 32 and a display device 34.
  • the communication interface 28 in the destination device 14 is used to receive the encoded image data 21 (or any other processed version) directly from the source device 12 or from any other source device such as a storage device.
  • the storage device is an encoded image data storage device
  • the encoded image data 21 is provided to the decoder 30 .
  • Communication interface 22 and communication interface 28 may be used to send or receive the encoded image data via a direct communication link between source device 12 and destination device 14, such as a direct wired or wireless connection, or via any type of network, such as a wired network, a wireless network, or any combination thereof.
  • the communication interface 22 may be used to encapsulate the encoded image data 21 into a suitable format such as a message, and/or process the encoded image data using any type of transmission encoding or processing for transmission over a communication link or communication network.
  • the communication interface 28 corresponds to the communication interface 22 and can, for example, be used to receive transmission data and process the transmission data using any type of corresponding transmission decoding or processing and/or decapsulation to obtain the encoded image data 21 .
  • Both communication interface 22 and communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow pointing from the source device 12 to the communication channel 13 of the destination device 14 in Figure 2A, or as bi-directional communication interfaces, and can be used, for example, to send and receive messages to establish a connection and to acknowledge and exchange any other information related to the communication link and/or data transmission, such as the transmission of encoded image data.
  • the video decoder (or decoder) 30 is configured to receive encoded image data 21 and provide decoded image data (or decoded image data) 31 (further described below with reference to FIG. 4 and the like).
  • the post-processor 32 is used to perform post-processing on decoded image data 31 (also referred to as reconstructed image data) such as decoded images to obtain post-processed image data 33 such as post-processed images.
  • Post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), toning, cropping or resampling, or any other processing for preparing the decoded image data 31 for display by the display device 34 or the like.
  • the display device 34 is used to receive the post-processed image data 33 to display the image to a user or viewer or the like.
  • Display device 34 may be or include any type of display for representing the reconstructed image, such as an integrated or external display or monitor.
  • the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro-LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP) or any other type of display.
  • the decoding system 10 also includes a training engine 25, which is used to train the encoder 20 or the decoder 30, especially the neural network used in the encoder 20 or the decoder 30 (described in detail below).
  • the training data can be stored in a database (not shown), and the training engine 25 trains and obtains a neural network based on the training data. It should be noted that the embodiment of the present application does not limit the source of the training data. For example, the training data may be obtained from the cloud or other places for model training.
  • FIG. 2A shows source device 12 and destination device 14 as separate devices
  • device embodiments may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality.
  • In these embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, by separate hardware and/or software, or any combination thereof.
  • Encoder 20 eg, video encoder 20
  • decoder 30 eg, video decoder 30
  • processing circuitry such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, dedicated video coding processors, or any combination thereof.
  • the encoder 20 and the decoder 30 may each be implemented by a processing circuit 46.
  • the processing circuitry 46 may be used to perform various operations discussed below.
  • the device can store the instructions of the software in a suitable non-transitory computer-readable storage medium, and use one or more processors to execute the instructions in hardware, thereby performing the technology of the present application.
  • either of the encoder 20 and the decoder 30 may be integrated in a single device as part of a combined encoder/decoder (CODEC), as shown in Figure 2B.
  • Source device 12 and destination device 14 may include any of a wide variety of devices, including any type of handheld or stationary device, for example, a notebook or laptop computer, a smartphone, a tablet or slate computer, a camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or a content distribution server), etc., and may use no operating system or any type of operating system.
  • source device 12 and destination device 14 may be equipped with components for wireless communications. Accordingly, source device 12 and destination device 14 may be wireless communication devices.
  • the decoding system 10 shown in FIG. 2A is only exemplary, and the technology provided in this application may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, data is retrieved from local storage, sent over a network, and so on. The video encoding device may encode data and store the data in memory, and/or the video decoding device may retrieve data from memory and decode the data. In some examples, encoding and decoding are performed by devices that do not communicate with each other but merely encode data to memory and/or retrieve and decode data from memory.
  • video decoding devices eg, video encoding or video decoding
  • FIG. 2B is an illustration of an example of video coding system 40.
  • Video coding system 40 may include imaging device 41, video encoder 20, video decoder 30 (and/or a video codec implemented by processing circuitry 46), antenna 42, one or more processors 43, one or more memory stores 44 and/or display devices 45.
  • the imaging device 41, the antenna 42, the processing circuit 46, the video encoder 20, the video decoder 30, the processor 43, the memory storage 44 and/or the display device 45 can communicate with each other.
  • video coding system 40 may include only video encoder 20 or only video decoder 30 .
  • antenna 42 may be used to transmit or receive an encoded bitstream of video data.
  • display device 45 may be used to present video data.
  • the processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, etc.
  • Video decoding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like.
  • the memory 44 may be any type of memory, such as volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (e.g., flash memory, etc.).
  • memory store 44 may be implemented by cache memory.
  • processing circuitry 46 may include memory (eg, cache, etc.) for implementing image buffers, etc.
  • video encoder 20 implemented by logic circuitry may include an image buffer (eg, implemented by processing circuitry 46 or memory storage 44) and a graphics processing unit (eg, implemented by processing circuitry 46).
  • a graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include video encoder 20 implemented through processing circuitry 46 .
  • Logic circuitry can be used to perform the various operations discussed herein.
  • video decoder 30 may be implemented in a similar manner by processing circuitry 46 to implement the various modules discussed with respect to video decoder 30 of FIG. 4 and/or any other decoder system or subsystem described herein.
  • logic circuitry implemented video decoder 30 may include an image buffer (implemented by processing circuitry 46 or memory storage 44 ) and a graphics processing unit (eg, implemented by processing circuitry 46 ).
  • a graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include video decoder 30 implemented by processing circuitry 46 .
  • antenna 42 may be used to receive an encoded bitstream of video data.
  • the encoded bitstream may include data related to encoded video frames, indicators, index values, mode selection data, etc., as discussed herein, such as data related to coding partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as discussed), and/or data defining the coding partitions).
  • Video coding system 40 may also include video decoder 30 coupled to antenna 42 and for decoding the encoded bitstream.
  • Display device 45 is used to present video frames.
  • video decoder 30 may be used to perform the opposite process.
  • video decoder 30 may be configured to receive and parse such syntax elements and decode related video data accordingly.
  • video encoder 20 may entropy encode the syntax elements into an encoded video bitstream.
  • video decoder 30 may parse such syntax elements and decode the associated video data accordingly.
  • VVC Versatile video coding
  • VCEG ITU-T Video Coding Experts Group
  • HEVC High-Efficiency Video Coding
  • JCT-VC Joint Collaboration Team on Video Coding
  • FIG. 3 is a schematic diagram of a video decoding device 300 provided by an embodiment of the present invention.
  • Video coding device 300 is suitable for implementing the disclosed embodiments described herein.
  • the video decoding device 300 may be a decoder, such as the video decoder 30 in FIG. 2A, or an encoder, such as the video encoder 20 in FIG. 2A.
  • the video decoding device 300 includes: an ingress port 310 (or input port 310) and a receiver unit (Rx) 320 for receiving data; a processor, logic unit or central processing unit (CPU) 330 for processing data, where the processor 330 may for example be a neural network processor 330; a transmitter unit (Tx) 340 and an egress port 350 (or output port 350) for transmitting data; and a memory 360 for storing data.
  • the video decoding device 300 may further include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 310, the receiver unit 320, the transmitter unit 340 and the egress port 350, serving as an egress or ingress for optical or electrical signals.
  • OE optical-to-electrical
  • EO electrical-to-optical
  • Processor 330 is implemented in hardware and software.
  • Processor 330 may be implemented as one or more processor chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
  • Processor 330 communicates with ingress port 310, receiving unit 320, transmitting unit 340, egress port 350, and memory 360.
  • the processor 330 includes a decoding module 370 (eg, a neural network NN based decoding module 370).
  • Decoding module 370 implements the embodiments disclosed above. For example, decoding module 370 performs, processes, prepares, or provides various encoding operations. Therefore, the decoding module 370 provides a substantial improvement in the functionality of the video decoding device 300 and affects the switching of the video decoding device 300 to different states.
  • decoding module 370 may be implemented as instructions stored in memory 360 and executed by processor 330 .
  • Memory 360 includes one or more disks, tape drives, and solid-state drives that may serve as overflow data storage devices for storing programs as they are selected for execution, and for storing instructions and data that are read during execution of the programs.
  • Memory 360 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM) and/or static random-access memory (SRAM).
  • ROM read-only memory
  • RAM random access memory
  • TCAM ternary content-addressable memory
  • SRAM static random-access memory
  • Figure 4 is a simplified block diagram of an apparatus 400 provided by an exemplary embodiment.
  • the apparatus 400 can be used as either or both of the source device 12 and the destination device 14 in Figure 2A.
  • Processor 402 in device 400 may be a central processing unit.
  • processor 402 may be any other type of device or devices that exists or may be developed in the future that is capable of manipulating or processing information.
  • although the disclosed implementations may be implemented using a single processor, such as processor 402 as shown, speed and efficiency advantages can be achieved by using more than one processor.
  • memory 404 in apparatus 400 may be a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 404.
  • Memory 404 may include code and data 406 accessed by processor 402 via bus 412.
  • Memory 404 may also include an operating system 408 and application programs 410 including at least one program that allows processor 402 to perform the methods described herein.
  • application 410 may include Applications 1-N, and may also include a video coding application that performs the methods described herein.
  • Apparatus 400 may also include one or more output devices, such as display 418.
  • display 418 may be a touch-sensitive display that combines a display with a touch-sensitive element that can be used to sense touch input.
  • Display 418 may be coupled to processor 402 via bus 412 .
  • although the bus 412 in the apparatus 400 is described herein as a single bus, the bus 412 may include multiple buses. Additionally, auxiliary storage may be directly coupled to other components of the apparatus 400 or accessed through a network, and may include a single integrated unit, such as a memory card, or multiple units, such as multiple memory cards. Accordingly, the apparatus 400 may have a wide variety of configurations.
  • FIG. 5 is a flow chart of a process 500 of the image encoding method according to an embodiment of the present application.
  • the process 500 may be performed by the above collection-end electronic device (ie, the video encoder 20).
  • Process 500 is described as a series of steps or operations, and it should be understood that process 500 may be performed in various orders and/or occur simultaneously and is not limited to the order of execution shown in FIG. 5 .
  • Process 500 may include:
  • Step 501 Obtain the image to be processed.
  • Figure 6 is a schematic diagram of multi-frame image fusion.
  • the collection end uses any collection device, such as a camera, to collect multiple frames of pictures under different exposure conditions for the same scene. For example, long exposure pictures (L(long ) frame), normal exposure picture (N (normal) frame) and short exposure picture (S (short) frame).
  • L(long ) frame long exposure pictures
  • N (normal) frame normal exposure picture
  • S (short) frame short exposure picture
  • the L frame has a longer exposure time, so that very dark areas in the scene can be photographed clearly, but bright areas will be overexposed;
  • the N frame is a normal exposure frame; medium-brightness areas in the scene will look good, but very bright areas will be overexposed and very dark areas will be unclear;
  • the S frame has a shorter exposure time, so that very bright areas in the scene will not be overexposed, but medium-brightness and dark areas will be dark and unclear.
  • fusing the L frame, the N frame and the S frame into a high-bit picture combines the advantages of the multiple frames and eliminates their disadvantages: for example, very bright areas in the scene are not overexposed, medium-brightness areas look good, and very dark areas are also very clear.
  • DRC dynamic range compression
  • the above-mentioned 8-bit fused image is the image to be processed.
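  • As a hedged illustration of the capture-side fusion and DRC step just described, the Python sketch below merges the L/N/S frames with a simple well-exposedness weighting and compresses the result to 8 bits; the weighting function, the 12-bit intermediate and the gamma-style DRC curve are assumptions for illustration only, not the patent's exact algorithm.

```python
import numpy as np

def fuse_and_compress(l_frame, n_frame, s_frame, bits=12):
    """Illustrative sketch of the capture-side step above: merge L/N/S exposures into a
    high-bit image and compress its dynamic range (DRC) to 8 bits. The well-exposedness
    weighting, the 12-bit intermediate and the gamma-style DRC curve are assumptions,
    not the patent's exact algorithm."""
    frames = np.stack([f.astype(np.float32) / 255.0 for f in (l_frame, n_frame, s_frame)])
    weights = np.exp(-((frames - 0.5) ** 2) / (2 * 0.2 ** 2))  # favour well-exposed pixels
    weights /= weights.sum(axis=0, keepdims=True) + 1e-6
    hdr = (weights * frames).sum(axis=0)                       # fused high-dynamic-range image
    hdr_hi = np.round(hdr * (2 ** bits - 1))                   # high-bit intermediate (e.g. 12 bit)
    drc = np.power(hdr_hi / (2 ** bits - 1), 1.0 / 2.2)        # simple global DRC curve
    return np.round(drc * 255).astype(np.uint8)                # 8-bit image to be processed
```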
  • Step 502 Obtain metadata, which includes the maximum brightness of the captured scene.
  • the collection end can obtain metadata information such as the dynamic range, maximum brightness, and minimum brightness of the collection scene.
  • the collection scene is the scene when the collection end collects the image to be processed (original image). For example, the collection scene is outdoors at noon, outdoors after dark, outdoors on a cloudy day, indoors with lights, etc.
  • the collection end can obtain metadata based on the above L frame, N frame and S frame.
  • fusing the L frame, N frame and S frame can not only prevent the brightness of the highlight areas from being lost, but also make very dark areas visible. It can be seen that multi-frame fusion can obtain information related to the high dynamic range of the scene in order to generate the metadata.
  • the collection end can calculate the metadata based on preset photography parameters.
  • the acquisition end selects a benchmark, for example, the maximum brightness of the real acquisition scene is baseLum (the brightness corresponding to 255), marks the brightness corresponding to each pixel value in the image to be processed, and saves it as lumLUT[256], that is, the one-to-one correspondence between the brightness of the real scene and the pixel values of the image.
  • the range of the image pixel value is 0 to 255, a total of 256 values.
  • the criterion for selecting the benchmark is to make each pixel value correspond one-to-one to the real brightness value of the scene; minval marks the image grayscale of that pixel, the corresponding sensitivity (ISO) is baseISO, and the exposure time is baseExp.
  • the ISO of the N frame is curISO
  • the exposure time is curExp
  • the gain corresponding to the exposure value (EV) reduction of the S frame is Dgain. That is, the S frame is obtained by reducing the EV, and different EV reductions correspond to different Dgain values. If there is no S frame, Dgain is 1.
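  • The following Python sketch shows one way the metadata (the maximum scene brightness and a lumLUT[256] table) could be derived from the photographic parameters named above; the reciprocity-style exposure scaling and the gamma-2.2 transfer are assumptions, since the text only lists the parameters.

```python
import numpy as np

def build_metadata(base_lum, base_iso, base_exp, cur_iso, cur_exp, dgain=1.0, gamma=2.2):
    """Hypothetical sketch: derive scene-brightness metadata from photographic parameters.
    baseLum is the scene luminance mapped to pixel value 255; lumLUT[256] stores the scene
    luminance assumed for every 8-bit pixel value. The reciprocity-style exposure scaling
    and the gamma-2.2 transfer are assumptions; the text only names baseISO, baseExp,
    curISO, curExp and Dgain."""
    exposure_ratio = (base_iso * base_exp) / (cur_iso * cur_exp)
    max_lum = base_lum * exposure_ratio * dgain            # assumed luminance of pixel value 255

    codes = np.arange(256) / 255.0                         # normalized pixel values
    lum_lut = max_lum * np.power(codes, gamma)             # assumed code-to-luminance transfer
    return {"maxLum": float(max_lum),
            "minLum": float(lum_lut[1]),                   # smallest non-zero code as a proxy
            "lumLUT": lum_lut}

meta = build_metadata(base_lum=1000.0, base_iso=100, base_exp=1 / 100,
                      cur_iso=200, cur_exp=1 / 200, dgain=2.0)
```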
  • Step 503 Encode the image and metadata to be processed to obtain the first code stream.
  • the collection end can encode the two to obtain the first code stream.
  • the encoding method used by the collection end to encode the image to be processed can include standard hybrid video coding technology, an end-to-end encoding network, encoding technology based on machine learning models, and so on.
  • the embodiment of the present application does not specifically limit the encoding method of the image to be processed; the metadata can be encoded into a reserved field of the code stream, such as the APPn field of a JPG file.
  • the collection end can also use other methods to encode metadata, and there are no specific restrictions on this.
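  • As a hedged example of placing the metadata in a reserved field such as a JPG APPn segment, the sketch below inserts a JSON payload into an APP7 segment right after the SOI marker; the choice of APP7, the 'HDRMETA' tag and the JSON encoding are illustrative assumptions.

```python
import json
import struct

def embed_metadata_appn(jpeg_bytes: bytes, metadata: dict, app_n: int = 7) -> bytes:
    """Sketch: pack the scene metadata into a JPEG APPn segment right after the SOI marker.
    APP7, the 'HDRMETA' tag and the JSON payload are illustrative assumptions; the text
    only says the metadata may go into a reserved field such as a JPG APPn field."""
    assert jpeg_bytes[:2] == b"\xFF\xD8", "not a JPEG stream"
    payload = b"HDRMETA\x00" + json.dumps(metadata).encode("utf-8")
    marker = bytes([0xFF, 0xE0 + app_n])                   # APPn marker (0xFFE0 + n)
    length = struct.pack(">H", len(payload) + 2)           # segment length includes the 2 length bytes
    return jpeg_bytes[:2] + marker + length + payload + jpeg_bytes[2:]
```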
  • Figure 7 is a schematic diagram of the encoding process of the collection end.
  • the collection end can use the method of step 502 to obtain metadata when the image is processed by DRC, and then use the method of step 503 to encode the image to be processed and the metadata.
  • the first code stream can then be transmitted to the display end.
  • the embodiment of the present application can obtain a high dynamic range layer (hdrLayer).
  • the hdrLayer can be a two-dimensional single-channel 8-bit image used to mark the highlight area in the image to be processed.
  • the resolution of hdrLayer can be equal to the resolution of the image to be processed, and the resolution of hdrLayer can also be smaller or larger than the resolution of the image to be processed.
  • the display end can perform super-resolution processing or down-sampling processing on the hdrLayer to match the image to be processed, which can reduce storage space.
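  • A minimal sketch of matching the hdrLayer resolution to the image to be processed is shown below; plain bilinear resizing (OpenCV) is used for illustration, while the text also allows super-resolution processing.

```python
import cv2

def match_resolution(hdr_layer, image):
    """Sketch: up- or down-sample the hdrLayer so that it matches the to-be-processed image.
    Plain bilinear resizing is an illustrative choice; super-resolution is also allowed."""
    h, w = image.shape[:2]
    if hdr_layer.shape[:2] == (h, w):
        return hdr_layer
    return cv2.resize(hdr_layer, (w, h), interpolation=cv2.INTER_LINEAR)
```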
  • hdrLayer can also be presented as a two-dimensional array, a three-dimensional array, or an array of other dimensions, or any other data form that can store multiple parameters. This application does not limit the specific form of hdrLayer.
  • Figure 8a and Figure 8b are schematic diagrams of the hdrLayer. As shown in Figure 8a and Figure 8b, the hdrLayer is a grayscale image that marks the highlight areas of the original image: the larger the value, the greater the brightness of the original image, so the hdrLayer appears brighter in areas where the original image is brighter and darker in areas where the original image is darker.
  • hdrLayer mainly assists the display end to adjust the brightness of the image to adapt to human eye perception. Therefore, the display end needs to obtain hdrLayer.
  • the acquisition end can use the following two methods:
  • One method is to generate N×M groups of parameters, where each group includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; N and M are both positive integers, N×M>1, k>1. The N×M groups of parameters are encoded to obtain the second code stream.
  • the function of the above N×M groups of parameters is to generate the hdrLayer. Therefore, to save code stream, the acquisition end does not need to generate the hdrLayer directly; instead, it transmits the second code stream obtained after encoding the N×M groups of parameters to the display end, and the display end decodes the stream to recover the N×M groups of parameters and then generates the hdrLayer based on them, which can improve transmission efficiency.
  • Figure 9 is a schematic diagram of the encoding process of the acquisition end.
  • in addition to obtaining metadata, the acquisition end can also generate the N×M groups of parameters, and then encode the metadata, the image to be processed and the N×M groups of parameters to obtain the first code stream and the second code stream.
  • the first code stream and the second code stream can then be transmitted to the display end.
  • the first code stream and the second code stream can be serially connected and merged into one code stream, or they can be merged into one code stream in a preset manner, or they can be transmitted one by one as separate code streams. This is not done. Specific limitations.
  • each group of parameters includes k parameters.
  • the N ⁇ M group of parameters corresponds to the N ⁇ M image blocks included in the image to be processed.
  • N and M are both positive integers.
  • the acquisition end can also generate hdrLayer based on the N ⁇ M set of parameters, and then transmit the third code stream obtained after encoding the hdrLayer to the display end, and then the display end decodes the stream to recover the hdrLayer, which can improve the processing efficiency of the display end.
  • Figure 10 is a schematic diagram of the encoding process of the collection end.
  • in addition to obtaining metadata, the collection end can also generate the hdrLayer, and then encode the metadata, the image to be processed and the hdrLayer to obtain the first code stream and the third code stream, and then the first code stream and the third code stream can be transmitted to the display end.
  • the first code stream and the third code stream can be serially connected and merged into one code stream, or they can be merged into one code stream in a preset manner, or they can be transmitted one by one as separate code streams. This is not done. Specific limitations.
  • the collection end can also transmit the second code stream and the third code stream to the display end in addition to the first code stream.
  • the first code stream, the second code stream and the third code stream can be concatenated one after another into one code stream, merged into one code stream in a preset manner, or transmitted one by one as separate code streams, with no specific limitations on this.
  • the N×M groups of parameters can be obtained through a machine learning model (for the machine learning model, refer to the above description, which will not be repeated here); alternatively, the N×M groups of parameters can also be obtained based on the histogram of the image to be processed.
  • Figure 11 is a schematic diagram for generating N ⁇ M sets of parameters.
  • the acquisition end divides the image to be processed (original image) into N×M blocks, and each block outputs k parameters, so a total of N×M×k parameters can be obtained, which constitute the N×M groups of parameters.
  • the original image can be scaled to a smaller resolution, for example, 256 ⁇ 256.
  • the thumbnail enters a machine learning model (such as a network), and N ⁇ M ⁇ k parameters are learned through the network.
  • the network can include a local branch and a global branch. Convolution, downsampling and channel-expansion operations are applied to the thumbnail; repeating these operations, for example, 4 times (4× downsampling) brings the resolution to 16×16. The local branch then keeps the resolution at 16×16 and applies further convolutions without downsampling, while the global branch continues downsampling until the resolution becomes 1×1 (a hedged sketch of such a network follows below).
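  • The sketch below illustrates, under assumptions, a network of this shape in PyTorch: a shared backbone with four stride-2 downsamplings (256×256 → 16×16), a local branch that keeps the 16×16 resolution, a global branch that pools down to 1×1, and a head that outputs k parameters for each of the N×M = 16×16 blocks; the exact layer counts and channel widths are not specified in the text and are assumptions.

```python
import torch
import torch.nn as nn

class ParamGridNet(nn.Module):
    """Illustrative sketch (not the patent's exact network): predict N x M x k tone
    parameters from a 256x256 thumbnail using a local branch and a global branch.
    Layer counts and channel widths are assumptions."""
    def __init__(self, k=8, ch=16):
        super().__init__()
        layers, c_in = [], 3
        for _ in range(4):                                     # 4x stride-2 downsampling: 256 -> 16
            layers += [nn.Conv2d(c_in, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
            c_in, ch = ch, ch * 2
        self.backbone = nn.Sequential(*layers)                 # output: (B, c_in, 16, 16)
        self.local = nn.Sequential(                            # keeps the 16x16 resolution
            nn.Conv2d(c_in, c_in, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_in, 3, padding=1))
        self.glob = nn.Sequential(                             # keeps downsampling until 1x1
            nn.Conv2d(c_in, c_in, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_in, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Conv2d(c_in, k, 1)                      # k parameters per spatial block

    def forward(self, thumb):                                  # thumb: (B, 3, 256, 256)
        feat = self.backbone(thumb)
        fused = self.local(feat) + self.glob(feat)             # global vector broadcast over 16x16
        return torch.sigmoid(self.head(fused))                 # (B, k, N=16, M=16), values in [0, 1]
```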
  • the N×M×k parameter values range from 0 to 1, which can equivalently be considered as 0 to 255.
  • N and M are space divisions, which divide the image into N ⁇ M blocks, and k is the division of the value range, which divides the value range into k-1 segments and k fixed points.
  • the input values are continuous and will not be exactly k values, so interpolation is needed in the middle.
  • the spatial domain is two-dimensional interpolation, which can be called bilinear interpolation, and the value domain is linear interpolation.
  • in the spatial domain there are N×M blocks.
  • four adjacent blocks need to be interpolated.
  • for the k points in the value domain: the brightness Y of the input original image is continuous and falls between the k fixed points, so interpolation is also required in the middle.
  • the k value range taking the range from 0 to 255 as an example, when the output is 255, the hdrLayer will be very bright. When the output is 0, the hdrLayer will be very dark.
  • the k values are direct values from 0 to 255.
  • Figure 12 is a schematic diagram of value range interpolation.
  • Vi is obtained by interpolating between the abscissa BinN and BinN+1, and then the vertical axis coordinate Gam[Vi] corresponding to Vi is found from the curve.
  • Figure 13 is a schematic diagram of spatial-domain interpolation. As shown in Figure 13, the four adjacent blocks are taken, and the closer the point P is to a block, the greater that block's weight.
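  • Putting the two interpolations together, the following hedged sketch applies an N×M×k parameter grid to the luma channel of the image to be processed to produce the hdrLayer, using bilinear interpolation between the four neighbouring blocks in the spatial domain and linear interpolation between the k fixed points in the value domain; the exact block-coordinate convention and output scaling are assumptions.

```python
import numpy as np

def apply_param_grid(y, grid):
    """Hedged sketch: apply an N x M x k parameter grid to the 8-bit luma channel y to
    build the hdrLayer, with bilinear interpolation between the four neighbouring blocks
    in the spatial domain and linear interpolation between the k fixed points in the
    value domain (cf. Figures 11-13). Conventions below are assumptions."""
    h, w = y.shape
    n, m, k = grid.shape
    pos = y.astype(np.float32) / 255.0 * (k - 1)            # position among the k fixed points
    lo = np.clip(np.floor(pos).astype(int), 0, k - 2)
    frac_v = pos - lo                                        # value-domain interpolation weight

    by = (np.arange(h) + 0.5) / h * n - 0.5                  # fractional block row per pixel row
    bx = (np.arange(w) + 0.5) / w * m - 0.5                  # fractional block column per pixel column
    by, bx = np.meshgrid(by, bx, indexing="ij")
    y0 = np.clip(np.floor(by).astype(int), 0, n - 2); fy = np.clip(by - y0, 0.0, 1.0)
    x0 = np.clip(np.floor(bx).astype(int), 0, m - 2); fx = np.clip(bx - x0, 0.0, 1.0)

    def value_lut(iy, ix):
        curve = grid[iy, ix]                                 # per-pixel k-point curve, shape (h, w, k)
        g0 = np.take_along_axis(curve, lo[..., None], axis=-1)[..., 0]
        g1 = np.take_along_axis(curve, (lo + 1)[..., None], axis=-1)[..., 0]
        return g0 * (1.0 - frac_v) + g1 * frac_v             # linear interpolation in the value domain

    out = (value_lut(y0, x0) * (1 - fy) * (1 - fx) + value_lut(y0, x0 + 1) * (1 - fy) * fx +
           value_lut(y0 + 1, x0) * fy * (1 - fx) + value_lut(y0 + 1, x0 + 1) * fy * fx)
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)    # hdrLayer as an 8-bit single-channel map
```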
  • FIG. 14 is a flowchart of a process 1400 of an image display method applied to an electronic device according to an embodiment of the present application.
  • Process 1400 may be executed by the above display-end electronic device (i.e., video decoder 30), wherein displaying the target image may be performed by a display component, which may be a display module integrated on the electronic device, such as a touch screen; the display component can also be a display independent of the electronic device, for example, an external display, or a smart screen or projection screen to which the electronic device casts its screen. There is no specific limitation on this.
  • Process 1400 is described as a series of steps or operations, and it should be understood that process 1400 may be performed in various orders and/or occur simultaneously and is not limited to the order of execution shown in FIG. 14 .
  • Process 1400 may include:
  • Step 1401 Obtain the image to be processed.
  • the display end receives the code stream from the acquisition end and decodes the code stream to obtain the image to be processed. It can use the hybrid decoding method above, the end-to-end decoding method, or the decoding method based on the machine learning model, etc.
  • the embodiment of the present application does not specifically limit the decoding method of the image to be processed.
  • Step 1402 Obtain highlight enhancement data, which includes hdrLayer.
  • hdrLayer can be a two-dimensional single-channel 8-bit image, used to mark the highlight area in the image to be processed.
  • the resolution of the hdrLayer can be equal to the resolution of the image to be processed, or it can be smaller than the resolution of the image to be processed; there is no specific limit on this.
  • hdrLayer mainly assists the display end in adjusting the brightness of the image to adapt to human eye perception. Therefore, the display end can obtain hdrLayer in the following three ways:
  • One way is to receive the code stream and decode the code stream to obtain hdrLayer.
  • the collection end generates hdrLayer, encodes it into a code stream and transmits it to the display end.
  • the display end can receive the code stream and decode the stream to directly obtain hdrLayer. This can improve the processing efficiency of the display side.
  • Another way is to receive the code stream and decode the code stream to obtain N ⁇ M groups of parameters.
  • Each group of parameters includes k parameters.
  • the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the hdrLayer.
  • the collection end generates N ⁇ M sets of parameters, encodes them into code streams and transmits them to the display end.
  • the display end can receive the code stream, decode the stream to first obtain the N ⁇ M set of parameters, and then obtain the hdrLayer based on the N ⁇ M set of parameters. This can save code streams and improve transmission efficiency.
  • Yet another way is to generate N×M groups of parameters, where each group includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; the corresponding image blocks are processed to obtain the hdrLayer.
  • the collection end does not need to process hdrLayer, that is, it does not need to generate hdrLayer, nor does it need to generate an N ⁇ M set of parameters.
  • the display end completely generates the N×M groups of parameters based on the image to be processed, and then obtains the hdrLayer. This can save code stream and improve transmission efficiency.
  • the display end can obtain N ⁇ M image blocks included in the image to be processed and obtain k parameters for each image block, thereby obtaining N ⁇ M sets of parameters.
  • the k parameters of each image block can be expressed as a one-dimensional table. Apply the N ⁇ M set of parameters to the image to be processed to obtain the final hdrLayer. This process can be referred to the description above and will not be repeated here.
  • the N ⁇ M group of parameters are obtained through a machine learning model; or, the N ⁇ M group of parameters are obtained based on the histogram of the image to be processed. You can refer to the above description and will not be repeated here.
  • hdrLayer can also be obtained in other ways, and there is no specific limitation on this.
  • the highlight enhancement data also includes metadata.
  • Metadata can include the dynamic range, maximum brightness, minimum brightness, etc. of the captured scene.
  • the display end can obtain metadata in the following two ways:
  • One method is to receive the code stream and decode the stream to get the metadata.
  • the collection end can encode the metadata and the image to be processed to obtain a code stream, and then transmit the code stream to the display end.
  • the display end can receive and decode the code stream to directly obtain the metadata.
  • Another method is to receive the code stream, decode the stream to obtain the camera parameters of the captured scene, and calculate the metadata based on the camera parameters.
  • the collection terminal can encode the photographing parameters of the collection scene required to obtain metadata into the code stream and transmit it to the display terminal.
  • the display terminal receives and decodes the code stream to obtain the photographing parameters of the collection scene, and then obtains the metadata based on these photographing parameters.
  • the photographing parameters of the collection scene can include: the maximum brightness baseLum of the actual collection scene (the brightness corresponding to 255), marking the brightness corresponding to each pixel value in the image to be processed and storing it as lumLUT[256], that is, a one-to-one correspondence between the brightness of the real scene and the pixel values of the image.
  • the image pixel value ranges from 0 to 255, with a total of 256 values.
  • the selection criterion is to make each pixel value correspond to the real brightness value of the scene one-to-one, and mark the pixel with minval.
  • Step 1403 Obtain the initial backlight brightness of the electronic device.
  • Electronic devices have backlight technology, so the initial backlight brightness of the electronic device can be set according to the surrounding environment. You can refer to the relevant backlight technology, which will not be described again.
  • Step 1404 Obtain the target backlight brightness of the electronic device according to the initial backlight brightness.
  • the display can adjust the backlight of the electronic device by combining the brightness information related to the collection scene in the metadata (for example, the dynamic range, maximum brightness, minimum brightness of the collection scene, etc.), including increasing the backlight brightness. Or reduce the backlight brightness.
  • the backlight brightness can be increased to fully utilize the high dynamic range (HDR) of the screen of the electronic device. Therefore, the target backlight brightness of the electronic device is higher than the initial backlight brightness of the electronic device.
  • the display end can use the following two methods to obtain the target backlight brightness of the electronic device:
  • One method is to process the initial backlight brightness according to a preset backlight adjustment ratio to obtain the target backlight brightness.
  • the display end can preset a ratio based on historical records, big data analysis, screen attributes of the electronic device, etc., for example, a backlight increase ratio (used to increase the backlight brightness, target backlight brightness > initial backlight brightness) or a backlight decrease ratio (used to reduce the backlight brightness, target backlight brightness < initial backlight brightness).
  • the display end can process the initial backlight brightness according to the preset backlight adjustment ratio, for example, multiply the preset backlight adjustment ratio and the initial backlight brightness to obtain the target backlight brightness.
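  • A minimal sketch of this first method, assuming an illustrative boost ratio of 2.0 and a 1000-nit panel peak (neither value is prescribed by the text):

```python
def target_backlight(initial_backlight_nit, boost_ratio=2.0, peak_nit=1000.0):
    """Sketch of the first method: scale the initial (ambient-light driven) backlight by a
    preset adjustment ratio and clamp to the panel's peak brightness. The ratio of 2.0 and
    the 1000-nit peak are illustrative values only."""
    return min(initial_backlight_nit * boost_ratio, peak_nit)

# e.g. a 150-nit indoor backlight boosted to 300 nit
print(target_backlight(150.0))
```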
  • the method described above does not constitute a limitation.
  • the embodiment of the present application does not specifically limit the setting method of the preset backlight adjustment ratio or the acquisition method of the target backlight brightness.
  • Another method is to obtain the backlight adjustment ratio based on metadata; process the initial backlight brightness according to the backlight adjustment ratio to obtain the target backlight brightness.
  • the difference from the previous method is that the backlight adjustment ratio is not preset and can be calculated by the display.
  • the backlight adjustment ratio can also be a backlight increase ratio (used to increase backlight brightness, target backlight brightness > initial backlight brightness) or a backlight reduction ratio (used to reduce backlight brightness, target backlight brightness ⁇ initial backlight brightness).
  • the display end can obtain the first ratio according to the maximum brightness of the collection scene.
  • the first ratio is the ratio of the brightness perception of the human eye and the white diffuse reflection perception of the collection scene; the second ratio can be obtained according to the first ratio.
  • the second ratio is the ratio between the brightness perception of the human eye and the white diffuse reflection perception at the display end.
  • the second ratio is less than or equal to the first ratio; the backlight adjustment ratio is obtained according to the second ratio.
  • P2 = a × P1, where a represents the preset coefficient and a ≤ 1.
  • the human eye's brightness perception at the display end is the same as that at the collection end.
  • Lmax represents the maximum brightness of the captured scene
  • gainBL represents the backlight adjustment ratio
  • AmbientLum represents the ambient light intensity
  • the display end can process the initial backlight brightness according to the backlight adjustment ratio, for example, multiply the backlight adjustment ratio and the initial backlight brightness to obtain the target backlight brightness.
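  • The exact formulas for the first ratio, the second ratio and gainBL appear in figures that are not reproduced in this text, so the sketch below is structural only: it wires together the quantities named above (Lmax, P1, P2 = a·P1 with a ≤ 1, AmbientLum, gainBL) using placeholder relations that are explicitly assumptions.

```python
import math

def backlight_gain_from_metadata(lmax_nit, ambient_lum, a=0.8, initial_backlight_nit=150.0):
    """Structural sketch only. The text names Lmax, the perception ratios P1 and P2
    (P2 = a * P1, a <= 1), the ambient light AmbientLum and the gain gainBL, but the exact
    formulas are not reproduced here. The log-based perception ratio and the gain mapping
    below are pure assumptions used to show the data flow."""
    p1 = math.log10(lmax_nit + 1.0)                # assumed eye-vs-diffuse-white ratio of the scene
    p2 = a * p1                                    # second ratio, bounded by the first (a <= 1)
    # assumed mapping: stronger ambient light tolerates a larger backlight gain
    gain_bl = max(1.0, p2 * math.log10(ambient_lum + 10.0) / 2.0)
    return gain_bl, initial_backlight_nit * gain_bl
```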
  • Step 1405 Adjust the brightness of the image to be processed according to hdrLayer to obtain a target image suitable for the target backlight brightness.
  • the target backlight brightness of the electronic device is calculated, and the backlight brightness of the electronic device is adjusted to the target backlight brightness, so that the display effect of the image to be processed on the display end conforms to the brightness perception of the human eye in the real collection scene.
  • if the backlight brightness of the electronic device is increased, the HDR (highlight) area in the image to be processed may be distorted, that is, it will appear more dazzling.
  • pixel processing of the image to be processed can be performed, and the pixel values of some areas can be adjusted so that the brightness of this part of the area is the same as before the backlight adjustment to avoid glare.
  • the display end obtains the target weight according to hdrLayer. For example, the display end can divide the first pixel value in hdrLayer by the preset threshold to obtain the first weight value of the first pixel value.
  • the first pixel value is any pixel value in hdrLayer.
  • the target weight includes the first weight value; the brightness of the image to be processed is then adjusted according to the target weight to obtain the target image.
  • pow(1/gainBL,1/2.2) represents the pixel adjustment coefficient
  • pixelSrc represents any pixel value in the image to be processed
  • pixelLow represents the adjusted pixel value of any of the aforementioned pixel values
  • weight represents the target weight
  • pixelOut represents the target pixel value corresponding to any of the aforementioned pixel values.
  • All pixels in the image to be processed can be processed using the above method to obtain the target image.
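  • The following hedged sketch combines the quantities named above (pixelSrc, the adjustment coefficient pow(1/gainBL, 1/2.2), the hdrLayer-derived weight, pixelLow and pixelOut) into a per-pixel blend; how pixelLow is formed from pixelSrc and how the blend is weighted are assumptions, since the text lists the variables without reproducing the formula.

```python
import numpy as np

def adjust_pixels(image_u8, hdr_layer_u8, gain_bl, threshold=255.0):
    """Hedged sketch of step 1405: weight = hdrLayer / threshold, the coefficient
    pow(1/gainBL, 1/2.2) applied to the source pixels, and a weighted blend producing
    pixelOut. The form of pixelLow and of the blend are assumptions."""
    pixel_src = image_u8.astype(np.float32)
    pixel_low = pixel_src * (1.0 / gain_bl) ** (1.0 / 2.2)   # counteracts the raised backlight
    weight = np.clip(hdr_layer_u8.astype(np.float32) / threshold, 0.0, 1.0)
    if pixel_src.ndim == 3:                                   # broadcast single-channel weight over RGB
        weight = weight[..., None]
    # highlight areas (large hdrLayer values) keep their perceived brightness,
    # dark areas are left unchanged so they benefit from the raised backlight
    pixel_out = weight * pixel_low + (1.0 - weight) * pixel_src
    return np.clip(pixel_out, 0, 255).astype(np.uint8)
```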
  • the embodiment of the present application can also use the hdrLayer as a guide picture or reference picture to obtain the correspondence between the pixel values in the image to be processed and the pixel values in the target image, and then process the pixel values in the image to be processed according to that correspondence to obtain the target image.
  • embodiments of the present application can also use other methods to adjust the pixel values of some areas to obtain the target image, which is not specifically limited.
  • pixel adjustment based on the hdrLayer can reduce the pixel brightness of the aforementioned areas to avoid glare; if the backlight brightness of the electronic device is reduced, some areas of the image to be processed may become too dark, resulting in loss of detail, so pixel adjustment based on the hdrLayer can instead increase the pixel brightness of those areas to avoid loss of detail.
  • Step 1406 Display the target image under the target backlight brightness.
  • once the target backlight brightness of the electronic device is obtained, the screen brightness of the electronic device can be adjusted based on its backlight technology so that it reaches the target backlight brightness, and the target image obtained in step 1405 is then displayed at this brightness.
  • the target image can not only solve the problem of glare caused by excessive brightness in some areas of the image to be processed when the backlight brightness is increased, but also solve the problem of loss of detail caused by some areas of the image to be processed becoming too dark when the backlight brightness is reduced.
  • Figure 16 is a schematic diagram of the processing process of the display end.
  • the display end uses the method described in the above embodiments to obtain the image to be processed and the metadata, obtains the hdrLayer through one of the three methods in step 1402, obtains the backlight adjustment ratio from the metadata and adjusts the backlight brightness of the electronic device accordingly, and adjusts the pixels in the image to be processed using the hdrLayer; the two are combined to obtain the final target image, which is then sent for display.
  • the target backlight brightness of the electronic device is obtained according to the initial backlight brightness of the electronic device, so that the backlight brightness of the electronic device is adjusted to fully utilize the brightness range of the screen for image display; at the same time, for areas of the image to be processed that would be distorted by the brightness adjustment, pixel adjustment is performed in combination with the hdrLayer to obtain a target image suitable for the target backlight brightness, thereby solving the image distortion problem, and the target image is then displayed under the target backlight brightness.
  • the target backlight brightness and the target image work together in the display, achieving the best end-to-end effect experience.
  • FIG. 17 is an exemplary structural diagram of an image display device 1700 applied to an electronic device according to an embodiment of the present application. As shown in FIG. 17 , the image display device 1700 applied to an electronic device according to this embodiment can be applied to the decoding end 30 .
  • the image display device 1700 applied to electronic equipment may include: an acquisition module 1701, an adjustment module 1702 and a display module 1703, wherein:
  • the acquisition module 1701 is used to acquire the image to be processed; acquire highlight enhancement data, which includes a high dynamic range layer hdrLayer; acquire the initial backlight brightness of the electronic device; and acquire the target backlight brightness of the electronic device according to the initial backlight brightness. The adjustment module 1702 is used to adjust the brightness of the image to be processed according to the hdrLayer to obtain a target image suitable for the target backlight brightness. The display module 1703 is used to display the target image under the target backlight brightness.
  • the acquisition module 1701 is specifically configured to receive a code stream and decode the code stream to obtain the hdrLayer.
  • the acquisition module 1701 is specifically configured to receive a code stream and decode the code stream to obtain N ⁇ M groups of parameters.
  • Each group of parameters includes k parameters.
  • the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed, N and M are both positive integers, N×M>1, k>1; the corresponding image blocks are respectively processed according to the N×M groups of parameters to obtain the hdrLayer.
  • the acquisition module 1701 is specifically configured to generate N×M groups of parameters, where each group includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; N and M are both positive integers, N×M>1, k>1; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the hdrLayer.
  • the acquisition module 1701 is specifically configured to process the initial backlight brightness according to a preset backlight adjustment ratio to obtain the target backlight brightness.
  • the highlight enhancement data also includes metadata; the acquisition module 1701 is specifically configured to obtain the backlight adjustment ratio according to the metadata, and to process the initial backlight brightness according to the backlight adjustment ratio to obtain the target backlight brightness.
  • the adjustment module 1702 is specifically configured to obtain a target weight according to the hdrLayer; and adjust the brightness of the image to be processed according to the target weight to obtain the target image.
  • the adjustment module 1702 is specifically configured to divide the first pixel value in the hdrLayer by a preset threshold to obtain the first weight value of the first pixel value, where the first pixel value is any pixel value in the hdrLayer and the target weight includes the first weight value.
  • the adjustment module 1702 is specifically configured to obtain a pixel adjustment coefficient; obtain an adjusted image according to the pixel adjustment coefficient and the image to be processed; and obtain the target image according to the image to be processed, the adjusted image and the target weight.
  • the N ⁇ M group of parameters is obtained through a machine learning model; or, the N ⁇ M group of parameters is obtained based on the histogram of the image to be processed.
  • the metadata includes the maximum brightness of the collection scene; the acquisition module 1701 is specifically configured to obtain a first ratio according to the maximum brightness of the collection scene, where the first ratio is the ratio of the human eye's brightness perception to the white diffuse reflection perception of the collection scene; obtain a second ratio according to the first ratio, where the second ratio is the ratio of the human eye's brightness perception to the white diffuse reflection perception at the display end and the second ratio is less than or equal to the first ratio; and obtain the backlight adjustment ratio according to the second ratio.
  • P1 represents the first ratio
  • Lmax represents the maximum brightness of the collection scene
  • P2 represents the second ratio
  • a represents the preset coefficient, a ⁇ 1.
  • the acquisition module 1701 is specifically configured to calculate the backlight adjustment ratio according to the following formula:
  • gainBL represents the backlight adjustment ratio
  • AmbientLum represents the ambient light intensity
  • the metadata is obtained by: decoding the code stream to obtain the metadata; or receiving the code stream, decoding the code stream to obtain the photographing parameters of the collection scene, and then calculating the metadata based on the photographing parameters.
  • the metadata also includes the minimum brightness of the collection scene and/or the dynamic range of the collection scene.
  • the device of this embodiment can be used to execute the technical solution of the method embodiment shown in Figure 14. Its implementation principles and technical effects are similar and will not be described again here.
  • FIG. 18 is an exemplary structural diagram of the encoding device 1800 according to the embodiment of the present application. As shown in FIG. 18 , the encoding device 1800 according to this embodiment can be applied to the encoding end 20 .
  • the encoding device 1800 may include: an acquisition module 1801 and an encoding module 1802. in,
  • the acquisition module 1801 is used to acquire the image to be processed and to acquire metadata, where the metadata includes the maximum brightness of the captured scene; the encoding module 1802 is used to encode the image to be processed and the metadata to obtain the first code stream.
  • the acquisition module 1801 is also used to generate N×M groups of parameters, where each group includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; N and M are both positive integers, N×M>1, k>1; the encoding module 1802 is also used to encode the N×M groups of parameters to obtain the second code stream.
  • the acquisition module 1801 is also used to generate N×M groups of parameters, where each group includes k parameters and the N×M groups of parameters correspond to the N×M image blocks included in the image to be processed; N and M are both positive integers, N×M>1, k>1; the corresponding image blocks are processed according to the N×M groups of parameters to obtain the high dynamic range layer hdrLayer;
  • the encoding module is also used to encode the hdrLayer to obtain the third code stream.
  • the N ⁇ M group of parameters is obtained through a machine learning model; or, the N ⁇ M group of parameters is obtained based on the histogram of the image to be processed.
  • the acquisition module 1801 is specifically configured to obtain the metadata based on long exposure pictures, normal exposure pictures and short exposure pictures; or, to calculate the metadata based on preset photography parameters.
  • the metadata also includes the minimum brightness of the collection scene and/or the dynamic range of the collection scene.
  • the device of this embodiment can be used to execute the technical solution of the method embodiment shown in Figure 5. Its implementation principles and technical effects are similar and will not be described again here.
  • this application also provides a computer-readable storage medium, including a computer program.
  • the computer program When the computer program is executed on a computer, it causes the computer to execute the technical solution of the method embodiment shown in Figure 5 or Figure 14.
  • this application also provides a computer program product containing instructions; when the instructions are run on a computer or a processor, the computer or the processor executes the technical solution of the method embodiment shown in Figure 5 or Figure 14.
  • This application also provides a code stream, which can be stored in a computer-readable storage medium or transmitted through electromagnetic waves and other signal forms.
  • the code stream includes encoded image data and metadata, and the metadata includes the maximum brightness of the collection scene.
  • the collection scene is the scene when the image before encoding is collected.
  • This application also provides a chip system, which includes a logic circuit and an input/output interface, wherein the input/output interface is used to communicate with other communication devices outside the chip system, and the logic circuit is used to execute the technical solution of the method embodiment shown in Figure 5 or Figure 14.
  • each step of the above method embodiments can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
  • the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • DSP digital signal processor
  • ASIC application-specific integrated circuit
  • FPGA field programmable gate array
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the methods disclosed in the embodiments of the present application can be directly implemented by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache.
  • RAM random access memory
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • double data rate SDRAM double data rate SDRAM
  • DDR SDRAM double data rate SDRAM
  • ESDRAM enhanced synchronous dynamic random access memory
  • SLDRAM synchronous link dynamic random access memory
  • direct rambus RAM direct rambus RAM
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)

Abstract

本申请提供一种应用于电子设备的图像显示方法、编码方法及相关装置。本申请应用于电子设备的图像显示方法,包括:获取待处理图像;获取高亮增强数据,所述高亮增强数据包括高动态范围图层hdrLayer;获取所述电子设备的初始背光亮度;根据所述初始背光亮度,获取所述电子设备的目标背光亮度;根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像;在所述目标背光亮度下显示所述目标图像。本申请可以充分利用屏幕的亮度范围进行图像显示,并实现端到端呈现出最佳的效果体验。

Description

应用于电子设备的图像显示方法、编码方法及相关装置
本申请要求于2022年07月15日提交中国专利局、申请号为202210831137.6、申请名称为“应用于电子设备的图像显示方法、编码方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术,尤其涉及一种应用于电子设备的图像显示方法、编码方法及相关装置。
背景技术
在图像采集显示系统中,通常采集端负责图像视频采集,记录场景内容,编码端负责图像的编码压缩,显示端负责解码重建得到图像,并根据环境光的强弱自适应的调整屏幕亮度(即自动背光技术),采集端和编码端可以是同一电子设备,也可以是不同的电子设备。手机、平板等电子设备作为显示端,基本上都有自动背光技术,其主要考虑的是屏幕亮度对人眼的舒适性。定义一个最佳舒适区间,包括舒适性上限(太亮会刺眼)和舒适性下限(太暗看不清),同时考虑屏幕功耗,亮度越大功耗越大,所以通常会根据舒适性下限值调整屏幕亮度。
目前电子设备的屏幕的峰值亮度可以达到1000nit甚至更高,但是,自动背光技术中,只使用了较低的屏幕亮度,例如,普通室内环境光下,手机背光亮度设置为100-200nit,还有很大的亮度范围未使用,没有充分利用屏幕的亮度范围来实现端到端呈现出最佳的效果体验。
发明内容
本申请提供一种应用于电子设备的图像显示方法、编码方法及相关装置,以充分利用屏幕的亮度范围进行图像显示,并实现端到端呈现出最佳的效果体验。
第一方面,本申请提供一种应用于电子设备的图像显示方法,包括:获取待处理图像;获取高亮增强数据,所述高亮增强数据包括高动态范围图层hdrLayer;获取所述电子设备的初始背光亮度;根据所述初始背光亮度获取所述电子设备的目标背光亮度;根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像;在所述目标背光亮度下显示所述目标图像。
本申请实施例,根据电子设备的初始背光亮度获取其目标背光亮度,从而对电子设备的背光亮度进行调节,以充分利用屏幕的亮度范围进行图像显示,同时对于待处理图像中由于亮度调节出现失真的区域,结合hdrLayer进行像素调整以得到适用于目标背光亮度的目标图像,从而解决图像失真的问题,再在目标背光亮度下显示目标图像,目标背光亮度和目标图像配合显示,实现了端到端呈现出最佳的效果体验。
上述电子设备可以是显示端电子设备(亦即视频解码器),其中,显示目标图像时可以由显示组件执行,该显示组件可以是集成于电子设备上的显示模块,例如,触摸屏,该显示组件也可以是独立于电子设备的显示器,例如,电子设备外接的显示器,电子设备投屏的智慧屏、幕布等,对此不做具体限定。
显示端接收来自采集端的码流,解码码流以得到待处理图像,显示端采用的解码方式与采集端采用的编码方式相对应,该解码方式可以包括标准的混合视频解码技术,端到端解码网络,基于机器学习模型的解码技术,等等,本申请实施例对待处理图像的解码方式不做具体限定。
在显示端,可以获取高动态范围图层(hdrLayer),hdrLayer可以是二维单通道8bit的图像,用于标记待处理图像中的高亮区域,hdrLayer的分辨率可以等于待处理图像的分辨率,hdrLayer的分辨率也可以小于或大于待处理图像的分辨率,本申请对此不做具体限定。或者,hdrLayer也可以呈现为二维数组、三维数组或其他维度的数组等任意可以存储多个参数的数据形式。本申请对hdrLayer的具体形式不做限定。
hdrLayer主要是辅助显示端对图像进行亮度调节,以适应人眼感知,因此显示端要获取hdrLayer可以采用以下三种方式:
一种方式是,接收码流,解码码流以得到hdrLayer。
本申请实施例中,由采集端生成hdrLayer,再将hdrLayer编码后得到的码流传输给显示端,而显示端只需要解码流即可恢复出hdrLayer,这样可以提高显示端的处理效率。
另一种方式是,接收码流,解码码流以得到N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应;根据N×M组参数分别对对应的图像块进行处理以得到hdrLayer,N和M均为正整数,N×M>1,k>1。
本申请实施例中,采集端不直接生成hdrLayer,而是只获取用于生成hdrLayer的N×M组参数,然后对该N×M组参数编码,并将编码后得到的码流传输给显示端,显示端先解码流恢复出N×M组参数,再根据该N×M组参数生成hdrLayer,这样可以节省码流,提高传输效率。
又一种方式是,生成N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应;根据N×M组参数分别对对应的图像块进行处理以得到hdrLayer,N和M均为正整数,N×M>1,k>1。
本申请实施例中,采集端既不需要生成hdrLayer,也不需要获取用于生成hdrLayer的N×M组参数,完全由显示端根据待处理图像生成N×M组参数,进而根据该N×M组参数生成hdrLayer。这样可以进一步节省码流,并提高传输效率。
在一种可能的实现方式中,可以将待处理图像(原图)分成N×M块,针对每个图像块获取k个参数,从而可以得到N×M组参数,共N×M×k个参数。每个图像块的k个参数可以表示成一维表。其中,N×M组参数通过机器学习模型得到;或者,N×M组参数基于待处理图像的直方图得到。
本申请实施例中,可以把原图缩放到一个较小的分辨率,例如,256×256。缩略图进入机器学习模型(例如神经网络),通过神经网络学习出N×M×k个参数,该神经网络可以包括局部分支和全局分支。对缩略图做卷积操作、下采样、通道数增多等处理。重复这些操作,例如,做4次(4次下采样),此时分辨率就变成16×16了。此后,进入局部分支,该局部分支分辨率维持16×16,但做一些卷积,不做下采样。进入全局分支,全局分支继续做下采样,直到变成1×1。再把局部分支的输出和全局分支的输出相加(分辨率16×16与分辨率1×1相加,会先把1×1的变成16×16,例如,重复拷贝),然后再做一些卷积,变成16×16×k,k这里可以取9、17等等,大概是2的n次方加1。最后输出N×M×k个参数。
需要说明的是,本申请实施例中还可以采用其他方式获取N×M组参数,对此不做具体限定。
在一种可能的实现方式中,将N×M组参数作用于待处理图像上,得到hdrLayer,该过程本质是一个插值的过程。
N×M×k个参数的数值范围是0~1,也可以认为是0~255。N和M是空间划分,将图像划分成了N×M个块,k是值域的划分,把值域划分成了k-1段,k个定点。但实际上,输入的数值是连续的,不会恰好是k个值,所以中间需要插值得到。空间上插值也是一样的。亦即,空域上是二维插值,可以称之为双线性插值,值域上是线性插值。
示例性的,空域上是N×M个块,为了保证块间的平滑,需要取邻近的四个块进行插值。值域上的k,输入的原图的亮度Y是连续的,而这个k是间隔的,所以中间也需要插值。值域的k,以范围是0~255为例,输出为255时,hdrLayer上就很亮,输出为0时,hdrLayer上就很暗,k个值就是0~255直接的数值。
需要说明的是,本申请实施例中还可以采用其他方式获取hdrLayer,对此不做具体限定。
在一种可能的实现方式中,高亮增强数据还包括元数据。元数据可以包括采集场景的动态范围、最大亮度、最小亮度等。显示端可以采用以下两种方式获取元数据:
一种方法是接收码流,解码流以得到元数据。
另一种方法是接收码流,解码流以得到采集场景的拍照参数,根据拍照参数计算得到元数据。
本申请实施例中,关于电子设备的初始背光亮度和目标背光亮度的获取方式如下:
电子设备具有背光技术,因此可以根据周围环境设置电子设备的初始背光亮度,可以参照相关背光技术,不再赘述。
为了得到较好的视觉体验,显示端可以结合元数据中与采集场景相关的亮度信息(例如,采集场景的动态范围、最大亮度、最小亮度等)对电子设备的背光进行调节,包括提升背光亮度或者降低背光亮度。相比于相关技术中,考虑到屏幕功耗而降低背光亮度的情况,可以提升背光亮度,以充分利用电子设备的屏幕的高动态范围(high dynamic range,HDR),因此电子设备的目标背光亮度高于电子设备的初始背光亮度。
示例性的,显示端可以采用以下两种方法获取电子设备的目标背光亮度:
一种方法是根据预设背光调节比例对初始背光亮度进行处理以得到目标背光亮度。
显示端可以基于历史记录、大数据分析、电子设备的屏幕属性等预先设定一个比例,例如,背光提升比例(用于提升背光亮度,目标背光亮度>初始背光亮度)或者背光降低比例(用于降低背光亮度,目标背光亮度<初始背光亮度)。显示端可以根据该预设背光调节比例对初始背光亮度进行处理,例如,将预设背光调节比例与初始背光亮度相乘以得到目标背光亮度。
需要说明的是,上文描述的方法不构成限定,本申请实施例对预设背光调节比例的设置方式,以及目标背光亮度的获取方式均不作具体限定。
另一种方法是根据元数据获取背光调节比例;根据背光调节比例对初始背光亮度进行处理以得到目标背光亮度。
与上一方法的区别在于背光调节比例不是预先设定,可以由显示端计算得到。背光调节比例也可以是背光提升比例(用于提升背光亮度,目标背光亮度>初始背光亮度)或者背光降低比例(用于降低背光亮度,目标背光亮度<初始背光亮度)。
本申请实施例中,显示端可以根据采集场景的最大亮度获取第一比例,该第一比例是采集场景的人眼的亮度感知与白色漫反射感知的比例;根据第一比例获取第二比例,第二比例是显示端的人眼的亮度感知与白色漫反射感知的比例,该第二比例小于或等于第一比例;根据第二比例获取背光调节比例。
在不同的白色漫反射下，人眼对亮度感知通常满足幂函数，如图15所示，per=lum^(1/γ)。通常情况下，人眼在显示端的亮度感知小于在采集端的亮度感知，在白色漫反射相同的情况下，可以得到P2=a×P1，a表示预设系数，a≤1。而最理想的状态下，人眼在显示端的亮度感知与在采集端的亮度感知一样，在白色漫反射相同的情况下，可以得到P1=P2。
代入上述幂函数可以得到:
P1=L1^(1/γs)
其中,Lmax表示采集场景的最大亮度;

P2=(gainBL×L2)^(1/γd)
其中,gainBL表示背光调节比例;
AmbientLum表示环境光强度;
根据P1和P2的等式关系可以得到背光调节比例:
最理想状态下,a=1。
显示端可以根据背光调节比例对初始背光亮度进行处理,例如,将背光调节比例与初始背光亮度相乘以得到目标背光亮度。
需要说明的是,本申请实施例还可以采用其他方法获取背光调节比例,对此不做具体限定。
上述过程中,以尽可能趋近于达到人眼在显示端的亮度感知等于在采集端的亮度感知的最理想状态为目的,计算得到了电子设备的目标背光亮度,并将电子设备的背光亮度调节为目标背光亮度,使得待处理图像在显示端的显示效果符合人眼对真实采集场景下的亮度感知。但是,待处理图像中有一些HDR区域,经过前述背光调节后可能会失真,例如,当目标背光亮度大于初始背光亮度时,就是提升电子设备的背光亮度,此时待处理图像中的HDR区域可能会比较刺眼。
为了保证人眼在显示端的亮度感知尽可能趋近于在采集端的亮度感知,可以对待处理图像进行像素处理,调整部分区域的像素值,使得该部分区域的亮度和背光调节之前相同,避免刺眼。
本申请实施例可以采用以下方法调整部分区域的像素值:显示端根据hdrLayer获取目标权重。例如,显示端可以将hdrLayer中第一像素值除以预设阈值以得到第一像素值的第一权重值,该第一像素值是hdrLayer中的任意一个像素值,目标权重包括第一权重值;再根据目标权重对待处理图像进行亮度调节以得到目标图像。
例如,上述过程可以表示为如下公式:
pixelLow=pow(1/gainBL,1/2.2)×pixelSrc;
weight=hdrLayer/255;
pixelOut=pixelSrc×weight+pixelLow×(1–weight)
其中,pow(1/gainBL,1/2.2)表示像素调整系数;pixelSrc表示待处理图像中的任意一个像素值;pixelLow表示前述任意一个像素值经调整后的像素值;weight表示目标权重;pixelOut表示前述任意一个像素值对应的目标像素值。
可选的,本申请实施例还可以将hdrLayer作为一个引导图片或参考图片,获取待处理图像中的像素值与目标图像中的像素值之间的对应关系,然后根据该对应关系对待处理图像的像素值进行处理,从而得到目标图像。
除此之外,本申请实施例还可以采用其他方法调整部分区域的像素值,获取目标图像,对此不做具体限定。
待处理图像中的所有像素都可以采用上述方法处理后,得到目标图像。可选的,当hdrLayer的分辨率小于或大于待处理图像的分辨率时,可以先对hdrLayer进行图像超分辨处理或者下采样处理,从而使得hdrLayer的分辨率和待处理图像的分辨率相等,然后再采用上述公式获取目标图像。
本申请实施例中，如果电子设备的背光亮度被提升，那么待处理图像的部分区域可能会过亮导致刺眼，因此基于hdrLayer的像素调整可以降低前述部分区域的像素亮度，从而避免刺眼；如果电子设备的背光亮度被降低，那么待处理图像的部分区域可能会过暗导致细节缺失，因此基于hdrLayer的像素调整可以提升前述部分区域的像素亮度，从而避免细节缺失。
第二方面,本申请提供一种编码方法,包括:获取待处理图像;获取元数据,所述元数据包括采集场景的最大亮度;对所述待处理图像和所述元数据进行编码以得到第一码流。
采集端使用任意采集设备,例如摄像机,针对同一场景,采集多帧不同曝光条件下的图片,例如,采集长曝光图片(L(long)帧)、正常曝光图片(N(normal)帧)和短曝光图片(S(short)帧),其中,L帧的曝光时间较长,这样场景中很暗的区域也能拍清楚,但是亮的区域会过曝;N帧是正常曝光帧,场景中的中等亮度的区域会很好,但是很亮的区域会过曝,很暗的区域又会看不清;S帧的曝光时间较短,这样场景中很亮的区域不会过曝,但是中等亮度和暗的区域会偏暗、看不清。对多帧图片(L帧、N帧和S帧)进行多帧融合,生成一张高比特(bit)的图片,该高bit的图片融合了L帧、N帧和S帧,可以具备多帧的优势并摒除多帧各自的劣势,例如,场景中很亮的区域不会过曝,中等亮 度的区域很好,很暗的区域也很清楚。再对高bit的图片经过动态范围压缩(dynamic range compress,DRC)等处理得到一张8bit的融合图片。本申请实施例中,上述8bit的融合图片即为待处理图像。
在图片经过DRC处理时,采集端可以获取采集场景的动态范围、最大亮度、最小亮度等元数据(metadata)信息。采集场景是采集端采集待处理图像(原图)时的场景,例如,采集场景为中午的室外、天黑后的室外、阴天的室外、有灯光的室内等等。
在一种可能的实现方式中,采集端可以根据上述L帧、N帧和S帧获取元数据。
在一种可能的实现方式中,采集端可以根据预先设定的拍照参数计算得到所述元数据。
采集端在获取待处理图像和元数据后,可以对二者进行编码以得到第一码流,其中,采集端对待处理图像进行编码所采用的的编码方式可以包括标准的混合视频编码技术,端到端编码网络,基于机器学习模型的编码技术,等等,本申请实施例对待处理图像的编码方式不做具体限定;对元数据可以将其编码进码流的保留字段中,例如JPG的appn字段。此外采集端还可以采用其他方法对元数据进行编码,对此不做具体限定。
本申请实施例可以获取高动态范围图层(hdrLayer),hdrLayer可以是二维单通道8bit的图像,用于标记待处理图像中的高亮区域,hdrLayer的分辨率可以等于待处理图像的分辨率,hdrLayer的分辨率也可以小于或大于待处理图像的分辨率,当hdrLayer的分辨率小于或大于待处理图像的分辨率时,显示端可以对hdrLayer进行图像超分辨处理或者下采样处理,从而和待处理图像匹配,这样可以减小存储空间,本申请实施例对此不做具体限定。或者,hdrLayer也可以呈现为二维数组、三维数组或其他维度的数组等任意可以存储多个参数的数据形式。本申请对hdrLayer的具体形式不做限定。
hdrLayer为灰度图,可以标记出原图高亮的区域,数值越大,表示原图亮度越大,因此对应原图像亮度较高的区域,hdrLayer呈现的较亮,对应原图像亮度较低的区域,hdrLayer呈现的较暗。
hdrLayer主要是辅助显示端对图像进行亮度调节,以适应人眼感知,因此显示端需要获取到hdrLayer,而为了配合显示端获取hdrLayer,采集端可以采用以下两种方式:
一种方式是,采集端生成N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;对N×M组参数进行编码以得到第二码流。
上述N×M组参数的作用是生成hdrLayer,因此为了节省码流,采集端可以不直接生成hdrLayer,而是将用于生成hdrLayer的N×M组参数编码后得到的第二码流传输给显示端,再由显示端解码流恢复出N×M组参数,再根据该N×M组参数生成hdrLayer,这样可以提高传输效率。
采集端除了获取元数据,还可以生成N×M组参数,再将元数据、待处理图像和N×M组参数进行编码得到第一码流和第二码流,进而可以将第一码流和第二码流传输给显示端。需要说明的是,第一码流和第二码流可以先后串接合并为一个码流,也可以以其他预设方式融合为一个码流,还可以作为单独的码流逐一传输,对此不做具体限定。
另一种方式是,采集端生成N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据N×M组参数分别对对应的图像块进行处理以得到高动态范围图层hdrLayer;对hdrLayer进行编码以得到第三码流。
采集端也可以根据N×M组参数生成hdrLayer,再将hdrLayer编码后得到的第三码流传输给显示端,再由显示端解码流恢复出hdrLayer,这样可以提高显示端的处理效率。
采集端除了获取元数据,还可以生成hdrLayer,再将元数据、待处理图像和hdrLayer进行编码得到第一码流和第三码流,进而可以将第一码流和第三码流传输给显示端。需要说明的是,第一码流和第三码流可以先后串接合并为一个码流,也可以以其他预设方式融合为一个码流,还可以作为单独的码流逐一传输,对此不做具体限定。
此外,采集端也可以除了传输第一码流外,将第二码流和第三码流均传输给显示端,此时,第一码流、第二码流和第三码流可以先后串接合并为一个码流,也可以以预设方式融合为一个码流,还可以作为单独的码流逐一传输,对此不做具体限定。
本申请实施例中,N×M组参数可以通过机器学习模型得到;或者,N×M组参数也可以基于待处理图像的直方图得到。N×M组参数的获取方式可以参照第一方面中的相关描述,此处不再赘述。
第三方面,本申请提供一种应用于电子设备的图像显示装置,包括:获取模块,用于获取待处理图像;获取高亮增强数据,所述高亮增强数据包括高动态范围图层hdrLayer;获取所述电子设备的初始背光亮度;根据所述初始背光亮度获取所述电子设备的目标背光亮度;调节模块,用于根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像;显示模块,用于在所述目标背光亮度下显示所述目标图像。
在一种可能的实现方式中,所述获取模块,具体用于接收码流,解码所述码流以得到所述hdrLayer。
在一种可能的实现方式中,所述获取模块,具体用于接收码流,解码所述码流以得到N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
在一种可能的实现方式中,所述获取模块,具体用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
在一种可能的实现方式中,所述获取模块,具体用于根据预设背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
在一种可能的实现方式中,所述高亮增强数据还包括元数据;所述获取模块,具体用于根据所述元数据获取背光调节比例;根据所述背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
在一种可能的实现方式中,所述调节模块,具体用于根据所述hdrLayer获取目标权重;根据所述目标权重对所述待处理图像进行亮度调节以得到所述目标图像。
在一种可能的实现方式中,所述调节模块,具体用于将所述hdrLayer中第一像素值除以预设阈值以得到所述第一像素值的第一权重值,所述第一像素值是所述hdrLayer中的任意一个像素值,所述目标权重包括所述第一权重值。
在一种可能的实现方式中,所述调节模块,具体用于获取像素调整系数;根据所述像素调整系数和所述待处理图像获取经调整的图像;根据所述待处理图像、所述经调整的图像和所述目标权重获取所述目标图像。
在一种可能的实现方式中,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
在一种可能的实现方式中,所述元数据包括采集场景的最大亮度;所述获取模块,具体用于根据所述采集场景的最大亮度获取第一比例,所述第一比例是所述采集场景的人眼的亮度感知与白色漫反射感知的比例;根据所述第一比例获取第二比例,所述第二比例是显示端的人眼的亮度感知与白色漫反射感知的比例,所述第二比例小于或等于所述第一比例;根据所述第二比例获取所述背光调节比例。
在一种可能的实现方式中,所述获取模块,具体用于根据以下公式计算得到所述第一比例:
P1=L1^(1/γs)
其中,P1表示所述第一比例;
Lmax表示所述采集场景的最大亮度;
在一种可能的实现方式中,所述获取模块,具体用于根据以下公式计算得到所述第二比例:
P2=a×P1
其中,P2表示所述第二比例;
a表示预设系数,a≤1。
在一种可能的实现方式中,所述获取模块,具体用于根据以下公式计算得到所述背光调节比例:
其中,gainBL表示所述背光调节比例;
AmbientLum表示环境光强度;
在一种可能的实现方式中,所述元数据是通过以下方式获取的:解码码流以得到所述元数据;或者,接收码流,解码码流以得到采集场景的拍照参数,再根据所述拍照参数计算得到所述元数据。
在一种可能的实现方式中,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
第四方面,本申请提供一种编码装置,包括:获取模块,用于获取待处理图像;获取元数据,所述元数据包括采集场景的最大亮度;编码模块,用于对所述待处理图像和所述元数据进行编码以得到第一码流。
在一种可能的实现方式中,所述获取模块,还用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;所述编码模块,还用于对所述N×M组参数进行编码以得到第二码流。
在一种可能的实现方式中,所述获取模块,还用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到高动态范围图层hdrLayer;所述编码模块,还用于对所述hdrLayer进行编码以得到第三码流。
在一种可能的实现方式中,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
在一种可能的实现方式中,所述获取模块,具体用于根据长曝光图片、正常曝光图片和短曝光图片获取所述元数据;或者,根据预先设定的拍照参数计算得到所述元数据。
在一种可能的实现方式中,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
第五方面,本申请提供一种解码器,包括:一个或多个处理器;存储器,用于存储一个或多个程序;当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如上述第一方面中任一项所述的方法。
第六方面,本申请提供一种编码器,包括:一个或多个处理器;存储器,用于存储一个或多个程序;当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如上述第二方面中任一项所述的方法。
第七方面,本申请提供一种计算机可读存储介质,包括计算机程序,所述计算机程序在计算机上被执行时,使得所述计算机执行上述第一至二方面中任一项所述的方法。
第八方面,本申请提供一种计算机程序产品,所述计算机程序产品中包含指令,其特征在于,当所述指令在计算机或处理器上运行时,使得所述计算机或所述处理器实现上述第一至二方面中任一项所述的方法。
第九方面,本申请提供一种码流,该码流可以存储在计算机可读存储介质中,或通过电磁波等信号形式进行传输,该码流中包括经编码的图像数据和元数据,元数据包括采集场景的最大亮度。采集场景为采集编码前的图像时的场景。
第十方面,本申请提供一种图像采集显示系统,包括采集端电子设备和显示端电子设备,采集端电子设备可以包括上述第六方面的编码器,显示端电子设备可以包括上述第五方面的解码器。
第十一方面，本申请提供一种芯片系统，其特征在于，所述芯片系统包括逻辑电路和输入输出接口，其中：所述输入输出接口用于与所述芯片系统之外的其他通信装置进行通信，所述逻辑电路用于执行如上述第一至第二方面中任一项所述的方法。
第十二方面，本申请提供一种计算机可读存储介质，其上存储待解码的码流，或者存储有编码得到的码流，所述码流通过第二方面或第二方面的任一种实现方式的编码方法得到。
附图说明
图1为图像采集显示系统的示意图;
图2A为示例性的译码系统10的示意性框图;
图2B是视频译码系统40的实例的说明图;
图3为本发明实施例提供的视频译码设备300的示意图;
图4为示例性实施例提供的装置400的简化框图;
图5为本申请实施例的图像编码方法的过程500的流程图;
图6为图像多帧融合的示意图;
图7为采集端编码过程的示意图;
图8a和图8b为hdrLayer的示意图;
图9为采集端编码过程的示意图;
图10为采集端编码过程的示意图;
图11为N×M组参数的生成示意图;
图12为值域插值的示意图;
图13为空域插值的示意图;
图14为本申请实施例的图像处理方法的过程1400的流程图;
图15为人眼对亮度感知和白色漫反射的示意图;
图16为显示端的处理过程的示意图;
图17为本申请实施例图像处理装置1700的一个示例性的结构示意图;
图18为本申请实施例编码装置1800的一个示例性的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请中的附图,对本申请中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书实施例和权利要求书及附图中的术语“第一”、“第二”等仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元。方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
本申请实施例涉及机器学习模型的应用,为了便于理解,下面先对相关名词或术语进行解释说明:
1、神经网络
神经网络（neural network，NN）是机器学习模型，神经网络可以是由神经单元组成的，神经单元可以是指以xs和截距1为输入的运算单元，该运算单元的输出可以为：h_{W,b}(x)=f(W^T·x)=f(∑_{s=1}^{n}W_s·x_s+b)
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是ReLU等非线性函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部感受野(local receptive field)相连,来提取局部感受野的特征,局部感受野可以是由若干个神经单元组成的区域。
2、多层感知器(multi-layer perception,MLP)
MLP是一种简单的深度神经网络(deep neural network,DNN)(不同层之间是全连接的),也称多层神经网络,可以理解为具有很多层隐含层的神经网络,这里的“很多”并没有特别的度量标准。从DNN按不同层的位置划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:其中,是输入向量,是输出向量,是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量经过如此简单的操作得到输出向量由于DNN层数多,则系数W和偏移向量的数量也就很多了。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。总结就是:第L-1层的第k个神经元到第L层的第j个神经元的系数定义为需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
3、卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。卷积神经网络包含了一个由卷积层和池化层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。
卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。卷积层可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的 特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络进行正确的预测。当卷积神经网络有多个卷积层的时候,初始的卷积层往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络深度的加深,越往后的卷积层提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
在经过卷积层/池化层的处理后,卷积神经网络还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络需要利用神经网络层来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层中可以包括多层隐含层,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
可选的,在神经网络层中的多层隐含层之后,还包括整个卷积神经网络的输出层,该输出层具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络的前向传播完成,反向传播就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络的损失,及卷积神经网络通过输出层输出的结果和理想结果之间的误差。
4、循环神经网络
循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而对于每一层层内之间的各个节点是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题却无能无力。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。同样使用误差反向传播算法,不过有一点区别:即,如果将RNN进行网络展开,那么其中的参数,如W,是共享的;而如上举例上述的传统神经网络却不是这样。并且在使用梯度下降算法中,每一步的输出不仅依赖当前步的网络,还依赖前面若干步网络的状态。该学习算法称为基于时间的反向传播算法(Back propagation Through Time,BPTT)。
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
5、损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层 神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
6、反向传播算法
卷积神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的超分辨率模型中参数的大小,使得超分辨率模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的超分辨率模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的超分辨率模型的参数,例如权重矩阵。
7、生成式对抗网络
生成式对抗网络(generative adversarial networks,GAN)是一种深度学习模型。该模型中至少包括两个模块:一个模块是生成模型(Generative Model),另一个模块是判别模型(Discriminative Model),通过这两个模块互相博弈学习,从而产生更好的输出。生成模型和判别模型都可以是神经网络,具体可以是深度神经网络,或者卷积神经网络。GAN的基本原理如下:以生成图片的GAN为例,假设有两个网络,G(Generator)和D(Discriminator),其中G是一个生成图片的网络,它接收一个随机的噪声z,通过这个噪声生成图片,记做G(z);D是一个判别网络,用于判别一张图片是不是“真实的”。它的输入参数是x,x代表一张图片,输出D(x)代表x为真实图片的概率,如果为1,就代表130%是真实的图片,如果为0,就代表不可能是真实的图片。在对该生成式对抗网络进行训练的过程中,生成网络G的目标就是尽可能生成真实的图片去欺骗判别网络D,而判别网络D的目标就是尽量把G生成的图片和真实的图片区分开来。这样,G和D就构成了一个动态的“博弈”过程,也即“生成式对抗网络”中的“对抗”。最后博弈的结果,在理想的状态下,G可以生成足以“以假乱真”的图片G(z),而D难以判定G生成的图片究竟是不是真实的,即D(G(z))=0.5。这样就得到了一个优异的生成模型G,它可以用来生成图片。
图1为图像采集显示系统的示意图,如图1所示,在图像采集显示系统中,通常采集端(例如,摄像机、照相机、监控摄像头等电子设备)负责图像视频采集,记录场景内容,并对图像进行编码压缩,显示端(例如,手机、平板、智慧屏等电子设备)负责解码重建得到图像,并根据环境光的强弱自适应的调整屏幕亮度(即自动背光技术)。
手机、平板、智慧屏等电子设备作为显示端,基本上都有自动背光技术,其主要考虑的是屏幕亮度对人眼的舒适性。定义一个最佳舒适区间,包括舒适性上限(太亮会刺眼)和舒适性下限(太暗看不清),同时考虑屏幕功耗,亮度越大功耗越大,所以通常会根据舒适性下限值调整屏幕亮度。目前电子设备的屏幕的峰值亮度可以达到1000nit甚至更高,但是,自动背光技术中,只使用了较低的屏幕亮度,例如,普通室内环境光下,手机背光亮度设置为100-200nit,还有很大的亮度范围未使用,没有充分利用屏幕的亮度范围进行图像显示来实现端到端呈现出最佳的效果体验。
需要说明的是,上述内容示例性的描述了图像采集显示系统的一种实施方式。可选的,采集端和显示端的部分或全部功能也可以集成于同一个电子设备上;或者,图像采集和图像编码可以由不同的电子设备实现,本申请实施例所提供的方法用于具有图像编码和/或解码功能的电子设备;或者,图像解码功能和显示功能也可以由不同的电子设备实现,例如投屏显示或外接显示屏等场景。总之,本申请实施例对使用场景不做具体限定。
为此,本申请实施例提供了一种图像编码和处理方法,以充分利用屏幕的亮度范围进行图像显示来实现端到端呈现出最佳的效果体验。
以图1所示的图像采集显示系统为例,采集端除了采集图像外,还具有编码图像的功能,显示端 在显示图像之前,要先解码重建图像,因此图像采集显示系统还可以看作一个译码系统。图2A为示例性的译码系统10的示意性框图。译码系统10中的视频编码器20(或简称为编码器20)和视频解码器30(或简称为解码器30)可用于执行本申请实施例中描述的各种示例的方案。
如图2A所示,译码系统10包括源设备12,源设备12用于将编码图像等编码图像数据21提供给用于对编码图像数据21进行解码的目的设备14。
源设备12包括编码器20,另外即可选地,可包括图像源16、图像预处理器等预处理器(或预处理单元)18、通信接口(或通信单元)22。
图像源16可包括或可以为任意类型的用于捕获现实世界图像等的图像捕获设备,和/或任意类型的图像生成设备,例如用于生成计算机动画图像的计算机图形处理器或任意类型的用于获取和/或提供现实世界图像、计算机生成图像(例如,屏幕内容、虚拟现实(virtual reality,VR)图像和/或其任意组合(例如增强现实(augmented reality,AR)图像)的设备。所述图像源可以为存储上述图像中的任意图像的任意类型的内存或存储器。
为了区分预处理器(或预处理单元)18执行的处理,图像(或图像数据)17也可称为原始图像(或原始图像数据)17。
预处理器18用于接收(原始)图像数据17,并对图像数据17进行预处理,得到预处理图像(或预处理图像数据)19。例如,预处理器18执行的预处理可包括修剪、颜色格式转换(例如从RGB转换为YCbCr)、调色或去噪。可以理解的是,预处理单元18可以为可选组件。
视频编码器(或编码器)20用于接收预处理图像数据19并提供编码图像数据21(下面将根据图3等进一步描述)。
源设备12中的通信接口22可用于:接收编码图像数据21并通过通信信道13向目的设备14等另一设备或任何其它设备发送编码图像数据21(或其它任意处理后的版本),以便存储或直接重建。
目的设备14包括解码器30,另外即可选地,可包括通信接口(或通信单元)28、后处理器(或后处理单元)32和显示设备34。
目的设备14中的通信接口28用于直接从源设备12或从存储设备等任意其它源设备接收编码图像数据21(或其它任意处理后的版本),例如,存储设备为编码图像数据存储设备,并将编码图像数据21提供给解码器30。
通信接口22和通信接口28可用于通过源设备12与目的设备14之间的直连通信链路,例如直接有线或无线连接等,或者通过任意类型的网络,例如有线网络、无线网络或其任意组合、任意类型的私网和公网或其任意类型的组合,发送或接收编码图像数据(或编码数据)21。
例如,通信接口22可用于将编码图像数据21封装为报文等合适的格式,和/或使用任意类型的传输编码或处理来处理所述编码后的图像数据,以便在通信链路或通信网络上进行传输。
通信接口28与通信接口22对应,例如,可用于接收传输数据,并使用任意类型的对应传输解码或处理和/或解封装对传输数据进行处理,得到编码图像数据21。
通信接口22和通信接口28均可配置为如图2A中从源设备12指向目的设备14的对应通信信道13的箭头所指示的单向通信接口,或双向通信接口,并且可用于发送和接收消息等,以建立连接,确认并交换与通信链路和/或例如编码后的图像数据传输等数据传输相关的任何其它信息,等等。
视频解码器(或解码器)30用于接收编码图像数据21并提供解码图像数据(或解码图像数据)31(下面将根据图4等进一步描述)。
后处理器32用于对解码后的图像等解码图像数据31(也称为重建后的图像数据)进行后处理,得到后处理后的图像等后处理图像数据33。后处理单元32执行的后处理可以包括例如颜色格式转换(例如从YCbCr转换为RGB)、调色、修剪或重采样,或者用于产生供显示设备34等显示的解码图像数据31等任何其它处理。
显示设备34用于接收后处理图像数据33,以向用户或观看者等显示图像。显示设备34可以为或包括任意类型的用于表示重建后图像的显示器,例如,集成或外部显示屏或显示器。例如,显示屏可包括液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light emitting diode,OLED)显示器、等离子显示器、投影仪、微型LED显示器、硅基液晶显示器(liquid crystal on silicon,LCoS)、数字光处理器(digital light processor,DLP)或任意类型的其它显示屏。
译码系统10还包括训练引擎25,训练引擎25用于训练编码器20或解码器30,尤其是编码器20或解码器30中用到的神经网络(下文将详细描述)。
本申请实施例中训练数据可以存入数据库(未示意)中,训练引擎25基于训练数据训练得到神经网络。需要说明的是,本申请实施例对于训练数据的来源不做限定,例如可以是从云端或其他地方获取训练数据进行模型训练。
尽管图2A示出了源设备12和目的设备14作为独立的设备,但设备实施例也可以同时包括源设备12和目的设备14或同时包括源设备12和目的设备14的功能,即同时包括源设备12或对应功能和目的设备14或对应功能。在这些实施例中,源设备12或对应功能和目的设备14或对应功能可以使用相同硬件和/或软件或通过单独的硬件和/或软件或其任意组合来实现。
根据描述,图2A所示的源设备12和/或目的设备14中的不同单元或功能的存在和(准确)划分可能根据实际设备和应用而有所不同,这对技术人员来说是显而易见的。
编码器20(例如视频编码器20)或解码器30(例如视频解码器30)或两者都可通过如图2B所示的处理电路实现,例如一个或多个微处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)、离散逻辑、硬件、视频编码专用处理器或其任意组合。编码器20和解码器30分别可以通过处理电路46实现。所述处理电路46可用于执行下文论述的各种操作。如果部分技术在软件中实施,则设备可以将软件的指令存储在合适的非瞬时性计算机可读存储介质中,并且使用一个或多个处理器在硬件中执行指令,从而执行本申请技术。编码器20和解码器30中的其中一个可作为组合编解码器(encoder/decoder,CODEC)的一部分集成在单个设备中,如图2B所示。
源设备12和目的设备14可包括各种设备中的任一种,包括任意类型的手持设备或固定设备,例如,笔记本电脑或膝上型电脑、智能手机、平板或平板电脑、相机、台式计算机、机顶盒、电视机、显示设备、数字媒体播放器、视频游戏控制台、视频流设备(例如,内容业务服务器或内容分发服务器),等等,并可以不使用或使用任意类型的操作系统。在一些情况下,源设备12和目的设备14可配备用于无线通信的组件。因此,源设备12和目的设备14可以是无线通信设备。
在一些情况下,图2A所示的译码系统10仅仅是示例性的,本申请提供的技术可适用于视频译码设备(例如,视频编码或视频解码),这些设备不一定包括编码设备与解码设备之间的任何数据通信。在其它示例中,数据从本地存储器中检索,通过网络发送,等等。视频编码设备可以对数据进行编码并将数据存储到存储器中,和/或视频解码设备可以从存储器中检索数据并对数据进行解码。在一些示例中,编码和解码由相互不通信而只是编码数据到存储器和/或从存储器中检索并解码数据的设备来执行。
图2B是视频译码系统40的实例的说明图。视频译码系统40可以包含成像设备41、视频编码器20、视频解码器30(和/或藉由处理电路46实施的视频编/解码器)、天线42、一个或多个处理器43、一个或多个内存存储器44和/或显示设备45。
如图2B所示,成像设备41、天线42、处理电路46、视频编码器20、视频解码器30、处理器43、内存存储器44和/或显示设备45能够互相通信。在不同实例中,视频译码系统40可以只包含视频编码器20或只包含视频解码器30。
在一些实例中,天线42可以用于传输或接收视频数据的经编码比特流。另外,在一些实例中,显示设备45可以用于呈现视频数据。处理电路46可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。视频译码系统40也可以包含可选的处理器43,该可选处理器43类似地可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。另外,内存存储器44可以是任何类型的存储器,例如易失性存储器(例如,静态随机存取存储器(static random access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)等)或非易失性存储器(例如,闪存等)等。在非限制性实例中,内存存储器44可以由超速缓存内存实施。在其它实例中,处理电路46可以包含存储器(例如,缓存等)用于实施图像缓冲器等。
在一些实例中,通过逻辑电路实施的视频编码器20可以包含(例如,通过处理电路46或内存存储器44实施的)图像缓冲器和(例如,通过处理电路46实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过处理电路46实施的视频编码器20。逻辑电路可以 用于执行本文所论述的各种操作。
在一些实例中,视频解码器30可以以类似方式通过处理电路46实施,以实施参照图4的视频解码器30和/或本文中所描述的任何其它解码器系统或子系统所论述的各种模块。在一些实例中,逻辑电路实施的视频解码器30可以包含(通过处理电路46或内存存储器44实施的)图像缓冲器和(例如,通过处理电路46实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过处理电路46实施的视频解码器30。
在一些实例中,天线42可以用于接收视频数据的经编码比特流。如所论述,经编码比特流可以包含本文所论述的与编码视频帧相关的数据、指示符、索引值、模式选择数据等,例如与编码分割相关的数据(例如,变换系数或经量化变换系数,(如所论述的)可选指示符,和/或定义编码分割的数据)。视频译码系统40还可包含耦合至天线42并用于解码经编码比特流的视频解码器30。显示设备45用于呈现视频帧。
应理解,本申请实施例中对于参考视频编码器20所描述的实例,视频解码器30可以用于执行相反过程。关于信令语法元素,视频解码器30可以用于接收并解析这种语法元素,相应地解码相关视频数据。在一些例子中,视频编码器20可以将语法元素熵编码成经编码视频比特流。在此类实例中,视频解码器30可以解析这种语法元素,并相应地解码相关视频数据。
为便于描述,参考通用视频译码(Versatile video coding,VVC)参考软件或由ITU-T视频译码专家组(Video Coding Experts Group,VCEG)和ISO/IEC运动图像专家组(Motion Picture Experts Group,MPEG)的视频译码联合工作组(Joint Collaboration Team on Video Coding,JCT-VC)开发的高性能视频译码(High-Efficiency Video Coding,HEVC)描述本发明实施例。本领域普通技术人员理解本发明实施例不限于HEVC或VVC。
图3为本发明实施例提供的视频译码设备300的示意图。视频译码设备300适用于实现本文描述的公开实施例。在一个实施例中,视频译码设备300可以是解码器,例如图2A中的视频解码器30,也可以是编码器,例如图2A中的视频编码器20。
视频译码设备300包括:用于接收数据的入端口310(或输入端口310)和接收单元(receiver unit,Rx)320;用于处理数据的处理器、逻辑单元或中央处理器(central processing unit,CPU)330;例如,这里的处理器330可以是神经网络处理器330;用于传输数据的发送单元(transmitter unit,Tx)340和出端口350(或输出端口350);用于存储数据的存储器360。视频译码设备300还可包括耦合到入端口310、接收单元320、发送单元340和出端口350的光电(optical-to-electrical,OE)组件和电光(electrical-to-optical,EO)组件,用于光信号或电信号的出口或入口。
处理器330通过硬件和软件实现。处理器330可实现为一个或多个处理器芯片、核(例如,多核处理器)、FPGA、ASIC和DSP。处理器330与入端口310、接收单元320、发送单元340、出端口350和存储器360通信。处理器330包括译码模块370(例如,基于神经网络NN的译码模块370)。译码模块370实施上文所公开的实施例。例如,译码模块370执行、处理、准备或提供各种编码操作。因此,通过译码模块370为视频译码设备300的功能提供了实质性的改进,并且影响了视频译码设备300到不同状态的切换。或者,以存储在存储器360中并由处理器330执行的指令来实现译码模块370。
存储器360包括一个或多个磁盘、磁带机和固态硬盘,可以用作溢出数据存储设备,用于在选择执行程序时存储此类程序,并且存储在程序执行过程中读取的指令和数据。存储器360可以是易失性和/或非易失性的,可以是只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、三态内容寻址存储器(ternary content-addressable memory,TCAM)和/或静态随机存取存储器(static random-access memory,SRAM)。
图4为示例性实施例提供的装置400的简化框图,装置400可用作图2A中的源设备12和目的设备14中的任一个或两个。
装置400中的处理器402可以是中央处理器。或者,处理器402可以是现有的或今后将研发出的能够操控或处理信息的任何其它类型设备或多个设备。虽然可以使用如图所示的处理器402等单个处理器来实施已公开的实现方式,但使用一个以上的处理器速度更快和效率更高。
在一种实现方式中,装置400中的存储器404可以是只读存储器(ROM)设备或随机存取存储器(RAM)设备。任何其它合适类型的存储设备都可以用作存储器404。存储器404可以包括处理器402 通过总线412访问的代码和数据406。存储器404还可包括操作系统408和应用程序410,应用程序410包括允许处理器402执行本文所述方法的至少一个程序。例如,应用程序410可以包括应用1至N,还包括执行本文所述方法的视频译码应用。
装置400还可以包括一个或多个输出设备,例如显示器418。在一个示例中,显示器418可以是将显示器与可用于感测触摸输入的触敏元件组合的触敏显示器。显示器418可以通过总线412耦合到处理器402。
虽然装置400中的总线412在本文中描述为单个总线,但是总线412可以包括多个总线。此外,辅助储存器可以直接耦合到装置400的其它组件或通过网络访问,并且可以包括存储卡等单个集成单元或多个存储卡等多个单元。因此,装置400可以具有各种各样的配置。
基于上述系统,下文将对本申请实施例提供的方法进行详细说明。
图5为本申请实施例的图像编码方法的过程500的流程图。过程500可由上文的采集端电子设备(亦即视频编码器20)执行。过程500描述为一系列的步骤或操作,应当理解的是,过程500可以以各种顺序执行和/或同时发生,不限于图5所示的执行顺序。过程500可以包括:
步骤501、获取待处理图像。
图6为图像多帧融合的示意图,如图6所示,采集端使用任意采集设备,例如摄像机,针对同一场景,采集多帧不同曝光条件下的图片,例如,采集长曝光图片(L(long)帧)、正常曝光图片(N(normal)帧)和短曝光图片(S(short)帧),其中,L帧的曝光时间较长,这样场景中很暗的区域也能拍清楚,但是亮的区域会过曝;N帧是正常曝光帧,场景中的中等亮度的区域会很好,但是很亮的区域会过曝,很暗的区域又会看不清;S帧的曝光时间较短,这样场景中很亮的区域不会过曝,但是中等亮度和暗的区域会偏暗、看不清。对多帧图片(L帧、N帧和S帧)进行多帧融合,生成一张高比特(bit)的图片,该高bit的图片融合了L帧、N帧和S帧,可以具备多帧的优势并摒除多帧各自的劣势,例如,场景中很亮的区域不会过曝,中等亮度的区域很好,很暗的区域也很清楚。再对高bit的图片经过动态范围压缩(dynamic range compress,DRC)等处理得到一张8bit的融合图片。
本申请实施例中,上述8bit的融合图片即为待处理图像。
步骤502、获取元数据,该元数据包括采集场景的最大亮度。
在图片经过DRC处理时,采集端可以获取采集场景的动态范围、最大亮度、最小亮度等元数据(metadata)信息。采集场景是采集端采集待处理图像(原图)时的场景,例如,采集场景为中午的室外、天黑后的室外、阴天的室外、有灯光的室内等等。
在一种可能的实现方式中,采集端可以根据上述L帧、N帧和S帧获取元数据。
如上所述,把L帧,N帧和S帧三帧进行融合,既可以使得高亮区亮度不丢失,也可以使得很暗区能看得见,可见多帧融合可以得到高动态范围的相关信息,进而生成元数据。
在一种可能的实现方式中,采集端可以根据预先设定的拍照参数计算得到所述元数据。
示例性的,采集端选取一个基准,例如,真实的采集场景的最大亮度为baseLum(255对应的亮度),标记出待处理图像中各个像素值对应的亮度,存为lumLUT[256],亦即,真实场景的亮度和图片像素值的一一对应关系,图像像素值的范围为0~255,共256个值,选取的基准就是让各个像素值和场景真实亮度值一一对应,用minval标记像素的图像灰阶,对应的感光度(ISO)为baseISO,曝光时间为baseExp。实拍时,N帧的ISO为curISO,曝光时间为curExp,S帧降电子伏特(electron volt,EV)对应的收益为Dgain,亦即,S帧是通过降EV实现的,EV有不同大小,不同大小对应不同的Dgain,如果没有S帧,Dgain为1。则采集场景的最大亮度maxLumScene和最小量度minLumScene可以采用以下方法计算:
maxLumScene=(baseISO/curISO)×Dgain×(curExp/baseExp)×baseLum
minLumScene=(baseISO/curISO)×Dgain×(curExp/baseExp)×lumLUT[minval]
需要说明的是，本申请实施例还可以采用其他方法获取采集场景的最大亮度和最小亮度，对此不做具体限定。
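作为便于理解的示意，下面给出按照上述公式估算maxLumScene与minLumScene的一段Python草案代码（函数名、变量名及示例取值均为本文为说明而作的假设，并非对实现方式的限定）：
```python
def scene_luminance(base_iso, base_exp, base_lum, lum_lut, minval,
                    cur_iso, cur_exp, dgain=1.0):
    """按上文公式，由基准拍照参数与实拍参数估算采集场景的最大/最小亮度（假设性示意）。"""
    ratio = (base_iso / cur_iso) * dgain * (cur_exp / base_exp)
    max_lum_scene = ratio * base_lum          # 对应像素值255的场景亮度
    min_lum_scene = ratio * lum_lut[minval]   # 对应最暗有效灰阶minval的场景亮度
    return max_lum_scene, min_lum_scene

# 用法示例（数值均为假设）：lum_lut为长度256的查找表，且lum_lut[255]==base_lum
# max_lum, min_lum = scene_luminance(100, 1/100, 1000.0, lum_lut, 5, 400, 1/50, dgain=4.0)
```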
步骤503、对待处理图像和元数据进行编码以得到第一码流。
采集端在获取待处理图像和元数据后,可以对二者进行编码以得到第一码流,其中,采集端对待处理图像进行编码所采用的的编码方式可以包括标准的混合视频编码技术,端到端编码网络,基于机器学习模型的编码技术,等等,本申请实施例对待处理图像的编码方式不做具体限定;对元数据可以将其编码进码流的保留字段中,例如JPG的appn字段。此外采集端还可以采用其他方法对元数据进行编码,对此不做具体限定。
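为说明“将元数据编码进码流的保留字段（例如JPG的appn字段）”这一做法，下面给出一个不依赖第三方库的Python草案（APPn段号、以JSON组织元数据等细节均为本文假设，并非对元数据编码格式的限定）：
```python
import json
import struct

def embed_metadata(jpeg_bytes: bytes, metadata: dict, app_n: int = 11) -> bytes:
    """在JPEG码流的SOI标记之后插入一个APPn保留段，携带JSON形式的元数据（示意实现）。"""
    payload = json.dumps(metadata).encode("utf-8")
    # APPn段：标记0xFF,0xEn + 2字节长度（含长度字段自身）+ 负载
    segment = bytes([0xFF, 0xE0 + app_n]) + struct.pack(">H", len(payload) + 2) + payload
    return jpeg_bytes[:2] + segment + jpeg_bytes[2:]   # jpeg_bytes[:2]为SOI(0xFFD8)

# 用法示例（字段名为假设）：
# out = embed_metadata(open("fused.jpg", "rb").read(),
#                      {"maxLumScene": 8000.0, "minLumScene": 0.5, "dynamicRange": 14.0})
```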
图7为采集端编码过程的示意图,如图7所示,采集端可以在图片经过DRC处理时,采用步骤502的方法获取元数据,再采用步骤503的方法对待处理图像和元数据进行编码得到第一码流,进而可以将第一码流传输给显示端。
在一种可能的实现方式中,本申请实施例可以获取一张高动态范围图层(hdrLayer),hdrLayer可以是一张二维单通道8bit的图像,用于标记待处理图像中的高亮区域,hdrLayer的分辨率可以等于待处理图像的分辨率,hdrLayer的分辨率也可以小于或大于待处理图像的分辨率,当hdrLayer的分辨率小于或大于待处理图像的分辨率时,显示端可以对hdrLayer进行图像超分辨处理或下采样处理,从而和待处理图像匹配,这样可以减小存储空间,本申请实施例对此不做具体限定。或者,hdrLayer也可以呈现为二维数组、三维数组或其他维度的数组等任意可以存储多个参数的数据形式。本申请对hdrLayer的具体形式不做限定。图8a和图8b为hdrLayer的示意图,如图8a和图8b所示,hdrLayer为灰度图,可以标记出原图高亮的区域,数值越大,表示原图亮度越大,因此对应原图像亮度较高的区域,hdrLayer呈现的较亮,对应原图像亮度较低的区域,hdrLayer呈现的较暗。
hdrLayer主要是辅助显示端对图像进行亮度调节,以适应人眼感知,因此显示端需要获取到hdrLayer,而为了配合显示端获取hdrLayer,采集端可以采用以下两种方式:
一种方式是,采集端生成N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;对N×M组参数进行编码以得到第二码流。
上述N×M组参数的作用是生成hdrLayer,因此为了节省码流,采集端可以不直接生成hdrLayer,而是将用于生成hdrLayer的N×M组参数编码后得到的第二码流传输给显示端,再由显示端解码流恢复出N×M组参数,再根据该N×M组参数生成hdrLayer,这样可以提高传输效率。
图9为采集端编码过程的示意图,如图9所示,采集端除了获取元数据,还可以生成N×M组参数,再将元数据、待处理图像和N×M组参数进行编码得到第一码流和第二码流,进而可以将第一码流和第二码流传输给显示端。需要说明的是,第一码流和第二码流可以先后串接合并为一个码流,也可以以预设方式融合为一个码流,还可以作为单独的码流逐一传输,对此不做具体限定。
另一种方式是,采集端生成N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据N×M组参数分别对对应的图像块进行处理以得到高动态范围图层hdrLayer;对hdrLayer进行编码以得到第三码流。
采集端也可以根据N×M组参数生成hdrLayer,再将hdrLayer编码后得到的第三码流传输给显示端,再由显示端解码流恢复出hdrLayer,这样可以提高显示端的处理效率。
图10为采集端编码过程的示意图,如图10所示,采集端除了获取元数据,还可以生成hdrLayer,再将元数据、待处理图像和hdrLayer进行编码得到第一码流和第三码流,进而可以将第一码流和第三码流传输给显示端。需要说明的是,第一码流和第三码流可以先后串接合并为一个码流,也可以以预设方式融合为一个码流,还可以作为单独的码流逐一传输,对此不做具体限定。
此外,采集端也可以除了传输第一码流外,将第二码流和第三码流均传输给显示端,此时,第一码流、第二码流和第三码流可以先后串接合并为一个码流,也可以以预设方式融合为一个码流,还可以作为单独的码流逐一传输,对此不做具体限定。
本申请实施例中，N×M组参数可以通过机器学习模型（机器学习模型可以参照上文描述，此处不再赘述）得到；或者，N×M组参数也可以基于待处理图像的直方图得到。
示例性的,图11为N×M组参数的生成示意图,如图11所示,采集端将待处理图像(原图)分成N×M块,每块输出k个参数,总共可以获取到N×M×k个参数。
本申请实施例中,可以把原图缩放到一个较小的分辨率,例如,256×256。缩略图进入机器学习模型(例如网络),通过网络学习出N×M×k个参数,该网络可以包括局部分支和全局分支。对缩略图做卷积操作、下采样、通道数增多等处理。重复这些操作,例如,做4次(4次下采样),此时分辨率就变成16×16了。此后,进入局部分支,该局部分支分辨率维持16×16,但做一些卷积,不做下采样。进入全局分支,全局分支继续做下采样,直到变成1×1。再把局部分支的输出和全局分支的输出相加(分辨率16×16与分辨率1×1相加,会先把1×1的变成16×16,例如,重复拷贝),然后再做一些卷积,变成16×16×k,k这里可以取9,17等等,大概是2的n次方加1。最后输出N×M×k个参数。
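下面给出上述“局部分支+全局分支”网络的一个极简PyTorch草案，仅用于示意“4次下采样到16×16、局部分支保持分辨率、全局分支降到1×1后与局部分支相加、最终输出16×16×k组参数”的流程（层数、通道数、k的取值等均为假设，并非对模型结构的限定）：
```python
import torch
import torch.nn as nn

class ParamNet(nn.Module):
    """输入256×256缩略图，输出N×M×k组参数的示意网络（此处假设N=M=16，k=9）。"""
    def __init__(self, k=9):
        super().__init__()
        # 4次步长为2的卷积下采样：256 -> 128 -> 64 -> 32 -> 16
        chans = [3, 8, 16, 32, 64]
        self.down = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1), nn.ReLU())
            for i in range(4)
        ])
        # 局部分支：保持16×16分辨率，只做卷积
        self.local = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        # 全局分支：继续下采样直到1×1
        self.glob = nn.Sequential(
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            nn.AdaptiveAvgPool2d(1),                                 # 8 -> 1
        )
        # 融合后再做卷积，输出k个通道
        self.head = nn.Conv2d(64, k, 1)

    def forward(self, x):                       # x: [B, 3, 256, 256]
        f = self.down(x)                        # [B, 64, 16, 16]
        g = self.glob(f)                        # [B, 64, 1, 1]
        fused = self.local(f) + g.expand_as(f)  # 1×1广播（重复拷贝）到16×16后相加
        return self.head(fused)                 # [B, k, 16, 16]，即N×M×k个参数
```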
另外,根据N×M组参数生成hdrLayer本质是一个插值的过程,N×M×k个参数的数值范围是0~1,也可以认为是0~255。N和M是空间划分,将图像划分成了N×M个块,k是值域的划分,把值域划分成了k-1段,k个定点。但实际上,输入的数值是连续的,不会恰好是k个值,所以中间需要插值得到。空间上插值也是一样的。亦即,空域上是二维插值,可以称之为双线性插值,值域上是线性插值。
示例性的,空域上是N×M个块,为了保证块间的平滑,需要取邻近的四个块进行插值。值域上的k,输入的原图的亮度Y是连续的,而这个k是间隔的,所以中间也需要插值。值域的k,以范围是0~255为例,输出为255时,hdrLayer上就很亮,输出为0时,hdrLayer上就很暗,k个值就是0~255直接的数值。图12为值域插值的示意图,如图12所示,在横坐标BinN和BinN+1之间插值得到Vi,再从曲线上找到与Vi对应的纵轴坐标Gam[Vi]。图13为空域插值的示意图,如图13所示,取邻近4个块,P点离哪个块越近,哪个块的权重就越大。
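上述值域线性插值与空域双线性插值可以用如下NumPy草案表示（边界处理、块中心坐标的定义等细节为本文假设，并非唯一实现方式）：
```python
import numpy as np

def build_hdr_layer(params, luma):
    """params: 形状[N, M, k]的参数数组(0~255)；luma: 待处理图像的亮度Y(0~255)，形状[H, W]。
    先在值域上做线性插值，再对空域上邻近的4个块做双线性插值，得到与原图同分辨率的hdrLayer。"""
    n, m, k = params.shape
    h, w = luma.shape
    params = params.astype(np.float32)

    # 值域插值：把连续的亮度Y映射到k个定点构成的k-1段上
    pos = luma.astype(np.float32) / 255.0 * (k - 1)
    lo = np.clip(np.floor(pos).astype(int), 0, k - 2)           # [H, W]
    frac = pos - lo

    # 空域插值：确定每个像素落在哪两个相邻块之间，以及相应权重（离块中心越近权重越大）
    ys = (np.arange(h) + 0.5) / h * n - 0.5
    xs = (np.arange(w) + 0.5) / w * m - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, n - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, m - 2)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]                     # [H, 1]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]                     # [1, W]

    def lut(by, bx):
        tab = params[by[:, None], bx[None, :]]                   # [H, W, k]，每个像素对应块的一维表
        v0 = np.take_along_axis(tab, lo[..., None], -1)[..., 0]
        v1 = np.take_along_axis(tab, (lo + 1)[..., None], -1)[..., 0]
        return v0 * (1 - frac) + v1 * frac                       # 值域线性插值

    top = lut(y0, x0) * (1 - wx) + lut(y0, x0 + 1) * wx
    bot = lut(y0 + 1, x0) * (1 - wx) + lut(y0 + 1, x0 + 1) * wx
    return np.clip(top * (1 - wy) + bot * wy, 0, 255).astype(np.uint8)
```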
图14为本申请实施例的应用于电子设备的图像显示方法的过程1400的流程图。过程1400可由上文的显示端电子设备(亦即视频解码器30)执行,其中,显示目标图像时可以由显示组件执行,该显示组件可以是集成于电子设备上的显示模块,例如,触摸屏,该显示组件也可以是独立于电子设备的显示器,例如,电子设备外接的显示器,电子设备投屏的智慧屏、幕布等,对此不做具体限定。过程1400描述为一系列的步骤或操作,应当理解的是,过程1400可以以各种顺序执行和/或同时发生,不限于图14所示的执行顺序。过程1400可以包括:
步骤1401、获取待处理图像。
显示端接收来自采集端的码流,解码码流以得到待处理图像,可以采用上文的混合解码方式,也可以采用端到端解码方式,还可以采用基于机器学习模型的解码方式,等等,本申请实施例对待处理图像的解码方式不做具体限定。
步骤1402、获取高亮增强数据,高亮增强数据包括hdrLayer。
在显示端,可以获取一张高动态范围图层(hdrLayer),hdrLayer可以是一张二维单通道8bit的图像,用于标记待处理图像中的高亮区域,hdrLayer的分辨率可以等于待处理图像的分辨率,hdrLayer的分辨率也可以小于待处理图像的分辨率,对此不做具体限定。
hdrLayer主要是辅助显示端对图像进行亮度调节,以适应人眼感知,因此显示端要获取hdrLayer可以采用以下三种方式:
一种方式是,接收码流,解码码流以得到hdrLayer。
与上文图9所示实施例中采集端采用的方式相对应,采集端生成hdrLayer,将其编码为码流传输给显示端。相应的,显示端可以接收码流,解码流直接得到hdrLayer。这样可以提高显示端的处理效率。
另一种方式是,接收码流,解码码流以得到N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应;根据N×M组参数分别对对应的图像块进行处理以得到hdrLayer。
与上文图8所示实施例中采集端采用的方式相对应,采集端生成N×M组参数,将其编码为码流传输给显示端。相应的,显示端可以接收码流,解码流先得到N×M组参数,再根据N×M组参数获取hdrLayer。这样可以节省码流,并提高传输效率。
又一种方式是,生成N×M组参数,每组参数包括k个参数,N×M组参数与待处理图像包括的N×M个图像块对应;根据N×M组参数分别对对应的图像块进行处理以得到hdrLayer。
本申请实施例中,采集端可以不针对hdrLayer做处理,即既不需要生成hdrLayer,又不需要生成N×M组参数,完全由显示端根据待处理图像生成N×M组参数,进而获取hdrLayer。这样可以节省码流,并提高传输效率。
显示端可以将待处理图像包括的N×M个图像块,针对每个图像块获取k个参数,从而可以得到N×M组参数。每个图像块的k个参数可以表示成一维表。将N×M组参数作用待处理图像上,得到最终的hdrLayer,该过程可以参照上文描述,此处不再赘述。
其中,N×M组参数通过机器学习模型得到;或者,N×M组参数基于待处理图像的直方图得到,可以参照上文描述,此处不再赘述。
需要说明的是,本申请实施例中还可以采用其他方式获取hdrLayer,对此不做具体限定。
在一种可能的实现方式中,高亮增强数据还包括元数据。元数据可以包括采集场景的动态范围、最大亮度、最小亮度等。显示端可以采用以下两种方式获取元数据:
一种方法是接收码流,解码流以得到元数据。
本申请实施例中,采集端可以在获取元数据后,将元数据和待处理图像编码以得到码流,再将码流传输给显示端。相应的,显示端可以接收并解码码流,从而直接获取到元数据。
另一种方法是接收码流,解码流以得到采集场景的拍照参数,根据拍照参数计算得到元数据。
本申请实施例中,采集端可以将获取元数据所需的采集场景的拍照参数编入码流传输给显示端,这样显示端接收并解码码流得到的就是采集场景的拍照参数,然后再根据该拍照参数获取元数据。
示例性的,采集场景的拍照参数可以包括:真实的采集场景的最大亮度baseLum(255对应的亮度),标记出待处理图像中各个像素值对应的亮度,存为lumLUT[256],亦即,真实场景的亮度和图片像素值的一一对应关系,图像像素值的范围为0~255,共256个值,选取的基准就是让各个像素值和场景真实亮度值一一对应,用minval标记像素的图像灰阶,对应的感光度(ISO)baseISO,曝光时间baseExp。实拍时,N帧的ISO curISO,曝光时间curExp,S帧降电子伏特(electron volt,EV)对应的收益Dgain,亦即,S帧是通过降EV实现的,EV有不同大小,不同大小对应不同的Dgain,如果没有S帧,Dgain为1。因此,显示端可以采用以下方法计算采集场景的最大亮度maxLumScene和最小量度minLumScene:
maxLumScene=(baseISO/curISO)×Dgain×(curExp/baseExp)×baseLum
minLumScene=(baseISO/curISO)×Dgain×(curExp/baseExp)×lumLUT[minval]
需要说明的是，本申请实施例还可以采用其他方法获取采集场景的最大亮度和最小亮度，对此不做具体限定。
步骤1403、获取电子设备的初始背光亮度。
电子设备具有背光技术,因此可以根据周围环境设置电子设备的初始背光亮度,可以参照相关背光技术,不再赘述。
步骤1404、根据初始背光亮度获取电子设备的目标背光亮度。
为了得到较好的视觉体验,显示端可以结合元数据中与采集场景相关的亮度信息(例如,采集场景的动态范围、最大亮度、最小亮度等)对电子设备的背光进行调节,包括提升背光亮度或者降低背光亮度。相比于相关技术中,考虑到屏幕功耗而降低背光亮度的情况,可以提升背光亮度,以充分利用电子设备的屏幕的高动态范围(high dynamic range,HDR),因此电子设备的目标背光亮度高于电子设 备的初始背光亮度。
示例性的,显示端可以采用以下两种方法获取电子设备的目标背光亮度:
一种方法是根据预设背光调节比例对初始背光亮度进行处理以得到目标背光亮度。
显示端可以基于历史记录、大数据分析、电子设备的屏幕属性等预先设定一个比例,例如,背光提升比例(用于提升背光亮度,目标背光亮度>初始背光亮度)或者背光降低比例(用于降低背光亮度,目标背光亮度<初始背光亮度)。显示端可以根据该预设背光调节比例对初始背光亮度进行处理,例如,将预设背光调节比例与初始背光亮度相乘以得到目标背光亮度。
需要说明的是,上文描述的方法不构成限定,本申请实施例对预设背光调节比例的设置方式,以及目标背光亮度的获取方式均不作具体限定。
另一种方法是根据元数据获取背光调节比例;根据背光调节比例对初始背光亮度进行处理以得到目标背光亮度。
与上一方法的区别在于背光调节比例不是预先设定,可以由显示端计算得到。背光调节比例也可以是背光提升比例(用于提升背光亮度,目标背光亮度>初始背光亮度)或者背光降低比例(用于降低背光亮度,目标背光亮度<初始背光亮度)。
本申请实施例中,显示端可以根据采集场景的最大亮度获取第一比例,该第一比例是采集场景的人眼的亮度感知与白色漫反射感知的比例;根据第一比例获取第二比例,第二比例是显示端的人眼的亮度感知与白色漫反射感知的比例,该第二比例小于或等于第一比例;根据第二比例获取背光调节比例。
在不同的白色漫反射下，人眼对亮度感知通常满足幂函数，如图15所示，per=lum^(1/γ)。通常情况下，人眼在显示端的亮度感知小于在采集端的亮度感知，在白色漫反射相同的情况下，可以得到P2=a×P1，a表示预设系数，a≤1。而最理想的状态下，人眼在显示端的亮度感知与在采集端的亮度感知一样，在白色漫反射相同的情况下，可以得到P1=P2。
代入上述幂函数可以得到:
P1=L1^(1/γs)
其中,Lmax表示采集场景的最大亮度;

P2=(gainBL×L2)^(1/γd)
其中,gainBL表示背光调节比例;
AmbientLum表示环境光强度;
根据P1和P2的等式关系可以得到背光调节比例:
最理想状态下,a=1。
显示端可以根据背光调节比例对初始背光亮度进行处理,例如,将背光调节比例与初始背光亮度相乘以得到目标背光亮度。
需要说明的是,本申请实施例还可以采用其他方法获取背光调节比例,对此不做具体限定。
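由于P1、P2中部分符号（如L1、L2）的完整定义以公式图片形式给出，这里仅按上文列出的等式关系给出一个反解gainBL的假设性Python片段，作为计算背光调节比例的示意：
```python
def backlight_gain(l1, l2, gamma_s, gamma_d, a=1.0):
    """由 P2=a×P1、P1=l1**(1/gamma_s)、P2=(gain_bl×l2)**(1/gamma_d)
    反解背光调节比例gain_bl（l1、l2按上文公式中的符号含义取值，此处作为假设输入）。"""
    p1 = l1 ** (1.0 / gamma_s)
    return (a * p1) ** gamma_d / l2

# 目标背光亮度 = 背光调节比例 × 初始背光亮度（示例，取值均为假设）：
# target_backlight = backlight_gain(l1, l2, 2.2, 2.2, a=1.0) * initial_backlight
```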
步骤1405、根据hdrLayer对待处理图像进行亮度调节以得到适用于目标背光亮度的目标图像。
上述步骤中,以尽可能趋近于达到人眼在显示端的亮度感知等于在采集端的亮度感知的最理想状 态为目的,计算得到了电子设备的目标背光亮度,并将电子设备的背光亮度调节为目标背光亮度,使得待处理图像在显示端的显示效果符合人眼对真实采集场景下的亮度感知。但是,待处理图像中有一些HDR区域,经过前述背光调节后可能会失真,例如,当目标背光亮度大于初始背光亮度时,就是提升电子设备的背光亮度,此时待处理图像中的HDR区域可能会比较刺眼。
为了保证人眼在显示端的亮度感知尽可能趋近于在采集端的亮度感知,可以对待处理图像进行像素处理,调整部分区域的像素值,使得该部分区域的亮度和背光调节之前相同,避免刺眼。
本申请实施例可以采用以下方法调整部分区域的像素值:显示端根据hdrLayer获取目标权重。例如,显示端可以将hdrLayer中第一像素值除以预设阈值以得到第一像素值的第一权重值,该第一像素值是hdrLayer中的任意一个像素值,目标权重包括第一权重值;再根据目标权重对待处理图像进行亮度调节以得到目标图像。
例如,上述过程可以表示为如下公式:
pixelLow=pow(1/gainBL,1/2.2)×pixelSrc;
weight=hdrLayer/255;
pixelOut=pixelSrc×weight+pixelLow×(1–weight)
其中,pow(1/gainBL,1/2.2)表示像素调整系数;pixelSrc表示待处理图像中的任意一个像素值;pixelLow表示前述任意一个像素值经调整后的像素值;weight表示目标权重;pixelOut表示前述任意一个像素值对应的目标像素值。
待处理图像中的所有像素都可以采用上述方法处理后,得到目标图像。
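上述像素调整过程可以用如下NumPy草案表示（假设hdrLayer已缩放到与待处理图像相同的分辨率，图像为8bit，函数名与变量名均为示意）：
```python
import numpy as np

def adjust_pixels(pixel_src, hdr_layer, gain_bl):
    """pixel_src: 待处理图像(H, W, 3)；hdr_layer: 标记高亮区域的灰度图(H, W)；gain_bl: 背光调节比例。"""
    pixel_src = pixel_src.astype(np.float32)
    pixel_low = (1.0 / gain_bl) ** (1.0 / 2.2) * pixel_src      # pixelLow = pow(1/gainBL, 1/2.2) × pixelSrc
    weight = (hdr_layer.astype(np.float32) / 255.0)[..., None]  # weight = hdrLayer / 255
    pixel_out = pixel_src * weight + pixel_low * (1.0 - weight)
    return np.clip(pixel_out, 0, 255).astype(np.uint8)
```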
可选的,本申请实施例还可以将hdrLayer作为一个引导图片或参考图片,获取待处理图像中的像素值与目标图像中的像素值之间的对应关系,然后根据该对应关系对待处理图像的像素值进行处理,从而得到目标图像。
除此之外,本申请实施例还可以采用其他方法调整部分区域的像素值,获取目标图像,对此不做具体限定。
本申请实施例中，如果电子设备的背光亮度被提升，那么待处理图像的部分区域可能会过亮导致刺眼，因此基于hdrLayer的像素调整可以降低前述部分区域的像素亮度，从而避免刺眼；如果电子设备的背光亮度被降低，那么待处理图像的部分区域可能会过暗导致细节缺失，因此基于hdrLayer的像素调整可以提升前述部分区域的像素亮度，从而避免细节缺失。
步骤1406、在目标背光亮度下显示目标图像。
在步骤1404中获取了电子设备的目标背光亮度,因此可以基于电子设备的背光技术,对电子设备的屏幕亮度进行调节,从而使其达到目标背光亮度,再在该亮度下显示经步骤1405调节的目标图像,既可以解决因背光亮度提升,出现的待处理图像的部分区域过亮导致刺眼的问题,也可以解决背光亮度降低,出现的待处理图像的部分区域过暗导致细节缺失的问题。
图16为显示端的处理过程的示意图,如图16所示,显示端采用上述实施例所描述的方法获取待处理图像和元数据,并通过上述步骤1402的三种方法获取hdrLayer。通过元数据获取背光调节比例,进而对电子设备的背光亮度进行调节。通过hdrLayer对待处理图像中的像素进行调整。二者结合得到最终的目标图像,进而将目标图像送显。
本申请实施例,根据电子设备的初始背光亮度获取其目标背光亮度,从而对电子设备的背光亮度进行调节,以充分利用屏幕的亮度范围进行图像显示,同时对于待处理图像中由于亮度调节出现失真的区域,结合hdrLayer进行像素调整以得到适用于目标背光亮度的目标图像,从而解决图像失真的问题,再在目标背光亮度下显示目标图像,目标背光亮度和目标图像配合显示,实现了端到端呈现出最佳的效果体验。
图17为本申请实施例应用于电子设备的图像显示装置1700的一个示例性的结构示意图,如图17所示,本实施例的应用于电子设备的图像显示装置1700可以应用于解码端30。该应用于电子设备的图像显示装置1700可以包括:获取模块1701、调节模块1702和显示模块1703。其中,
获取模块1701,用于获取待处理图像;获取高亮增强数据,所述高亮增强数据包括高动态范围图层hdrLayer;获取所述电子设备的初始背光亮度;根据所述初始背光亮度获取所述电子设备的目标背 光亮度;调节模块1702,用于根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像;显示模块1703,用于在所述目标背光亮度下显示所述目标图像。
在一种可能的实现方式中,所述获取模块1701,具体用于接收码流,解码所述码流以得到所述hdrLayer。
在一种可能的实现方式中,所述获取模块1701,具体用于接收码流,解码所述码流以得到N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
在一种可能的实现方式中,所述获取模块1701,具体用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
在一种可能的实现方式中,所述获取模块1701,具体用于根据预设背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
在一种可能的实现方式中,所述高亮增强数据还包括元数据;所述获取模块1701,具体用于根据所述元数据获取背光调节比例;根据所述背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
在一种可能的实现方式中,所述调节模块1702,具体用于根据所述hdrLayer获取目标权重;根据所述目标权重对所述待处理图像进行亮度调节以得到所述目标图像。
在一种可能的实现方式中,所述调节模块1702,具体用于将所述hdrLayer中第一像素值除以预设阈值以得到所述第一像素值的第一权重值,所述第一像素值是所述hdrLayer中的任意一个像素值,所述目标权重包括所述第一权重值。
在一种可能的实现方式中,所述调节模块1702,具体用于获取像素调整系数;根据所述像素调整系数和所述待处理图像获取经调整的图像;根据所述待处理图像、所述经调整的图像和所述目标权重获取所述目标图像。
在一种可能的实现方式中,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
在一种可能的实现方式中,所述元数据包括采集场景的最大亮度;所述获取模块1701,具体用于根据所述采集场景的最大亮度获取第一比例,所述第一比例是所述采集场景的人眼的亮度感知与白色漫反射感知的比例;根据所述第一比例获取第二比例,所述第二比例是显示端的人眼的亮度感知与白色漫反射感知的比例,所述第二比例小于或等于所述第一比例;根据所述第二比例获取所述背光调节比例。
在一种可能的实现方式中,所述获取模块1701,具体用于根据以下公式计算得到所述第一比例:
P1=L1^(1/γs)
其中,P1表示所述第一比例;
Lmax表示所述采集场景的最大亮度;
在一种可能的实现方式中,所述获取模块1701,具体用于根据以下公式计算得到所述第二比例:P2=a×P1
其中,P2表示所述第二比例;
a表示预设系数,a≤1。
在一种可能的实现方式中,所述获取模块1701,具体用于根据以下公式计算得到所述背光调节比例:
其中,gainBL表示所述背光调节比例;
AmbientLum表示环境光强度;
在一种可能的实现方式中,所述元数据是通过以下方式获取的:解码码流以得到所述元数据;或者,接收码流,解码码流以得到采集场景的拍照参数,再根据所述拍照参数计算得到所述元数据。
在一种可能的实现方式中,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
本实施例的装置,可以用于执行图14所示方法实施例的技术方案,其实现原理和技术效果类似,此处不再赘述。
图18为本申请实施例编码装置1800的一个示例性的结构示意图,如图18所示,本实施例的编码装置1800可以应用于编码端20。该编码装置1800可以包括:获取模块1801和编码模块1802。其中,
获取模块1801,用于获取待处理图像;获取元数据,所述元数据包括采集场景的最大亮度;编码模块1802,用于对所述待处理图像和所述元数据进行编码以得到第一码流。
在一种可能的实现方式中,所述获取模块1801,还用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;所述编码模块1802,还用于对所述N×M组参数进行编码以得到第二码流。
在一种可能的实现方式中,所述获取模块1801,还用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到高动态范围图层hdrLayer;所述编码模块,还用于对所述hdrLayer进行编码以得到第三码流。
在一种可能的实现方式中,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
在一种可能的实现方式中,所述获取模块1801,具体用于根据长曝光图片、正常曝光图片和短曝光图片获取所述元数据;或者,根据预先设定的拍照参数计算得到所述元数据。
在一种可能的实现方式中,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
本实施例的装置,可以用于执行图5所示方法实施例的技术方案,其实现原理和技术效果类似,此处不再赘述。
此外,本申请还提供一种计算机可读存储介质,包括计算机程序,所述计算机程序在计算机上被执行时,使得所述计算机执行图5或图14所示方法实施例的技术方案。
本申请还提供一种计算机程序产品,所述计算机程序产品中包含指令,其特征在于,当所述指令在计算机或处理器上运行时,使得所述计算机或所述处理器实现图5或图14所示方法实施例的技术方案。
本申请还提供一种码流,该码流可以存储在计算机可读存储介质中,或通过电磁波等信号形式进行传输,该码流中包括经编码的图像数据和元数据,元数据包括采集场景的最大亮度。采集场景为采集编码前的图像时的场景。
本申请还提供一种芯片系统,其特征在于,所述芯片系统包括逻辑电路和输入输出接口,其中:所述输入输出接口用于与所述芯片系统之外的其他通信装置进行通信,所述逻辑电路用于执行图5或图14所示方法实施例的技术方案。
在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式 的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、特定应用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请实施例公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
上述各实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (48)

  1. 一种应用于电子设备的图像显示方法,其特征在于,包括:
    获取待处理图像;
    获取高亮增强数据,所述高亮增强数据包括高动态范围图层hdrLayer;
    获取所述电子设备的初始背光亮度;
    根据所述初始背光亮度,获取所述电子设备的目标背光亮度;
    根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像;
    在所述目标背光亮度下显示所述目标图像。
  2. 根据权利要求1所述的方法,其特征在于,所述获取高亮增强数据,包括:
    接收码流,解码所述码流以得到所述hdrLayer。
  3. 根据权利要求1所述的方法,其特征在于,所述获取高亮增强数据,包括:
    接收码流,解码所述码流以得到N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;
    根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
  4. 根据权利要求1所述的方法,其特征在于,所述获取高亮增强数据,包括:
    生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;
    根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述根据所述初始背光亮度,获取所述电子设备的目标背光亮度,包括:
    根据预设背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
  6. 根据权利要求1-4中任一项所述的方法,其特征在于,所述高亮增强数据还包括元数据;所述根据所述初始背光亮度,获取所述电子设备的目标背光亮度,包括:
    根据所述元数据获取背光调节比例;
    根据所述背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
  7. 根据权利要求1-6中任一项所述的方法,其特征在于,所述根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像,包括:
    根据所述hdrLayer获取目标权重;
    根据所述目标权重对所述待处理图像进行亮度调节以得到所述目标图像。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述hdrLayer获取目标权重,包括:
    将所述hdrLayer中第一像素值除以预设阈值以得到所述第一像素值的第一权重值,所述第一像素值是所述hdrLayer中的任意一个像素值,所述目标权重包括所述第一权重值。
  9. 根据权利要求7或8所述的方法,其特征在于,所述根据所述目标权重对所述待处理图像进行亮度调节以得到目标图像,包括:
    获取像素调整系数;
    根据所述像素调整系数和所述待处理图像获取经调整的图像;
    根据所述待处理图像、所述经调整的图像和所述目标权重获取所述目标图像。
  10. 根据权利要求4所述的方法,其特征在于,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
  11. 根据权利要求6所述的方法,其特征在于,所述元数据包括采集场景的最大亮度;所述根据所述元数据获取背光调节比例,包括:
    根据所述采集场景的最大亮度获取第一比例,所述第一比例是所述采集场景的人眼的亮度感知与白色漫反射感知的比例;
    根据所述第一比例获取第二比例,所述第二比例是显示端的人眼的亮度感知与白色漫反射感知的比例,所述第二比例小于或等于所述第一比例;
    根据所述第二比例获取所述背光调节比例。
  12. 根据权利要求11所述的方法,其特征在于,所述根据所述采集场景的最大亮度获取第一比例,包括:
    根据以下公式计算得到所述第一比例:
    P1=L1^(1/γs)
    其中,P1表示所述第一比例;
    Lmax表示所述采集场景的最大亮度;
  13. 根据权利要求12所述的方法,其特征在于,所述根据所述第一比例获取第二比例,包括:
    根据以下公式计算得到所述第二比例:
    P2=a×P1
    其中,P2表示所述第二比例;
    a表示预设系数,a≤1。
  14. 根据权利要求13所述的方法,其特征在于,所述根据所述第二比例获取所述背光调节比例,包括:
    根据以下公式计算得到所述背光调节比例:
    其中,gainBL表示所述背光调节比例;
    AmbientLum表示环境光强度;
  15. 根据权利要求6所述的方法,其特征在于,所述元数据是通过以下方式获取的:
    解码码流以得到所述元数据;或者,
    解码码流以得到采集场景的拍照参数,再根据所述拍照参数计算得到所述元数据。
  16. 根据权利要求6所述的方法,其特征在于,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
  17. 一种编码方法,其特征在于,所述方法包括:
    获取待处理图像;
    获取元数据,所述元数据包括采集场景的最大亮度;
    对所述待处理图像和所述元数据进行编码以得到第一码流。
  18. 根据权利要求17所述的方法,其特征在于,还包括:
    生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;
    对所述N×M组参数进行编码以得到第二码流。
  19. 根据权利要求17所述的方法,其特征在于,还包括:
    生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;
    根据所述N×M组参数分别对对应的所述图像块进行处理以得到高动态范围图层hdrLayer;
    对所述hdrLayer进行编码以得到第三码流。
  20. 根据权利要求18或19所述的方法,其特征在于,所述N×M组参数通过机器学习模型得到; 或者,所述N×M组参数基于所述待处理图像的直方图得到。
  21. 根据权利要求17-20中任一项所述的方法,其特征在于,所述获取元数据,包括:
    根据长曝光图片、正常曝光图片和短曝光图片获取所述元数据;或者,根据预先设定的拍照参数计算得到所述元数据。
  22. 根据权利要求17-21中任一项所述的方法,其特征在于,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
  23. 一种应用于电子设备的图像显示装置,其特征在于,包括:
    获取模块,用于获取待处理图像;获取高亮增强数据,所述高亮增强数据包括高动态范围图层hdrLayer;获取所述电子设备的初始背光亮度;根据所述初始背光亮度,获取所述电子设备的目标背光亮度;
    调节模块,用于根据所述hdrLayer对所述待处理图像进行亮度调节以得到适用于所述目标背光亮度的目标图像;
    显示模块,用于在所述目标背光亮度下显示所述目标图像。
  24. 根据权利要求23所述的装置,其特征在于,所述获取模块,具体用于接收码流,解码所述码流以得到所述hdrLayer。
  25. 根据权利要求23所述的装置,其特征在于,所述获取模块,具体用于接收码流,解码所述码流以得到N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
  26. 根据权利要求23所述的装置,其特征在于,所述获取模块,具体用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到所述hdrLayer。
  27. 根据权利要求23-26中任一项所述的装置,其特征在于,所述获取模块,具体用于根据预设背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
  28. 根据权利要求23-26中任一项所述的装置,其特征在于,所述高亮增强数据还包括元数据;所述获取模块,具体用于根据所述元数据获取背光调节比例;根据所述背光调节比例对所述初始背光亮度进行处理以得到所述目标背光亮度。
  29. 根据权利要求23-28中任一项所述的装置,其特征在于,所述调节模块,具体用于根据所述hdrLayer获取目标权重;根据所述目标权重对所述待处理图像进行亮度调节以得到所述目标图像。
  30. 根据权利要求29所述的装置,其特征在于,所述调节模块,具体用于将所述hdrLayer中第一像素值除以预设阈值以得到所述第一像素值的第一权重值,所述第一像素值是所述hdrLayer中的任意一个像素值,所述目标权重包括所述第一权重值。
  31. 根据权利要求29或30所述的装置,其特征在于,所述调节模块,具体用于获取像素调整系数;根据所述像素调整系数和所述待处理图像获取经调整的图像;根据所述待处理图像、所述经调整的图像和所述目标权重获取所述目标图像。
  32. 根据权利要求26所述的装置,其特征在于,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
  33. 根据权利要求28所述的装置,其特征在于,所述元数据包括采集场景的最大亮度;所述获取模块,具体用于根据所述采集场景的最大亮度获取第一比例,所述第一比例是所述采集场景的人眼的亮度感知与白色漫反射感知的比例;根据所述第一比例获取第二比例,所述第二比例是显示端的人眼的亮度感知与白色漫反射感知的比例,所述第二比例小于或等于所述第一比例;根据所述第二比例获取所述背光调节比例。
  34. 根据权利要求33所述的装置,其特征在于,所述获取模块,具体用于根据以下公式计算得到所述第一比例:
    P1=L1^(1/γs)
    其中,P1表示所述第一比例;
    Lmax表示所述采集场景的最大亮度;
  35. 根据权利要求34所述的装置,其特征在于,所述获取模块,具体用于根据以下公式计算得到所述第二比例:
    P2=a×P1
    其中,P2表示所述第二比例;
    a表示预设系数,a≤1。
  36. 根据权利要求35所述的装置,其特征在于,所述获取模块,具体用于根据以下公式计算得到所述背光调节比例:
    其中,gainBL表示所述背光调节比例;
    AmbientLum表示环境光强度;
  37. 根据权利要求28所述的装置,其特征在于,所述元数据是通过以下方式获取的:
    解码码流以得到所述元数据;或者,
    解码码流以得到采集场景的拍照参数,再根据所述拍照参数计算得到所述元数据。
  38. 根据权利要求28所述的装置,其特征在于,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
  39. 一种编码装置,其特征在于,包括:
    获取模块,用于获取待处理图像;获取元数据,所述元数据包括采集场景的最大亮度;
    编码模块,用于对所述待处理图像和所述元数据进行编码以得到第一码流。
  40. 根据权利要求39所述的装置,其特征在于,所述获取模块,还用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;
    所述编码模块,还用于对所述N×M组参数进行编码以得到第二码流。
  41. 根据权利要求39所述的装置,其特征在于,所述获取模块,还用于生成N×M组参数,每组参数包括k个参数,所述N×M组参数与所述待处理图像包括的N×M个图像块对应,N和M均为正整数,N×M>1,k>1;根据所述N×M组参数分别对对应的所述图像块进行处理以得到高动态范围图层hdrLayer;
    所述编码模块,还用于对所述hdrLayer进行编码以得到第三码流。
  42. 根据权利要求40或41所述的装置,其特征在于,所述N×M组参数通过机器学习模型得到;或者,所述N×M组参数基于所述待处理图像的直方图得到。
  43. 根据权利要求39-42中任一项所述的装置,其特征在于,所述获取模块,具体用于根据长曝光图片、正常曝光图片和短曝光图片获取所述元数据;或者,根据预先设定的拍照参数计算得到所述元数据。
  44. 根据权利要求39-43中任一项所述的装置,其特征在于,所述元数据还包括所述采集场景的最小亮度和/或所述采集场景的动态范围。
  45. 一种解码器,其特征在于,包括:
    一个或多个处理器;
    存储器,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-16中任一项所述的方法。
  46. 一种编码器,其特征在于,包括:
    一个或多个处理器;
    存储器,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求17-22中任一项所述的方法。
  47. 一种计算机可读存储介质,其特征在于,包括计算机程序,所述计算机程序在计算机上被执行时,使得所述计算机执行权利要求1-22中任一项所述的方法。
  48. 一种计算机程序产品,所述计算机程序产品中包含指令,其特征在于,当所述指令在计算机或处理器上运行时,使得所述计算机或所述处理器实现上述权利要求1-22中任一项所述的方法。
PCT/CN2023/104105 2022-07-15 2023-06-29 应用于电子设备的图像显示方法、编码方法及相关装置 WO2024012227A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210831137.6 2022-07-15
CN202210831137.6A CN117478902A (zh) 2022-07-15 2022-07-15 应用于电子设备的图像显示方法、编码方法及相关装置

Publications (1)

Publication Number Publication Date
WO2024012227A1 true WO2024012227A1 (zh) 2024-01-18

Family

ID=89535527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104105 WO2024012227A1 (zh) 2022-07-15 2023-06-29 应用于电子设备的图像显示方法、编码方法及相关装置

Country Status (2)

Country Link
CN (1) CN117478902A (zh)
WO (1) WO2024012227A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620819A (zh) * 2009-06-25 2010-01-06 北京中星微电子有限公司 显示图像背光亮度的动态调整方法、装置及移动显示设备
CN107545871A (zh) * 2017-09-30 2018-01-05 青岛海信电器股份有限公司 图像亮度处理方法及装置
US20180220101A1 (en) * 2017-01-27 2018-08-02 Microsoft Technology Licensing, Llc Content-adaptive adjustment of display device brightness levels when rendering high dynamic range content
CN108510955A (zh) * 2018-04-23 2018-09-07 Oppo广东移动通信有限公司 调整显示屏亮度的方法以及相关产品
CN113496685A (zh) * 2020-04-08 2021-10-12 华为技术有限公司 一种显示亮度调整方法及相关装置
CN114639356A (zh) * 2022-03-14 2022-06-17 Oppo广东移动通信有限公司 显示亮度调节方法、装置、电子设备和计算机可读存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620819A (zh) * 2009-06-25 2010-01-06 北京中星微电子有限公司 显示图像背光亮度的动态调整方法、装置及移动显示设备
US20180220101A1 (en) * 2017-01-27 2018-08-02 Microsoft Technology Licensing, Llc Content-adaptive adjustment of display device brightness levels when rendering high dynamic range content
CN107545871A (zh) * 2017-09-30 2018-01-05 青岛海信电器股份有限公司 图像亮度处理方法及装置
CN108510955A (zh) * 2018-04-23 2018-09-07 Oppo广东移动通信有限公司 调整显示屏亮度的方法以及相关产品
CN113496685A (zh) * 2020-04-08 2021-10-12 华为技术有限公司 一种显示亮度调整方法及相关装置
CN114639356A (zh) * 2022-03-14 2022-06-17 Oppo广东移动通信有限公司 显示亮度调节方法、装置、电子设备和计算机可读存储介质

Also Published As

Publication number Publication date
CN117478902A (zh) 2024-01-30

Similar Documents

Publication Publication Date Title
WO2020192483A1 (zh) 图像显示方法和设备
US20230214976A1 (en) Image fusion method and apparatus and training method and apparatus for image fusion model
US20220188999A1 (en) Image enhancement method and apparatus
US20230080693A1 (en) Image processing method, electronic device and readable storage medium
WO2021135657A1 (zh) 图像处理方法、装置和图像处理系统
WO2021164731A1 (zh) 图像增强方法以及图像增强装置
CN113066017B (zh) 一种图像增强方法、模型训练方法及设备
CN111292264A (zh) 一种基于深度学习的图像高动态范围重建方法
CN110751649B (zh) 视频质量评估方法、装置、电子设备及存储介质
CN110555800A (zh) 图像处理装置及方法
WO2022073282A1 (zh) 一种基于特征交互学习的动作识别方法及终端设备
CN116803079A (zh) 视频和相关特征的可分级译码
WO2024002211A1 (zh) 一种图像处理方法及相关装置
WO2023151511A1 (zh) 模型训练方法、图像去摩尔纹方法、装置及电子设备
US20230209096A1 (en) Loop filtering method and apparatus
CN113096021A (zh) 一种图像处理方法、装置、设备及存储介质
CN113920010A (zh) 图像帧的超分辨率实现方法和装置
CN114915783A (zh) 编码方法和装置
US20240037802A1 (en) Configurable positions for auxiliary information input into a picture data processing neural network
US20240161488A1 (en) Independent positioning of auxiliary information in neural network based picture processing
TWI826160B (zh) 圖像編解碼方法和裝置
WO2023193629A1 (zh) 区域增强层的编解码方法和装置
WO2024012227A1 (zh) 应用于电子设备的图像显示方法、编码方法及相关装置
WO2023010981A1 (zh) 编解码方法及装置
CN115330633A (zh) 图像色调映射方法及装置、电子设备、存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23838720

Country of ref document: EP

Kind code of ref document: A1