CN108337516B - Multi-user-oriented HDR video dynamic range scalable coding method - Google Patents


Publication number
CN108337516B
Authority
CN
China
Prior art keywords
dynamic range
video
hdr video
quantization
residual signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810094956.0A
Other languages
Chinese (zh)
Other versions
CN108337516A (en)
Inventor
蒋刚毅
陈璐俊
郁梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yiqilin Cultural Media Co ltd
Nantian Shujin Beijing Information Industry Development Co ltd
Shanghai Ruishenglian Information Technology Co ltd
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN201810094956.0A priority Critical patent/CN108337516B/en
Publication of CN108337516A publication Critical patent/CN108337516A/en
Application granted granted Critical
Publication of CN108337516B publication Critical patent/CN108337516B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N19/172: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/186: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a colour or a chrominance component
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/98: Adaptive-dynamic-range coding [ADRC]

Abstract

The invention relates to a multi-user-oriented HDR video dynamic range scalable coding method. First, taking the perceptual characteristics of HDR video into account, a dynamic range scalable model is proposed that decomposes HDR video at different dynamic range levels into one standard dynamic range video and several residual signal frame sequences. Next, exploiting the luminance masking effect and the perceptual characteristics of the human eye, the residual signal frame sequences are quantized and filtered: scattered data points in the residual are filtered out while the overall difference information between adjacent dynamic range levels is retained, which improves the coding efficiency of the residual signal frame sequences. Finally, the decoder reconstructs the standard dynamic range video and the HDR videos at each dynamic range level, so that the stream suits the display devices of multiple users. The method also uses visual masking to suppress noise, which improves the efficiency of dynamic range scalable coding.

Description

Multi-user-oriented HDR video dynamic range scalable coding method
Technical Field
The invention relates to the technical field of video coding and decoding, in particular to a multi-user-oriented HDR video dynamic range scalable coding method.
Background
High dynamic range (HDR) video is a new research focus in digital media: it reproduces real scenes more faithfully and gives users a more realistic visual experience. Display devices at the user end of traditional video systems are mainly 8-bit standard dynamic range (SDR) displays, which cannot display HDR video directly. A small number of existing HDR displays show HDR video fully or partially by extending the luminance range, but their dynamic ranges differ, for example 10 bits or 12 bits. In existing HDR video coding and transmission, it is difficult for one HDR video stream to serve multiple users whose display devices have different dynamic ranges. The server of the video system therefore has to account, during HDR video encoding, for end users' need to display multi dynamic range (MDR) video. Dynamic range scalable coding of HDR video is an effective way to meet this need.
SDR video is generally stored at an 8-bit quantization depth, and its dynamic range is fixed. Existing scalable video coding techniques mainly consider temporal, spatial and quality scalability, so that users can select different frame rates, image resolutions and image quality levels. In October 2014, the Joint Collaborative Team on Video Coding (JCT-VC) released the Scalable High Efficiency Video Coding (SHVC) standard on the basis of the High Efficiency Video Coding (HEVC) standard. It mainly refined the coding tools and decoding process for scalable coding; its Scalable Main and Scalable Main 10 profiles provide temporal, spatial and quality scalable coding of SDR video and HDR video respectively. In February 2015, to improve the compression performance of SHVC on HDR video, the Moving Picture Experts Group (MPEG) issued a Call for Evidence (CfE), aiming at efficient coding of HDR and wide color gamut (WCG) video by extending the HEVC standard and adding new techniques. Techniques for processing HDR video content, such as perceptual quantization (PQ) coding and the Hybrid Log-Gamma (HLG) non-linear transfer function, have since been proposed, and subjective test experiments on HDR video have been carried out.
Although the extended versions of HEVC can encode HDR video, HEVC is a coding standard designed for SDR video: it mainly encodes a single video with a fixed dynamic range and is severely limited in dynamic range scalability. Some researchers have therefore proposed new HDR video coding schemes. Rusanovskyy et al. proposed encoding HDR video with a backward-compatible scheme based on Dynamic Range Adaptation (DRA), which performs better than a non-backward-compatible scheme. Mir et al. proposed an improved dual-layer backward-compatible HDR video coding scheme and compared its performance with single-layer HLG. These backward-compatible methods still fall short: they serve only users with two different dynamic ranges and cannot jointly encode videos of several dynamic ranges. Given the dynamic range scalability that MDR users require, efficient HDR video coding is needed that is both friendly to user-side display devices and compatible with the HEVC standard. In addition, the quantization depth of video pixels expresses the dynamic range: the larger the quantization depth, the wider the dynamic range that can be represented. HDR displays with pixel quantization depths of 10 bits and 12 bits are already on the market, and displays with higher dynamic ranges can be expected in the near future. Research on HEVC-compatible dynamic range scalable coding of HDR video is therefore an effective way to serve the HDR video streaming needs of MDR users simultaneously.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-user-oriented HDR video dynamic range scalable coding method that allows the same HDR video stream to be displayed simultaneously on the display devices of multiple users with different dynamic ranges.
The technical solution adopted by the invention is a multi-user-oriented HDR video dynamic range scalable coding method comprising the following steps:
(1) converting the input HDR video, through a conversion process based on perceptual quantization (PQ), into HDR video sequences at a plurality of dynamic range levels represented by different quantization depths;
(2) establishing a dynamic range scalable model (DRSM) that decomposes an HDR video frame into one SDR base frame and a plurality of residual signal frames (RSFs), where each RSF represents the difference information between two adjacent dynamic range levels, and recording the maximum and minimum values of the original RSFs;
(3) applying median filtering as preprocessing to the RSF sequences on the basis of statistical analysis and perceptual analysis, using the luminance masking effect of the human eye to filter out pixels in the RSFs that have little influence on perceived quality while keeping the overall difference that the RSFs represent;
(4) encoding the processed RSF sequences and the SDR sequence separately with the same HEVC encoder into a dynamic-range-layered video bitstream, and encoding and transmitting the recorded maximum and minimum values of the RSFs as supplemental enhancement information (SEI) to assist HDR video reconstruction at the decoder;
(5) decoding the bitstream and reconstructing, through the inverse process of the DRSM at the decoder, SDR and HDR videos at the different quantization depths, so that the HDR video content can be displayed on the MDR display devices of multiple users.
The beneficial effects of the invention are as follows. Using a dynamic range scalable model (DRSM) that takes the perceptual characteristics of HDR video into account, the method decomposes an HDR video stream into one standard dynamic range (SDR) video and several residual signal frame (RSF) sequences, forming a bitstream layered by dynamic range that serves the multi dynamic range display devices of multiple users. In addition, the RSFs are filtered on the basis of the luminance masking effect and the perceptual characteristics of the human eye, which improves the coding efficiency of the RSFs and hence of the coding method as a whole.
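To make the layering concrete, the following minimal sketch lists which sub-streams a client would need for a given display bit depth. It assumes an 8-bit SDR base layer and one RSF layer per 2-bit step, as in the steps above; the function name `layers_for_display` and the stream labels are illustrative and do not come from the patent.

```python
def layers_for_display(bit_depth: int, base_depth: int = 8, step: int = 2) -> list:
    """Return the sub-streams needed to reconstruct video at `bit_depth`.

    Assumes an SDR base layer at `base_depth` bits and one residual signal
    frame (RSF) layer per `step`-bit increase in quantization depth.
    """
    if bit_depth < base_depth or (bit_depth - base_depth) % step != 0:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    n_rsf = (bit_depth - base_depth) // step
    return ["SDR"] + [f"RSF{i}" for i in range(1, n_rsf + 1)]


if __name__ == "__main__":
    for depth in (8, 10, 12):
        print(depth, layers_for_display(depth))
```

An 8-bit client decodes only the SDR stream, while a 12-bit client needs the SDR stream plus both residual layers.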
In step (1), the perceptual quantization (PQ)-based conversion of the input HDR video comprises the following steps:
firstly, converting the HDR RGB image data in the original OpenEXR format into R'G'B' in the perceptual domain through the non-linear PQ function;
secondly, converting the color space from R'G'B' to Y'CbCr through a 3 × 3 conversion matrix;
thirdly, quantizing the converted data into integer data of different bit depths, namely:
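\[
D_{Y'} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(219 \cdot 2^{b-8} \cdot Y' + 2^{b-4}\right)\right)
\]
\[
D_{Cb} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cb + 2^{b-1}\right)\right)
\]
\[
D_{Cr} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cr + 2^{b-1}\right)\right)
\]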
where $(Y', Cb, Cr)$ is the 4:4:4 floating-point data obtained from the color space conversion, $(D_{Y'}, D_{Cb}, D_{Cr})$ is the quantized integer data, Clip3(·) is a clipping function bounded on both sides, $219 \cdot 2^{b-8}$ is the luma scale, $2^{b-4}$ is the luma signal offset, $224 \cdot 2^{b-8}$ is the chroma scale, $2^{b-1}$ is the color-difference signal offset, $b$ is the quantization depth, and Round(·) is the rounding function;
fourthly, down-sampling the 4:4:4 chroma format to the 4:2:0 chroma format to obtain a Y'CbCr video sequence suited to the subsequent HEVC coding system.
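The first and third conversion steps can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the PQ constants are those of SMPTE ST 2084, the quantization follows the scales and offsets defined above with clipping to 0..2^b − 1, and it is assumed that Y' lies in [0, 1], that Cb and Cr lie in [−0.5, 0.5], and that Round(·) rounds halves upward. All function names are illustrative.

```python
import math

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(nits: float) -> float:
    """Map linear luminance in cd/m^2 (0..10000) to a PQ signal in [0, 1]."""
    y = max(nits, 0.0) / 10000.0
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


def clip3(lo: int, hi: int, v: int) -> int:
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def round_half_up(v: float) -> int:
    # Assumption: Round() rounds halves upward (Python's round() rounds halves to even).
    return math.floor(v + 0.5)


def quantize_luma(y: float, b: int) -> int:
    """Quantize Y' in [0, 1] to a b-bit code: scale 219*2^(b-8), offset 2^(b-4)."""
    return clip3(0, 2 ** b - 1, round_half_up(219 * 2 ** (b - 8) * y + 2 ** (b - 4)))


def quantize_chroma(c: float, b: int) -> int:
    """Quantize Cb or Cr in [-0.5, 0.5] to a b-bit code: scale 224*2^(b-8), offset 2^(b-1)."""
    return clip3(0, 2 ** b - 1, round_half_up(224 * 2 ** (b - 8) * c + 2 ** (b - 1)))


if __name__ == "__main__":
    for b in (8, 10, 12):
        print(b, quantize_luma(pq_encode(100.0), b), quantize_chroma(0.0, b))
```

Running the same floating-point sample through `quantize_luma` at b = 8, 10 and 12 yields the three dynamic range levels that the later steps difference against each other.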
In step (2), the dynamic range scalable model (DRSM) is established as follows:
firstly, dynamic range up-sampling is applied to the video content at the lower dynamic range level to obtain HDR video at the next higher quantization depth, namely $V'_d(x,y) = V_{d-\Delta d}(x,y) \ll 2$, $d \in \{10, 12, 14, 16\}$, $\Delta d = 2$, where $V'_d$ is the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling, and its dynamic range is one level higher than that of $V_{d-\Delta d}$;
secondly, the up-sampled sequence is differenced with the originally converted HDR video sequence at the same dynamic range level; to suit the HEVC encoder for SDR video, the residual obtained from this decomposition is quantized into RSFs with the same quantization depth as the SDR sequence, which represent the difference information between two adjacent dynamic range levels, namely
Figure BDA0001564789160000032
where $d \in \{10, 12, 14, 16\}$, $i \in N^*$, $V'_d$ is the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling, and $V_d$ is the original HDR video sequence; that is, the normalized residual data are further quantized to the data range of the SDR video frame's quantization depth, so that they can be encoded in the same way as SDR video frames;
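The decomposition can be sketched as follows. The equation for the residual is not reproduced in this text, so two points are assumptions made only for illustration: the residual is taken as the original level minus the up-sampled prediction, and the uniform quantizer is taken as Q(p) = Round((p − p_min) / (p_max − p_min) · (2^8 − 1)). Frames are flat lists of integer samples, and all names are illustrative.

```python
import math


def upsample(v_low):
    """Dynamic range up-sampling: V'_d = V_{d-2} << 2, element-wise."""
    return [p << 2 for p in v_low]


def quantize_residual(residual, bits=8):
    """Uniformly quantize a residual frame to `bits` bits.

    Assumed form: Q(p) = round((p - p_min) / (p_max - p_min) * (2^bits - 1)).
    Returns the quantized frame and (p_min, p_max), the side information
    that the method transmits as SEI.
    """
    p_min, p_max = min(residual), max(residual)
    span = p_max - p_min
    top = 2 ** bits - 1
    if span == 0:
        # Degenerate frame: every residual is identical, so one code suffices.
        return [0] * len(residual), (p_min, p_max)
    rsf = [math.floor((p - p_min) / span * top + 0.5) for p in residual]
    return rsf, (p_min, p_max)


def decompose(v_low, v_high):
    """Return the quantized RSF between two adjacent dynamic range levels."""
    predicted = upsample(v_low)
    # Assumed sign convention: residual = original - up-sampled prediction.
    residual = [o - p for o, p in zip(v_high, predicted)]
    return quantize_residual(residual)


if __name__ == "__main__":
    v8 = [16, 100, 235]    # toy 8-bit samples
    v10 = [66, 401, 938]   # toy 10-bit samples of the same pixels
    print(decompose(v8, v10))
```

Because each RSF is mapped to the 8-bit range, it can be fed to the same encoder configuration as the SDR base frames.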
In step (3), median filtering of the RSF sequences proceeds as follows:
firstly, according to the luminance masking effect in human visual perception, the human eye has low sensitivity to detail in flat regions and low sensitivity to distortion in complex regions. For a flat region of a picture, filtering the corresponding region of the RSFs removes information to which the human eye is insensitive; for a region with complex content, the corresponding region of the RSFs contains little information, and filtering does not affect the expression of its valuable content;
secondly, the pixel value characteristics of RSFs_o, the RSFs of the BalloonFestival sequence before quantization, are counted;
thirdly, statistical analysis of the RSFs shows that complex regions contain a large amount of isolated noise points, flat regions contain isolated data points that users do not easily perceive, and regions containing both foreground and background carry easily perceived edge and texture information;
fourthly, the human eye exhibits luminance masking: it is sensitive to texture and detail in a scene that is uniformly bright or uniformly dark, and insensitive to them in a scene containing bright and dark regions at the same time. Analysis shows that most HDR video scenes contain bright and dark regions at the same time, so the RSFs can be preprocessed by median filtering, which smooths the content at the corresponding positions of the RSFs while preserving the overall difference between adjacent dynamic range levels.
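A plain median filter already shows the behaviour described above: isolated points disappear while step edges survive. The sketch below works on a 2-D list of samples; border handling by edge replication is an assumption, since the text does not specify it, and the function name is illustrative.

```python
def median_filter(frame, window=3):
    """Apply a window x window median filter to a 2-D list of samples.

    Borders are handled by replicating edge samples (an assumption; the
    source does not specify border handling).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    height, width = len(frame), len(frame[0])
    r = window // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clamping coordinates at the borders.
            values = sorted(
                frame[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            )
            out[y][x] = values[len(values) // 2]
    return out


if __name__ == "__main__":
    flat = [[128] * 5 for _ in range(5)]
    flat[2][2] = 255  # isolated data point in a flat region
    print(median_filter(flat, 3)[2])
```

Larger windows remove larger isolated structures, which is the trade-off explored in the experiments with 3 × 3 to 15 × 15 windows.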
In step (5), HDR video reconstruction at the decoder through the inverse process of the DRSM proceeds as follows:
firstly, the SDR video for standard dynamic range display devices is obtained by decoding the SDR video bitstream directly with an HEVC decoder;
secondly, the HDR video for high dynamic range display devices is reconstructed through the inverse process of the DRSM, namely:
Figure BDA0001564789160000041
$d \in \{10, 12, 14, 16\}$, $\Delta d = 2$, $i \in N^*$, where
Figure BDA0001564789160000042
denotes the reconstructed HDR video sequence with a dynamic range of $d$ bits,
Figure BDA0001564789160000043
denotes the reconstructed HDR video sequence at the next lower dynamic range level, $d - \Delta d$ bits,
Figure BDA0001564789160000044
denotes the pixel value at coordinate position $(x, y)$ in the reconstructed $d$-bit video frame; if the resolution of the video frame is $L \times W$, then $(x, y) \in \{(x, y) \mid x = 0, 1, 2, \dots, L-1;\ y = 0, 1, 2, \dots, W-1\}$;
Figure BDA0001564789160000045
denotes the inverse quantization of the pixel value $p$, namely:
Figure BDA0001564789160000046
where $p_{max}$ and $p_{min}$ are obtained from the supplemental enhancement information.
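One reconstruction level can be sketched as below. The equations are not reproduced in this text, so the forms used here are assumptions for illustration: inverse quantization is taken as Q_inv(p) = p / (2^8 − 1) · (p_max − p_min) + p_min, rounded to an integer, and the higher level is taken as the left-shifted lower level plus the dequantized residual. Function names are illustrative.

```python
import math


def dequantize(code, p_min, p_max, bits=8):
    """Assumed inverse of uniform quantization: map 0..2^bits-1 back to [p_min, p_max]."""
    top = 2 ** bits - 1
    return math.floor(code / top * (p_max - p_min) + p_min + 0.5)


def reconstruct(v_low_rec, rsf, p_min, p_max):
    """Rebuild the next dynamic range level from the lower level and its RSF.

    Assumed form: V^R_d = (V^R_{d-2} << 2) + Q_inv(RSF), with p_min and
    p_max taken from the side information sent with the stream.
    """
    return [(v << 2) + dequantize(q, p_min, p_max) for v, q in zip(v_low_rec, rsf)]


if __name__ == "__main__":
    sdr8 = [16, 100, 235]   # decoded 8-bit samples (toy values)
    rsf1 = [255, 191, 0]    # decoded 8-bit residual codes (toy values)
    print(reconstruct(sdr8, rsf1, p_min=-2, p_max=2))
```

Applying `reconstruct` once gives the 10-bit level; applying it again to that result with the second RSF and its own p_min and p_max gives the 12-bit level.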
Drawings
FIG. 1 is a block diagram of the overall implementation of the multi-user-oriented HDR video dynamic range scalable coding method of the present invention;
FIG. 2 is a diagram of the dynamic range scalable model of the present invention, using frame 1 of the BalloonFestival sequence as an example;
FIG. 3 shows the HDR video test sequences used to test the coding method of the present invention;
FIG. 4 compares the rate-distortion performance curves for the BalloonFestival sequence;
FIG. 5 compares the rate-distortion performance curves for the SunRise sequence;
FIG. 6 compares the rate-distortion performance curves for the Market3 sequence;
FIG. 7 compares the rate-distortion performance curves for the Tibul2 sequence.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can practice it by following the description; the scope of the invention is not limited to these specific embodiments.
The invention relates to a multi-user-oriented HDR video dynamic range scalable coding method, which comprises the following steps:
1. converting the input HDR video, through a conversion process based on perceptual quantization (PQ), into HDR video sequences at a plurality of dynamic range levels represented by different quantization depths (such as 8 bits, 10 bits and 12 bits);
2. so that existing MDR display devices can present high-quality HDR pictures to users, a dynamic range scalable model (DRSM) is proposed that decomposes one HDR video frame into one SDR base frame and a plurality of residual signal frames (RSFs), where the RSFs represent the difference information between two adjacent dynamic range levels;
3. applying perceptual filtering as preprocessing to the RSF sequences, and then encoding them together with the SDR sequence, using an HEVC (High Efficiency Video Coding) encoder suited to SDR (standard dynamic range) video, into a dynamic-range-layered video bitstream;
4. encoding and transmitting the maximum and minimum values of the RSFs as supplemental enhancement information (SEI) to assist HDR video reconstruction at the decoder;
5. decoding the bitstream and reconstructing, through the inverse process of the DRSM at the decoder, SDR and HDR videos at the different quantization depths, so that the HDR video content can be displayed on the MDR display devices of multiple users.
Fig. 1 is the overall implementation block diagram of the multi-user-oriented HDR video dynamic range scalable coding method. Taking luma depths of 8 bits, 10 bits and 12 bits as an example, the specific implementation steps are as follows:
1. the input HDR video is converted, through a conversion process based on perceptual quantization (PQ), into HDR video sequences at a plurality of dynamic range levels represented by different quantization depths; taking luma depths of 8 bits, 10 bits and 12 bits as an example, these sequences are denoted $V_{SDR\_8bit}$, $V_{SDR\_10bit}$ and $V_{SDR\_12bit}$ respectively;
2. the HDR RGB image data in the original OpenEXR format is converted into R'G'B' in the perceptual domain through the non-linear PQ function;
3. the color space is converted from R'G'B' to Y'CbCr through a 3 × 3 conversion matrix;
4. the converted data is quantized into 8-bit, 10-bit and 12-bit integer data:
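\[
D_{Y'} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(219 \cdot 2^{b-8} \cdot Y' + 2^{b-4}\right)\right)
\]
\[
D_{Cb} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cb + 2^{b-1}\right)\right)
\]
\[
D_{Cr} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cr + 2^{b-1}\right)\right)
\]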
where $(Y', Cb, Cr)$ is the 4:4:4 floating-point data obtained from the color space conversion, $(D_{Y'}, D_{Cb}, D_{Cr})$ is the quantized integer data, Clip3(·) is a clipping function bounded on both sides (i.e. 0 to $2^{b}-1$), $219 \cdot 2^{b-8}$ is the luma scale, $2^{b-4}$ is the luma signal offset, $224 \cdot 2^{b-8}$ is the chroma scale, $2^{b-1}$ is the color-difference signal offset, $b$ is the quantization depth, and Round(·) is the rounding function;
5. the 4:4:4 chroma format is down-sampled to the 4:2:0 chroma format, yielding 8-bit, 10-bit and 12-bit Y'CbCr video sequences in 4:2:0 chroma format suited to the subsequent HEVC coding system;
6. dynamic range up-sampling is applied to the video content at the lower dynamic range level to obtain HDR video at the next higher quantization depth, namely
\[
V'_{HDR10}(x,y) = V_{SDR8}(x,y) \ll 2
\]
\[
V'_{HDR12}(x,y) = V_{HDR10}(x,y) \ll 2
\]
where $V_{SDR8}(x,y)$ and $V_{HDR10}(x,y)$ are the pixel values at coordinate position $(x, y)$ in the 8-bit and 10-bit video frames respectively (that is, the $(D_{Y'}, D_{Cb}, D_{Cr})$ defined above after chroma down-sampling), $V'_{HDR10}(x,y)$ and $V'_{HDR12}(x,y)$ are the pixel values obtained from $V_{SDR8}(x,y)$ and $V_{HDR10}(x,y)$ by dynamic range up-sampling, $\ll 2$ denotes a left shift by 2 bits, and, if the resolution of the video frame is $L \times W$, $(x, y) \in \{(x, y) \mid x = 0, 1, 2, \dots, L-1;\ y = 0, 1, 2, \dots, W-1\}$;
7. to suit the HEVC encoder for SDR video, the residual obtained from the decomposition is quantized into RSFs with the same quantization depth as the SDR sequence, which represent the difference information between two adjacent dynamic range levels:
Figure BDA0001564789160000062
Figure BDA0001564789160000063
where $S_{RSF1}$ and $S_{RSF2}$ denote the 8-bit-to-10-bit residual signal frame RSF1 and the 10-bit-to-12-bit residual signal frame RSF2 respectively; $Q(p)$ is a uniform quantization function, i.e. the normalized residual data are further quantized to the data range of the SDR video frame's quantization depth so that they can be encoded in the same way as SDR video frames; $b$ is the quantization depth, set to 8 to match the level of the SDR video frame data; $p_{max}$ and $p_{min}$ are the maximum and minimum of all pixel values, and $p_{max}$ and $p_{min}$ are recorded for every RSF frame;
8. the dynamic range scalable model (DRSM) is established according to steps 6 and 7: an HDR video frame is decomposed into one SDR base frame and a plurality of residual signal frames (RSFs), the RSFs represent the difference information between two adjacent dynamic range levels, and the maximum and minimum values of the original RSFs are recorded;
9. according to the luminance masking effect in human visual perception, the human eye has low sensitivity to detail in flat regions and low sensitivity to distortion in complex regions. In fig. 2, for example, the sky inside the dashed box is a flat region: its content is relatively smooth and carries little information, and filtering the corresponding region of the RSFs removes information to which the human eye is insensitive. The grass and the people are regions of complex content, where the distortion the human eye can tolerate is large; the corresponding region of the RSFs contains little information, and filtering does not affect the expression of its valuable content;
10. the pixel value characteristics of RSFs_o, the RSFs of the BalloonFestival sequence before quantization, are counted, where RSFs_o denotes the residual signals before quantization and RSF1_o and RSF2_o denote the original 8-bit-to-10-bit and 10-bit-to-12-bit residual signals respectively. The pixel values of RSF1_o and RSF2_o all lie in the interval [-7, 6], are concentrated near 0, and are integers. The maximum and minimum RSFs_o pixel values of the first 20 frames of the BalloonFestival sequence are listed in Table 1 below; these are the values encoded and transmitted as SEI;
11. statistical analysis of the RSFs shows that complex regions contain a large amount of isolated noise points, flat regions contain isolated data points that users do not easily perceive, and regions containing both foreground and background carry easily perceived edge and texture information;
12. the human eye exhibits luminance masking: it is sensitive to texture and detail in a scene that is uniformly bright or uniformly dark, and insensitive to them in a scene containing bright and dark regions at the same time. Analysis shows that most HDR video scenes contain bright and dark regions at the same time, so the RSFs can be preprocessed by median filtering, which smooths the content at the corresponding positions of the RSFs while preserving the overall difference between adjacent dynamic range levels;
13. using the luminance masking effect of the human eye, pixels in the RSFs that have little influence on perceived quality are filtered out effectively, while the overall difference that the RSFs represent is kept;
14. the processed RSF sequences and the SDR sequence are encoded separately with the same HEVC (High Efficiency Video Coding) encoder into a dynamic-range-layered video bitstream, and the recorded maximum and minimum values of the RSFs are encoded and transmitted as supplemental enhancement information (SEI) to assist HDR video reconstruction at the decoder;
15. the SDR video ($V_{RSDR8}$) for 8-bit display devices is obtained by decoding the SDR video bitstream directly with an HEVC decoder;
16. the HDR videos ($V_{RHDR10}$ and $V_{RHDR12}$) for 10-bit and 12-bit display devices are reconstructed through the inverse process of the DRSM:
Figure BDA0001564789160000071
where $V_{RHDR10}(x,y)$ and $V_{RHDR12}(x,y)$ are the pixel values at coordinate position $(x, y)$ in the reconstructed 10-bit and 12-bit video frames respectively, $V_{RSDR8}(x,y)$ is the pixel value of the decoded 8-bit SDR video frame at coordinate position $(x, y)$, and, if the resolution of the video frame is $L \times W$, $(x, y) \in \{(x, y) \mid x = 0, 1, 2, \dots, L-1;\ y = 0, 1, 2, \dots, W-1\}$; $Q_{inv}(p)$ is the inverse quantization of the pixel value $p$, $b$ is 8, and $p_{max}$ and $p_{min}$ are obtained from the SEI.
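Step 10 reports integer residuals in [-7, 6], a span far smaller than the 256 codes of an 8-bit RSF. The sketch below checks what that implies for a uniform quantizer. The forms of Q and Q_inv are assumptions (the equation images are not reproduced in this text), the check ignores the loss introduced by median filtering and HEVC coding, and the function names are illustrative.

```python
import math


def q(p, p_min, p_max, bits=8):
    """Assumed uniform quantizer: map [p_min, p_max] onto 0..2^bits-1."""
    top = 2 ** bits - 1
    return math.floor((p - p_min) / (p_max - p_min) * top + 0.5)


def q_inv(code, p_min, p_max, bits=8):
    """Assumed inverse quantizer: map 0..2^bits-1 back to [p_min, p_max]."""
    top = 2 ** bits - 1
    return math.floor(code / top * (p_max - p_min) + p_min + 0.5)


def is_lossless(p_min, p_max, bits=8):
    """True if every integer residual in [p_min, p_max] survives Q then Q_inv.

    Requires p_min < p_max.
    """
    return all(
        q_inv(q(p, p_min, p_max, bits), p_min, p_max, bits) == p
        for p in range(p_min, p_max + 1)
    )


if __name__ == "__main__":
    print(is_lossless(-7, 6))       # residual range reported in step 10
    print(is_lossless(-300, 300))   # hypothetical range wider than 8 bits
```

Under these assumptions the quantization step itself costs nothing for residual ranges this small, so any reconstruction error would come from the filtering and the lossy encoder rather than from Q.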
Next, the coding method of the invention is tested to demonstrate its effectiveness and feasibility.
The HDR video test sequences used in the test all come from a recognized test database provided by MPEG: BalloonFestival, SunRise, Market3 and Tibul2. Their resolution is 1920 × 1080, the original frame format is OpenEXR, and the content of their first frames is shown in fig. 3.
Table 1 summarizes the coding bit-rate statistics for the BalloonFestival sequence. The bit rate consumed differs greatly before and after filtering preprocessing of the RSFs. For QPs of 12, 17, 22 and 27, experiments were run with median filter windows of 3 × 3, 7 × 7, 11 × 11 and 15 × 15 under the all-intra configuration, and the bit-rate shares were counted for five cases: these four filtered cases and the case without RSF processing. Taking the BalloonFestival sequence as an example, when QP is 12 the average bit rates of all-intra coded SDR, RSF1 and RSF2 are 58342.58 (7.88%), 304769.44 (41.17%) and 377137.01 (50.95%) respectively. The SDR bit rate, which carries the basic picture content, accounts for only 7.88% of the total, while the RSF1 and RSF2 bit rates, which represent the difference information between dynamic range levels, together account for 92.12%; such a high bit-rate cost is unfavorable for practical coding and transmission. Median filtering of the RSFs, guided by human visual perception, filters out scattered data points within a local block through the chosen window, which effectively reduces the RSF bit rate while keeping the overall difference between adjacent dynamic range levels. In Table 1, "non" in the Medfilt column means that the RSFs are encoded directly without processing, and W × W (W = 3, 7, 11, 15) is the size of the median filter window, meaning that the RSFs are encoded after filtering preprocessing.
The table reports how different degrees of filtering preprocessing affect the coding bit rate under different QPs, the bit-rate share in each case, and the bit-rate reduction of each filtered case relative to the unprocessed case at the same QP. The data show that encoding the RSFs after filtering preprocessing reduces the bit rate substantially: compared with direct encoding without processing, the reduction reaches up to 88.18%.
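The shares quoted for QP 12 can be checked with a few lines of arithmetic. The sketch below uses the three average bit rates given in the text; the reduction helper shows the formula assumed for the "rate of reducing the code rate" column, namely (unprocessed − filtered) / unprocessed, and its function name and the layer labels are illustrative.

```python
def bitrate_shares(rates):
    """Return each layer's percentage share of the total bit rate, to 2 decimals."""
    total = sum(rates.values())
    return {name: round(100.0 * rate / total, 2) for name, rate in rates.items()}


def bitrate_reduction(unprocessed, filtered):
    """Percentage bit-rate reduction of the filtered case relative to the unprocessed one."""
    return 100.0 * (unprocessed - filtered) / unprocessed


if __name__ == "__main__":
    # Average bit rates at QP 12 without RSF filtering, as quoted in the text.
    qp12 = {"SDR": 58342.58, "RSF1": 304769.44, "RSF2": 377137.01}
    shares = bitrate_shares(qp12)
    print(shares)
    print(round(shares["RSF1"] + shares["RSF2"], 2))
```

The computed shares match the 7.88%, 41.17% and 50.95% stated above, and the two residual layers together give the quoted 92.12%.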
TABLE 1
Figure BDA0001564789160000081
Figure BDA0001564789160000091
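The bit-rate shares quoted for QP = 12 can be checked with a few lines of arithmetic. The sketch below uses only the three average bit rates given in the text; the helper `reduction_percent` is a hypothetical name that illustrates how a reduction relative to the unfiltered case would be computed, and it is not applied to any Table 1 value because the table itself is not reproduced here.

```python
# Average bit rates at QP = 12 for the BalloonFestival sequence (from the text).
rates = {"SDR": 58342.58, "RSF1": 304769.44, "RSF2": 377137.01}

total = sum(rates.values())

# Share of the total bit rate taken by each layer, in percent.
shares = {name: round(100.0 * r / total, 2) for name, r in rates.items()}

# Combined share of the two residual signal frame layers.
rsf_share = round(shares["RSF1"] + shares["RSF2"], 2)


def reduction_percent(unfiltered_rate, filtered_rate):
    """Bit-rate reduction of a filtered case relative to the unfiltered case (hypothetical helper)."""
    return 100.0 * (unfiltered_rate - filtered_rate) / unfiltered_rate


print(shares)      # per-layer shares in percent
print(rsf_share)   # combined RSF share in percent
```

The computed shares agree with the 7.88%, 41.17% and 50.95% stated above, and the two RSF layers together give the 92.12% figure.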
Table 2 gives the BD-rate (%) of the method of the present invention relative to the original reference platform. Scheme one (Proposed1) denotes the coding scheme with 15 × 15 window filtering; scheme two (Proposed2) denotes the coding scheme with 11 × 11 window filtering. Scheme one has the best rate-distortion performance: it saves 32.03% and 31.28% of the bit rate on average and up to 59.0% at most; average bit-rate savings of 4.05% and 4.30% are also reported. The BD-rate varies widely across sequences because filtering preprocessing effectively removes isolated data points in the RSFs, which improves intra-frame correlation and markedly reduces the bit rate without greatly affecting reconstruction quality. The RSFs of the BalloonFestival sequence contain a large amount of gradual-change information and still consume a large bit rate after quantization and filtering, so scheme one performs similarly to HM-16.4. The RSFs of the SunRise sequence carry more information in the brighter and darker areas; filtering removes isolated noise points well while retaining valuable content, and the best scheme saves 25.5% and 26.5% of the bit rate. The RSFs of the Market3 sequence carry less and smoother content; once isolated noise points are filtered out, the coding correlation of the RSFs improves greatly and the bit rate drops, so the best scheme saves 59.0% and 59.1% and the second-best scheme still saves 49.8% and 50.3%. The RSFs of the Tibul2 sequence carry more information in edge regions and on uneven surfaces; filtering effectively removes the meaningless noise on uneven surfaces, and the best scheme saves 39.1% and 38.3% of the bit rate.
TABLE 2
Figure BDA0001564789160000092
Figure BDA0001564789160000101
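The text does not say how the BD-rate values were computed. A minimal sketch of the commonly used Bjøntegaard procedure is given below as an assumption: a cubic polynomial is fitted to log10(bit rate) as a function of quality for each codec, both fits are integrated over the shared quality interval, and the average difference is converted to a percentage. The function name `bd_rate` and the sample rate and quality points are hypothetical.

```python
import numpy as np


def bd_rate(rate_ref, qual_ref, rate_test, qual_test):
    """Average bit-rate difference (%) of the test codec vs. the reference at equal quality.

    Negative values mean the test codec needs less bit rate.
    Assumes four (rate, quality) points per curve, as is customary.
    """
    q_ref = np.asarray(qual_ref, dtype=float)
    q_test = np.asarray(qual_test, dtype=float)
    log_ref = np.log10(np.asarray(rate_ref, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))

    # Cubic fit of log-rate as a function of quality.
    poly_ref = np.polyfit(q_ref, log_ref, 3)
    poly_test = np.polyfit(q_test, log_test, 3)

    # Integrate both fits over the overlapping quality interval.
    lo = max(q_ref.min(), q_test.min())
    hi = min(q_ref.max(), q_test.max())
    int_ref = np.polyval(np.polyint(poly_ref), hi) - np.polyval(np.polyint(poly_ref), lo)
    int_test = np.polyval(np.polyint(poly_test), hi) - np.polyval(np.polyint(poly_test), lo)

    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Hypothetical data: the test codec reaches each quality level at half the bit rate.
quality = [32.0, 35.0, 38.0, 41.0]
ref_rates = [1000.0, 2000.0, 4000.0, 8000.0]
test_rates = [500.0, 1000.0, 2000.0, 4000.0]
result = bd_rate(ref_rates, quality, test_rates, quality)
print(round(result, 2))
```

The quality axis can be PSNR or HDR-VDP-2.2 scores; a saving such as the 59.0% quoted for Market3 would appear as a BD-rate of about -59 under this sign convention.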
Table 3 shows the BD-rate results (%) of the different filtering schemes compared with the scheme without filtering. To study how filtering the RSFs affects scalable coding performance, 4 filtering schemes are compared with the unfiltered scheme, and the results after coding and reconstruction are expressed as BD-rate measured by PSNR and by HDR-VDP-2.2. Here, Proposed1 denotes the scheme with 3 × 3 window filtering, Proposed2 the scheme with 7 × 7 window filtering, Proposed3 the scheme with 11 × 11 window filtering, and Proposed4 the scheme with 15 × 15 window filtering. Compared with the scheme that applies no filtering after DRSM, every filtering scheme saves a considerable amount of bit rate, which further shows that appropriate filtering effectively removes meaningless scattered data points, increases the intra-frame coding correlation of the RSFs and saves bit rate, while retaining the overall difference information between dynamic range levels for reconstruction.
TABLE 3
Figure BDA0001564789160000102
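The effect described above, removing isolated data points while keeping the overall difference between levels, can be illustrated with a small median filter. This is a minimal NumPy sketch, not the implementation used in the experiments; the function name `median_filter`, the edge-replication padding and the toy frame are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(frame, window):
    """Apply a window x window median filter to a 2-D frame (edges are replicated)."""
    if window % 2 == 0:
        raise ValueError("window size must be odd, e.g. 3, 7, 11 or 15")
    pad = window // 2
    padded = np.pad(frame, pad, mode="edge")
    # One window x window neighbourhood per output pixel.
    neighbourhoods = sliding_window_view(padded, (window, window))
    return np.median(neighbourhoods, axis=(-2, -1)).astype(frame.dtype)


# Toy 8-bit residual frame: two flat regions (the overall difference to keep)
# plus one isolated outlier (the scattered data point to remove).
clean = np.full((9, 9), 128, dtype=np.uint8)
clean[:, 5:] = 140
noisy = clean.copy()
noisy[2, 2] = 255

filtered = median_filter(noisy, 3)
print(np.array_equal(filtered, clean))
```

In this toy case the outlier disappears and the step between the two regions is left untouched, which is the behaviour the text relies on to raise intra-frame correlation.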
Figs. 4, 5, 6 and 7 are rate-distortion curves plotted from the HDR-VDP-2.2 quality index against bit-rate consumption. The method that directly encodes all sequences produced by DRSM layering is denoted Proposed0 here. Fig. 4 shows the rate-distortion curves for reconstructing the 12-bit BalloonFestival sequence: the performance of Proposed0 is far below that of the HM platform, appropriate filtering preprocessing improves the coding performance, and both Proposed3 and Proposed4 are close to or better than the HM platform coding algorithm. Fig. 5 shows the rate-distortion curves for reconstructing the 12-bit SunRise sequence: once the filter window grows beyond a certain size the rate-distortion performance changes little, which indicates that the filter window has a saturation threshold, and Proposed3 and Proposed4 perform slightly better than the HM platform coding algorithm. Fig. 6 shows the rate-distortion curves for reconstructing the 12-bit Market3 sequence: the performance improves only slowly as the filter window grows, because the RSFs of this sequence contain little inter-level dynamic range difference information and filtering removes much of it, yet the rate-distortion performance of Proposed2, Proposed3 and Proposed4 is greatly improved over the HM platform. Fig. 7 shows the rate-distortion curves for reconstructing the 12-bit Tibul2 sequence: Proposed1 also performs poorly, the rate-distortion performance improves increasingly as the filter window grows, and Proposed3 and Proposed4 generally outperform the HM platform coding algorithm.

Claims (5)

1. A multi-user-oriented HDR video dynamic range scalable coding method, characterized by comprising the following steps:
(1) subjecting the input HDR video to a conversion process based on perceptual quantization to convert to an HDR video sequence of multiple dynamic range levels represented with different quantization depths;
(2) decomposing an HDR video frame into an SDR base frame and a plurality of residual signal frames by establishing a dynamic range scalable model, wherein the residual signal frames represent difference information between two adjacent dynamic range levels, and simultaneously recording the maximum value and the minimum value of the original residual signal frames;
(3) according to statistical analysis and perceptual characteristic analysis, performing median filtering preprocessing on the residual signal frame sequence, using the luminance masking effect of the human eye to filter out pixels in the residual signal frames that have little influence on perceived quality while keeping the overall difference reflected by the residual signal frames;
(4) encoding the processed residual signal frame sequences and the SDR sequence separately with a unified HEVC (High Efficiency Video Coding) encoder into a video bitstream with layered dynamic range, and at the same time encoding and transmitting the maximum and minimum values of the residual signal frames as auxiliary enhancement information to assist HDR video reconstruction at the decoding end;
(5) decoding and reconstructing the video at the decoding end through the inverse process of the dynamic range scalable model to obtain SDR and HDR videos with different dynamic range quantization depths, so that the HDR video content is suitable for display on the MDR display devices of multiple users.
2. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in step (1), the specific method for subjecting the input HDR video to the transform process based on perceptual quantization includes the following steps:
first, converting the HDR-RGB image data in the original OpenEXR format into R'G'B' in the perceptual domain through the nonlinear perceptual quantization function;
second, converting the color space from R'G'B' to Y'CbCr via a 3 × 3 conversion matrix;
third, quantizing the converted data into integer data of different bit depths, namely:
Figure FDA0003252197960000011
wherein (Y', Cb, Cr) represents the floating-point data in 4:4:4 chroma format obtained by color space conversion, $(D_{Y'}, D_{Cb}, D_{Cr})$ represents the quantized integer data, Clip3(·) represents a clipping function bounded on both sides, $219 \times 2^{b-8}$ represents the luma scale, $2^{b-4}$ represents the luma signal offset, $224 \times 2^{b-8}$ represents the chroma scale, $2^{b-1}$ represents the color difference signal offset, b represents the quantization depth, and Round(·) represents the rounding function;
fourth, down-sampling the 4:4:4 chroma format to the 4:2:0 chroma format to obtain a Y'CbCr video sequence suitable for the subsequent HEVC coding system.
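The first three conversion steps of this claim can be sketched as follows. Several details are assumptions because the claim does not fix them: the perceptual quantization function is taken to be the SMPTE ST 2084 (PQ) curve with input in cd/m², the 3 × 3 matrix is taken to be the BT.2020 non-constant-luminance one, and Clip3 is assumed to clamp to [0, 2^b − 1]. The scales and offsets are the ones named in the claim. Chroma down-sampling to 4:2:0 is omitted, and all function names are hypothetical.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants (assumed transfer function).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(linear_nits):
    """Map linear light in cd/m^2 (0..10000) to a perceptual-domain value in [0, 1]."""
    y = np.clip(np.asarray(linear_nits, dtype=float) / 10000.0, 0.0, 1.0)
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


def rgb_to_ycbcr(rgb_prime):
    """R'G'B' -> Y'CbCr with BT.2020 coefficients (assumed matrix)."""
    r, g, bl = rgb_prime[..., 0], rgb_prime[..., 1], rgb_prime[..., 2]
    y = 0.2627 * r + 0.6780 * g + 0.0593 * bl
    cb = (bl - y) / 1.8814
    cr = (r - y) / 1.4746
    return np.stack([y, cb, cr], axis=-1)


def quantize(ycbcr, bit_depth):
    """Quantize floating-point Y'CbCr to integers of the given bit depth b."""
    scale = 2 ** (bit_depth - 8)
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]
    # Round half up, then clamp (Clip3 bounds 0 and 2^b - 1 are an assumption).
    d_y = np.floor(219 * scale * y + 2 ** (bit_depth - 4) + 0.5)
    d_cb = np.floor(224 * scale * cb + 2 ** (bit_depth - 1) + 0.5)
    d_cr = np.floor(224 * scale * cr + 2 ** (bit_depth - 1) + 0.5)
    out = np.stack([d_y, d_cb, d_cr], axis=-1)
    return np.clip(out, 0, 2 ** bit_depth - 1).astype(np.int64)


# One peak-white pixel quantized at two of the dynamic range levels.
white = np.array([[10000.0, 10000.0, 10000.0]])
ycbcr = rgb_to_ycbcr(pq_encode(white))
codes_10 = quantize(ycbcr, 10)
codes_12 = quantize(ycbcr, 12)
print(codes_10, codes_12)
```

Running the same floating-point Y'CbCr data through `quantize` with b = 8, 10, 12, 14 and 16 would give the set of dynamic range levels that the later steps operate on.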
3. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in the step (2), the specific process of establishing a dynamic range scalable model is as follows:
first, performing dynamic range up-sampling on the video content of the lower dynamic range level to obtain an HDR video of the next higher quantization depth, namely: $V_d'(x,y) = V_{d-\Delta d}(x,y) \ll 2$, $d \in \{10, 12, 14, 16\}$, $\Delta d = 2$, where $V_d'$ denotes the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling, whose dynamic range is one level higher than that of $V_{d-\Delta d}$, and $V_{d-\Delta d}$ denotes the HDR video sequence of the lower dynamic range level;
second, taking the difference between the HDR video sequence of the same dynamic range level obtained by the original conversion and the HDR video of higher quantization depth obtained in the first step; to suit an HEVC (High Efficiency Video Coding) encoder for SDR video, the residual obtained by the subtraction is quantized to the same quantization depth as the SDR video sequence, and the quantized residual signal frames represent the difference information between two adjacent dynamic range levels, i.e.
Figure FDA0003252197960000021
where $RSF_i$ represents the difference information between two adjacent dynamic range levels before quantization, $V_d'$ denotes the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling, $V_d$ denotes the HDR video sequence of the same dynamic range level obtained by the original conversion, p denotes a pixel value, $p = V_d(x,y) - V_d'(x,y)$; if the video resolution is L × W, then $\{(x,y) \mid x = 0, 1, 2, \ldots, L-1,\ y = 0, 1, 2, \ldots, W-1\}$; Q(p) represents a uniform quantization function, i.e. the normalized residual data is further quantized to a data range of the same quantization depth as the SDR video frame so as to be compatible with the coding of SDR video frame data; b represents the quantization depth, and a value of 8 corresponds to the same level as the SDR video frame data; $p_{max}$ and $p_{min}$ represent the maximum and minimum of all pixel values, and the $p_{max}$ and $p_{min}$ of each of the residual signal frames (RSFs) before quantization are recorded.
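The two steps of this claim can be sketched for one pair of adjacent levels. The exact form of Q(p) is given in a figure that is not reproduced in the text, so the min-max normalization to [0, 2^b − 1] used here is an assumption, as are the function name `decompose_level` and the toy pixel values.

```python
import numpy as np


def decompose_level(v_low, v_high, delta_d=2, b=8):
    """Split a higher-level frame into an up-sampled lower level plus a quantized residual.

    v_low  : frame at (d - delta_d) bits
    v_high : frame at d bits obtained by the original conversion
    Returns the quantized residual frame and the recorded p_min, p_max.
    """
    # Dynamic range up-sampling: left shift by delta_d bits.
    v_up = v_low.astype(np.int64) << delta_d
    # Residual between the original d-bit frame and the up-sampled frame.
    p = v_high.astype(np.int64) - v_up
    p_min, p_max = int(p.min()), int(p.max())
    if p_max == p_min:
        # Constant residual: nothing to scale.
        return np.zeros_like(p), p_min, p_max
    # Assumed uniform quantization: normalize to [0, 1], scale to b bits, round half up.
    s = np.floor((p - p_min) / (p_max - p_min) * (2 ** b - 1) + 0.5).astype(np.int64)
    return s, p_min, p_max


# Toy 10-bit and 12-bit frames (hypothetical values).
v10 = np.array([[100, 200], [300, 400]])
v12 = np.array([[398, 803], [1200, 1601]])
s_rsf, p_min, p_max = decompose_level(v10, v12)
print(s_rsf, p_min, p_max)
```

Recording `p_min` and `p_max` per frame matters because the decoder cannot undo the normalization without them.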
4. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in step (3), the specific process of performing median filtering on the residual signal frame sequence is as follows:
according to the luminance masking effect in the visual perception characteristics of the human eye, the human eye has low detail perception in flat areas and low distortion perception in complex areas; for a flat area of the picture, filtering the corresponding area of the residual signal frame removes information to which the human eye is insensitive; for an area of complex content, the corresponding area of the residual signal frame contains less information, and filtering does not affect the expression of its valuable content;
counting and recording the pixel value characteristics of the residual signal frames of the BalloonFestival sequence before quantization;
statistical analysis of the residual signal frames shows that complex regions contain a large amount of isolated noise, flat regions contain isolated data points that users do not easily perceive, and regions containing both foreground and background carry easily perceived edge and texture features;
considering the luminance masking effect of the human eye, namely that the eye is sensitive to texture and detail in a scene that is only bright or only dark but insensitive to texture and detail in a scene containing bright and dark areas at the same time, and since analysis shows that most HDR video sequence scenes contain bright and dark areas at the same time, the residual signal frames can be preprocessed by median filtering so that the content at the corresponding positions of the residual signal frames becomes smooth while the overall difference characteristics between adjacent dynamic range levels are retained.
5. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in the step (5), the specific process of performing HDR video reconstruction by the inverse process of the dynamic range scalable model DRSM at the decoding end is as follows:
the SDR video facing the standard dynamic range display equipment is obtained by directly decoding an SDR video code stream through an HEVC decoder;
the HDR video for the high dynamic range display device can be reconstructed by the inverse process of DRSM, that is:
Figure FDA0003252197960000031
where $S_{RSF_i}$ represents the difference information between two adjacent dynamic range levels after quantization,
Figure FDA0003252197960000032
representing a reconstructed HDR video sequence with a dynamic range of d bits,
Figure FDA0003252197960000033
representing a reconstructed HDR video sequence with the lower-level dynamic range of (d − Δd) bits,
Figure FDA0003252197960000034
represents the pixel value at coordinate position (x, y) in the reconstructed d-bit video frame; if the resolution of the video frame is L × W, then $\{(x,y) \mid x = 0, 1, 2, \ldots, L-1,\ y = 0, 1, 2, \ldots, W-1\}$,
Figure FDA0003252197960000037
represents the pixel value at coordinate position (x, y) in the reconstructed (d − Δd)-bit video frame, and $Q_{inv}(p)$ represents the inverse quantization of the pixel value p, i.e.:
Figure FDA0003252197960000036
where $p_{max}$ and $p_{min}$ represent the maximum and minimum of all pixel values and are obtained from the auxiliary enhancement information, and b represents the quantization depth.
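The decoder-side reconstruction of one level can be sketched as the mirror image of the decomposition. The inverse quantization formula is in a figure that is not reproduced in the text, so the min-max de-normalization below is an assumption, as are the function names, the final clamp to the d-bit range and the toy values.

```python
import numpy as np


def dequantize(s_rsf, p_min, p_max, b=8):
    """Assumed inverse of a min-max uniform quantizer: map [0, 2^b - 1] back to [p_min, p_max]."""
    p = s_rsf.astype(np.float64) / (2 ** b - 1) * (p_max - p_min) + p_min
    return np.floor(p + 0.5).astype(np.int64)  # round half up


def reconstruct_level(v_low_hat, s_rsf, p_min, p_max, d=12, delta_d=2, b=8):
    """Rebuild the d-bit frame from the decoded (d - delta_d)-bit frame and its residual frame."""
    # Dynamic range up-sampling of the decoded lower level.
    v_up = v_low_hat.astype(np.int64) << delta_d
    v_hat = v_up + dequantize(s_rsf, p_min, p_max, b)
    # Keep the result inside the valid d-bit code range (assumption).
    return np.clip(v_hat, 0, 2 ** d - 1)


# Decoded 10-bit frame, decoded residual frame, and p_min / p_max from the side information
# (hypothetical values).
v10_hat = np.array([[100, 200], [300, 400]])
s_rsf = np.array([[0, 255], [102, 153]])
v12_hat = reconstruct_level(v10_hat, s_rsf, p_min=-2, p_max=3)
print(v12_hat)
```

Applying `reconstruct_level` repeatedly, starting from the decoded 8-bit SDR frame, would yield each higher level in turn, so a display only needs to decode the layers up to its own bit depth.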
CN201810094956.0A 2018-01-31 2018-01-31 Multi-user-oriented HDR video dynamic range scalable coding method Active CN108337516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810094956.0A CN108337516B (en) 2018-01-31 2018-01-31 Multi-user-oriented HDR video dynamic range scalable coding method

Publications (2)

Publication Number Publication Date
CN108337516A CN108337516A (en) 2018-07-27
CN108337516B true CN108337516B (en) 2022-01-18

Family

ID=62926894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810094956.0A Active CN108337516B (en) 2018-01-31 2018-01-31 Multi-user-oriented HDR video dynamic range scalable coding method

Country Status (1)

Country Link
CN (1) CN108337516B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769677B (en) * 2018-05-31 2021-07-30 宁波大学 High dynamic range video dynamic range scalable coding method based on perception
CN110933416B (en) * 2019-11-12 2021-07-20 宁波大学 High dynamic range video self-adaptive preprocessing method
CN112261442B (en) * 2020-10-19 2022-11-11 上海网达软件股份有限公司 Method and system for real-time HDR and SDR transcoding of video
CN113225553B (en) * 2021-04-18 2022-09-06 南京理工大学 Method for predicting optimal threshold point in high dynamic video double-layer backward compatible coding system
CN114173189B (en) * 2021-10-29 2023-02-07 荣耀终端有限公司 Video editing method, electronic device and storage medium
CN114549359B (en) * 2022-02-24 2024-04-02 中国传媒大学 HDR video dynamic range non-reference quality evaluation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105828089A (en) * 2016-01-31 2016-08-03 西安电子科技大学 Video coding method based on self-adaptive perception quantization and video coding system thereof
WO2017015397A1 (en) * 2015-07-22 2017-01-26 Dolby Laboratories Licensing Corporation Video coding and delivery with both spatial and dynamic range scalability
CN107197235A (en) * 2017-06-26 2017-09-22 杭州当虹科技有限公司 HDR video pre-filtering method
CN107211152A (en) * 2015-01-30 2017-09-26 汤姆逊许可公司 Method and apparatus for encoding and decoding high dynamic range (HDR) video
CN107438182A (en) * 2016-05-04 2017-12-05 汤姆逊许可公司 Method and apparatus for encoding/decoding an HDR picture

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016172394A1 (en) * 2015-04-21 2016-10-27 Arris Enterprises Llc Adaptive perceptual mapping and signaling for video coding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107211152A (en) * 2015-01-30 2017-09-26 汤姆逊许可公司 Method and apparatus for encoding and decoding high dynamic range (HDR) video
WO2017015397A1 (en) * 2015-07-22 2017-01-26 Dolby Laboratories Licensing Corporation Video coding and delivery with both spatial and dynamic range scalability
CN105828089A (en) * 2016-01-31 2016-08-03 西安电子科技大学 Video coding method based on self-adaptive perception quantization and video coding system thereof
CN107438182A (en) * 2016-05-04 2017-12-05 汤姆逊许可公司 Method and apparatus for encoding/decoding an HDR picture
CN107197235A (en) * 2017-06-26 2017-09-22 杭州当虹科技有限公司 HDR video pre-filtering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
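High dynamic range video compression exploiting luminance masking; Zhang Y et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2015-04-27; Vol. 26, No. 5; pp. 950-964 *
Rate-distortion optimization algorithm for HDR video coding incorporating visual perception characteristics; Yang Tong et al.; Opto-Electronic Engineering (《光电工程》); 2018-01-15; Vol. 45, No. 01; pp. 83-93 *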

Also Published As

Publication number Publication date
CN108337516A (en) 2018-07-27

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230721

Address after: 100000 No. 159, Yard 1, Shuiding Road, Miaofengshan, Mentougou District, Beijing

Patentee after: Beijing Yiqilin Cultural Media Co.,Ltd.

Address before: Building C, No.888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee before: Shanghai ruishenglian Information Technology Co.,Ltd.

Effective date of registration: 20230721

Address after: Room 407, 4th Floor, No. 10 Shangdi Information Road, Haidian District, Beijing, 100000

Patentee after: Nantian Shujin (Beijing) Information Industry Development Co.,Ltd.

Address before: 100000 No. 159, Yard 1, Shuiding Road, Miaofengshan, Mentougou District, Beijing

Patentee before: Beijing Yiqilin Cultural Media Co.,Ltd.

Effective date of registration: 20230721

Address after: Building C, No.888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai ruishenglian Information Technology Co.,Ltd.

Address before: 315211, Fenghua Road, Jiangbei District, Zhejiang, Ningbo 818

Patentee before: Ningbo University

TR01 Transfer of patent right