CN108337516B

CN108337516B - Multi-user-oriented HDR video dynamic range scalable coding method

Info

Publication number: CN108337516B
Application number: CN201810094956.0A
Authority: CN
Inventors: 蒋刚毅; 陈璐俊; 郁梅
Original assignee: Ningbo University
Current assignee: Beijing Yiqilin Cultural Media Co ltd; Nantian Shujin Beijing Information Industry Development Co ltd; Shanghai Ruishenglian Information Technology Co ltd
Priority date: 2018-01-31
Filing date: 2018-01-31
Publication date: 2022-01-18
Anticipated expiration: 2038-01-31
Also published as: CN108337516A

Abstract

The invention relates to a multi-user-oriented HDR video dynamic range scalable coding method. First, taking the perceptual characteristics of HDR video into account, a dynamic range scalable model is proposed that decomposes HDR video with different dynamic range levels into one standard dynamic range video and several residual signal frame sequences. Then, combining the luminance masking effect with the perceptual characteristics of the human eye, the residual signal frame sequences are quantized and filtered: scattered data points in the residuals are filtered out while the overall difference information between adjacent dynamic range levels is retained, which improves the coding efficiency of the residual signal frame sequences. Finally, the decoding end reconstructs the standard dynamic range video and the HDR videos at each dynamic range level to suit the display devices of multiple users. The method also suppresses noise by exploiting visual masking characteristics, improving the efficiency of dynamic range scalable coding.

Description

Multi-user-oriented HDR video dynamic range scalable coding method

Technical Field

The invention relates to the technical field of video coding and decoding, in particular to a multi-user-oriented HDR video dynamic range scalable coding method.

Background

High Dynamic Range (HDR) video is a new research hotspot in the field of digital media; it reproduces real scenes more faithfully and gives users a more realistic visual experience. The user-end display devices of traditional video systems are mainly 8-bit Standard Dynamic Range (SDR) displays, which cannot display HDR video directly. A small number of existing HDR displays show HDR video fully or partially by extending the luminance range, but their dynamic ranges are not uniform (10 bit, 12 bit, and so on). In existing HDR video coding and transmission, it is difficult for a single HDR video stream to serve multiple users whose display devices have different dynamic ranges. The video system server therefore has to take into account, during HDR video encoding, the end users' need to display Multi Dynamic Range (MDR) video. Dynamic range scalable coding of HDR video is an effective way to meet this need.

SDR video is generally stored at an 8-bit quantization depth, and its dynamic range is fixed. Existing scalable video coding techniques mainly consider temporal, spatial and quality scalability, so that users can choose among frame rates, image resolutions and image quality levels. In October 2014, the Joint Collaborative Team on Video Coding (JCT-VC) released the Scalable High Efficiency Video Coding (SHVC) standard, built on the High Efficiency Video Coding (HEVC) standard, which mainly refined the coding tools and decoding process specific to scalable coding; its Scalable Main and Scalable Main 10 profiles provide spatio-temporal and quality scalable coding of SDR video and HDR video, respectively. In February 2015, to improve the compression performance of SHVC on HDR video, the Moving Picture Experts Group (MPEG) issued a Call for Evidence (CfE), hoping to achieve efficient coding of HDR and Wide Color Gamut (WCG) video by extending the HEVC standard and adding new techniques. Several techniques for processing HDR video content have since been proposed, such as Perceptual Quantization (PQ) coding and the Hybrid Log-Gamma (HLG) nonlinear transfer function, together with subjective test experiments for HDR video.

Although the existing HEVC extensions can encode HDR video, HEVC is a coding standard designed for SDR video: it mainly encodes video of a single, fixed dynamic range and is severely limited in dynamic range scalability. Some researchers have therefore proposed new schemes for HDR video coding. Rusanovskyy et al. proposed encoding HDR video with a backward compatible scheme based on Dynamic Range Adaptation (DRA), which performs better than a non-backward-compatible scheme; Mir et al. proposed an improved dual-layer backward compatible HDR video coding scheme and compared its performance with single-layer HLG. The existing backward compatible coding methods still have shortcomings, however: they can serve only users with two different dynamic ranges and cannot jointly code videos of several different dynamic ranges. Given the dynamic range scalability required by MDR users, there is also a need for efficient HDR video coding that is compatible with the HEVC standard and friendly to user-end display devices. In addition, the quantization depth of a video pixel expresses its dynamic range: the larger the quantization depth, the wider the dynamic range that can be represented. HDR display devices with pixel quantization depths of 10 bit and 12 bit are already on the market, and devices with higher dynamic ranges can be expected in the near future. Therefore, to realize HDR video coding and transmission for MDR users, an HDR video dynamic range scalable coding technique compatible with the HEVC standard is an effective way to meet the HDR video streaming requirements of MDR users simultaneously.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a multi-user-oriented HDR video dynamic range scalable coding method, which can enable the same HDR video stream to be displayed on different dynamic range display devices of multiple users at the same time.

The technical scheme adopted by the invention is that a multi-user-oriented HDR video dynamic range scalable coding method comprises the following steps:

(1) converting the input HDR video, through a conversion process based on Perceptual Quantization (PQ), into HDR video sequences at a plurality of dynamic range levels represented by different quantization depths;

(2) decomposing an HDR video frame into an SDR base frame and a plurality of Residual Signal Frames (RSFs) by establishing a Dynamic Range Scalable Model (DRSM), wherein the RSFs represent the difference information between two adjacent dynamic range levels, and simultaneously recording the maximum value and the minimum value of the original RSFs;

(3) performing median filtering preprocessing on the RSF sequences according to statistical analysis and perceptual characteristic analysis, using the luminance masking effect of the human eye to filter out pixels in the RSFs that have little influence on perceptual quality while retaining the overall difference that the RSFs reflect;

(4) encoding the processed RSF sequences and the SDR sequence separately through a unified HEVC encoder into a dynamic-range-layered video code stream, while encoding and transmitting the recorded maximum and minimum values of the RSFs as Supplemental Enhancement Information (SEI) to assist HDR video reconstruction at the decoding end;

(5) decoding and reconstructing the video at the decoding end through the inverse process of the DRSM to obtain SDR and HDR videos with different dynamic range quantization depths, so that the HDR video content can be displayed on the MDR display devices of multiple users.

The beneficial effects of the invention are as follows: through a dynamic range scalable model (DRSM) that takes the perceptual characteristics of HDR video into account, the method decomposes an HDR video stream into one standard dynamic range (SDR) video and a plurality of residual signal frame (RSF) sequences, forming a dynamic-range-layered code stream that meets the needs of multiple users with display devices of different dynamic ranges; at the same time, the RSFs are filtered by combining the luminance masking effect with the perceptual characteristics of the human eye, which improves the coding efficiency of the RSFs and hence the efficiency of the coding method as a whole.

In step (1), the Perceptual Quantization (PQ)-based conversion of the input HDR video comprises the following steps:

firstly, the HDR RGB image data in the original OpenEXR format is converted into R'G'B' in the perceptual domain through the PQ nonlinear function;

secondly, color space conversion from R'G'B' to Y'CbCr is performed with a 3 × 3 conversion matrix;

thirdly, the converted data is quantized into integer data with different bit depths, namely:

$$D_{Y'} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(219 \cdot 2^{b-8} \cdot Y' + 2^{b-4}\right)\right)$$

$$D_{Cb} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cb + 2^{b-1}\right)\right)$$

$$D_{Cr} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cr + 2^{b-1}\right)\right)$$

wherein $(Y', Cb, Cr)$ denotes the 4:4:4 floating-point data obtained by color space conversion, $(D_{Y'}, D_{Cb}, D_{Cr})$ denotes the quantized integer data, $\mathrm{Clip3}(\cdot)$ denotes a clipping function with a lower and an upper limit, $219 \cdot 2^{b-8}$ is the luminance scale, $2^{b-4}$ is the luminance signal offset, $224 \cdot 2^{b-8}$ is the chrominance scale, $2^{b-1}$ is the color difference signal offset, $b$ is the quantization depth, and $\mathrm{Round}(\cdot)$ is the rounding function;

fourthly, the 4:4:4 chroma format is downsampled to the 4:2:0 chroma format, yielding a Y'CbCr video sequence suited to the subsequent HEVC coding system.
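The PQ mapping and the integer quantization of step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the PQ constants are those of SMPTE ST 2084, while the function names (`pq_encode`, `quantize_ycbcr`), the round-half-up convention and the omission of the 3 × 3 color matrix (which the text does not specify) are assumptions made for the example.

```python
import math

# PQ constants as defined in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(luminance_nits):
    """Map linear light in cd/m^2 (0..10000) to a PQ value in [0, 1]."""
    y = min(max(luminance_nits, 0.0), 10000.0) / 10000.0
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


def clip3(lo, hi, value):
    """Clamp value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def round_half_up(value):
    """Assumed rounding rule: halves round upwards."""
    return math.floor(value + 0.5)


def quantize_ycbcr(y, cb, cr, b):
    """Quantize Y' in [0, 1] and Cb, Cr in [-0.5, 0.5] to b-bit integers."""
    top = 2 ** b - 1
    luma_scale = 219 * 2 ** (b - 8)
    chroma_scale = 224 * 2 ** (b - 8)
    dy = clip3(0, top, round_half_up(luma_scale * y + 2 ** (b - 4)))
    dcb = clip3(0, top, round_half_up(chroma_scale * cb + 2 ** (b - 1)))
    dcr = clip3(0, top, round_half_up(chroma_scale * cr + 2 ** (b - 1)))
    return dy, dcb, dcr


if __name__ == "__main__":
    for depth in (8, 10, 12):
        print(depth, quantize_ycbcr(pq_encode(100.0), 0.0, 0.0, depth))
```

Running the same floating-point sample through several values of `b` gives one code value per dynamic range level, which is how the multi-level sequences of step (1) arise.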

In step (2), the dynamic range scalable model DRSM is established as follows:

firstly, the video content at the lower dynamic range level is up-sampled in dynamic range to obtain HDR video at the next higher quantization depth, namely:

$$V_d'(x,y) = V_{d-\Delta d}(x,y) \ll 2, \quad d \in \{10, 12, 14, 16\}, \quad \Delta d = 2$$

where $V_d'$ denotes the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling, whose dynamic range is one level higher than that of $V_{d-\Delta d}$;

secondly, the difference is taken against the originally converted HDR video sequence at the same dynamic range level and, to suit the HEVC encoder for SDR video, the residual obtained by the decomposition is quantized into RSFs with the same quantization depth as the SDR sequence; the RSFs represent the difference information between two adjacent dynamic range levels, namely

$d \in \{10, 12, 14, 16\}$, $i \in \mathbb{N}^*$, where $V_d'$ denotes the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling and $V_d$ denotes the original HDR video sequence; that is, the normalized residual data is further quantized to a data range with the same quantization depth as the SDR video frame, for compatibility with the coding of SDR video frame data;
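The decomposition of step (2) can be sketched as below. The left shift by 2 bits follows the text; the sign of the residual (original minus up-sampled) and the min-max form of the uniform quantizer are assumptions, since the exact formula for Q is not reproduced here. All function names are hypothetical.

```python
def upsample_dynamic_range(frame):
    """Raise each pixel by one dynamic range level (left shift by 2 bits)."""
    return [[p << 2 for p in row] for row in frame]


def uniform_quantize(residual, b=8):
    """Assumed min-max uniform quantizer mapping the residual to [0, 2^b - 1].

    Returns the quantized frame plus (p_min, p_max), which the method
    records for transmission as SEI.
    """
    flat = [p for row in residual for p in row]
    p_min, p_max = min(flat), max(flat)
    span = p_max - p_min
    levels = (1 << b) - 1
    if span == 0:
        return [[0 for _ in row] for row in residual], p_min, p_max
    # Integer round-half-up of (p - p_min) * levels / span.
    quantized = [
        [(2 * (p - p_min) * levels + span) // (2 * span) for p in row]
        for row in residual
    ]
    return quantized, p_min, p_max


def decompose_level(lower_frame, higher_frame, b=8):
    """Split one level: residual = original higher level minus up-sampled lower level."""
    predicted = upsample_dynamic_range(lower_frame)
    residual = [
        [h - p for h, p in zip(h_row, p_row)]
        for h_row, p_row in zip(higher_frame, predicted)
    ]
    return uniform_quantize(residual, b)


if __name__ == "__main__":
    sdr8 = [[16, 100], [200, 235]]      # toy 8-bit luma samples
    hdr10 = [[66, 398], [803, 940]]     # toy 10-bit luma samples
    print(decompose_level(sdr8, hdr10))
```

Because the quantized residual always occupies the 8-bit range, it can be fed to the same HEVC encoder configuration as the SDR base frame, which is the stated motivation for the quantization.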

In step (3), median filtering of the RSF sequences proceeds as follows:

firstly, according to the luminance masking effect in human visual perception, the human eye perceives little detail in flat regions and little distortion in complex regions; for a flat region of the picture, filtering the corresponding region of the RSFs removes information to which the eye is insensitive; for a region of complex content, the corresponding region of the RSFs contains little information, and filtering does not impair the expression of its valuable content;

secondly, the pixel value characteristics of RSFso, the RSFs of the BalloonFestival sequence before quantization, are counted;

thirdly, statistical analysis of the RSFs shows that complex regions contain a large amount of isolated noise point information, that flat regions contain isolated data points that users do not easily perceive, and that regions containing both foreground and background carry easily perceived edge and texture information;

fourthly, the luminance masking effect of the human eye is considered: the eye is sensitive to texture and detail in a uniformly bright or uniformly dark region, but insensitive to texture and detail in a scene that contains bright and dark regions at the same time. Analysis shows that most HDR video scenes contain bright and dark regions simultaneously, so the RSFs can be preprocessed by median filtering, which smooths the content at the corresponding positions of the RSFs while preserving the overall difference characteristics between adjacent dynamic range levels;
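A square-window median filter of the kind applied to the RSFs in step (3) can be sketched as follows. The window size is a parameter (the experiments reported later use 3 × 3 to 15 × 15); replicating border pixels at the frame edges is an assumption, as the text does not say how borders are handled.

```python
def median_filter(frame, window=3):
    """Apply a window x window median filter to a 2-D list of pixel values.

    Border handling (edge replication) is an assumption of this sketch.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    height, width = len(frame), len(frame[0])
    radius = window // 2
    filtered = []
    for y in range(height):
        out_row = []
        for x in range(width):
            neighbourhood = []
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), height - 1)
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), width - 1)
                    neighbourhood.append(frame[yy][xx])
            neighbourhood.sort()
            out_row.append(neighbourhood[len(neighbourhood) // 2])
        filtered.append(out_row)
    return filtered


if __name__ == "__main__":
    # An isolated outlier in a flat residual region is removed ...
    flat = [[128] * 5 for _ in range(5)]
    flat[2][2] = 255
    print(median_filter(flat, 3))
    # ... while a step edge between two regions is kept.
    edge = [[100, 100, 100, 200, 200, 200] for _ in range(4)]
    print(median_filter(edge, 3))
```

The two toy frames mirror the argument in the text: isolated data points disappear, whereas a consistent difference between regions survives the filtering.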

In step (5), HDR video reconstruction at the decoding end through the inverse process of the DRSM proceeds as follows:

firstly, the SDR video for standard dynamic range display devices is obtained by directly decoding the SDR video code stream with an HEVC decoder;

secondly, the HDR video for high dynamic range display devices is obtained by reconstruction through the inverse process of the DRSM, with $d \in \{10, 12, 14, 16\}$, $\Delta d = 2$, $i \in \mathbb{N}^*$, wherein the quantities involved are, in order:

the reconstructed HDR video sequence with a dynamic range of $d$ bits;

the reconstructed HDR video sequence at the next lower dynamic range level, $d - \Delta d$ bits;

the pixel value at coordinate position $(x, y)$ in a reconstructed $d$-bit video frame, where, for a video frame of resolution $L \times W$, $(x, y) \in \{(x, y) \mid x = 0, 1, 2, \ldots, L-1;\ y = 0, 1, 2, \ldots, W-1\}$;

the inverse quantization of the pixel value $p$, in which $p_{max}$ and $p_{min}$ are obtained from the supplemental enhancement information.
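One level of the decoder-side reconstruction can be sketched as below. It assumes the inverse of a min-max uniform quantizer, a residual defined as original minus up-sampled, and a final clip to the valid d-bit range; none of these details is spelled out in the text, and the function names are hypothetical.

```python
def inverse_quantize(rsf, p_min, p_max, b=8):
    """Map b-bit RSF code values back to integer residuals in [p_min, p_max]."""
    levels = (1 << b) - 1
    span = p_max - p_min
    # Integer round-half-up of v * span / levels, then shift by p_min.
    return [
        [p_min + (2 * v * span + levels) // (2 * levels) for v in row]
        for row in rsf
    ]


def reconstruct_level(lower_frame, rsf, p_min, p_max, d, b=8):
    """Rebuild the d-bit frame from the (d-2)-bit frame and its decoded RSF.

    p_min and p_max are the values carried in the SEI for this RSF frame.
    """
    residual = inverse_quantize(rsf, p_min, p_max, b)
    top = (1 << d) - 1
    return [
        [min(max((low << 2) + res, 0), top) for low, res in zip(l_row, r_row)]
        for l_row, r_row in zip(lower_frame, residual)
    ]


if __name__ == "__main__":
    decoded_sdr8 = [[16, 100], [200, 235]]
    decoded_rsf1 = [[204, 0], [255, 102]]
    print(reconstruct_level(decoded_sdr8, decoded_rsf1, -2, 3, d=10))
```

With lossless inputs the toy example returns the original 10-bit samples exactly; in practice the HEVC coding of both layers and the median filtering make the reconstruction approximate.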

Drawings

FIG. 1 is a block diagram of an overall implementation of a multi-user-oriented HDR video dynamic range scalable encoding method according to the present invention;

FIG. 2 is a diagram of the dynamic range scalable model of the present invention, illustrated with frame 1 of the BalloonFestival sequence;

FIG. 3 is a diagram of a HDR video test sequence employed to test the encoding method of the present invention;

FIG. 4 is a graph comparing rate-distortion performance curves for the sequence BalloonFestival;

FIG. 5 is a graph comparing rate-distortion performance curves for the sequence SunRise;

FIG. 6 is a graph comparing rate-distortion performance curves for the sequence Market3;

FIG. 7 is a graph comparing rate-distortion performance curves for the sequence Tibul2.

Detailed Description

The invention is further described below with reference to the accompanying drawings in combination with specific embodiments so that those skilled in the art can practice the invention with reference to the description, and the scope of the invention is not limited to the specific embodiments.

The invention relates to a multi-user-oriented HDR video dynamic range scalable coding method, which comprises the following steps:

1. converting the input HDR video, through a conversion process based on Perceptual Quantization (PQ), into HDR video sequences at a plurality of dynamic range levels represented by different quantization depths (such as 8 bit, 10 bit, 12 bit and so on);

2. so that existing MDR display devices can present high-quality HDR video to users, proposing a Dynamic Range Scalable Model (DRSM) that decomposes one HDR video frame into one SDR base frame and a plurality of Residual Signal Frames (RSFs), where the RSFs represent the difference information between two adjacent dynamic range levels;

3. applying perceptual filtering preprocessing to the RSF sequences, which are then encoded, together with the SDR sequence, by an HEVC encoder suited to SDR video to form a dynamic-range-layered video code stream;

4. encoding and transmitting the maximum and minimum values of the RSFs as Supplemental Enhancement Information (SEI) to assist HDR video reconstruction at the decoding end;

5. decoding and reconstructing the video at the decoding end through the inverse process of the DRSM to obtain SDR and HDR videos with different dynamic range quantization depths, so that the HDR video content can be displayed on the MDR display devices of multiple users.

FIG. 1 is the overall implementation block diagram of the multi-user-oriented HDR video dynamic range scalable coding method. Taking luminance depths of 8 bit, 10 bit and 12 bit as examples, the specific implementation steps are as follows:

1. the input HDR video is converted, through a conversion process based on Perceptual Quantization (PQ), into HDR video sequences at a plurality of dynamic range levels represented by different quantization depths; taking 8-bit, 10-bit and 12-bit luminance depths as examples, these are denoted $V_{SDR\_8bit}$, $V_{SDR\_10bit}$ and $V_{SDR\_12bit}$, respectively;

2. the HDR RGB image data in the original OpenEXR format is converted into R'G'B' in the perceptual domain through the PQ nonlinear function;

3. color space conversion from R'G'B' to Y'CbCr is performed with a 3 × 3 conversion matrix;

4. the converted data is quantized into 8-bit, 10-bit and 12-bit integer data,

$$D_{Y'} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(219 \cdot 2^{b-8} \cdot Y' + 2^{b-4}\right)\right)$$

$$D_{Cb} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cb + 2^{b-1}\right)\right)$$

$$D_{Cr} = \mathrm{Clip3}\left(0,\ 2^{b}-1,\ \mathrm{Round}\left(224 \cdot 2^{b-8} \cdot Cr + 2^{b-1}\right)\right)$$

wherein $(Y', Cb, Cr)$ denotes the 4:4:4 floating-point data obtained by color space conversion, $(D_{Y'}, D_{Cb}, D_{Cr})$ denotes the quantized integer data, $\mathrm{Clip3}(\cdot)$ denotes a clipping function with a lower and an upper limit (i.e., $0$ to $2^{b}-1$), $219 \cdot 2^{b-8}$ is the luminance scale, $2^{b-4}$ is the luminance signal offset, $224 \cdot 2^{b-8}$ is the chrominance scale, $2^{b-1}$ is the color difference signal offset, $b$ is the quantization depth, and $\mathrm{Round}(\cdot)$ is the rounding function;

5. the 4:4:4 chroma format is downsampled to the 4:2:0 chroma format, yielding 8-bit, 10-bit and 12-bit Y'CbCr video sequences in the 4:2:0 chroma format that suit the subsequent HEVC coding system;

6. the video content at the lower dynamic range level is up-sampled in dynamic range to obtain HDR video at the next higher quantization depth, that is,

$$V_{HDR10}'(x,y) = V_{SDR8}(x,y) \ll 2, \qquad V_{HDR12}'(x,y) = V_{HDR10}(x,y) \ll 2$$

wherein $V_{SDR8}(x, y)$ and $V_{HDR10}(x, y)$ denote the pixel values at coordinate position $(x, y)$ in the 8-bit and 10-bit video frames, respectively (i.e., the $(D_{Y'}, D_{Cb}, D_{Cr})$ described above, after chroma downsampling); $V_{HDR10}'(x, y)$ and $V_{HDR12}'(x, y)$ denote the pixel values at $(x, y)$ obtained from $V_{SDR8}(x, y)$ and $V_{HDR10}(x, y)$ by dynamic range up-sampling; $\ll 2$ denotes a left shift by 2 bits; and, for a video frame of resolution $L \times W$, $(x, y) \in \{(x, y) \mid x = 0, 1, 2, \ldots, L-1;\ y = 0, 1, 2, \ldots, W-1\}$;

7. to suit the HEVC encoder for SDR video, the residual obtained by the decomposition is quantized into RSFs with the same quantization depth as the SDR sequence, so as to represent the difference information between two adjacent dynamic range levels,

wherein $S_{RSF1}$ and $S_{RSF2}$ denote RSF$_1$ (8 bit to 10 bit) and RSF$_2$ (10 bit to 12 bit), respectively; $Q(p)$ denotes a uniform quantization function, i.e., the normalized residual data is further quantized to a data range with the same quantization depth as the SDR video frame, for compatibility with the coding of SDR video frame data; $b$ denotes the quantization depth and is set to 8, the same level as the SDR video frame data; $p_{max}$ and $p_{min}$ denote the maximum and minimum of all pixel values, respectively; the $p_{max}$ and $p_{min}$ of every RSF frame are recorded at the same time;

8. a Dynamic Range Scalable Model (DRSM) is established according to steps 6 and 7: one HDR video frame is decomposed into one SDR base frame and a plurality of Residual Signal Frames (RSFs), the RSFs represent the difference information between two adjacent dynamic range levels, and the maximum and minimum values of the original RSFs are recorded at the same time;

9. according to the luminance masking effect in human visual perception, the human eye perceives little detail in flat regions and little distortion in complex regions. In FIG. 2, for example, the sky inside the dashed box is a flat region: its content is relatively smooth and carries little information, and filtering the corresponding region of the RSFs removes information to which the eye is insensitive. The grass and the people are regions of complex content where the eye tolerates large distortion; the corresponding region of the RSFs contains little information, and filtering does not impair the expression of its valuable content;

10. the pixel value characteristics of RSFso, the RSFs of the BalloonFestival sequence before quantization, are counted, where RSFso denotes the residual signal before quantization, and RSF1o and RSF2o denote the original residual signals from 8 bit to 10 bit and from 10 bit to 12 bit, respectively. The pixel values of RSF1o and RSF2o are all found to be integers lying in the interval [-7, 6], concentrated mainly around 0. The maximum and minimum RSFso pixel values of the first 20 frames of the BalloonFestival sequence are listed in Table 1 below; these are the values encoded and transmitted as SEI;

11. statistical analysis of the RSFs shows that complex regions contain a large amount of isolated noise point information, that flat regions contain isolated data points that users do not easily perceive, and that regions containing both foreground and background carry easily perceived edge and texture information;

12. the luminance masking effect of the human eye is considered: the eye is sensitive to texture and detail in a uniformly bright or uniformly dark region, but insensitive to texture and detail in a scene that contains bright and dark regions at the same time. Analysis shows that most HDR video scenes contain bright and dark regions simultaneously, so the RSFs can be preprocessed by median filtering, which smooths the content at the corresponding positions of the RSFs while preserving the overall difference characteristics between adjacent dynamic range levels;

13. using the luminance masking effect of the human eye, pixels in the RSFs that have little influence on perceptual quality are filtered out effectively, while the overall difference that the RSFs reflect is retained;

14. the processed RSF sequences and the SDR sequence are encoded separately through a unified HEVC encoder into a dynamic-range-layered video code stream, while the recorded maximum and minimum values of the RSFs are encoded and transmitted as Supplemental Enhancement Information (SEI) to assist HDR video reconstruction at the decoding end;

15. the SDR video ($V_{RSDR8}$) for 8-bit display devices is obtained by directly decoding the SDR video code stream with an HEVC decoder;

16. the HDR videos ($V_{RHDR10}$ and $V_{RHDR12}$) for 10-bit and 12-bit display devices are reconstructed through the inverse process of the DRSM,

wherein $V_{RHDR10}(x, y)$ and $V_{RHDR12}(x, y)$ denote the pixel values at coordinate position $(x, y)$ in the reconstructed 10-bit and 12-bit video frames, respectively; $V_{RSDR8}(x, y)$ denotes the pixel value at $(x, y)$ in the decoded 8-bit SDR video frame; for a video frame of resolution $L \times W$, $(x, y) \in \{(x, y) \mid x = 0, 1, 2, \ldots, L-1;\ y = 0, 1, 2, \ldots, W-1\}$; $Q_{inv}(p)$ denotes the inverse quantization of the pixel value $p$, with $b = 8$; and $p_{max}$ and $p_{min}$ are obtained from the SEI.
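The layering implied by steps 15 and 16 can be summarised in a few lines: an 8-bit client needs only the SDR stream, while each further 2-bit level needs one more RSF stream. The helper below is purely illustrative; its name and the stream labels are assumptions, not part of any bitstream syntax.

```python
def layers_for_display(target_depth, base_depth=8, step=2):
    """List the sub-streams a client must decode for a given display bit depth."""
    if target_depth < base_depth or (target_depth - base_depth) % step != 0:
        raise ValueError("unsupported target depth")
    n_residual_layers = (target_depth - base_depth) // step
    return ["SDR"] + ["RSF%d" % i for i in range(1, n_residual_layers + 1)]


if __name__ == "__main__":
    for depth in (8, 10, 12):
        print(depth, layers_for_display(depth))
```

A 12-bit client therefore decodes the SDR stream, RSF1 and RSF2, rebuilding the 10-bit video as an intermediate result.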

Next, the coding method of the present invention was tested to demonstrate its effectiveness and feasibility.

The HDR video test sequences used in the test all come from a recognized test database provided by MPEG: BalloonFestival, SunRise, Market3 and Tibul2. The resolution is 1920 × 1080, the original frame image format is OpenEXR, and the first frame of each sequence is shown in FIG. 3.

Table 1 summarizes the coding bitrate statistics for the BalloonFestival sequence. The bitrate consumed differs greatly before and after filtering preprocessing of the RSFs. Experiments were run at QPs of 12, 17, 22 and 27 with median filter windows of 3 × 3, 7 × 7, 11 × 11 and 15 × 15 under the all-intra coding configuration, and the bitrate shares were tallied for five states: the four filtered states and the state before RSF processing. Taking the BalloonFestival sequence as an example, at QP = 12 the average bitrates of all-intra coded SDR, RSF1 and RSF2 are 58342.58 (7.88%), 304769.44 (41.17%) and 377137.01 (50.95%), respectively. The SDR stream, which carries the basic picture content, accounts for only 7.88% of the total bitrate, while RSF1 and RSF2, which carry the difference information between dynamic range levels, account for as much as 92.12%; such a high bitrate cost is unfavorable for practical coding and transmission. Applying median filtering preprocessing to the RSFs in line with human visual perception filters out scattered data points within a local block through the chosen window, which effectively reduces the coding bitrate of the RSFs while keeping the overall difference between adjacent dynamic range levels. In Table 1, "non" in the Medfilt column indicates that the RSFs are encoded directly without processing, and W × W (W = 3, 7, 11, 15) indicates the size of the median filter window, in which case all RSFs are encoded after filtering preprocessing. The table reports the influence of the different degrees of filtering preprocessing on bitrate consumption at each QP, the bitrate share in each case, and the bitrate reduction of each filtering case relative to the unprocessed case at the same QP. As the data in the table show, encoding the RSFs after filtering preprocessing reduces the bitrate substantially, by up to 88.18% compared with direct encoding without processing.
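The bitrate shares quoted for QP = 12 follow directly from the three average bitrates. The short check below reproduces that arithmetic; the bitrate unit is left unspecified because the text does not state it, and the function name is illustrative.

```python
def bitrate_shares(rates):
    """Return each stream's share of the total bitrate, in percent."""
    total = sum(rates.values())
    return {name: 100.0 * value / total for name, value in rates.items()}


# Average all-intra bitrates for BalloonFestival at QP = 12, as quoted in the text.
RATES_QP12 = {"SDR": 58342.58, "RSF1": 304769.44, "RSF2": 377137.01}

if __name__ == "__main__":
    shares = bitrate_shares(RATES_QP12)
    for name, share in shares.items():
        print("%s: %.2f%%" % (name, share))
    print("RSF1 + RSF2: %.2f%%" % (shares["RSF1"] + shares["RSF2"]))
```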

TABLE 1

Table 2 shows the BD-rate (%) of the method of the present invention relative to the original reference platform. Scheme one (Proposed1) denotes the coding scheme with 15 × 15 window filtering; scheme two (Proposed2) denotes the coding scheme with 11 × 11 window filtering. Scheme one has the best rate-distortion performance, saving 32.03% and 31.28% of the bitrate on average and up to 59.0%, while scheme two saves 4.05% and 4.30% on average. The BD-rate fluctuates considerably because the filtering preprocessing effectively removes isolated data points from the RSFs and improves intra-frame correlation, so the bitrate drops markedly while the reconstruction quality is not greatly affected. The RSFs of the BalloonFestival sequence contain a large amount of gradual-change information and still consume a large bitrate after quantization and filtering; here the performance of scheme one is similar to that of HM-16.4. The RSFs of the SunRise sequence carry more information in the brighter and darker regions; filtering removes isolated noise points well while keeping the valuable content, and the best scheme saves 25.5% and 26.5% of the bitrate. The RSFs of the Market3 sequence contain less, and smoother, content; once isolated noise points are filtered out, the coding correlation of the RSFs improves greatly and the bitrate falls, with the best scheme saving 59.0% and 59.1% and the second-best scheme still saving 49.8% and 50.3%. The RSFs of the Tibul2 sequence carry more information along edges and on uneven surfaces; filtering effectively removes the meaningless noise on uneven surfaces, and the best scheme saves 39.1% and 38.3% of the bitrate.

TABLE 2

Table 3 shows the BD-rate results (%) of the different filtering schemes compared with the scheme without filtering. To study how filtering the RSFs affects scalable coding performance, four filtering schemes are compared with the unfiltered scheme; after coding and reconstruction, the results are expressed as BD-rate measured with PSNR and with HDR-VDP-2.2. Here Proposed1 denotes the scheme with 3 × 3 window filtering, Proposed2 the scheme with 7 × 7 window filtering, Proposed3 the scheme with 11 × 11 window filtering, and Proposed4 the scheme with 15 × 15 window filtering. Compared with the scheme that applies no filtering after the DRSM, every filtering scheme saves a substantial amount of bitrate, which further shows that appropriate filtering effectively removes meaningless scattered data points, increases the correlation exploited by intra-frame coding of the RSFs and saves bitrate, while retaining the overall difference information between dynamic range levels for reconstruction.

TABLE 3

FIGS. 4, 5, 6 and 7 are rate-distortion curves plotted from the HDR-VDP-2.2 quality index and the bitrate consumed. The method that directly encodes all sequences generated by DRSM layering is denoted Proposed0. FIG. 4 shows the rate-distortion curves for reconstructing the 12-bit BalloonFestival sequence: the performance of Proposed0 is far below that of the HM platform, appropriate filtering preprocessing improves the coding performance, and Proposed3 and Proposed4 are both close to or better than the HM platform coding algorithm. FIG. 5 shows the rate-distortion curves for reconstructing the 12-bit SunRise sequence: once the filter window reaches a certain size, further enlargement has little effect on rate-distortion performance, which indicates that the filter window has a saturation threshold; Proposed3 and Proposed4 perform slightly better than the HM platform coding algorithm. FIG. 6 shows the rate-distortion curves for reconstructing the 12-bit Market3 sequence: performance improves slowly when the filter window is large, because the RSFs of this sequence contain little difference information between dynamic range levels and much of it is removed by filtering, but the rate-distortion performance of Proposed2, Proposed3 and Proposed4 is still greatly improved compared with the HM platform. FIG. 7 shows the rate-distortion curves for reconstructing the 12-bit Tibul2 sequence: Proposed1 also performs poorly, the rate-distortion performance improves more and more as the filter window grows, and Proposed3 and Proposed4 generally outperform the HM platform coding algorithm.

Claims

1. A multi-user-oriented HDR video dynamic range scalable coding method, characterized by comprising the following steps:

(1) subjecting the input HDR video to a conversion process based on perceptual quantization to convert to an HDR video sequence of multiple dynamic range levels represented with different quantization depths;

(2) decomposing an HDR video frame into an SDR base frame and a plurality of residual signal frames by establishing a dynamic range scalable model, wherein the residual signal frames represent difference information between two adjacent dynamic range levels, and simultaneously recording the maximum value and the minimum value of the original residual signal frames;

(3) according to statistical analysis and perceptual characteristic analysis, performing median filtering preprocessing on the residual signal frame sequence, using the luminance masking effect of the human eye to filter out pixels in the residual signal frames that have little influence on perceived quality, while retaining the overall difference reflected by the residual signal frames;

(4) encoding the processed residual signal frame sequences and the SDR sequence separately, each through the same HEVC (High Efficiency Video Coding) encoder, into a video bitstream with a hierarchical dynamic range, and at the same time encoding and transmitting the maximum value and the minimum value of the residual signal frames as auxiliary enhancement information to assist HDR video reconstruction at the decoding end;

(5) decoding and reconstructing the video at the decoding end through the inverse process of the dynamic range scalable model to obtain SDR and HDR videos with different dynamic range quantization depths, so that the HDR video content can be displayed on the MDR display devices of multiple user ends.

2. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in step (1), the specific method for subjecting the input HDR video to the transform process based on perceptual quantization includes the following steps:

first, converting the HDR-RGB image data in the original OpenEXR format into R'G'B' in the perceptual domain through the perceptual quantization nonlinear function;

second, performing color space conversion from R'G'B' to Y'CbCr via a 3 × 3 conversion matrix;

third, quantizing the converted data into integer data with different bit depths, namely:

wherein $(Y', Cb, Cr)$ represents the floating-point data in 4:4:4 chroma format obtained by the color space conversion, $(D_{Y'}, D_{Cb}, D_{Cr})$ represents the quantized integer data, Clip3(·) represents a clipping function bounded on both sides, $219 \times 2^{b-8}$ represents the luminance scale, $2^{b-4}$ represents the luminance signal offset, $224 \times 2^{b-8}$ represents the chrominance scale, $2^{b-1}$ represents the color difference signal offset, $b$ represents the quantization depth, and Round(·) represents the rounding function.
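For illustration only, the quantization in the third step can be sketched in Python as follows. The formula itself is not reproduced in the text above, so this sketch assembles it from the listed terms (luminance and chrominance scale, offsets, Round and Clip3). The clipping bounds 0 and 2^b - 1, the input ranges (Y' in [0, 1], Cb and Cr in [-0.5, 0.5]), the round-half-up rule and all function names are assumptions and are not taken from the claim.

```python
import math


def clip3(lo, hi, v):
    """Clamp v to the closed interval [lo, hi] (two-sided clipping)."""
    return max(lo, min(hi, v))


def round_half_up(v):
    """Round to the nearest integer, halves rounded up (assumed Round rule)."""
    return int(math.floor(v + 0.5))


def quantize_ycbcr(y, cb, cr, b):
    """Quantize floating-point Y'CbCr to b-bit integer code values.

    Assumes y in [0, 1] and cb, cr in [-0.5, 0.5].
    """
    max_code = (1 << b) - 1          # assumed upper clipping bound
    shift = 1 << (b - 8)             # 2^(b-8)
    luma_offset = 1 << (b - 4)       # 2^(b-4)
    chroma_offset = 1 << (b - 1)     # 2^(b-1)
    d_y = clip3(0, max_code, round_half_up(219 * shift * y + luma_offset))
    d_cb = clip3(0, max_code, round_half_up(224 * shift * cb + chroma_offset))
    d_cr = clip3(0, max_code, round_half_up(224 * shift * cr + chroma_offset))
    return d_y, d_cb, d_cr


if __name__ == "__main__":
    for depth in (8, 10, 12):
        print(depth, quantize_ycbcr(0.5, 0.1, -0.1, depth))
```

Under these assumptions and with b = 10, the input (0, 0, 0) maps to the code values (64, 512, 512) and the input (1, 0.5, 0.5) maps to (940, 960, 960).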

3. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in the step (2), the specific process of establishing a dynamic range scalable model is as follows:

firstly, performing dynamic range up-sampling on the video content of the lower dynamic range level to obtain an HDR video of the next higher quantization depth, namely $V_d'(x,y) = V_{d-\Delta d}(x,y) \ll 2$, $d \in \{10, 12, 14, 16\}$, $\Delta d = 2$, where $V_d'$ denotes the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling, whose dynamic range is one level higher than that of $V_{d-\Delta d}$, and $V_{d-\Delta d}$ denotes the HDR video sequence of the lower dynamic range level;

secondly, subtracting the HDR video of higher quantization depth obtained in the first step from the HDR video sequence of the same dynamic range level obtained by the original conversion; to suit the HEVC (High Efficiency Video Coding) encoder for SDR (standard dynamic range) video, the residual obtained by the subtraction is quantized to the same quantization depth as the SDR video sequence, and $S_{RSF_i}$ denotes the quantized residual signal frames representing the difference information between two adjacent dynamic range levels, i.e.

wherein $RSF_i$ represents the difference information between two adjacent dynamic range levels before quantization; $V_d'$ denotes the HDR video sequence obtained from $V_{d-\Delta d}$ by dynamic range up-sampling; $V_d$ denotes the originally converted HDR video sequence of the same dynamic range level; $p$ denotes a pixel value, $p = V_d(x,y) - V_d'(x,y)$, with $\{(x,y) \mid x = 0, 1, 2, \ldots, L-1;\ y = 0, 1, 2, \ldots, W-1\}$ if the video resolution is $L \times W$; $Q(p)$ denotes a uniform quantization function, that is, the normalized residual data is further quantized to the data range of the same quantization depth as the SDR video frames so as to be compatible with the coding of SDR video frame data; $b$ denotes the quantization depth, and the value 8 corresponds to the same level as the SDR video frame data; $p_{max}$ and $p_{min}$ respectively denote the maximum value and the minimum value of all pixel values, and the $p_{max}$ and $p_{min}$ of each frame of the residual signal frames (RSFs) are recorded before quantization.
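As a minimal sketch of the decomposition described in this claim, the following Python code up-samples a lower-level frame by a 2-bit left shift, forms the residual p = V_d(x, y) - V_d'(x, y), and uniformly quantizes it to 8 bits while recording p_min and p_max. The exact form of Q(p) is not reproduced in the text above; the min-max normalization scaled to 2^b - 1 levels with round-half-up, the handling of a constant residual, and all function names are assumptions made for illustration.

```python
def upsample_dynamic_range(frame_low, delta_d=2):
    """Dynamic range up-sampling: shift every pixel left by delta_d bits."""
    return [[v << delta_d for v in row] for row in frame_low]


def residual_frame(frame_d, frame_up):
    """Residual p = V_d(x, y) - V_d'(x, y) between the original d-bit frame
    and the up-sampled lower-level frame."""
    return [[a - u for a, u in zip(row_d, row_u)]
            for row_d, row_u in zip(frame_d, frame_up)]


def quantize_residual(rsf, b=8):
    """Assumed uniform quantizer: normalize by (p_max - p_min), scale to
    2^b - 1 levels. Returns the quantized frame plus p_min and p_max, which
    have to be sent as side information for reconstruction."""
    flat = [p for row in rsf for p in row]
    p_min, p_max = min(flat), max(flat)
    span = p_max - p_min
    levels = (1 << b) - 1
    if span == 0:
        # Constant residual: every pixel maps to level 0 (assumption).
        quantized = [[0 for _ in row] for row in rsf]
    else:
        quantized = [[int((p - p_min) / span * levels + 0.5) for p in row]
                     for row in rsf]
    return quantized, p_min, p_max


if __name__ == "__main__":
    v8 = [[10, 20], [30, 40]]          # toy 8-bit frame
    v10 = [[41, 80], [123, 160]]       # toy 10-bit frame of the same content
    rsf = residual_frame(v10, upsample_dynamic_range(v8))
    print(quantize_residual(rsf))
```

Because p_min and p_max differ from frame to frame, the quantizer returns them together with the quantized residual so that a decoder can invert the mapping.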

4. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in step (3), the specific process of performing median filtering on the residual signal frame sequence is as follows:

according to the luminance masking effect in the visual perception characteristics of the human eye, the human eye has low sensitivity to detail in flat areas and low sensitivity to distortion in complex areas; for a flat area in the picture, the information to which the human eye is insensitive in the corresponding area of the residual signal frame is filtered out by the filtering; for an area of complex content in the picture, the corresponding area of the residual signal frame contains little information, and the filtering does not affect the expression of the valuable content of that area;

counting and recording pixel value characteristics of a residual signal frame of the balloon sequence before quantization;

through statistical analysis of the residual signal frames, it is found that complex regions contain a large amount of isolated noise points, flat regions contain isolated data points that are not easily perceived by the user, and regions containing both foreground and background carry easily perceived edge and texture features;

considering the luminance masking effect of the human eye, namely that the human eye is sensitive to texture and detail information in a purely bright area or a purely dark area but insensitive to texture and detail in a scene containing bright and dark areas at the same time, and since analysis shows that most HDR video sequence scenes contain bright and dark areas simultaneously, the residual signal frames are preprocessed by median filtering, so that the content at the corresponding positions of the residual signal frames becomes smoother while the overall difference characteristic between adjacent dynamic range levels is retained.
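The effect of the median filtering described above can be illustrated with a short pure-Python sketch. It applies a k × k median filter to one residual signal frame: isolated data points are removed, while a step between two regions (the overall difference) is kept. The window size, the border handling by edge replication and the function name are assumptions chosen for illustration; the claim does not specify them.

```python
def median_filter(frame, k=3):
    """Apply a k x k median filter (k odd) to a 2-D list of pixel values.

    Pixels outside the frame are replaced by the nearest border pixel
    (edge replication), which is an assumption of this sketch.
    """
    height, width = len(frame), len(frame[0])
    radius = k // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                frame[min(max(y + dy, 0), height - 1)]
                     [min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            window.sort()
            out[y][x] = window[len(window) // 2]  # middle element = median
    return out


if __name__ == "__main__":
    # Flat residual region with one isolated data point in the middle.
    noisy = [[10] * 5 for _ in range(5)]
    noisy[2][2] = 200
    for row in median_filter(noisy):
        print(row)
```

In this toy frame the isolated value 200 is replaced by the surrounding level 10, whereas a vertical step between two flat regions passes through the filter unchanged.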

5. The multi-user-oriented HDR video dynamic range scalable encoding method as claimed in claim 1, wherein: in the step (5), the specific process of performing HDR video reconstruction by the inverse process of the dynamic range scalable model DRSM at the decoding end is as follows:

the SDR video for standard dynamic range display devices is obtained by directly decoding the SDR video bitstream with an HEVC decoder;

the HDR video for the high dynamic range display device can be reconstructed by the inverse process of DRSM, that is:

wherein $S_{RSF_i}$ represents the difference information between two adjacent dynamic range levels after quantization,

representing a reconstructed HDR video sequence with a dynamic range of d bits,

represents the pixel value at the coordinate position (x, y) in the reconstructed d-bit video frame, with $\{(x,y) \mid x = 0, 1, 2, \ldots, L-1;\ y = 0, 1, 2, \ldots, W-1\}$ if the resolution of the video frame is $L \times W$,

represents the pixel value at coordinate position (x, y) in the reconstructed $(d - \Delta d)$-bit video frame, and $Q_{inv}(p)$ represents the inverse quantization process of the pixel value $p$, i.e.:

wherein $p_{max}$ and $p_{min}$ respectively represent the maximum value and the minimum value of all pixel values and are obtained from the auxiliary enhancement information, and $b$ represents the quantization depth.
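The reconstruction described in this claim can be sketched in Python as the inverse of the decomposition: the decoded residual is inverse-quantized with the transmitted p_min and p_max and added to the up-sampled lower-level frame. The reconstruction formulas are not reproduced in the text above, so the form of the inverse quantizer (rescaling from 2^b - 1 levels back to the range [p_min, p_max] with round-half-up), the 2-bit left shift for up-sampling and all function names are assumptions, and lossless decoding of both layers is assumed in the example.

```python
import math


def dequantize_residual(quantized, p_min, p_max, b=8):
    """Assumed inverse of a min-max uniform quantizer with 2^b - 1 levels."""
    levels = (1 << b) - 1
    span = p_max - p_min
    return [[int(math.floor(v / levels * span + p_min + 0.5)) for v in row]
            for row in quantized]


def reconstruct_frame(frame_low, quantized_rsf, p_min, p_max, delta_d=2, b=8):
    """Rebuild the d-bit frame from the (d - delta_d)-bit frame and the
    quantized residual signal frame: up-sample, then add the residual."""
    rsf = dequantize_residual(quantized_rsf, p_min, p_max, b)
    return [[(low << delta_d) + r for low, r in zip(row_low, row_rsf)]
            for row_low, row_rsf in zip(frame_low, rsf)]


if __name__ == "__main__":
    v8 = [[10, 20], [30, 40]]            # decoded 8-bit (SDR) frame
    q_rsf = [[85, 0], [255, 0]]          # decoded 8-bit residual frame
    # p_min = 0 and p_max = 3 would be read from the side information.
    print(reconstruct_frame(v8, q_rsf, p_min=0, p_max=3))
```

Applying the same function repeatedly, with the output of one level as the lower-level input of the next, would yield the 10-bit, 12-bit and higher-level videos in turn.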