CN113225553A

CN113225553A - Method for predicting optimal threshold point in high dynamic video double-layer backward compatible coding system

Info

Publication number: CN113225553A
Application number: CN202110415522.8A
Authority: CN
Inventors: 伏长虹; 楚小荷; 任杰; 洪弘
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2021-04-18
Filing date: 2021-04-18
Publication date: 2021-08-06
Anticipated expiration: 2041-04-18
Also published as: CN113225553B

Abstract

The invention discloses a method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system. Selected feature values are used as inputs, and the actual optimal thresholds as labels, to train a Gaussian process regression model; the feature values of a new sequence are then input into the model, which predicts the position of the optimal threshold point of the current video frame with high accuracy. The method substantially improves the PSNR of the coded video and greatly reduces the time consumed by a manual global threshold search.

Description

Method for predicting optimal threshold point in high dynamic video double-layer backward compatible coding system

Technical Field

The invention relates to the field of video coding and decoding, and in particular to a method and a system for HDR double-layer backward compatible video coding and decoding.

Background

In recent years, broadcasting and television technology has advanced rapidly: 4K ultra-high-definition television has entered many households, and television resolution has reached 4K or even 8K. However, simply increasing resolution and frame rate cannot fully satisfy viewers' demand for high image quality. Through research, companies and organizations in the industry have found that the visual experience can be improved along two other dimensions, one of which is high dynamic range (HDR) technology.

Although HDR is a major trend, most consumers still use conventional SDR displays with a gamma EOTF and the BT.709 color space, which cannot correctly display HDR video represented with other EOTFs and color spaces. Video in the correct format must therefore be delivered to each kind of display. Because the SDR and HDR versions of the same content are largely identical, storing both creates a great deal of redundancy. A system provider with limited storage would prefer a single bitstream that can be displayed on both SDR and HDR displays, and a backward compatible system is one that provides this. Without backward compatibility with existing devices, high dynamic range video formats are unlikely to be widely adopted.

The optimal threshold point in the high-dynamic-range video double-layer backward compatible coding system lies within the gray-value range 0-255. The traditional method tries a threshold every two gray values, computes the PSNR of the video finally reconstructed with the current threshold, and selects the threshold with the highest PSNR as the optimal threshold point. Every threshold trial requires encoding the enhancement layer with HEVC and combining the base layer and enhancement layer to reconstruct the HDR video file, which is time-consuming.
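The traditional search described above amounts to a simple grid search. The following is a minimal Python illustration, not the authors' code: `evaluate_psnr` is a hypothetical callback standing in for one full trial (build the enhancement layer for threshold t, HEVC-encode it, merge it with the base layer, measure PSNR), and the toy PSNR curve in the usage line is invented purely for demonstration.

```python
from typing import Callable, Tuple


def exhaustive_threshold_search(
    evaluate_psnr: Callable[[int], float],
    lo: int = 0,
    hi: int = 255,
    step: int = 2,
) -> Tuple[int, float]:
    """Grid-search thresholds lo, lo+step, ... <= hi; return (best_t, best_psnr).

    evaluate_psnr is a hypothetical stand-in for one full trial:
    build the EL for threshold t, encode it, merge with the BL, measure PSNR.
    """
    best_t, best_psnr = lo, float("-inf")
    for t in range(lo, hi + 1, step):
        psnr = evaluate_psnr(t)  # the expensive part: one encode/reconstruct cycle
        if psnr > best_psnr:
            best_t, best_psnr = t, psnr
    return best_t, best_psnr


def num_trials(lo: int = 0, hi: int = 255, step: int = 2) -> int:
    """Number of encode/reconstruct cycles the grid search performs."""
    return len(range(lo, hi + 1, step))


# Toy PSNR curve (invented for illustration) peaking at gray value 180.
print(exhaustive_threshold_search(lambda t: 45.0 - 0.001 * (t - 180) ** 2))  # (180, 45.0)
print(num_trials())  # 128
```

Over the full range 0-255 with a step of 2, the search performs 128 encode/reconstruct cycles, which is the cost the prediction model is meant to avoid.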

Disclosure of Invention

The invention aims to provide a method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system that offers high prediction accuracy and a clear improvement in video quality.

The technical solution for achieving the purpose of the invention is as follows: a method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system, characterized in that a Gaussian process regression model is trained with selected feature values as inputs and the actual optimal thresholds as labels; the feature values of a new sequence are then input into the model to predict the position of the optimal threshold point of the current video frame.

Further, training the Gaussian process regression model with the selected feature values as inputs and the actual optimal thresholds as labels specifically comprises the following:

the double-layer backward compatible video coding system generates the corresponding 8-bit SDR video, which serves as the base layer, by truncating the lowest 2 bits of the input 10-bit HDR video, and encodes the base layer;

an optimal threshold is searched for within the gray-value range 0-255;

a mask is created from the threshold: the mask is set to 0 where the pixel value is below the threshold and to 1 where it is above the threshold; SDR pixels whose values exceed the threshold are set to the threshold value, the result is subtracted from the original HDR video, the mask is applied to the resulting residual, and the residual pixel values are then scaled to generate the enhancement layer, which is encoded;

the encoded base layer and enhancement layer are finally combined to generate a reconstructed HDR video.
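The base-layer and enhancement-layer steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: HEVC encoding and decoding of both layers is omitted, the mask is computed on the base-layer values, the clipped SDR is mapped back to the 10-bit scale by a left shift of 2, the "scaling" step is assumed to be a linear map of the residual into 0-255, and the decoder is assumed to know the threshold `t`. All function names are hypothetical.

```python
import numpy as np


def make_base_layer(hdr10: np.ndarray) -> np.ndarray:
    """8-bit SDR base layer: drop the two least significant bits of each 10-bit sample."""
    return (hdr10.astype(np.uint16) >> 2).astype(np.uint8)


def make_enhancement_layer(hdr10: np.ndarray, bl: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical EL generation for threshold t in 0..255 (coding omitted)."""
    mask = bl > t                                                # 1 above the threshold, 0 otherwise
    sdr_clipped = np.minimum(bl, t).astype(np.int32) << 2        # clip SDR at t, back to 10-bit scale
    residual = (hdr10.astype(np.int32) - sdr_clipped) * mask     # masked residual, in 0..1023-4t
    span = 1023 - 4 * t                                          # largest possible residual value
    return np.rint(residual * (255.0 / span)).astype(np.uint8)  # assumed linear scaling to 8 bits


def reconstruct_hdr(bl: np.ndarray, el: np.ndarray, t: int) -> np.ndarray:
    """Invert the sketch: BL alone below the threshold, threshold + EL residual above it."""
    mask = bl > t
    span = 1023 - 4 * t
    above = 4 * t + el.astype(np.float64) * (span / 255.0)  # undo the assumed scaling
    below = bl.astype(np.float64) * 4                       # BL only, lowest 2 bits lost
    return np.clip(np.rint(np.where(mask, above, below)), 0, 1023).astype(np.uint16)
```

Without coding loss, pixels at or below the threshold are recovered to within the 2 truncated bits, while pixels above it are recovered to within the rounding error of the 8-bit enhancement layer, which is how the enhancement layer improves highlight regions.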

Further, the encoded enhancement layer and base layer are combined to generate a reconstructed HDR video, and the PSNR of the reconstructed HDR video is compared with that of the SDR video; the threshold is varied repeatedly to determine the threshold that maximizes the PSNR of the reconstructed HDR video.

Further, the peak signal-to-noise ratio (PSNR) between the reconstructed HDR video data and the original HDR video data is calculated as:

$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$$

where MAX_I is the largest possible pixel value in the image (255 if each pixel is represented with 8 bits, and 1023 with 10 bits), and MSE is the mean squared error between the reconstructed and original images. The test sequences are in YUV format, so each frame is split into Y, U and V channels; because the optimal threshold point is determined separately for each channel, the PSNR is also calculated separately for each of the three channels.
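The PSNR definition above translates directly into a few lines of NumPy. This is a minimal sketch; the function names and the dictionary-of-planes layout are assumptions for illustration. Each channel is handled on its own because, as stated above, thresholds and PSNRs are determined per channel (and chroma planes may differ in size from the luma plane).

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_i: float) -> float:
    """PSNR in dB: 10 * log10(MAX_I^2 / MSE)."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_i ** 2 / mse))


def psnr_yuv(ref_planes: dict, test_planes: dict, bit_depth: int = 10) -> dict:
    """PSNR computed separately for the Y, U and V planes."""
    max_i = (1 << bit_depth) - 1  # 255 for 8-bit, 1023 for 10-bit
    return {c: psnr(ref_planes[c], test_planes[c], max_i) for c in ("Y", "U", "V")}
```

For example, an 8-bit image that differs from its reference by exactly 1 at every pixel has MSE = 1 and therefore PSNR = 20 log10(255), about 48.13 dB.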

Further, the pixel-value histogram of each frame of the test sequence, the distribution of standard deviations, the total bit rate set when the sequence is encoded, and the ratio of the bit rates allocated to the base layer and the enhancement layer during encoding are used as the feature values.

Further, the total bit rate used as a feature value is set to 20 Mbps, 16 Mbps or 8 Mbps.

Further, the ratio of the bit rates allocated to the base layer and the enhancement layer during encoding, used as a feature value, is set to 2:1, 3:1, 4:1, 5:1 or 6:1.
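A per-frame feature vector built from the four feature values above could be assembled as sketched below. The source does not specify the histogram bin counts, which plane is used, or how the standard deviation is computed, so these are assumptions: a normalized 32-bin histogram of 8-bit pixel values, a 16-bin histogram of per-block standard deviations over non-overlapping 8x8 blocks, and the total bit rate (Mbps) and BL:EL ratio appended as two scalars. The function names are hypothetical.

```python
import numpy as np


def block_std(plane: np.ndarray, block: int = 8) -> np.ndarray:
    """Standard deviation of each non-overlapping block x block tile (edges cropped)."""
    h, w = plane.shape
    h2, w2 = h - h % block, w - w % block
    tiles = plane[:h2, :w2].astype(np.float64).reshape(h2 // block, block, w2 // block, block)
    return tiles.std(axis=(1, 3)).ravel()


def frame_features(plane, total_mbps, bl_el_ratio, bins=32, std_bins=16, max_value=255, block=8):
    """Hypothetical feature vector: pixel histogram + block-std histogram + coding settings."""
    hist, _ = np.histogram(plane, bins=bins, range=(0, max_value + 1))
    hist = hist / max(hist.sum(), 1)  # normalize so frame size does not matter
    stds = block_std(plane, block)
    # a block's std cannot exceed half the value range, so this range covers every block
    std_hist, _ = np.histogram(stds, bins=std_bins, range=(0, (max_value + 1) / 2))
    std_hist = std_hist / max(std_hist.sum(), 1)
    return np.concatenate([hist, std_hist, [float(total_mbps), float(bl_el_ratio)]])
```

Normalizing both histograms keeps the feature scale independent of frame resolution; in practice the two coding-setting scalars may also need rescaling before regression.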

Further, the training sample data is divided into 6 parts for 6-fold cross-validation; during validation, the model tries different kernel functions and their corresponding hyperparameters on the training samples and labels, finally selects the best-performing exponential kernel function and its hyperparameters, and generates a Gaussian process regression model for predicting the optimal threshold point.
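The kernel selection with 6-fold cross-validation described above might look like this in scikit-learn. This is a sketch, not the authors' setup: scikit-learn has no kernel named "exponential", so `Matern(nu=0.5)`, which is mathematically the exponential kernel exp(-r/l), is used in its place. The RBF comparison candidate, the white-noise term, mean absolute error as the selection metric, and rounding/clipping predictions to 0-255 are all assumptions; `select_kernel` and `predict_threshold` are hypothetical names.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, Matern, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score

CANDIDATE_KERNELS = {
    # Matern with nu=0.5 is the exponential kernel exp(-r / length_scale)
    "exponential": ConstantKernel() * Matern(length_scale=1.0, nu=0.5) + WhiteKernel(),
    "squared_exponential": ConstantKernel() * RBF(length_scale=1.0) + WhiteKernel(),
}


def select_kernel(X, y, n_splits=6, seed=0):
    """Pick the kernel with the lowest 6-fold CV mean absolute error, then refit on all data."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, kernel in CANDIDATE_KERNELS.items():
        model = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=seed)
        # kernel hyperparameters are fit by maximizing the log marginal likelihood in each fold
        mae = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
        scores[name] = float(mae.mean())
    best = min(scores, key=scores.get)
    final = GaussianProcessRegressor(
        kernel=CANDIDATE_KERNELS[best], normalize_y=True, random_state=seed
    ).fit(X, y)
    return best, scores, final


def predict_threshold(model, features):
    """Predict thresholds for new frames and snap them to valid gray values 0..255."""
    pred = model.predict(np.atleast_2d(features))
    return np.clip(np.rint(pred), 0, 255).astype(int)
```

A Gaussian process can also return a predictive standard deviation (`model.predict(X, return_std=True)`), which could be used to flag frames where falling back to a local threshold search might be worthwhile.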

An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above prediction method when executing the program.

A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the prediction method described above.

Compared with the prior art, the invention has the following notable advantages: (1) it substantially improves the PSNR of the coded video; (2) it predicts the optimal threshold point of the system accurately, eliminating the brute-force manual global threshold search and saving a great deal of time.

Drawings

Fig. 1 is a flowchart of a method for predicting an optimal threshold point in a high dynamic video bi-layer backward compatible coding system according to the present invention.

Fig. 2 is a flow diagram of a two-layer backward compatible encoding system utilized by the present invention.

Fig. 3 is a flow diagram of a dual layer backward compatible decoding system utilized by the present invention.

FIG. 4 plots the PSNR of the reconstructed HDR video for different gray values selected as the threshold, together with the base-layer PSNR, according to the present invention.

Fig. 5 shows the pixel-value distribution histogram used as a feature value in the present invention.

Fig. 6 shows the standard-deviation distribution histogram used as a feature value in the present invention.

FIG. 7 compares the actual optimal threshold points for 50 frames of a test sequence with the threshold points predicted by the method of the present invention.

Detailed Description

The invention provides a method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system. Selected feature values are used as inputs, and the actual optimal thresholds as labels, to train a Gaussian process regression model; the feature values of a new sequence are then input into the model to accurately predict the position of the optimal threshold point of the current video frame. The method greatly reduces the time consumed by a manual global threshold search and substantially improves the PSNR of the encoded video.

The high-dynamic-range video double-layer backward compatible coding and decoding system takes HDR video as input and quantizes the HDR video data, whose pixel gray values range over 0-1023, to 8-bit SDR video data (the base layer) with gray values in 0-255. Generating SDR video from HDR video introduces artifacts, which, owing to the perceptual quantizer (PQ) curve, are most visible in highlight areas. An optimal threshold point (a gray value between 0 and 255) is therefore selected, and the content whose gray values exceed the threshold is supplemented by an enhancement layer. This reduces the likelihood of artifacts, improves the video quality in highlight areas, and improves the viewing experience. The encoder thus outputs two bitstreams, the base layer and the enhancement layer, which the decoder combines to generate the reconstructed HDR video, whose PSNR is substantially higher than that of the base layer. The method comprises the following specific steps:

Specifically, the feature values used to train the Gaussian process regression model are as follows:

the pixel-value histogram of each frame of the test sequence, the distribution of standard deviations, the total bit rate set when the sequence is encoded (20 Mbps, 16 Mbps or 8 Mbps), and the ratio of the bit rates allocated to the base layer and the enhancement layer during encoding (2, 3, 4, 5 or 6, i.e. 2:1 to 6:1) are used as feature values.
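If every combination of the three total bit rates and the five BL:EL ratios is used, there are 15 coding configurations. Assuming a ratio r means the base layer receives r/(r+1) of the total rate and the enhancement layer 1/(r+1), the per-layer rates can be enumerated as below; the constant and function names are hypothetical.

```python
from itertools import product

TOTAL_RATES_MBPS = (20, 16, 8)  # total bit rates listed in the text
BL_EL_RATIOS = (2, 3, 4, 5, 6)  # BL:EL = r:1


def split_bitrate(total_mbps: float, ratio: int) -> tuple:
    """Split a total bit rate so that BL:EL = ratio:1 (assumed interpretation)."""
    el = total_mbps / (ratio + 1)
    return total_mbps - el, el


configs = [(total, r, *split_bitrate(total, r))
           for total, r in product(TOTAL_RATES_MBPS, BL_EL_RATIOS)]

for total, r, bl, el in configs:
    print(f"{total:>2} Mbps  {r}:1  BL={bl:.2f} Mbps  EL={el:.2f} Mbps")
```

For example, 20 Mbps at 3:1 gives 15 Mbps for the base layer and 5 Mbps for the enhancement layer.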

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the prediction method when executing the program.

The invention is described in further detail below with reference to the figures and specific embodiments.

Examples

Finding the optimal threshold point of the HDR double-layer backward compatible coding system by search requires trying a threshold every 2 gray values and encoding the video for each attempt, which wastes a great deal of time. The method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system not only substantially improves the PSNR of video coded with the predicted threshold point but also makes locating the threshold point far more efficient. As shown in Fig. 1, the method selects suitable feature values, namely the pixel distribution histogram data, the standard deviation histogram data, the total coding bit rate, and the base-layer/enhancement-layer bit-rate ratio of each sequence frame, and inputs them into a program to train a Gaussian process regression model. The specific steps are as follows:

S11, first, the optimal threshold point of each sample is searched for manually: the lowest 2 bits of the original 10-bit HDR video are truncated to obtain the corresponding 8-bit SDR video, i.e., the base layer, and the SDR video is encoded;

S12, a threshold value is chosen and a mask is created: the mask is set to 0 where the pixel value is below the threshold and to 1 where it is above the threshold; the mask is applied to the residual between the original HDR video and the encoded and reconstructed SDR video, the pixel values are then scaled to generate the enhancement layer, and the enhancement layer video is encoded;

S13, the encoded enhancement layer and base layer are combined to generate a reconstructed HDR video, and the PSNR of the reconstructed HDR video is compared with that of the SDR video;

S14, the threshold value is varied and steps S11-S13 are repeated to determine the threshold that maximizes the PSNR of the reconstructed HDR video;

S15, the feature values of all samples are extracted: the pixel distribution histogram data, the standard deviation histogram data, the total bit rate (20 Mbps, 16 Mbps or 8 Mbps), and the bit-rate ratio between base-layer and enhancement-layer coding (2, 3, 4, 5 or 6) of each sequence frame. The data is divided into 6 parts for 6-fold cross-validation. During validation, the model tries different kernel functions and their corresponding hyperparameters on the training samples and labels, finally selects the best-performing exponential kernel function and its hyperparameters, and generates a Gaussian process regression model for predicting the optimal threshold point.

S16, the quality of the reconstructed HDR video is evaluated by the peak signal-to-noise ratio (PSNR) between the reconstructed HDR video data and the original HDR video data.

The PSNR is calculated as follows:

$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$$

where MAX_I is the largest possible pixel value in the image (255 if each pixel is represented with 8 bits, and 1023 with 10 bits), and MSE is the mean squared error between the reconstructed and original images. The test sequences are in YUV format, so each frame is split into Y, U and V channels; because the optimal threshold point is determined separately for each channel, the PSNR is also calculated separately for each of the three channels.

S17, finally, a relatively accurate threshold point prediction is obtained: the PSNR of the reconstructed video is substantially higher than that of the base layer, and the time otherwise consumed by a manual global threshold search is greatly reduced.

Fig. 2 is a system diagram of high-dynamic-range video double-layer backward compatible encoding, and Fig. 3 is a system diagram of high-dynamic-range video double-layer backward compatible decoding.

Fig. 4 illustrates the complete experimental process of searching for the optimal threshold point. Each trial follows the coding procedure, and a single reconstruction, i.e., generating the enhancement layer and one reconstructed video, takes several tens of minutes. In a manual global search for the best threshold point, a threshold is tried every 2 gray values from the minimum to the maximum pixel value of the current frame to ensure that the threshold found is accurate, which is extremely time-consuming.

The PSNR of the reconstructed HDR video generated with the selected optimal threshold is 44.8106 dB, while that of the base-layer video is 43.2934 dB, a gain of about 1.52 dB; the video quality reconstructed with the optimal threshold is thus substantially improved.

Figs. 5 and 6 show two of the feature values input into the Gaussian process regression model: the distribution of pixel values and the distribution of standard deviations.

FIG. 7 compares the actual optimal threshold points of a test sequence with the threshold points predicted by the Gaussian process regression model method of this patent.

In summary, in the present invention, the Gaussian process regression model trained on the selected feature values predicts the threshold point accurately, and video encoded with the predicted threshold has improved quality.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims

1. A method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system, characterized in that a Gaussian process regression model is trained with selected feature values as inputs and the actual optimal thresholds as labels; the feature values of a new sequence are then input into the model to predict the position of the optimal threshold point of the current video frame.

2. The method of claim 1, wherein training the Gaussian process regression model with the selected feature values as inputs and the actual optimal thresholds as labels comprises the following steps:

3. The method for predicting the optimal threshold point in a high-dynamic-range video double-layer backward compatible coding system as claimed in claim 1, wherein the encoded enhancement layer and base layer are combined to generate a reconstructed HDR video, and the PSNRs of the reconstructed HDR video and the SDR video are compared; the threshold is varied repeatedly to determine the threshold that maximizes the PSNR of the reconstructed HDR video.

4. The method of claim 3, wherein the PSNR is calculated as follows:

$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$$

where MAX_I is the largest possible pixel value in the image (255 if each pixel is represented with 8 bits), and MSE is the mean squared error between the reconstructed and original images; each frame is split into Y, U and V channels, and since the optimal threshold point is obtained for each channel, the PSNR is calculated separately for each of the three channels.

5. The method of claim 1, wherein the pixel histogram of each frame of the sequence, the distribution of standard deviations, the total bit rate set when the sequence is encoded, and the ratio of the bit rates allocated to the base layer and the enhancement layer during encoding are used as the feature values.

6. The method of claim 5, wherein the total bit rate used as a feature value is set to 20 Mbps, 16 Mbps or 8 Mbps.

7. The method of claim 5, wherein the ratio of the bit rates allocated to the base layer and the enhancement layer during encoding, used as a feature value, is set to 2:1, 3:1, 4:1, 5:1 or 6:1.

8. The method of claim 5, wherein the training sample data is divided into 6 parts for 6-fold cross-validation; during validation, the model tries different kernel functions and their corresponding hyperparameters on the training samples and labels, finally selects the best-performing exponential kernel function and its hyperparameters, and generates a Gaussian process regression model for predicting the optimal threshold point.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the prediction method according to any one of claims 1 to 8 when executing the program.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the prediction method according to any one of claims 1 to 8.