CN115174898A - Rate distortion optimization method based on visual perception - Google Patents

Rate distortion optimization method based on visual perception

Info

Publication number
CN115174898A
CN115174898A
Authority
CN
China
Prior art keywords
video
jnd
lcu
distortion
rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210744549.6A
Other languages
Chinese (zh)
Inventor
魏宏安
刘宇翔
陈炜玲
林丽群
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202210744549.6A priority Critical patent/CN115174898A/en
Publication of CN115174898A publication Critical patent/CN115174898A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]

Abstract

The invention provides a rate-distortion optimization method based on visual perception, comprising the following steps: S1, build a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtain from the model a JND threshold consistent with human visual perception; S2, derive Lagrange multiplier factors used to reduce perceptual redundancy in the video, removing during encoding the video data that the human eye cannot perceive; S3, optimize the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding; S4, fuse the results of steps S2 and S3 into rate-distortion optimization so that the advantages of both models are fully exploited. The method combines a data-driven pixel-domain JND model with a saliency model, improves the video compression ratio while preserving perceptual video quality, and realizes rate-distortion optimization.

Description

Rate distortion optimization method based on visual perception
Technical Field
The invention relates to the field of video technology, and in particular to a rate-distortion optimization method based on visual perception.
Background
With the advent of the big-data era, video coding technology faces significant challenges. Conventional video coding inevitably introduces distortion while reducing the bit rate, and how to minimize distortion under a limited bit rate has become a research focus in recent years. Existing rate-distortion optimization methods mainly eliminate temporal and spatial redundancy while ignoring the visual perceptual redundancy in video content, leaving room to further improve coding performance.
Disclosure of Invention
The invention provides a rate-distortion optimization method based on visual perception that combines a data-driven pixel-domain JND model with a saliency model, improving the video compression ratio and realizing rate-distortion optimization while preserving perceptual video quality.
The invention adopts the following technical scheme.
A rate-distortion optimization method based on visual perception comprises the following steps:
S1, build a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtain from the model a JND threshold consistent with human visual perception;
S2, derive Lagrange multiplier factors from the JND threshold obtained in S1; these factors reduce perceptual redundancy in the video by removing, during encoding, video data that the human eye cannot perceive;
S3, optimize the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding;
S4, using the results of steps S2 and S3, adaptively adjust the LCU-level Lagrange multiplier according to the perceptual characteristics of the video content, fusing both models into rate-distortion optimization so that their advantages are fully exploited.
In step S1, the data-driven pixel-domain JND prediction model is constructed from a large-scale JND subjective test data set, specifically:
S11, divide each frame of the original video into N × N LCUs, which increases the diversity and number of training samples, improves generalization during neural-network training, and allows the model to adapt to more video coding scenes;
s12, calculating an average pixel domain JND threshold of an original video LCU block, taking the average pixel domain JND threshold as a training sample of a JND prediction model, and predicting the pixel domain JND threshold of the video to be coded by using a J-VGGNet deep neural network;
the average pixel domain JND threshold formula for obtaining the original video LCU block is as follows:
Figure BDA0003716542240000021
where l (i, j) is a pixel value of the original video, l' (i, j) is a pixel value of the corresponding JND video, and m is a side length of the LCU block.
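The threshold above is the mean squared error between two co-located blocks and can be computed directly. The sketch below is a minimal illustration of the formula; the block values are invented for the example.

```python
# Hedged sketch of the per-LCU average pixel-domain JND threshold of step S12:
# the mean squared error between an original LCU and the co-located LCU of the
# JND-level video (variable names are ours, not the patent's).

def avg_jnd_threshold(orig_block, jnd_block):
    """Mean squared error between an original LCU and its JND counterpart."""
    m = len(orig_block)
    total = 0.0
    for i in range(m):
        for j in range(m):
            diff = orig_block[i][j] - jnd_block[i][j]
            total += diff * diff
    return total / (m * m)

orig = [[100, 102], [98, 101]]
jnd  = [[101, 100], [98, 103]]
print(avg_jnd_threshold(orig, jnd))  # (1 + 4 + 0 + 4) / 4 = 2.25
```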
The input of the pixel-domain JND prediction model is the gray-level matrix of each original-video LCU block, and the target output is the pixel-domain JND threshold of the corresponding LCU block.
the pixel domain JND model provided in step S1 is:
Figure BDA0003716542240000022
wherein, p (x, y) is the gray value of the pixel point with the coordinate (x, y) in the original video LCU block, m is the side length of the LCU, JND LCU For pixel domain JND thresholds of LCU blocks, the prediction function is implemented based on a J-VGGNet network.
The pixel-domain JND prediction model is built on the large-scale JND subjective test data set VideoSet, which comprises 220 source sequences, each at four resolutions (1920 × 1080, 1280 × 720, 960 × 540 and 640 × 360); each video is coded with the 52 QP values from 0 to 51, for a total of 45,760 video sequences.
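As a quick arithmetic check of the data-set size stated above (220 sequences, four resolutions each, 52 QP values):

```python
# Verify the stated VideoSet size: 220 sources x 4 resolutions x 52 QPs.
n_sources, n_resolutions, n_qps = 220, 4, 52
total = n_sources * n_resolutions * n_qps
print(total)  # 45760
```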
Step S2 specifically comprises: use the JND threshold obtained in step S1 as a perceptual-sensitivity adjustment factor for the objective distortion SSE, correct the SSE in the original rate-distortion formula into a perceptual distortion that better matches the characteristics of the human eye, and derive a JND-model-based adaptive Lagrange multiplier adjustment factor for perceptual video coding. Specifically:
Step S21, use the JND threshold JND_{LCU} from the JND prediction model as the perceptual-sensitivity adjustment factor of the objective distortion, correcting the objective distortion in the rate-distortion formula into a perceptual distortion D_p that better matches the human eye;
Step S22, for each LCU block, solve the optimal rate-distortion cost using the ordinary distortion metric SSE and the perceptual distortion metric D_p respectively, and count the coding bits consumed by the whole video frame;
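Step S22 relies on the usual rate-distortion mode decision: among candidate coding modes, pick the one minimizing the cost J = D + λ·R. The sketch below illustrates that decision; the candidate modes and their distortion/rate numbers are invented, not the encoder's actual mode list.

```python
# Hedged sketch of the mode decision implied by step S22: minimise J = D + lambda * R.

def best_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate); return the name of
    the mode with the smallest rate-distortion cost D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

modes = [("intra", 120.0, 50.0), ("inter", 90.0, 70.0), ("skip", 200.0, 2.0)]
print(best_mode(modes, lam=1.0))  # inter
print(best_mode(modes, lam=2.0))  # skip: a larger lambda favours fewer bits
```

The second call shows why adjusting λ per LCU (steps S23 and S34) shifts bits away from regions where distortion is less visible.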
step S23, deducing an adaptive Lagrange multiplier adjustment factor of the ith LCU block based on JND: the calculation method is as follows:
Figure BDA0003716542240000031
wherein N represents the number of LCUs of the current frame,
Figure BDA0003716542240000032
the JND threshold, c, representing the ith LCU, is a constant to prevent numerical instability.
In step S3, a Lagrange multiplier weight coefficient ω_i is derived from the saliency model to optimize the allocation of coding bits between salient and non-salient regions, specifically:
Step S31, determine the saliency of each LCU block from the global saliency map produced by the saliency model;
step S32, defining the significance of the LCU as:
Figure BDA0003716542240000033
wherein s is i For the significance weight, p, of the ith LCU in the entire frame i Is the pixel average, p, of the ith LCU in the saliency map avg The average value of the pixel points of the whole frame of the saliency map is obtained. If s i Greater than 1, the LCU is defined as a significant LCU; otherwise, the LCU is not significant;
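The saliency classification of step S32 can be sketched as follows; the saliency-map values and block size are invented for the example.

```python
# Hedged sketch of step S32: s_i is the ratio of an LCU's mean saliency-map
# value to the frame-wide mean; s_i > 1 marks the LCU as salient.

def lcu_saliency(saliency_map, m):
    """Return (s_i values, salient flags) for the m x m LCUs of a saliency map."""
    h, w = len(saliency_map), len(saliency_map[0])
    flat = [v for row in saliency_map for v in row]
    p_avg = sum(flat) / len(flat)          # frame-wide mean saliency
    s, salient = [], []
    for top in range(0, h, m):
        for left in range(0, w, m):
            vals = [saliency_map[top + i][left + j] for i in range(m) for j in range(m)]
            p_i = sum(vals) / len(vals)    # mean saliency of this LCU
            s_i = p_i / p_avg
            s.append(s_i)
            salient.append(s_i > 1)
    return s, salient

smap = [[8, 8, 0, 0],
        [8, 8, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
s, flags = lcu_saliency(smap, 2)
print(flags)  # [True, False, False, False]: only the bright top-left LCU is salient
```

(The sketch assumes the frame divides evenly into m × m blocks and that p_avg is nonzero.)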
step S33, determining a significant weight coefficient omega of the corresponding LCU according to the significant degree distribution condition i And adjusting the objective distortion of each LCU based on the significance;
step S34, in order to balance the global code rate and distortion, lagrange multipliers in a rate distortion formula are weighted based on a significance model:
Figure BDA0003716542240000034
wherein
Figure BDA0003716542240000041
Representing the ith LCU based on a significance weighted lagrange multiplier,
Figure BDA0003716542240000042
representing a Lagrange multiplier in an original rate distortion formula;
in the allocation of coding bits for salient regions and non-salient regions of video applications, for an LCU of a region of interest, an encoder tends to select a coding mode with less objective distortion and more coding bits; for LCUs in regions of no interest, the encoder selects a coding mode with fewer coding bits.
In step S4, the results of steps S2 and S3 are combined into an improved perceptual rate-distortion optimization, expressed as:

\min \sum_{i=1}^{N} \left( D_i + \Delta\lambda_i \cdot \omega_i \cdot \lambda_{org} \cdot R_i \right)

where D_i is the objective distortion of the i-th LCU block and R_i is its coding bit rate.
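Assuming the fused per-LCU multiplier is the original λ scaled by both the JND adjustment factor and the saliency weight (our reading of the combined formula, whose original rendering is an image), the frame-level cost can be sketched as:

```python
# Hedged sketch of the fused rate-distortion cost of step S4. All inputs are
# illustrative: distortions D_i, rates R_i, JND factors, saliency weights, and
# the original lambda are invented numbers, not measured encoder values.

def fused_rd_cost(distortions, rates, jnd_factors, saliency_weights, lam_orig):
    """Frame-level RD cost with per-LCU perceptually adapted lambdas."""
    cost = 0.0
    for d, r, f, w in zip(distortions, rates, jnd_factors, saliency_weights):
        lam_i = f * w * lam_orig          # per-LCU adapted Lagrange multiplier
        cost += d + lam_i * r             # D_i + lambda_i * R_i
    return cost

cost = fused_rd_cost(
    distortions=[10.0, 20.0],
    rates=[100.0, 50.0],
    jnd_factors=[0.8, 1.2],
    saliency_weights=[1.5, 0.5],
    lam_orig=0.1,
)
print(cost)  # 10 + 0.12*100 + 20 + 0.06*50 = 45.0
```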
In the above scheme:
JND (Just Noticeable Distortion) is the minimum distortion just perceptible to the human visual system;
RDO (Rate-Distortion Optimization) is rate-distortion optimization;
SSE (Sum of Squared Errors) is the objective distortion measure.
The invention belongs to the technical field of rate-distortion optimization in video coding and provides a rate-distortion optimization method based on visual perception, addressing the fact that existing rate-distortion optimization methods do not fully consider the subjective perceptual redundancy of video content. First, Lagrange multiplier factors are derived from a data-driven JND prediction model to effectively reduce perceptual redundancy in the video; next, the Lagrange multiplier weight coefficients are optimized with a saliency model to reasonably allocate coding bits between salient and non-salient regions; finally, the two models are fully fused to obtain a Lagrange factor based on perceptual coding, realizing rate-distortion optimization for the video coding standard.
Compared with the prior art, the invention has the following beneficial effects:
1. Existing RDO methods mainly eliminate temporal and spatial redundancy in video and ignore subjective perceptual redundancy. The invention introduces human visual characteristics into video coding and eliminates this perceptual redundancy;
2. Pixel-domain JND thresholds derived from mathematical models cannot accurately describe human visual characteristics or fully eliminate perceptual redundancy in video. The invention constructs a data-driven pixel-domain JND model from a deep neural network and a large JND subjective database; the resulting model adapts to more scene images, and its predicted pixel-domain JND thresholds better match real human visual perception;
3. The invention uses a saliency-based rate-distortion method to reasonably allocate coding bits between salient and non-salient regions and eliminate part of the perceptual redundancy. In addition, the end-to-end neural-network saliency detection model used here is lighter and performs better than other recent neural-network-based saliency models;
4. The invention fuses a data-driven pixel-domain JND prediction model with a saliency model to provide a rate-distortion optimization method based on visual perception; it improves the video compression ratio while preserving perceptual video quality and has reference significance and application value for research in perceptual video coding.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic flow chart of a rate-distortion optimization method of the present invention;
FIG. 2 is a schematic flow diagram of a data-driven JND prediction model;
FIG. 3 is a schematic diagram of the saliency map detection effect;
FIG. 4 is a block division diagram.
Detailed Description
As shown in the figures, a rate-distortion optimization method based on visual perception comprises the following steps:
S1, build a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtain from the model a JND threshold consistent with human visual perception;
S2, derive Lagrange multiplier factors from the JND threshold obtained in S1; these factors reduce perceptual redundancy in the video by removing, during encoding, video data that the human eye cannot perceive;
S3, optimize the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding;
S4, using the results of steps S2 and S3, adaptively adjust the LCU-level Lagrange multiplier according to the perceptual characteristics of the video content, fusing both models into rate-distortion optimization so that their advantages are fully exploited.
In step S1, the data-driven pixel-domain JND prediction model is constructed from a large-scale JND subjective test data set, specifically:
S11, divide each frame of the original video into N × N LCUs, which increases the diversity and number of training samples, improves generalization during neural-network training, and allows the model to adapt to more video coding scenes;
S12, compute the average pixel-domain JND threshold of each original-video LCU block, use it as a training sample for the JND prediction model, and predict the pixel-domain JND threshold of the video to be coded with the J-VGGNet deep neural network.
the average pixel domain JND threshold formula for obtaining the original video LCU block is as follows:
Figure BDA0003716542240000061
where l (i, j) is the pixel value of the original video, l' (i, j) is the pixel value of the corresponding JND video, and m is the side length of the LCU block.
The input of the pixel-domain JND prediction model is the gray-level matrix of each original-video LCU block, and the target output is the pixel-domain JND threshold of the corresponding LCU block.
The pixel-domain JND model proposed in step S1 is:

JND_{LCU} = f\big( p(x,y) \big), \quad 1 \le x, y \le m

where p(x,y) is the gray value of the pixel at coordinate (x,y) in the original-video LCU block, m is the side length of the LCU, JND_{LCU} is the pixel-domain JND threshold of the LCU block, and the prediction function f(·) is implemented with the J-VGGNet network.
The pixel-domain JND prediction model is built on the large-scale JND subjective test data set VideoSet, which comprises 220 source sequences, each at four resolutions (1920 × 1080, 1280 × 720, 960 × 540 and 640 × 360); each video is coded with the 52 QP values from 0 to 51, for a total of 45,760 video sequences.
Step S2 specifically comprises: use the JND threshold obtained in step S1 as a perceptual-sensitivity adjustment factor for the objective distortion SSE, correct the SSE in the original rate-distortion formula into a perceptual distortion that better matches the characteristics of the human eye, and derive a JND-model-based adaptive Lagrange multiplier adjustment factor for perceptual video coding. Specifically:
Step S21, use the JND threshold JND_{LCU} from the JND prediction model as the perceptual-sensitivity adjustment factor of the objective distortion, correcting the objective distortion in the rate-distortion formula into a perceptual distortion D_p that better matches the human eye;
Step S22, for each LCU block, solve the optimal rate-distortion cost using the ordinary distortion metric SSE and the perceptual distortion metric D_p respectively, and count the coding bits consumed by the whole video frame;
Step S23, derive the JND-based adaptive Lagrange multiplier adjustment factor of the i-th LCU block, calculated as:

\Delta\lambda_i = \frac{ JND_{LCU}^{i} + c }{ \frac{1}{N} \sum_{j=1}^{N} JND_{LCU}^{j} + c }

where N is the number of LCUs in the current frame, JND_{LCU}^{i} is the JND threshold of the i-th LCU, and c is a constant that prevents numerical instability.
In step S3, a Lagrange multiplier weight coefficient ω_i is derived from the saliency model to optimize the allocation of coding bits between salient and non-salient regions, specifically:
Step S31, determine the saliency of each LCU block from the global saliency map produced by the saliency model;
Step S32, define the saliency of an LCU as:

s_i = \frac{p_i}{p_{avg}}

where s_i is the saliency weight of the i-th LCU within the whole frame, p_i is the average pixel value of the i-th LCU in the saliency map, and p_{avg} is the average pixel value of the whole saliency-map frame. FIG. 4 shows the block division of an original image and its saliency map. If s_i is greater than 1, the LCU is defined as salient; otherwise it is non-salient;
Step S33, determine the saliency weight coefficient ω_i of each LCU from the distribution of saliency, and adjust the objective distortion of each LCU accordingly;
Step S34, to balance the global bit rate and distortion, weight the Lagrange multiplier in the rate-distortion formula with the saliency model:

\hat{\lambda}_i = \omega_i \cdot \lambda_{org}

where \hat{\lambda}_i is the saliency-weighted Lagrange multiplier of the i-th LCU and \lambda_{org} is the Lagrange multiplier in the original rate-distortion formula;
in the allocation of coding bits for salient regions and non-salient regions of video applications, for an LCU of a region of interest, an encoder tends to select a coding mode with less objective distortion and more coding bits; for the LCU of the region of non-interest, the encoder selects a coding mode with less coding bits.
In step S4, the results of steps S2 and S3 are combined into an improved perceptual rate-distortion optimization, expressed as:

\min \sum_{i=1}^{N} \left( D_i + \Delta\lambda_i \cdot \omega_i \cdot \lambda_{org} \cdot R_i \right)

where D_i is the objective distortion of the i-th LCU block and R_i is its coding bit rate.
Example:
To verify the effectiveness of the method in eliminating visual perceptual redundancy, Y-PSNR is used as the metric. The proposed method, an existing hierarchy-based temporal rate-distortion optimization (Method 1), and a reference-structure-based Lagrange multiplier method (Method 2) are each compared against the rate-distortion method of the AVS3 standard; the results are shown in Table 1.
TABLE 1 BD-Rate comparison of the existing method with the AVS Standard method
[Table 1 is presented as an image in the original publication.]
The data show that the proposed method saves 6.13% bit rate on average, far outperforming the comparison methods. By fully considering the minimum threshold visible to the human eye and visual attention, the method better eliminates perceptual redundancy and allocates bit-rate resources unequally, thereby saving coding bit rate more effectively and better realizing perceptual coding optimization.
In summary, the invention provides a rate-distortion optimization method combining a JND model and a saliency model. Based on study of the human visual system, and fully considering the minimum threshold perceptible to the human eye together with visual attention, the method adjusts the Lagrange factor to eliminate perceptual redundancy in video coding and adaptively allocate coding bits, which helps optimize a hybrid video encoder, improves perceptual video quality, and advances research on perceptual video coding schemes.
The above description is only a preferred embodiment of the present invention and is not intended to limit it in any way; any person skilled in the art may adapt the technical details disclosed above into equivalent embodiments. Any simple modification, equivalent change, or adaptation of the above embodiments according to the technical essence of the present invention remains within the protection scope of the technical solution of the present invention.

Claims (7)

1. A rate-distortion optimization method based on visual perception, characterized by comprising the following steps:
S1, building a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtaining from the model a JND threshold consistent with human visual perception;
S2, deriving Lagrange multiplier factors from the JND threshold obtained in S1, the factors reducing perceptual redundancy in the video by removing, during encoding, video data that the human eye cannot perceive;
S3, optimizing the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding;
S4, using the results of steps S2 and S3, adaptively adjusting the LCU-level Lagrange multiplier according to the perceptual characteristics of the video content, fusing both models into rate-distortion optimization so that their advantages are fully exploited.
2. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that:
in step S1, a data-driven pixel-domain JND prediction model is constructed from a large-scale JND subjective test data set, specifically comprising:
S11, dividing each frame of the original video into N × N LCUs to increase the diversity and number of training samples and improve generalization during neural-network training;
S12, computing the average pixel-domain JND threshold of each original-video LCU block, using it as a training sample for the JND prediction model, and predicting the pixel-domain JND threshold of the video to be coded with a deep neural network.
3. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that step S2 specifically comprises: using the JND threshold obtained in step S1 as a perceptual-sensitivity adjustment factor for the objective distortion SSE, correcting the SSE in the original rate-distortion formula into a perceptual distortion that better matches the characteristics of the human eye, and deriving a JND-model-based adaptive Lagrange multiplier adjustment factor for perceptual video coding; specifically:
step S21, using the JND threshold JND_{LCU} from the JND prediction model as the perceptual-sensitivity adjustment factor of the objective distortion, correcting the objective distortion in the rate-distortion formula into a perceptual distortion D_p that better matches the human eye;
step S22, for each LCU block, solving the optimal rate-distortion cost with the ordinary distortion metric SSE and the perceptual distortion metric D_p respectively, and counting the coding bits consumed by the whole video frame;
step S23, deriving the JND-based adaptive Lagrange multiplier adjustment factor of the i-th LCU block, calculated as:

\Delta\lambda_i = \frac{ JND_{LCU}^{i} + c }{ \frac{1}{N} \sum_{j=1}^{N} JND_{LCU}^{j} + c }

where N is the number of LCUs in the current frame, JND_{LCU}^{i} is the JND threshold of the i-th LCU, and c is a constant that prevents numerical instability.
4. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that in step S3 a Lagrange multiplier weight coefficient ω_i is derived from the saliency model to optimize the allocation of coding bits between salient and non-salient regions, specifically comprising:
step S31, determining the saliency of each LCU block from the global saliency map produced by the saliency model;
step S32, defining the saliency of an LCU as:

s_i = \frac{p_i}{p_{avg}}

where s_i is the saliency weight of the i-th LCU within the whole frame, p_i is the average pixel value of the i-th LCU in the saliency map, and p_{avg} is the average pixel value of the whole saliency-map frame; if s_i is greater than 1, the LCU is defined as salient, otherwise it is non-salient;
step S33, determining the saliency weight coefficient ω_i of each LCU from the distribution of saliency, and adjusting the objective distortion of each LCU accordingly;
step S34, to balance the global bit rate and distortion, weighting the Lagrange multiplier in the rate-distortion formula with the saliency model:

\hat{\lambda}_i = \omega_i \cdot \lambda_{org}

where \hat{\lambda}_i is the saliency-weighted Lagrange multiplier of the i-th LCU and \lambda_{org} is the Lagrange multiplier in the original rate-distortion formula;
when allocating coding bits between salient and non-salient regions, the encoder tends, for an LCU in a region of interest, to select a coding mode with smaller objective distortion and more coding bits; for an LCU in a region of no interest, it selects a coding mode with fewer coding bits.
5. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that in step S4 the results of steps S2 and S3 are combined into an improved perceptual rate-distortion optimization, expressed as:

\min \sum_{i=1}^{N} \left( D_i + \Delta\lambda_i \cdot \omega_i \cdot \lambda_{org} \cdot R_i \right)

where D_i is the objective distortion of the i-th LCU block and R_i is its coding bit rate.
6. A method of visual perception-based rate-distortion optimization according to claim 2, wherein: the input of the pixel domain JND prediction model is a gray map matrix of each original video LCU block, and the target output is a pixel domain JND threshold value of the corresponding LCU block;
the pixel domain JND model proposed in the step S1 is as follows:
Figure FDA0003716542230000033
wherein p (x, y) is the gray value of pixel point with coordinate (x, y) in the original video LCU block, m is the side length of LCU, JND LCU For pixel domain JND thresholds of LCU blocks, the prediction function is implemented based on a J-VGGNet network.
7. The visual-perception-based rate-distortion optimization method according to claim 2, wherein: the data-driven pixel-domain JND prediction model is built on the large-scale JND subjective test data set VideoSet, which contains 220 sequences at four resolutions (1920×1080, 1280×720, 960×540 and 640×360); each video is encoded with the 52 QP values from 0 to 51, yielding 45760 video sequences;

in step S11, the original videos and the corresponding JND videos are preprocessed, and each video frame is divided into N×N LCUs, so as to increase the diversity and number of training samples, improve the generalization of the neural network during training, and adapt the model to more video coding scenes; the mean square error between each original-video LCU and the corresponding JND-video LCU is then computed as the average pixel-domain JND threshold of the original-video LCU block:

$$JND_{LCU} = \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \big(l(i, j) - l'(i, j)\big)^2$$

wherein l(i, j) is a pixel value of the original video, l'(i, j) is the corresponding pixel value of the JND video, and m is the side length of the LCU block;

in step S12, the result obtained in step S11 is used as the training sample of the JND prediction model, and the J-VGGNet network is used to predict the pixel-domain JND threshold of the video to be encoded.
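The label computation of claim 7 can be sketched directly: the training label for each LCU is the per-block MSE between the original frame l and the JND frame l'. Frame sizes and pixel values below are toy data:

```python
# Sketch of claim 7's label computation: the average pixel-domain JND
# threshold of each LCU is the MSE between the original and JND frames
# over the m x m block, i.e. (1/m^2) * sum (l - l')^2.
import numpy as np

def jnd_threshold(orig_block, jnd_block):
    d = orig_block.astype(np.float64) - jnd_block.astype(np.float64)
    return float(np.mean(d * d))

def lcu_labels(orig_frame, jnd_frame, m=64):
    """One MSE label per LCU, scanning the frame in m x m blocks."""
    h, w = orig_frame.shape
    return [[jnd_threshold(orig_frame[i:i + m, j:j + m],
                           jnd_frame[i:i + m, j:j + m])
             for j in range(0, w, m)] for i in range(0, h, m)]

orig = np.zeros((128, 128), dtype=np.uint8)
jnd = orig.copy()
jnd[:64, :64] += 2               # distort only the top-left LCU by 2 levels
labels = lcu_labels(orig, jnd, m=64)
# only labels[0][0] is nonzero: (2)^2 = 4.0; all other LCUs are identical
```
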
CN202210744549.6A 2022-06-27 2022-06-27 Rate distortion optimization method based on visual perception Pending CN115174898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210744549.6A CN115174898A (en) 2022-06-27 2022-06-27 Rate distortion optimization method based on visual perception


Publications (1)

Publication Number Publication Date
CN115174898A true CN115174898A (en) 2022-10-11

Family

ID=83490163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210744549.6A Pending CN115174898A (en) 2022-06-27 2022-06-27 Rate distortion optimization method based on visual perception

Country Status (1)

Country Link
CN (1) CN115174898A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115665415A (en) * 2022-10-27 2023-01-31 华医数字(湖北)医疗技术股份有限公司 Perception-based interframe image coding rate distortion optimization method and system
CN115665415B (en) * 2022-10-27 2023-09-29 华医数字(湖北)医疗技术股份有限公司 Inter-frame image coding rate distortion optimization method and system based on perception
CN115988201A (en) * 2023-03-14 2023-04-18 杭州微帧信息科技有限公司 Method, apparatus, electronic device and storage medium for encoding film grain
CN115988201B (en) * 2023-03-14 2023-05-30 杭州微帧信息科技有限公司 Method, apparatus, electronic device and storage medium for encoding film grain

Similar Documents

Publication Publication Date Title
CN115174898A (en) Rate distortion optimization method based on visual perception
CN108063944B (en) Perception code rate control method based on visual saliency
CN106358040B (en) Code rate control bit distribution method based on significance
CN102300094B (en) Video coding method
CN108900838B (en) Rate distortion optimization method based on HDR-VDP-2 distortion criterion
CN108200431B (en) Bit allocation method for video coding code rate control frame layer
CN111970511B (en) VMAF-based perceptual video rate distortion coding optimization method and device
CN111193931B (en) Video data coding processing method and computer storage medium
CN110139112B (en) Video coding method based on JND model
CN103051901A (en) Video data coding device and video data encoding method
CN110708570B (en) Video coding rate determining method, device, equipment and storage medium
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
CN111447446B (en) HEVC (high efficiency video coding) rate control method based on human eye visual region importance analysis
CN114900692A (en) Video stream frame rate adjusting method and device, equipment, medium and product thereof
CN112825557A (en) Self-adaptive sensing time-space domain quantization method aiming at video coding
CN113556544B (en) Video coding method, device, equipment and storage medium based on scene self-adaption
CN108881905B (en) Probability-based intra-frame encoder optimization method
CN112218084B (en) High-efficiency video coding standard frame-level code rate control method facing surveillance video
KR100557618B1 (en) Bit rate control system based on object
CN111476866B (en) Video optimization and playing method, system, electronic equipment and storage medium
CN110365981B (en) Video coding method and device, electronic equipment and storage medium
Yang et al. Just-noticeable-difference based coding and rate control of mobile 360° video streaming
CN112437301A (en) Code rate control method and device for visual analysis, storage medium and terminal
CN111757112B (en) HEVC (high efficiency video coding) perception code rate control method based on just noticeable distortion
CN108737826B (en) Video coding method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination