CN115174898A - Rate distortion optimization method based on visual perception - Google Patents

Rate distortion optimization method based on visual perception

Info

Publication number
CN115174898A
CN115174898A
Authority
CN
China
Prior art keywords
video
jnd
lcu
distortion
rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210744549.6A
Other languages
Chinese (zh)
Inventor
魏宏安
刘宇翔
陈炜玲
林丽群
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202210744549.6A priority Critical patent/CN115174898A/en
Publication of CN115174898A publication Critical patent/CN115174898A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]

Abstract

The invention provides a rate-distortion optimization method based on visual perception, comprising the following steps: S1, build a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtain from the model a JND threshold consistent with human visual perception; S2, derive Lagrange multiplier factors used to reduce perceptual redundancy in the video, removing during encoding the video data that the human eye cannot perceive; S3, optimize the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding; S4, fuse the results of steps S2 and S3 into rate-distortion optimization so that the advantages of both models are fully exploited. The method combines a data-driven pixel-domain JND model with a saliency model, improves the video compression ratio while preserving perceptual video quality, and realizes rate-distortion optimization.

Description

Rate distortion optimization method based on visual perception
Technical Field
The invention relates to the field of video technology, and in particular to a rate-distortion optimization method based on visual perception.
Background
With the advent of the big-data era, video coding technology faces significant challenges. Conventional video coding inevitably introduces distortion while reducing the bit rate, and how to minimize distortion under a limited bit rate has become a research focus in recent years. Existing rate-distortion optimization methods mainly eliminate temporal and spatial redundancy while ignoring the visual perceptual redundancy in video content, leaving room to further improve coding performance.
Disclosure of Invention
The invention provides a rate-distortion optimization method based on visual perception that combines a data-driven pixel-domain JND model with a saliency model, improving the video compression ratio and realizing rate-distortion optimization while preserving perceptual video quality.
The invention adopts the following technical scheme.
A rate-distortion optimization method based on visual perception comprises the following steps:
S1, build a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtain from the model a JND threshold consistent with human visual perception;
S2, derive Lagrange multiplier factors from the JND threshold obtained in S1; these factors reduce perceptual redundancy in the video by removing, during encoding, video data that the human eye cannot perceive;
S3, optimize the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding;
S4, using the results of steps S2 and S3, adaptively adjust the LCU-level Lagrange multiplier according to the perceptual characteristics of the video content, fusing both models into rate-distortion optimization so that their advantages are fully exploited.
In step S1, the data-driven pixel-domain JND prediction model is constructed from a large-scale JND subjective test data set, specifically:
S11, divide each frame of the original video into N × N LCUs, which increases the diversity and number of training samples, improves generalization during neural-network training, and allows the model to adapt to more video coding scenes;
s12, calculating an average pixel domain JND threshold of an original video LCU block, taking the average pixel domain JND threshold as a training sample of a JND prediction model, and predicting the pixel domain JND threshold of the video to be coded by using a J-VGGNet deep neural network;
the average pixel domain JND threshold formula for obtaining the original video LCU block is as follows:
Figure BDA0003716542240000021
where l (i, j) is a pixel value of the original video, l' (i, j) is a pixel value of the corresponding JND video, and m is a side length of the LCU block.
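The threshold above is the mean squared error between two co-located blocks and can be computed directly. The sketch below is a minimal illustration of the formula; the block values are invented for the example.

```python
# Hedged sketch of the per-LCU average pixel-domain JND threshold of step S12:
# the mean squared error between an original LCU and the co-located LCU of the
# JND-level video (variable names are ours, not the patent's).

def avg_jnd_threshold(orig_block, jnd_block):
    """Mean squared error between an original LCU and its JND counterpart."""
    m = len(orig_block)
    total = 0.0
    for i in range(m):
        for j in range(m):
            diff = orig_block[i][j] - jnd_block[i][j]
            total += diff * diff
    return total / (m * m)

orig = [[100, 102], [98, 101]]
jnd  = [[101, 100], [98, 103]]
print(avg_jnd_threshold(orig, jnd))  # (1 + 4 + 0 + 4) / 4 = 2.25
```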
The input of the pixel-domain JND prediction model is the gray-level matrix of each original-video LCU block, and the target output is the pixel-domain JND threshold of the corresponding LCU block.
the pixel domain JND model provided in step S1 is:
Figure BDA0003716542240000022
wherein, p (x, y) is the gray value of the pixel point with the coordinate (x, y) in the original video LCU block, m is the side length of the LCU, JND LCU For pixel domain JND thresholds of LCU blocks, the prediction function is implemented based on a J-VGGNet network.
The pixel-domain JND prediction model is built on the large-scale JND subjective test data set VideoSet, which comprises 220 source sequences, each at four resolutions (1920 × 1080, 1280 × 720, 960 × 540 and 640 × 360); each video is coded with the 52 QP values from 0 to 51, for a total of 45,760 video sequences.
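As a quick arithmetic check of the data-set size stated above (220 sequences, four resolutions each, 52 QP values):

```python
# Verify the stated VideoSet size: 220 sources x 4 resolutions x 52 QPs.
n_sources, n_resolutions, n_qps = 220, 4, 52
total = n_sources * n_resolutions * n_qps
print(total)  # 45760
```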
Step S2 specifically comprises: use the JND threshold obtained in step S1 as a perceptual-sensitivity adjustment factor for the objective distortion SSE, correct the SSE in the original rate-distortion formula into a perceptual distortion that better matches the characteristics of the human eye, and derive a JND-model-based adaptive Lagrange multiplier adjustment factor for perceptual video coding. Specifically:
Step S21, use the JND threshold JND_{LCU} from the JND prediction model as the perceptual-sensitivity adjustment factor of the objective distortion, correcting the objective distortion in the rate-distortion formula into a perceptual distortion D_p that better matches the human eye;
Step S22, for each LCU block, solve the optimal rate-distortion cost using the ordinary distortion metric SSE and the perceptual distortion metric D_p respectively, and count the coding bits consumed by the whole video frame;
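Step S22 relies on the usual rate-distortion mode decision: among candidate coding modes, pick the one minimizing the cost J = D + λ·R. The sketch below illustrates that decision; the candidate modes and their distortion/rate numbers are invented, not the encoder's actual mode list.

```python
# Hedged sketch of the mode decision implied by step S22: minimise J = D + lambda * R.

def best_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate); return the name of
    the mode with the smallest rate-distortion cost D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

modes = [("intra", 120.0, 50.0), ("inter", 90.0, 70.0), ("skip", 200.0, 2.0)]
print(best_mode(modes, lam=1.0))  # inter
print(best_mode(modes, lam=2.0))  # skip: a larger lambda favours fewer bits
```

The second call shows why adjusting λ per LCU (steps S23 and S34) shifts bits away from regions where distortion is less visible.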
step S23, deducing an adaptive Lagrange multiplier adjustment factor of the ith LCU block based on JND: the calculation method is as follows:
Figure BDA0003716542240000031
wherein N represents the number of LCUs of the current frame,
Figure BDA0003716542240000032
the JND threshold, c, representing the ith LCU, is a constant to prevent numerical instability.
In step S3, a Lagrange multiplier weight coefficient ω_i is derived from the saliency model to optimize the allocation of coding bits between salient and non-salient regions, specifically:
Step S31, determine the saliency of each LCU block from the global saliency map produced by the saliency model;
step S32, defining the significance of the LCU as:
Figure BDA0003716542240000033
wherein s is i For the significance weight, p, of the ith LCU in the entire frame i Is the pixel average, p, of the ith LCU in the saliency map avg The average value of the pixel points of the whole frame of the saliency map is obtained. If s i Greater than 1, the LCU is defined as a significant LCU; otherwise, the LCU is not significant;
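The saliency classification of step S32 can be sketched as follows; the saliency-map values and block size are invented for the example.

```python
# Hedged sketch of step S32: s_i is the ratio of an LCU's mean saliency-map
# value to the frame-wide mean; s_i > 1 marks the LCU as salient.

def lcu_saliency(saliency_map, m):
    """Return (s_i values, salient flags) for the m x m LCUs of a saliency map."""
    h, w = len(saliency_map), len(saliency_map[0])
    flat = [v for row in saliency_map for v in row]
    p_avg = sum(flat) / len(flat)          # frame-wide mean saliency
    s, salient = [], []
    for top in range(0, h, m):
        for left in range(0, w, m):
            vals = [saliency_map[top + i][left + j] for i in range(m) for j in range(m)]
            p_i = sum(vals) / len(vals)    # mean saliency of this LCU
            s_i = p_i / p_avg
            s.append(s_i)
            salient.append(s_i > 1)
    return s, salient

smap = [[8, 8, 0, 0],
        [8, 8, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
s, flags = lcu_saliency(smap, 2)
print(flags)  # [True, False, False, False]: only the bright top-left LCU is salient
```

(The sketch assumes the frame divides evenly into m × m blocks and that p_avg is nonzero.)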
step S33, determining a significant weight coefficient omega of the corresponding LCU according to the significant degree distribution condition i And adjusting the objective distortion of each LCU based on the significance;
step S34, in order to balance the global code rate and distortion, lagrange multipliers in a rate distortion formula are weighted based on a significance model:
Figure BDA0003716542240000034
wherein
Figure BDA0003716542240000041
Representing the ith LCU based on a significance weighted lagrange multiplier,
Figure BDA0003716542240000042
representing a Lagrange multiplier in an original rate distortion formula;
in the allocation of coding bits for salient regions and non-salient regions of video applications, for an LCU of a region of interest, an encoder tends to select a coding mode with less objective distortion and more coding bits; for LCUs in regions of no interest, the encoder selects a coding mode with fewer coding bits.
In step S4, the results of steps S2 and S3 are combined into an improved perceptual rate-distortion optimization, expressed as:

\min \sum_{i=1}^{N} \left( D_i + \Delta\lambda_i \cdot \omega_i \cdot \lambda_{org} \cdot R_i \right)

where D_i is the objective distortion of the i-th LCU block and R_i is its coding bit rate.
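Assuming the fused per-LCU multiplier is the original λ scaled by both the JND adjustment factor and the saliency weight (our reading of the combined formula, whose original rendering is an image), the frame-level cost can be sketched as:

```python
# Hedged sketch of the fused rate-distortion cost of step S4. All inputs are
# illustrative: distortions D_i, rates R_i, JND factors, saliency weights, and
# the original lambda are invented numbers, not measured encoder values.

def fused_rd_cost(distortions, rates, jnd_factors, saliency_weights, lam_orig):
    """Frame-level RD cost with per-LCU perceptually adapted lambdas."""
    cost = 0.0
    for d, r, f, w in zip(distortions, rates, jnd_factors, saliency_weights):
        lam_i = f * w * lam_orig          # per-LCU adapted Lagrange multiplier
        cost += d + lam_i * r             # D_i + lambda_i * R_i
    return cost

cost = fused_rd_cost(
    distortions=[10.0, 20.0],
    rates=[100.0, 50.0],
    jnd_factors=[0.8, 1.2],
    saliency_weights=[1.5, 0.5],
    lam_orig=0.1,
)
print(cost)  # 10 + 0.12*100 + 20 + 0.06*50 = 45.0
```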
In the above scheme:
JND (Just Noticeable Distortion) is the minimum distortion just perceptible to the human visual system;
RDO (Rate-Distortion Optimization) is rate-distortion optimization;
SSE (Sum of Squared Errors) is the objective distortion measure.
The invention belongs to the technical field of rate-distortion optimization in video coding and provides a rate-distortion optimization method based on visual perception, addressing the fact that existing rate-distortion optimization methods do not fully consider the subjective perceptual redundancy of video content. First, Lagrange multiplier factors are derived from a data-driven JND prediction model to effectively reduce perceptual redundancy in the video; next, the Lagrange multiplier weight coefficients are optimized with a saliency model to reasonably allocate coding bits between salient and non-salient regions; finally, the two models are fully fused to obtain a Lagrange factor based on perceptual coding, realizing rate-distortion optimization for the video coding standard.
Compared with the prior art, the invention has the following beneficial effects:
1. Existing RDO methods mainly eliminate temporal and spatial redundancy in video and ignore subjective perceptual redundancy. The invention introduces human visual characteristics into video coding and eliminates this perceptual redundancy;
2. Pixel-domain JND thresholds derived from mathematical models cannot accurately describe human visual characteristics or fully eliminate perceptual redundancy in video. The invention constructs a data-driven pixel-domain JND model from a deep neural network and a large JND subjective database; the resulting model adapts to more scene images, and its predicted pixel-domain JND thresholds better match real human visual perception;
3. The invention uses a saliency-based rate-distortion method to reasonably allocate coding bits between salient and non-salient regions and eliminate part of the perceptual redundancy. In addition, the end-to-end neural-network saliency detection model used here is lighter and performs better than other recent neural-network-based saliency models;
4. The invention fuses a data-driven pixel-domain JND prediction model with a saliency model to provide a rate-distortion optimization method based on visual perception; it improves the video compression ratio while preserving perceptual video quality and has reference significance and application value for research in perceptual video coding.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic flow chart of a rate-distortion optimization method of the present invention;
FIG. 2 is a schematic flow diagram of a data-driven JND prediction model;
FIG. 3 is a schematic diagram of the saliency map detection effect;
FIG. 4 is a block division diagram.
Detailed Description
As shown in the figures, a rate-distortion optimization method based on visual perception comprises the following steps:
S1, build a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtain from the model a JND threshold consistent with human visual perception;
S2, derive Lagrange multiplier factors from the JND threshold obtained in S1; these factors reduce perceptual redundancy in the video by removing, during encoding, video data that the human eye cannot perceive;
S3, optimize the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding;
S4, using the results of steps S2 and S3, adaptively adjust the LCU-level Lagrange multiplier according to the perceptual characteristics of the video content, fusing both models into rate-distortion optimization so that their advantages are fully exploited.
In step S1, the data-driven pixel-domain JND prediction model is constructed from a large-scale JND subjective test data set, specifically:
S11, divide each frame of the original video into N × N LCUs, which increases the diversity and number of training samples, improves generalization during neural-network training, and allows the model to adapt to more video coding scenes;
S12, compute the average pixel-domain JND threshold of each original-video LCU block, use it as a training sample for the JND prediction model, and predict the pixel-domain JND threshold of the video to be coded with the J-VGGNet deep neural network.
the average pixel domain JND threshold formula for obtaining the original video LCU block is as follows:
Figure BDA0003716542240000061
where l (i, j) is the pixel value of the original video, l' (i, j) is the pixel value of the corresponding JND video, and m is the side length of the LCU block.
The input of the pixel-domain JND prediction model is the gray-level matrix of each original-video LCU block, and the target output is the pixel-domain JND threshold of the corresponding LCU block.
The pixel-domain JND model proposed in step S1 is:

JND_{LCU} = f\big( p(x,y) \big), \quad 1 \le x, y \le m

where p(x,y) is the gray value of the pixel at coordinate (x,y) in the original-video LCU block, m is the side length of the LCU, JND_{LCU} is the pixel-domain JND threshold of the LCU block, and the prediction function f(·) is implemented with the J-VGGNet network.
The pixel-domain JND prediction model is built on the large-scale JND subjective test data set VideoSet, which comprises 220 source sequences, each at four resolutions (1920 × 1080, 1280 × 720, 960 × 540 and 640 × 360); each video is coded with the 52 QP values from 0 to 51, for a total of 45,760 video sequences.
Step S2 specifically comprises: use the JND threshold obtained in step S1 as a perceptual-sensitivity adjustment factor for the objective distortion SSE, correct the SSE in the original rate-distortion formula into a perceptual distortion that better matches the characteristics of the human eye, and derive a JND-model-based adaptive Lagrange multiplier adjustment factor for perceptual video coding. Specifically:
Step S21, use the JND threshold JND_{LCU} from the JND prediction model as the perceptual-sensitivity adjustment factor of the objective distortion, correcting the objective distortion in the rate-distortion formula into a perceptual distortion D_p that better matches the human eye;
Step S22, for each LCU block, solve the optimal rate-distortion cost using the ordinary distortion metric SSE and the perceptual distortion metric D_p respectively, and count the coding bits consumed by the whole video frame;
Step S23, derive the JND-based adaptive Lagrange multiplier adjustment factor of the i-th LCU block, calculated as:

\Delta\lambda_i = \frac{ JND_{LCU}^{i} + c }{ \frac{1}{N} \sum_{j=1}^{N} JND_{LCU}^{j} + c }

where N is the number of LCUs in the current frame, JND_{LCU}^{i} is the JND threshold of the i-th LCU, and c is a constant that prevents numerical instability.
In step S3, a Lagrange multiplier weight coefficient ω_i is derived from the saliency model to optimize the allocation of coding bits between salient and non-salient regions, specifically:
Step S31, determine the saliency of each LCU block from the global saliency map produced by the saliency model;
Step S32, define the saliency of an LCU as:

s_i = \frac{p_i}{p_{avg}}

where s_i is the saliency weight of the i-th LCU within the whole frame, p_i is the average pixel value of the i-th LCU in the saliency map, and p_{avg} is the average pixel value of the whole saliency-map frame. FIG. 4 shows the block division of an original image and its saliency map. If s_i is greater than 1, the LCU is defined as salient; otherwise it is non-salient;
Step S33, determine the saliency weight coefficient ω_i of each LCU from the distribution of saliency, and adjust the objective distortion of each LCU accordingly;
Step S34, to balance the global bit rate and distortion, weight the Lagrange multiplier in the rate-distortion formula with the saliency model:

\hat{\lambda}_i = \omega_i \cdot \lambda_{org}

where \hat{\lambda}_i is the saliency-weighted Lagrange multiplier of the i-th LCU and \lambda_{org} is the Lagrange multiplier in the original rate-distortion formula;
in the allocation of coding bits for salient regions and non-salient regions of video applications, for an LCU of a region of interest, an encoder tends to select a coding mode with less objective distortion and more coding bits; for the LCU of the region of non-interest, the encoder selects a coding mode with less coding bits.
In step S4, the results of steps S2 and S3 are combined into an improved perceptual rate-distortion optimization, expressed as:

\min \sum_{i=1}^{N} \left( D_i + \Delta\lambda_i \cdot \omega_i \cdot \lambda_{org} \cdot R_i \right)

where D_i is the objective distortion of the i-th LCU block and R_i is its coding bit rate.
Example:
To verify the effectiveness of the method in eliminating visual perceptual redundancy, Y-PSNR is used as the metric. The proposed method, an existing hierarchy-based temporal rate-distortion optimization (Method 1), and a reference-structure-based Lagrange multiplier method (Method 2) are each compared against the rate-distortion method of the AVS3 standard; the results are shown in Table 1.
TABLE 1 BD-Rate comparison of the existing method with the AVS Standard method
[Table 1 is presented as an image in the original publication.]
The data show that the proposed method saves 6.13% bit rate on average, far outperforming the comparison methods. By fully considering the minimum threshold visible to the human eye and visual attention, the method better eliminates perceptual redundancy and allocates bit-rate resources unequally, thereby saving coding bit rate more effectively and better realizing perceptual coding optimization.
In summary, the invention provides a rate-distortion optimization method combining a JND model and a saliency model. Based on study of the human visual system, and fully considering the minimum threshold perceptible to the human eye together with visual attention, the method adjusts the Lagrange factor to eliminate perceptual redundancy in video coding and adaptively allocate coding bits, which helps optimize a hybrid video encoder, improves perceptual video quality, and advances research on perceptual video coding schemes.
The above description is only a preferred embodiment of the present invention and is not intended to limit it in any way; any person skilled in the art may adapt the technical details disclosed above into equivalent embodiments. Any simple modification, equivalent change, or adaptation of the above embodiments according to the technical essence of the present invention remains within the protection scope of the technical solution of the present invention.

Claims (7)

1. A rate-distortion optimization method based on visual perception, characterized by comprising the following steps:
S1, building a pixel-domain JND prediction model for video using a JND subjective test data set that conforms to the just-noticeable-distortion criterion of the human eye, and obtaining from the model a JND threshold consistent with human visual perception;
S2, deriving Lagrange multiplier factors from the JND threshold obtained in S1, the factors reducing perceptual redundancy in the video by removing, during encoding, video data that the human eye cannot perceive;
S3, optimizing the Lagrange multiplier weight coefficients with a saliency model of the video, so as to optimize the allocation of coding bits between salient and non-salient regions during encoding;
S4, using the results of steps S2 and S3, adaptively adjusting the LCU-level Lagrange multiplier according to the perceptual characteristics of the video content, fusing both models into rate-distortion optimization so that their advantages are fully exploited.
2. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that:
in step S1, a data-driven pixel-domain JND prediction model is constructed from a large-scale JND subjective test data set, specifically comprising:
S11, dividing each frame of the original video into N × N LCUs to increase the diversity and number of training samples and improve generalization during neural-network training;
S12, computing the average pixel-domain JND threshold of each original-video LCU block, using it as a training sample for the JND prediction model, and predicting the pixel-domain JND threshold of the video to be coded with a deep neural network.
3. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that step S2 specifically comprises: using the JND threshold obtained in step S1 as a perceptual-sensitivity adjustment factor for the objective distortion SSE, correcting the SSE in the original rate-distortion formula into a perceptual distortion that better matches the characteristics of the human eye, and deriving a JND-model-based adaptive Lagrange multiplier adjustment factor for perceptual video coding; specifically:
step S21, using the JND threshold JND_{LCU} from the JND prediction model as the perceptual-sensitivity adjustment factor of the objective distortion, correcting the objective distortion in the rate-distortion formula into a perceptual distortion D_p that better matches the human eye;
step S22, for each LCU block, solving the optimal rate-distortion cost with the ordinary distortion metric SSE and the perceptual distortion metric D_p respectively, and counting the coding bits consumed by the whole video frame;
step S23, deriving the JND-based adaptive Lagrange multiplier adjustment factor of the i-th LCU block, calculated as:

\Delta\lambda_i = \frac{ JND_{LCU}^{i} + c }{ \frac{1}{N} \sum_{j=1}^{N} JND_{LCU}^{j} + c }

where N is the number of LCUs in the current frame, JND_{LCU}^{i} is the JND threshold of the i-th LCU, and c is a constant that prevents numerical instability.
4. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that in step S3 a Lagrange multiplier weight coefficient ω_i is derived from the saliency model to optimize the allocation of coding bits between salient and non-salient regions, specifically comprising:
step S31, determining the saliency of each LCU block from the global saliency map produced by the saliency model;
step S32, defining the saliency of an LCU as:

s_i = \frac{p_i}{p_{avg}}

where s_i is the saliency weight of the i-th LCU within the whole frame, p_i is the average pixel value of the i-th LCU in the saliency map, and p_{avg} is the average pixel value of the whole saliency-map frame; if s_i is greater than 1, the LCU is defined as salient, otherwise it is non-salient;
step S33, determining the saliency weight coefficient ω_i of each LCU from the distribution of saliency, and adjusting the objective distortion of each LCU accordingly;
step S34, to balance the global bit rate and distortion, weighting the Lagrange multiplier in the rate-distortion formula with the saliency model:

\hat{\lambda}_i = \omega_i \cdot \lambda_{org}

where \hat{\lambda}_i is the saliency-weighted Lagrange multiplier of the i-th LCU and \lambda_{org} is the Lagrange multiplier in the original rate-distortion formula;
when allocating coding bits between salient and non-salient regions, the encoder tends, for an LCU in a region of interest, to select a coding mode with smaller objective distortion and more coding bits; for an LCU in a region of no interest, it selects a coding mode with fewer coding bits.
5. The visual-perception-based rate-distortion optimization method according to claim 1, characterized in that in step S4 the results of steps S2 and S3 are combined into an improved perceptual rate-distortion optimization, expressed as:

\min \sum_{i=1}^{N} \left( D_i + \Delta\lambda_i \cdot \omega_i \cdot \lambda_{org} \cdot R_i \right)

where D_i is the objective distortion of the i-th LCU block and R_i is its coding bit rate.
6. A method of visual perception-based rate-distortion optimization according to claim 2, wherein: the input of the pixel domain JND prediction model is a gray map matrix of each original video LCU block, and the target output is a pixel domain JND threshold value of the corresponding LCU block;
the pixel domain JND model proposed in the step S1 is as follows:
Figure FDA0003716542230000033
wherein p (x, y) is the gray value of pixel point with coordinate (x, y) in the original video LCU block, m is the side length of LCU, JND LCU For pixel domain JND thresholds of LCU blocks, the prediction function is implemented based on a J-VGGNet network.
7. The visual-perception-based rate-distortion optimization method according to claim 2, wherein: the data-driven pixel-domain JND prediction model is built on the large-scale JND subjective test data set VideoSet, which contains 220 sequences at four resolutions (1920×1080, 1280×720, 960×540 and 640×360); each video is encoded with the 52 QP values from 0 to 51, yielding 45760 video sequences;

in step S11, the original videos and the corresponding JND videos are preprocessed, and each video frame is divided into N×N LCUs, so as to increase the diversity and number of training samples, improve the generalization of the neural network during training, and adapt the model to more video coding scenes; the mean square error between each original-video LCU and the corresponding JND-video LCU is then computed as the average pixel-domain JND threshold of the original-video LCU block:

$$JND_{LCU} = \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \big(l(i, j) - l'(i, j)\big)^2$$

wherein l(i, j) is a pixel value of the original video, l'(i, j) is the corresponding pixel value of the JND video, and m is the side length of the LCU block;

in step S12, the result obtained in step S11 is used as the training sample of the JND prediction model, and the J-VGGNet network is used to predict the pixel-domain JND threshold of the video to be encoded.
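The label computation of claim 7 can be sketched directly: the training label for each LCU is the per-block MSE between the original frame l and the JND frame l'. Frame sizes and pixel values below are toy data:

```python
# Sketch of claim 7's label computation: the average pixel-domain JND
# threshold of each LCU is the MSE between the original and JND frames
# over the m x m block, i.e. (1/m^2) * sum (l - l')^2.
import numpy as np

def jnd_threshold(orig_block, jnd_block):
    d = orig_block.astype(np.float64) - jnd_block.astype(np.float64)
    return float(np.mean(d * d))

def lcu_labels(orig_frame, jnd_frame, m=64):
    """One MSE label per LCU, scanning the frame in m x m blocks."""
    h, w = orig_frame.shape
    return [[jnd_threshold(orig_frame[i:i + m, j:j + m],
                           jnd_frame[i:i + m, j:j + m])
             for j in range(0, w, m)] for i in range(0, h, m)]

orig = np.zeros((128, 128), dtype=np.uint8)
jnd = orig.copy()
jnd[:64, :64] += 2               # distort only the top-left LCU by 2 levels
labels = lcu_labels(orig, jnd, m=64)
# only labels[0][0] is nonzero: (2)^2 = 4.0; all other LCUs are identical
```
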
CN202210744549.6A 2022-06-27 2022-06-27 Rate distortion optimization method based on visual perception Pending CN115174898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210744549.6A CN115174898A (en) 2022-06-27 2022-06-27 Rate distortion optimization method based on visual perception


Publications (1)

Publication Number Publication Date
CN115174898A true CN115174898A (en) 2022-10-11

Family

ID=83490163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210744549.6A Pending CN115174898A (en) 2022-06-27 2022-06-27 Rate distortion optimization method based on visual perception

Country Status (1)

Country Link
CN (1) CN115174898A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115665415A (en) * 2022-10-27 2023-01-31 华医数字(湖北)医疗技术股份有限公司 Perception-based interframe image coding rate distortion optimization method and system
CN115665415B (en) * 2022-10-27 2023-09-29 华医数字(湖北)医疗技术股份有限公司 Inter-frame image coding rate distortion optimization method and system based on perception
CN115988201A (en) * 2023-03-14 2023-04-18 杭州微帧信息科技有限公司 Method, apparatus, electronic device and storage medium for encoding film grain
CN115988201B (en) * 2023-03-14 2023-05-30 杭州微帧信息科技有限公司 Method, apparatus, electronic device and storage medium for encoding film grain

Similar Documents

Publication Publication Date Title
CN115174898A (en) Rate distortion optimization method based on visual perception
CN108063944B (en) Perception code rate control method based on visual saliency
CN106358040B (en) Code rate control bit distribution method based on significance
CN102300094B (en) Video coding method
CN108900838B (en) Rate distortion optimization method based on HDR-VDP-2 distortion criterion
CN108200431B (en) Bit allocation method for video coding code rate control frame layer
CN111970511B (en) VMAF-based perceptual video rate distortion coding optimization method and device
CN111193931B (en) Video data coding processing method and computer storage medium
CN110139112B (en) Video coding method based on JND model
CN103051901A (en) Video data coding device and video data encoding method
CN110708570B (en) Video coding rate determining method, device, equipment and storage medium
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
CN111447446B (en) HEVC (high efficiency video coding) rate control method based on human eye visual region importance analysis
CN114900692A (en) Video stream frame rate adjusting method and device, equipment, medium and product thereof
CN112825557A (en) Self-adaptive sensing time-space domain quantization method aiming at video coding
CN113556544B (en) Video coding method, device, equipment and storage medium based on scene self-adaption
CN108881905B (en) Probability-based intra-frame encoder optimization method
CN112218084B (en) High-efficiency video coding standard frame-level code rate control method facing surveillance video
KR100557618B1 (en) Bit rate control system based on object
CN111476866B (en) Video optimization and playing method, system, electronic equipment and storage medium
CN110365981B (en) Video coding method and device, electronic equipment and storage medium
Yang et al. Just-noticeable-difference based coding and rate control of mobile 360° video streaming
CN112437301A (en) Code rate control method and device for visual analysis, storage medium and terminal
CN111757112B (en) HEVC (high efficiency video coding) perception code rate control method based on just noticeable distortion
CN108737826B (en) Video coding method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination