CN113099226A - Multi-level perception video coding algorithm optimization method for smart court scene - Google Patents
- Publication number
- CN113099226A (application CN202110384146.0A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- parameter
- perception
- algorithm
- modules
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a multi-level perceptual video coding algorithm optimization method for smart court scenes, comprising the following steps: S1, constructing a multi-level perceptual coding framework: perceptual rate control is realized by combining content-adaptive bit allocation, temporal perceptual quantization control, and spatial perceptual quantization control; mode decision for efficient intra and inter prediction is realized by psycho-visual rate-distortion optimization; and coefficient-level perceptual quantization is realized by coefficient-level psycho-visual rate-distortion optimized quantization. S2, analyzing the correlation among the modules: quantitative parameters measuring inter-module correlation are used to evaluate the degree of influence between two algorithm modules, the complex multi-module optimization is converted into sequential single-module optimization by selecting key control parameters, and the algorithm decision order of the customizable modules is determined. S3, constructing an online adaptive parameter model: a content-adaptive parameter calculation model is built using the inherent correlation among the modules.
Description
Technical Field
The invention relates to the field of video coding, and in particular to a multi-level perceptual video coding algorithm optimization method for smart court scenes.
Background
Many business processes in a smart court, such as trials, mediation, and enforcement, require video recording; the videos are then browsed, analyzed, and intelligently processed to raise the level of court-business intelligence. Because of the large number of cases, video storage and management face heavy pressure, and video data must be compressed further to improve the video processing capacity of court business.
Video coding standards have greatly accelerated video applications over the past thirty years. Video compression is achieved by removing redundancy in the original video sequence; signal-processing-based video coding techniques are approaching the upper compression limit, at the cost of exponentially increasing computational complexity. In comparison, there is untapped potential in further eliminating perceptual redundancy. The Human Visual System (HVS) is the ultimate judge of the visual quality of reconstructed video, and it has several important perceptual properties that can be exploited to improve coding performance without significantly degrading perceptual quality.
In customizing and optimizing a video coding algorithm, there are two levels of tasks: the algorithm framework (algorithm control flow) and key algorithm parameter selection. The former decides which algorithms are used, such as full search, three-step search, or diamond search in the motion estimation module. The latter, given the algorithm flow, determines the optimal control parameters by balancing rate-distortion performance against computational complexity, e.g., selecting the search range and reference pixel precision of the diamond search algorithm. Generally, these two tasks are considered jointly in single-module algorithm optimization; beyond that, the many algorithm-customizable modules should be optimized jointly, which is a very complex problem. Rate-distortion optimization (RDO) is widely used as the theoretical basis for algorithm optimization of multi-level customizable modules in video encoders, including rate control, mode decision, motion estimation, transform, and quantization. Complex inter-module relationships exist between these interacting modules.
Current research has only investigated algorithm optimization of single modules in depth, such as quantization, mode decision, motion estimation, and rate control. From the perspective of deep optimization, academia still lacks research on multi-level perceptual coding, and the joint treatment of inter-module correlation analysis and multi-level perceptual coding remains absent from the open literature, leaving the following technical problems:
(1) a multi-level perceptual coding algorithm framework is lacking;
(2) complex relationships exist among the algorithm modules of a multi-level perceptual coding framework, and the correlation among the modules must be quantified to evaluate the degree of mutual influence;
(3) the algorithm decision order of multiple customizable modules must be determined, converting the complex multi-module optimization problem into a sequence of single-module optimization problems by selecting a series of key control parameters;
(4) the inherent correlation among the modules must be exploited to propose a content-adaptive parameter calculation model realizing online adaptive multi-module joint optimization.
Disclosure of Invention
To remedy the defects of the prior art and improve the perceptual quality of video, the invention adopts the following technical scheme:
a multilevel perception video coding algorithm optimization method facing to a smart court scene comprises the following steps:
s1, constructing a multi-level perception coding frame, realizing perception code rate control by combining content self-adaptive bit allocation, time domain perception quantization control and space domain perception quantization control, realizing mode decision of effective intra-frame and inter-frame prediction in the perception RDO sense by psychovisual rate distortion optimization, and realizing coefficient level perception quantization by coefficient level psychovisual rate distortion optimization quantization;
the content-adaptive bit allocation is realized by combining sliding-window-based lookahead pre-analysis with a frame-level perceptual complexity measurement, wherein the pre-analysis adopts simplified motion estimation and mode decision to track the spatio-temporal characteristics of the video to be coded, and the frame-level perceptual complexity measurement measures the perceptual fuzzy complexity for perceptual bit allocation and adopts a perceptual complexity quantization model to obtain a frame-level quantization parameter Qp_frm, realizing frame-level quantization control;
the time-domain perceptual quantization control analyzes content-adaptive bit allocation through time-domain Qp concatenation to reduce time-domain distortion fluctuations, which is achieved with the help of video content analysis based on lookup head pre-analysis, and obtains Δ Qp for adaptive adjustment of Qp for time-domain fine quantization controlTemp;
the spatial perceptual quantization control refines the content-adaptive bit allocation using the spatial masking effect; it adopts a quantization adjustment parameter ΔQp_VAQ to adaptively adjust Qp for spatial fine-grained quantization control;
finally, the final quantization parameter ΔQp_final is obtained as:
ΔQp_final = Qp_frm + ΔQp_Temp + ΔQp_VAQ  (1)
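The combination in Eq. (1) can be sketched in Python; the clipping to the H.265 QP range and all identifier names are illustrative assumptions, not part of the patent:

```python
def final_qp(qp_frm: float, dqp_temp: float, dqp_vaq: float,
             qp_min: float = 0.0, qp_max: float = 51.0) -> float:
    """Combine the frame-level QP with the temporal and spatial offsets
    (Eq. (1)) and clip to a valid H.265/HEVC QP range (the clipping is an
    assumption added here for safety)."""
    qp = qp_frm + dqp_temp + dqp_vaq
    return max(qp_min, min(qp_max, qp))
```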
the psycho-visual rate-distortion optimization replaces the traditional MSE with an improved perceptual distortion metric D_RDO;
the psycho-visual rate-distortion optimized quantization replaces the traditional MSE with a perceptual distortion metric D_quant;
s2, analyzing correlation among modules, measuring quantitative parameters of the correlation among the modules, evaluating the influence degree between two algorithm modules, converting a complex multi-module optimization problem into a continuous single-module optimization problem by selecting a series of key control parameters, determining the algorithm decision sequence of a plurality of customizable modules, and providing a scheme for searching parameter sets by balancing rate-distortion performance;
s3, constructing an online self-adaptive parameter model, and constructing a content self-adaptive parameter calculation model by utilizing the intrinsic relevance among the modules.
Further, the content-adaptive bit allocation comprises frame-level bit allocation and quantization control: complexity-adaptive bit allocation achieves frame-level quantization control by exploiting coarse-grained HVS features, including the temporal contrast sensitivity function and the temporal masking effect. Inherited from the empirical models employed in the open-source MPEG-4 Xvid and H.264/AVC x264 encoders, this work measures perceptual content complexity using a qcomp-domain compression model: the original SATD-based complexity Cplx is compressed using the model Cplx^(1−qcomp), where qcomp ∈ [0.5, 1] is the compression constant for compressing Cplx, and the quantization step size qscale for frame-level scaling is then dynamically estimated by adjusting the rate scaling factor Rfactor:
qscale = Cplx^(1−qcomp) / Rfactor  (2)
in the h.264/AVC and h.265/HEVC standards, the quantization parameter Qp is mapped to qscale by equation (3):
where c is a constant, the frame-level quantization parameter Qp is obtained according to equations (2) and (3)frm:
The fuzzy complexity compression model is inspired by temporal HVS characteristics: regions with high fuzzy complexity contain complex textures or high motion, and the HVS is insensitive to high-frequency component distortion in these regions. The distortion of these regions is therefore relatively imperceptible, i.e., these complex regions can hide larger coding distortion. The target code rates allocated to these complex regions are reduced by compressing the complexity, and the saved rate can be allocated to regions more sensitive to the human eye, thereby enabling perceptually adaptive bit allocation and quantization control and improving coding RpD performance in the HVS-perceptual sense.
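Under the x264-style rate-control model described above, Eqs. (2)-(4) can be sketched as follows; the constant c = 0.85 and the exact shape of Eq. (2) are assumptions based on the open-source encoders cited in the text:

```python
import math

def qscale_from_complexity(cplx: float, qcomp: float, rfactor: float) -> float:
    # Eq. (2): compress the SATD-based complexity with exponent (1 - qcomp),
    # then scale by the rate factor Rfactor.
    return cplx ** (1.0 - qcomp) / rfactor

def qp_from_qscale(qscale: float, c: float = 0.85) -> float:
    # Eq. (4), the inverse of Eq. (3): qscale = c * 2**((Qp - 12) / 6).
    return 12.0 + 6.0 * math.log2(qscale / c)
```

With qcomp = 1 the complexity term vanishes and every frame gets the same qscale; with qcomp closer to 0.5, complex frames receive larger quantization steps.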
Further, the temporal perceptual quantization control is temporal quantization parameter cascading. In video coding, the distortion of an I-frame and of earlier P-frames propagates to later P- and B-frames through successive inter prediction within a group of pictures (GOP); in such inter prediction, the quality of a reference frame obviously has a direct impact on the quality of the current frame. To reduce temporal distortion propagation and improve video visual quality, the distortion of the I-frame and earlier P-frames must be kept small. x264 and x265 adopt the MBTree and CUTree quantization control algorithms to fully utilize HVS characteristics and weight by reference importance: using the parameters θ_Temp, ζ_intra, and γ_propagate, the target code rate is adaptively allocated among coding units (macroblocks and CUs) to smooth temporal distortion fluctuation, and the reference-importance weights are used to adjust the Qp of coding blocks. ΔQp_Temp is calculated as:
ΔQp_Temp = −θ_Temp × log2((ζ_intra + γ_propagate) / ζ_intra)  (5)
where θ_Temp is the control strength parameter of the quantization control algorithm, ζ_intra is the SATD-based intra-prediction cost, and γ_propagate is the inter-frame propagation cost, measuring the cost propagated between the current block and the blocks it references.
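A minimal sketch of the cutree-style temporal offset of Eq. (5); the logarithmic form is assumed from the x265 CUTree algorithm named above, and all identifiers are illustrative:

```python
import math

def temporal_qp_offset(theta_temp: float, zeta_intra: float,
                       gamma_propagate: float) -> float:
    """Eq. (5) sketch: blocks whose content propagates widely through inter
    prediction (large gamma_propagate) get a negative offset, i.e. finer
    quantization, protecting the quality of heavily referenced blocks."""
    return -theta_temp * math.log2((zeta_intra + gamma_propagate) / zeta_intra)
```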
Further, the spatial perceptual quantization control is variance-adaptive quantization (VAQ). Spatial masking is an important HVS feature: human eyes are more sensitive to distortion in flat regions than in highly textured regions, and this feature is usually used to assist spatially adaptive quantization. By exploiting the spatial masking effect, the VAQ algorithm smooths distortion fluctuation between adjacent blocks in texture-flat regions and reduces the blurring of relatively flat regions; the perceptual quality improvement in texture-flat regions comes at the cost of quality degradation in texture-complex regions. VAQ cooperates with the temporal Qp cascading algorithm to realize block-level perceptual quantization control. In x265, the VAQ algorithm has 4 modes; overall, ΔQp_VAQ is calculated as follows:
ΔQp_VAQ = θ_VAQ × (var − var_adjust)  (6)
where θ_VAQ is the control strength parameter of variance-adaptive quantization, and var and var_adjust are the variance and the variance adjustment value of the current block, respectively.
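Eq. (6) translates directly; a sketch with illustrative names:

```python
def vaq_qp_offset(theta_vaq: float, var: float, var_adjust: float) -> float:
    """Eq. (6): high-variance (textured) blocks get a positive offset
    (coarser quantization); flat blocks get a negative one (finer
    quantization), exploiting the spatial masking effect."""
    return theta_vaq * (var - var_adjust)
```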
Further, the psycho-visual rate-distortion optimization replaces the conventional RDO cost J_1 = D_1 + λ_1 × R_1 with J'_1 = D_RDO + λ_1 × R_1, where λ_1 is the Lagrange multiplier of psycho-visual rate-distortion optimization and R_1 is its number of coding bits, so that a perceptual-RDO-based mode decision can be implemented to determine the perceptually optimal coding mode from the candidate modes;
In the psycho-visual rate-distortion optimization (psyrdo), visual studies show that the human eye not only wants the reconstructed image to look similar to the original image, but also wants it to have similar content complexity; i.e., we prefer to see a somewhat distorted but still detailed block rather than a block that is undistorted but completely blurred. In the psyrdo algorithm, the SSD (sum of squared differences) is replaced by the perceptual distortion D_RDO, calculated as follows:
D_RDO = SSD + λ_psy_rdo × psyrdo × psycost  (7)
where λ_psy_rdo is a control parameter related to the quantization parameter, psyrdo is the control strength parameter of psycho-visual rate-distortion optimization, and psycost is the energy difference between the original block and the reconstructed block, defined as follows:
psycost = |(SATD_ori − SAD_ori/4) − (SATD_rec − SAD_rec/4)|  (8)
where SATD and SAD measure block complexity distortion, and the subscripts rec and ori denote the reconstructed and original blocks, respectively.
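A sketch of Eqs. (7)-(8). The "SATD minus SAD/4" energy measure follows the x265-style psy-rd implementation and is an assumption; the patent text only specifies that psycost is the energy difference between the original and reconstructed blocks:

```python
def psy_cost(satd_ori: float, sad_ori: float,
             satd_rec: float, sad_rec: float) -> float:
    # Eq. (8) sketch: absolute difference between the high-frequency
    # "energy" of the original block and that of the reconstruction.
    energy_ori = satd_ori - sad_ori / 4.0
    energy_rec = satd_rec - sad_rec / 4.0
    return abs(energy_ori - energy_rec)

def psy_rdo_distortion(ssd: float, lam_psy: float,
                       psyrdo_strength: float, cost: float) -> float:
    # Eq. (7): SSD plus the weighted perceptual energy-difference term.
    return ssd + lam_psy * psyrdo_strength * cost
```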
Further, the psycho-visual rate-distortion optimized quantization replaces the conventional RDO cost J_2 = D_2 + λ_2 × R_2 with J'_2 = D_quant + λ_2 × R_2, where λ_2 is the Lagrange multiplier of psycho-visual rate-distortion optimized quantization and R_2 is its number of coding bits;
In the psycho-visual rate-distortion optimized quantization (psyquant): the conventional hard-decision quantization (HDQ) algorithm does not consider the correlation between adjacent coefficients within a block. For context coding in CABAC, the quantization strength of each discrete cosine transform (DCT) coefficient essentially depends not only on how its neighboring DCT coefficients are quantized, but also on how all quantized DCT coefficients are entropy coded. Soft-decision quantization (SDQ) was therefore proposed to achieve coefficient-level rate-distortion optimized quantization; SDQ employs dynamic programming such as the Viterbi search algorithm, converting the complex rate-distortion optimized quantization problem into a trellis-based shortest-path search problem. The RDOQ-based perceptual distortion D_quant is calculated as follows:
D_quant = diff × diff − psyrdoq × |t_rec|  (9)
where diff × diff is the standard SSD, and psyrdoq × |t_rec| is the product of the control strength of psycho-visual rate-distortion optimized quantization and the reconstructed coefficient obtained after inverse DCT.
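Eq. (9) as a one-line sketch (names are illustrative):

```python
def psy_quant_distortion(diff: float, psy_rdoq: float, t_rec: float) -> float:
    """Eq. (9): standard squared error minus a bonus proportional to the
    magnitude of the reconstructed coefficient after inverse DCT, so that
    quantization decisions preserving coefficient energy are favored."""
    return diff * diff - psy_rdoq * abs(t_rec)
```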
Further, the inter-module correlation analysis is based on the algorithm modules of content-adaptive bit allocation (frame-level quantization control with qcomp-domain complexity), temporal perceptual quantization control (tree-structured temporal Qp cascading, cutree), spatial perceptual quantization control (VAQ), psycho-visual rate-distortion optimization (psyrdo), and psycho-visual rate-distortion optimized quantization (psyquant); the mutual influence between the algorithm modules is quantitatively tested through their key parameters.
Further, the relationship between the parameter cutreestrength of the temporal perceptual quantization control and the parameter qcomp of the content-adaptive bit allocation is:
cutreestrength = 5 × (1 − qcomp)  (10)
where cutreestrength is the control strength parameter of the quantization control algorithm and qcomp is the compression constant.
Furthermore, for parameters with continuous values, the value range is discretized into several parameter values with a discrete step size. Since the number of parameter combinations to evaluate is huge, a method must be designed to simplify the complex multi-module algorithm optimization problem. Let ν_i and ν_j denote the RD performance changes caused by modifying the two algorithm modules separately: ν_i (or ν_j) is the average BD-VMAF difference when module i (or j) alone is enabled relative to all perceptual coding tools being disabled, and ν is the average BD-VMAF difference when modules i and j are enabled together relative to all perceptual coding tools being disabled. The correlation between the two algorithm modules is then defined as:
φ_ij = ν − (ν_i + ν_j)  (11).
Experiments show that the RD performance change produced by customizing the two modules simultaneously is not equal to the sum of the changes produced by customizing them separately; hence the influence of each module's algorithm modification on overall coding RD performance is not linear, i.e., the modules' performances are coupled.
After obtaining the correlation φ_ij between each pair of algorithm modules, the inter-module correlation level of a single module is defined as:
θ_i = Σ_{j≠i} |φ_ij|  (12)
The decision priority of each algorithm module is determined by the magnitude of θ_i, and the relevant parameters of each module are optimized in turn, from the largest priority to the smallest. The complex multi-module joint optimization problem is thus simplified into a sequence of single-module optimization problems, greatly reducing the number of parameter combinations.
Furthermore, regarding the construction of the online adaptive parameter model: the inter-module correlation analysis shows that the influence of a single module's algorithm modification on overall coding RD performance is not linear, the correlation between two modules can be positive or negative, and the correlation φ_ij differs across video sequences; its magnitude depends on the video content. Characteristic parameters ω_j representing the image content are therefore extracted: var in equation (6) is taken as ω_1, γ_propagate in equation (5) as ω_2, psycost in equation (7) as ω_3, |t_rec| in equation (9) as ω_4, and Cplx in equation (2) as ω_5. Moreover, the output code rate distribution differs across parameter combinations. To further improve the result after offline optimization, a content-adaptive parameter offset model is constructed for the five non-discrete parameters on the basis of the parameter combination obtained by offline optimization, where a and b are constants, φ_ij is the inter-module correlation between two modules, ω_j is the characteristic parameter representing image content in each module, β_i is a parameter used to control the dynamic range of each parameter, g(Δp_i) is a function describing the relationship between Δp_i and the corresponding code rate change ΔR_i, and h(ω_j) is a function describing the relationship between Δω_j and the corresponding code rate change ΔR_j. By jointly considering the values of φ_ij, ω_j, and h(ω_j), the non-discrete parameter values are adaptively adjusted according to the image content, so that the combination of the perceptual coding parameters reaches the optimum at an equivalently reduced code rate.
The invention has the advantages and beneficial effects that:
(1) The invention considers perceptual coding from the perspective of multi-level joint optimization. On the one hand, perceptual video coding is realized from low to high levels: quantization, rate control, mode selection, and so on; on the other hand, the complex correlations between the multi-level customizable algorithm modules are quantitatively studied.
(2) The algorithm decision priority of each module is determined by its inter-module correlation level, so the complex multi-module joint optimization problem can be simplified into a sequence of single-module optimization problems, greatly reducing the difficulty of multi-level perceptual video coding algorithm optimization.
(3) By utilizing the inherent correlation among the modules, a content-adaptive parameter calculation model is proposed, realizing online adaptive multi-module joint optimization.
(4) Compared with the slow preset of x265, the multi-module optimization method based on inter-module correlation obtains better visual perceptual quality at the same given code rate.
Drawings
Fig. 1 is a flow chart of the multi-level perceptual video coding optimization of the present invention.
FIG. 2a is a diagram of the relationship between multi-level perceptual coding tools in the present invention.
FIG. 2b is a schematic diagram of the relationship between multi-level perceptual coding tools in the present invention.
FIG. 3 is a diagram of a multi-level perceptual coding framework according to the present invention.
FIG. 4 is a diagram of the relationship between modules in the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, the multi-level perceptual video coding algorithm optimization method for smart court scenes first proposes an algorithm framework through inheritance and integration of mature algorithms: a multi-level perceptual coding framework is built by inheriting mature, fully verified algorithm flows for all customizable modules, including frame-level bit allocation and quantization control (qcomp), temporal quantization parameter cascading (cutree), spatial variance-adaptive quantization (VAQ), psycho-visual rate-distortion optimization (psyrdo), and psycho-visual rate-distortion optimized quantization (psyquant).
Secondly, a quantitative parameter measuring inter-module correlation is proposed to evaluate the degree of influence between two algorithm modules. Complex relationships exist among the five perceptual coding modules; to quantitatively test their mutual influence, algorithm optimization is in practice realized by selecting key parameters of the modules, and the inter-module correlation level of each module is calculated.
Thirdly, a new method is provided for determining the algorithm decision order of multiple customizable modules: the relevant parameters of each module are optimized in turn, from the largest decision priority to the smallest, according to the correlation levels of the five modules; by selecting a series of key control parameters, the complex multi-module optimization problem is converted into a sequence of single-module optimization problems, and a scheme for searching parameter sets by balancing rate-distortion performance is provided.
Finally, a content-adaptive parameter calculation model is designed using the inherent correlation among the modules. The inter-module correlation differs across test sequences, and the rate-perceptual-distortion (RpD) performance is related to video content; therefore, content-related characteristic parameters are extracted from the five perceptual coding modules, and the model is designed while considering the influence of code rate on the coding performance of different parameter combinations.
1. Multi-level perceptual coding framework
There are a number of algorithm customization modules whose implementation details are not specified by the video standard, including coefficient-level quantization, block-level mode decision, and frame- and GOP-level rate control, among others. Quantization directly determines coding rate and distortion, affecting rate-distortion behavior, which is very important for evaluating the Lagrangian coding cost in mode decision and rate control. Mode decision and rate control are also highly correlated: rate control aims to determine the quantization parameter chain in the spatio-temporal domain, thereby determining the distortion distribution and the coding rate consumption distribution, and accurate rate control should be achieved through effective perceptual content-adaptive bit allocation. In RD-optimized mode decision, the Lagrangian multiplier typically depends on the quantization parameter determined by rate control. As shown in figs. 2a and 2b, the modules depend on each other: multiple customizable modules jointly affect coding performance, complex relationships exist among them, and the inter-module influence mechanism is crucial to simultaneous multi-module algorithm optimization. However, this inherent operating mechanism is very complex, and spatio-temporal rate-distortion propagation exacerbates the difficulty. Different modules have different algorithm customization priorities according to their degree of association. If dynamic optimization were applied to the performance optimization of the multi-level customizable modules, the computational complexity would be too high for real-time operation, so an algorithm customization method that is suboptimal but of acceptable complexity is required.
Here we propose a framework for multi-level perceptual coding by inheriting mature, fully verified algorithm flows for all customizable modules, as shown in fig. 3. Perceptual rate control is achieved by combining content-adaptive bit allocation (abbreviated qcomp), temporal perceptual quantization control (quantization parameter cascading, abbreviated cutree), and spatial perceptual quantization control (variance-adaptive quantization, abbreviated VAQ). Content-adaptive bit allocation combines sliding-window-based lookahead analysis and a frame-level perceptual complexity measurement. The pre-analysis module employs simplified motion estimation and mode decision to track the spatio-temporal features of the video to be encoded, and the resulting statistics are used to measure the perceptual fuzzy complexity for perceptual bit allocation. A frame-level quantization parameter Qp_frm is obtained by employing a perceptual complexity quantization model, realizing frame-level quantization control. The temporal Qp cascade refines the content-adaptive bit allocation to reduce temporal distortion fluctuations; it is implemented with the help of lookahead-based video content analysis and obtains ΔQp_Temp for adaptive adjustment of Qp for temporal fine-grained quantization control. In addition, by exploiting the spatial masking effect, VAQ refines the content-adaptive bit allocation for spatial quantization control, adopting a quantization adjustment parameter ΔQp_VAQ to adaptively adjust Qp for spatial fine-grained quantization control. The final quantization parameter ΔQp_final is calculated as follows:
Qp_final = Qp_frm + ΔQp_Temp + ΔQp_VAQ (1)
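As a minimal sketch of how the three quantization terms of equation (1) might be combined in practice, the following assumes a standard 0–51 Qp range and clamps the result; the function name and the example offsets are illustrative, not taken from the patent:

```python
def final_qp(qp_frm, dqp_temp, dqp_vaq, qp_min=0, qp_max=51):
    """Combine the frame-level Qp with the temporal (cutree) and
    spatial (VAQ) offsets per equation (1), clamped to the usual
    H.264/HEVC Qp range."""
    qp = qp_frm + dqp_temp + dqp_vaq
    return max(qp_min, min(qp_max, qp))

# hypothetical offsets: a referenced, flat block gets a finer Qp
print(final_qp(30, -2.5, 1.0))  # 28.5
```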
Psycho-visual rate-distortion optimization, abbreviated psy-RDO, implements mode decision for effective intra and inter prediction in the perceptual-RDO sense by replacing the conventional MSE with an improved perceptual distortion metric D_RDO. The conventional RDO coding cost J_1 = D_1 + λ_1 × R_1 is therefore replaced by J'_1 = D_RDO + λ_1 × R_1, so that a perceptual-RDO-based mode decision can determine the perceptually optimal coding mode among the candidate modes. Similarly, coefficient-level psycho-visual rate-distortion optimized quantization, abbreviated psy-quant, implements coefficient-level perceptual quantization: a perceptual distortion metric D_quant replaces the conventional MSE, and the conventional RDO cost J_2 = D_2 + λ_2 × R_2 is replaced by J'_2 = D_quant + λ_2 × R_2. These algorithm modules are analyzed in detail below.
(1) Frame-level bit allocation and quantization control (qcomp)
Complexity-adaptive bit allocation achieves frame-level quantization control by exploiting coarse-grained HVS characteristics, including the temporal contrast sensitivity function and the temporal masking effect. Inherited from the empirical models of the open-source MPEG-4 XviD and H.264/AVC x264 encoders, this work measures perceptual content complexity with a qcomp-domain compression model: the original SATD-based complexity Cplx is compressed by the Cplx^(1-qcomp) model, and the frame-level scaled quantization step qscale is then dynamically estimated by adjusting the rate scaling factor Rfactor:

qscale = Cplx^(1-qcomp) / Rfactor (2)

In the H.264/AVC and H.265/HEVC standards, the quantization parameter Qp is mapped to qscale by

qscale = c × 2^((Qp - 12) / 6) (3)

Here, c is a constant. According to equations (2) and (3), the frame-level quantization parameter Qp_frm is calculated as:

Qp_frm = 12 + 6 × log2(qscale / c) (4)
the fuzzy complexity compression model is inspired by time-domain HVS characteristics, and areas with high fuzzy complexity have complex textures or high-motion areas. The HVS is insensitive to high frequency component distortion in these regions. Thus, the distortion of these regions is relatively imperceptible, i.e., these complex regions can hide larger coding distortion, the target code rates allocated to these complex regions will be reduced by the compression complexity, and these saved code rates can be allocated to regions that are more sensitive to the human eye, thereby enabling perceptually adaptive bit allocation and quantization control and improving coding RpD performance in the sense of HVS perception.
(2) Time-domain quantization parameter cascading (cutree)
In video coding, continuous inter-frame prediction within a group of pictures (GOP) propagates the distortion of the I frame and earlier P frames to later P and B frames; for inter prediction, the quality of a reference frame clearly has a direct influence on the quality of the current frame. To reduce temporal distortion propagation and improve visual quality, the distortion of the I frame and earlier P frames must be kept small, and x264 and x265 adopt the MBTree and CUTree quantization control algorithms to fully exploit this HVS characteristic. The algorithm adaptively allocates the target rate among coding units (macroblocks and CUs) according to a reference importance weight, smoothing temporal distortion fluctuations; the weight is used to adjust the Qp of each coding block. ΔQp_Temp is calculated as:

ΔQp_Temp = -θ_Temp × log2((ζ_intra + γ_propagate) / ζ_intra) (5)
where θ_Temp is a control strength parameter, ζ_intra is the SATD-based intra-prediction cost, and γ_propagate is the inter-frame propagation cost, measuring the propagation cost between the current block and the blocks that reference it.
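A sketch of the temporal offset, following the x264/x265 MBTree formulation in which the Qp offset is proportional to the log ratio of propagated-plus-intra cost to intra cost; the exact functional form and the default strength are assumptions here:

```python
import math

def cutree_qp_offset(intra_cost, propagate_cost, strength=2.0):
    """Temporal Qp offset sketch: blocks whose content propagates to
    many future blocks (large propagate cost) receive a negative
    offset, i.e. finer quantization, which limits temporal distortion
    propagation through the GOP."""
    return -strength * math.log2((intra_cost + propagate_cost) / intra_cost)

# a heavily referenced block is quantized more finely than an unreferenced one
assert cutree_qp_offset(100, 300) < cutree_qp_offset(100, 0)
```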
(3) Variance Adaptive Quantization (VAQ)
Spatial masking is an important characteristic of the HVS: the human eye is more sensitive to distortion in flat regions than in highly textured regions, and this property is often used to assist spatially adaptive quantization. Exploiting the spatial masking effect, the VAQ algorithm smooths distortion fluctuations between adjacent blocks in texture-flat regions, reducing blurring artifacts in relatively flat areas such as the grass of a football pitch. The perceptual quality improvement in flat-texture regions comes at the cost of quality degradation in complex-texture regions, and in general VAQ works together with the temporal Qp cascade algorithm to achieve block-level perceptual quantization control. In x265, the VAQ algorithm has 4 modes. The overall ΔQp_VAQ is calculated as:
ΔQp_VAQ = θ_VAQ × (var - var_adjust) (6)
where θ_VAQ is a control strength parameter, and var and var_adjust are respectively the variance of the current block and the variance adjustment value.
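Equation (6) is simple enough to state directly; the sketch below treats var_adjust as a variance statistic supplied by the caller (an assumption, since the text does not reproduce how the adjustment value is derived):

```python
def vaq_qp_offset(block_var, var_adjust, strength=1.0):
    """Spatial Qp offset per equation (6): flat blocks (variance below
    the adjustment value) get a negative offset and finer quantization,
    while textured blocks, where spatial masking hides distortion,
    absorb a positive offset."""
    return strength * (block_var - var_adjust)

# a flat block (e.g. grass) is protected, a textured block is not
assert vaq_qp_offset(10, 50) < 0 < vaq_qp_offset(200, 50)
```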
(4) Psychovisual rate-distortion optimization (Psyrdo)
Visual studies have shown that the human eye not only wants the reconstructed image to look similar to the original, but also wants it to have similar content complexity. That is, viewers prefer a somewhat distorted but still detailed block over a completely blurred block with little numerical distortion. In the Psyrdo algorithm, the SSD is replaced by the perceptual distortion D_RDO, calculated as follows:
D_RDO = SSD + λ_psy_rdo × psyrdo × psycost (7)
where psyrdo is a control strength parameter, λ_psy_rdo is a control parameter related to the quantization parameter, and psycost is the energy difference between the original block and the reconstructed block, defined as follows:

psycost = |(SATD_ori - SAD_ori) - (SATD_rec - SAD_rec)| (8)
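A sketch of the psy-RDO cost of equation (7), with the block energies passed in precomputed; following the x264 psy-rd convention, the block energy is taken as SATD minus SAD (treated here as an assumption):

```python
def psy_rd_cost(ssd, energy_ori, energy_rec, psy_rd=1.0, lam_psy=1.0):
    """Psy-RDO distortion per equation (7): the plain SSD is augmented
    with the absolute energy difference between the original and
    reconstructed blocks, so a blurred (energy-losing) reconstruction
    is penalized even when its SSD is low."""
    psycost = abs(energy_ori - energy_rec)
    return ssd + lam_psy * psy_rd * psycost

# a low-SSD but blurred candidate can lose to a detailed one
blurred = psy_rd_cost(ssd=100, energy_ori=500, energy_rec=100)
detailed = psy_rd_cost(ssd=150, energy_ori=500, energy_rec=480)
assert detailed < blurred
```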
(5) psychovisual rate-distortion optimization quantization (psyquant)
In conventional hard-decision quantization (HDQ) algorithms, the correlation between adjacent coefficients within a block is not considered. For context coding in CABAC, the quantization strength of each discrete cosine transform (DCT) coefficient essentially depends not only on how its neighboring DCT coefficients are quantized, but also on how all quantized DCT coefficients are entropy coded. To address this problem, soft-decision quantization (SDQ) was proposed to achieve coefficient-level rate-distortion optimized quantization. SDQ employs dynamic programming such as the Viterbi search algorithm, converting the complex joint rate-distortion optimized quantization problem into a trellis-based shortest-path search. The RDOQ-based perceptual distortion D_quant is calculated as follows:
D_quant = diff × diff - psyrdoq × |t_rec| (9)
The former term is the standard SSD, and the latter is the product of the control strength and the magnitude of the reconstructed coefficient obtained after the inverse DCT.
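Equation (9) can be sketched directly; note that the energy reward can drive the "distortion" negative, which is intentional: it biases the trellis search toward keeping nonzero coefficients (the coefficient and control-strength values below are illustrative):

```python
def psy_quant_distortion(diff, t_rec, psy_rdoq=1.0):
    """Coefficient-level perceptual distortion per equation (9):
    standard squared error minus a reward proportional to the
    magnitude of the reconstructed coefficient, so retained detail
    lowers the perceived distortion."""
    return diff * diff - psy_rdoq * abs(t_rec)

# zeroing a coefficient (t_rec = 0) forfeits the energy reward
assert psy_quant_distortion(3, 10) < psy_quant_distortion(3, 0)
```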
2. Inter-module correlation analysis
As described above, frame-level quantization control based on qcomp-domain complexity (qcomp), tree-structured temporal Qp cascading (cutree), spatial variance-adaptive quantization (VAQ), psycho-visual rate-distortion optimization (Psyrdo), and psycho-visual rate-distortion optimized quantization (psyquant) are incorporated into the perceptual coding framework. Owing to the complex HVS characteristics and the interrelationships between modules, customizing the five algorithm modules in a joint-optimization sense is a challenge. To test the interactions between modules quantitatively, algorithm optimization is carried out by selecting the key parameters of these modules; the relevant parameters of each module are listed in Table 1. The relationship between cutreestrength and qcomp is:
cutreestrength=5×(1-qcomp) (10)
For parameters with continuous values, such as AQstrength with range 0 to 3, the range can be discretized with a step size of 0.1 into 31 parameter values; the discretization step sizes are given in Table 1. Assume M parameters in total, with K_m candidate values for the m-th parameter. Using the step sizes in Table 1, the number of parameter combinations is 1.04 × 10^9 ((4 × 30 + 1) × 2 × (6 × 50 + 1) × (2 × 140 + 1) × 51). This number is so large that a method must be designed to simplify the complex multi-module algorithm optimization problem.
TABLE 1 relevant parameters involved in the five modules
Suppose the RD performance changes caused by customizing the algorithms of two modules separately are ν_i and ν_j, as shown in fig. 4, where ν_i (ν_j) is the average BD-VMAF difference of turning module i (j) on, relative to all perceptual coding tools being off. Correspondingly, ν is the average BD-VMAF difference when modules i and j are turned on together, relative to all perceptual coding tools being off. The inter-module correlation between two algorithm modules is defined as:
φ_ij = ν - (ν_i + ν_j) (11)
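Equation (11) amounts to a simple interaction test on BD-VMAF measurements; the numbers below are hypothetical, for illustration only:

```python
def inter_module_correlation(bd_vmaf_joint, bd_vmaf_i, bd_vmaf_j):
    """phi_ij per equation (11): the joint BD-VMAF gain of enabling
    modules i and j together, minus the sum of their individual gains.
    A positive value means the modules reinforce each other; a negative
    value means their gains partially overlap."""
    return bd_vmaf_joint - (bd_vmaf_i + bd_vmaf_j)

# hypothetical measurements: joint gain exceeds the sum, positive coupling
assert inter_module_correlation(2.0, 0.8, 0.9) > 0
```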
Experiments show that the RD performance change produced by customizing the two modules simultaneously does not equal the sum of the changes produced by customizing them separately. The influence of each module's algorithm modification on the overall coding RD performance therefore does not satisfy a linear relationship; that is, the performance of the modules is coupled.
After obtaining the pairwise correlation φ_ij of the modules, the inter-module correlation level of a single module is defined as:
The priority of the algorithm decisions of the five modules is determined by the magnitude of θ_i, and the relevant parameters of each module are optimized in turn, from the highest priority to the lowest. The complex multi-module joint optimization problem is thus reduced to a sequence of single-module optimization problems, and the number of parameter combinations in Table 1 is reduced to 756 ((4 × 30 + 1) + 2 + (6 × 50 + 1) + (2 × 140 + 1) + 51).
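The reduction from joint to sequential search can be checked numerically with the candidate counts implied by the parenthesized expressions above:

```python
from math import prod

# candidate values per parameter, from the discretized ranges of Table 1
counts = [4 * 30 + 1, 2, 6 * 50 + 1, 2 * 140 + 1, 51]

joint_search = prod(counts)      # exhaustive joint optimization
sequential_search = sum(counts)  # one module at a time, in priority order

print(joint_search)       # 1043898702, i.e. ~1.04e9
print(sequential_search)  # 756
```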
3. Online adaptive parametric model
The inter-module correlation analysis shows that the influence of a single module's algorithm modification on the overall coding RD performance does not satisfy a linear relationship, that the correlation between two modules can be positive or negative, and that φ_ij differs across video sequences; that is, the magnitude of the correlation depends on the video content. A characteristic parameter ω_j describing the image content is therefore defined for each module: var in equation (6) is ω_1, γ_propagate in equation (5) is ω_2, psycost in equation (7) is ω_3, |t_rec| in equation (9) is ω_4, and Cplx in equation (2) is ω_5. Furthermore, under different output rate distributions, different parameter combinations can yield different perceptual coding performance. Based on these two observations, and starting from the parameter combination obtained by the offline optimization of the previous section, a content-adaptive parameter migration model is designed for the five non-discrete parameters:
where a and b are constants, φ_ij is the inter-module correlation between two modules, ω_j is the characteristic parameter describing the image content in each module, β_i is a parameter used to control the dynamic range of each parameter, θ_i is a function describing the relationship between Δp_i and the corresponding rate change ΔR_i, and h(ω_j) is a function describing the relationship between Δω_j and the corresponding rate change ΔR_j. The proposed parameter offset model simultaneously considers the influence of φ_ij, ω_j, θ_i and h(ω_j); the values of the non-discrete parameters are adaptively adjusted according to the image content, so that the combination of perceptual coding parameters approaches the optimum at an equivalently reduced rate.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A multi-level perceptual video coding algorithm optimization method for smart court scenes, characterized by comprising the following steps:
S1, constructing a multi-level perceptual coding framework: perceptual rate control is realized by combining content-adaptive bit allocation, temporal perceptual quantization control and spatial perceptual quantization control; psycho-visual rate-distortion optimization realizes mode decision for effective intra-frame and inter-frame prediction; and coefficient-level psycho-visual rate-distortion optimized quantization realizes coefficient-level perceptual quantization;
the content-adaptive bit allocation is realized by combining sliding-window-based lookahead pre-analysis with frame-level perceptual complexity measurement, wherein the frame-level perceptual complexity measurement measures the perceptual blur complexity for perceptual bit allocation and employs a perceptual complexity quantization model to obtain a frame-level quantization parameter Qp_frm;

the temporal perceptual quantization control refines the content-adaptive bit allocation through a temporal Qp cascade and, based on the lookahead pre-analysis of the video content, obtains a ΔQp_Temp that adaptively adjusts Qp for temporal fine-grained quantization control;

the spatial perceptual quantization control exploits the spatial masking effect and, through the content-adaptive bit allocation, uses a quantization adjustment parameter ΔQp_VAQ to adaptively adjust Qp for spatial fine-grained quantization control;
finally, the final quantization parameter Qp_final is obtained:

Qp_final = Qp_frm + ΔQp_Temp + ΔQp_VAQ (1)
the psycho-visual rate-distortion optimization replaces the conventional MSE with an improved perceptual distortion metric D_RDO;

the psycho-visual rate-distortion optimized quantization replaces the conventional MSE with a perceptual distortion metric D_quant;
S2, analyzing the correlation among the modules: quantitative inter-module correlation parameters are measured to evaluate the degree of influence between two algorithm modules; by selecting key control parameters, the complex multi-module optimization is converted into sequential single-module optimization, and the algorithm decision order of the customizable modules is determined;
S3, constructing an online adaptive parameter model: a content-adaptive parameter calculation model is constructed by utilizing the intrinsic relevance among the modules.
2. The method of claim 1, wherein the content-adaptive bit allocation performs frame-level bit allocation and quantization control: the perceptual content complexity is measured with a qcomp-domain compression model, the original complexity Cplx is compressed by the Cplx^(1-qcomp) model, qcomp being a compression constant, and the frame-level scaled quantization step qscale is dynamically estimated by adjusting a rate scaling factor Rfactor:

qscale = Cplx^(1-qcomp) / Rfactor (2)

the quantization parameter Qp is mapped to qscale by equation (3):

qscale = c × 2^((Qp - 12) / 6) (3)

where c is a constant; the frame-level quantization parameter Qp_frm is obtained according to equations (2) and (3):

Qp_frm = 12 + 6 × log2(qscale / c) (4)
3. The method of claim 1, wherein the temporal perceptual quantization control is a temporal quantization parameter cascade that adaptively allocates target rates among the coding units according to a reference importance weight, the reference importance weight being used to adjust the Qp of a coding block; ΔQp_Temp is calculated as:

ΔQp_Temp = -θ_Temp × log2((ζ_intra + γ_propagate) / ζ_intra) (5)

wherein θ_Temp is a control strength parameter of the quantization control algorithm, ζ_intra is the SATD-based intra-prediction cost, and γ_propagate is the inter-frame propagation cost, measuring the propagation cost between the current block and the blocks that reference it.
4. The method of claim 1, wherein the spatial perceptual quantization control is variance-adaptive quantization, and the overall ΔQp_VAQ is calculated as follows:

ΔQp_VAQ = θ_VAQ × (var - var_adjust) (6)

wherein θ_VAQ is the control strength parameter of the variance-adaptive quantization, and var and var_adjust are respectively the variance of the current block and the variance adjustment value.
5. The method of claim 1, wherein the psycho-visual rate-distortion optimization replaces the conventional RDO cost J_1 = D_1 + λ_1 × R_1 with J'_1 = D_RDO + λ_1 × R_1, λ_1 denoting the Lagrange multiplier and R_1 the number of coding bits of the psycho-visual rate-distortion optimization;

in the Psyrdo algorithm, the psycho-visual rate-distortion optimization replaces the SSD with the perceptual distortion D_RDO, calculated as follows:

D_RDO = SSD + λ_psy_rdo × psyrdo × psycost (7)

wherein λ_psy_rdo is a control parameter related to the quantization parameter, psyrdo is the control strength parameter of the psycho-visual rate-distortion optimization, and psycost is the energy difference between the original block and the reconstructed block, defined as follows:

psycost = |(SATD_ori - SAD_ori) - (SATD_rec - SAD_rec)| (8)

where SATD and SAD are used to measure block complexity distortion, and the subscripts rec and ori denote the reconstructed and original blocks, respectively.
6. The method of claim 1, wherein the psycho-visual rate-distortion optimized quantization replaces the conventional RDO cost J_2 = D_2 + λ_2 × R_2 with J'_2 = D_quant + λ_2 × R_2, λ_2 denoting the Lagrange multiplier and R_2 the number of coding bits of the psycho-visual rate-distortion optimized quantization;

the RDOQ-based perceptual distortion D_quant of the psycho-visual rate-distortion optimized quantization is calculated as follows:

D_quant = diff × diff - psyrdoq × |t_rec| (9)

where diff × diff is the standard SSD, and psyrdoq × |t_rec| is the product of the control strength of the psycho-visual rate-distortion optimized quantization and the reconstructed coefficient obtained after the inverse DCT.
7. The method of claim 1, wherein the inter-module correlation analysis is performed on the algorithm modules of content-adaptive bit allocation, temporal perceptual quantization control, spatial perceptual quantization control, psycho-visual rate-distortion optimization and psycho-visual rate-distortion optimized quantization, the interaction between the algorithm modules being tested quantitatively through their key parameters.
8. The method of claim 7, wherein the relationship between the cutreestrength parameter of the temporal perceptual quantization control and the qcomp parameter of the content-adaptive bit allocation is:

cutreestrength = 5 × (1 - qcomp) (10)

wherein cutreestrength is the control strength parameter of the quantization control algorithm and qcomp is the compression constant.
9. The method of claim 7, wherein for parameters with continuous values, the value range is discretized into a number of parameter values by setting discrete step sizes; the RD performance changes caused by modifying the algorithms of two modules are respectively ν_i and ν_j, where ν_i (or ν_j) is the average BD-VMAF difference of turning module i (or j) on relative to all perceptual coding tools being off, and ν is the average BD-VMAF difference of turning modules i and j on together relative to all perceptual coding tools being off; the inter-module correlation between the two algorithm modules is defined as:

φ_ij = ν - (ν_i + ν_j) (11)

after obtaining the pairwise correlation φ_ij of the algorithm modules, the inter-module correlation level θ_i of a single module is defined, the priority of each algorithm module's decision is determined according to the magnitude of θ_i, and the relevant parameters of each module are optimized in turn from the highest priority to the lowest.
10. The method of claim 7, wherein the online adaptive parameter model is constructed by defining a characteristic parameter ω_j describing the image content: var in equation (6) is ω_1, γ_propagate in equation (5) is ω_2, psycost in equation (7) is ω_3, |t_rec| in equation (9) is ω_4, and Cplx in equation (2) is ω_5; on the basis of the parameter combination obtained by offline optimization, a content-adaptive parameter migration model is constructed for the five non-discrete parameters:

wherein a and b are constants, φ_ij is the inter-module correlation between two modules, ω_j is the characteristic parameter describing the image content in each module, β_i is a parameter used to control the dynamic range of each parameter, θ_i is a function describing the relationship between Δp_i and the corresponding rate change ΔR_i, and h(ω_j) is a function describing the relationship between Δω_j and the corresponding rate change ΔR_j, the influence of φ_ij, ω_j, θ_i and h(ω_j) being considered simultaneously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110384146.0A CN113099226B (en) | 2021-04-09 | 2021-04-09 | Multi-level perception video coding algorithm optimization method for smart court scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113099226A true CN113099226A (en) | 2021-07-09 |
CN113099226B CN113099226B (en) | 2023-01-20 |
Family
ID=76675918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110384146.0A Active CN113099226B (en) | 2021-04-09 | 2021-04-09 | Multi-level perception video coding algorithm optimization method for smart court scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113099226B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115103186A (en) * | 2022-06-20 | 2022-09-23 | 北京大学深圳研究生院 | Code rate control method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109120934A (en) * | 2018-09-25 | 2019-01-01 | 杭州电子科技大学 | A kind of frame level quantization parameter calculation method suitable for HEVC Video coding |
US20190082182A1 (en) * | 2017-09-08 | 2019-03-14 | Université de Nantes | Method and device for encoding dynamic textures |
US20190281302A1 (en) * | 2018-03-12 | 2019-09-12 | Nvidia Corporation | Ssim-based rate distortion optimization for improved video perceptual quality |
CN110493597A (en) * | 2019-07-11 | 2019-11-22 | 同济大学 | A kind of efficiently perception video encoding optimization method |
CN110944199A (en) * | 2019-11-28 | 2020-03-31 | 华侨大学 | Screen content video code rate control method based on space-time perception characteristics |
CN111193931A (en) * | 2018-11-14 | 2020-05-22 | 深圳市中兴微电子技术有限公司 | Video data coding processing method and computer storage medium |
CN112004084A (en) * | 2019-05-27 | 2020-11-27 | 北京君正集成电路股份有限公司 | Code rate control optimization method and system by utilizing quantization parameter sequencing |
Non-Patent Citations (1)
Title |
---|
YANG Tong et al.: "Rate-distortion optimization algorithm for HDR video coding fusing visual perception characteristics", Opto-Electronic Engineering (《光电工程》) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||