CN113099226A

CN113099226A - Multi-level perception video coding algorithm optimization method for smart court scene

Info

Publication number: CN113099226A
Application number: CN202110384146.0A
Authority: CN
Inventors: 殷海兵; 周华健
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-04-09
Filing date: 2021-04-09
Publication date: 2021-07-09
Anticipated expiration: 2041-04-09
Also published as: CN113099226B

Abstract

The invention discloses a multi-level perceptual video coding algorithm optimization method for a smart court scene, comprising the following steps: S1, constructing a multi-level perceptual coding framework, by combining content adaptive bit allocation, time-domain perceptual quantization control and spatial-domain perceptual quantization control, Realize perceptual rate control, realize effective intra-frame and inter-frame prediction mode decision through psychovisual rate-distortion optimization, realize coefficient-level perceptual quantization through coefficient-level psychovisual rate-distortion optimization and quantization; S2, inter-module correlation analysis, measurement Quantitative parameters of inter-module correlation, evaluating the degree of influence between two algorithm modules, by selecting key control parameters, transforming complex multi-module optimization into continuous single-module optimization, and determining the algorithm decision-making order of multiple customizable modules; S3 , constructing an online adaptive parameter model, and constructing a content-adaptive parameter calculation model by using the internal correlation between modules.

Description

Multi-level perception video coding algorithm optimization method for smart court scene

Technical Field

The invention relates to the field of video coding, in particular to a multilevel perception video coding algorithm optimization method facing to a smart court scene.

Background

A plurality of business links such as intelligent court trial, mediation and execution need to record videos, and browsing, analyzing and intelligent processing are carried out according to the videos so as to improve the intelligent level of court business. Due to a plurality of cases, video storage management faces a large pressure, and video data are continuously compressed further so as to improve the video processing capacity of court business.

Video coding standards have greatly accelerated video applications over the past thirty years. Video compression is achieved by removing redundancy in the original video sequence, and signal processing-based video coding techniques are approaching the upper compression limit, at the cost of an exponential increase in computational complexity. In comparison, there is a potential to mine in further eliminating perceptual redundancy. The Human Visual System (HVS) is the ultimate judger of visual quality for reconstructed video and it has several important perceptual properties that can be exploited to improve coding performance without significantly degrading perceptual quality.

In the aspect of customizing and optimizing a video coding algorithm, the video coding algorithm has two stages of tasks, namely an algorithm framework (algorithm control flow) and key algorithm parameter selection. The former task is to design what algorithms are used, such as full search, three-step search and diamond search in the motion estimation module. The latter task is to determine the optimal control parameters, e.g. select the search range, given the reference pixel precision of the diamond search algorithm, by balancing the rate-distortion performance and the computational complexity, given the algorithm flow. Generally, these two tasks are jointly considered in single-module-level algorithm optimization, in which case a plurality of algorithm-customizable modules should be jointly optimized, which is a very complex problem. Rate Distortion Optimization (RDO) is widely used as a theoretical basis for algorithm optimization of multi-level customizable modules in video encoders, including rate control, mode decision, motion estimation, transformation, and quantization. Complex inter-module relationships exist between these interacting modules.

The current research only carries out intensive research on algorithm optimization of a single module, such as quantization, mode decision, motion estimation and code rate control. From the perspective of depth optimization, the academic community still lacks research on multi-level perceptual coding, and the joint optimization of inter-module correlation analysis and multi-level perceptual coding is still elusive in open literature, including the following technical problems:

(1) a multi-level perceptual coding algorithm framework is lacked;

(2) complex relationships exist among all algorithm modules of the multi-level perception coding framework, and how to quantify the relevance among the modules is determined to evaluate the influence degree among the modules;

(3) how to determine the algorithm decision sequence of a plurality of customizable modules through a method and realize the conversion of a complex multi-module optimization problem into a continuous single-module optimization problem through selecting a series of key control parameters.

(4) How to utilize the intrinsic relevance among the modules and put forward a content self-adaptive parameter calculation model to realize the online self-adaptive multi-module combined optimization.

Disclosure of Invention

In order to solve the defects of the prior art and achieve the purpose of improving the perception quality of the video, the invention adopts the following technical scheme:

a multilevel perception video coding algorithm optimization method facing to a smart court scene comprises the following steps:

s1, constructing a multi-level perception coding frame, realizing perception code rate control by combining content self-adaptive bit allocation, time domain perception quantization control and space domain perception quantization control, realizing mode decision of effective intra-frame and inter-frame prediction in the perception RDO sense by psychovisual rate distortion optimization, and realizing coefficient level perception quantization by coefficient level psychovisual rate distortion optimization quantization;

the content self-adaptive bit allocation is realized by combining a lookup head pre-analysis based on a sliding window and frame-level perception complexity measurement, wherein the pre-analysis adopts simplified motion estimation and mode decision to track the space-time characteristics of a coded video, and the frame-level perception complexity measurement is realized by measuring the perception fuzzy complexity of perception bit allocation and adopting a perception complexity quantization model to obtain a frame-level quantization parameter Qp_frmRealizing frame-level quantization control;

the time-domain perceptual quantization control analyzes content-adaptive bit allocation through time-domain Qp concatenation to reduce time-domain distortion fluctuations, which is achieved with the help of video content analysis based on lookup head pre-analysis, and obtains Δ Qp for adaptive adjustment of Qp for time-domain fine quantization control_Temp；

The spatial domain perceptual quantization control is illustrated by a content adaptive bit allocation using a spatial masking effect, which employs a quantization adjustment parameter Δ Qp_VAQSelf-adaptively adjusting the Qp to carry out space-domain fine-grained quantization control;

finally, the final quantization parameter delta Qp is obtained_final：

ΔQp_final＝Qp_frm+ΔQp_Temp+ΔQp_VAQ (1)

The psycho-visual rate-distortion optimization is performed by an improved perceptual distortion metric D_RDOReplacing the traditional MSE;

the psycho-visual rate-distortion optimization quantization is measured by perceptual distortion D_quantReplacing the traditional MSE;

s2, analyzing correlation among modules, measuring quantitative parameters of the correlation among the modules, evaluating the influence degree between two algorithm modules, converting a complex multi-module optimization problem into a continuous single-module optimization problem by selecting a series of key control parameters, determining the algorithm decision sequence of a plurality of customizable modules, and providing a scheme for searching parameter sets by balancing rate-distortion performance;

s3, constructing an online self-adaptive parameter model, and constructing a content self-adaptive parameter calculation model by utilizing the intrinsic relevance among the modules.

Further, the content adaptive bit allocation is frame-level bit allocation and quantization control, complexity adaptive bit allocation, frame-level quantization control achieved by exploiting coarse-grained HVS features including temporal contrast sensitivity function and temporal masking effect, inherited from empirical models employed in open-source MPEG-4xvid and h.264avc x264 encoders, which work measures perceptual content complexity using a qcomp domain compression model, the original being based on a qcomp domain compression modelComplexity of SATD Cplx Using Cplx¹ ^-qcompThe model is compressed, qcomp is at [0.5,1 ]]And (3) a compression constant for compressing Cplx, dynamically estimating a quantization step size qscale for frame-level scaling by adjusting a rate scaling factor Rfactor:

in the h.264/AVC and h.265/HEVC standards, the quantization parameter Qp is mapped to qscale by equation (3):

where c is a constant, the frame-level quantization parameter Qp is obtained according to equations (2) and (3)_frm：

The fuzzy complexity compression model is inspired by time-domain HVS characteristics, and areas with high fuzzy complexity have complex textures or high-motion areas. The HVS is insensitive to high frequency component distortion in these regions. Thus, the distortion of these regions is relatively imperceptible, i.e., these complex regions can hide larger coding distortion, the target code rates allocated to these complex regions will be reduced by the compression complexity, and these saved code rates can be allocated to regions that are more sensitive to the human eye, thereby enabling perceptually adaptive bit allocation and quantization control and improving coding RpD performance in the sense of HVS perception.

Further, the temporal perceptual quantization control is a temporal quantization parameter concatenation, in video coding, the distortion of an I-frame and a previous P-frame is propagated to a following P-frame and a B-frame due to successive inter-prediction in a group of pictures (GOP), for which inter-prediction the quality of a reference frame obviously has a direct impact on the quality of the current frame, in order to reduce temporal distortion propagation to improve the videoThe visual quality of the frame I and the previous P frame is ensured to be small in distortion, and the x264 and the x265 adopt MBTree and CUTree quantization control algorithms to fully utilize HVS characteristics and weight according to the reference importance, wherein the parameter comprises a parameter theta_Temp、ζ_intra、γ_propagateAdaptively allocating target code rates between coding units (macroblock and CU) to smooth temporal distortion fluctuation, and using reference importance weights to adjust QP, Δ Qp, of coding blocks_TempThe calculation formula is as follows:

wherein, theta_TempIs a control strength parameter, ζ, of a quantitative control algorithm_intraIs an intra-prediction cost, gamma, based on SATD_propagateIs the inter-frame transfer cost, the transfer cost of the current block to the block to which it refers is measured.

Further, the spatial domain perception quantization control is variance adaptive quantization, spatial masking is an important feature of HVS, human eyes are more sensitive to distortion of flat regions than distortion of high texture regions, the feature is usually used for assisting the spatial adaptive quantization, a VAQ algorithm smoothes distortion fluctuation between adjacent blocks in the flat regions of textures and reduces blurring effect of the relatively flat regions by using the spatial domain masking effect, the perception quality improvement of the regions with flat textures is realized at the cost of quality degradation in the regions with complex textures, the VAQ is cooperated with a time domain Qp cascade algorithm to realize block-level perception quantization control, and in x265, the VAQ algorithm has 4 modes, and overall Δ Qp is a function of the overall Δ Qp_VAQIs calculated as follows:

ΔQp_VAQ＝θ_VAQ×(var-var_adjust) (6)

wherein, theta_VAQControl strength parameters, var and var, which are variance adaptive quantization_adjustRespectively, the variance and the variance adjustment value of the current block.

Further, the psycho-visual rate-distortion optimization is J'₁＝D_RDO+λ₁×R₁Replace the conventional RDO code J₁＝D₁+λ₁×R₁，λ₁Lagrange factor, R, representing psycho-visual rate-distortion optimization₁Representing the number of coding bits for psycho-visual rate-distortion optimization, so that a perceptual RDO-based mode decision can be implemented to determine a perceptually optimal coding mode from the candidate modes;

the psycho-visual rate distortion optimization (Psyrdo), visual studies show that the human eye not only wants the reconstructed image to look similar to the original image, but also wants the image to have similar content complexity, i.e. we prefer to see a somewhat distorted but still detailed block instead of a block that is not distorted but completely blurred, in the Psyrdo algorithm, SSD (Single Shot multiple box Detector) is perceptually distorted D_RDOInstead, the calculation is as follows:

D_RDO＝SSD+λ_{psy_rdo}×psyrdo×psycost (7)

wherein λ is_{psy_rdo}Is a control parameter related to the quantization parameter, psyro is a control strength parameter for psycho-visual rate-distortion optimization, and psycost is the energy difference between the original block and the reconstructed block, defined as follows:

where SATD, SAD are used to measure block complexity distortion and the subscripts rec and ori denote reconstructed and original blocks, respectively.

Further, the psycho-visual rate-distortion optimization quantization is to J'₂＝D_quant+λ₂×R₂Replace the conventional RDO encoding J₂＝D₂+λ₂×R₂，λ₂Lagrange factor, R, representing psycho-visual rate-distortion optimized quantization₂A coding bit number representing psycho-visual rate-distortion optimized quantization;

the psycho-visual rate-distortion optimized quantization (psyquant), in a conventional hard-decision quantization (HDQ) algorithm, does not consider inter-coefficient between adjacent coefficients within a blockCorrelation, for context coding in CABAC, the quantization strength of each Discrete Cosine Transform (DCT) coefficient essentially depends not only on how its neighboring DCT coefficients are quantized, but also on how all quantized DCT coefficients are entropy coded. Soft Decision Quantization (SDQ) is thus proposed to achieve coefficient-level rate-distortion optimized quantization, SDQ employing dynamic programming such as the viterbi search algorithm and converting a complex multi-effect rate optimized quantization problem into a grid-based shortest path search problem, RDOQ-based perceptual distortion D_quantThe calculation is as follows:

D_quant＝diff×diff-psyrdoq×|t_rec| (9)

where diff × diff is the standard SSD, psyrdoq × | t_recAnd | is the product of the control intensity of the psycho-visual rate-distortion optimization quantization and the reconstruction coefficient obtained after DCT inverse transformation.

Further, the inter-module correlation analysis is based on algorithm modules of content adaptive bit allocation (frame-level quantization control of qcomp domain complexity), time domain perceptual quantization control (tree-shaped time domain Qp cascade cube), spatial domain perceptual quantization control (VAQ), psychovisual rate distortion optimization (Psyrdo) and psychovisual rate distortion optimization quantization (psyquant), and mutual influence between the algorithm modules is quantitatively tested through key parameters of the algorithm modules.

Further, the relationship between the parameter curtreestength of the time-domain perceptual quantization control and the parameter qcomp of the content adaptive bit allocation is as follows:

cutreestrength＝5×(1-qcomp) (10)

wherein Curtreestength is a control strength parameter of a quantitative control algorithm, and qcomp is a compression constant.

Furthermore, for parameters with continuous values, the value range is discretized into a plurality of parameter values by setting discrete step length, and since the number of calculated parameter combinations is huge, a method must be designed to simplify the complicated multi-module algorithm optimization problem, and the RD performance change amounts caused by the modification of two algorithm modules are respectively ν_iV and v_j，ν_iOr v_jIs the module i or j is turned on relative to allAnd v is the average BD-VMAF difference value when the modules i and j are opened together relative to the average BD-VMAF difference value when all the perception coding tools are closed, and the correlation between the two algorithm modules is defined as follows:

φ_ij＝ν-(ν_i+ν_j) (11)。

through experiments, it is known that the RD performance changes generated by the algorithm of the two modules customized simultaneously are not equal to the RD performance changes generated by the algorithm of the two modules customized separately, so that the influence of the algorithm modification of each module on the performance of the overall coding RD does not satisfy a linear relationship, that is, the performance between the modules is coupled.

Obtaining a difference phi between each algorithm module_ijThen, the inter-module correlation level of a single module is defined as:

according to theta_iThe priority level of each algorithm module decision is determined according to the size of the algorithm module, and the related parameters of each module are optimized in sequence according to the priority level from large to small. Therefore, the complex multi-module joint optimization problem is simplified into a continuous single-module optimization problem, and the number of parameter combinations is greatly reduced.

Furthermore, the construction of the online adaptive parameter model is carried out, and the analysis of the correlation among the modules shows that the influence of the algorithm modification of a single module on the performance of the overall coding RD does not satisfy the linear relation, the correlation between two modules has positive or negative, and the correlation phi of different video sequences is positive or negative_ijThe size of the correlation depends on the video content, and the characteristic parameter omega for representing the image content_jLet var in the formula (6) be ω₁Gamma in the formula (5)_propagateAs ω₂Let PSYCOST in equation (7) be ω₃Will be | t in the formula (9)_recL as ω₄Let Cplx in formula (2) be ω₅Furthermore, in the case where the output code rate distribution is differentAccording to the two results, in order to further improve the result after the off-line optimization, on the basis of the parameter combination obtained by the off-line optimization, a content adaptive parameter offset model is constructed for five non-discrete parameters:

wherein a and b are constants, phi_ijIs the inter-module correlation between two modules, ω_jIs a characteristic parameter, β, characterizing the image content in each module_iAre parameters used to control the dynamic range of each parameter,

is to describe Δ p_iCode rate change amount delta R corresponding to the code rate change amount_iFunction of the relationship between h (ω)_j) Is to describe Δ ω_jCode rate change amount delta R corresponding to the code rate change amount_jA function of the relationship between, while taking into account phi_ij、ω_j、

h(ω_j) The value of the non-discrete parameter is adaptively adjusted according to the image content, and the combination of a plurality of perceptual coding parameters reaches the optimal value by equivalently reducing the code rate.

The invention has the advantages and beneficial effects that:

(1) the invention considers perceptual coding from the perspective of multi-level joint optimization. On one hand, perceptual video coding is realized from low to high layers, and quantization, code rate control, mode selection and the like are carried out; on the other hand, complex correlations between multi-level customizable algorithm modules are quantitatively studied.

(2) The algorithm decision priorities of all modules are determined according to the correlation level between the modules of each module, and the complex multi-module joint optimization problem can be simplified into a continuous single-module optimization problem. The difficulty in realizing the optimization of the multi-level perception video coding algorithm is greatly reduced.

(3) By utilizing the inherent relevance among the modules, a content self-adaptive parameter calculation model is provided, and the online self-adaptive multi-module combined optimization is realized.

(4) Compared with the Slow presetting of x265, the multi-module optimization method based on the correlation between the modules can obtain better visual perception quality under the condition of the same given code rate.

Drawings

Fig. 1 is a flow chart of the multi-level perceptual video coding optimization of the present invention.

FIG. 2a is a diagram of the relationship between multi-level perceptual coding tools in the present invention.

FIG. 2b is a schematic diagram of the relationship between multi-level perceptual coding tools in the present invention.

FIG. 3 is a diagram of a multi-level perceptual coding framework according to the present invention.

FIG. 4 is a diagram of the relationship between modules in the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.

As shown in fig. 1, a multilevel perceptual video coding algorithm optimization method oriented to a smart court scenario first proposes an algorithm framework through inheritance and integration development of a mature algorithm, and proposes a multilevel perceptual coding framework through an algorithm process of inheriting maturity and full verification for all customizable modules, including frame-level bit allocation and quantization control (qcomp), time-domain quantization parameter cascade (cutree), spatial Variance Adaptive Quantization (VAQ), psycho-visual rate distortion optimization (psydo), and psycho-visual rate distortion optimization quantization (psydo).

Secondly, a quantitative parameter for measuring the correlation between the modules is provided for evaluating the influence degree between the two algorithm modules, complex relations exist between the five perception coding modules, and in order to quantitatively test the mutual influence between the modules, algorithm optimization is actually realized by selecting key parameters of the modules, so that the correlation level between the modules of each module is calculated.

Thirdly, a new method is provided for determining the algorithm decision sequence of a plurality of customizable modules, the relevant parameters of each module are optimized in sequence from large to small according to the relevance levels of the five module algorithm decisions and the priority levels, the complex multi-module optimization problem is converted into a continuous single-module optimization problem by selecting a series of key control parameters, and a scheme for searching parameter sets by balancing the rate-distortion performance is provided;

finally, a content adaptive parameter calculation model is designed by utilizing the inherent relevance among the modules, the relevance among the modules is different for different test sequences, the rate-sensing distortion (RpD) performance is related to the content of the video, so that the characteristic parameters related to the video content are extracted from five sensing coding modules, and the content adaptive parameter calculation model is designed by considering the influence of the code rate on the coding performance of different parameter combinations.

1. Multi-level perceptual coding framework

There are a number of algorithm customization modules that have no implementation details specified by the video standard, including coefficient level quantization, block level mode decision, and frame and GOP level rate control, among others. Quantization directly determines the coding rate and distortion, affecting the rate-distortion quantization behavior, which is very important for evaluating lagrangian coding costs in mode decision and rate control. The mode decision and rate control are also highly correlated, rate control aims at determining the quantization parameter chain in the space-time domain, thus determining distortion distribution and coding rate consumption distribution, and accurate rate control should be achieved by effective perceptual content adaptive bit allocation. In RD optimization mode decision, the lagrangian multiplier is typically dependent on the quantization parameter determined by the rate control. As shown in fig. 2a and b, in order to achieve the relationship between modules which are dependent on each other, a plurality of customizable modules affect the coding performance together, a complex relationship exists between the customizable modules, and the inter-module influence mechanism between the modules is crucial to the simultaneous optimization of a multi-module algorithm. However, this inherent mechanism of operation is very complex and the spatial-temporal rate distortion propagation exacerbates this difficulty. Different modules have different algorithm customization priorities according to the degree of association between the modules. If the dynamic optimization is applied to the performance optimization of the multilevel customizable module, the calculation complexity is too high to realize in real time, and an algorithm customization method with suboptimal but acceptable complexity is required.

Here we propose a framework for multi-level perceptual coding by inheriting a mature and well-validated algorithm flow for all customizable modules, as shown in fig. 3. Perceptual rate control is achieved by combining content-adaptive bit allocation (abbreviated qcomp), time-domain perceptual quantization control (quantization parameter concatenation, abbreviated cutree), and spatial-domain perceptual quantization control (variance-adaptive quantization, abbreviated VAQ). Content adaptive bit allocation is achieved by combining sliding window based lookahead analysis and frame level perceptual complexity measurement. The pre-analysis module employs simplified motion estimation and mode decision to track spatio-temporal features of the video to be encoded. The statistical information is used to measure the perceptual ambiguity complexity of the perceptual bit allocation. Obtaining a frame-level quantization parameter Qp by employing a perceptual complexity quantization model_frmFrame-level quantization control is realized. The time domain Qp cascade analyzes the content adaptive bit allocation to reduce time domain distortion fluctuations. It is implemented with the help of a lookup head based video content analysis and obtains a delta Qp for adaptive adjustment of Qp for temporal fine quantization control_Temp. In addition, by exploiting the spatial masking effect, the VAQ is used to account for spatial quantization control through content adaptive bit allocation, which employs a quantization adjustment parameter Δ Qp_VAQAnd self-adaptively adjusting the Qp to perform spatial fine-grained quantization control. The final quantization parameter Δ Qp_finalThe calculation is as follows:

ΔQp_final＝Qp_frm+ΔQp_Temp+ΔQp_VAQ (1)

psycho-visual rate-distortion optimization, abbreviated as psy-RDO, for implementing mode decisions for efficient intra and inter prediction in the sense of perceptual RDO by using an improved perceptual distortion metric D_RDOReplacing the conventional MSE. Therefore, the first and second electrodes are formed on the substrate,conventional RDO encoding cost J₁＝D₁+λ₁×R₁Is J'₁＝D_RDO+λ₁×R₁Instead, a perceptual RDO-based mode decision may thus be implemented to determine a perceptually optimal coding mode from the candidate modes. Similarly, coefficient-level psycho-visual rate-distortion optimized quantization, abbreviated as psy-quant, is used to implement coefficient-level perceptual quantization, a perceptual distortion measure D_quantIs used to replace the conventional MSE. Similarly, the conventional RDO cost J₂＝D₂+λ₂×R₂Is of similar J'₂＝D_quant+λ₂×R₂And (4) substituting. These algorithm modules are analyzed in detail below.

(1) Frame-level bit allocation and quantization control (qcomp)

Complexity adaptive bit allocation achieves frame-level quantization control by exploiting coarse-grained HVS characteristics including temporal contrast sensitivity functions and temporal masking effects. Inherited from empirical models employed in open source MPEG-4xvid and H.264AVC x264 encoders, this work measured perceptual content complexity using a qcomp domain compression model, and the original SATD-based complexity Cplx used Cplx^1-qcompThe model is compressed and then the frame-level scaled quantization step qscale is dynamically estimated by adjusting the rate scaling factor Rfactor:

in the H.264/AVC and H.265/HEVC standards, the quantization parameter Qp is formulated by

Mapping to qscale. Here, c is a constant. According to equations (2) and (3), the frame-level quantization parameter Qp_frmThe calculation formula of (a) is as follows:

(2) Time-domain quantization parameter cascading (cutree)

In video coding, due to continuous inter-frame prediction in a group of pictures (GOP), distortion of an I frame and a previous P frame is propagated to a later P frame and a B frame, for inter-frame prediction, the quality of a reference frame obviously has a direct influence on the quality of a current frame, in order to reduce propagation of time-domain distortion and improve the visual quality of a video, the distortion of the I frame and the previous P frame is ensured to be small, and x264 and x265 adopt MBTree and CUTree quantization control algorithms to fully utilize HVS characteristics. This algorithm adaptively allocates a target code rate between coding units (macroblocks and CUs) to smooth temporal distortion fluctuations according to a reference importance weight that is used to adjust the QP of the coding block. Delta Qp_TempThe calculation formula is as follows:

wherein, theta_TempIs a control strength parameter, ζ_intraIs an intra-prediction cost, gamma, based on SATD_propagateIs the inter-frame transfer cost, the transfer cost of the current block to the block to which it refers is measured.

(3) Variance Adaptive Quantization (VAQ)

Spatial masking is an important feature of HVS. Distortion of flat regions by the human eye versus high-texture regionsMore sensitively, this property is often used to assist spatial adaptive quantization. Using spatial masking effects, the VAQ algorithm smoothes out distortion fluctuations between adjacent blocks in a flat region of the texture, reducing the blurring effects of a relatively flat region, such as grass on a football field. Perceptual quality improvement for regions with flat texture is achieved at the cost of quality degradation in regions with complex texture, and in general, VAQ works in conjunction with a time-domain Qp cascade algorithm to achieve block-level perceptual quantization control. In x265, the VAQ algorithm has 4 modes. Overall Δ Qp_VAQThe calculation of (d) is given by:

ΔQp_VAQ＝θ_VAQ×(var-var_adjust) (6)

wherein, theta_VAQIs to control the intensity parameters, var and var_adjustRespectively, the variance and the variance adjustment value of the current block.

(4) Psychovisual rate-distortion optimization (Psyrdo)

Visual studies have shown that the human eye not only wants the reconstructed image to look similar to the original image, but also wants the image to have similar content complexity. That is, we prefer to see a somewhat distorted but still detailed block, rather than a completely blurred block that is not distorted. In the Psyrdo algorithm, SSD is perceptually distorted D_RDOInstead, the calculation is as follows:

D_RDO＝SSD+λ_{psy_rdo}×psyrdo×psycost (7)

wherein psyrdo is a control intensity parameter, λ_{psy_rdo}Is a control parameter related to the quantization parameter, and psycost is the energy difference between the original block and the reconstructed block, and is defined as follows:

(5) psychovisual rate-distortion optimization quantization (psyquant)

In conventional Hard Decision Quantization (HDQ) algorithms, inter-coefficient correlation between adjacent coefficients within a block is not considered. For context coding in CABAC, each Discrete Cosine Transform (DCT) systemThe quantization strength of a number essentially depends not only on how its neighboring DCT coefficients are quantized, but also on how all quantized DCT coefficients are entropy encoded. In view of this problem, Soft Decision Quantization (SDQ) is proposed to achieve coefficient-level rate-distortion optimized quantization. SDQ employs dynamic programming such as the viterbi search algorithm and converts complex multi-effect rate-optimized quantization problems into a grid-based shortest path search problem. RDOQ-based perceptual distortion D_quantThe calculation is as follows:

D_quant＝diff×diff-psyrdoq×|t_rec| (9)

the former term is standard SSD, and the latter term is the product of control intensity and reconstructed coefficient obtained after DCT inverse transformation.

2. Inter-module correlation analysis

As described above, frame-level quantization control based on the qcomp domain complexity (qcomp), tree-type time-domain Qp concatenation (cutree), spatial Variance Adaptive Quantization (VAQ), psycho-visual rate-distortion optimization (Psyrdo), and psycho-visual rate-distortion optimized quantization (psyquant) are incorporated into the algorithm framework of perceptual coding. Due to the complex HVS characteristics and interrelationships between modules, it is a challenge how to customize five algorithmic customizable modules in a joint optimization sense. In order to be able to quantitatively test the interaction between modules, the algorithm optimization is actually achieved by selecting the key parameters of these modules, and the relevant parameters involved in each module are listed in table 1. Wherein, the relationship between Curtreestength and qcomp is as follows:

cutreestrength＝5×(1-qcomp) (10)

for parameters with continuous values, such as AQstrength, the value range is 0-3, the step length can be discretized to 31 parameter values with the step length of 0.1, and the discretized step length is given in table 1. Assume a total of M parameters, and K is defined for the mth parameter_mUsing the step sizes in Table 1, the number of parameter combinations can be calculated to be 1.04 × 10⁹((4 × 30+1) × 2 × (6 × 50+1) × (2 × 140+1) × 51), the number is very large, and therefore, a method must be designed to simplify the complex multi-module algorithm optimization problem.

TABLE 1 relevant parameters involved in the five modules

Suppose that the RD performance changes caused by the algorithm modification of the two modules are respectively v_iV and v_jV is shown in FIG. 4_i(ν_j) Is the average BD-VMAF difference for a module i (j) on relative to all perceptual coding tools off. In contrast, ν is the average BD-VMAF difference when modules i and j are turned on together relative to when all perceptual coding tools are turned off. Define the inter-module correlation between two algorithm modules as:

φ_ij＝ν-(ν_i+ν_j) (11)

Obtaining a difference phi of each module_ijThen, the inter-module correlation level of a single module is defined as:

according to theta_iThe priority level of the algorithm decision of the five modules is determined, and the relevant parameters of each module are optimized in turn according to the priority level from large to small. Thus, the complex multi-module joint optimization problem is reduced to a continuous single-module optimization problem, and the number of parameter combinations in table 1 is also reduced to 756((4 × 30+1) +2+ (6 × 50+1) + (2 × 140+1) + 51).

3. Online adaptive parametric model

Through the analysis of the correlation between the modules, the influence of the algorithm modification of a single module on the performance of the overall code RD does not satisfy the linear relation, and the correlation between two modules has positiveHaving a negative and different video sequence correlation phi_ijThe size is different, i.e. the size of the correlation depends on the video content. A characteristic parameter omega characterizing the content of an image is determined_jLet var in the formula (6) be ω₁Gamma in the formula (5)_propagateAs ω₂Let PSYCOST in equation (7) be ω₃Will be | t in the formula (8)_recL as ω₄Let Cplx in formula (2) be ω₅. Furthermore, in case of different output code rate distributions, different parameter combinations may result in different perceptual coding performances. According to the two results, in order to further improve the result after the off-line optimization in the previous section, on the basis of the parameter combination obtained by the off-line optimization in the previous section, a content adaptive parameter migration model is designed for five non-discrete parameters:

is to describe Δ p_iCode rate change amount delta R corresponding to the code rate change amount_iFunction of the relationship between h (ω)_j) Is to describe Δ ω_jCode rate change amount delta R corresponding to the code rate change amount_jA function of the relationship between. The parameter offset model provided by the invention simultaneously considers phi_ij、ω_j、

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multilevel perception video coding algorithm optimization method facing to a smart court scene is characterized by comprising the following steps:

s1, constructing a multi-level perception coding frame, realizing perception code rate control by combining content self-adaptive bit allocation, time domain perception quantization control and space domain perception quantization control, realizing mode decision of effective intra-frame and inter-frame prediction by psychovisual rate distortion optimization, and realizing coefficient level perception quantization by coefficient level psychovisual rate distortion optimization quantization;

the content self-adaptive bit allocation is realized by combining the lookup head preanalysis based on the sliding window and the frame-level perception complexity measurement, wherein the frame-level perception complexity measurement is realized by measuring the perception fuzzy complexity of the perception bit allocation and adopting a perception complexity quantization model to obtain a frame-level quantization parameter Qp_frm；

The time domain perception quantization control analyzes content self-adaptive bit allocation through time domain Qp cascade, obtains delta Qp of self-adaptive regulation Qp for time domain fine quantization control based on video content analysis of lookup head preanalysis_Temp；

The space-domain perception quantization control quantifies and adjusts the parameter delta Qp by utilizing the space masking effect and through the content self-adaptive bit distribution_VAQSelf-adaptively adjusting the Qp to carry out space-domain fine-grained quantization control;

finally, the final quantization parameter delta Qp is obtained_final：

ΔQp_final＝Qp_frm+ΔQp_Temp+ΔQp_VAQ (1)

The psycho-visual rate-distortion optimization through improved perceptual distortionQuantity D_RDOReplacing the traditional MSE;

s2, analyzing the correlation among the modules, measuring quantitative parameters of the correlation among the modules, evaluating the influence degree between the two algorithm modules, converting the complex multi-module optimization into continuous single-module optimization by selecting key control parameters, and determining the algorithm decision sequence of a plurality of customizable modules;

2. The method of claim 1, wherein the content adaptive bit allocation is frame-level bit allocation and quantization control, the perceptual content complexity is measured using qcomp domain compression model, and the original complexity Cplx is measured using Cplx^1-qcompThe model is compressed, qcomp is a compression constant, and the quantization step size qscale of frame-level scaling is dynamically estimated by adjusting a code rate scaling factor Rfactor:

the quantization parameter Qp is mapped to qscale by equation (3):

3. The method of claim 1, wherein the temporal perceptual quantization control is a temporal quantization parameter cascade, adaptively allocating target code rates among the coding units according to a reference importance weight, the reference importance weight being used to adjust the QP, Δ QP, of the coding block_TempThe calculation formula is as follows:

4. The method of claim 1, wherein the spatial perceptual quantization control is a variance adaptive quantization, overall Δ Qp_VAQIs calculated as follows:

ΔQp_VAQ＝θ_VAQ×(var-var_adjust) (6)

5. The method of claim 1, wherein the psychovisual rate-distortion optimization is J'₁＝D_RDO+λ₁×R₁Replace the conventional RDO encoding J₁＝D₁+λ₁×R₁，λ₁Lagrange factor, R, representing psycho-visual rate-distortion optimization₁A coding bit number representing a psycho-visual rate-distortion optimization;

the psycho-visual rate distortion optimization is performed in a Psyrdo algorithmSSD is perceived distortion D_RDOInstead, the calculation is as follows:

D_RDO＝SSD+λ_{psy_rdo}×psyrdo×psycost (7)

6. The method of claim 1, wherein the psycho-visual rate-distortion optimization quantization is J'₂＝D_quant+₂λ×₂R replaces the conventional RDO code J₂＝D₂+λ₂×R₂，λ₂Lagrange factor, R, representing psycho-visual rate-distortion optimized quantization₂A coding bit number representing psycho-visual rate-distortion optimized quantization;

the psycho-visual rate-distortion optimization quantization is based on the perception distortion D of the RDOQ_quantThe calculation is as follows:

D_quant＝diff×diff-psyrdoq×|t_rec| (9)

7. The method as claimed in claim 1, wherein the inter-module correlation analysis is based on algorithm modules of content adaptive bit allocation, temporal perceptual quantization control, spatial perceptual quantization control, psycho-visual rate distortion optimization and psycho-visual rate distortion optimization quantization, and the interaction between the algorithm modules is quantitatively tested by key parameters of the algorithm modules.

8. The method as claimed in claim 7, wherein the relationship between the curtreestengh parameter for temporal perceptual quantization control and the qcomp parameter for content adaptive bit allocation is:

cutreestrength＝5×(1-qcomp) (10)

9. The intelligent court-oriented multi-level perceptual video coding algorithm optimization method of claim 7, wherein for parameters with continuous values, the value range is discretized into a plurality of parameter values by setting discrete step sizes, and the RD performance change amounts caused by the modification of the two algorithm modules are respectively ν_iV and v_j，ν_iOr v_jThe average BD-VMAF difference value is obtained when the module i or j is opened and is closed relative to all the perception coding tools, and compared with the average BD-VMAF difference value when the module i and j are opened and is closed relative to all the perception coding tools, the module correlation between the two algorithm modules is defined as follows:

φ_ij＝ν-(ν_i+ν_j) (11)。

according to theta_iThe priority level of each algorithm module decision is determined according to the size of the algorithm module, and the related parameters of each module are optimized in sequence according to the priority level from large to small.

10. The method of claim 7, wherein the online adaptive parametric model is constructed by characterizing a characteristic parameter ω of image content_jLet var in the formula (6) be ω₁Gamma in the formula (5)_propagateAs ω₂Let PSYCOST in equation (7) be ω₃Will be | t in the formula (9)_recL as ω₄Let Cplx in formula (2) be ω₅And on the basis of parameter combinations obtained by off-line optimization, constructing a content adaptive parameter migration model for five non-discrete parameters:

wherein a and b are constants, phi_ijIs the inter-module correlation between two modules, ω_jIs a characteristic parameter, β, characterizing the image content in each module_iIs a parameter for controlling the dynamic range of each parameter, θ_iIs to describe Δ p_iCode rate change amount delta R corresponding to the code rate change amount_iFunction of the relationship between h (ω)_j) Is to describe Δ ω_jCode rate change amount delta R corresponding to the code rate change amount_jA function of the relationship between, while taking into account phi_ij、ω_j、θ_i、h(ω_j) The influence of (c).