CN114567776A

CN114567776A - Video low-complexity coding method based on panoramic visual perception characteristic

Info

Publication number: CN114567776A
Application number: CN202210157533.5A
Authority: CN
Inventors: 杜宝祯; 张奇
Original assignee: Ningbo Polytechnic
Current assignee: Zhejiang Chuanzhi Electronic Technology Co ltd
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2022-05-31
Anticipated expiration: 2042-02-21
Also published as: CN114567776B

Abstract

The invention discloses a video low-complexity coding method based on panoramic visual perception characteristics, which utilizes a spatial domain JND threshold value as a spatial domain perception factor, obtains a motion perception factor through a weighted gradient value, further obtains an average value of space-time weighted perception factors of all pixel points in a maximum coding unit, calculates a Lagrange coefficient adjusting factor based on the space-time weighted perception factor of the maximum coding unit according to a rate distortion optimization theory, and further obtains a quantization parameter variable quantity based on the space-time weighted perception factor of the maximum coding unit; simultaneously calculating the quantization parameter variation quantity of the maximum coding unit based on the dimension weight; calculating a new coding quantization parameter of the maximum coding unit according to the two quantization parameter variable quantities, and applying the new coding quantization parameter to coding; the method has the advantages that the coding quality can be guaranteed, the coding rate can be effectively reduced, the coding complexity can be effectively reduced, the rate distortion performance is obviously improved, and particularly when the initial coding quantization parameter is small, the coding effect is better.

Description

Video low-complexity coding method based on panoramic visual perception characteristic

Technical Field

The invention relates to a video coding technology, in particular to a video low-complexity coding method based on panoramic visual perception characteristics.

Background

In recent years, panoramic video systems are widely popular with people for their "immersive" visual experience, and have great application prospects in the fields of virtual reality, simulated driving, and the like. However, the current panoramic video system still has the problem of too high coding complexity in terms of coding, which brings great challenges to the application of the panoramic video system. Therefore, how to reduce the encoding complexity has become an urgent technical problem to be solved in the field.

The existing low-complexity coding algorithm of the panoramic video does not fully consider the perception characteristic of a Human Visual System (HVS) and the characteristics of the panoramic video, and the optimal coding performance is difficult to achieve. The main purpose of video coding is to reduce the code rate of coding as much as possible on the premise of ensuring certain video quality; or under the condition that the code rate of the coding is limited, the coding is carried out by adopting a mode with the minimum distortion. Therefore, how to combine and utilize the perception characteristics of the human visual system and the characteristics of the panoramic video to guide the selection of the coding parameters becomes an important breakthrough direction for researching and reducing the coding complexity in the field.

Disclosure of Invention

The invention aims to solve the technical problem of providing a video low-complexity coding method based on panoramic visual perception characteristics, which can effectively save coding code rate and further effectively reduce coding complexity.

The technical scheme adopted by the invention for solving the technical problems is as follows: a video low complexity coding method based on panoramic visual perception characteristics is characterized by comprising the following steps:

step 1: defining a video frame to be coded currently in the panoramic video in the ERP projection format as a current frame; the width of a video frame in the panoramic video in the ERP projection format is W, and the height of the video frame in the panoramic video in the ERP projection format is H;

and 2, step: judging whether the current frame is a 1 st frame video frame, if so, encoding the current frame by adopting an original algorithm of an HEVC video encoder, and then executing the step 10; otherwise, executing step 3;

and step 3: performing spatial JND threshold calculation on each pixel point in the current frame to obtain a panoramic spatial JND threshold map of the current frame, and recording the map as G₁，G₁The pixel value of each pixel point in the current frame is the spatial JND threshold value of the corresponding pixel point in the current frame; and performing weighted gradient calculation on each pixel point in the current frame to obtain a weighted gradient image of the current frame, which is marked as G₂，G₂The pixel value of each pixel point in the current frame is the weighted gradient value of the corresponding pixel point in the current frame;

and 4, step 4: calculating the space domain perception factor of each pixel point in the current frame, and recording the space domain perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta_A(x,y)，δ_A(x,y)＝G₁(x, y); calculating the motion perception factor of each pixel point in the current frame, and recording the motion perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta_T(x,y)，

Then calculating the space-time weighted perception factor of each pixel point in the current frame, and recording the space-time weighted perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta (x, y), wherein the delta (x, y) is delta_A(x,y)×δ_T(x, y); then calculating the average value of the space-time weighted perception factors of all the pixel points in the current frame, and recording the average value as S_δ(ii) a Calculating the dimension weight of each pixel point in the current frame, and recording the dimension weight of the pixel point with the coordinate position of (x, y) in the current frame as w_ERP(x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G₁(x, y) denotes G₁The pixel value G of the pixel point with the middle coordinate position (x, y)₁(x, y) also represents the spatial JND threshold, G, of the pixel with coordinate position (x, y) in the current frame₂(x, y) represents G₂The pixel value G of the pixel point with the middle coordinate position (x, y)₂(x, y) also represents the weighted gradient value of the pixel with coordinate position (x, y) in the current frame, S_FRepresents G₂All images inAverage value of pixel values of pixel points, S_FAlso represents the average value of the weighted gradient values of all the pixels in the current frame, wherein epsilon is a motion perception constant and epsilon belongs to [1,2 ]]Cos () is a cosine function;

and 5: defining a maximum coding unit to be processed currently in a current frame as a current maximum coding unit;

step 6: calculating the average value of the space-time weighted perception factors of all the pixel points in the current maximum coding unit, and recording the average value as S_{δ_LCU}(ii) a Then, calculating Lagrange coefficient adjustment factors based on space-time weighting perception factors of the current maximum coding unit, and marking the Lagrange coefficient adjustment factors as psi_LCU，

Then, the quantization parameter variation quantity based on the space-time weighting perception factor of the current maximum coding unit is calculated and recorded as delta QP₁，ΔQP₁＝3log₂(Ψ_LCU) (ii) a Wherein, K_LCUAnd B_LCUAre all adjustment parameters, K_LCU∈(0,1)，B_LCU∈(0,1)；

And 7: calculating the average value of the dimensionality weights of all the pixel points in the current maximum coding unit, and recording the average value as S_{wERP_LCU}(ii) a Then, the quantization parameter variation quantity based on the dimension weight of the current maximum coding unit is calculated and recorded as delta QP₂，

Wherein a and b are both adjusting parameters, a belongs to (0,1), b belongs to (0,1), and b is less than a;

and 8: calculating new coding quantization parameter, denoted as QP, of the current maximum coding unit_new，

Then using QP_newUpdating the coding quantization parameter of the current maximum coding unit; then coding the current maximum coding unit; wherein, QP_orgOriginal coded quantization parameter, symbol representing current maximum coding unit

Is a rounded-down operation sign;

and step 9: taking the next maximum coding unit to be processed in the current frame as the current maximum coding unit, then returning to the step 6 to continue executing until all the maximum coding units in the current frame are processed, and then executing the step 10;

step 10: and (3) taking the next frame of video frame to be coded in the panoramic video in the ERP projection format as the current frame, and then returning to the step (2) to continue executing until all the video frames in the panoramic video in the ERP projection format are coded.

In said step 3, G₁The acquisition mode is as follows: performing spatial JND threshold calculation on each pixel point in the current frame by adopting a spatial domain just noticeable distortion model to obtain G₁。

In said step 3, G₂The acquisition process comprises the following steps: g is to be₂The pixel value of the pixel point with the middle coordinate position (x, y) is marked as G₂(x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G₂(x, y) also represents the weighted gradient value of the pixel with coordinate position (x, y) in the current frame,

which is indicative of the horizontal direction,

which is indicative of the vertical direction of the,

which represents the direction of the time domain,

representing the horizontal direction gradient value of the pixel point with the coordinate position of (x, y) in the current frame,

indicating sitting in the current frameThe vertical gradient value of the pixel point with the index position of (x, y),

representing the time domain direction gradient value of the pixel point with the coordinate position of (x, y) in the current frame,

and

and calculating by using a 3D-sobel operator, wherein alpha represents a gradient adjustment factor in the horizontal direction, beta represents a gradient adjustment factor in the vertical direction, gamma represents a gradient adjustment factor in the time domain direction, and alpha + beta + gamma is equal to 1.

Compared with the prior art, the invention has the advantages that:

the method fully considers the perception characteristics of a human eye visual system and the characteristics of panoramic video, utilizes a spatial domain JND threshold value (visual perception information) as a spatial domain perception factor, obtains a motion perception factor through a weighted gradient value (visual perception information), further obtains an average value of space-time weighted perception factors of all pixel points in a maximum coding unit through calculation, calculates Lagrange coefficient regulating factors of the maximum coding unit based on the space-time weighted perception factors according to a rate distortion optimization theory, and further obtains quantization parameter variable quantity of the maximum coding unit based on the space-time weighted perception factors; meanwhile, the method takes the dimension weight characteristics of the panoramic video in the ERP projection format into consideration, and calculates the quantitative parameter variation of the maximum coding unit based on the dimension weight; and calculating a new coding quantization parameter of the maximum coding unit according to the two quantization parameter variable quantities, and applying the new coding quantization parameter to coding. The method can adaptively adjust the coding quantization parameters aiming at the time-space domain and the panoramic latitude characteristics of the specific maximum coding unit, and experimental tests show that the method can effectively reduce the coding rate while ensuring the coding quality, thereby effectively reducing the coding complexity, obviously improving the rate-distortion performance, and particularly aiming at the condition that the initial coding quantization parameter is smaller, the coding effect is better.

Drawings

Fig. 1 is a block diagram of a general implementation of the method of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying examples.

The overall implementation block diagram of the video low-complexity coding method based on the panoramic visual perception characteristic is shown in fig. 1, and the method comprises the following steps:

step 1: defining a video frame to be coded currently in a panoramic video in an ERP (equivalent projection) projection format as a current frame; the width of a video frame in the panoramic video in the ERP projection format is W, and the height of the video frame in the panoramic video in the ERP projection format is H.

Step 2: judging whether the current frame is a 1 st frame video frame, if so, encoding the current frame by adopting an original algorithm of an HEVC video encoder, and then executing the step 10; otherwise, step 3 is executed.

And step 3: performing spatial JND (Just Noticeable Distortion) threshold calculation on each pixel point in the current frame to obtain a panoramic spatial JND threshold map of the current frame, and recording the map as G₁，G₁The pixel value of each pixel point in the current frame is the spatial domain JND threshold value of the corresponding pixel point in the current frame; and performing weighted gradient calculation on each pixel point in the current frame to obtain a weighted gradient map of the current frame, and recording the weighted gradient map as G₂，G₂The pixel value of each pixel point in the current frame is the weighted gradient value of the corresponding pixel point in the current frame; the larger the spatial domain JND threshold value is, the larger the representation just noticeable distortion is, namely the stronger the spatial domain masking property of the corresponding region is; conversely, the smaller the spatial JND threshold, the weaker the spatial masking of the corresponding region.

In this embodiment, G₁The acquisition mode is as follows: performing airspace JND threshold calculation on each pixel point in the current frame by adopting the conventional classical airspace just noticeable distortion model to obtain G₁。

In this embodiment, G₂The acquisition process comprises the following steps: g is to be₂The pixel value of the pixel point with the middle coordinate position (x, y) is marked as G₂(x,y)，

which is indicative of the horizontal direction,

which is indicative of the vertical direction of the,

which represents the direction of the time domain,

representing the vertical gradient value of the pixel point with the coordinate position of (x, y) in the current frame,

the time domain direction gradient value of the pixel point with the coordinate position (x, y) in the current frame is represented, namely the gradient value of the pixel point with the coordinate position (x, y) in the current frame and the pixel point with the coordinate position (x, y) in the previous frame of video frame along the time domain direction,

and

the method is calculated by an existing 3D-sobel operator, where α represents a gradient adjustment factor in a horizontal direction, β represents a gradient adjustment factor in a vertical direction, γ represents a gradient adjustment factor in a time domain direction, α + β + γ is 1, and in this embodiment, α is 0.25, β is 0.25, and γ is 0.5.

And 4, step 4: calculating the null of each pixel point in the current frameThe domain perception factor records the space domain perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta_A(x,y)，δ_A(x,y)＝G₁(x, y); calculating the motion perception factor of each pixel point in the current frame, and recording the motion perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta_T(x,y)，

Then calculating the space-time weighted perception factor of each pixel point in the current frame, and recording the space-time weighted perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta (x, y), wherein the delta (x, y) is delta_A(x,y)×δ_T(x, y); then calculating the average value of the space-time weighted perception factors of all the pixel points in the current frame, and recording the average value as S_δ，

Calculating the dimension weight of each pixel point in the current frame, and recording the dimension weight of the pixel point with the coordinate position of (x, y) in the current frame as w_ERP(x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G₁(x, y) denotes G₁The pixel value G of the pixel point with the middle coordinate position (x, y)₁(x, y) also represents the spatial JND threshold, G, of the pixel with coordinate position (x, y) in the current frame₂(x, y) denotes G₂The pixel value G of the pixel point with the middle coordinate position (x, y)₂(x, y) also represents the weighted gradient value of the pixel with coordinate position (x, y) in the current frame, S_FRepresents G₂Average value of pixel values of all pixel points in (S)_FAlso represents the average value of the weighted gradient values of all the pixels in the current frame,

epsilon is a motion perception constant, epsilon is [1,2 ]]In this embodiment, epsilon is 1, cos () is a cosine function, and pi is 3.14 ….

In this embodiment, an ERP project gridBecause each latitude adopts pixel sampling of different degrees, different pixel redundancies exist in different dimensions in a plane, and extreme lifting redundancy of two poles is most obvious, after a sphere is projected to an ERP projection format, the center of the sphere is usually taken as a base point, the longitude theta of the ERP projection format corresponds to the longitude of the spherical surface of the sphere, and the latitude of the ERP projection format is

Corresponding to the latitude of the sphere, θ ∈ [ - π, π]，

The characteristic of the panoramic latitude is considered, and a dimension weight parameter w of an ERP projection format is introduced_ERP(x,y)。

And 5: defining a Largest Coding Unit (LCU) to be processed currently in a current frame as a current Largest Coding Unit.

Step 6: calculating the average value of the space-time weighted perception factors of all the pixel points in the current maximum coding unit, and recording the average value as S_{δ_LCU}，

Then, calculating Lagrange coefficient adjustment factors based on space-time weighting perception factors of the current maximum coding unit, and marking the Lagrange coefficient adjustment factors as psi_LCU，

Then, the quantization parameter variation based on the space-time weighting perception factor of the current maximum coding unit is calculated and recorded as delta QP₁，ΔQP₁＝3log₂(Ψ_LCU) (ii) a Wherein i is more than or equal to 0 and less than or equal to 63, j is more than or equal to 0 and less than or equal to 63, and delta_LCU(i, j) represents the space-time weighted perception factor of the pixel point with the coordinate position (i, j) in the block in the current maximum coding unit, K_LCUAnd B_LCUAre all adjustment parameters, K_LCU∈(0,1)，B_LCUE (0,1), K is finally determined by a large number of experiments in this example_LCUAnd B_LCUAll values are 0.5.

And 7: calculating a current maximum coding unitThe average value of the dimension weights of all the pixel points is recorded as S_{wERP_LCU}，

Then, the quantization parameter variation quantity based on the dimension weight of the current maximum coding unit is calculated and recorded as delta QP₂，

Wherein, w_{ERP_LCU}(i, j) represents the dimension weight of a pixel point with the coordinate position (i, j) in the block in the current maximum coding unit, a and b are both adjustment parameters, a belongs to (0,1), b belongs to (0,1), and b is less than a, and in the embodiment, the value of a is finally determined to be 0.85 and the value of b is finally determined to be 0.3 through a large number of experiments.

Then using QP_newUpdating the coding quantization parameter of the current maximum coding unit; coding the current maximum coding unit by adopting an HEVC (high efficiency video coding) encoder; wherein, QP_orgOriginal coding quantization parameter, QP, representing the current maximum coding unit_orgCan be read from the initialization parameter list of the encoder

Is a rounded-down operation sign.

And step 9: and taking the next maximum coding unit to be processed in the current frame as the current maximum coding unit, then returning to the step 6 to continue executing until all the maximum coding units in the current frame are processed, and then executing the step 10.

To further illustrate the performance of the method of the present invention, the method of the present invention was tested.

The method is characterized in that HEVC video encoder standard reference software HM16.14 is selected as an experimental test platform, hardware configuration is conducted to an Intel (R) core (TM) i7-10700 CPU, a 64-bit WIN10 operating system with a main frequency of 2.9GHz and a memory of 32G, and a development tool selects VS 2013. Selecting 4 panoramic video sequences as standard test sequences, wherein the standard test sequences are as follows: two 4K sequences "Aeriological City", "DrivingCity" and two 6K sequences "BranCastle 2", "bonding 2". The number of test frames of each standard test sequence is 100 frames, an intra-frame coding mode is adopted, SearchRange is set to be 64, MaxPartitionDepth is set to be 4, and an initial coding quantization parameter QP (namely an original coding quantization parameter QP)_org) Respectively 22, 27, 32, 37.

Table 1 lists the relevant parameter information of "audiocity", "drivingmincity", "BranCastle 2", "bonding 2" 4 panoramic video sequences.

TABLE 1 associated parameter information for panoramic video sequences

Panoramic video sequence	Video resolution
		AerialCity	3840×1920
DrivingInCity	3840×1920
		BranCastle2	6144×3072
Landing2	6144×3072

Table 2 shows the coding rate savings associated with the inventive method for coding the panoramic video sequence shown in table 1, as compared to the HM16.14 original platform method. The code rate saving rate of the coding by adopting the method of the invention compared with the coding by adopting the HM16.14 original platform method is defined as delta R_PRO，ΔR_PRO＝(R_ORG-R_PRO)/R_ORGX 100 (%), wherein R_PRORepresenting the code rate, R, of the code encoded by the method of the invention_ORGIndicating the coding rate of coding using HM16.14 original platform method.

Table 2 comparison of code rate savings for coding using the method of the present invention compared to the HM16.14 original platform method

As can be seen from Table 2, the encoding rate can be averagely saved by 12.9% when the method of the present invention is used for encoding. Aiming at 4 panoramic video sequences with different scenes and different motion conditions, the coding rate can be effectively reduced by adopting the method for coding, and particularly aiming at an initial coding quantization parameter QP (namely an original coding quantization parameter QP)_org) And in a smaller case, the coding effect is better.

Table 3 lists the rate-distortion performance of the coding of the panoramic video sequence listed in table 1 using the method of the present invention. Evaluating the quality of coded video by adopting a classical subjective quality evaluation method, wherein in the quality evaluation, a subjective quality evaluation method MOS (Mean Opinion Score) is adopted as a quality evaluation index, and rate distortion performance indexes BDBR of all panoramic video sequences under the subjective quality evaluation method MOS are respectively calculated_MOSTo evaluate the performance of the method of the invention comprehensively.

TABLE 3 Rate distortion Performance for encoding using the method of the present invention

As can be seen from Table 3, the process of the invention employs BDBR_MOSThe rate distortion performance evaluation index represents that under the same subjective quality condition, the average value of the coding code rate is saved by about-7.4% under the quality evaluation index MOS. This shows that compared with the HM16.14 original platform method, the method of the present invention can save more coding rate under the same subjective perceptual quality. As can be seen from Table 3, the method of the present invention can effectively save coding rate and significantly improve rate distortion performance for different scenes and different motion conditions of a panoramic video sequence.

Claims

1. A video low complexity coding method based on panoramic visual perception characteristics is characterized by comprising the following steps:

step 2: judging whether the current frame is a 1 st frame video frame, if so, encoding the current frame by adopting an original algorithm of an HEVC video encoder, and then executing the step 10; otherwise, executing step 3;

and step 3: performing spatial JND threshold calculation on each pixel point in the current frame to obtain a panoramic spatial JND threshold map of the current frame, and recording the map as G₁，G₁The pixel value of each pixel point in the current frame is the spatial JND threshold value of the corresponding pixel point in the current frame; and performing weighted gradient calculation on each pixel point in the current frame to obtain a weighted gradient map of the current frame, and recording the weighted gradient map as G₂，G₂The pixel value of each pixel point in the current frame is the weighted gradient value of the corresponding pixel point in the current frame;

and 4, step 4: calculating the spatial domain perception factor of each pixel point in the current frame, and recording the spatial domain perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta_A(x,y)，δ_A(x,y)＝G₁(x, y); and calculating the operation of each pixel point in the current frameDynamic perception factor, recording the motion perception factor of the pixel point with coordinate position (x, y) in the current frame as delta_T(x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G₁(x, y) denotes G₁The pixel value G of the pixel point with the middle coordinate position (x, y)₁(x, y) also represents the spatial JND threshold, G, of the pixel with coordinate position (x, y) in the current frame₂(x, y) denotes G₂The pixel value G of the pixel point with the middle coordinate position (x, y)₂(x, y) also represents the weighted gradient value of the pixel with coordinate position (x, y) in the current frame, S_FRepresents G₂Average value of pixel values of all pixel points in (1), S_FAlso represents the average value of the weighted gradient values of all the pixels in the current frame, wherein epsilon is a motion perception constant and epsilon belongs to [1,2 ]]Cos () is a cosine function;

And 7: calculating the average value of the dimensionality weights of all the pixel points in the current maximum coding unit, and recording the average value as the average value

Is a rounded-down operator;

2. The method of claim 1, wherein in step 3, G is the number of bits of video coding with low complexity based on the perceptual property of panoramic vision₁The acquisition mode is as follows: performing spatial domain JND threshold calculation on each pixel point in the current frame by adopting a spatial domain just noticeable distortion model to obtain G₁。

3. The method for coding video with low complexity according to claim 1 or 2, wherein G is the number of G in step 3₂The acquisition process comprises the following steps: g is to be₂The pixel value of the pixel point with the middle coordinate position (x, y) is marked as G₂(x,y)，

which is indicative of the direction of the horizontal,

which is indicative of the vertical direction of the,

which represents the direction of the time domain,

and