CN114567776B

CN114567776B - Video low-complexity coding method based on panoramic visual perception characteristics

Info

Publication number: CN114567776B
Application number: CN202210157533.5A
Authority: CN
Inventors: 杜宝祯; 张奇
Original assignee: Ningbo Polytechnic
Current assignee: Zhejiang Chuanzhi Electronic Technology Co ltd
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2023-05-05
Anticipated expiration: 2042-02-21
Also published as: CN114567776A

Abstract

The invention discloses a video low-complexity coding method based on panoramic visual perception characteristics, which uses an airspace JND threshold value as an airspace perception factor, obtains a motion perception factor through a weighted gradient value, further obtains the average value of space-time weighted perception factors of all pixel points in a maximum coding unit, calculates Lagrange coefficient adjustment factors based on the space-time weighted perception factors of the maximum coding unit according to a rate distortion optimization theory, and further obtains the quantization parameter variation quantity of the maximum coding unit based on the space-time weighted perception factors; simultaneously calculating quantization parameter variation of the maximum coding unit based on dimension weight; calculating new coding quantization parameters of the maximum coding unit according to the two quantization parameter variation amounts, and applying the new coding quantization parameters to coding; the method has the advantages that the coding quality can be ensured, the coding rate can be effectively reduced, the coding complexity can be effectively reduced, the rate distortion performance is obviously improved, and the coding effect is better particularly when the initial coding quantization parameter is smaller.

Description

Video low-complexity coding method based on panoramic visual perception characteristics

Technical Field

The invention relates to a video coding technology, in particular to a video low-complexity coding method based on panoramic visual perception characteristics.

Background

In recent years, panoramic video systems are widely welcomed by people through the 'immersive' visual experience, and have great application prospects in the fields of virtual reality, simulated driving and the like. However, the problem of excessive encoding complexity in the aspect of encoding still exists in the panoramic video system at present, which brings great challenges to the application of the panoramic video system. Therefore, how to reduce the coding complexity has become a technical problem to be solved in the field.

The existing panoramic video low-complexity coding algorithm does not fully consider the perception characteristics of a human eye visual system (Human Visual System, HVS) and the characteristics of the panoramic video, and is difficult to achieve optimal coding performance. The main purpose of video coding is to reduce the code rate of coding as much as possible on the premise of ensuring certain video quality; or in the case of limited code rate of the coding, the mode with minimum distortion is adopted for coding. Therefore, how to combine and use the perception characteristic of the human eye vision system and the panoramic video characteristic to guide the coding parameter selection becomes an important breakthrough direction for researching and reducing the coding complexity in the field.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a video low-complexity coding method based on panoramic visual perception characteristics, which can effectively save coding code rate, thereby effectively reducing coding complexity.

The technical scheme adopted for solving the technical problems is as follows: a video low-complexity coding method based on panoramic visual perception characteristics is characterized by comprising the following steps:

step 1: defining a current video frame to be coded in the panoramic video in the ERP projection format as a current frame; the width of a video frame in the panoramic video in the ERP projection format is W, and the height is H;

step 2: judging whether the current frame is a 1 st frame video frame or not, if so, adopting an original algorithm of an HEVC video encoder to encode the current frame, and then executing a step 10; otherwise, executing the step 3;

step 3: performing airspace JND threshold calculation on each pixel point in the current frame to obtain a panoramic airspace JND threshold diagram of the current frame, and marking the panoramic airspace JND threshold diagram as G ₁ ，G ₁ The pixel value of each pixel point in the current frame is the airspace JND threshold value of the corresponding pixel point in the current frame; and performing weighted gradient calculation on each pixel point in the current frame to obtain a weighted gradient map of the current frame, which is marked as G ₂ ，G ₂ The pixel value of each pixel point in the current frame is the weighted gradient value of the corresponding pixel point in the current frame;

step 4: calculating the airspace perception factor of each pixel point in the current frame, and sensing the airspace of the pixel point with the coordinate position of (x, y) in the current frameThe known factor is denoted as delta _A (x,y)，δ _A (x,y)＝G ₁ (x, y); calculating motion perception factors of each pixel point in the current frame, and recording the motion perception factors of the pixel points with the coordinate positions of (x, y) in the current frame as delta _T (x,y)，

Then calculating the space-time weighted perceptron of each pixel point in the current frame, and recording the space-time weighted perceptron of the pixel point with the coordinate position of (x, y) in the current frame as delta (x, y), wherein delta (x, y) =delta _A (x,y)×δ _T (x, y); calculating the average value of the space-time weighted perceptron factors of all pixel points in the current frame, and recording as S _δ The method comprises the steps of carrying out a first treatment on the surface of the Calculating the dimension weight of each pixel point in the current frame, and marking the dimension weight of the pixel point with the coordinate position of (x, y) in the current frame as w _ERP (x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G ₁ (x, y) represents G ₁ Pixel value, G, of pixel point with middle coordinate position (x, y) ₁ (x, y) also represents the spatial JND threshold, G, of the pixel point with the coordinate position (x, y) in the current frame ₂ (x, y) represents G ₂ Pixel value, G, of pixel point with middle coordinate position (x, y) ₂ (x, y) also represents the weighted gradient value of the pixel point with the coordinate position (x, y) in the current frame, S _F Represents G ₂ Average value of pixel values of all pixel points in (S) _F Also represents the average value of weighted gradient values of all pixel points in the current frame, epsilon is a motion perception constant, epsilon is [1,2 ]]Cos () is a cosine function;

step 5: defining a current maximum coding unit to be processed in a current frame as a current maximum coding unit;

step 6: calculating the average value of the space-time weighted perception factors of all pixel points in the current maximum coding unit, and recording the average value as S _{δ_LCU} The method comprises the steps of carrying out a first treatment on the surface of the Then calculating Lagrange coefficient adjusting factor based on space-time weighted perception factor of the current maximum coding unit, and marking as ψ _LCU ，

Calculating the quantization parameter variation of the current maximum coding unit based on the space-time weighted perception factor, and recording the quantization parameter variation as delta QP ₁ ，ΔQP ₁ ＝3log ₂ (Ψ _LCU ) The method comprises the steps of carrying out a first treatment on the surface of the Wherein K is _LCU And B _LCU Are all adjusting parameters, K _LCU ∈(0,1)，B _LCU ∈(0,1)；

Step 7: calculating the average value of the dimension weights of all pixel points in the current maximum coding unit, and recording the average value as S _{wERP_LCU} The method comprises the steps of carrying out a first treatment on the surface of the Calculating the quantization parameter variation based on the dimension weight of the current maximum coding unit, and recording the quantization parameter variation as delta QP ₂ ，

Wherein a and b are both adjusting parameters, a epsilon (0, 1), b < a;

step 8: calculating new coding quantization parameter of current maximum coding unit, and recording as QP _new ，

Then use QP _new Updating the coding quantization parameter of the current maximum coding unit; coding the current maximum coding unit; wherein QP is _org Original coding quantization parameter representing current maximum coding unit, symbol

Rounding down the operator;

step 9: taking the next largest coding unit to be processed in the current frame as the current largest coding unit, returning to the step 6 to continue execution until all the largest coding units in the current frame are processed, and executing the step 10;

step 10: and taking a video frame to be encoded of the next frame in the panoramic video in the ERP projection format as a current frame, and returning to the step 2 to continue execution until all video frames in the panoramic video in the ERP projection format are encoded.

In the step 3，G ₁ The acquisition mode of (a) is as follows: performing spatial JND threshold calculation on each pixel point in the current frame by adopting spatial just noticeable distortion model to obtain G ₁ 。

In the step 3, G ₂ The acquisition process of (1) is as follows: will G ₂ The pixel value of the pixel point with the middle coordinate position of (x, y) is marked as G ₂ (x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G ₂ (x, y) also represents the weighted gradient value of the pixel point with the coordinate position (x, y) in the current frame, ">

Indicating the direction of the horizontal direction,

indicates the vertical direction +.>

Representing the time domain direction +_>

Horizontal gradient value representing pixel point with coordinate position (x, y) in current frame, +.>

Vertical gradient value representing pixel point with coordinate position (x, y) in current frame, +.>

Time domain direction gradient value representing pixel point with coordinate position (x, y) in current frame, +.>

And

calculated by a 3D-sobel operator, alpha represents a gradient adjustment factor in the horizontal directionBeta represents a gradient adjustment factor in the vertical direction, gamma represents a gradient adjustment factor in the time domain, and α+β+γ=1.

Compared with the prior art, the invention has the advantages that:

the method fully considers the perception characteristics of a human eye visual system and the characteristics of panoramic video, utilizes an airspace JND threshold (visual perception information) as an airspace perception factor, obtains a motion perception factor through a weighted gradient value (visual perception information), further calculates the average value of space-time weighted perception factors of all pixel points in a maximum coding unit, calculates a Lagrange coefficient adjustment factor of the maximum coding unit based on the space-time weighted perception factors according to a rate distortion optimization theory, and further obtains the quantization parameter variation quantity of the maximum coding unit based on the space-time weighted perception factors; meanwhile, the method takes the dimension weight characteristics of the panoramic video in the ERP projection format into consideration, and calculates the quantization parameter variation of the maximum coding unit based on the dimension weight; and calculating a new coding quantization parameter of the maximum coding unit according to the two quantization parameter variation amounts, and applying the new coding quantization parameter to coding. The method can adaptively adjust the coding quantization parameter aiming at the time-space domain and panoramic latitude characteristics of a specific maximum coding unit, and experimental tests show that the method can effectively reduce the coding rate while guaranteeing the coding quality, thereby effectively reducing the coding complexity, remarkably improving the rate distortion performance, and having better coding effect especially aiming at the condition of smaller initial coding quantization parameter.

Drawings

Fig. 1 is a block diagram of a general implementation of the method of the present invention.

Detailed Description

The invention is described in further detail below with reference to the embodiments of the drawings.

The invention provides a video low-complexity coding method based on panoramic visual perception characteristics, which is generally implemented as shown in a block diagram in fig. 1 and comprises the following steps:

step 1: defining a current video frame to be coded in the panoramic video in the ERP (Equirectangular Projection) projection format as a current frame; the width of a video frame in the panoramic video in the ERP projection format is W, and the height is H.

Step 2: judging whether the current frame is a 1 st frame video frame or not, if so, adopting an original algorithm of an HEVC video encoder to encode the current frame, and then executing a step 10; otherwise, step 3 is executed.

Step 3: performing spatial JND (Just Noticeable Distortion ) threshold calculation on each pixel point in the current frame to obtain a panoramic spatial JND threshold diagram of the current frame, and marking as G ₁ ，G ₁ The pixel value of each pixel point in the current frame is the airspace JND threshold value of the corresponding pixel point in the current frame; and performing weighted gradient calculation on each pixel point in the current frame to obtain a weighted gradient map of the current frame, which is marked as G ₂ ，G ₂ The pixel value of each pixel point in the current frame is the weighted gradient value of the corresponding pixel point in the current frame; the larger the spatial JND threshold value is, the larger the just noticeable distortion is represented, namely the stronger the spatial masking of the corresponding region is; conversely, the smaller the spatial JND threshold, the weaker the spatial masking of the corresponding region.

In the present embodiment, G ₁ The acquisition mode of (a) is as follows: performing airspace JND threshold calculation on each pixel point in the current frame by adopting the existing classical airspace just noticeable distortion model to obtain G ₁ 。

In the present embodiment, G ₂ The acquisition process of (1) is as follows: will G ₂ The pixel value of the pixel point with the middle coordinate position of (x, y) is marked as G ₂ (x,y)，

Indicates the horizontal direction +.>

Indicates the vertical direction +.>

Representing the time domain direction +_>

A vertical gradient value representing a pixel point having a coordinate position of (x, y) in the current frame,

representing the time domain direction gradient value of the pixel point with the coordinate position of (x, y) in the current frame, namely the gradient value of the pixel point with the coordinate position of (x, y) in the current frame along the time domain direction and the pixel point with the coordinate position of (x, y) in the video frame of the previous frame,

and->

The existing 3D-sobel operator calculates, α represents a gradient adjustment factor in a horizontal direction, β represents a gradient adjustment factor in a vertical direction, γ represents a gradient adjustment factor in a time domain direction, α+β+γ=1, and in this embodiment, α takes a value of 0.25, β takes a value of 0.25, and γ takes a value of 0.5.

Step 4: calculating the airspace perception factor of each pixel point in the current frame, and marking the airspace perception factor of the pixel point with the coordinate position of (x, y) in the current frame as delta _A (x,y)，δ _A (x,y)＝G ₁ (x, y); calculating motion perception factors of each pixel point in the current frame, and recording the motion perception factors of the pixel points with the coordinate positions of (x, y) in the current frame as delta _T (x,y)，

Then calculating the space-time weighted perception factor of each pixel point in the current frame, and the space-time of the pixel point with the coordinate position of (x, y) in the current frameThe weighted perceptual factor is denoted delta (x, y), delta (x, y) =delta _A (x,y)×δ _T (x, y); calculating the average value of the space-time weighted perceptron factors of all pixel points in the current frame, and recording as S _δ ，

Calculating the dimension weight of each pixel point in the current frame, and marking the dimension weight of the pixel point with the coordinate position of (x, y) in the current frame as w _ERP (x,y)，

Wherein x is more than or equal to 0 and less than or equal to W-1, y is more than or equal to 0 and less than or equal to H-1, G ₁ (x, y) represents G ₁ Pixel value, G, of pixel point with middle coordinate position (x, y) ₁ (x, y) also represents the spatial JND threshold, G, of the pixel point with the coordinate position (x, y) in the current frame ₂ (x, y) represents G ₂ Pixel value, G, of pixel point with middle coordinate position (x, y) ₂ (x, y) also represents the weighted gradient value of the pixel point with the coordinate position (x, y) in the current frame, S _F Represents G ₂ Average value of pixel values of all pixel points in (S) _F Also representing the average value of the weighted gradient values of all pixels in the current frame, +.>

Epsilon is motion perception constant and epsilon is [1,2 ]]In this embodiment, epsilon takes a value of 1, cos () is a cosine function, pi=3.14, ….

In this embodiment, since each latitude adopts different degree of pixel sampling, different pixel redundancy exists in different dimensions in the plane, and the extremely ascending redundancy of two poles is most obvious, after a sphere is projected to the ERP projection format, the center of the sphere is usually taken as a base point, the longitude θ of the ERP projection format corresponds to the longitude of the sphere, and the latitude of the ERP projection format

Corresponding to the latitude of sphere, theta is E [ -pi, pi]，

Considering the characteristic of panoramic latitude, introducing a dimension weight parameter w of an ERP projection format _ERP (x,y)。

Step 5: the current largest coding unit (Largest Coding Unit, LCU) to be processed in the current frame is defined as the current largest coding unit.

Step 6: calculating the average value of the space-time weighted perception factors of all pixel points in the current maximum coding unit, and recording the average value as S _{δ_LCU} ，

Then calculating Lagrange coefficient adjusting factor based on space-time weighted perception factor of the current maximum coding unit, and marking as ψ _LCU ，

Calculating the quantization parameter variation of the current maximum coding unit based on the space-time weighted perception factor, and recording the quantization parameter variation as delta QP ₁ ，ΔQP ₁ ＝3log ₂ (Ψ _LCU ) The method comprises the steps of carrying out a first treatment on the surface of the Wherein i is more than or equal to 0 and less than or equal to 63,0, j is more than or equal to 63, delta _LCU (i, j) a spatio-temporal weighted perceptual factor, K, representing a pixel point of which the intra-block coordinate position is (i, j) in the current maximum coding unit _LCU And B _LCU Are all adjusting parameters, K _LCU ∈(0,1)，B _LCU E (0, 1), K is finally determined by a number of experiments in this example _LCU And B _LCU The value is 0.5.

Step 7: calculating the average value of the dimension weights of all pixel points in the current maximum coding unit, and recording the average value as S _{wERP_LCU} ，

Calculating the quantization parameter variation based on the dimension weight of the current maximum coding unit, and recording the quantization parameter variation as delta QP ₂ ，

Wherein w is _{ERP_LCU} (i, j) means that the intra-block coordinate position in the current maximum coding unit is (i, j)The dimension weights of the pixel points of (a) and (b) are adjusting parameters, a epsilon (0, 1), b < a, and in the embodiment, the value of a is finally determined to be 0.85 and the value of b is finally determined to be 0.3 through a plurality of experiments.

Then use QP _new Updating the coding quantization parameter of the current maximum coding unit; then, an HEVC video encoder is adopted to encode the current maximum encoding unit; wherein QP is _org Original coding quantization parameter representing current maximum coding unit, QP _org Can be read from the initialization parameter list of the encoder, symbol +.>

To round down the operator.

Step 9: and taking the next largest coding unit to be processed in the current frame as the current largest coding unit, returning to the step 6, continuing to execute until all the largest coding units in the current frame are processed, and executing the step 10.

To further illustrate the performance of the inventive method, the inventive method was tested.

HEVC video encoder standard reference software HM16.14 is selected as an experimental test platform, hardware is configured as Intel (R) Core (TM) i7-10700 CPU, main frequency is 2.9GHz, a 32G 64-bit WIN10 operating system is stored, and a development tool selects VS2013. 4 panoramic video sequences are selected as standard test sequences, and the standard test sequences are respectively: two 4K sequences "AerialCity", "DrivingInCity" and two 6K sequences "BranCastle2", "Landing2". The test frame number of each standard test sequence is 100 frames, and the SearchRange is set as the search range by adopting an intra-frame coding mode64, set MaxPartitionDepth to 4, initial coding quantization parameter QP (i.e., original coding quantization parameter QP _org ) Taken as 22, 27, 32, 37 respectively.

Table 1 lists the relevant parameter information for the 4 panoramic video sequences of "alialcity", "DrivingInCity", "BranCastle2", "handle 2".

Table 1 related parameter information for panoramic video sequences

Panoramic video sequence	Video resolution
		AerialCity	3840×1920
DrivingInCity	3840×1920
		BranCastle2	6144×3072
Landing2	6144×3072

Table 2 shows the savings in coding rate when encoding the panoramic video sequence listed in table 1 using the method of the present invention, as compared to using the HM16.14 raw platform method. Definition the code rate saving rate of coding by adopting the method of the invention compared with coding by adopting the HM16.14 original platform method is delta R _PRO ，ΔR _PRO ＝(R _ORG -R _PRO )/R _ORG X 100 (%), wherein R _PRO Representing the use of the inventionCoding rate R of coding by using method _ORG The code rate of the code using the HM16.14 raw plateau method is shown.

Table 2 code rate savings comparing the encoding with the inventive method versus the encoding with the HM16.14 raw platform method

As can be seen from table 2, the coding using the method of the present invention can save the coding rate by 12.9% on average. The method can effectively reduce the coding rate of the panoramic video sequence aiming at 4 different scenes and different motion conditions, and particularly aims at the initial coding quantization parameter QP (namely the original coding quantization parameter QP _org ) In smaller cases, the coding effect is better.

Table 3 lists the rate-distortion performance of encoding the panoramic video sequence listed in table 1 using the method of the present invention. The quality of the coded video is evaluated by a classical subjective quality evaluation method, in the quality evaluation, a subjective quality evaluation method MOS ((Mean Opinion Score) is used as a quality evaluation index, and the rate distortion performance index BDBR of each panoramic video sequence under the subjective quality evaluation method MOS is calculated respectively _MOS To comprehensively evaluate the performance of the method of the invention.

Table 3 rate distortion performance for encoding using the method of the present invention

As can be seen from Table 3, the process of the present invention employs BDBR _MOS The rate distortion performance evaluation index is used for representing that the average coding rate saving value is about-7.4% under the condition of the same subjective quality under the condition of the quality evaluation index MOS. This shows that the method of the invention can save more coding rate under the same subjective perceptual quality compared with the HM16.14 original platform method. From table 3 it can be seen that for different scenes, different scenes for a panoramic video sequenceThe method can effectively save the coding rate and remarkably improve the rate distortion performance under the motion condition.

Claims

1. A video low-complexity coding method based on panoramic visual perception characteristics is characterized by comprising the following steps:

Then calculating the space-time weighted perceptron of each pixel point in the current frame, and recording the space-time weighted perceptron of the pixel point with the coordinate position of (x, y) in the current frame as delta (x, y), wherein delta (x, y) =delta _A (x,y)×δ _T (x, y); calculating the space-time weighted perception factors of all pixel points in the current frameIs denoted as S _δ The method comprises the steps of carrying out a first treatment on the surface of the Calculating the dimension weight of each pixel point in the current frame, and marking the dimension weight of the pixel point with the coordinate position of (x, y) in the current frame as w _ERP (x,y)，

Step 7: calculating the average value of the dimension weights of all pixel points in the current maximum coding unit, and recording the average value as

Wherein a and b are both adjusting parameters, a epsilon (0, 1), b < a;

Rounding down the operator;

2. The method for encoding video with low complexity based on panoramic visual perception characteristics as recited in claim 1, wherein in said step 3, G ₁ The acquisition mode of (a) is as follows: performing spatial JND threshold calculation on each pixel point in the current frame by adopting spatial just noticeable distortion model to obtain G ₁ 。

3. A panoramic-vision-based perceptual feature as defined in claim 1 or 2A low complexity video coding method, characterized in that in step 3, G ₂ The acquisition process of (1) is as follows: will G ₂ The pixel value of the pixel point with the middle coordinate position of (x, y) is marked as G ₂ (x,y)，

Indicating the direction of the horizontal direction,

indicates the vertical direction +.>

Representing the time domain direction +_>

And

calculated by a 3D-sobel operator, alpha represents a gradient adjustment factor in the horizontal direction, and beta represents the vertical directionThe gradient adjustment factor in the direction, γ represents the gradient adjustment factor in the time domain direction, α+β+γ=1. />