CN108347611B - Optimization method of coding block-level Lagrange multiplier for longitude and latitude map

Info

Publication number: CN108347611B
Application number: CN201810174851.6A
Authority: CN
Inventors: 周益民; 程学理; 黄航; 冷龙韬; 王宏宇
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2018-03-02
Filing date: 2018-03-02
Publication date: 2021-02-02
Anticipated expiration: 2038-03-02
Also published as: WO2019165863A1; CN108347611A

Abstract

The invention relates to a video coding method, in particular to the technical field of coding in the VR360 video longitude and latitude map format, and provides an optimization method of a coding block-level Lagrange multiplier for the longitude and latitude map. According to the position information of a coding block in the longitude and latitude map, the method calculates the ratio ρ(θ) of the area of the spherical ring zone in which the coding block is located to the area of the longitude and latitude map pixel ring zone in which the coding block is located. λ_sys is then optimized according to ρ(θ) to obtain an optimized Lagrange multiplier λ(ρ(θ)), and the coding block is encoded according to λ(ρ(θ)). Because the position information of the coding block enters the correction and optimization of the block-level Lagrange multiplier in the form of an area ratio, the overall performance of longitude and latitude map coding is significantly improved. The method is suitable for video coding in the VR360 video longitude and latitude map format.

Description

Optimization method of coding block-level Lagrange multiplier for longitude and latitude map

Technical Field

The invention relates to a video coding method for the technical field of coding in the VR360 video longitude and latitude map format, and in particular to an optimization method of a coding block-level Lagrange multiplier for the longitude and latitude map. The longitude and latitude map is also called an equirectangular projection (ERP) map.

Background

Virtual reality (VR) technology is a computer simulation technology for creating and experiencing an immersive virtual world. It integrates recent developments in computer graphics, computer simulation, artificial intelligence, sensing, display, networked parallel processing and related technologies. VR content is typically generated by computer and is often presented as a simulated virtual display system. With the rapid development of VR technology, VR-related consumer electronics are gradually entering daily life. At present, most VR content targets the visual experience and is presented on a computer screen, a dedicated display device or a stereoscopic display device. VR applications are most visible in the game and film industries, and a large number of VR games and VR videos have been brought to market in recent years. More broadly, VR has many applications in medicine, education, aerospace, rail transit and other fields. VR technology has become an active area of research.

To improve the user experience, source parameters of VR video such as resolution, pixel value range and frame rate are generally much higher than those of ordinary video, with 8K and 4K resolutions predominating. Compared with high-definition 1080P video, the data volume increases by a factor of dozens. How to keep improving the compression efficiency of VR video by technical means has therefore become a new technical challenge.

Rate-distortion optimization (RDO) is the core optimization technique in video coding and is grounded in rate-distortion theory. It addresses the problem of generating an optimized bitstream in the encoder. The basic problem of rate-distortion theory is to achieve the minimum expected distortion at a given bit rate, for a given source distribution and distortion measure.

In practical applications, rate-distortion optimization turns this problem into selecting one set of parameters from a given collection of coding parameter sets, so that the video is encoded at the lowest bit rate under a specified distortion condition. The theoretically optimal coding parameters can be obtained by exhaustively traversing all candidate parameter sets, but the time complexity of exhaustive search is extremely high and the required coding time is extremely long, so it cannot be used in actual coding. Because video coding proceeds coding unit by coding unit and the parameters of the coding units are treated as independent of one another, the optimal coding parameters of each coding unit can be regarded as belonging to the optimal parameter set of the whole coding process; that is, the global optimization problem is decomposed into a set of local optimization problems.

The rate-distortion optimization process introduces a Lagrange multiplier λ to transform the constrained optimization problem into an unconstrained optimization problem. It is the introduction of the Lagrangian optimization method that gave rate-distortion optimization for video coding its practical value, and because of its low complexity and high performance the method has been widely adopted. Rate-distortion optimization based on Lagrange multipliers is applied in mainstream H.264/AVC and HEVC/H.265 encoders. In general, the value of λ is determined by a formula derived under a high-bit-rate assumption, and in practice an empirical correction is added according to the characteristics of each encoder. The quality of the selected λ value directly affects coding performance.
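As an illustration of the Lagrangian formulation described above, the following minimal Python sketch selects, from a few candidate coding choices, the one that minimizes the cost J = D + λ·R. The `Candidate` class, the mode names and the numeric values are hypothetical and chosen only for illustration; they are not taken from any encoder.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical label for a coding choice
    distortion: float  # D, e.g. sum of squared errors of the block
    rate: float        # R, bits needed to code the block this way


def rd_cost(candidate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.rate


def best_candidate(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Illustrative values only (assumption, not measured data).
candidates = [
    Candidate("mode_a", distortion=100.0, rate=50.0),
    Candidate("mode_b", distortion=150.0, rate=20.0),
    Candidate("mode_c", distortion=400.0, rate=5.0),
]
```

With these illustrative numbers, a small λ favours the low-distortion candidate and a large λ favours the low-rate candidate, which is why the choice of λ directly affects coding performance.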

Video coding quality is generally evaluated and described using BD-RATE and BD-PSNR, as described in: [Gisle Bjontegaard, Calculation of Average PSNR Differences between RD-curves, ITU-T SG16/Q6, 13th VCEG Meeting, Austin, Texas, USA, April 2001, Doc. VCEG-M33]. The two are calculated in a similar way: the objective quality (PSNR) and the coding bit rate of the test points are collected, the points are connected by high-order interpolation, and the difference of the integrals is computed. The bit rate is measured in a standardized and unambiguous way, but ordinary two-dimensional video and VR360 longitude and latitude map video differ considerably with respect to the objective quality measure PSNR.

Ordinary two-dimensional video coding generally uses the peak signal-to-noise ratio (PSNR) as its objective quality index. A VR360 video sequence, however, is usually stored in the form of a longitude and latitude map and is projected onto a spherical surface during playback, which produces the 360-degree surround effect. Mapping from the longitude and latitude map to the sphere inevitably compresses pixels: pixels on every latitude other than the equator are compressed when mapped to the sphere, and the higher the latitude, the stronger the compression. In the extreme case, the row of pixels at the north or south pole of the longitude and latitude map is compressed into a single point at the corresponding pole of the sphere. Because VR360 video in the longitude and latitude map format is not displayed directly during playback but is first synthesized onto the sphere and then output, the two-dimensional PSNR cannot accurately describe the objective quality on the three-dimensional sphere.

Experts in the field have therefore proposed improved objective evaluation models, such as the spherical peak signal-to-noise ratio (SPSNR), the weighted spherical peak signal-to-noise ratio (WS-PSNR) and the Craster parabolic projection peak signal-to-noise ratio (CPP-PSNR), which are currently the most common objective evaluation indexes for VR360 video. SPSNR is further divided into the spherical peak signal-to-noise ratio with interpolation (SPSNR-I) and the nearest-neighbor spherical peak signal-to-noise ratio (SPSNR-NN).
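To make the idea of a sphere-aware quality metric concrete, the sketch below computes a WS-PSNR style value for a single-channel longitude and latitude image. It assumes the commonly used row weight cos((j + 0.5 - H/2)·π/H) for row j of an image of height H and an 8-bit peak value of 255; the function names are hypothetical and the code is an illustration, not a reference implementation of any standardized tool.

```python
import math


def erp_row_weights(height):
    """Per-row weights for an ERP image: largest at the equator, smallest at the poles.

    Assumption: weight of row j is cos((j + 0.5 - height/2) * pi / height).
    """
    return [math.cos((j + 0.5 - height / 2) * math.pi / height) for j in range(height)]


def ws_psnr(ref, test, max_value=255.0):
    """Weighted PSNR between two images given as lists of pixel rows."""
    weights = erp_row_weights(len(ref))
    weighted_error = 0.0
    weight_sum = 0.0
    for w, ref_row, test_row in zip(weights, ref, test):
        for a, b in zip(ref_row, test_row):
            weighted_error += w * (a - b) ** 2
            weight_sum += w
    wmse = weighted_error / weight_sum
    if wmse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / wmse)
```

Under this weighting, the same pixel error costs less quality when it occurs near a pole than when it occurs near the equator, which is the property that a plain two-dimensional PSNR lacks.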

It is therefore worth noting that existing video encoders are designed for general two-dimensional images and do not specifically consider the source properties of the VR360 longitude and latitude map format. As a result, SPSNR or WS-PSNR performance may be seriously degraded even when PSNR performance remains good.

Disclosure of Invention

The invention provides an optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map, which can optimize the Lagrange multiplier and is beneficial to improving the overall performance of the longitude and latitude map coding.

The invention discloses an optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map, which comprises the following steps:

A. acquiring one frame image of a video sequence;

B. sequentially obtaining one coding block in the current frame;

C. according to the position information, in the longitude and latitude map, of the coding block obtained in step B, calculating the ratio ρ(θ) of the area of the spherical ring zone in which the coding block is located to the area of the longitude and latitude map pixel ring zone in which the coding block is located, wherein θ is the calculated value of the zenith angle of the coding block on the spherical surface;

D. performing an optimization calculation on λ_sys according to ρ(θ) to obtain an optimized Lagrange multiplier λ(ρ(θ)), wherein λ_sys is the system-defined Lagrange multiplier value of the current frame obtained in step A;

E. encoding the coding block according to the λ(ρ(θ)) obtained in step D;

F. judging whether all coding blocks in the current frame have been encoded; if so, proceeding to step G, otherwise returning to step B;

G. after the current frame has been encoded, judging whether the whole sequence has been encoded; if so, ending, otherwise returning to step A to continue encoding.

Further, in step D, λ(ρ(θ)) = λ_sys·(ξ+ρ(θ))^γ, wherein λ_sys is the system-defined Lagrange multiplier value, θ is the calculated value of the zenith angle of the current coding block on the spherical surface, ξ is a small value that prevents division by zero, γ = 1/β is a model parameter related to the image content, and β is a model parameter related to the source characteristics.
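The correction in step D can be sketched as a one-line function. The function name `optimized_lambda` is hypothetical; the default values ξ = 0.015 and γ = 0.20 are the empirical values reported for the embodiment later in this document, and any other values passed in are illustrative only.

```python
def optimized_lambda(lambda_sys, rho, gamma=0.20, xi=0.015):
    """Block-level multiplier lambda(rho) = lambda_sys * (xi + rho) ** gamma.

    lambda_sys: frame-level multiplier supplied by the encoder
    rho:        area ratio of the block, between 0 (pole) and 1 (equator)
    gamma, xi:  model parameters; defaults follow the embodiment's reported values
    """
    # The offset xi keeps the base strictly positive. Without it, rho == 0
    # combined with a negative exponent would make Python raise
    # ZeroDivisionError (0.0 cannot be raised to a negative power).
    return lambda_sys * (xi + rho) ** gamma
```

For example, `optimized_lambda(10.0, 1.0, xi=0.0)` leaves the frame-level value unchanged, since the base of the power is then exactly 1.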

Specifically, in step C, the calculated value of the zenith angle of the coding block on the spherical surface is θ, wherein

the area S_spher(θ) of the spherical ring zone is calculated by the formula S_spher(θ) = 2π·r·sinθ·h_ring, where h_ring is the height of the spherical ring zone and r is the radius of the sphere;

the area S_erp(θ) of the pixel ring zone of the longitude and latitude map is calculated by the formula S_erp(θ) = 2π·r·h_ring.

Further, when the height of the ring zone is h_ring = r·sin dθ, for the coding block,

the area S_spher(θ) of the spherical ring zone is calculated by the formula S_spher(θ) = 2π·r²·sinθ·sin dθ,

the area S_erp(θ) of the pixel ring zone of the longitude and latitude map is calculated by the formula S_erp(θ) = 2π·r²·sin dθ,

and the area ratio in step C is ρ(θ) = S_spher(θ)/S_erp(θ) = sinθ.

Here dθ is the difference between the zenith angles of the upper edge and the lower edge of the ring zone.
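The cancellation that leads to an area ratio of sin θ can be checked numerically. In the sketch below the spherical ring area follows the formula stated above; the pixel ring area is taken as the equatorial circumference 2π·r multiplied by the ring height, which is an assumption chosen because it is the form consistent with ρ(θ) = sin θ. Function names are hypothetical.

```python
import math


def spherical_ring_area(theta, d_theta, r=1.0):
    """S_spher = 2*pi*r*sin(theta)*h_ring with h_ring = r*sin(d_theta)."""
    h_ring = r * math.sin(d_theta)
    return 2.0 * math.pi * r * math.sin(theta) * h_ring


def erp_ring_area(d_theta, r=1.0):
    """Assumed pixel-ring area: full map width 2*pi*r times the same ring height."""
    h_ring = r * math.sin(d_theta)
    return 2.0 * math.pi * r * h_ring


def area_ratio(theta, d_theta, r=1.0):
    """rho(theta) = S_spher / S_erp; r and d_theta cancel, leaving sin(theta)."""
    return spherical_ring_area(theta, d_theta, r) / erp_ring_area(d_theta, r)
```

The ratio depends only on the zenith angle: it is 1 at the equator (θ = π/2), 0.5 at θ = π/6, and is unaffected by the radius or by the ring height.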

Preferably, the step of obtaining the zenith angle calculated value θ in step C includes:

C1. expressing the coordinate position of the current coding block on the longitude and latitude map as follows: the index of the first row of the current coding block in the whole longitude and latitude map is k, the pixel height of the coding block is N, and the total pixel height of the longitude and latitude map is h;

C2. according to the data obtained in step C1, denoting by θ(i) the zenith angle corresponding to the pixel row with index i in the current coding block, θ(i) being determined by i and h; calculating the arithmetic mean θ̄ of the zenith angles θ(i) of the pixel rows of the current coding block by the formula θ̄ = (1/N)·Σθ(i), the sum being taken over the N pixel rows of the block; and taking the arithmetic mean θ̄ as the zenith angle calculated value θ in ρ(θ), which gives ρ(θ) = sin θ̄.
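Steps C1 and C2 can be sketched as follows. The per-row mapping used here, θ(i) = (i + 0.5)·π/h with i counted from the top row of the whole map, is an assumption (pixel-centre sampling) made for illustration; the function names are hypothetical.

```python
import math


def row_zenith_angle(i, h):
    """Zenith angle of map row i (0 = top row) in a map of height h.

    ASSUMPTION: rows are sampled at pixel centres, theta(i) = (i + 0.5) * pi / h.
    """
    return (i + 0.5) * math.pi / h


def block_zenith_angle(k, n, h):
    """Arithmetic mean of theta(i) over the n rows of a block whose first row is k."""
    return sum(row_zenith_angle(i, h) for i in range(k, k + n)) / n


def block_area_ratio(k, n, h):
    """rho for the block, taken as the sine of the mean zenith angle."""
    return math.sin(block_zenith_angle(k, n, h))
```

For a 2048-row map and 64-row blocks, the block starting at row 992 straddles the equator and gets a ratio of essentially 1, while the top block (k = 0) and the bottom block (k = 1984) get the same small ratio sin(π/64).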

Specifically, in step A, the position of the image in the sequence is determined, together with its frame type, its frame attributes, and its position and layer within the group of pictures; according to the frame attributes of the current frame, the encoder calculates the frame-level system-defined Lagrange multiplier value λ_sys.

The invention has the beneficial effects that:

At present, objective quality evaluation for VR360 video is still based on the conventional mean squared error (MSE) of pixel distortion. Unlike the point-to-point MSE statistics of a 2D image, however, the distortion of a VR360 longitude and latitude map is averaged on the 3D spherical surface, so that equal areas on the sphere carry equal weight. Rate-distortion optimization on the VR360 longitude and latitude map should therefore be modified to fit the distortion calculation rule of the sphere. Because the distortion of the VR360 longitude and latitude map is accumulated over equal areas on the sphere, it is necessary to account, from an analysis of the mapping from the longitude and latitude map to the sphere, for the proportion that coding blocks (CTUs) at different latitudes contribute to the final SPSNR (spherical peak signal-to-noise ratio).

As is known in the art, in the mapping from the sphere to the VR360 longitude and latitude map the longitude is scaled, while the latitude is mapped by direct projection from the sphere onto the cylinder. The ratio of the area of a spherical ring zone to the area of the corresponding pixel ring zone of the VR360 longitude and latitude map therefore depends only on latitude and not on longitude.

The Lagrange multiplier is usually expressed as a function closely related to the quantization step size. Each coding platform applies its own parameter correction factors to the Lagrange multiplier so as to fit its R-D curve closely and achieve the highest possible coding gain.

The invention constructs a weight from the ratio of the area of the spherical ring zone in which the coding block is located to the area of the longitude and latitude map pixel ring zone in which the coding block is located, thereby introducing the position information of the coding block in the form of an area ratio. The weight containing the position information is then used to correct and optimize the coding block-level Lagrange multiplier, and the block is finally encoded with the new quantization parameter, which significantly improves the overall performance of longitude and latitude map coding.

Drawings

Fig. 1 is a schematic diagram of a mapping relationship between a VR360 video longitude and latitude map and spherical pixels.

Fig. 2 is a schematic view of a spherical projection of a VR360 longitude and latitude map.

Fig. 3 is a front view of fig. 2.

Fig. 4 is a right side view of fig. 2.

Fig. 5 is a rear view of fig. 2.

Fig. 6 is a left side view of fig. 2.

Fig. 7 is a top view of fig. 2.

Fig. 8 is a bottom view of fig. 2.

Fig. 9 is a flowchart of the optimization method of the coding block-level Lagrange multiplier for the longitude and latitude map according to the present invention.

Fig. 10 is a flowchart of the optimization of the system-defined Lagrange multiplier value in Fig. 9.

Detailed Description

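As shown in Figs. 1 to 8, the optimization method of the coding block-level Lagrange multiplier for the longitude and latitude map according to the present invention includes the following steps:

A. acquiring one frame image of a video sequence;

B. sequentially obtaining one coding block in the current frame;

C. according to the position information, in the longitude and latitude map, of the coding block obtained in step B, calculating the ratio ρ(θ) of the area of the spherical ring zone in which the coding block is located to the area of the longitude and latitude map pixel ring zone in which the coding block is located, wherein θ is the calculated value of the zenith angle of the coding block on the spherical surface;

D. performing an optimization calculation on λ_sys according to ρ(θ) to obtain an optimized Lagrange multiplier λ(ρ(θ)), wherein λ_sys is the system-defined Lagrange multiplier value of the current frame obtained in step A;

E. encoding the coding block according to the λ(ρ(θ)) obtained in step D;

F. judging whether all coding blocks in the current frame have been encoded; if so, proceeding to step G, otherwise returning to step B;

G. after the current frame has been encoded, judging whether the whole sequence has been encoded; if so, ending, otherwise returning to step A to continue encoding.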

In general, the Lagrange multiplier is obtained by training on a large amount of experimental data, is calculated by an empirical formula, and is expressed as a function closely related to the quantization step size. Each coding platform applies its own parameter correction factors to the Lagrange multiplier so as to fit its R-D curve closely and obtain the highest possible coding gain; in this way the system-defined Lagrange multiplier value λ_sys of each frame is obtained.

Furthermore, the invention constructs a weight from the ratio ρ(θ) of the area of the spherical ring zone in which the coding block is located to the area of the longitude and latitude map pixel ring zone in which the coding block is located, thereby introducing the position information of the coding block in the form of an area ratio. The weight containing the position information is then used to correct and optimize the coding block-level Lagrange multiplier, and the block is finally encoded with the new quantization parameter, which significantly improves the overall performance of longitude and latitude map coding.

The formula for calculating λ(ρ(θ)) in step D may be constructed differently depending on how the area-ratio weight is constructed, on the optimization purpose, and so on. In this embodiment, specifically, in step D

λ(ρ(θ)) = λ_sys·(ξ+ρ(θ))^γ (1)

wherein λ_sys is the system-defined Lagrange multiplier value, θ is the calculated value of the zenith angle of the current coding block on the spherical surface, ξ is a small value that prevents division by zero, γ = 1/β is a model parameter related to the image content, and β is a model parameter related to the source characteristics.

The derivation process of equation (1) is as follows:

The SPSNR (spherical peak signal-to-noise ratio) is calculated by sampling according to the mapped pixel density. Specifically, more sampling points are taken at low latitudes, where the pixel density is high, and fewer sampling points are taken at high latitudes, where the pixel density is low. Following the principle that the number of bits allocated to a region of the VR360 longitude and latitude map should be consistent with the area proportion of the corresponding spherical ring zone, the number of bits consumed by coding is reduced while the subjective and objective quality are maintained.

Considering that the area ratio is a sine function of the zenith angle calculated value θ, and that the bit-rate distribution over the longitude and latitude map is expected to match the requirements of spherical display, a ratio model is established by equation (2):

R(θ)/R(π/2) = ρ(θ) (2)

wherein R(π/2) and R(θ) are the coding bit rates of the ring zones at the equator and at the zenith angle calculated value θ, respectively. Based on the model given in equation (2), the spherical λ can be derived in the following steps.

The R-λ model of the relationship between bit rate and Lagrange multiplier is

R = α·λ^β (3)

where α and β are model parameters related to the source characteristics.

Substituting equation (3) into equation (2) yields equation (4):

(α·λ(θ)^β)/(α·λ(π/2)^β) = ρ(θ) (4)

wherein λ(θ) and λ(π/2) are the values of the spherical λ at the zenith angle calculated value θ and at the equator, respectively. Rearranging equation (4) gives the proportional equation of the spherical λ, as shown in equation (5):

λ(θ) = λ(π/2)·ρ(θ)^γ (5)

wherein γ = 1/β is a model parameter related to the image content.

Thus, according to the derivation process described above, equation (1) is established.
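The derivation can be checked numerically under explicit assumptions: that the ratio model reads R(θ)/R(π/2) = ρ(θ) with ρ(θ) = sin θ, that the rate follows R = α·λ^β, and hence that the multiplier scales as ρ(θ)^(1/β). The parameter values used in the usage note are illustrative only, and the function names are hypothetical.

```python
import math


def rate_from_lambda(lam, alpha, beta):
    """R-lambda model: R = alpha * lam ** beta."""
    return alpha * lam ** beta


def lambda_at_zenith(theta, lambda_equator, beta):
    """Spherical lambda under the assumed ratio model: lambda_eq * sin(theta) ** (1/beta)."""
    return lambda_equator * math.sin(theta) ** (1.0 / beta)


def rate_ratio(theta, lambda_equator, alpha, beta):
    """R(theta) / R(pi/2) obtained by feeding both lambdas through the R-lambda model."""
    r_theta = rate_from_lambda(lambda_at_zenith(theta, lambda_equator, beta), alpha, beta)
    r_equator = rate_from_lambda(lambda_equator, alpha, beta)
    return r_theta / r_equator
```

With, say, α = 3.2, β = -1.367 and an equatorial multiplier of 40, `rate_ratio(math.pi / 6, 40.0, 3.2, -1.367)` comes out at sin(π/6) up to rounding, so the bit-rate share of a ring zone matches its area share on the sphere, as the ratio model requires. The parameter α cancels in the ratio.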

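According to the geometric calculation, in step C, the calculated value of the zenith angle of the coding block on the spherical surface is θ, wherein the area S_spher(θ) of the spherical ring zone is calculated by the formula S_spher(θ) = 2π·r·sinθ·h_ring, where h_ring is the height of the spherical ring zone and r is the radius of the sphere, and the area S_erp(θ) of the pixel ring zone of the longitude and latitude map is calculated by the formula S_erp(θ) = 2π·r·h_ring.

For further convenience of calculation, when the height of the ring zone is h_ring = r·sin dθ, for the coding block,

S_spher(θ) = 2π·r²·sinθ·sin dθ,

S_erp(θ) = 2π·r²·sin dθ,

and the area ratio in step C is ρ(θ) = S_spher(θ)/S_erp(θ) = sinθ.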

Of course, depending on how the area ratio is calculated and how its values are taken, ρ(θ) is not necessarily equal to sinθ. This does not affect whether the invention can be implemented; it only affects the computational difficulty of the implementation.

The above θ is called the zenith angle calculated value, rather than the zenith angle, for the following reason:

Fig. 1 shows the mapping relationship between the VR360 longitude and latitude map, i.e. the equirectangular projection (ERP), and the spherical pixels. In this mapping, longitude is expressed by an azimuth angle and latitude is expressed by the zenith angle θ, and the latitude angle subtended between the upper and lower boundaries of the spherical ring zone is dθ. Once the width of the spherical ring zone is fixed, dθ is a fixed value determined by that width; the zenith angle, however, is not unique across the pixel rows contained in the ring zone. For convenience of calculation, θ is therefore called the zenith angle calculated value. It may be taken as an extreme value, a specific value, the mean of the maximum and minimum values, the arithmetic mean, and so on, according to the partitioning rule of the coding blocks, the optimization requirements, and the like.

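In this embodiment, the zenith angle calculated value θ is taken as an arithmetic mean; therefore, the step of obtaining the zenith angle calculated value θ in step C includes:

C1. expressing the coordinate position of the current coding block on the longitude and latitude map as follows: the index of the first row of the current coding block in the whole longitude and latitude map is k, the pixel height of the coding block is N, and the total pixel height of the longitude and latitude map is h;

C2. according to the data obtained in step C1, denoting by θ(i) the zenith angle corresponding to the pixel row with index i in the current coding block, θ(i) being determined by i and h; calculating the arithmetic mean θ̄ of the zenith angles θ(i) of the pixel rows of the current coding block by the formula θ̄ = (1/N)·Σθ(i), the sum being taken over the N pixel rows of the block; and taking the arithmetic mean θ̄ as the zenith angle calculated value θ in ρ(θ), which gives

ρ(θ) = sin θ̄ (7)

Substituting equation (7) into equation (1) gives the final optimization formula of this embodiment:

λ(ρ(θ)) = λ_sys·(ξ + sin θ̄)^γ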

Fig. 9 is a block diagram of the optimization method of the coding block-level Lagrange multiplier for the longitude and latitude map in this embodiment, and Fig. 10 is a block diagram of the calculation of λ(ρ(θ)) in this embodiment. The steps of the whole optimization process are as follows:

1) acquiring one frame image of a video sequence;

2) determining the position of the current frame in the sequence, and determining its frame type, its frame attributes, and its position and layer within the group of pictures;

3) according to the frame attributes of the current frame, calculating by the encoder the frame-level system-defined Lagrange multiplier value λ_sys;

4) sequentially obtaining one coding block in the current frame and determining its position information in the longitude and latitude map, expressed as follows: the index of the first row of the current coding block in the whole longitude and latitude map is k, the pixel height of the coding block is N, and the total pixel height of the longitude and latitude map is h;

5) according to the position information of the coding block in the longitude and latitude map, denoting by θ(i) the zenith angle corresponding to the pixel row with index i in the current coding block, calculating the arithmetic mean θ̄ of the θ(i) over the pixel rows of the block, and taking θ̄ as the zenith angle calculated value θ, so that ρ(θ) = sin θ̄;

6) performing an optimization calculation on λ_sys according to ρ(θ) to obtain the optimized Lagrange multiplier λ(ρ(θ)), the formula being λ(ρ(θ)) = λ_sys·(ξ + sin θ̄)^γ;

7) encoding the coding block according to the calculated λ(ρ(θ));

8) judging whether all coding blocks in the current frame have been encoded; if so, proceeding to step 9), otherwise returning to step 4);

9) after the current frame has been encoded, judging whether the whole sequence has been encoded; if so, ending, otherwise returning to step 1) to continue encoding.
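The control flow of steps 1) to 9) can be sketched as a pair of nested loops. This is a minimal illustration, not an encoder: `frame_lambda` and `encode_block` are hypothetical callbacks standing in for the encoder's frame-level λ_sys calculation and its block encoder, frames are plain lists of pixel rows, the pixel-centre mapping θ(i) = (i + 0.5)·π/h is an assumption, and ξ and γ default to the embodiment's reported values.

```python
import math


def encode_sequence(frames, block_size, frame_lambda, encode_block,
                    gamma=0.20, xi=0.015):
    """Walk every block of every frame and hand it a position-dependent lambda."""
    for frame_index, frame in enumerate(frames):            # steps 1) and 9)
        h = len(frame)                                      # map height in pixel rows
        w = len(frame[0])                                   # map width in pixels
        lambda_sys = frame_lambda(frame_index)              # steps 2) and 3)
        for k in range(0, h, block_size):                   # steps 4) and 8)
            n = min(block_size, h - k)                      # block height N
            # Step 5): mean of (i + 0.5) * pi / h over rows k .. k+n-1,
            # which simplifies to pi * (k + n / 2) / h (assumed row mapping).
            theta = math.pi * (k + n / 2.0) / h
            rho = math.sin(theta)
            lam = lambda_sys * (xi + rho) ** gamma          # step 6)
            for x in range(0, w, block_size):
                encode_block(frame_index, x, k, lam)        # step 7)
```

Because the area ratio depends only on latitude, all blocks in the same block row receive the same multiplier, so it is computed once per row rather than once per block.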

Table 1 lists the test sequences of the joint virtual reality standard group (IEEE 1857.9 VRU) of the Institute of Electrical and Electronics Engineers (IEEE) 1857.9 working group. Tables 2 and 3 give the performance gain of the invention relative to the existing, unoptimized Lagrange multiplier under two different test configurations. As stated above, ξ is a small value that prevents division by zero and should therefore be as small as possible; γ and β are model parameters related to the image content and to the source characteristics, respectively. In this embodiment, ξ takes the empirical value 0.015 and, according to the test data, γ takes the value 0.20.

Sequence Name	Resolution	Frame Count	FPS	Bit Depth
Fengjing1	4096×2048	300	30	8
Fengjing3	4096×2048	300	30	8
Hangpai1	4096×2048	300	30	8
Hangpai2	4096×2048	300	30	8
Hangpai3	4096×2048	300	30	8
Xinwen1	4096×2048	300	30	8
Xinwen2	4096×2048	300	30	8

Table 1. Longitude and latitude map video test sequences

The test set consists of seven 4K VR360 longitude and latitude map video test sequences in general use, and each test covers the full sequence. Performance is reported as BD-RATE, the statistic commonly used in the art: a negative value is the proportion of bit rate saved at equal objective quality, and a positive value is the proportion of bit rate wasted at equal objective quality, so that a negative BD-RATE indicates the gain of the algorithm. The test baseline follows the AVS2 (second-generation Chinese audio and video coding standard) common test conditions. The results at the four QP points 27, 32, 38 and 45 under the default system configuration serve as the comparison basis (Anchor), and the four results obtained with the method of this embodiment at the same rate points serve as the test data (Test). BD-RATE is calculated under both the traditional PSNR (peak signal-to-noise ratio) and the longitude and latitude map SPSNR (spherical peak signal-to-noise ratio). The tests cover two typical configurations, Low Delay (LD) and Random Access (RA).

Table 2. Experimental results of the optimization method of the present invention in the Low Delay (LD) configuration

Table 2 reports the objective-quality gain in the Low Delay (LD) configuration under the two evaluation modes, traditional PSNR and SPSNR. The gain in SPSNR reaches 2.8%, higher than the 0.4% gain in PSNR-Y. In particular, on the Fengjing1 sequence the PSNR-Y gain reaches 3.3% and the SPSNR gain reaches 5.8%; on the Hangpai1 sequence the SPSNR gain is even higher, at 6.4%.

Table 3. Experimental results of the optimization method of the present invention in the Random Access (RA) configuration

Table 3 reports the objective-quality gain in the Random Access (RA) configuration under the two evaluation modes, PSNR and SPSNR. The gain in SPSNR reaches 1.5%. In particular, the gains on the three sequences Hangpai1, Hangpai2 and Hangpai3 are 2.8%, 2.1% and 2.3%, respectively, an average gain of 2.4%.

The test results in Tables 2 and 3 show that the optimization method of the present invention effectively optimizes the coding block-level Lagrange multiplier for VR360 video image coding and improves video coding efficiency in both configurations.

Claims

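1. An optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map, comprising the following steps:

A. acquiring one frame image of a video sequence;

B. sequentially obtaining one coding block in the current frame;

C. according to the position information, in the longitude and latitude map, of the coding block obtained in step B, calculating the ratio ρ(θ) of the area of the spherical ring zone in which the coding block is located to the area of the longitude and latitude map pixel ring zone in which the coding block is located, wherein θ is the calculated value of the zenith angle of the coding block on the spherical surface;

D. performing an optimization calculation on λ_sys according to ρ(θ) to obtain an optimized Lagrange multiplier λ(ρ(θ)):

λ(ρ(θ)) = λ_sys·(ξ+ρ(θ))^γ;

wherein λ_sys is the system-defined Lagrange multiplier value of the current frame obtained in step A, θ is the calculated value of the zenith angle of the current coding block on the spherical surface, ξ is a small value that prevents division by zero, γ = 1/β is a model parameter related to the image content, and β is a model parameter related to the source characteristics;

E. encoding the coding block according to the λ(ρ(θ)) obtained in step D;

F. judging whether all coding blocks in the current frame have been encoded; if so, proceeding to step G, otherwise returning to step B;

G. after the current frame has been encoded, judging whether the whole sequence has been encoded; if so, ending, otherwise returning to step A to continue encoding.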

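2. The optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map according to claim 1, characterized in that: in step C, the calculated value of the zenith angle of the coding block on the spherical surface is θ, wherein

the area S_spher(θ) of the spherical ring zone is calculated by the formula S_spher(θ) = 2π·r·sinθ·h_ring, where h_ring is the height of the spherical ring zone and r is the radius of the sphere;

the area S_erp(θ) of the pixel ring zone of the longitude and latitude map is calculated by the formula S_erp(θ) = 2π·r·h_ring.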

3. The optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map according to claim 2, characterized in that:

when the height of the ring zone is h_ring = r·sin dθ, for the coding block,

the area S_spher(θ) of the spherical ring zone is calculated by the formula S_spher(θ) = 2π·r²·sinθ·sin dθ,

the area S_erp(θ) of the pixel ring zone of the longitude and latitude map is calculated by the formula S_erp(θ) = 2π·r²·sin dθ,

and the area ratio in step C is ρ(θ) = S_spher(θ)/S_erp(θ) = sinθ.

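4. The optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map according to claim 3, characterized in that: the step of obtaining the zenith angle calculated value θ in step C includes:

C1. expressing the coordinate position of the current coding block on the longitude and latitude map as follows: the index of the first row of the current coding block in the whole longitude and latitude map is k, the pixel height of the coding block is N, and the total pixel height of the longitude and latitude map is h;

C2. according to the data obtained in step C1, denoting by θ(i) the zenith angle corresponding to the pixel row with index i in the current coding block, θ(i) being determined by i and h; calculating the arithmetic mean θ̄ of the zenith angles θ(i) of the pixel rows of the current coding block by the formula θ̄ = (1/N)·Σθ(i), the sum being taken over the N pixel rows of the block; and taking the arithmetic mean θ̄ as the zenith angle calculated value θ in ρ(θ), which gives ρ(θ) = sin θ̄.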

5. The optimization method of a coding block-level Lagrange multiplier for a longitude and latitude map according to claim 3, characterized in that:

in step A, the position of the image in the sequence is determined, together with its frame type, its frame attributes, and its position and layer within the group of pictures; and according to the frame attributes of the current frame, the encoder calculates the frame-level system-defined Lagrange multiplier value λ_sys.