WO2024082579A1 - Zero-delay panoramic video rate control method considering temporal distortion propagation - Google Patents

Zero-delay panoramic video rate control method considering temporal distortion propagation

Info

Publication number
WO2024082579A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
ctu
coding
distortion
panoramic video
Prior art date
Application number
PCT/CN2023/087513
Other languages
English (en)
French (fr)
Inventor
朱策
杨栩
罗雷
郭红伟
段昶
杜金
侯晶晶
Original Assignee
电子科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 电子科技大学
Publication of WO2024082579A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding

Definitions

  • the present invention belongs to the technical field of image processing, and in particular relates to a zero-delay panoramic video bit rate control method considering the propagation of time domain distortion.
  • Panoramic video is a video that is shot in 360 degrees using a camera array or a 3D camera. When watching the video, users can adjust the video up, down, left, and right at will. Panoramic video has the characteristics of high frame rate, high resolution (at least 4K), and wide field of view. Storing and transmitting panoramic video consumes a lot of resources. Bit rate control is to optimize the allocation of bits layer by layer according to the target bit rate, and dynamically adjust the encoding parameters of the encoder according to a certain encoding strategy, so that the encoder output bit stream meets the bandwidth limitation of the transmission channel and the storage space requirements of the storage device, and makes the actual output bit rate consistent with the target bit rate as much as possible. It is widely used in real-time video communication systems.
  • Traditional panoramic video coding has the following problems. First, it will produce pixel redundancy during the projection process, which affects the coding performance; at the same time, the traditional encoder does not use the temporal correlation between the inter-frame coding units for encoding, and the coding performance still has a lot of room for improvement; the main reason why the traditional encoder does not consider the influencing factor of the inter-frame temporal distortion is that its computational complexity is too high, which is not conducive to real-time communication.
  • the present invention proposes a zero-delay panoramic video bit rate control method that considers the propagation of time domain distortion.
  • the present invention first adjusts the encoding parameters according to the ratio of the spherical area and the projection area of the panoramic video to reduce the encoding performance loss caused by pixel redundancy; at the same time, it determines whether the image scene has changed by calculating the change in the mean and variance values of the encoded frame pixels; in the case where no scene change occurs, according to the continuity of the video image, the reconstruction error and motion compensation prediction error of the previous encoded frame in the time domain are used to calculate the time domain distortion influence factor, and the factor is used to encode the current frame.
  • the present invention does not need to pre-store unencoded frames, and uses the encoded frame information to calculate the time domain distortion influence factor, the calculation amount is small, the encoding complexity is extremely low, and the rate distortion performance is greatly improved, so the present invention is easy to promote.
  • the H.266/VVC rate control method mainly includes the following steps:
  • VTM uses a hierarchical bit allocation structure and allocates bits at the GOP level, the frame level, and the CTU level. The lower the level, the more accurate the bit allocation and the easier it is to correct errors. VTM distributes the target bit rate evenly at each level. To gradually eliminate the rate control error produced by already coded units, it smooths bit rate fluctuations with a sliding window so that large fluctuations do not degrade video quality. Hierarchical bit allocation is achieved through the following formula:
  • $R_{tar}$ is the target bit rate
  • FR is the frame rate
  • $R_{pic}$ is the average number of bits per frame over the entire sequence
  • $T_{gop}$, $T_{pic}$, and $T_{ctu}$ are the target numbers of bits allocated to the GOP, the frame, and the coding unit to be coded, respectively
  • SW is the sliding window used to smooth the bit allocation
  • $N_{coded}$, $R_{coded}$, $N_{gop}$, $\omega_{pic}$, and $\omega_{ctu}$ are the number of coded frames, the bits already consumed by the coded video sequence, the number of frames in a GOP, the weight of the frame, and the weight of the coding unit; the formulas also use the number of coded frames in the GOP, the sum of the weights of the uncoded frames, and the sum of the weights of the uncoded CTUs in a frame.
  • Step 2: Calculate the Lagrange multiplier according to the R-λ model
  • the Lagrange multiplier $\lambda_i$ is the slope of the rate-distortion curve; $D_i$ and $R_i$ are the distortion and the coded bits of the i-th coding unit, respectively.
  • the relationship between the Lagrange multiplier $\lambda_i$ and the rate $R_i$ is
  • Step 3: Update the coding parameters
  • the parameters in the above formula are updated automatically after a frame or a coding tree unit (CTU) has been coded.
  • the parameter update formula is
  • $D_i$ and $R_i$ are known once a frame or a CTU has been coded. The frame-level $\lambda_i$ is replaced by the Lagrange multiplier of the nearest coded frame at the same level in the same GOP, and the CTU-level $\lambda_i$ is replaced by the Lagrange multiplier of the corresponding CTU in the nearest coded frame at the same level as the current frame; these values are then used to update the parameters $c_i$ and $k_i$.
  • the R-Lambda rate control model in VVC can achieve high control accuracy, but it does not consider the pixel redundancy problem caused by the projection of panoramic video. At the same time, it does not use the temporal correlation between frames for rate-distortion optimization coding, and the coding quality has a lot of room for improvement.
  • the present invention provides a zero-delay panoramic video rate control method that takes into account the propagation of time domain distortion.
  • the method of the present invention is mainly to optimize the target bit allocation and global rate distortion optimization coding, including rate control at the coding tree unit (CTU) level and time domain global rate distortion optimization.
  • CTU-level rate control mainly includes optimizing target bit allocation and updating rate control parameters;
  • time domain global rate distortion optimization mainly uses the reconstruction error and motion compensation prediction error information of the encoded frame to estimate the time domain dependency of each CTU in the current encoded frame and adjust the rate distortion optimization parameters of the CTU accordingly.
  • in addition, because projecting the panoramic sphere onto a 2D plane (ERP projection) oversamples the two poles and increases pixel redundancy, the coding parameters are adjusted according to the ratio of the area change caused by projection.
  • a zero-delay panoramic video bit rate control method considering time domain distortion propagation includes the following steps:
  • step S5 calculating the global Lagrange multiplier according to the iterative algorithm and optimizing the bit allocation of each CTU in the frame, and proceeding to step S7;
  • step S9 determining whether the difference between the pixel variance value of the current frame and the previous frame is greater than a threshold value, if so, proceeding to step S10, otherwise proceeding to step S11;
  • step S10 adjusting the Lagrange multiplier of the current CTU according to the area stretching ratio obtained in step S2, and proceeding to step S13;
  • step S14 determine whether it is the last CTU, if so, then encode the current frame and go to step S15, otherwise, return to step S8;
  • the area of the stretched region is:
  • the area stretching ratio is:
  • step S3 is specifically as follows:
  • P and D represent the mean and variance respectively
  • Pi ,j represents the pixel value of each pixel
  • the resolution is n*m.
  • step S5 is specifically as follows:
  • R and r are the target bits at the frame level and the CTU level, respectively
  • $\lambda_g$ is the global Lagrange multiplier
  • $\alpha_i = c_i \cdot k_i$
  • $\lambda_i$ is the slope of the rate-distortion curve
  • $D_i$ and $R_i$ are the distortion and the coded bits of the i-th coding unit, respectively
  • M is the number of CTUs.
  • the Lagrange multiplier adjustment method in step S10 is:
  • $\lambda_P$ is the frame-level Lagrange multiplier
  • $\lambda_n$ is the adjusted Lagrange multiplier
  • the Lagrange multiplier adjustment method in step S12 is:
  • k is the temporal distortion influence factor.
  • the present invention can reduce the encoding complexity and make the bit rate control error very small without caching subsequent frames, can effectively shorten the encoding time and improve the video encoding quality under the condition of a given bandwidth.
  • FIG. 1 is a schematic flow chart of the method of the present invention.
  • the encoder selects a set of optimal coding parameters and the coding mode with the lowest rate distortion cost for the input video through rate distortion optimization (RDO) technology, with the goal of reducing coding distortion as much as possible under certain bit rate constraints or reducing coding bits as much as possible under certain coding distortion constraints.
  • the zero-delay method proposed in the present invention means that the encoder is not allowed to obtain information about subsequent frames in advance, that is, after obtaining the frame to be encoded, the analysis data must be encoded immediately without caching, and the specific implementation method is shown in Figure 1.
  • the initialization method is consistent with the VVC bit rate control method.
  • the target number of bits for each coding level is:
  • R tar is the target bit rate
  • FR is the frame rate
  • R pic is the average number of bits per frame in the entire sequence
  • $T_{gop}$, $T_{pic}$, and $T_{ctu}$ are the target numbers of bits allocated to the GOP, the frame, and the coding unit to be coded, respectively
  • SW is the sliding window used to smooth the bit allocation
  • $N_{coded}$, $R_{coded}$, $N_{gop}$, $\omega_{pic}$, and $\omega_{ctu}$ are the number of coded frames, the bits already consumed by the coded video sequence, the number of frames in a GOP, the weight of the frame, and the weight of the coding unit; the formulas also use the number of coded frames in the GOP, the sum of the weights of the uncoded frames, and the sum of the weights of the uncoded CTUs in a frame.
  • the method of the present invention also needs to calculate the latitude value corresponding to the CTU row and calculate the area stretching ratio of the sphere and its projection plane.
  • the specific method is:
  • the ratio decreases with increasing latitude.
  • the area ratio is 1, and there is no stretching.
  • the entropy of the equatorial region does not change before and after the projection, the closer to the pole, the greater the entropy change.
  • the optimal Lagrangian multiplier in the bit allocation formula is approximated by an iterative algorithm, and the bits are allocated using the optimal Lagrangian multiplier, specifically:
  • R and r are the target bits at the frame level and the CTU level, respectively
  • $\lambda_g$ is the global Lagrange multiplier
  • $\alpha_i = c_i \cdot k_i$
  • there are M CTUs in one frame.
  • the Lagrange multiplier can be adjusted by selecting the area stretch ratio.
  • the Lagrange multiplier is adjusted based on the time domain distortion influence factor and the area stretch ratio. Specifically, the new Lagrange multiplier ⁇ n is obtained by dividing the original Lagrange multiplier by the adjustment weight, thereby achieving time domain rate distortion optimization.
  • $\lambda_P$ is the frame-level Lagrange multiplier.
  • coding tree units with weak temporal dependence will be relatively poorly coded.
  • the condition for selecting the Lagrange multiplier adjustment method is to determine whether the difference between the current frame pixel variance value and the previous frame is greater than the threshold value of 50. Because the premise of temporal distortion propagation is the continuity of the video image, if the image scene switches, the propagation chain will be disconnected, so the change in the image variance value is used to determine whether the scene switches.
  • P and D represent the mean and variance respectively
  • Pi ,j represents the pixel value of each pixel
  • the resolution is n*m.
  • the Lagrange multipliers and QP are calculated as:
  • $T_{bpp} = T_{pic} / N_{pixels}$
  • $N_{pixels}$ is the number of pixels in a picture. The frame-level Lagrange parameter $\lambda$ of the current picture is then calculated.
  • MAD is the mean absolute difference of the pixels
  • BPP is the average number of target bits per pixel.
  • the initial values of the $\alpha$ and $\beta$ parameters are set empirically to 9.9416 and -1.367.
  • the bit rate control parameters and the actual number of bits used for the current frame and each CTU are updated, and the reconstruction error and motion compensation prediction error of each CTU saved in the current frame are obtained to calculate the temporal distortion impact factor of each CTU in the current frame.
  • the ratio of the reconstruction distortion of the coding tree unit and the motion compensation prediction error distortion is used to measure the temporal dependency in video encoding, that is,
  • the temporal dependency relates the distortion $D_{cur}$ of a coding block to the motion-compensated prediction error of that block.
  • for the current coding block, the distortion cannot be obtained before the block is actually coded.
  • to achieve zero delay, the information of the co-located coding tree unit in the previous frame is used to approximate the temporal dependency of the current block, because the image characteristics of two adjacent frames are generally similar.
  • the difference in the pixel mean and variance between the current frame and the previous frame is used to determine whether the image has scene switching, thereby deciding whether to use the distortion influence factor.
  • the parameter update formula is:
  • the distortion $D_i$ and the rate $R_i$ are known once a frame or a CTU has been coded. The frame-level $\lambda_i$ is replaced by the Lagrange multiplier of the nearest coded frame at the same level in the same GOP, and the CTU-level $\lambda_i$ is replaced by the Lagrange multiplier of the corresponding CTU in the nearest coded frame at the same level as the current frame; these values are then used to update the parameters $c_i$ and $k_i$.
  • the panoramic video rate control algorithm is integrated into the H.266/VVC reference software VTM14.0 based on 360lib.
  • the encoder is configured as Lowdelay-P.
  • the test videos are the standard panoramic video sequences recommended by the international coding standards organization JCT-VC, comprising all 14 sequences in the three categories 8K, 6K, and 4K.
  • the comparison indicators are BD-Rate, rate control accuracy and encoding time. When the BD-Rate is a negative value, it means that the rate is reduced at the same reconstruction quality, and the technology has a gain. When the BD-Rate is a positive value, it means that the rate is increased at the same reconstruction quality, and the technology has a loss.
  • the encoding quality of the VVC encoder at a given QP and the rate control algorithm of the VVC are used as the comparison benchmarks.
  • Table 1 shows the BD-Rate performance of this scheme against the rate control methods of VTM14.0 and 360lib. Compared with the rate control method of VTM, this method saves 8.7% of the bit rate.
  • Table 2 shows the comparison between this solution and the rate-distortion optimization algorithm of VTM14.0 in terms of BD-rate, with an average bit rate saving of about 4.3%.
  • Table 3 shows the size of the encoding rate control error of this scheme and the VTM14.0 and 360lib rate control schemes.
  • the rate control error is calculated by dividing the absolute difference between the actual output bit rate and the target bit rate by the target bit rate. The larger the deviation, the higher the error and the lower the accuracy of the rate control. This scheme achieves an extremely low rate control error of 0.0891%.
  • Table 4 shows the performance of this scheme in terms of encoding time.
  • the total encoding time for all sequences is reduced from 3972 hours to 3963 hours.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

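The present invention belongs to the technical field of image processing and specifically relates to a zero-delay panoramic video rate control method that considers temporal distortion propagation. The method mainly optimizes target bit allocation and global rate-distortion optimized coding, and comprises rate control at the coding tree unit (CTU) level and temporal global rate-distortion optimization. CTU-level rate control mainly comprises optimized target bit allocation and updating of the rate control parameters. Temporal global rate-distortion optimization mainly uses the reconstruction error and the motion-compensated prediction error of already coded frames to estimate the temporal dependency of each CTU in the current frame and adjusts the rate-distortion optimization parameters of the CTU accordingly. In addition, because projecting the panoramic sphere onto a 2D plane (ERP projection) oversamples the two poles and increases pixel redundancy, the coding parameters are adjusted according to the ratio of the area change caused by projection. The invention can effectively shorten the encoding time and improve video coding quality under a given bandwidth.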

Description

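Zero-delay panoramic video rate control method considering temporal distortion propagation
Technical Field
The present invention belongs to the technical field of image processing and specifically relates to a zero-delay panoramic video rate control method that considers temporal distortion propagation.
Background
Panoramic video is video captured in all directions over 360 degrees with a camera array or a 3D camera. While watching, the user can freely change the viewing direction up, down, left, and right. Panoramic video is characterized by a high frame rate, high resolution (at least 4K), and a wide field of view, so storing and transmitting it consumes considerable resources. Rate control allocates bits level by level according to the target bit rate and dynamically adjusts the encoder's coding parameters according to a given coding strategy, so that the output bitstream satisfies the bandwidth limit of the transmission channel and the capacity of the storage device, and the actual output bit rate matches the target bit rate as closely as possible. It is widely used in real-time video communication systems.
Conventional panoramic video coding has the following problems. First, projection produces pixel redundancy, which degrades coding performance. Second, conventional encoders do not exploit the temporal correlation between inter-frame coding units, so coding performance still has considerable room for improvement. The main reason conventional encoders ignore the inter-frame temporal distortion influence factor is that computing it is too complex for real-time communication.
To address these problems, the present invention proposes a zero-delay panoramic video rate control method that considers temporal distortion propagation. The invention first adjusts the coding parameters according to the ratio between the spherical area of the panoramic video and its projected area, which reduces the coding performance loss caused by pixel redundancy. It also determines whether a scene change has occurred from the change in the mean and variance of the frame pixels. When no scene change has occurred, it relies on the continuity of video images: the reconstruction error and the motion-compensated prediction error of the previous coded frame are used to compute a temporal distortion influence factor, and this factor is used to code the current frame. The invention does not need to buffer uncoded frames and computes the factor from information of already coded frames, so the amount of computation is small, the coding complexity is extremely low, and rate-distortion performance improves substantially; the invention is therefore easy to deploy.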
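Current rate control methods include:
The H.266/VVC rate control method, whose main steps are:
Step 1: Bit allocation
VTM uses a hierarchical bit allocation structure and allocates bits at the GOP level, the frame level, and the CTU level; the lower the level, the more accurate the allocation and the easier it is to correct errors. VTM distributes the target bit rate evenly at each level. To gradually eliminate the rate control error produced by already coded units, it smooths bit rate fluctuations with a sliding window so that large fluctuations do not degrade video quality. Hierarchical bit allocation is given by the following formulas:



where $R_{tar}$ is the target bit rate, FR is the frame rate, $R_{pic}$ is the average number of bits per frame over the entire sequence, $T_{gop}$, $T_{pic}$, and $T_{ctu}$ are the target numbers of bits allocated to the GOP, the frame, and the coding unit to be coded, SW is the sliding window used to smooth the bit allocation, and $N_{coded}$, $R_{coded}$, $N_{gop}$, $\omega_{pic}$, and $\omega_{ctu}$ are the number of coded frames, the bits already consumed by the coded video sequence, the number of frames in a GOP, the weight of the frame, and the weight of the coding unit. The formulas also use the number of coded frames in the GOP, the sum of the weights of the uncoded frames, and the sum of the weights of the uncoded CTUs in the frame.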
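The three levels of allocation described above can be sketched as follows. The formulas themselves are not reproduced in this text, so the sketch assumes the commonly used R-λ form (GOP budget from a sliding window, then weight-proportional shares of whatever budget is left); the function names are hypothetical and are not taken from VTM.

```python
def average_bits_per_frame(target_bitrate, frame_rate):
    """R_pic = R_tar / FR: average bit budget of one frame."""
    return target_bitrate / frame_rate


def gop_target_bits(r_pic, n_coded, r_coded, n_gop, sw):
    """Assumed form: T_gop = (R_pic * (N_coded + SW) - R_coded) / SW * N_gop.

    Any surplus or deficit accumulated so far is spread over the next
    SW frames instead of being charged to a single GOP.
    """
    return (r_pic * (n_coded + sw) - r_coded) / sw * n_gop


def weighted_share(budget_left, weight, remaining_weight_sum):
    """Share of the remaining budget for one frame (or one CTU).

    budget_left is the GOP (or frame) target minus the bits already
    spent inside it; remaining_weight_sum covers the uncoded units.
    """
    return budget_left * weight / remaining_weight_sum


r_pic = average_bits_per_frame(3_000_000, 30)       # bits per frame
t_gop = gop_target_bits(r_pic, n_coded=0, r_coded=0, n_gop=4, sw=40)
t_pic = weighted_share(t_gop, weight=1, remaining_weight_sum=4)
```

The same `weighted_share` helper serves both the frame level and the CTU level, since both levels split a remaining budget in proportion to weights.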
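Step 2: Compute the Lagrange multiplier according to the R-λ model
The Lagrange multiplier $\lambda_i$ is the slope of the rate-distortion curve, $D_i$ and $R_i$ are the distortion and the coded bits of the i-th coding unit, and the relationship between the Lagrange multiplier $\lambda_i$ and the rate $R_i$ is
Step 3: Update the coding parameters
The parameters in the above formula are updated automatically after a frame or a coding tree unit (CTU) has been coded. The parameter update formulas are
and
where $D_i$ and $R_i$ are known once a frame or a CTU has been coded. The frame-level $\lambda_i$ is replaced by the Lagrange multiplier of the nearest coded frame at the same level in the same GOP, and the CTU-level $\lambda_i$ is replaced by the Lagrange multiplier of the corresponding CTU in the nearest coded frame at the same level as the current frame; these values are then used to update the parameters $c_i$ and $k_i$.
Step 4: Compute the quantization parameter QP used during coding
$QP_i = 4.2005 \cdot \ln\lambda_i + 13.7122$
It can be seen that, in the rate control algorithm, the quantization parameter is fitted as a linear function of the logarithm of the Lagrange multiplier.
The problem with the conventional rate control method above is that the R-Lambda rate control model in VVC achieves high control accuracy but does not consider the pixel redundancy that projection introduces in panoramic video. It also does not use inter-frame temporal correlation for rate-distortion optimized coding, so coding quality has considerable room for improvement.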
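Summary of the Invention
To address the above problems, the present invention provides a zero-delay panoramic video rate control method that considers temporal distortion propagation. The method mainly optimizes target bit allocation and global rate-distortion optimized coding, and comprises rate control at the coding tree unit (CTU) level and temporal global rate-distortion optimization. CTU-level rate control mainly comprises optimized target bit allocation and updating of the rate control parameters. Temporal global rate-distortion optimization mainly uses the reconstruction error and the motion-compensated prediction error of already coded frames to estimate the temporal dependency of each CTU in the current frame and adjusts the rate-distortion optimization parameters of the CTU accordingly. In addition, because projecting the panoramic sphere onto a 2D plane (ERP projection) oversamples the two poles and increases pixel redundancy, the coding parameters are adjusted according to the ratio of the area change caused by projection.
The technical solution of the present invention is:
A zero-delay panoramic video rate control method considering temporal distortion propagation, comprising the following steps:
S1. Input the target bit rate and the sequence to be coded into the encoder;
S2. Determine whether the current frame is the first frame; if so, perform the following:
compute the latitude corresponding to each CTU row and compute the area stretching ratio between the sphere and its projection plane;
initialize the parameters of the rate control unit;
compute the target number of bits of the current frame;
compute the frame-level Lagrange multiplier of the current frame from the target number of bits;
compute the frame-level QP from the frame-level Lagrange multiplier and intra code the frame;
finish coding the current frame and repeat step S2;
otherwise, go to step S3;
S3. Compute the mean and the variance of the frame pixels;
S4. Determine whether the current frame number is greater than twice the GOP size; if so, go to S5; otherwise go to S6;
S5. Compute the global Lagrange multiplier with an iterative algorithm, optimize the bit allocation of each CTU in the frame, and go to step S7;
S6. Allocate bits to each CTU in the frame with the encoder's built-in bit allocation algorithm;
S7. Compute the frame-level Lagrange multiplier and QP;
S8. Code the CTUs in order;
S9. Determine whether the difference between the pixel variance of the current frame and that of the previous frame is greater than a threshold; if so, go to step S10; otherwise go to step S11;
S10. Adjust the Lagrange multiplier of the current CTU according to the area stretching ratio obtained in step S2, and go to step S13;
S11. Determine whether the current frame number is greater than 3; if so, go to S13; otherwise go to S10;
S12. Adjust the Lagrange multiplier of each CTU according to the temporal distortion influence factor of the previous frame and the area ratio obtained in S2;
S13. Compute the quantization parameter QP of the current CTU from the Lagrange multiplier of the current CTU computed in S10 or S12, and code the CTU;
S14. Determine whether this is the last CTU; if so, finish coding the current frame and go to step S15; otherwise return to step S8;
S15. Update the rate control parameters and the number of bits actually used for the current frame and each CTU;
S16. Obtain the reconstruction error and the motion-compensated prediction error stored for each CTU of the current frame, and compute the temporal distortion influence factor of each CTU in the current frame;
S17. Determine whether this is the last frame; if so, go to S18; otherwise go to S2;
S18. Coding of the current panoramic video sequence is complete.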
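The branch taken in steps S9 to S12 can be sketched as a small decision function. Two points are assumptions rather than statements from the text: the variance difference is taken as an absolute value, and the "yes" branch of S11 is routed to S12, because S13 refers to a multiplier computed in S10 or S12 and S12 would otherwise never be reached. The default threshold of 50 is the value given later in the description; the function name is hypothetical.

```python
def choose_lambda_adjustment(frame_number, variance_now, variance_prev,
                             threshold=50.0, first_temporal_frame=3):
    """Pick the Lagrange multiplier adjustment for the CTUs of a frame.

    Returns "area_only" (step S10) or "area_and_temporal" (step S12).
    """
    # S9: a large jump in pixel variance is treated as a scene change,
    # which breaks the temporal distortion propagation chain.
    scene_changed = abs(variance_now - variance_prev) > threshold
    if scene_changed:
        return "area_only"              # S10
    # S11: a temporal factor only exists once enough frames are coded.
    if frame_number > first_temporal_frame:
        return "area_and_temporal"      # S12 (assumed target of S11)
    return "area_only"                  # S10


mode = choose_lambda_adjustment(10, variance_now=500.0, variance_prev=480.0)
```

Because the decision uses only statistics of the current and the previous frame, it needs no look-ahead, which matches the zero-delay constraint.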
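Further, in step S2, let the radius of the panoramic video sphere be r; the area of the spherical ring region at latitude θ is:
$S_s(\theta) = 2\pi \cdot r^2 \cdot \cos\theta \cdot \sin d\theta$
After projection onto the 2D plane, the area of the stretched region is:
The area stretching ratio is then:
Further, step S3 is specifically:

where P and D denote the mean and the variance, respectively, $P_{i,j}$ denotes the value of each pixel, and the resolution is n*m.
Further, step S5 is specifically:
where R and r are the frame-level and CTU-level target bits, respectively, $\lambda_g$ is the global Lagrange multiplier, $\alpha_i = c_i \cdot k_i$, $\lambda_i$ is the slope of the rate-distortion curve, $D_i$ and $R_i$ are the distortion and the coded bits of the i-th coding unit, and M is the number of CTUs.
Further, the Lagrange multiplier adjustment method in step S10 is:
where $\lambda_P$ is the frame-level Lagrange multiplier and $\lambda_n$ is the adjusted Lagrange multiplier.
Further, the Lagrange multiplier adjustment method in step S12 is:
where k is the temporal distortion influence factor.
The beneficial effects of the present invention are as follows: the invention lowers coding complexity and requires no buffering of subsequent frames, while keeping the rate control error very small; it can effectively shorten the encoding time and improve video coding quality under a given bandwidth.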
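Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the method of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawing.
In video rate control, the encoder uses rate-distortion optimization (RDO) to select, for the input video, a set of optimal coding parameters and the coding mode with the lowest rate-distortion cost. The goal is to reduce coding distortion as much as possible under a given bit rate constraint, or to reduce coded bits as much as possible under a given distortion constraint. The zero-delay method proposed in the present invention means that the encoder is not allowed to obtain information about subsequent frames in advance: once the frame to be coded is available, it must be coded immediately, without buffering frames for analysis. The specific implementation is shown in FIG. 1.
For the first input frame, the encoder has not yet obtained a temporal distortion influence factor, so its internal parameters must first be initialized from the given target bit rate. Initialization is the same as in the VVC rate control method, and the target number of bits at each coding level is:



where $R_{tar}$ is the target bit rate, FR is the frame rate, $R_{pic}$ is the average number of bits per frame over the entire sequence, $T_{gop}$, $T_{pic}$, and $T_{ctu}$ are the target numbers of bits allocated to the GOP, the frame, and the coding unit to be coded, SW is the sliding window used to smooth the bit allocation, and $N_{coded}$, $R_{coded}$, $N_{gop}$, $\omega_{pic}$, and $\omega_{ctu}$ are the number of coded frames, the bits already consumed by the coded video sequence, the number of frames in a GOP, the weight of the frame, and the weight of the coding unit. The formulas also use the number of coded frames in the GOP, the sum of the weights of the uncoded frames, and the sum of the weights of the uncoded CTUs in the frame.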
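Unlike the conventional method, the method of the present invention also computes the latitude corresponding to each CTU row and the area stretching ratio between the sphere and its projection plane. Specifically:
Let r be the radius of the sphere; the area of the spherical ring region at latitude θ is
$S_s(\theta) = 2\pi \cdot r^2 \cdot \cos\theta \cdot \sin d\theta$
On the projected 2D plane, the area of the corresponding stretched region is
The ratio of the areas before and after projection is
It can be seen that this ratio decreases as the latitude increases. In the equatorial region, where the latitude θ is 0, the area ratio is 1 and there is no stretching. Assuming that the entropy of the equatorial region is unchanged by the projection, the entropy change grows the closer a region is to a pole.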
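A per-row stretching ratio can be sketched as below. The projected-area and ratio formulas are not reproduced in this text, so the sketch assumes the ratio reduces to cos θ, which agrees with the stated behaviour (1 at the equator, decreasing toward the poles). Taking the latitude of a CTU row at its vertical centre in an ERP frame is also an assumption, and the function names are hypothetical.

```python
import math


def ctu_row_latitudes(frame_height, ctu_size):
    """Latitude (radians) of the centre of each CTU row in an ERP frame.

    The top edge of the frame maps to +pi/2 and the bottom edge to -pi/2.
    """
    rows = math.ceil(frame_height / ctu_size)
    latitudes = []
    for row in range(rows):
        top = row * ctu_size
        bottom = min(top + ctu_size, frame_height)  # last row may be partial
        centre = (top + bottom) / 2.0
        latitudes.append((0.5 - centre / frame_height) * math.pi)
    return latitudes


def area_stretch_ratio(latitude):
    """Assumed ratio of spherical ring area to projected area: cos(theta)."""
    return math.cos(latitude)


ratios = [area_stretch_ratio(lat) for lat in ctu_row_latitudes(2048, 128)]
```

The ratios only depend on the frame geometry, so they can be computed once for the first frame and reused for the whole sequence, as step S2 does.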
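Starting from the second group of pictures (GOP), an iterative algorithm approximates the optimal Lagrange multiplier in the bit allocation formula, and bits are allocated with this optimal multiplier. Specifically:
where R and r are the frame-level and CTU-level target bits, respectively, $\lambda_g$ is the global Lagrange multiplier, $\alpha_i = c_i \cdot k_i$, and a frame contains M CTUs in total.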
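One way to realise such an iteration is a bisection on the global multiplier. The allocation formula is not reproduced in this text, so the sketch assumes a hyperbolic model per CTU, $\lambda = c_i k_i \cdot r_i^{-(k_i+1)}$, which is consistent with $\alpha_i = c_i \cdot k_i$ above; bisection itself is a stand-in for whatever iterative algorithm the method actually uses, and the function names are hypothetical.

```python
def ctu_bits(lam, c, k):
    """Bits of one CTU at multiplier lam, inverting lam = c*k*r**(-(k+1))."""
    return (c * k / lam) ** (1.0 / (k + 1.0))


def solve_global_lambda(frame_bits, params, iterations=60):
    """Find lambda_g such that the CTU bits sum to the frame target.

    params is a list of (c_i, k_i) pairs, one per CTU. The total number
    of bits decreases monotonically with lambda, so bisection converges.
    """
    low, high = 1e-9, 1e9
    for _ in range(iterations):
        mid = (low * high) ** 0.5           # bisect on a log scale
        total = sum(ctu_bits(mid, c, k) for c, k in params)
        if total > frame_bits:
            low = mid                       # too many bits: raise lambda
        else:
            high = mid
    lam = (low * high) ** 0.5
    return lam, [ctu_bits(lam, c, k) for c, k in params]


lam_g, bits = solve_global_lambda(5.0, [(4.0, 1.0), (9.0, 1.0)])
```

With a shared multiplier, CTUs whose distortion falls faster per bit receive more of the frame budget, which is the point of optimizing the allocation rather than splitting it by fixed weights.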
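For the second frame and later, the Lagrange multiplier can be adjusted based on the area stretching ratio. For the third frame and later, the temporal distortion influence factor is already available, so the Lagrange multiplier is adjusted based on both the temporal distortion influence factor and the area stretching ratio. Specifically, the new Lagrange multiplier $\lambda_n$ is obtained by dividing the original Lagrange multiplier by the adjustment weight, which realizes temporal rate-distortion optimization.
where $\lambda_P$ is the frame-level Lagrange multiplier. Clearly, the larger the latitude θ, the smaller k(θ) and the larger the adjusted Lagrange multiplier, and vice versa. Likewise, a coding tree unit with strong temporal dependency has a larger k, so its adjusted Lagrange multiplier is smaller; its distortion is therefore reduced, which helps subsequent frames reach higher rate-distortion performance. Conversely, coding tree units with weak temporal dependency are coded at relatively lower quality.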
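The division by an adjustment weight can be sketched as below. The adjustment formulas are not reproduced in this text, so two things are assumptions: the weight is taken as the product of the area stretching ratio k(θ) and a temporal weight, and the mapping from the temporal factor k to that temporal weight is left to the caller. The function name is hypothetical.

```python
def adjust_lambda(frame_lambda, area_ratio, temporal_weight=1.0):
    """Return lambda_n = lambda_P / weight.

    area_ratio is k(theta) for the CTU row. temporal_weight is a weight
    derived from the temporal distortion influence factor; leaving it at
    1.0 gives the area-only adjustment of step S10.
    """
    weight = area_ratio * temporal_weight   # assumed combination
    if weight <= 0.0:
        raise ValueError("adjustment weight must be positive")
    return frame_lambda / weight


polar_lambda = adjust_lambda(100.0, area_ratio=0.5)            # S10 case
dependent_lambda = adjust_lambda(100.0, 0.5, temporal_weight=2.0)
```

The sketch reproduces the two trends stated in the text: a smaller k(θ) near the poles raises the multiplier, and a larger temporal weight lowers it.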
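The Lagrange multiplier adjustment method is selected by checking whether the difference between the pixel variance of the current frame and that of the previous frame is greater than the threshold of 50. Temporal distortion propagation presupposes continuity of the video images: if a scene cut occurs, the propagation chain is broken, so the change in image variance is used to determine whether the scene has changed.

where P and D denote the mean and the variance, respectively, $P_{i,j}$ denotes the value of each pixel, and the resolution is n*m.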
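The frame statistics can be computed as in the sketch below. The formula is not reproduced in this text, so the sketch assumes the usual population mean and variance over all n*m pixels; the function name is hypothetical.

```python
def frame_mean_and_variance(pixels):
    """Mean P and variance D of a frame given as n rows of m pixel values."""
    n = len(pixels)
    m = len(pixels[0])
    count = n * m
    mean = sum(sum(row) for row in pixels) / count
    # Population variance: average squared deviation from the mean.
    variance = sum((p - mean) ** 2 for row in pixels for p in row) / count
    return mean, variance


mean, variance = frame_mean_and_variance([[1, 2], [3, 4]])
```

Both values come from the frame that is about to be coded and the one just coded, so the scene-change check adds no delay.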
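The Lagrange multiplier and QP are computed as follows:
First, compute the average target number of bits per pixel of the current frame ($T_{bpp}$):
$T_{bpp} = T_{pic} / N_{pixels}$
$N_{pixels}$ is the number of pixels in a picture. The frame-level Lagrange parameter $\lambda$ of the current picture is then computed.
For I frames: a rate control method based on intra complexity is used. Based on experiments, the intra complexity is defined, and it is related to the Lagrange multiplier as follows:
where MAD is the mean absolute difference of the pixels and BPP is the average number of target bits per pixel. The initial values of the parameters $\alpha$ and $\beta$ are set empirically to 9.9416 and -1.367.
For P frames: $\lambda = \alpha R^{\beta}$, with the initial value of $\alpha$ set to 1058 and the value of $\beta$ set to -1.327.
The corresponding frame-level QP is computed from the following relation:
$QP_i = 4.2005 \cdot \ln\lambda_i + 13.7122$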
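The P-frame path from target bits to QP can be sketched as below, using the constants given in the text. Reading R in $\lambda = \alpha R^{\beta}$ as the bits per pixel $T_{bpp}$ is an assumption, the result is left as an unrounded, unclipped value, and the function names are hypothetical.

```python
import math


def p_frame_lambda(target_bits, num_pixels, alpha=1058.0, beta=-1.327):
    """Frame-level lambda for a P frame: lambda = alpha * bpp ** beta."""
    bpp = target_bits / num_pixels          # T_bpp = T_pic / N_pixels
    return alpha * bpp ** beta


def lambda_to_qp(lam):
    """QP = 4.2005 * ln(lambda) + 13.7122 (before rounding or clipping)."""
    return 4.2005 * math.log(lam) + 13.7122


lam = p_frame_lambda(target_bits=200_000, num_pixels=3840 * 1920)
qp = lambda_to_qp(lam)
```

Since beta is negative, a larger bit budget per pixel yields a smaller lambda and therefore a lower QP.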
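After a frame has been coded, the rate control parameters and the number of bits actually used are updated for the current frame and each CTU, and the reconstruction error and the motion-compensated prediction error stored for each CTU of the current frame are used to compute the temporal distortion influence factor of each CTU in the current frame. Specifically, the ratio of the reconstruction distortion of a coding tree unit to its motion-compensated prediction error distortion is used to measure temporal dependency in video coding, that is,
The temporal dependency relates the distortion $D_{cur}$ of a coding block to the motion-compensated prediction error of that block. For the current coding block, the distortion cannot be obtained before the block is actually coded. To achieve zero delay, the information of the co-located coding tree unit in the previous frame is used to approximate the temporal dependency of the current block. This works because the image characteristics of two adjacent frames are generally similar; in addition, the differences in pixel mean and variance between the current frame and the previous frame are used to determine whether a scene change has occurred and therefore whether to use this distortion influence factor.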
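The per-CTU factor can be sketched as the plain ratio described above. The exact formula is not reproduced in this text, so using the unmodified ratio is an assumption, as is the small epsilon that guards against a zero prediction error; the function name is hypothetical.

```python
def temporal_distortion_factors(recon_errors, mcp_errors, eps=1e-9):
    """Ratio of reconstruction distortion to MCP error distortion per CTU.

    Both lists are indexed by CTU position in the frame just coded; the
    result is reused for the co-located CTUs of the next frame.
    """
    return [d_rec / max(d_mcp, eps)
            for d_rec, d_mcp in zip(recon_errors, mcp_errors)]


factors = temporal_distortion_factors([2.0, 1.0], [4.0, 4.0])
```

Only values from the frame that has just been coded are needed, which is what lets the method avoid buffering future frames.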
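The parameters are updated automatically after a frame or a coding tree unit (CTU) has been coded. The parameter update formulas are
and
where the distortion $D_i$ and the rate $R_i$ are known once a frame or a CTU has been coded. The frame-level $\lambda_i$ is replaced by the Lagrange multiplier of the nearest coded frame at the same level in the same GOP, and the CTU-level $\lambda_i$ is replaced by the Lagrange multiplier of the corresponding CTU in the nearest coded frame at the same level as the current frame; these values are then used to update the parameters $c_i$ and $k_i$.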
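Experimental results are used below to demonstrate the effectiveness of the proposed scheme. The panoramic video rate control algorithm is integrated into VTM14.0, the H.266/VVC reference software, with 360lib, and the encoder is configured as Lowdelay-P. The test videos are the standard panoramic video sequences recommended by the international coding standards organization JCT-VC, comprising all 14 sequences in the three categories 8K, 6K, and 4K. The comparison metrics are BD-Rate, rate control accuracy, and encoding time. A negative BD-Rate means that the bit rate is reduced at the same reconstruction quality, so the technique provides a gain; a positive BD-Rate means that the bit rate is increased at the same reconstruction quality, so the technique incurs a loss. The baselines are the coding quality of the VVC encoder at a given QP and the rate control algorithm of VVC, respectively.
Table 1 shows the BD-Rate performance of this scheme against the rate control methods of VTM14.0 and 360lib. Compared with the rate control method of VTM, this method saves 8.7% of the bit rate.
Table 1. Coding BD-rate comparison with the VTM14.0 rate control algorithm
Table 2 compares this scheme with the rate-distortion optimization algorithm of VTM14.0 in terms of BD-rate; the average bit rate saving is about 4.3%.
Table 2. BD-rate comparison with the VTM14.0 rate-distortion optimization algorithm
Table 3 shows the rate control error of this scheme and of the VTM14.0 and 360lib rate control schemes. The rate control error is calculated as the absolute difference between the actual output bit rate and the target bit rate, divided by the target bit rate; the larger the deviation, the higher the error and the lower the accuracy of the rate control. This scheme achieves an extremely low rate control error of 0.0891%.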
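The error metric defined above is a one-line computation; the sketch below expresses it as a percentage so that it is comparable with the 0.0891% figure. The function name and the sample bit rates are hypothetical.

```python
def rate_control_error(actual_bitrate, target_bitrate):
    """Rate control error in percent: |actual - target| / target * 100."""
    return abs(actual_bitrate - target_bitrate) / target_bitrate * 100.0


error_percent = rate_control_error(actual_bitrate=1010.0, target_bitrate=1000.0)
```

The metric is symmetric, so overshooting and undershooting the target by the same amount count equally.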
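Table 3. Rate control error comparison


Table 4 shows the performance of this scheme in terms of encoding time. The total encoding time over all sequences drops from 3972 hours to 3963 hours.
Table 4. Encoding time comparison
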

Claims (5)

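  1. A zero-delay panoramic video rate control method considering temporal distortion propagation, characterized by comprising the following steps:
    S1. Input the target bit rate and the sequence to be coded into the encoder;
    S2. Determine whether the current frame is the first frame; if so, perform the following:
    compute the latitude corresponding to each CTU row and compute the area stretching ratio between the sphere and its projection plane;
    initialize the parameters of the rate control unit;
    compute the target number of bits of the current frame;
    compute the frame-level Lagrange multiplier of the current frame from the target number of bits;
    compute the frame-level QP from the frame-level Lagrange multiplier and intra code the frame;
    finish coding the current frame and repeat step S2;
    otherwise, go to step S3;
    S3. Compute the mean and the variance of the frame pixels;
    S4. Determine whether the current frame number is greater than twice the GOP size; if so, go to S5; otherwise go to S6;
    S5. Compute the global Lagrange multiplier with an iterative algorithm, optimize the bit allocation of each CTU in the frame, and go to step S7;
    S6. Allocate bits to each CTU in the frame with the encoder's built-in bit allocation algorithm;
    S7. Compute the frame-level Lagrange multiplier and QP;
    S8. Code the CTUs in order;
    S9. Determine whether the difference between the pixel variance of the current frame and that of the previous frame is greater than a threshold; if so, go to step S10; otherwise go to step S11;
    S10. Adjust the Lagrange multiplier of the current CTU according to the area stretching ratio obtained in step S2, and go to step S13;
    S11. Determine whether the current frame number is greater than 3; if so, go to S13; otherwise go to S10;
    S12. Adjust the Lagrange multiplier of each CTU according to the temporal distortion influence factor of the previous frame and the area ratio obtained in S2;
    S13. Compute the quantization parameter QP of the current CTU from the Lagrange multiplier of the current CTU computed in S10 or S12, and code the CTU;
    S14. Determine whether this is the last CTU; if so, finish coding the current frame and go to step S15; otherwise return to step S8;
    S15. Update the rate control parameters and the number of bits actually used for the current frame and each CTU;
    S16. Obtain the reconstruction error and the motion-compensated prediction error stored for each CTU of the current frame, and compute the temporal distortion influence factor of each CTU in the current frame;
    S17. Determine whether this is the last frame; if so, go to S18; otherwise go to S2;
    S18. Coding of the current panoramic video sequence is complete.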
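  2. The zero-delay panoramic video rate control method considering temporal distortion propagation according to claim 1, characterized in that, in step S2, the area stretching ratio is computed as follows: let the radius of the panoramic video sphere be r; the area of the spherical ring region at latitude θ is:
    $S_s(\theta) = 2\pi \cdot r^2 \cdot \cos\theta \cdot \sin d\theta$
    After projection onto the 2D plane, the area of the stretched region is:
    The area stretching ratio is then:
  3. The zero-delay panoramic video rate control method considering temporal distortion propagation according to claim 2, characterized in that step S5 is specifically:
    where R and r are the frame-level and CTU-level target bits, respectively, $\lambda_g$ is the global Lagrange multiplier, $\alpha_i = c_i \cdot k_i$, $\lambda_i$ is the slope of the rate-distortion curve, $D_i$ and $R_i$ are the distortion and the coded bits of the i-th coding unit, and M is the number of CTUs.
  4. The zero-delay panoramic video rate control method considering temporal distortion propagation according to claim 3, characterized in that the Lagrange multiplier adjustment method in step S10 is:
    where $\lambda_P$ is the frame-level Lagrange multiplier and $\lambda_n$ is the adjusted Lagrange multiplier.
  5. The zero-delay panoramic video rate control method considering temporal distortion propagation according to claim 3, characterized in that the Lagrange multiplier adjustment method in step S12 is:
    where k is the temporal distortion influence factor.
PCT/CN2023/087513 2022-10-18 2023-04-11 Zero-delay panoramic video rate control method considering temporal distortion propagation WO2024082579A1 (zh)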

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211273536.1A CN115695799A (zh) 2022-10-18 2022-10-18 Zero-delay panoramic video rate control method considering temporal distortion propagation
CN202211273536.1 2022-10-18

Publications (1)

Publication Number Publication Date
WO2024082579A1 true WO2024082579A1 (zh) 2024-04-25

Family

ID=85066861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/087513 WO2024082579A1 (zh) 2022-10-18 2023-04-11 Zero-delay panoramic video rate control method considering temporal distortion propagation

Country Status (2)

Country Link
CN (1) CN115695799A (zh)
WO (1) WO2024082579A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
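CN115695799A (zh) * 2022-10-18 2023-02-03 电子科技大学 Zero-delay panoramic video rate control method considering temporal distortion propagation
CN116723330B (zh) * 2023-03-28 2024-02-23 成都师范学院 Panoramic video coding method with adaptive spherical-domain distortion propagation chain length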

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
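WO2020172813A1 (zh) * 2019-02-27 2020-09-03 Oppo广东移动通信有限公司 Rate-distortion optimization method and apparatus, and computer-readable storage medium
CN113489981A (zh) * 2021-07-06 2021-10-08 电子科技大学 Zero-delay rate control method considering temporal rate-distortion optimization
CN115022638A (zh) * 2022-06-30 2022-09-06 电子科技大学 Rate-distortion optimization method for panoramic video coding
CN115695799A (zh) * 2022-10-18 2023-02-03 电子科技大学 Zero-delay panoramic video rate control method considering temporal distortion propagation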

Also Published As

Publication number Publication date
CN115695799A (zh) 2023-02-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878579

Country of ref document: EP

Kind code of ref document: A1