WO2021000245A1

WO2021000245A1 - Constant rate factor control for adaptive resolution video coding

Info

Publication number: WO2021000245A1
Application number: PCT/CN2019/094318
Authority: WO
Inventors: Tsuishan CHANG; Yuchen SUN; Xuguang ZUO; Ling Zhu; Jian Lou
Original assignee: Alibaba Group Holding Limited
Priority date: 2019-07-02
Filing date: 2019-07-02
Publication date: 2021-01-07

Abstract

Systems and methods are provided for implementing resolution-adaptive video coding supported by a Constant Rate Factor (CRF) mode of a video encoder, adapting the encoder to generate and calculate data that arises as a result of resolution changes between frames, and improving evaluation of complexity of video frames when resolutions between frames are different. The methods and systems described herein provide a video encoder which calculates frame complexity by modulating sums of absolute differences for each frame based on differences in resolutions between frames; calculates frame complexity by taking varying numbers of macroblocks between frames into account; calculates and stores different sets of frame complexity parameters for different resolutions; and calculates a sum of absolute transformed differences for a frame by equalizing current frame and reference frame resolution.

Description

CONSTANT RATE FACTOR CONTROL FOR ADAPTIVE RESOLUTION VIDEO CODING

BACKGROUND

In conventional video coding formats, such as the H.264/AVC (Advanced Video Coding) and H.265/HEVC (High Efficiency Video Coding) standards, video frames in a sequence have their size and resolution recorded at the sequence level in a header. Thus, in order to change frame resolution, a new video sequence must be generated, starting with an intra-coded frame, which carries significantly larger bandwidth costs to transmit than inter-coded frames. Consequently, although it is desirable to adaptively transmit a down-sampled, low-resolution video over a network when network bandwidth becomes low, reduced or throttled, it is difficult to realize bandwidth savings while using conventional video coding formats, because the bandwidth costs of adaptively down-sampling offset the bandwidth gains.

Research has been conducted into supporting resolution changing while transmitting coded frames. However, a video encoder coding frames implements a number of processes and algorithms, each of which may be impacted by resolution changes. For example, a video encoder may implement a rate control algorithm.

According to both the H.264/AVC and H.265/HEVC standards, as well as the VP9 standard, rate control methods may result in varying bitrates for frames of a video sequence. It is desirable for a rate control method of an encoder to result in higher bitrates for frames having low complexity, i.e., fewer details and motions, and lower bitrates for frames having high complexity, i.e., more details and motions, or complex textures and fast motions. This may improve perceptible visual quality in accordance with the tendency of human viewers of a video to more readily perceive visual artefacts and loss of picture information resulting from bitrates being low over frames of a video having relatively flat and/or static regions.

The current implementations of encoder rate control methods do not take resolution changes into account, and so new techniques are required in order to cause encoders implementing rate control to behave correctly when frames of a sequence have different resolutions, and a current frame may have a resolution different from a reference frame.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.

FIG. 1 illustrates an example block diagram of a quantization step calculation process performed by a video encoder implementing a Constant Rate Factor rate control mode according to an example embodiment of the present disclosure.

FIG. 2A illustrates a DC intra-frame motion prediction mode of a video encoder, FIG. 2B illustrates a horizontal intra-frame motion prediction mode, and FIG. 2C illustrates a vertical intra-frame motion prediction mode according to example embodiments of the present disclosure.

FIG. 3 illustrates a sequence of frames wherein two chronologically separate sets of frames are coded using a first parameter set, and another set of frames is coded using a second parameter set.

FIG. 4 illustrates an example system for implementing the processes and methods described herein for implementing support for resolution-adaptive video coding in Constant Rate Factor rate control modes in video encoders.

DETAILED DESCRIPTION

Systems and methods discussed herein are directed to support adaptive resolution change in a video encoder, and more specifically to implement rate control methods which improve adaptive resolution change and take adaptive resolution change into account in computing Constant Rate Factor parameters.

According to example embodiments of the present disclosure implemented to be compatible with AVC, HEVC, VP9, and such video coding standards implementing rate control, a frame may be subdivided into macroblocks (MBs) each having dimensions of 16x16 pixels, which may be further subdivided into partitions. According to example embodiments of the present disclosure implemented to be compatible with the HEVC standard, a frame may be subdivided into coding tree units (CTUs), the luma and chroma components of which may be further subdivided into coding tree blocks (CTBs) which are further subdivided into coding units (CUs). According to example embodiments of the present disclosure implemented as other standards, a frame may be subdivided into units of NxN pixels, which may then be further subdivided into subunits. Each of these largest subdivided units of a frame may generally be referred to as a “block” for the purpose of this disclosure.

According to example embodiments of the present disclosure, motion prediction coding formats may refer to data formats wherein frames are encoded with motion vector information and prediction information of a frame by the inclusion of one or more references to motion information and prediction units (PUs) of one or more other frames. Motion information may refer to data describing motion of a block structure of a frame or a unit or subunit thereof, such as motion vectors and references to blocks of a current frame or of another frame. PUs may refer to a unit or multiple subunits corresponding to a block structure among multiple block structures of a frame, such as an MB or a CTU, wherein blocks are partitioned based on the frame data and are coded according to established video codecs. Motion information corresponding to a PU may describe motion prediction as encoded by any motion vector coding tool, including, but not limited to, those described herein.

A video encoder according to motion prediction coding may obtain a picture from a video source and code the frame to obtain a reconstructed frame that may be output for transmission. A reconstructed frame and blocks of a reconstructed frame may be intra-coded, wherein at least some motion information of the reconstructed frame refers to motion information elsewhere in the reconstructed frame, or inter-coded, wherein at least some motion information of the reconstructed frame refers to motion information of another frame. In general, frames and blocks thereof according to example embodiments of the present disclosure may be coded according to intra-coded or inter-coded motion prediction unless either is expressly specified.

A video encoder according to motion prediction coding may implement one or more rate control settings. A rate control setting may enable the video encoder to assign different quantization parameters (QPs) to different frames. The magnitude of the QP determines a scale over which picture information is quantized during encoding by the video encoder, and thus determines an extent to which the video encoder discards picture information (due to information falling between steps of the scale) from MBs of the sequence during coding.

A video encoder according to example embodiments of the present disclosure may implement Constant Rate Factor (CRF) as a rate control setting. Implementing CRF as a rate control setting may enable the video encoder to, in addition to assigning quantization parameters on a per-sequence basis, further assign quantization parameters on a per-frame basis. Consequently, bitrate of a video sequence may vary on a per-frame basis due to MBs of a frame having a higher quantization parameter causing a quantizer of an encoder to discard more picture information, and MBs of a frame having a lower quantization parameter causing the quantizer to discard comparatively less picture information.

According to CRF, a quantization step or quantization scale for a frame may be calculated based on frame-specific parameters (rather than a QP being set for a sequence as a whole), which may be parameters calculated on a per-frame basis in a sequence, and non-frame-specific parameters, which may be parameters calculated on a per-sequence basis or constant parameters predefined for the sequence.

FIG. 1 illustrates an example block diagram of a quantization step calculation process 100 performed by a video encoder implementing CRF according to an example embodiment of the present disclosure.

In a quantization step calculation process 100, a CRF parameter 102 (herein, named f_rf_constant) may be set as a constant for a sequence of multiple frames. The CRF parameter 102 may be a non-frame-specific parameter, correlating to a baseline QP desired for the sequence. However, although the CRF parameter 102 is a constant, the video encoder may not apply a constant QP over each frame of the sequence based on the CRF parameter 102. Instead, the video encoder may perform per-frame calculations upon the CRF parameter 102, based on frame-specific parameters, to derive a quantization step for each frame that may vary over a range above and/or below the baseline QP.

In a quantization step calculation process 100, Sum of Absolute Transformed Differences (SATD) values 104 may be calculated for each frame. SATD values 104 may each be a frame-specific parameter, where the video encoder may calculate a SATD value 104 for each frame as a sum of SATD sub-values for each MB of that frame. SATD values 104 of each frame may be stored in a data structure indexed by frame. For example, SATD values 104 of each frame may be stored in an array SATD [], wherein an SATD value 104 of each frame i may be accessed with reference to SATD [i].

A video encoder may down-sample a 16x16 pixel MB to generate an 8x8 pixel down-sampled MB having dimensions of half width and half height of the original MB, then calculate an SATD sub-value for the down-sampled MB in place of the original MB.
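
The down-sampling and SATD sub-value steps can be sketched as follows. This is a minimal illustration rather than the encoder's actual implementation: the function names `downsample_2x`, `hadamard_1d`, and `satd` are hypothetical, the 2x2 averaging filter is an assumption (the text does not specify a down-sampling filter), and the transform is an unnormalised Hadamard transform, which is a common choice for SATD.

```python
def downsample_2x(block):
    """Average each 2x2 neighbourhood, so a 16x16 MB becomes 8x8.
    The averaging filter is an assumption; the text names no filter."""
    n = len(block) // 2
    return [[(block[2 * y][2 * x] + block[2 * y][2 * x + 1]
              + block[2 * y + 1][2 * x] + block[2 * y + 1][2 * x + 1] + 2) // 4
             for x in range(n)] for y in range(n)]


def hadamard_1d(values):
    """Unnormalised fast Walsh-Hadamard transform (length must be a power of two)."""
    v = list(values)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def satd(block, prediction):
    """Sum of absolute values of the 2-D Hadamard transform of the residual."""
    diff = [[a - b for a, b in zip(rb, rp)] for rb, rp in zip(block, prediction)]
    rows = [hadamard_1d(r) for r in diff]
    cols = [hadamard_1d(c) for c in zip(*rows)]
    return sum(abs(c) for col in cols for c in col)
```

A frame-level SATD value would then be the sum of `satd(...)` over every down-sampled MB of the frame, each measured against its best prediction.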

The current frame may be an intra-coded frame; in this case, the video encoder calculating an SATD sub-value for the down-sampled MB may include the video encoder calculating an SATD sub-value for each of multiple intra-frame prediction modes of the MB. Intra-frame prediction modes may include, for example, a DC mode of the video encoder as illustrated by FIG. 2A, wherein the video encoder performs motion prediction with reference to a mean of sampled pixels above an upper boundary of the MB and sampled pixels left of a left boundary of the MB; a horizontal mode of the video encoder as illustrated by FIG. 2B, wherein the video encoder performs motion prediction with reference to sampled pixels left of the left boundary of the MB; and a vertical mode of the video encoder as illustrated by FIG. 2C, wherein the video encoder performs motion prediction with reference to sampled pixels above the upper boundary of the MB. It is generally known in the art to calculate an SATD value during intra-frame motion prediction. The video encoder may consider a smallest SATD sub-value among each SATD sub-value calculated from these modes to be an SATD sub-value for the down-sampled MB.
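
The selection of the smallest SATD sub-value among the DC, horizontal, and vertical modes might look like the sketch below. The helper names (`intra_candidates`, `best_intra_satd`) are hypothetical, the rounding used for the DC mean is an assumption, and `top` and `left` stand for the sampled pixels above and to the left of the down-sampled MB.

```python
def _wht(values):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two-length vector."""
    v = list(values)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def satd(block, prediction):
    diff = [[a - b for a, b in zip(rb, rp)] for rb, rp in zip(block, prediction)]
    cols = [_wht(c) for c in zip(*[_wht(r) for r in diff])]
    return sum(abs(c) for col in cols for c in col)


def intra_candidates(top, left):
    """Build the three predictions named in the text from neighbouring pixels."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)  # rounded mean (assumed rounding)
    return {
        "dc": [[dc] * n for _ in range(n)],
        "horizontal": [[left[y]] * n for y in range(n)],  # copy left neighbour along each row
        "vertical": [list(top) for _ in range(n)],        # copy top neighbour down each column
    }


def best_intra_satd(block, top, left):
    """Return (mode, cost) for the prediction mode with the smallest SATD."""
    costs = {mode: satd(block, pred)
             for mode, pred in intra_candidates(top, left).items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]
```

For a block made of vertical stripes that continue the row of pixels above it, the vertical mode predicts the block exactly and is selected with a cost of zero.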

The current frame may be an inter-coded frame, wherein the reference frame is another frame; in this case, the video encoder calculating an SATD sub-value for the down-sampled MB may include the video encoder calculating an SATD sub-value for each of multiple intra-frame prediction modes of the MB, as well as the video encoder performing motion search within the other frame in accordance with inter-frame motion prediction to find a best fit reference block for the down-sampled MB. The video encoder performing motion search may include, for example, the video encoder performing integer-pixel motion search, wherein motion vectors of the current frame are matched against pixel samples of the reference frame to the accuracy of the distance between pixels; the video encoder performing half-pixel motion search, wherein motion vectors of the current frame are matched against pixel samples of the reference frame to the accuracy of half the distance between pixels; and the video encoder performing quarter-pixel motion search, wherein motion vectors of the current frame are matched against pixel samples of the reference frame to the accuracy of a quarter of the distance between pixels. The video encoder may consider a smallest SATD sub-value among each SATD sub-value calculated from these modes to be an SATD sub-value for the down-sampled MB.

According to example embodiments of the present disclosure, while the video encoder performs motion search in accordance with inter-frame motion prediction, the current frame and the reference frame may have different resolutions. In such cases, prior to performing motion search, the reference frame may be resized to a same resolution as the resolution of the current frame; the current frame may be resized to a same resolution as the resolution of the reference frame; or both the current frame and the reference frame may be resized to a common resolution, where the common resolution may be a preset resolution for the purpose of motion search in accordance with inter-frame motion prediction.
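
The three equalization options described above can be sketched as a small helper. Everything here is illustrative: the names `resize_nearest` and `equalize`, the policy strings, and the use of nearest-neighbour resampling are assumptions, since the text does not name a resizing filter. Frames are modelled as lists of pixel rows.

```python
def resize_nearest(frame, width, height):
    """Nearest-neighbour resize of a frame stored as a list of rows (assumed filter)."""
    src_h, src_w = len(frame), len(frame[0])
    return [[frame[y * src_h // height][x * src_w // width]
             for x in range(width)] for y in range(height)]


def equalize(current, reference, policy="reference_to_current", common=None):
    """Bring both frames to one resolution before motion search.

    policy is one of:
      "reference_to_current": resize the reference frame
      "current_to_reference": resize the current frame
      "common": resize both to the preset (width, height) in `common`
    """
    cur_size = (len(current[0]), len(current))
    ref_size = (len(reference[0]), len(reference))
    if cur_size == ref_size:
        return current, reference
    if policy == "reference_to_current":
        return current, resize_nearest(reference, *cur_size)
    if policy == "current_to_reference":
        return resize_nearest(current, *ref_size), reference
    if policy == "common":
        return resize_nearest(current, *common), resize_nearest(reference, *common)
    raise ValueError(f"unknown policy: {policy}")
```

Motion search and the SATD computation would then run on the returned pair, which is guaranteed to share a resolution.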

In a quantization step calculation process 100, complexity values 106 may be calculated for each frame. Complexity values 106 may correlate to level of detail, level of motion, and such factors in a frame that, when increased, tend to reduce the likelihood that visual artefacts and loss in a frame would degrade perceptible visual quality of the frame. Complexity values 106 may each be a frame-specific value, and furthermore may be cumulative over each frame of a sequence, where the video encoder may calculate a complexity value 106 for each frame i as a complexity sum value 108 (representing accumulated complexity over the sequence as of frame i) divided by a frame number value 110 (correlating to the number of frames in the sequence as of frame i). Complexity values 106, complexity sum values 108, and frame number values 110 of each frame may be stored in a data structure indexed by frame. For example, complexity values 106 of each frame may be stored in an array blurred_complexity [], wherein a complexity value 106 of each frame i may be accessed with reference to blurred_complexity [i]. Likewise, complexity sum values 108 may be stored as, for example, cplxsum [] and accessed for frame i as cplxsum [i]; frame number values 110 may be stored as, for example, cplxcount [] and accessed for frame i as cplxcount [i].

An example equation for calculating a complexity value 106 is as follows:

blurred_complexity [i] = cplxsum [i] / cplxcount [i]

In a quantization step calculation process 100, a complexity sum value 108 of each frame i may be calculated. Complexity sum values 108 may each be a frame-specific value, and furthermore may be cumulative over each frame of a sequence, wherein the video encoder may calculate a complexity sum value 108 of a frame i based on a complexity sum value of a frame previous to i, i –1, plus a modulated value of a SATD value of the frame i. In the case that i = 0, indicating that the frame i is the first frame in a sequence, a complexity sum value 108 for a frame i –1 may be arbitrarily set to 0. An example equation for calculating a complexity sum value 108 is as follows:

cplxsum [i] = cplxsum [i-1] * 0.5 + SATD [i]

In a quantization step calculation process 100, a frame number value 110 of each frame i may be calculated. Frame number values 110 may each be a frame-specific value, and furthermore may be cumulative over each frame of a sequence, wherein the video encoder may calculate a frame number value 110 of a frame i based on a frame number value of a frame previous to i, i –1, plus a constant or a modulated constant. In the case that i = 0, indicating that the frame i is the first frame in a sequence, a frame number value 110 for a frame i –1 may be arbitrarily set to 0. An example equation for calculating a frame number value 110 is as follows:

cplxcount [i] = cplxcount [i-1] * 0.5 + 1
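
The two recurrences and the division that yields the complexity value can be traced with a short sketch. The function name `blurred_complexities` and the `decay` parameter are hypothetical; the value 0.5 and the zero initial state for frame i - 1 when i = 0 come from the text.

```python
def blurred_complexities(satd_values, decay=0.5):
    """Apply the cplxsum / cplxcount recurrences and return the
    per-frame complexity value cplxsum[i] / cplxcount[i]."""
    cplxsum = 0.0    # value for frame i - 1 when i = 0
    cplxcount = 0.0  # value for frame i - 1 when i = 0
    out = []
    for satd in satd_values:
        cplxsum = cplxsum * decay + satd
        cplxcount = cplxcount * decay + 1
        out.append(cplxsum / cplxcount)
    return out
```

Because both accumulators decay at the same rate, a sequence with constant SATD produces a constant complexity value, while a sudden change in SATD is blended with the history of recent frames instead of taking effect at once.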

In a quantization step calculation process 100, a quantization step value 112 of each frame i may be calculated. Quantization step values 112 may each be a frame-specific value, wherein the video encoder may calculate a quantization step value 112 of a frame i based on a complexity value 106 of the frame i; a compression parameter 114 representing a degree of compression to be applied to the frame i, which may be a constant set over a range of, for example, 0 to 1.0, inclusive; and a quantized CRF constant 116, which may be converted from the CRF parameter 102 to a quantization scale by a conversion function as known in the art, details of which need not be reiterated herein. Example equations for calculating a quantization step value 112 are as follows:

base_cplx = iMbCount * (i_bframe ? 120 : 80)

rate_factor_constant = base_cplx ^ (1 - qcompress) / qp2qscale (f_rf_constant)

wherein base_cplx is a base complexity 118 of the sequence and is a non-frame-specific value; qcompress is the compression parameter 114; rate_factor_constant is the quantized CRF constant 116; qp2qscale () is a function converting the CRF parameter 102 to the quantized CRF constant 116 over a quantization scale; iMbCount is an MB count 120 indicating a number of MBs in frames of a sequence, and is a non-frame-specific parameter; and i_bframe is a max B-frame number 122 indicating a maximum number of B-frames in a sequence between two consecutive P-frames of the sequence, and is a non-frame-specific parameter, wherein a P-frame is a frame referencing a previous frame and a B-frame is a frame referencing a previous frame and a subsequent frame.
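
As a rough numerical illustration, the quantities defined above can be combined as in the sketch below. Two parts are assumptions not stated in the text: the body of `qp2qscale` and the final step in `quant_step` are modelled on the CRF mode of the open-source x264 encoder, which uses the same identifiers. The text itself treats the conversion function as known in the art and does not reproduce it.

```python
def qp2qscale(qp):
    """QP-to-quantization-scale conversion (assumed, x264-style)."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 6.0)


def rate_factor_constant(i_mb_count, i_bframe, qcompress, f_rf_constant):
    """Quantized CRF constant derived from the base complexity."""
    base_cplx = i_mb_count * (120 if i_bframe else 80)
    return base_cplx ** (1.0 - qcompress) / qp2qscale(f_rf_constant)


def quant_step(blurred_complexity, qcompress, rf_constant):
    """Per-frame quantization step (assumed form, x264-style):
    complexity compressed by qcompress, scaled by the CRF constant."""
    return blurred_complexity ** (1.0 - qcompress) / rf_constant
```

Under these assumptions, a frame whose complexity value equals the base complexity receives exactly the quantization scale corresponding to the CRF parameter, and frames with higher complexity receive a larger quantization step whenever qcompress is below 1.0.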

According to example embodiments of the present disclosure, the video encoder may calculate a complexity sum value 108 by filtering and/or modulating frame-specific but resolution-dependent values of the calculation with resolution-dependent parameters having, for example, inverse correlations to resolution, so that resolution-dependence of the complexity sum value 108 is reduced by at least some extent.

For example, a complexity sum value 108 of a current frame in a sequence is cumulative of complexity sum values of previous frames in the sequence. However, although complexity sum values are resolution-dependent values, the calculation of a particular complexity sum value 108 may generally not account for differences in resolution between previous frames and the current frame, or differences in resolution among previous frames. Furthermore, since the calculation of a complexity sum value 108 uses a complexity sum value of only one previous frame rather than complexity sum values of each previous frame, differences in resolution among previous frames cannot be reflected in this calculation.

Therefore, in calculating the complexity sum value 108, the video encoder may at least partially reduce the contribution of resolution-dependence of a complexity sum value of a previous frame to the complexity sum value 108 calculation, by modulating a complexity sum value of a previous frame by a resolution-dependent, frame-specific, filtering parameter 124. Given a difference between frame resolutions of the current frame and the previous frame, the filtering parameter 124 may be a preset value, or may be a variable value inversely correlating to the difference between frame resolutions, so that multiplying the complexity sum value of a previous frame by the filtering parameter 124 reduces resolution-dependence of the complexity sum value of the previous frame by at least some extent, therefore reducing resolution-dependence of the complexity sum value 108 of the current frame by at least some extent.

Additionally, for example, a SATD value 104 of a frame i may be a resolution-dependent value in its derivation, such that between two frames having the same picture content at two different resolutions, a SATD value 104 of a frame having a higher resolution is larger and a SATD value 104 of a frame having a lower resolution is smaller. Consequently, a complexity value 106 calculated from a resolution-dependent SATD value 104 is a resolution-dependent value, and a quantization step value 112 calculated from a resolution-dependent complexity value 106 is a resolution-dependent value. Video encoders performing coding using a quantization step value 112 may generally assume that differences in quantization step values 112 arise from complexity values 106 and not from resolution. Moreover, a video encoder may be unable to distinguish between the extent to which differences in resolution contribute to differences in a quantization step value 112 and the extent to which differences in complexity contribute to differences in a quantization step value 112 when both factors are variables and both factors contribute.

Therefore, in calculating the complexity sum value 108, the video encoder may at least partially normalize the contribution of an SATD value 104’s resolution-dependence to the complexity sum value 108 calculation, by modulating a resolution-dependent SATD value 104 by a resolution-dependent first-type modulation parameter 126. Given a SATD value 104 correlating to frame resolution, the first-type modulation parameter 126 may, for example, inversely correlate to frame resolution so that multiplying the SATD value 104 by the first-type modulation parameter 126 reduces resolution-dependence of the SATD value 104 by at least some extent, therefore reducing resolution-dependence of the complexity sum value 108 by at least some extent.

By examples of modulation parameters such as those described above, a video encoder, in calculating a complexity value 106 using complexity sum values 108 as calculated above, may reduce resolution-dependence of the complexity value 106 by at least some extent. An example equation for calculating a complexity sum value 108 according to example embodiments of the present disclosure is as follows:

cplxsum [i] = cplxsum [i-1] * cplxFilter [i] + SATD [i] * resSATDMod [i]

where filtering parameters 124 may be stored as, for example, cplxFilter [] and accessed for frame i as cplxFilter [i]; and first-type modulation parameters 126 may be stored as, for example, resSATDMod [] and accessed for frame i as resSATDMod [i].
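
One way to picture the resolution-aware recurrence is the sketch below. The update rule follows the equation above, but the concrete definitions of the two parameters are purely hypothetical: `res_satd_mod` scales SATD to a reference pixel count (1280x720 is an arbitrary choice), and `cplx_filter` shrinks the usual 0.5 decay in proportion to the mismatch between the previous and current pixel counts. The text only requires that these parameters inversely correlate with resolution or with the resolution difference.

```python
def update_cplxsum(prev_cplxsum, satd, cplx_filter_i, res_satd_mod_i):
    """cplxsum[i] = cplxsum[i-1] * cplxFilter[i] + SATD[i] * resSATDMod[i]"""
    return prev_cplxsum * cplx_filter_i + satd * res_satd_mod_i


def res_satd_mod(width, height, ref_pixels=1280 * 720):
    """Hypothetical first-type modulation: inverse of the pixel count,
    normalised to a reference resolution."""
    return ref_pixels / (width * height)


def cplx_filter(prev_pixels, cur_pixels, base=0.5):
    """Hypothetical filtering parameter: the base decay when resolutions
    match, smaller as the resolution difference grows."""
    ratio = min(prev_pixels, cur_pixels) / max(prev_pixels, cur_pixels)
    return base * ratio
```

If SATD is assumed to scale roughly with pixel count, the same content yields the same contribution at 1280x720 (SATD 4000, modulation 1.0) and at 640x360 (SATD 1000, modulation 4.0), which is the kind of normalisation the text describes.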

According to example embodiments of the present disclosure, in calculating a quantization step value 112, a quantized CRF constant 116 may be a non-frame-specific value as a result of the base complexity 118 being a non-frame-specific value, which further results from the MB count 120 being a non-frame-specific value. However, given resolution differences between frames of a sequence, base complexity 118 may be a frame-specific value and MB count 120 may also be a frame-specific value, causing the quantized CRF constant 116 to be a frame-specific value. Using a frame-specific quantized CRF constant 116 in calculating a quantization step value 112 may increase frame-specificity of the quantization step value 112.

Therefore, the video encoder may store base complexity 118 and MB count 120 as arrays rather than single values, where base complexity 118 may be stored as, for example, base_cplx [] and accessed for frame i as base_cplx [i]; and MB count 120 may be stored as, for example, iMbCount [] and accessed for frame i as iMbCount [i].

Additionally, MB count 120 as a frame-specific parameter may correspond not only to a number of MBs in a current frame, but may further reflect a cumulative MB count, and may be filtered. An example equation for calculating an MB count 120 as a frame-specific parameter according to example embodiments of the present disclosure is as follows:

iMbCount [i] = iMbCount [i-1] * countFilter + iMbCountCur

where countFilter denotes a coefficient used to filter MB count 120, and iMbCountCur denotes an MB count of the current frame. However, any variable incorporating MB count per frame as a component may be used as an MB count 120 for the purpose of the present disclosure, without limitation.
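
The filtered MB count can be traced across a resolution change with the sketch below. The helper names are hypothetical, and two details are assumptions: `countFilter` is set to 0.5 by analogy with the other recurrences, and the count for the frame before the first is taken as 0.

```python
def mb_count(width, height, mb_size=16):
    """Number of 16x16 MBs needed to cover a frame (partial MBs round up)."""
    return ((width + mb_size - 1) // mb_size) * ((height + mb_size - 1) // mb_size)


def filtered_mb_counts(resolutions, count_filter=0.5):
    """iMbCount[i] = iMbCount[i-1] * countFilter + iMbCountCur,
    with the value before the first frame assumed to be 0."""
    acc = 0.0
    out = []
    for width, height in resolutions:
        acc = acc * count_filter + mb_count(width, height)
        out.append(acc)
    return out
```

With these assumptions, a switch from 1280x720 (3600 MBs) to 640x360 (920 MBs) gives a filtered count of 2720 on the first low-resolution frame, so the count drifts toward the new resolution over several frames instead of jumping.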

Furthermore, the video encoder may calculate a base complexity 118 for each frame i by modulating frame-specific but resolution-dependent values of the calculation with other resolution-dependent parameters, so that resolution-dependence of the base complexity 118 of a frame is reduced by at least some extent. An example equation for calculating a base complexity 118 is as follows:

base_cplx [i] = iMbCount * (i_bframe ? 120 : 80)

For example, an MB count 120 of a frame i may be a resolution-dependent value in its derivation, such that a frame having a higher resolution may have more MBs than a frame having a lower resolution. Consequently, a base complexity 118 calculated from a resolution-dependent MB count 120 is a resolution-dependent value, a quantized CRF constant 116 calculated from a resolution-dependent base complexity 118 is a resolution-dependent value, and a quantization step value 112 calculated from a resolution-dependent quantized CRF constant 116 is a resolution-dependent value. Video encoders performing coding using a quantization step value 112 may generally assume that differences in quantization step values 112 arise from complexity values 106 and not from resolution. Moreover, a video encoder may be unable to distinguish between the extent to which differences in resolution contribute to differences in a quantization step value 112 and the extent to which differences in complexity contribute to differences in a quantization step value 112 when both factors are variables and both factors contribute.

Therefore, in calculating the base complexity 118, the video encoder may at least partially normalize the contribution of an MB count 120’s resolution-dependence to the base complexity 118 calculation, by modulating a resolution-dependent MB count 120 by a resolution-dependent second-type modulation parameter 128. Given an MB count 120 correlating to frame resolution, the second-type modulation parameter 128 may, for example, correlate to frame resolution or frame resolution variation, so that multiplying the MB count 120 by the second-type modulation parameter 128 reduces resolution-dependence of the MB count 120 by at least some extent, therefore reducing resolution-dependence of the base complexity 118 by at least some extent. An example equation for calculating a base complexity 118 according to example embodiments of the present disclosure is as follows:

base_cplx [i] = iMbCount * resCountMod [i] * (i_bframe ? 120 : 80)

wherein second-type modulation parameters 128 may be stored as, for example, resCountMod [] and accessed for frame i as resCountMod [i].
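
The modulated base complexity can be written as a one-line function, as sketched below. The choice of `resCountMod` shown in the usage note (a reference MB count divided by the frame's MB count) is hypothetical and represents the extreme case of full normalisation; the text leaves the exact form of the second-type modulation parameter open.

```python
def base_cplx(i_mb_count, res_count_mod, i_bframe):
    """base_cplx[i] = iMbCount * resCountMod[i] * (i_bframe ? 120 : 80)"""
    return i_mb_count * res_count_mod * (120 if i_bframe else 80)
```

For example, with a hypothetical reference of 3600 MBs, a 640x360 frame of 920 MBs and `res_count_mod = 3600 / 920` yields the same base complexity as a 1280x720 frame with `res_count_mod = 1.0`, so the quantized CRF constant no longer changes with resolution. A modulation value between 1.0 and this ratio would remove only part of the dependence.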

According to example embodiments of the present disclosure, resolution-dependent values as described above or other similar resolution-dependent values or parameters as implemented by rate control modes used in calculating a complexity value 106 and a quantization step value 112 may be calculated as a set for a sequence of frames of a particular resolution, and then the calculated values or parameters stored in resolution-specific sets. Such values or parameters may include, without limitation, a complexity sum value 108 and a frame number value 110, and other such values or parameters that are applicable to per-frame calculations as described herein irrespective of frame content.

A video encoder, upon encoding a current frame, may reference a resolution-specific set of values or parameters as described above matching a resolution of the current frame in performing one or more of the calculations as described above. For example, a video encoder having a stored resolution-specific set of complexity sum values 108 and quantization step values 112 for some number of frames of resolution x may refer to the same set of complexity sum values 108 and quantization step values 112 upon encoding a current frame having resolution x. The video encoder may then update the set of complexity sum values 108 and quantization step values 112 after encoding the current frame. FIG. 3 illustrates a sequence of frames wherein two chronologically separate sets of frames are coded using a first parameter set, and another set of frames is coded using a second parameter set.

Furthermore, when a video encoder encodes a succeeding frame immediately following a preceding frame having a different resolution, in referencing a parameter set having the same resolution as the resolution of the succeeding frame, rather than treating the succeeding frame as a first frame i in a sequence, and therefore setting a frame number value for a frame i - 1 to 0, parameters for frame i - 1 may rather reference another parameter set of a different resolution referenced by the succeeding frame. Parameter sets of different resolutions may reference each other in this manner because information for frames at the end of a parameter set may be relatively stable compared to information for frames at the start of a parameter set, due to the information for frames at the end being the cumulative result of iterative recursive calculations.
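
Resolution-specific parameter sets and the cross-set reference described above might be organised as in the following sketch. The class name `RateControlState`, the dictionary layout, and the policy of seeding a new set from the most recently used set only when the new resolution is first seen are all assumptions made for illustration; the text does not fix a data structure or a seeding rule.

```python
class RateControlState:
    """Keeps one (cplxsum, cplxcount) parameter set per resolution."""

    def __init__(self):
        self.sets = {}        # (width, height) -> {"cplxsum": ..., "cplxcount": ...}
        self.last_key = None  # resolution of the most recently encoded frame

    def update(self, resolution, satd, decay=0.5):
        """Update the set matching `resolution` and return the complexity value."""
        params = self.sets.get(resolution)
        if params is None:
            # First frame at this resolution: rather than starting from 0,
            # copy the state of the set used by the preceding frame, if any.
            previous = self.sets.get(self.last_key,
                                     {"cplxsum": 0.0, "cplxcount": 0.0})
            params = dict(previous)
            self.sets[resolution] = params
        params["cplxsum"] = params["cplxsum"] * decay + satd
        params["cplxcount"] = params["cplxcount"] * decay + 1
        self.last_key = resolution
        return params["cplxsum"] / params["cplxcount"]
```

When the sequence later returns to an earlier resolution, the encoder picks up that resolution's stored set where it left off, which matches the pattern of FIG. 3 in which two chronologically separate groups of frames share the first parameter set.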

FIG. 4 illustrates an example system 400 for implementing the processes and methods described above for implementing support for resolution-adaptive video coding in Constant Rate Factor rate control modes in video encoders.

The techniques and mechanisms described herein may be implemented by multiple instances of the system 400 as well as by any other computing device, system, and/or environment. The system 400 shown in FIG. 4 is only one example of a system and is not intended to suggest any limitation as to the scope of use or functionality of any computing device utilized to perform the processes and/or procedures described above. Other well-known computing devices, systems, environments and/or configurations that may be suitable for use with the embodiments include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, game consoles, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, implementations using field programmable gate arrays (“FPGAs”) and application specific integrated circuits (“ASICs”), and/or the like.

The system 400 may include one or more processors 402 and system memory 404 communicatively coupled to the processor(s) 402. The processor(s) 402 may execute one or more modules and/or processes to cause the processor(s) 402 to perform a variety of functions. In some embodiments, the processor(s) 402 may include a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 402 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.

Depending on the exact configuration and type of the system 400, the system memory 404 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof. The system memory 404 may include one or more computer-executable modules 406 that are executable by the processor (s) 402.

The modules 406 may include, but are not limited to, an encoder module 408 and a rate control module 410. The rate control module 410 further includes a CRF parameter setting submodule 412, a SATD calculating submodule 414, a complexity calculating submodule 416, a complexity sum calculating submodule 418, a frame number calculating submodule 420, a quantization step calculating submodule 422, a compression parameter setting submodule 424, a quantized CRF constant calculating submodule 426, a base complexity calculating submodule 428, an MB count calculating submodule 430, a max B-frame number determining submodule 432, a filtering parameter setting submodule 434, a first-type modulation parameter setting submodule 436, and a second-type modulation parameter setting submodule 438.

The encoder module 408 may be configured to perform motion prediction coding upon a sequence of frames, and may be configured to provide frame information and parameters to the rate control module 410, receive a quantization step on a per-frame basis from the rate control module 410, then encode each frame such that picture data is discarded in accordance with the quantization step.

The rate control module 410 may be configured to assign a quantization step to frame information received from the encoder module 408 in accordance with its submodules as described below.

The CRF parameter setting submodule 412 may be configured to set a CRF parameter for a sequence of multiple frames and provide the CRF parameter to the quantized CRF constant calculating submodule 426, as abovementioned with reference to FIG. 1.

The SATD calculating submodule 414 may be configured to calculate a SATD value for each frame of a sequence by down-sampling MB picture data received from the encoder module 408 and provide the SATD value for each frame to the complexity sum calculating submodule 418, as abovementioned with reference to FIG. 1.
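By way of illustration only, a sum of absolute transformed differences may be computed per 4x4 block with a Hadamard transform and accumulated over a frame, as sketched below. The function names are hypothetical. The sketch assumes a co-located reference block (no motion search), applies no normalization factor, and omits the down-sampling of MB picture data described above; practical encoders differ on each of these points.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (symmetric, entries +1 / -1).
H4: Block = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul4(a: Block, b: Block) -> Block:
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(cur: Block, ref: Block) -> int:
    """SATD of one 4x4 block: sum of |H * (cur - ref) * H|."""
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    transformed = matmul4(matmul4(H4, diff), H4)  # H is symmetric, so H^T == H
    return sum(abs(v) for row in transformed for v in row)


def frame_satd(cur: Block, ref: Block) -> int:
    """Accumulate 4x4 SATD over a frame; partial edge blocks are skipped."""
    height, width = len(cur), len(cur[0])
    total = 0
    for y in range(0, height - height % 4, 4):
        for x in range(0, width - width % 4, 4):
            c = [row[x:x + 4] for row in cur[y:y + 4]]
            r = [row[x:x + 4] for row in ref[y:y + 4]]
            total += satd_4x4(c, r)
    return total
```

The current frame and the reference frame passed to frame_satd are assumed to have equal dimensions.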

The complexity calculating submodule 416 may be configured to calculate a complexity value for each frame of a sequence using a complexity sum value received from a complexity sum calculating submodule 418 and a frame number received from a frame number calculating submodule 420, as abovementioned with reference to FIG. 1.

The complexity sum calculating submodule 418 may be configured to recursively calculate a complexity sum value for each frame of a sequence based on cumulative complexity sum values calculated for previous frames of the sequence, a filtering parameter for the frame provided by the filtering parameter setting submodule 434, a SATD value for the frame provided by the SATD calculating submodule 414, and a first-type modulation parameter provided by the first-type modulation parameter setting submodule 436, and provide the complexity sum value to the complexity calculating submodule 416, as abovementioned with reference to FIG. 1.

The frame number calculating submodule 420 may be configured to recursively calculate a frame number value for each frame of a sequence based on cumulative frame number values calculated for previous frames of the sequence and provide the frame number value to the complexity calculating submodule 416, as abovementioned with reference to FIG. 1.

The quantization step calculating submodule 422 may be configured to calculate a quantization step value for each frame of a sequence from a complexity value calculated by the complexity calculating submodule 416, a compression parameter set by the compression parameter setting submodule 424, and a quantized CRF constant calculated by the quantized CRF constant calculating submodule 426, as abovementioned with reference to FIG. 1.
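By way of illustration only, the data flow among the complexity sum calculating submodule 418, the frame number calculating submodule 420, the complexity calculating submodule 416 and the quantization step calculating submodule 422 may be sketched as follows. The recursions shown are assumptions modeled on rate control found in open-source encoders and are not asserted to be the equations described with reference to FIG. 1; in particular, applying the filtering parameter to the frame number value as well as to the complexity sum value is an assumption. The names RcState, update, complexity and quant_step are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RcState:
    """Cumulative values carried from frame i-1 to frame i."""
    cplx_sum: float = 0.0
    frame_num: float = 0.0


def update(state: RcState, satd: float, filt: float, m1: float) -> RcState:
    """One recursive step: decay the previous totals, then add the current frame.

    satd : SATD value of the current frame
    filt : filtering parameter in [0, 1] weighting the history
    m1   : first-type modulation parameter scaling the SATD value
    """
    return RcState(
        cplx_sum=filt * state.cplx_sum + m1 * satd,
        frame_num=filt * state.frame_num + 1.0,
    )


def complexity(state: RcState) -> float:
    """Complexity value: complexity sum divided by frame number value."""
    return state.cplx_sum / state.frame_num


def quant_step(cplx: float, qcomp: float, crf_const: float) -> float:
    """Quantization step from complexity, compression parameter and CRF constant."""
    return cplx ** (1.0 - qcomp) / crf_const
```

Under these assumptions, a compression parameter of 1 makes the quantization step independent of frame complexity, while a value of 0 makes it directly proportional to the complexity value.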

The compression parameter setting submodule 424 may be configured to set a compression parameter and provide the compression parameter to the quantized CRF constant calculating submodule 426, as abovementioned with reference to FIG. 1.

The quantized CRF constant calculating submodule 426 may be configured to calculate a quantized CRF constant for a sequence by conversion from a base complexity calculated by the base complexity calculating submodule 428 and from a compression parameter set by the compression parameter setting submodule 424 and a CRF parameter set by the CRF parameter setting submodule 412, as abovementioned with reference to FIG. 1.

The base complexity calculating submodule 428 may be configured to calculate a base complexity from an MB count provided by the MB count calculating submodule 430, a second-type modulation parameter provided by the second-type modulation parameter setting submodule 438, and a max B-frame number set by the max B-frame number determining submodule 432, and provide the base complexity to the quantized CRF constant calculating submodule 426, as abovementioned with reference to FIG. 1.

The MB count calculating submodule 430 may be configured to calculate an MB count for each frame of a sequence and provide the MB count to the base complexity calculating submodule 428, as abovementioned with reference to FIG. 1.

The max B-frame number determining submodule 432 may be configured to determine a max B-frame number for a sequence and provide the max B-frame number to the base complexity calculating submodule 428, as abovementioned with reference to FIG. 1.
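By way of illustration only, the derivation of a quantized CRF constant from an MB count, a max B-frame number, a second-type modulation parameter, a compression parameter and a CRF parameter may be sketched as follows. The per-MB constants (120 and 80), the 16x16 macroblock size and the QP-to-scale mapping are assumptions modeled on x264-style rate control, and the function names are hypothetical; the equations described with reference to FIG. 1 may differ.

```python
def mb_count(width: int, height: int, mb_size: int = 16) -> int:
    """Number of macroblocks covering a frame, rounding partial MBs up."""
    mbs_wide = (width + mb_size - 1) // mb_size
    mbs_high = (height + mb_size - 1) // mb_size
    return mbs_wide * mbs_high


def base_complexity(mbs: int, max_b_frames: int, m2: float = 1.0) -> float:
    """Base complexity scaled by MB count and a second-type modulation parameter."""
    per_mb = 120.0 if max_b_frames > 0 else 80.0  # assumed constants
    return mbs * per_mb * m2


def qp_to_qscale(qp: float) -> float:
    """Assumed mapping from a QP-like CRF parameter to a linear quantizer scale."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 6.0)


def quantized_crf_constant(base_cplx: float, qcomp: float, crf: float) -> float:
    """CRF constant by which the compressed complexity is later divided."""
    return base_cplx ** (1.0 - qcomp) / qp_to_qscale(crf)
```

Because the MB count changes with frame resolution, the base complexity, and therefore the quantized CRF constant, would be recomputed in this sketch whenever the resolution changes.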

The filtering parameter setting submodule 434 may be configured to set a filtering parameter for each frame of a sequence and provide the filtering parameter to the complexity sum calculating submodule 418, as abovementioned with reference to FIG. 1.

The first-type modulation parameter setting submodule 436 may be configured to set a first-type modulation parameter and provide the first-type modulation parameter to the complexity sum calculating submodule 418, as abovementioned with reference to FIG. 1.

The second-type modulation parameter setting submodule 438 may be configured to set a second-type modulation parameter and provide the second-type modulation parameter to the base complexity calculating submodule 428, as abovementioned with reference to FIG. 1.

The system 400 may additionally include an input/output (I/O) interface 440 for receiving sequences of frames from video source data, and for outputting reconstructed frames into a reference frame buffer and/or a transmission buffer. The system 400 may also include a communication module 450 allowing the system 400 to communicate with other devices (not shown) over a network (not shown). The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.

Some or all operations of the methods described above can be performed by execution of computer-readable instructions stored on a computer-readable storage medium, as defined below. The term “computer-readable instructions,” as used in the description and claims, includes routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, combinations thereof, and the like.

The computer-readable storage media may include volatile memory (such as random-access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.). The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.

A non-transient computer-readable storage medium is an example of computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. A computer-readable storage medium employed herein shall not be interpreted as a transitory signal itself, such as a radio wave or other free-propagating electromagnetic wave, electromagnetic waves propagating through a waveguide or other transmission medium (such as light pulses through a fiber optic cable), or electrical signals propagating through a wire.

The computer-readable instructions stored on one or more non-transitory computer-readable storage media, when executed by one or more processors, may perform operations described above with reference to FIGS. 1-4. Generally, computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

By the abovementioned technical solutions, the present disclosure provides resolution-adaptive video coding supported by a Constant Rate Factor (CRF) mode of a video encoder, adapting the encoder to generate and calculate data that arises as a result of resolution changes between frames, and improving evaluation of complexity of video frames when resolutions between frames are different. The methods and systems described herein provide a video encoder which calculates frame complexity by modulating sums of absolute differences for each frame based on differences in resolutions between frames; calculates frame complexity by taking varying numbers of macroblocks between frames into account; calculates and stores different sets of frame complexity parameters for different resolutions; and calculates a sum of absolute transformed differences for a frame by equalizing current frame and reference frame resolution.
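By way of illustration only, equalizing current frame and reference frame resolution prior to a block-difference calculation may be sketched as follows. Nearest-neighbor resampling is used here purely as a simple stand-in, since this passage does not mandate a particular resampling filter, and the function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luma samples


def resize_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Resample a frame to out_w x out_h by nearest-neighbor selection."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def equalize(cur: Frame, ref: Frame) -> Tuple[Frame, Frame]:
    """Resize the reference frame to the resolution of the current frame."""
    h, w = len(cur), len(cur[0])
    if (len(ref), len(ref[0])) != (h, w):
        ref = resize_nearest(ref, w, h)
    return cur, ref
```

The alternative of resizing both the reference frame and the current frame to a common resolution could be sketched by calling resize_nearest on each frame with the same target dimensions.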

EXAMPLE CLAUSES

A. A method comprising: obtaining a sequence comprising a plurality of frames, at least some frames of the plurality of frames having a resolution different from at least some other frames of the plurality of frames; calculating a complexity value of a current frame of the plurality of frames according to at least one parameter calculated during performing motion prediction on the current frame, the complexity value being cumulative over at least some frames of the sequence having different resolutions; and calculating a quantization step of the current frame according to the complexity value of the current frame.

B. The method as paragraph A recites, wherein performing motion prediction on the current frame references a reference frame having a resolution different from a resolution of the current frame, and further comprises one of: resizing the reference frame to a resolution matching the resolution of the current frame, and resizing both the reference frame and the current frame to a common resolution.

C. The method as paragraph A recites, wherein the at least one parameter comprises a sum of absolute transformed differences of the current frame calculated according to performing motion prediction on the current frame.

D. The method as paragraph A recites, wherein the complexity sum value is calculated from at least one frame-specific parameter that reduces resolution-dependence of the complexity sum value.

E. The method as paragraph D recites, wherein a parameter among the at least one parameter inversely correlates to a difference between resolutions of the current frame and the reference frame.

F. The method as paragraph D recites, wherein a parameter among the at least one parameter inversely correlates to resolution of the frame relative to the sum of absolute transformed differences.

G. The method as paragraph A recites, wherein the complexity sum value references a complexity sum value of another frame having a same resolution as a resolution of the current frame, and the frame number references a frame number of another frame having a same resolution as a resolution of the current frame.

H. The method as paragraph A recites, wherein the complexity sum value references a complexity sum value of another frame having a different resolution from a resolution of the current frame, and the frame number references a frame number of another frame having a different resolution from a resolution of the current frame.

I. The method as paragraph A recites, wherein the quantization step is calculated according to a base complexity specific to the current frame.

J. The method as paragraph I recites, wherein the base complexity is calculated from an MB count specific to the current frame.

K. The method as paragraph J recites, wherein the base complexity is further calculated from a parameter correlating to variation in frame resolution over the sequence.

L. The method as paragraph J recites, wherein the MB count is derived from at least a cumulative number of MBs over the sequence up to the current frame.

M. A system comprising: one or more processors; and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: an encoder module configured to obtain a sequence comprising a plurality of frames, at least some frames of the plurality of frames having a resolution different from at least some other frames of the plurality of frames, and perform motion prediction on the current frame, and a rate control module, the rate control module further comprising a complexity calculating submodule configured to calculate a complexity value of a current frame of the plurality of frames according to at least one parameter calculated during performing motion prediction on the current frame, the complexity value being cumulative over at least some frames of the sequence having different resolutions; and a quantization step calculating submodule configured to calculate a quantization step of the current frame according to the complexity value of the current frame.

N. The system as paragraph M recites, wherein the encoder module is configured to perform motion prediction on the current frame by referencing a reference frame having a resolution different from a resolution of the current frame, and by one of: resizing the reference frame to a resolution matching the resolution of the current frame, and resizing both the reference frame and the current frame to a common resolution.

O. The system as paragraph M recites, wherein the at least one parameter comprises a sum of absolute transformed differences of the current frame, and the rate control module further comprises a SATD calculating submodule configured to calculate the sum of absolute transformed differences according to performing motion prediction on the current frame.

P. The system as paragraph M recites, wherein the complexity sum calculating submodule is configured to calculate the complexity sum value from at least one frame-specific parameter that reduces resolution-dependence of the complexity sum value.

Q. The system as paragraph P recites, wherein a parameter among the at least one parameter inversely correlates to a difference between resolutions of the current frame and the reference frame.

R. The system as paragraph P recites, wherein a parameter among the at least one parameter inversely correlates to resolution of the frame relative to the sum of absolute transformed differences.

S. The system as paragraph M recites, wherein the complexity sum value references a complexity sum value of another frame having a same resolution as a resolution of the current frame, and the frame number references a frame number of another frame having a same resolution as a resolution of the current frame.

T. The system as paragraph M recites, wherein the complexity sum value references a complexity sum value of another frame having a different resolution from a resolution of the current frame, and the frame number references a frame number of another frame having a different resolution from a resolution of the current frame.

U. The system as paragraph M recites, wherein the quantization step calculating submodule is configured to calculate the quantization step according to a base complexity specific to the current frame.

V. The system as paragraph U recites, wherein the rate control module further comprises a base complexity calculating submodule configured to calculate base complexity from an MB count specific to the current frame.

W. The system as paragraph V recites, wherein the base complexity calculating submodule is further configured to calculate base complexity from a parameter correlating to variation in frame resolution over the sequence.

X. The system as paragraph V recites, wherein the rate control module further comprises an MB count calculating submodule configured to derive the MB count from at least a cumulative number of MBs over the sequence up to the current frame.

Y. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: obtaining a sequence comprising a plurality of frames, at least some frames of the plurality of frames having a resolution different from at least some other frames of the plurality of frames; calculating a complexity value of a current frame of the plurality of frames according to at least one parameter calculated during performing motion prediction on the current frame, the complexity value being cumulative over at least some frames of the sequence having different resolutions; and calculating a quantization step of the current frame according to the complexity value of the current frame.

Z. The computer-readable storage medium as paragraph Y recites, wherein performing motion prediction on the current frame references a reference frame having a resolution different from a resolution of the current frame, and further comprises one of: resizing the reference frame to a resolution matching the resolution of the current frame, and resizing both the reference frame and the current frame to a common resolution.

AA. The computer-readable storage medium as paragraph Y recites, wherein the at least one parameter comprises a sum of absolute transformed differences of the current frame calculated according to performing motion prediction on the current frame.

BB. The computer-readable storage medium as paragraph Y recites, wherein the complexity sum value is calculated from at least one frame-specific parameter that reduces resolution-dependence of the complexity sum value.

CC. The computer-readable storage medium as paragraph BB recites, wherein a parameter among the at least one parameter inversely correlates to a difference between resolutions of the current frame and the reference frame.

DD. The computer-readable storage medium as paragraph BB recites, wherein a parameter among the at least one parameter inversely correlates to resolution of the frame relative to the sum of absolute transformed differences.

EE. The computer-readable storage medium as paragraph Y recites, wherein the complexity sum value references a complexity sum value of another frame having a same resolution as a resolution of the current frame, and the frame number references a frame number of another frame having a same resolution as a resolution of the current frame.

FF. The computer-readable storage medium as paragraph Y recites, wherein the complexity sum value references a complexity sum value of another frame having a different resolution from a resolution of the current frame, and the frame number references a frame number of another frame having a different resolution from a resolution of the current frame.

GG. The computer-readable storage medium as paragraph Y recites, wherein the quantization step is calculated according to a base complexity specific to the current frame.

HH. The computer-readable storage medium as paragraph GG recites, wherein the base complexity is calculated from an MB count specific to the current frame.

II. The computer-readable storage medium as paragraph HH recites, wherein the base complexity is further calculated from a parameter correlating to variation in frame resolution over the sequence.

JJ. The computer-readable storage medium as paragraph HH recites, wherein the operations further comprise deriving the MB count from at least a cumulative number of MBs over the sequence up to the current frame.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.

Claims

1. A method comprising:

obtaining a sequence comprising a plurality of frames, at least some frames of the plurality of frames having a resolution different from at least some other frames of the plurality of frames;

calculating a complexity value of a current frame of the plurality of frames according to at least one parameter calculated during performing motion prediction on the current frame, the complexity value being cumulative over at least some frames of the sequence having different resolutions; and

calculating a quantization step of the current frame according to the complexity value of the current frame.

2. The method of claim 1, wherein performing motion prediction on the current frame references a reference frame having a resolution different from a resolution of the current frame, and further comprises one of: resizing the reference frame to a resolution matching the resolution of the current frame, and resizing both the reference frame and the current frame to a common resolution.

3. The method of claim 1, wherein the at least one parameter comprises a sum of absolute transformed differences of the current frame calculated according to performing motion prediction on the current frame.

4. The method of claim 1, wherein the complexity sum value is calculated from at least one frame-specific parameter that reduces resolution-dependence of the complexity sum value.

5. The method of claim 4, wherein a parameter among the at least one parameter inversely correlates to a difference between resolutions of the current frame and the reference frame.

6. The method of claim 4, wherein a parameter among the at least one parameter inversely correlates to resolution of the frame relative to the sum of absolute transformed differences.

7. The method of claim 1, wherein the complexity sum value references a complexity sum value of another frame having a same resolution as a resolution of the current frame, and the frame number references a frame number of another frame having a same resolution as a resolution of the current frame.

8. The method of claim 1, wherein the complexity sum value references a complexity sum value of another frame having a different resolution from a resolution of the current frame, and the frame number references a frame number of another frame having a different resolution from a resolution of the current frame.

9. The method of claim 1, wherein the quantization step is calculated according to a base complexity specific to the current frame.

10. The method of claim 9, wherein the base complexity is calculated from an MB count specific to the current frame.

11. The method of claim 10, wherein the base complexity is further calculated from a parameter correlating to variation in frame resolution over the sequence.

12. The method of claim 10, wherein the MB count is derived from at least a cumulative number of MBs over the sequence up to the current frame.

13. A system comprising:

one or more processors; and

memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including:

an encoder module configured to obtain a sequence comprising a plurality of frames, at least some frames of the plurality of frames having a resolution different from at least some other frames of the plurality of frames, and perform motion prediction on the current frame, and

a rate control module, the rate control module further comprising:

a complexity calculating submodule configured to calculate a complexity value of a current frame of the plurality of frames according to at least one parameter calculated during performing motion prediction on the current frame, the complexity value being cumulative over at least some frames of the sequence having different resolutions; and

a quantization step calculating submodule configured to calculate a quantization step of the current frame according to the complexity value of the current frame.

14. The system of claim 13, wherein the encoder module is configured to perform motion prediction on the current frame by referencing a reference frame having a resolution different from a resolution of the current frame, and by one of: resizing the reference frame to a resolution matching the resolution of the current frame, and resizing both the reference frame and the current frame to a common resolution.

15. The system of claim 13, wherein the at least one parameter comprises a sum of absolute transformed differences of the current frame, and the rate control module further comprises a SATD calculating submodule configured to calculate the sum of absolute transformed differences according to performing motion prediction on the current frame.

16. The system of claim 13, wherein the complexity sum calculating submodule is configured to calculate the complexity sum value from at least one frame-specific parameter that reduces resolution-dependence of the complexity sum value.

17. The system of claim 13, wherein the complexity sum value references a complexity sum value of another frame having a same resolution as a resolution of the current frame, and the frame number references a frame number of another frame having a same resolution as a resolution of the current frame.

18. The system of claim 13, wherein the quantization step is calculated according to a base complexity specific to the current frame.

19. The system of claim 18, wherein the rate control module further comprises a base complexity calculating submodule configured to calculate base complexity from an MB count specific to the current frame.

20. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising:

obtaining a sequence comprising a plurality of frames, at least some frames of the plurality of frames having a resolution different from at least some other frames of the plurality of frames;

calculating a complexity value of a current frame of the plurality of frames according to at least one parameter calculated during performing motion prediction on the current frame, the complexity value being cumulative over at least some frames of the sequence having different resolutions; and

calculating a quantization step of the current frame according to the complexity value of the current frame.