US20150016513A1 - Picture-level rate control for video encoding - Google Patents
- Publication number
- US20150016513A1 (application US 14/503,158)
- Authority
- US
- United States
- Prior art keywords
- picture
- current picture
- pictures
- bitcount
- complexity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
-
- H04N19/00096—
-
- H04N19/00296—
-
- H04N19/00448—
-
- H04N19/00721—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/152—Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/18—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/36—Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Definitions
- Embodiments of the invention are related to video encoding and more particularly to a high quality rate controller for various video coding environments.
- Digital signal compression is widely used in many multimedia applications and devices.
- Digital signal compression using a coder/decoder (codec) allows streaming media, such as audio or video signals, to be transmitted over the Internet or stored on compact discs.
- codec coder/decoder
- a number of different standards of digital video compression have emerged, including H.261, H.263; DV; MPEG-1, MPEG-2, MPEG-4, VC1; and AVC (H.264).
- These standards, as well as other video compression technologies, seek to efficiently represent a video frame picture by eliminating the spatial and temporal redundancies in the picture and among successive pictures.
- video contents can be carried in highly compressed video bit streams, and thus efficiently stored in disks or transmitted over networks.
- MPEG-4 AVC Advanced Video Coding
- H.264 is a video compression standard that offers significantly greater compression than its predecessors.
- the H.264 standard is expected to offer up to twice the compression of the earlier MPEG-2 standard.
- the H.264 standard is also expected to offer improvements in perceptual quality. As a result, more and more video content is being delivered in the form of AVC(H.264)-coded streams.
- Two rival DVD formats, the HD-DVD format and the Blu-Ray Disc format, support H.264/AVC High Profile decoding as a mandatory player feature.
- AVC(H.264) coding is described in detail in “Draft of Version 4 of H.264/AVC (ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 part 10) Advanced Video Coding)” by Gary Sullivan, Thomas Wiegand and Ajay Luthra, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 14th Meeting: Hong Kong, CH 18-21 January, 2005, the entire contents of which are incorporated herein by reference for all purposes.
- JVT Joint Video Team
- Video signal coding often involves situations in which video is to be encoded at a given bit rate, a given frame rate and a given buffer size.
- Rate-control schemes have been developed to address these issues. These rate-control schemes can be classified into two major categories: constant-bit-rate (CBR) control for the constant-channel-bandwidth video transmission and variable-bit-rate (VBR) control for the variable-channel-bandwidth video transmission. These rate-control schemes can be further classified according to the unit of rate-control operation, e.g., macroblock-, slice-, or frame-layer rate control. Rate-control schemes determine how to allocate proper bits to each coding unit according to the buffer status and how to adjust an encoder quantization parameter (QP) to properly encode each unit with the allocated bits.
- QP encoder quantization parameter
- FIG. 1 is a block diagram illustrating the rate control performed in four stages.
- FIG. 2 is a block diagram illustrating the rate control of the present invention.
- FIG. 3 is a block diagram of Target Bits Estimator in the stage 2 of the rate control.
- FIG. 4 is a block diagram of the QP controller in the stage 2 of the rate control of the present invention.
- FIG. 5 is a block diagram illustrating an apparatus for implementing video coding using picture level rate control according to an embodiment of the present invention.
- FIG. 6 is a block diagram illustrating an example of an alternative apparatus for implementing video coding using picture level rate control according to an embodiment of the present invention.
- FIG. 7 is a block diagram of a computer readable medium containing computer readable instructions for implementing picture level rate control in accordance with an embodiment of the present invention.
- Embodiments of the invention are related to a high quality rate controller for various video coding environments, including multi-processor architecture. Embodiments of the invention provide a more accurate and effective rate distortion model for smoother quantization parameter transition to provide more stable perceptive experience.
- a rate controller is able to generate a series of proper quantization parameters, one of which is for each picture frame of a video sequence to meet a target bitrate and a target visual quality.
- the proposed rate control algorithm aims to be applicable for various coding conditions, ranging from different target bit rates, frame resolutions, buffer restrictions, memory limitation, constant/variable bitrates, processor architectures, etc.
- the rate control algorithm described herein has been proposed and developed to control video bitrate and video quality imposed by the requirement of a user's applications.
- a rate controller is an essential component of a complete video coding system. Given a pre-specified coding condition, a rate controller may generate a series of proper quantization parameters, each of which is for a corresponding picture or frame of a video sequence to meet the target bitrate and the target visual quality.
- the proposed rate control algorithm 100 may be described in terms of four stages, identified as Stage 1, Stage 2, Stage 3 and Stage 4 as shown in FIG. 1 .
- Stage 1 is mainly used to set up an initial status of a rate control data buffer 102 .
- the rate control data buffer 102 is configured to store data that are relevant to the rate control algorithm.
- data may include statistical data 103 such as a number of bits for one or more previously encoded pictures, a complexity determined from sequence information 105 relating to one or more previously encoded pictures and/or the current picture, a quantization parameter estimated, e.g., from one or more previously encoded pictures and other relevant data.
- the statistical data may also include distortions computed by comparing reconstructions of encoded pictures to the corresponding original pictures. By way of example, distortion may be measured as a sum of squared errors between an original picture and a reconstructed picture. Distortion may also be measured between corresponding sub-units of a picture, such as blocks, macroblocks, slices, etc.
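The sum-of-squared-errors distortion measure mentioned above can be sketched as follows. This is a minimal illustration only: the function names and the list-of-rows picture layout are assumptions made for the example, not something specified in the text.

```python
def sum_squared_error(original, reconstructed):
    """Sum of squared sample differences between two equally sized pictures.

    Each picture is a list of rows; each row is a list of integer samples.
    (This layout is an assumption made for illustration.)
    """
    if len(original) != len(reconstructed):
        raise ValueError("pictures must have the same number of rows")
    total = 0
    for row_o, row_r in zip(original, reconstructed):
        if len(row_o) != len(row_r):
            raise ValueError("rows must have the same length")
        total += sum((o - r) ** 2 for o, r in zip(row_o, row_r))
    return total


def block_sse(original, reconstructed, top, left, size=16):
    """SSE restricted to one size x size sub-unit, e.g., a macroblock."""
    rows_o = [row[left:left + size] for row in original[top:top + size]]
    rows_r = [row[left:left + size] for row in reconstructed[top:top + size]]
    return sum_squared_error(rows_o, rows_r)
```

The same helper serves both the picture-level and the sub-unit-level measurements, since a sub-unit is simply a cropped region of the two pictures.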
- the sequence information 105 may include, e.g., a frame rate and/or a bitrate for a particular group of pictures (GOP).
- In Stage 1, sometimes referred to herein as the initialization stage, a source picture 107 is input to be re-ordered based on a user-specified coding pattern and video detection results.
- parameters may be reset and memory and buffer space may be allocated.
- the source picture 107 to be encoded may be examined for its complexity.
- In Stage 2, a picture-level quantization parameter (QP) is derived based on the data collected in the rate control data buffer 102 and the source video frame.
- the picture-level QP may be derived based on the picture type of the source picture 107 , a complexity of the source picture 107 , an estimated target number of bits and an underlying rate distortion model. Other factors such as picture distortion, buffer fullness, and a QP clipping scheme with a previously coded frame may also be taken into account to determine the final QP for the source picture 107 .
- In Stage 3, the final QP determined in Stage 2 is sent to one or more main coding modules 104 for encoding of the source picture 107 .
- Each coding module 104 may implement typical picture coding functions, such as intra search and mode decision.
- Stage 3 may be implemented, e.g., by passing the QP to a calling function that actually encodes the video frame.
- the resulting encoded picture 109 may be stored in a coded picture buffer CPB. Any suitable coding method may be used in implementing stage 3.
- In Stage 4, statistical data are collected and updated in the rate control data buffer 102 .
- the encoded bit stream corresponding to the encoded source picture 109 is examined for its size, and the distortion between the pixels of the original source picture 107 and the pixels of its reconstruction from the encoded source picture 109 is calculated and recorded.
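The four stages described above can be sketched as a simple loop. This is only a structural outline under stated assumptions: `derive_qp`, `encode` and `measure_distortion` are hypothetical callbacks standing in for the QP derivation, the main coding modules and the distortion measurement, and the dictionary layout of the statistics is invented for the example.

```python
def run_rate_control(pictures, derive_qp, encode, measure_distortion):
    """Drive the four-stage loop over a sequence of source pictures.

    derive_qp(picture, stats)         -> QP for the picture        (Stage 2)
    encode(picture, qp)               -> (bitstream, reconstruction) (Stage 3)
    measure_distortion(picture, recon) -> distortion value          (Stage 4)
    All three callbacks are hypothetical placeholders.
    """
    stats = []  # Stage 1: initialise the rate control data buffer
    for picture in pictures:
        qp = derive_qp(picture, stats)                   # Stage 2
        bitstream, reconstruction = encode(picture, qp)  # Stage 3
        stats.append({                                   # Stage 4
            "qp": qp,
            "bits": 8 * len(bitstream),
            "distortion": measure_distortion(picture, reconstruction),
        })
    return stats
```

Keeping the statistics in one buffer that every stage reads from or writes to mirrors the role the text gives to the rate control data buffer.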
- the statistical data 103 stored in a rate control data buffer and its interaction with the functional blocks of a rate controller 200 are shown in FIG. 2 .
- sequence-level information 105 may be used to define pre-specified constants and variables.
- buffer management and connection of the rate controller 200 with other major threads may be established at this stage.
- a target bit estimator 106 estimates a target bitcount for the current picture, frame, or field.
- the estimator uses source picture information 113 , e.g., the input source pixels, the input picture type and optional information from ME phase one, together with the rate control data buffer 102 , to estimate a target bitcount 115 .
- a special clipping mechanism may be used to reduce the potential risk of buffer overflow.
- ME phase one refers to a first phase of motion estimation operation.
- motion estimation may be implemented in two phases, the first of which is sometimes known as ME phase one.
- ME phase one obtains somewhat less accurate, but nonetheless adequate, motion information at a relatively low computation cost. This information is up to date (e.g., it reflects the current picture) and is used to assist target bit allocation.
- Pre-specified parameters 117 are used to compute an initial bit budget 119 .
- Examples of pre-specified parameters include, e.g., the size of the sliding window in units of Groups of Pictures (GOPs).
- the bits in one or more GOPs e.g., 4 GOPs
- the initial bit budget is 4,000,000 bits in the sliding window.
- a bit budget updater 108 updates the initial bit budget 119 based on the number of bits 121 collected from the one or more previous pictures.
- the resulting updated bit budget 123 provides an input to a bitcount distributor 110 .
- a smaller window size gives tighter bit rate control and may yield better bitrate convergence, but at the cost of larger QP fluctuation, resulting in unstable video quality.
- a larger window size tends to have more stable quality since the rate controller has more flexibility in the bit budget to adjust the target bit count 115 based on a longer-term projection.
- the drawback of a larger window is its slower convergence, resulting in less accuracy in meeting the target bitrate.
- Two extreme cases are a sliding window with one frame size (e.g., 1/30 sec in the above example) and a sliding window with the total number of picture frames to be encoded.
- the next task for the target bits estimator 106 is to determine how to allocate the target bit count 115 to the current picture frame 107 .
- the easiest way in the above example is to equally distribute 24 Mbits among these 120 frames.
- this method may suffer from an inefficient distribution due to ignorance of the coding characteristics of different coding picture types (e.g., Intra picture (I-picture), Predictive picture (P-picture), and Bi-predictive picture (B-picture)), and content variations among the different pictures in the 120 frames.
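As a point of reference, the equal split mentioned above is plain integer arithmetic. The sketch below uses the 24 Mbit and 120 frame figures from the text; the function name is an assumption made for the example.

```python
def equal_share_target(window_bits, frames_in_window):
    """Baseline allocation: every frame gets the same share of the window budget.

    This ignores picture type and content, which is exactly the weakness
    the text points out.
    """
    if frames_in_window <= 0:
        raise ValueError("window must contain at least one frame")
    return window_bits // frames_in_window


# 24 Mbits spread evenly over 120 frames
per_frame_bits = equal_share_target(24_000_000, 120)
```

With these figures each frame is given 200,000 bits regardless of whether it is an I-, P- or B-picture, which is why a type- and complexity-aware distribution is preferred.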
- I-picture Intra picture
- P-picture Predictive picture
- B-picture Bi-predictive picture
- the target bits estimator 106 may include a complexity calculator 112 that computes a complexity value 125 for the current picture 107 .
- the complexity calculator may calculate a complexity for the current picture 107 based on a current picture type, a current complexity and one or more past complexities for previously encoded frames.
- the complexity calculator 112 may also take into account the content complexity, actual bit usage, and actual distortion. By way of example, and not by way of limitation, three different cases to determine the target bit count for a picture frame are discussed below.
- a simple variance is but one example, among others, of a representation of picture complexity.
- a sophisticated representation may alternatively be desired.
- an average variance of a macroblock in a picture may be used.
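One possible reading of "average variance of a macroblock" is sketched below. The 16x16 default block size, the list-of-rows picture layout, the use of population variance and the choice to skip partial blocks at the picture edges are all assumptions made for the example.

```python
def macroblock_variance(picture, top, left, size=16):
    """Population variance of the samples in one size x size block."""
    samples = [picture[y][x]
               for y in range(top, top + size)
               for x in range(left, left + size)]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def average_macroblock_variance(picture, size=16):
    """Mean of the per-block variances over all complete blocks.

    Partial blocks at the right and bottom edges are ignored
    (an assumption made to keep the sketch short).
    """
    height, width = len(picture), len(picture[0])
    variances = [macroblock_variance(picture, top, left, size)
                 for top in range(0, height - size + 1, size)
                 for left in range(0, width - size + 1, size)]
    if not variances:
        raise ValueError("picture is smaller than one block")
    return sum(variances) / len(variances)
```

A flat picture yields a complexity of zero, while textured content raises the value, which is the behaviour a complexity measure needs for bit allocation.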
- the current picture 107 is a picture with a scene change.
- the current picture 107 is a regular I picture
- the current picture is a regular P picture.
- the rate controller may simply assign a QP (from its associated reference frame) plus some constant K.
- the constant K may be determined solely or partially by an up-to-date Coded Picture Buffer (CPB) fullness.
- CPB Coded Picture Buffer
- $N_i$ is the number of I pictures in a sliding window
- $N_p$ is the number of P pictures in the window
- $N_b$ is the number of B pictures in the window.
- $R_i$, $R_p$ and $R_b$ are the actual bit usages for pictures of type I, P and B, respectively.
- $r_i$, $r_p$ and $r_b$ are estimated bit counts for pictures of picture type I, P and B, respectively.
- $D_i$, $D_p$, and $D_b$ denote the distortion
- $M_i$, $M_p$, and $M_b$ denote the complexity for I, P, and B pictures respectively.
- the following prophetic example is a sample scenario of a series of actual bit usages in encoding a video sequence from time t to t+6.
- the sequence of picture types leading up to the current picture is as follows: I, P, B, B, P, B, B, k, where k denotes the current picture 107 , which may be, e.g., a picture with scene change (in case 1), or a regular I picture (in case 2), or a regular P picture (in case 3).
- Rate: $R_i(t)$, $R_p(t+1)$, $R_b(t+2)$, $R_b(t+3)$, $R_p(t+4)$, $R_b(t+5)$, $R_b(t+6)$, $r_k(t+7)$.
- the bit budget in a sliding window that starts at time t is denoted WB.
- the window includes all frames from time t up to the picture before the current picture 107 .
- the estimated bit usage and complexity for the current picture are denoted $r_k$ and $M_k$ respectively below.
- the window may have any suitable number of pictures which may be of arbitrary type.
- the target bit count 115 for the current picture (i.e., r k (t+7)) may be calculated as:
- AR k is an average actual bit count over all k pictures back to the most recent scene change I picture, exclusively, and;
- AM k is an average complexity over all k pictures back to the most recent scene change I picture, exclusively, where k is the picture type for the current picture, e.g., either I, P or B picture type.
- the target bit count may be derived by calculating a ratio of the distortion, actual bit usage and picture complexity between the latest I-picture and the latest P-picture.
- the most recent P-picture before the current picture is the P-picture at t+4.
- the current picture 107 is a regular I-picture, it may be assumed that the current I-picture is similar to the most recent I-picture in terms of content characteristics. Consequently, only a minor fine-tuning of the bit rate is needed. Otherwise a scene change I-picture for the current frame is recorded.
- $r_k(t+7) = [R_i(t)/R_p(t+4)] \cdot [D_i(t)/D_p(t+4)] \cdot [M_i(t)/M_p(t+4)] \cdot R_p(t+4)$.
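The ratio-based estimate for a regular I-picture can be written out directly. The sketch below is a straightforward transcription of the formula above; the function and parameter names are assumptions made for the example, and the inputs are taken to be the statistics of the latest I-picture and the latest P-picture.

```python
def target_bits_regular_i(r_i, r_p, d_i, d_p, m_i, m_p):
    """Target bit count for a regular I-picture.

    r_*: actual bit usage, d_*: distortion, m_*: complexity of the latest
    I-picture (suffix _i) and the latest P-picture (suffix _p).
    """
    if min(r_p, d_p, m_p) <= 0:
        raise ValueError("P-picture statistics must be positive")
    return (r_i / r_p) * (d_i / d_p) * (m_i / m_p) * r_p


# Illustrative numbers only (not taken from the text):
# I used 4x the bits of P, half the distortion, 1.5x the complexity.
estimate = target_bits_regular_i(400_000, 100_000, 50, 100, 300, 200)
```

Because the three ratios multiply, a change in any one of bit usage, distortion or complexity scales the estimate proportionally.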
- the estimated bit count r k may be computed as follows:
- $r_k(t+7) = WB(t+6) \cdot [R_p(t+4)/M_p(t+4)] / [N_i \cdot R_i(t)/M_i(t) + N_p \cdot R_p(t+4)/M_p(t+4) + N_b \cdot R_b(t+6)/M_b(t+6)]$.
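The window-budget formula for a regular P-picture can be sketched as below. Names are assumptions made for the example. Note one interpretive choice: the P-picture term in the denominator is weighted by the number of P pictures in the window, which is how the symbol definitions read; treat this as an assumption about the printed formula.

```python
def target_bits_regular_p(window_bits, n_i, n_p, n_b,
                          r_i, m_i, r_p, m_p, r_b, m_b):
    """Share of the sliding-window budget assigned to a regular P-picture.

    Each picture type contributes (count * bits / complexity); the P-picture
    receives the budget in proportion to its own bits-per-complexity cost.
    """
    p_cost = r_p / m_p
    total_cost = n_i * r_i / m_i + n_p * p_cost + n_b * r_b / m_b
    if total_cost <= 0:
        raise ValueError("window statistics must be positive")
    return window_bits * p_cost / total_cost


# Illustrative numbers only: 1 I, 2 P and 4 B pictures in the window.
estimate_p = target_bits_regular_p(4_000_000, 1, 2, 4,
                                   400_000, 100, 200_000, 100, 50_000, 100)
```

With these numbers the per-type costs are 4000, 2000 and 500, so a single P-picture is assigned 2000/10000 of the window budget.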
- the bitcount distributor 110 may adjust the final target bit count 115 according to CPB status and bitrate accuracy to reduce the risk of CPB overflow and underflow.
- the target bits estimator 106 may include a buffer regulator 116 that provides relevant CPB status information to the bit count distributor 110 for adjustment of the target bit count 115 .
- the target bit count 115 may be adjusted differently depending on whether the buffer is approaching overflow or underflow.
- $CPB_{full} = CPB_{curr}/CPB_{size}$
- a potential CPB overflow may exist when CPB full is increasing and is above a pre-defined upper threshold CPB max .
- the coded picture buffer CPB may be approaching a potential CPB underflow situation if CPB fullness is moving downward and is below a pre-defined lower threshold CPB min .
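The overflow and underflow conditions above depend on both the level and the direction of the CPB fullness. A minimal sketch follows; the function names, the string return values and the example thresholds are assumptions made for illustration.

```python
def cpb_fullness(cpb_curr, cpb_size):
    """Fraction of the coded picture buffer currently occupied."""
    if cpb_size <= 0:
        raise ValueError("CPB size must be positive")
    return cpb_curr / cpb_size


def cpb_risk(prev_fullness, fullness, cpb_min, cpb_max):
    """Classify the buffer state from its level and its trend.

    Returns "overflow" when fullness is rising and above cpb_max,
    "underflow" when it is falling and below cpb_min, otherwise "ok".
    """
    if fullness > prev_fullness and fullness > cpb_max:
        return "overflow"
    if fullness < prev_fullness and fullness < cpb_min:
        return "underflow"
    return "ok"
```

Checking the trend as well as the threshold avoids reacting to a buffer that is above the upper threshold but already draining.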
- the target bit count 115 i.e., r k
- VBR variable bitrate
- $CPB_{cushion} = (CPB_{size} - CPB_{curr}) / target\_bitrate$ (the encoder's target bitrate), and $0.0 \le incr\_\% \le 1.0$.
- sec_threshold refers to a threshold value for the CPB cushion in units of time (e.g., seconds)
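The CPB cushion is the remaining buffer space expressed as time at the target bitrate, which is why it is compared against a threshold in seconds. The sketch below computes only the cushion; the function name and the example figures are assumptions, and the rule that combines the cushion with sec_threshold and incr_% is not reproduced here.

```python
def cpb_cushion_seconds(cpb_size, cpb_curr, target_bitrate):
    """Free CPB space divided by the encoder's target bitrate, in seconds."""
    if target_bitrate <= 0:
        raise ValueError("target bitrate must be positive")
    return (cpb_size - cpb_curr) / target_bitrate


# Illustrative numbers only: 2 Mbit buffer, 0.5 Mbit occupied, 1 Mbit/s target.
cushion = cpb_cushion_seconds(2_000_000, 500_000, 1_000_000)
```

With these figures the cushion is 1.5 seconds, i.e., the buffer could absorb 1.5 seconds of output at the target bitrate before filling.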
- the target bits estimator 106 sends the target bitcount to a QP controller 114 , which then uses the target bitcount 115 along with distortion and source pixel information in rate control data buffer 102 to derive the QP.
- the QP controller 114 may derive the QP as shown in FIG. 4 .
- the QP controller 114 may implement a complexity function that computes a complexity factor based on a target complexity, an average complexity over a window of two or more previous frames, and a complexity stabilizer factor.
- the QP controller 114 may implement a distortion function that computes a distortion factor based on a distortion for a previous frame, an average distortion taken over a window of two or more previous frames and a distortion stabilizer factor.
- the QP controller may implement a function that computes an estimated bitcount based on a target bitcount for the current frame 107 , an average bitcount taken over a window of two or more previous frames and a bitcount stabilizer factor.
- the QP controller 114 depicted in FIG. 4 may include functional blocks (f( )) that compute the complexity, distortion and bitcount. Each functional block may receive one or more stabilizer factors as inputs.
- the stabilizers may be used to reduce large fluctuations in complexity, bit count, and distortion.
- the QP controller 114 may assign either constant or adaptive values to stabilizer terms S1 and S2, so that it can obtain a more stable value of the estimated bitcount A than might be obtained by a simple ratio, e.g., B/C. If the values of the stabilizers are chosen properly, they tend to stabilize the value of (B+S1)/(C+S2).
- Similar stabilizer terms may be used to stabilize similar computations of the complexity factor and distortion factor.
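The effect of the stabilizer terms can be seen with a small numeric sketch. The function name and the example values of S1 and S2 are assumptions chosen for illustration; the text leaves the choice of constant or adaptive values open.

```python
def stabilized_ratio(b, c, s1, s2):
    """Compute (B + S1) / (C + S2) instead of the raw ratio B / C."""
    if c + s2 == 0:
        raise ValueError("stabilized denominator must be non-zero")
    return (b + s1) / (c + s2)


# Raw ratio jumps from 5.0 to 10.0 when C drops from 2 to 1.
raw_before, raw_after = 10 / 2, 10 / 1

# With S1 = 90 and S2 = 18 (so S1 / S2 matches the typical ratio of 5),
# the same change in C moves the result much less.
stable_before = stabilized_ratio(10, 2, 90, 18)
stable_after = stabilized_ratio(10, 1, 90, 18)
```

Choosing S1/S2 close to the expected ratio pulls the result toward that value, so a single outlier in B or C produces a smaller swing.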
- the QP controller 114 may include a QP Modulator that determines a raw QP value based on the bitcount, distortion and complexity factors.
- the QP controller may further include clipping mechanism 118 that restricts the range of the resulting QP value.
- the proposed rate controller 200 may derive the QP by considering the interaction of the following major factors: picture type, picture complexity, picture distortion and target bitcount 115 . With these factors and their interaction relation, the following approach may be used.
- the task of the QP controller 114 is to derive the final QP value 127 based on the target bit budget calculated by the bit count distributor 110 .
- QP controller 114 is one of the key components in the rate controller 200 .
- the QP controller 114 has a direct impact on bit count and visual quality. To achieve the best quality, an iterative approach may be used to minimize distortion by finding the best QP. However, this may be inefficient. In embodiments of the present invention, by contrast, the goal is to achieve reasonably good visual quality in a more efficient manner.
- the QP controller 114 derives a QP that is initially based on a picture type for the current picture 107 .
- Different picture types have different methods to derive the corresponding QPs.
- five different cases may be considered: (1) the very first IDR picture of the video sequence, (2) an IDR picture with scene change, (3) a regular IDR and I picture, (4) regular P picture, and (5) non-reference B picture.
- an IDR picture (or IDR frame) is a special type of I picture (or I frame).
- the main difference is that when an encoder assigns an IDR to a picture/frame, all of the reference frame information in the frame buffer is discarded. Consequently, those reference frames cannot be used in subsequent encoding.
- the QP in the case of the first IDR picture in a video sequence may be derived based on the complexity, the coding conditions, and some general assumptions.
- the basic idea is to find out the relation between I-picture and P-picture, and P-picture and B-picture in terms of coding bits complexity.
- the values of ratio p and ratio b may be calculated as follows:
- $ratio_b = picture\_complexity \cdot ratio_p$.
- picture complexity refers to the complexity for the current picture since, in this example, the current picture is the first picture in a sequence.
- a simple first order RD model may be applied to obtain the quantization value (referred to herein as an actual QP).
- this quantization value may be quite different from the final QP (referred to herein as a syntax QP, which is a syntax element embedded in a bitstream), since the actual QP is the value that is really used in a quantizer.
- the formula $QP_{syntax} = 6.0 \cdot \log_{10}(QP_{actual}) / \log_{10}(2.0)$ may be used.
- the resulting value of $QP_{syntax}$ may be clipped to a pre-defined range between a minimum value $QP_{min}$ and a maximum value $QP_{max}$ to produce the final QP value 127 .
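The conversion from an actual QP to a syntax QP, followed by clipping, can be sketched as below. The formula is the one given above (six times the base-2 logarithm of the actual QP). The default limits of 0 and 51 and the rounding to the nearest integer are assumptions made for the example; the text only says the range is pre-defined.

```python
import math


def syntax_qp_from_actual(qp_actual, qp_min=0, qp_max=51):
    """Convert a quantizer-domain QP to a clipped syntax QP.

    qp_min and qp_max default to 0 and 51 here as an assumption;
    rounding to the nearest integer is also an assumption.
    """
    if qp_actual <= 0:
        raise ValueError("actual QP must be positive")
    qp_syntax = 6.0 * math.log10(qp_actual) / math.log10(2.0)
    return max(qp_min, min(qp_max, round(qp_syntax)))
```

A consequence of the formula is that doubling the actual QP raises the syntax QP by 6, so the syntax QP moves on a logarithmic scale.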
- the new QP may be derived based on the statistical data 103 including average complexity, average bit usage and average QP from all of its previous I-pictures up to the previously closest IDR with scene change.
- the old R/M ratio may be determined from $R_{k-1}/M_{k-1}$, where $R_{k-1}$ and $M_{k-1}$ are the actual bit usage and complexity for the frame preceding the current frame 107 .
- the new actual QP value may be determined according to:
- the new actual QP value may be converted to a new syntax QP value as discussed above.
- the new QP value may be very different from the QP value for the immediately preceding frame if the current frame 107 is a scene change frame.
- the QP clipping mechanism 118 may calculate a complexity difference from the previous frame. The clipping mechanism 118 may then define a range of QP change to forcefully limit the QP change.
- the following clipping scheme may be used.
- $QP_{range} = multiplier \cdot \max(M_k, M_{k-1}) / \min(M_k, M_{k-1})$, where $M_{k-1}$ is the complexity for the frame immediately preceding the current frame.
- the multiplier may be a constant value determined empirically.
- a multiplier having a constant value of 2 may be used.
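The complexity-based QP range can be sketched as follows, using the multiplier of 2 mentioned above. How the range is applied is not spelled out in this text, so `limit_qp_change` assumes, purely for illustration, a symmetric window around the previous frame's QP; both function names are hypothetical.

```python
def qp_change_range(m_k, m_prev, multiplier=2.0):
    """QP range = multiplier * max(M_k, M_prev) / min(M_k, M_prev)."""
    if min(m_k, m_prev) <= 0:
        raise ValueError("complexities must be positive")
    return multiplier * max(m_k, m_prev) / min(m_k, m_prev)


def limit_qp_change(new_qp, prev_qp, m_k, m_prev, multiplier=2.0):
    """Clamp new_qp to [prev_qp - range, prev_qp + range].

    The symmetric window around prev_qp is an assumption made for
    this sketch, not a rule taken from the text.
    """
    qp_range = qp_change_range(m_k, m_prev, multiplier)
    return max(prev_qp - qp_range, min(prev_qp + qp_range, new_qp))
```

The range grows with the complexity ratio, so a large content change permits a larger QP jump while similar frames are held to a change of about the multiplier.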
- the final QP value 127 may be restricted to the range of:
- the QP controller 114 may work directly on the value of $QP_{syntax}$. Because the picture is a regular frame, no noticeable change in video characteristics is implied (otherwise a scene change would have been recorded). To maintain a relatively steady value of $QP_{syntax}$, a logarithmic operation may be applied to the ratio of actual bit count to complexity. The following RD formula may be used to derive the value of $QP_{syntax}$ for the current frame 107 (denoted $QP_k$) from the value of $QP_{syntax}$ for the previous frame (denoted $QP_{k-1}$).
- the final value of QP syntax for the current frame 107 may be computed as follows.
- $QP_{k-1}$ is the value of $QP_{syntax}$ for the frame preceding the current frame 107 .
- the QP controller 114 may maintain a steady value of QP syntax by logarithmically operating on the value of QP actual .
- the new actual QP value for the current frame (denoted $QP_{actual\_k}$) may be derived as
- $QP_{actual\_k}$ may then be converted into a $QP_{syntax}$ value as described above.
- if the current picture 107 is a regular B picture, i.e., a non-reference B picture, no error will be propagated.
- a constant QP may therefore be obtained by simply adding +2 to the syntax QP of its previous reference frame. This situation also provides an opportunity for parallel encoding since there is, in general, no dependency between any two consecutive B pictures. The lack of data dependency between pictures serves as an entry point for parallelizing the encoding process. B-picture coding within two reference pictures can be performed in parallel.
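The B-picture rule and the parallelism it enables can be sketched together. The +2 offset is from the text; the upper clip at 51, the use of a thread pool and the `encode` callback are assumptions made for the example, with `encode(picture, qp)` standing in for whatever routine actually codes a picture.

```python
from concurrent.futures import ThreadPoolExecutor


def b_picture_qp(reference_syntax_qp, offset=2, qp_max=51):
    """Syntax QP for a non-reference B picture: reference QP plus an offset.

    Clipping at qp_max is an assumption added for safety.
    """
    return min(qp_max, reference_syntax_qp + offset)


def encode_b_pictures_in_parallel(b_pictures, reference_syntax_qp, encode):
    """Encode the B pictures between two reference pictures concurrently.

    `encode(picture, qp)` is a hypothetical callback. Because non-reference
    B pictures do not depend on each other, they can be dispatched together.
    """
    qp = b_picture_qp(reference_syntax_qp)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order in its results
        return list(pool.map(lambda pic: encode(pic, qp), b_pictures))
```

Since the QP is known as soon as the reference frame's QP is fixed, no rate control state has to be shared between the parallel workers.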
- the rate control algorithm may simply return the QP to its calling function.
- the rate control collects the actual bit usage (texture bits and overhead bits might be separated), the actual picture distortion, and actual buffer fullness, and updates this information in the rate control data buffer 103 .
- the process from Stage 2 through Stage 4 may be repeatedly performed in the course of video encoding for a series of video frames. It is noted that in embodiments of the present invention, the rate controller need only consider a target bit count for a reference picture (i.e., I-picture, P-picture or B-picture if it is used as a reference picture in a pyramid coding).
- FIG. 5 illustrates a block diagram of a computer apparatus 500 that may be used to implement parallel decoding of streaming data on three or more processors as described above.
- the apparatus 500 generally includes a plurality of processor modules 501 A, 501 B, 501 C and a memory 502 .
- the processor modules 501 A, 501 B and 501 C may be components of a Cell processor.
- the memory 502 may be in the form of an integrated circuit (e.g., RAM, DRAM, ROM, and the like).
- the memory 502 may also be a main memory that is accessible by all of the processor modules 501 .
- the processor modules 501 A, 501 B, 501 C may have associated local memories 505 A, 505 B, and 505 C.
- An encoder program 503 may be stored in the main memory 502 in the form of processor readable instructions that can be executed on the processor modules 501 .
- the encoder program 503 may be configured to encode video frame data utilizing the rate control algorithm, e.g., as described above with respect to FIG. 1 , FIG. 2 , FIG. 3 , and FIG. 4 .
- the encoder program may compute a QP value in a manner that takes picture type, picture complexity, picture distortion and target bitcount into account in determining the QP value.
- the program 503 may be written in any suitable processor readable language, e.g., C, C++, JAVA, Assembly, MATLAB, FORTRAN and a number of other languages.
- Rate control data 507 may be stored in the memory 502 , e.g., in a rate control buffer, as described above. Such rate control data may include statistical data relating to bit utilization, complexity, distortion, QP, etc., for a window of previous frames.
- portions of program code and/or data 507 may be loaded into the local stores 505 A, 505 B, and 505 C for parallel processing by the processor modules 501 A, 501 B, 501 C.
- the apparatus 500 may also include well-known support functions 510 , such as input/output (I/O) elements 511 , power supplies (P/S) 512 , a clock (CLK) 513 and cache 514 .
- the device 500 may optionally include a mass storage device 515 such as a disk drive, CD-ROM drive, tape drive, or the like to store programs and/or data.
- the device 500 may also optionally include a display unit 516 and user interface unit 518 to facilitate interaction between the apparatus 500 and a user.
- the display unit 516 may be in the form of a cathode ray tube (CRT) or flat panel screen that displays text, numerals, graphical symbols or images.
- the user interface 518 may include a keyboard, mouse, joystick, light pen or other device that may be used in conjunction with a graphical user interface (GUI).
- the apparatus 500 may also include a network interface 520 to enable the device to communicate with other devices over a network, such as the internet. These components may be implemented in hardware, software or firmware or some combination of two or more of these.
- FIG. 6 illustrates a possible configuration of a cell processor 600 .
- the cell processor 600 includes a main memory 602 , a single power processor element (PPE) 604 and eight synergistic processor elements (SPE) 606 .
- the cell processor 600 may be configured with any number of SPEs.
- the cell processor 600 may be characterized as a Cell Broadband Engine Architecture (CBEA)-compliant processor.
- In a CBEA-compliant architecture, multiple PPEs may be combined into a PPE group and multiple SPEs may be combined into an SPE group.
- the cell processor 600 is depicted as having only a single SPE group and a single PPE group with a single SPE and a single PPE.
- a cell processor can include multiple groups of power processor elements (PPE groups) and multiple groups of synergistic processor elements (SPE groups).
- CBEA-compliant processors are described in detail, e.g., in Cell Broadband Engine Architecture, which is available online at: http://www-306ibm.com/chips/techlib/techlib.nsf/techdocs/1AEEE1270EA2776 387257060006E61BA/$file/CBEA — 01_pub.pdf, which is incorporated herein by reference.
- the PPE 604 may be a 64-bit PowerPC Processor Unit (PPU) with associated caches.
- the PPE 604 may include an optional vector multimedia extension unit.
- Each SPE 606 includes a synergistic processor unit (SPU) and a local store (LS).
- the local store may have a capacity of e.g., about 256 kilobytes of memory for code and data.
- the SPUs are less complex computational units than the PPU, in that they typically do not perform any system management functions.
- the SPUs may have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by a PPE) in order to perform their allocated tasks.
- the SPUs allow the system 600 to implement applications that require a higher computational unit density and can effectively use the provided instruction set.
- a significant number of SPEs 606 in the system 600 managed by the PPE 604 , allows for cost-effective processing over a wide range of applications.
- the memory 602 , PPE 604 , and SPEs 606 may communicate with each other and with an I/O device 608 over a ring-type element interconnect bus 610 .
- the memory 602 may contain rate control data 603 having features in common with the rate control data 507 described above.
- the memory 602 may also store an encoder program 609 having features in common with the encoder program 503 described above.
- At least one of the SPEs 606 may include in its local store (LS) encoding instructions 605 and/or a portion of the rate control data and/or input video frame data that is to be processed in parallel, e.g., as described below.
- the PPE 604 may include in its L1 cache, code instructions 607 having features in common with the encoding program 503 described above. Instructions 605 and data 607 may also be stored in memory 602 for access by the SPE and PPE when needed.
- the rate control algorithm depicted in FIG. 1 and described further with respect to FIGS. 2-4 may be implemented on an apparatus of the type described with respect to FIG. 5 or FIG. 6 through a series of function calls.
- the Initialization Stage (Stage 1) may be implemented by calling a function referred to herein as PicRateCtrlInit( ).
- the PicRateCtrlInit( ) function may be called one time only by an encoder SPU main control thread of the encoder program 503 or 609 in the entire course of encoding.
- the PicRateCtrlInit( ) function may thus serve as an entry point to the rate control portion of the encoder program.
- the PicRateCtrlInit( ) function may return an error message.
- the rate control instance memory is the same as the amount of space available in the rate control buffer.
- the PicRateCtrlInit( ) function may also return an error message if the rate control instance memory is currently being used by a rate control instance. If no error condition exists, the PicRateCtrlInit( ) function may create a rate control handle and allocate memory accordingly based on input parameters.
- a rate control handle refers to a particular type of pointer commonly used in computer program implementations.
- a rate control handle is a pointer to a memory address at which a particular rate controller's data may be accessed.
- the inputs to PicRateCtrlInit( ) may include (1) an SPU thread configuration buffer, (2) test driver control parameters, (3) stream level configurations, and (4) frame level configurations.
- the output of the PicRateCtrlInit( ) function is a handle to Picture Rate Control Buffer 102 .
- the preparation stage (Stage 2 of FIG. 1 ) may be implemented by calling a function referred to herein as PicRateCtrlPrepare( ).
- the main task of this function is to derive a QP value based on the input data.
- the PicRateCtrlPrepare( ) function may be called at the beginning of encoding for each picture, and is the core of the rate control algorithm.
- the inputs to PicRateCtrlPrepare( ) may include a rate control handle, a frame level configuration, an input frame buffer, and the rate control data buffer.
- the PicRateCtrlPrepare( ) function may implement the following operations:
- the encoding stage (Stage 3) may be implemented by calling a PicRateCtrlEncode( ) function.
- the PicRateCtrlEncode( ) function may be called to obtain the final QP for a given picture.
- the PicRateCtrlEncode( ) function may be called to obtain a final QP value for a subsection of a picture (e.g., a slice or macroblock).
- embodiments of the invention may be extended to rate control at the macroblock level.
- the PicRateCtrlEncode( ) function may also call other functions that are conventionally used in encoding a video picture, e.g., functions for Network Abstraction Layer (NAL) coding, Video Coding Layer (VCL) encoding, and de-blocking.
- the encoding step may include a distortion calculation that is distributed and processed in parallel on multiple processors.
- the total distortion of a picture may be calculated on a section-by-section basis with distortion calculations for different sections of a picture performed in parallel using a different processor for each section.
- the distortion for each section may be calculated macroblock by macroblock by comparing the original pixels for the picture prior to encoding with the reconstructed pixels.
- the distortion calculation may be done before de-blocking to speed up the overall performance since there is no need to allocate one more data path from the deblocking thread to the main thread.
- the discrepancy of the distortion calculation based on the deblocked frame and the undeblocked frame for the rate controller has been determined experimentally to be negligible.
- the distortion in each macroblock of a picture section may be carried in the existing MB information container, which may be transferred to the server via DMA, so that the NAL coding thread may collect and calculate the overall distortion of the picture. This MB distortion also helps to further improve the picture quality if a macroblock-based rate control is employed.
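The section-by-section distortion calculation described above can be sketched as follows, using the sum of squared errors as the distortion measure. The function names, the 16x16 macroblock size, the list-of-rows picture representation and the use of a thread pool are illustrative assumptions; picture dimensions are assumed to be multiples of the macroblock size.

```python
from concurrent.futures import ThreadPoolExecutor

MB_SIZE = 16  # assumed macroblock edge length in luma samples


def macroblock_sse(original, reconstructed, top, left):
    """Sum of squared errors for the macroblock whose top-left corner is (top, left)."""
    total = 0
    for y in range(top, top + MB_SIZE):
        for x in range(left, left + MB_SIZE):
            diff = original[y][x] - reconstructed[y][x]
            total += diff * diff
    return total


def section_distortion(original, reconstructed, mb_rows):
    """Distortion of one picture section, given as a range of macroblock rows."""
    width = len(original[0])
    return sum(
        macroblock_sse(original, reconstructed, mb_row * MB_SIZE, left)
        for mb_row in mb_rows
        for left in range(0, width, MB_SIZE)
    )


def picture_distortion(original, reconstructed, num_sections=2):
    """Split the picture into contiguous sections and sum their distortions."""
    mb_row_count = len(original) // MB_SIZE
    rows_per_section = max(1, -(-mb_row_count // num_sections))  # ceiling division
    sections = [
        range(start, min(start + rows_per_section, mb_row_count))
        for start in range(0, mb_row_count, rows_per_section)
    ]
    with ThreadPoolExecutor(max_workers=max(1, len(sections))) as pool:
        parts = pool.map(
            lambda rows: section_distortion(original, reconstructed, rows), sections
        )
        return sum(parts)
```

In CPython the threads mainly illustrate the data partitioning; a real encoder would run each section on a separate processor, as the text describes.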
- the update stage may be implemented by calling a PicRateCtrlUpdate( ) function.
- the PicRateCtrlUpdate( ) function may be called in two situations: (1) to record the data right after the completion of encoding MB rows at a multicore processor such as a broadband engine (BE); or (2) to collect the statistical data associated with the entire current picture right after the final Video Coding Layer (VCL) bit stream is generated.
- the inputs to the PicRateCtrlUpdate( ) function may include, but are not limited to, a rate control handle, raw color space format for the image, a previously reconstructed picture, picture level coding information, and coding bits of the previous picture.
- the PicRateCtrlUpdate( ) function may internally update the Rate Control Data Buffer 102 .
- the color space format may be 420 YUV.
- This format includes one luma component (Y) and two chroma components (U and V).
- the input to MPEG-based encoders is 420 YUV, meaning that, e.g., from a resolution viewpoint, the dimension of Y is W*H and U and V each have dimensions of (W/2)*(H/2).
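As a worked example of the plane dimensions just described, the short sketch below computes the number of samples in each plane of a 420 YUV frame. The function names are illustrative, and even width and height are assumed.

```python
def yuv420_plane_sizes(width: int, height: int):
    """Return the sample counts of the Y, U and V planes of a 420 YUV frame."""
    if width % 2 or height % 2:
        raise ValueError("width and height are assumed to be even")
    luma = width * height                    # Y plane: W * H
    chroma = (width // 2) * (height // 2)    # U and V planes: (W/2) * (H/2) each
    return luma, chroma, chroma


def yuv420_frame_samples(width: int, height: int) -> int:
    """Total samples per frame, which works out to 1.5 * W * H."""
    return sum(yuv420_plane_sizes(width, height))
```

For a 1920x1080 frame this gives 2,073,600 luma samples and 518,400 samples in each chroma plane, for 3,110,400 samples in total.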
- the PicRateCtrlUpdate( ) function may implement the following operations:
- FIG. 7 illustrates an example of a computer-readable storage medium 700 .
- the storage medium contains computer-readable instructions stored in a format that can be retrieved and interpreted by a computer processing device.
- the computer-readable storage medium 700 may be a computer-readable memory, such as random access memory (RAM) or read only memory (ROM), a computer readable storage disk for a fixed disk drive (e.g., a hard disk drive), or a removable disk drive.
- the computer-readable storage medium 700 may be a flash memory device, a computer-readable tape, a CD-ROM, a DVD-ROM, a Blu-ray, HD-DVD, UMD, or other optical storage medium.
- the storage medium 700 may optionally contain rate control initialization instructions 702 , which may include one or more instructions that implement Stage 1 of the algorithm as described above.
- the initialization instructions may be configured, upon execution, to implement the PicRateCtrlInit( ) function described above.
- the storage medium 700 may include one or more rate control preparation instructions 704 .
- the preparation instructions 704 may be configured to implement Stage 2 of the rate control algorithm described above.
- the preparation instructions may be configured, upon execution, to implement the PicRateCtrlPrepare( ) function described above.
- the storage medium 700 may include one or more encode instructions 706 .
- the encode instructions 706 may be configured to implement Stage 3 of the rate control algorithm described above.
- the encode instructions may be configured, upon execution, to implement the PicRateCtrlEncode( ) function described above.
- the storage medium 700 may include one or more rate control update instructions 708 .
- the rate control update instructions 708 may be configured to implement Stage 4 of the rate control algorithm described above.
- the rate control update instructions may be configured, upon execution, to implement the PicRateCtrlUpdate( ) function described above.
- the rate control algorithm described above has been largely implemented in an experimental AVC encoder.
- the performance of the rate control algorithm demonstrates that the algorithm not only accurately achieves the target bitrate but also controls the CPB buffer properly to construct HRD compliant AVC bitstreams.
- the encoder demonstrates a high fidelity and stable visual quality.
Description
- This application is a continuation of commonly-assigned co-pending U.S. patent application Ser. No. 12/553,070, filed Sep. 2, 2009, the entire contents of which are incorporated herein by reference.
- This application is related to commonly-assigned co-pending U.S. patent application Ser. No. 12/553,069, filed Sep. 2, 2009 and entitled “SCENE CHANGE DETECTION” (Attorney Docket Number SCEA08074US00), the entire contents of which are incorporated herein by reference.
- This application is related to commonly-assigned co-pending U.S. patent application Ser. No. 12/553,073, filed Sep. 2, 2009 and entitled “PARALLEL DIGITAL PICTURE ENCODING” (Attorney Docket Number SCEA08077US00), the entire contents of which are incorporated herein by reference.
- This application is related to commonly-assigned co-pending U.S. patent application Ser. No. 12/553,075, filed Sep. 2, 2009 and entitled “UTILIZING THRESHOLDS AND EARLY TERMINATION TO ACHIEVE FAST MOTION ESTIMATION IN A VIDEO ENCODER” (Attorney Docket Number SCEA08078US00), the entire contents of which are incorporated herein by reference.
- Embodiments of the invention are related to video encoding and more particularly to a high quality rate controller for various video coding environments.
- Digital signal compression is widely used in many multimedia applications and devices. Digital signal compression using a coder/decoder (codec) allows streaming media, such as audio or video signals, to be transmitted over the Internet or stored on compact discs. A number of different standards of digital video compression have emerged, including H.261, H.263; DV; MPEG-1, MPEG-2, MPEG-4, VC1; and AVC (H.264). These standards, as well as other video compression technologies, seek to efficiently represent a video frame picture by eliminating the spatial and temporal redundancies in the picture and among successive pictures. Through the use of such compression standards, video content can be carried in highly compressed video bit streams, and thus efficiently stored on disks or transmitted over networks.
- MPEG-4 AVC (Advanced Video Coding), also known as H.264, is a video compression standard that offers significantly greater compression than its predecessors. The H.264 standard is expected to offer up to twice the compression of the earlier MPEG-2 standard. The H.264 standard is also expected to offer improvements in perceptual quality. As a result, more and more video content is being delivered in the form of AVC(H.264)-coded streams. Two rival DVD formats, the HD-DVD format and the Blu-Ray Disc format, support H.264/AVC High Profile decoding as a mandatory player feature. AVC(H.264) coding is described in detail in “Draft of Version 4 of H.264/AVC (ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 part 10) Advanced Video Coding)” by Gary Sullivan, Thomas Wiegand and Ajay Luthra, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 14th Meeting: Hong Kong, CH 18-21 January, 2005, the entire contents of which are incorporated herein by reference for all purposes.
- Video signal coding often involves situations in which video is to be encoded at a given bit rate, a given frame rate and a given buffer size.
- It is desirable to encode a video signal in a way that avoids underflow or overflow of a client buffer due to a mismatch between the source bit rate and the channel bandwidth available for delivering the resulting compressed bitstream. Rate-control schemes have been developed to address these issues. These rate-control schemes can be classified into two major categories: constant-bit-rate (CBR) control for constant-channel-bandwidth video transmission and variable-bit-rate (VBR) control for variable-channel-bandwidth video transmission. These rate-control schemes can be further classified according to the unit of rate-control operation, e.g., macroblock-, slice-, or frame-layer rate control. Rate-control schemes determine how to allocate proper bits to each coding unit according to the buffer status and how to adjust an encoder quantization parameter (QP) to properly encode each unit with the allocated bits.
- It is within this context that embodiments of the invention arise.
- The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
- FIG. 1 is a block diagram illustrating the rate control performed in four stages.
- FIG. 2 is a block diagram illustrating the rate control of the present invention.
- FIG. 3 is a block diagram of the Target Bits Estimator in stage 2 of the rate control.
- FIG. 4 is a block diagram of the QP controller in stage 2 of the rate control of the present invention.
- FIG. 5 is a block diagram illustrating an apparatus for implementing video coding using picture level rate control according to an embodiment of the present invention.
- FIG. 6 is a block diagram illustrating an example of an alternative apparatus for implementing video coding using picture level rate control according to an embodiment of the present invention.
- FIG. 7 is a block diagram of a computer readable medium containing computer readable instructions for implementing picture level rate control in accordance with an embodiment of the present invention.
- Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
- Embodiments of the invention are related to a high quality rate controller for various video coding environments, including multi-processor architectures. Embodiments of the invention provide a more accurate and effective rate distortion model for smoother quantization parameter transitions, providing a more stable perceptual experience.
- According to an embodiment of the invention, given a pre-specified coding condition, a rate controller is able to generate a series of proper quantization parameters, one for each picture frame of a video sequence, to meet a target bitrate and a target visual quality. The proposed rate control algorithm aims to be applicable to various coding conditions, including different target bit rates, frame resolutions, buffer restrictions, memory limitations, constant/variable bitrates, processor architectures, etc.
- The rate control algorithm described herein has been proposed and developed to control video bitrate and video quality as required by a user's applications. As noted above, a rate controller is an essential component of a complete video coding system. Given a pre-specified coding condition, a rate controller may generate a series of proper quantization parameters, each of which is for a corresponding picture or frame of a video sequence, to meet the target bitrate and the target visual quality. The proposed rate control algorithm aims to be applicable to various coding conditions, including different target bit rates, frame resolutions, buffer restrictions, memory limitations, constant/variable bitrates, processor architectures, etc.
- The proposed rate control algorithm 100 may be described in terms of four stages, identified as Stage 1, Stage 2, Stage 3 and Stage 4 as shown in FIG. 1. Stage 1 is mainly used to set up an initial status of a rate control data buffer 102. The rate control data buffer 102 is configured to store data that are relevant to the rate control algorithm. Such data may include statistical data 103 such as a number of bits for one or more previously encoded pictures, a complexity determined from sequence information 105 relating to one or more previously encoded pictures and/or the current picture, a quantization parameter estimated, e.g., from one or more previously encoded pictures, and other relevant data. The statistical data may also include distortions computed by comparing reconstructions of encoded pictures to the corresponding original pictures. By way of example, distortion may be measured as a sum of squared errors between an original picture and a reconstructed picture. Distortion may also be measured between corresponding sub-units of a picture, such as blocks, macroblocks, slices, etc.
- The sequence information 105 may include, e.g., a frame rate and/or a bitrate for a particular group of pictures (GOP). In Stage 1, sometimes referred to herein as the initialization stage, a source picture 107 is input to be re-ordered based on a user-specified coding pattern and video detection results. At this stage, parameters may be reset and memory and buffer space may be allocated. During this stage, the source picture 107 to be encoded may be examined for its complexity.
- In Stage 2 a picture-level quantization parameter (QP) is derived based on the data collected in the rate control data buffer 102 and the source video frame. The picture-level QP may be derived based on the picture type of the source picture 107, a complexity of the source picture 107, an estimated target number of bits and an underlying rate distortion model. Other factors such as picture distortion, buffer fullness, and a QP clipping scheme with a previously coded frame may also be taken into account to determine the final QP for the source picture 107.
- In Stage 3 the final QP determined in Stage 2 is sent to one or more main coding modules 104 for encoding of the source picture 107. Each coding module 104 may implement typical picture coding functions, such as intra search and mode decision. Stage 3 may be implemented, e.g., by passing the QP to a calling function that actually encodes the video frame. The resulting encoded picture 109 may be stored in a coded picture buffer CPB. Any suitable coding method may be used in implementing Stage 3.
- In Stage 4, statistical data is collected and updated in the rate control buffer 102. The encoded bit stream corresponding to the encoded source picture 109 is examined for its size, and the distortion between the pixels of the original source picture 107 and the pixels of its reconstruction from the encoded source picture 109 is calculated and recorded.
- The statistical data 103 stored in a rate control data buffer and its interaction with the functional blocks of a rate controller 200 are shown in FIG. 2. In the initial stage (Stage 1), sequence-level information 105 may be used to define pre-specified constants and variables. Furthermore, buffer management and connection of the rate controller 200 with other major threads may be established at this stage. In Stage 2, a target bit estimator 106 estimates a target bitcount for the current picture, frame, or field. The estimator uses source picture information 113, e.g., the input source pixels, the input picture type and optional information from ME phase one, together with the rate control data buffer 102, to estimate a target bitcount 115. Note that in a CBR coding condition, a special clipping mechanism may be used to reduce the potential risk of buffer overflow.
- As used herein, the expression ME phase one refers to a first phase of a motion estimation operation. In certain embodiments, motion estimation may be implemented in two phases, the first of which is sometimes known as ME phase one. Typically ME phase one obtains somewhat less accurate, but nonetheless adequate, motion information at a relatively low computation cost. This information is very up to date (e.g., current picture information) and is used to assist target bit allocation.
- Two key components of the rate controller 200 are the target bits estimator 106 and the QP controller 114. Both of these components may be used to implement Stage 2 as shown in FIG. 2. The details of operation of the target bits estimator 106 are illustrated in FIG. 3. Pre-specified parameters 117 are used to compute an initial bit budget 119. Examples of pre-specified parameters include, e.g., the size of the sliding window in units of Groups of Pictures (GOPs). In one implementation, the bits in one or more GOPs (e.g., 4 GOPs) may be set as an initial bit budget. If a GOP is set for every one second, and the target bit rate is 1 Mbps (1 million bits per second), then the initial bit budget is 4,000,000 bits in the sliding window. A bit budget updater 108 updates the initial bit budget 119 based on the number of bits 121 corrected from the one or more previous pictures. The resulting updated bit budget 123 provides an input to a bitcount distributor 110.
- The bit budget updater 108 may employ a sliding window based bit budget to smooth out initial jitter (e.g., due to insufficient historic data) and possible content jitter. For example, to encode a video sequence at 6 Mbits per second and 30 frames per second with one GOP for every second, the size of the sliding window may be set as four GOP lengths. That is, in the sliding window, there are 4×6 Mbits=24 Mbits available for 4×30=120 picture frames to be encoded. The size of the selected sliding window may be determined by a compromise between bitrate accuracy and smooth video quality. Generally speaking, a smaller window size results in a tighter bit rate controller, which may have better bitrate convergence, but the consequence is a larger QP fluctuation, resulting in unstable video quality. A larger window size tends to give more stable quality since the rate controller has more flexibility in the bit budget to adjust the target bit count 115 based on a longer-term projection. However, the drawback of a larger window is its convergence speed, resulting in less accuracy in meeting the target bitrate. Two extreme cases are a sliding window with a size of one frame (e.g., 1/30 sec in the above example) and a sliding window containing the total number of picture frames to be encoded.
- The next task for the target bits estimator 106 is to determine how to allocate the target bit count 115 to the current picture frame 107. The easiest way in the above example is to equally distribute 24 Mbits among these 120 frames. However, this method may suffer from an inefficient distribution because it ignores the coding characteristics of different coding picture types (e.g., Intra picture (I-picture), Predictive picture (P-picture), and Bi-predictive picture (B-picture)), and content variations among the different pictures in the 120 frames.
- In embodiments of the present invention, different picture coding types are taken into account in deriving the target bit count. In particular, the target bits estimator 106 may include a complexity calculator 112 that computes a complexity value 125 for the current picture 107. The complexity calculator may calculate a complexity for the current picture 107 based on a current picture type, a current complexity and one or more past complexities for previously encoded frames. Additionally, the complexity calculator 112 may also take into account the content complexity, actual bit usage, and actual distortion. By way of example, and not by way of limitation, three different cases to determine the target bit count for a picture frame are discussed below.
- There are many ways to represent a picture complexity. A simple variance is but one example, among others, of a representation of picture complexity. A more sophisticated representation may alternatively be desired. By way of example and not by way of limitation, an average variance of a macroblock in a picture may be used.
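Two of the quantities discussed above, the sliding-window bit budget and a variance-based picture complexity, can be sketched as follows. The function names, the 16x16 macroblock size and the list-of-rows picture representation are assumptions made for illustration; the equal-share helper shows only the naive distribution that the text describes as inefficient.

```python
def sliding_window_budget(bitrate_bps: int, gop_seconds: float, gops_in_window: int) -> int:
    """Initial bit budget: the bits available in a window of whole GOPs."""
    return int(bitrate_bps * gop_seconds * gops_in_window)


def equal_share(budget_bits: int, frame_count: int) -> float:
    """Naive allocation that ignores picture type and content variation."""
    return budget_bits / frame_count


def average_macroblock_variance(picture, mb_size: int = 16) -> float:
    """Complexity as the mean per-macroblock variance of the luma samples.

    `picture` is a list of rows of integer samples whose dimensions are
    assumed to be multiples of mb_size.
    """
    height, width = len(picture), len(picture[0])
    variances = []
    for top in range(0, height, mb_size):
        for left in range(0, width, mb_size):
            samples = [picture[y][x]
                       for y in range(top, top + mb_size)
                       for x in range(left, left + mb_size)]
            mean = sum(samples) / len(samples)
            variances.append(sum((s - mean) ** 2 for s in samples) / len(samples))
    return sum(variances) / len(variances)
```

With the figures from the text, four one-second GOPs at 1 Mbps give 4,000,000 bits, and four GOPs at 6 Mbps give 24,000,000 bits, or 200,000 bits per frame if split equally over 120 frames.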
- In case 1, the current picture 107 is a picture with a scene change. In case 2, the current picture 107 is a regular I picture, and in case 3, the current picture is a regular P picture.
- According to one particular implementation, if the current picture 107 is a B picture, the rate controller may simply assign a QP (from its associated reference frame) plus some constant K. The constant K may be determined solely or partially by an up-to-date Coded Picture Buffer (CPB) fullness. This way of handling B-pictures allows an encoder more parallel execution capability. That is, any B-picture coding can be executed in parallel within any two corresponding reference frames.
- In the discussion that follows, Ni is the number of I pictures in a sliding window, Np is the number of P pictures in the window, and Nb is the number of B pictures in the window. Ri, Rp and Rb are the actual bit usages for pictures of type I, P and B, respectively. In addition, ri, rp and rb are estimated bit counts for pictures of picture type I, P and B, respectively. Di, Dp, and Db denote the distortion and Mi, Mp, and Mb denote the complexity for I, P, and B pictures respectively. The following prophetic example is a sample scenario of a series of actual bit usages in encoding a video sequence from time t to t+6. The sequence of picture types leading up to the current picture is as follows: I, P, B, B, P, B, B, k, where k denotes the current picture 107, which may be, e.g., a picture with a scene change (in case 1), or a regular I picture (in case 2), or a regular P picture (in case 3).
- Rate: Ri(t), Rp(t+1), Rb(t+2), Rb(t+3), Rp(t+4), Rb(t+5), Rb(t+6), rk(t+7).
- Complexity: Mi(t), Mp(t+1), Mb(t+2), Mb(t+3), Mp(t+4), Mb(t+5), Mb(t+6), Mk(t+7).
- Distortion: Di(t), Dp(t+1), Db(t+2), Db(t+3), Dp(t+4), Db(t+5), Db(t+6).
- The bit budget in a sliding window that starts at time t is denoted WB. The window includes all frames from time t up to the picture before the
current picture 107. For the sake of generality, the estimated bit usage and complexity for the current picture are denoted rk and Mk respectively below. - Consider a case where the
bit count distributor 110 is trying to estimate a target bit count rk for the current picture 107. - It is noted that, in general, the window may have any suitable number of pictures which may be of arbitrary type.
- In
case 1, where the current picture 107 is a picture with a scene change, the target bit count 115 for the current picture (i.e., rk(t+7)) may be calculated as: -
rk(t+7)=WB(t+6)*ARi/(Ni*ARi/AMi+Np*ARp/AMp+Nb*ARb/AMb), where: - ARk is an average actual bit count over all k pictures back to the most recent scene change I picture, exclusively; and
- AMk is an average complexity over all k pictures back to the most recent scene change I picture, exclusively, where k is the picture type for the current picture, e.g., either I, P or B picture type.
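The case 1 formula can be sketched in a few lines. The function name and the dictionary-based data layout (per-type values keyed by "I", "P" and "B") are hypothetical conveniences for this example; only the arithmetic follows the formula above.

```python
def target_bits_scene_change(wb, counts, avg_bits, avg_complexity):
    """Case 1 target bit count for a scene-change picture.

    wb             -- window bit budget WB
    counts         -- N for each picture type, keyed by "I", "P", "B"
    avg_bits       -- AR for each picture type
    avg_complexity -- AM for each picture type
    """
    denominator = sum(
        counts[t] * avg_bits[t] / avg_complexity[t] for t in ("I", "P", "B")
    )
    return wb * avg_bits["I"] / denominator
```

The averages AR and AM would be maintained over the pictures back to the most recent scene-change I picture, as defined above; how they are accumulated is left out of this sketch.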
- In
case 2, where the current picture 107 is a regular I picture, the goal is to provide a smooth transition from the most recent P-picture. In such a case, the target bit count may be derived by calculating a ratio of the distortion, actual bit usage and picture complexity between the latest I-picture and the latest P-picture. In the picture type sequence in the above example the most recent P-picture before the current picture is the P-picture at t+4. If the current picture 107 is a regular I-picture, it may be assumed that the current I-picture is similar to the most recent I-picture in terms of content characteristics. Consequently, only a minor fine-tuning of the bit rate is needed. Otherwise a scene change I-picture for the current frame is recorded. Assuming the current picture 107 is a regular I-picture, and not a scene-change I-picture, the target bit count 115 for the current picture (i.e., rk(t+7)) may be calculated as: -
rk(t+7)=[Ri(t)/Rp(t+4)]*[Di(t)/Dp(t+4)]*[Mi(t)/Mp(t+4)]*Rp(t+4). - In
case 3, where the current picture 107 is a regular P picture, statistical data from the most recent I, P and B pictures may be used to calculate the target bit count 115. For example, given the above-described picture sequence, the estimated bit count rk may be computed as follows: -
rk(t+7)=WB(t+6)*[Rp(t+4)/Mp(t+4)]/[Ni*Ri(t)/Mi(t)+Np*Rp(t+4)/Mp(t+4)+Nb*Rb(t+6)/Mb(t+6)]. - The above target bit count calculation solely depends on picture characteristics and does not consider the situation in which the coded picture buffer CPB has a finite size, which may be denoted CPBsize. For a real application, the
bitcount distributor 108 may adjust the final target bit count 115 according to CPB status and bitrate accuracy to reduce the risk of CPB overflow and underflow. To facilitate such adjustment, the target bits estimator 106 may include a buffer regulator 116 that provides relevant CPB status information to the bit count distributor 108 for adjustment of the target bit count 115. - There are a number of ways to take the CPB status information into account in adjusting the
target bit count 115. For example, in a constant bitrate (CBR) application the target bit count 115 may be adjusted differently depending on whether the buffer is approaching overflow or underflow. In determining whether a potential overflow or underflow situation is present it is useful to define a quantity referred to herein as the coded picture buffer fullness CPBfull, which may be regarded as a ratio of the quantity of data currently stored in the CPB (CPBcurr) relative to the finite size CPBsize of the coded picture buffer CPB, e.g., CPBfull=CPBcurr/CPBsize. For example, a potential CPB overflow may exist when CPBfull is increasing and is above a pre-defined upper threshold CPBmax. In such a case, the target bit count 115 (i.e., rk(t+7)) may be adjusted as follows to reduce the risk of CPB overflow: rk(t+7)=rk(t+7)*(1.0+C*(CPBfull−CPBmax)), where C is a constant multiplier (e.g., 2). - Alternatively, the coded picture buffer CPB may be approaching a potential CPB underflow situation if CPBfull is moving downward and is below a pre-defined lower threshold CPBmin. In such a situation, the target bit count 115 (i.e., rk(t+7)) may be adjusted as follows to reduce the risk of CPB underflow: rk(t+7)=rk(t+7)*(1.0+C*(CPBmin−CPBfull)), where C is a constant multiplier (e.g., 2), and 0.0<CPBmin<CPBmax<1.0.
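The case 2 and case 3 estimates and the CBR adjustment can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the implementation described in the text: the function names, the dictionary layout keyed by picture type, and the boolean `rising` flag (standing in for "CPBfull is increasing") are hypothetical. In case 3 the second denominator term is weighted by Np, by analogy with the case 1 formula; that reading is an assumption.

```python
def target_bits_regular_i(r_i, r_p, d_i, d_p, m_i, m_p):
    """Case 2: scale the latest P-picture bit usage by I/P ratios of
    bit usage, distortion and complexity."""
    return (r_i / r_p) * (d_i / d_p) * (m_i / m_p) * r_p


def target_bits_regular_p(wb, counts, bits, complexity):
    """Case 3: share of the window budget WB for a regular P picture.

    counts, bits, complexity are dicts keyed by "I", "P", "B" holding N,
    the most recent R, and the most recent M for each picture type.
    """
    weight = {t: bits[t] / complexity[t] for t in ("I", "P", "B")}
    denominator = sum(counts[t] * weight[t] for t in ("I", "P", "B"))
    return wb * weight["P"] / denominator


def adjust_for_cbr(rk, cpb_full, cpb_min, cpb_max, rising, c=2.0):
    """Apply the CBR overflow/underflow scaling to a target bit count."""
    if rising and cpb_full > cpb_max:
        return rk * (1.0 + c * (cpb_full - cpb_max))
    if not rising and cpb_full < cpb_min:
        return rk * (1.0 + c * (cpb_min - cpb_full))
    return rk  # buffer is within the safe band: no adjustment
```

A caller would typically compute the picture-type-specific estimate first and then pass it through `adjust_for_cbr` with the current buffer fullness.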
- In a variable bitrate (VBR) application, CPB overflow is more likely due to a long initial delay (i.e., CPB is fed almost fully before starting to encode). Then the
target bit count 115 may be adjusted as follows: -
- if (CPBcushion is less than sec_threshold),
-
rk(t+7)=rk(t+7)*(1.0+incr_%), - where CPBcushion=(CPBsize−CPBcurr)/(encoder's target_bitrate), and 0.0<incr_%<1.0.
- The term sec_threshold refers to a threshold value for the CPB cushion in units of time (e.g., seconds). By way of example, and not by way of limitation, if CPBcushion is less than 1 second the value of rk(t+7) is increased according to the above equation.
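The VBR adjustment above amounts to one comparison and one multiplication. In the sketch below the function name and the default increment of 0.1 are assumptions chosen for illustration; the text only requires the increment to lie between 0.0 and 1.0.

```python
def adjust_for_vbr(rk, cpb_size, cpb_curr, target_bitrate,
                   sec_threshold=1.0, incr=0.1):
    """Raise the target bit count when the CPB cushion is short.

    The cushion is the free buffer space expressed in seconds at the
    encoder's target bitrate. incr is a fraction with 0.0 < incr < 1.0;
    the default value here is a placeholder, not a recommended setting.
    """
    cushion = (cpb_size - cpb_curr) / target_bitrate
    if cushion < sec_threshold:
        return rk * (1.0 + incr)
    return rk
```

With a 4,000,000-bit buffer holding 3,500,000 bits at a target bitrate of 1,000,000 bits per second, the cushion is half a second, so the target is raised.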
- Referring again to
FIG. 2, the target bits estimator 106 sends the target bitcount to a QP controller 114, which then uses the target bitcount 115 along with distortion and source pixel information in rate control data buffer 102 to derive the QP. By way of example, and not by way of limitation, the QP controller 114 may derive the QP as shown in FIG. 4. Specifically, the QP controller 114 may implement a complexity function that computes a complexity factor based on a target complexity, an average complexity over a window of two or more previous frames, and a complexity stabilizer factor. Furthermore, the QP controller 114 may implement a distortion function that computes a distortion factor based on a distortion for a previous frame, an average distortion taken over a window of two or more previous frames and a distortion stabilizer factor. In addition, the QP controller may implement a function that computes an estimated bitcount based on a target bitcount for the current frame 107, an average bitcount taken over a window of two or more previous frames and a bitcount stabilizer factor. - The
QP controller 114 depicted in FIG. 4 may include functional blocks (f( )) that compute the complexity, distortion and bitcount. Each functional block may receive one or more stabilizer factors as inputs. The stabilizers may be used to reduce large fluctuations in complexity, bit count, and distortion. By way of example, and not by way of limitation, stabilizer factors denoted S1, S2 may be used to reduce the effect of fluctuations in average bitcount B and average complexity C in computing estimated bitcount A according to a formula of the type: A=(B+S1)/(C+S2), where S1 and S2 are stabilizers. - To reduce the effect of large fluctuations in average bitcount and average complexity on the calculation of A, the
rate controller 114 may assign either constant or adaptive values to stabilizer terms S1 and S2, so that the rate controller 114 can obtain a more stable value of the estimated bitcount A than might be obtained by a simple ratio, e.g., B/C. If the values of the stabilizers are chosen properly they tend to stabilize the value of (B+S1)/(C+S2). - Similar stabilizer terms may be used to stabilize similar computations of the complexity factor and distortion factor.
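The effect of the stabilizer terms can be seen in a small numerical sketch. The function name and the constant stabilizer values used in the usage lines are assumptions picked only to illustrate the damping; the text leaves the choice of constant or adaptive values open.

```python
def stabilized_ratio(b, c, s1, s2):
    """Estimated bitcount A = (B + S1) / (C + S2)."""
    return (b + s1) / (c + s2)


# Average complexity C swings from 1 to 4 while average bitcount B stays at 100.
raw = [stabilized_ratio(100.0, c, 0.0, 0.0) for c in (1.0, 4.0)]
damped = [stabilized_ratio(100.0, c, 1000.0, 40.0) for c in (1.0, 4.0)]

# Ratio between the largest and smallest estimate in each series.
raw_spread = max(raw) / min(raw)
damped_spread = max(damped) / min(damped)
```

With S1 and S2 set to zero the formula reduces to the simple ratio B/C, whose value changes by the same factor as C. With the stabilizers in place the same swing in C moves the estimate far less.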
- The
QP controller 114 may include a QP Modulator that determines a raw QP value based on the bitcount, distortion and complexity factors. The QP controller may further include a clipping mechanism 118 that restricts the range of the resulting QP value. - Unlike traditional approaches based purely on an assumed rate distortion model, the proposed
rate controller 200 may derive the QP by considering the interaction of the following major factors: picture type, picture complexity, picture distortion and target bitcount 115. Given these factors and their interactions, the following approach may be used. - As shown in
FIG. 4, the QP controller 114 derives the final QP value 127 based on the target bit budget calculated by the Bit Count Distributor 110. As noted above, the QP controller 114 is one of the key components in the rate controller 200. The QP controller 114 has a direct impact on bit count and visual quality. To achieve the best quality, an iterative approach may be used to minimize distortion by finding the best QP. However, this may be inefficient. In embodiments of the present invention, by contrast, the goal is to achieve reasonably good visual quality in a more efficient manner. - To maintain stable video quality, the
QP controller 114 derives a QP that is initially based on a picture type for the current picture 107. Different picture types have different methods to derive the corresponding QPs. By way of example, and not by way of limitation, five different cases may be considered: (1) the very first IDR picture of the video sequence, (2) an IDR picture with scene change, (3) a regular IDR or I picture, (4) a regular P picture, and (5) a non-reference B picture. - As used herein an IDR picture (or IDR frame) is a special type of I picture (or I frame). The main difference is that when an encoder assigns an IDR to a picture/frame, it means that all the reference frames' information in the frame buffer is gone. Consequently, those reference frames cannot be used in subsequent encoding.
- The QP in the case of the first IDR picture in a video sequence may be derived based on the complexity, the coding conditions, and some general assumptions. The basic idea is to find out the relation between I-picture and P-picture, and P-picture and B-picture in terms of coding bits complexity. Consider a case in which there are N pictures in a sliding window, and N=Ni+Np+Nb, where Ni, Np, and Nb are the number of I, P, and B pictures respectively in the window. The target bit count 115 (i.e., rk) for the first IDR is calculated as follows: rk=WB/(Ni+Np/ratiop+Nb/ratiob). The values of ratiop and ratiob may be calculated as follows:
-
ratiop=Cp/bits_per_macroblock, where Cp is a constant, and bits_per_macroblock=target_bit_rate/(target_frame_rate*frame_width/16*frame_height/16). -
ratiob=picture_complexity*ratiop. - In the above equation the term picture_complexity refers to the complexity for the current picture since, in this example, the current picture is the first picture in a sequence.
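The first-IDR calculation can be sketched as below. The sketch reads the target formula as WB/(Ni + Np/ratiop + Nb/ratiob); the placement of the opening parenthesis is an assumption. The function and parameter names are hypothetical, and no particular value of the constant Cp is implied.

```python
def first_idr_target_bits(wb, n_i, n_p, n_b, cp, picture_complexity,
                          target_bit_rate, target_frame_rate,
                          frame_width, frame_height):
    """Target bit count for the very first IDR picture of a sequence."""
    # Number of 16x16 macroblocks per picture.
    macroblocks = (frame_width / 16) * (frame_height / 16)
    bits_per_macroblock = target_bit_rate / (target_frame_rate * macroblocks)

    ratio_p = cp / bits_per_macroblock       # I-to-P bit ratio estimate
    ratio_b = picture_complexity * ratio_p   # I-to-B bit ratio estimate

    return wb / (n_i + n_p / ratio_p + n_b / ratio_b)
```

For a 1280x720 picture at 30 frames per second and 10.8 Mbit/s there are 3600 macroblocks per picture and 100 bits per macroblock, which makes the intermediate ratios easy to check by hand.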
- After the target bit count 115 (rk) is derived, a simple first order RD model may be applied to obtain the quantization value (referred to herein as an actual QP). Note that this quantization value may be quite different from the final QP (referred to herein as a syntax QP, which is a syntax element embedded in a bitstream) since the actual QP is the value really used in a quantizer. To convert an actual QP to a syntax QP, the formula QPsyntax=6.0*log10(QPactual)/log10(2.0) may be used. Then the resulting value of QPsyntax may be clipped to a pre-defined range between a minimum value QPmin and a maximum value QPmax to produce the
final QP value 127. - In the case of an IDR picture with scene change, the new QP may be derived based on the
statistical data 103 including average complexity, average bit usage and average QP from all of its previous I-pictures up to the previously closest IDR with scene change. - The
QP controller 114 first determines an old R/M ratio which may be defined as (average bit usage/average complexity) for the past I frames. The QP controller 114 may then derive a new relative R/M ratio as follows: new R/M ratio=(old R/M ratio)/(rk/Mk), where rk and Mk refer to the target bit count and complexity for the current frame 107. The old R/M ratio may alternatively be determined from Rk−1/Mk−1, where Rk−1 and Mk−1 are the actual bit usage and complexity for the frame preceding the current frame 107. - Then the new actual QP value may be determined according to:
-
QPactual=(average QPactual)*(new R/M ratio). - The new actual QP value may be converted to a new syntax QP value as discussed above.
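The scene-change QP derivation and the actual-to-syntax conversion can be sketched as follows. The function names are hypothetical. The default clipping bounds of 0 and 51 are an assumption corresponding to the 8-bit H.264/AVC QP range; the text only states that QPmin and QPmax are pre-defined. Rounding the result to an integer is left out.

```python
import math


def actual_to_syntax_qp(qp_actual, qp_min=0.0, qp_max=51.0):
    """QPsyntax = 6 * log10(QPactual) / log10(2), clipped to [qp_min, qp_max]."""
    qp_syntax = 6.0 * math.log10(qp_actual) / math.log10(2.0)
    return min(max(qp_syntax, qp_min), qp_max)


def scene_change_qp(avg_bits, avg_complexity, avg_qp_actual, rk, mk):
    """Syntax QP for an IDR picture with a scene change.

    avg_bits, avg_complexity, avg_qp_actual -- averages over past I pictures
    rk, mk -- target bit count and complexity of the current picture
    """
    old_ratio = avg_bits / avg_complexity
    new_ratio = old_ratio / (rk / mk)
    qp_actual = avg_qp_actual * new_ratio
    return actual_to_syntax_qp(qp_actual)
```

Because the conversion is logarithmic with base 2 and a factor of 6, doubling the actual QP raises the syntax QP by 6.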
- It is noted that the new QP value may be very different from the QP value for the immediately preceding frame if the
current frame 107 is a scene change frame. To reduce large QP fluctuations, the QP clipping mechanism 118 may calculate a complexity difference from the previous frame. The clipping mechanism 118 may then define a range of QP change to forcefully limit the QP change. By way of example, and not by way of limitation, the following clipping scheme may be used. - First a range QPrange is defined according to QPrange=multiplier*(max(Mk, Mk−1)/min(Mk, Mk−1)), where Mk−1 is the complexity for the frame immediately preceding the current frame.
- The multiplier may be a constant value determined empirically. By way of example, and not by way of limitation, a multiplier having a constant value of 2 may be used.
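The range computation and the resulting clamp can be sketched as below. The function names are hypothetical. Which QP value serves as the center of the allowed interval is passed in as `qp_reference`; using the preceding frame's syntax QP for it is an assumption of this sketch, made because the stated purpose is to limit the change in QP.

```python
def qp_change_range(m_curr, m_prev, multiplier=2.0):
    """QPrange = multiplier * max(Mk, Mk-1) / min(Mk, Mk-1)."""
    return multiplier * (max(m_curr, m_prev) / min(m_curr, m_prev))


def clip_qp(qp_new, qp_reference, qp_range):
    """Clamp qp_new to [qp_reference - qp_range, qp_reference + qp_range]."""
    low = qp_reference - qp_range
    high = qp_reference + qp_range
    return min(max(qp_new, low), high)
```

The larger the complexity jump between consecutive frames, the wider the permitted QP swing, so a genuine scene change is still allowed to move the QP substantially.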
- Therefore, the
final QP value 127 may be restricted to the range of: -
[QPsyntax−QPrange, QPsyntax+QPrange] - In the case that the
current picture 107 is a regular IDR or I picture, the QP controller 114 may work directly on the value of QPsyntax. Because the picture is a regular frame, no noticeable change occurs in the video characteristics (otherwise a scene change would have been recorded). To maintain a relatively steady value of QPsyntax, a LOG operation on the ratio of actual bit count to complexity may be applied. The following RD formula may be used to derive the value of QPsyntax for the current frame 107 (denoted QPk) from the value of QPsyntax for the previous frame (which is denoted QPk−1). -
LOG(bitrate/complexity)*QPsyntax=CONSTANT. - Based on the above formula, the final value of QPsyntax for the
current frame 107 may be computed as follows. -
QPk=[LOG(Rk−1/Mk−1)*QPk−1]/LOG(rk/Mk), - where QPk−1 is the value of QPsyntax for the frame preceding the
current frame 107. - If the
current frame 107 is a regular P picture, the QP controller 114 may maintain a steady value of QPsyntax by logarithmically operating on the value of QPactual. The new actual QP value for the current frame (denoted QPactual_k) may be derived as -
QPactual_k=LOG(Rk−1)*(QPactual_k−1)/*Rk−1) - The value of QPactual_k may then be converted into a QPsyntax value as described above. - If the
current picture 107 is a regular B picture, i.e., a non-reference B picture, no error will be propagated. A constant QP may therefore be obtained by simply adding 2 to the syntax QP of its previous reference frame. This situation also provides an opportunity for parallel encoding since there is, in general, no dependency between any two consecutive B pictures. The lack of data dependency between pictures serves as an entry point for parallelizing the encoding process. B-picture coding within two reference pictures can be performed in parallel. - In the encode stage (Stage 3), the rate control algorithm may simply return the QP to its calling function. In the final stage, right after a video frame/field encoding, the rate control collects the actual bit usage (texture bits and overhead bits might be separated), the actual picture distortion, and actual buffer fullness, and updates this information in the rate
control data buffer 103. - The process from
Stage 2 through Stage 4 may be repeatedly performed in the course of video encoding for a series of video frames. It is noted that in embodiments of the present invention, the rate controller need only consider a target bit count for a reference picture (i.e., I-picture, P-picture or B-picture if it is used as a reference picture in a pyramid coding). -
FIG. 5 illustrates a block diagram of a computer apparatus 500 that may be used to implement parallel decoding of streaming data on three or more processors as described above. The apparatus 500 generally includes a plurality of processor modules memory 502. As an example of a processing system that uses multiple processor modules, the processor modules - The
memory 502 may be in the form of an integrated circuit (e.g., RAM, DRAM, ROM, and the like). The memory 502 may also be a main memory that is accessible by all of the processor modules 501. In some embodiments, the processor modules local memories encoder program 503 may be stored in the main memory 502 in the form of processor readable instructions that can be executed on the processor modules 501. The encoder program 503 may be configured to encode video frame data utilizing the rate control algorithm, e.g., as described above with respect to FIG. 1, FIG. 2, FIG. 3, and FIG. 4. Specifically, the encoder program may compute a QP value in a manner that takes picture type, picture complexity, picture distortion and target bitcount into account in determining the QP value. The program 503 may be written in any suitable processor readable language, e.g., C, C++, JAVA, Assembly, MATLAB, FORTRAN and a number of other languages. Rate control data 507 may be stored in the memory 502, e.g., in a rate control buffer, as described above. Such rate control data may include statistical data relating to bit utilization, complexity, distortion, QP, etc. for a window of previous frames. In some embodiments, during execution of the encoder program 503, portions of program code and/or data 507 may be loaded into the local stores processor modules - The
apparatus 500 may also include well-known support functions 510, such as input/output (I/O) elements 511, power supplies (P/S) 512, a clock (CLK) 513 and cache 514. The device 500 may optionally include a mass storage device 515 such as a disk drive, CD-ROM drive, tape drive, or the like to store programs and/or data. The device 500 may also optionally include a display unit 516 and user interface unit 518 to facilitate interaction between the apparatus 500 and a user. The display unit 516 may be in the form of a cathode ray tube (CRT) or flat panel screen that displays text, numerals, graphical symbols or images. The user interface 518 may include a keyboard, mouse, joystick, light pen or other device that may be used in conjunction with a graphical user interface (GUI). The apparatus 500 may also include a network interface 520 to enable the device to communicate with other devices over a network, such as the internet. These components may be implemented in hardware, software or firmware or some combination of two or more of these. - There are a number of additional ways to streamline parallel processing with multiple processors in the
apparatus 500. For example, it is possible to “unroll” processing loops, e.g., by replicating code on two or more of the processors - As noted above, certain portions of the rate control described above (e.g., the distortion calculation) may be implemented on a multiprocessor system. One example, among others, of a multiprocessor system capable of implementing parallel processing is known as a cell processor. There are a number of different processor architectures that may be categorized as cell processors. By way of example, and without limitation,
FIG. 6 illustrates a possible configuration of a cell processor 600. The cell processor 600 includes a main memory 602, a single power processor element (PPE) 604 and eight synergistic processor elements (SPE) 606. Alternatively, the cell processor 600 may be configured with any number of SPEs. - By way of example, the
cell processor 600 may be characterized by an architecture known as a Cell Broadband engine architecture (CBEA)-compliant processor. In CBEA-compliant architecture, multiple PPEs may be combined into a PPE group and multiple SPEs may be combined into an SPE group. For the purposes of example, thecell processor 600 is depicted as having only a single SPE group and a single PPE group with a single SPE and a single PPE. Alternatively, a cell processor can include multiple groups of power processor elements (PPE groups) and multiple groups of synergistic processor elements (SPE groups). CBEA-compliant processors are described in detail, e.g., in Cell Broadband Engine Architecture, which is available online at: http://www-306ibm.com/chips/techlib/techlib.nsf/techdocs/1AEEE1270EA2776 387257060006E61BA/$file/CBEA—01_pub.pdf, which is incorporated herein by reference. - By way of example the
PPE 604 may be a 64-bit PowerPC Processor Unit (PPU) with associated caches. The PPE 604 may include an optional vector multimedia extension unit. Each SPE 606 includes a synergistic processor unit (SPU) and a local store (LS). In some implementations, the local store may have a capacity of, e.g., about 256 kilobytes of memory for code and data. The SPUs are less complex computational units than the PPU, in that they typically do not perform any system management functions. The SPUs may have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by a PPE) in order to perform their allocated tasks. The SPUs allow the system 600 to implement applications that require a higher computational unit density and can effectively use the provided instruction set. A significant number of SPEs 606 in the system 600, managed by the PPE 604, allows for cost-effective processing over a wide range of applications. - The
memory 602, PPE 604, and SPEs 606 may communicate with each other and with an I/O device 608 over a ring-type element interconnect bus 610. The memory 602 may contain rate control data 603 having features in common with the rate control data 507 described above. The memory 602 may also store an encoder program 609 having features in common with the encoder program 503 described above. At least one of the SPEs 606 may include in its local store (LS) encoding instructions 605 and/or a portion of the rate control data and/or input video frame data that is to be processed in parallel, e.g., as described below. The PPE 604 may include in its L1 cache, code instructions 607 having features in common with the encoding program 503 described above. Instructions 605 and data 607 may also be stored in memory 602 for access by the SPE and PPE when needed. - The rate control algorithm depicted in
FIG. 1 and described further with respect to FIGS. 2-4 may be implemented on an apparatus of the type described with respect to FIG. 5 or FIG. 6 through a series of function calls. For example, the Initialization Stage (Stage 1) may be implemented by calling a function referred to herein as PicRateCtrlInit( ). The PicRateCtrlInit( ) function may be called one time only by an encoder SPU main control thread of the encoder program Rate Control Buffer 102. - The preparation stage (
Stage 2 of FIG. 1) may be implemented by calling a function referred to herein as PicRateCtrlPrepare( ). The main task of this function is to derive a QP value based on the input data. The PicRateCtrlPrepare( ) function may be called at the beginning of encoding for each picture, and is the core of the rate control algorithm. - The inputs to PicRateCtrlPrepare( ) may include a rate control handle, a frame level configuration, an input frame buffer, and the rate control data buffer. The PicRateCtrlPrepare( ) function may implement the following operations:
-
- Checking the buffer fullness in CBR case.
- Adjusting total bitrate budget in a sliding window.
- Determining the target bits 111 for the current picture using the Target Bit Estimator 106, e.g., as described above.
- If picture type is I/IDR, deriving Picture-I QP using the QP controller 114, e.g., as described above.
- If picture type is P, deriving Picture-P QP using the QP controller 114, e.g., as described above.
- If picture type is non-ref B, deriving Non-ref-Picture-B QP using the QP controller 114, e.g., as described above.
- If picture type is ref B, deriving the ref-Picture-B QP using the QP controller 114, e.g., as described above.
- Clipping QP within a pre-specified range (which may be defined in PicRateCtrlInit( )) to ensure a smooth visual quality transition. This operation may be implemented as described above with respect to the QP Clipping Mechanism 118.
- The encoding stage (Stage 3) may be implemented by calling a PicRateCtrlEncode( ) function.
- The PicRateCtrlEncode( ) function may be called to obtain the final QP for a given picture. In some embodiments, the PicRateCtrlEncode( ) function may be called to obtain a final QP value for a subsection of a picture (e.g., a slice or macroblock). Thus, embodiments of the invention may be extended to rate control at the macroblock level. The PicRateCtrlEncode( ) function may also call other functions that are conventionally used in encoding a video picture, e.g., functions for Network Abstraction Layer (NAL) coding, Video Coding Layer (VCL) encoding, and de-blocking.
- A number of variations are possible on the embodiments described above. For example, in some implementations, the encoding step (Stage 3) may include a distortion calculation that is distributed and processed in parallel on multiple processors. In multi-processor implementations, the total distortion of a picture may be calculated on a section-by-section basis with distortion calculations for different sections of a picture performed in parallel using a different processor for each section. The distortion for each section may be calculated macroblock by macroblock by comparing the original pixels for picture prior to encoding and the reconstructed pixels.
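The section-by-section distortion calculation can be sketched as follows. Several choices here are assumptions made for illustration: the sum of squared differences is used as the distortion measure, sections are horizontal bands of rows rather than groups of macroblocks, and a thread pool stands in for the multiple processors. In CPython, threads illustrate the partitioning but do not by themselves give a speedup for this kind of arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def section_distortion(original_rows, reconstructed_rows):
    """Sum of squared differences between original and reconstructed pixels."""
    return sum(
        (o - r) ** 2
        for orig_row, recon_row in zip(original_rows, reconstructed_rows)
        for o, r in zip(orig_row, recon_row)
    )


def picture_distortion(original, reconstructed, rows_per_section=16, workers=4):
    """Total picture distortion, computed one section per task."""
    sections = [
        (original[y:y + rows_per_section], reconstructed[y:y + rows_per_section])
        for y in range(0, len(original), rows_per_section)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(lambda s: section_distortion(*s), sections)
        return sum(partial_sums)
```

Because the per-section results are simply added, the total does not depend on how the picture is split or on the order in which sections finish.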
- In some implementations, the distortion calculation may be done before de-blocking to speed up the overall performance since there is no need to allocate one more data path from the deblocking thread to the main thread. The discrepancy of the distortion calculation based on the deblocked frame and the undeblocked frame for the rate controller has been determined experimentally to be negligible.
- Furthermore, in some implementations, the distortion in each macroblock of a picture section may be carried in the existing MB information container, which may be transferred to the server via DMA. So the NAL coding thread may collect and calculate the overall distortion of the picture. This MB distortion also helps to further improve the picture quality if a macroblock-based rate control is employed.
- The update stage (Stage 4) may be implemented by calling a PicRateCtrlUpdate( ) function. The PicRateCtrlUpdate( ) function may be called in two situations: (1) to record the data right after the completion of encoding MB rows at a multicore processor such as a broadband engine (BE); or (2) to collect the statistical data associated with the entire current picture right after the final Video Coding Layer (VCL) bit stream is generated. The inputs to the PicRateCtrlUpdate( ) function may include, but are not limited to, a rate control handle, raw color space format for the image, a previously reconstructed picture, picture level coding information, and coding bits of the previous picture. The PicRateCtrlUpdate( ) function may internally update the Rate
Control Data Buffer 102. - By way of example, and not by way of limitation, the color space format may be 420 YUV. This format includes one luma component (Y) and two chroma components (U and V). Typically, the input to MPEG-based encoders is 420 YUV, meaning that, e.g., from a resolution viewpoint, the dimension of Y is W*H and U and V each have dimensions of W/2*H/2.
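The plane dimensions stated above translate directly into sample counts. The function names below are hypothetical, and the sketch assumes even picture dimensions.

```python
def yuv420_plane_sizes(width, height):
    """Sample counts of the Y, U and V planes for a 4:2:0 picture."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma, chroma, chroma


def yuv420_frame_samples(width, height):
    """Total number of samples in one 4:2:0 picture."""
    return sum(yuv420_plane_sizes(width, height))
```

Each chroma plane holds a quarter of the luma samples, so a 4:2:0 picture carries 1.5 times as many samples as its luma plane alone.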
- By way of example, and not by way of limitation, the PicRateCtrlUpdate( ) function may implement the following operations:
-
- Collecting statistical data.
- Updating statistical data in Rate Control Data Buffer 102.
- Checking buffer fullness to determine a potential for buffer overflow.
- Implementing a buffer overflow prevention mechanism if necessary.
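The calling order of the four stages can be sketched as a small driver loop. Everything below is a hypothetical skeleton: the Python function names merely mirror PicRateCtrlInit( ), PicRateCtrlPrepare( ), PicRateCtrlEncode( ) and PicRateCtrlUpdate( ), the dictionary used as a rate control data buffer is an invented layout, and the QP rule in the prepare step is a placeholder rather than the QP derivation described above. The `encode_picture` callable is supplied by the caller and is assumed to return the actual bit usage and distortion.

```python
def pic_rate_ctrl_init(target_bitrate, frame_rate):
    """Stage 1: called once; sets up the rate control data buffer."""
    return {"target_bitrate": target_bitrate, "frame_rate": frame_rate,
            "history": []}


def pic_rate_ctrl_prepare(buffer, picture_type):
    """Stage 2: derive a QP for the picture about to be encoded.

    Placeholder rule (an assumption): reuse the most recently recorded QP,
    or start from a fixed value for the first picture.
    """
    if buffer["history"]:
        return buffer["history"][-1]["qp"]
    return 26


def pic_rate_ctrl_encode(qp):
    """Stage 3: hand the prepared QP back to the calling encoder."""
    return qp


def pic_rate_ctrl_update(buffer, picture_type, qp, bits, distortion):
    """Stage 4: record actual bit usage and distortion for later pictures."""
    buffer["history"].append(
        {"type": picture_type, "qp": qp, "bits": bits, "distortion": distortion}
    )


def encode_sequence(picture_types, encode_picture,
                    target_bitrate=1_000_000, frame_rate=30):
    """Run Stage 1 once, then Stages 2 through 4 for every picture."""
    buffer = pic_rate_ctrl_init(target_bitrate, frame_rate)
    for picture_type in picture_types:
        qp = pic_rate_ctrl_prepare(buffer, picture_type)
        qp = pic_rate_ctrl_encode(qp)
        bits, distortion = encode_picture(picture_type, qp)
        pic_rate_ctrl_update(buffer, picture_type, qp, bits, distortion)
    return buffer
```

The point of the skeleton is the control flow: initialization happens once, and the statistics recorded in the update step are what the next picture's preparation step reads.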
- According to another embodiment, instructions for carrying out picture level rate control as described above may be stored in a computer readable storage medium. By way of example, and not by way of limitation,
FIG. 7 illustrates an example of a computer-readable storage medium 700. The storage medium contains computer-readable instructions stored in a format that can be retrieved and interpreted by a computer processing device. By way of example, and not by way of limitation, the computer-readable storage medium 700 may be a computer-readable memory, such as random access memory (RAM) or read only memory (ROM), a computer readable storage disk for a fixed disk drive (e.g., a hard disk drive), or a removable disk drive. In addition, the computer-readable storage medium 700 may be a flash memory device, a computer-readable tape, a CD-ROM, a DVD-ROM, a Blu-ray, HD-DVD, UMD, or other optical storage medium. - The
storage medium 700 optionally contains rate control initialization instructions 702 which may include one or more instructions that implement Stage 1 of the algorithm as described above. By way of example, and not by way of limitation, the initialization instructions may be configured, upon execution, to implement the PicRateCtrlInit( ) function described above. - The
storage medium 700 may include one or more rate control preparation instructions 704. The preparation instructions 704 may be configured to implement Stage 2 of the rate control algorithm described above. By way of example, and not by way of limitation, the preparation instructions may be configured, upon execution, to implement the PicRateCtrlPrepare( ) function described above. - The
storage medium 700 may include one or more encode instructions 706. The encode instructions 706 may be configured to implement Stage 3 of the rate control algorithm described above. By way of example, and not by way of limitation, the encode instructions may be configured, upon execution, to implement the PicRateCtrlEncode( ) function described above. The storage medium 700 may include one or more rate control update instructions 708. The update instructions 708 may be configured to implement Stage 4 of the rate control algorithm described above. By way of example, and not by way of limitation, the rate control update instructions may be configured, upon execution, to implement the PicRateCtrlUpdate( ) function described above. - The rate control algorithm described above has been largely implemented in an experimental AVC encoder. The performance of the rate control algorithm demonstrates that the algorithm not only accurately achieves the target bitrate but also controls the CPB buffer properly to construct HRD compliant AVC bitstreams. Most importantly, with the effectiveness of the new rate control algorithm to control the quantization parameter, the encoder demonstrates a high fidelity and stable visual quality.
- While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article “A”, or “An” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for.”
Claims (27)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/503,158 US20150016513A1 (en) | 2009-09-02 | 2014-09-30 | Picture-level rate control for video encoding |
US16/439,543 US20190297347A1 (en) | 2009-09-02 | 2019-06-12 | Picture-level rate control for video encoding |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/553,075 US8848799B2 (en) | 2009-09-02 | 2009-09-02 | Utilizing thresholds and early termination to achieve fast motion estimation in a video encoder |
US12/553,073 US8379718B2 (en) | 2009-09-02 | 2009-09-02 | Parallel digital picture encoding |
US12/553,069 US8345750B2 (en) | 2009-09-02 | 2009-09-02 | Scene change detection |
US12/553,070 US8879623B2 (en) | 2009-09-02 | 2009-09-02 | Picture-level rate control for video encoding a scene-change I picture |
US14/503,158 US20150016513A1 (en) | 2009-09-02 | 2014-09-30 | Picture-level rate control for video encoding |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/553,070 Continuation US8879623B2 (en) | 2009-09-02 | 2009-09-02 | Picture-level rate control for video encoding a scene-change I picture |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/439,543 Continuation US20190297347A1 (en) | 2009-09-02 | 2019-06-12 | Picture-level rate control for video encoding |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150016513A1 (en) | 2015-01-15 |
Family
ID=43417059
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/553,070 Active 2032-05-29 US8879623B2 (en) | 2009-09-02 | 2009-09-02 | Picture-level rate control for video encoding a scene-change I picture |
US14/503,158 Abandoned US20150016513A1 (en) | 2009-09-02 | 2014-09-30 | Picture-level rate control for video encoding |
US16/439,543 Abandoned US20190297347A1 (en) | 2009-09-02 | 2019-06-12 | Picture-level rate control for video encoding |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/553,070 Active 2032-05-29 US8879623B2 (en) | 2009-09-02 | 2009-09-02 | Picture-level rate control for video encoding a scene-change I picture |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/439,543 Abandoned US20190297347A1 (en) | 2009-09-02 | 2019-06-12 | Picture-level rate control for video encoding |
Country Status (4)
Country | Link |
---|---|
US (3) | US8879623B2 (en) |
EP (1) | EP2306735B1 (en) |
JP (1) | JP2011055504A (en) |
CN (2) | CN102006471B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105898329A (en) * | 2016-04-12 | 2016-08-24 | 乐视控股(北京)有限公司 | Code rate control method and code rate control device used for video coding |
US9872018B2 (en) | 2010-08-09 | 2018-01-16 | Sony Interactive Entertainment Inc. | Random access point (RAP) formation using intra refreshing technique in video coding |
US10178390B2 (en) | 2016-03-30 | 2019-01-08 | Sony Interactive Entertainment Inc. | Advanced picture quality oriented rate control for low-latency streaming applications |
US10200716B2 (en) | 2015-06-25 | 2019-02-05 | Sony Interactive Entertainment Inc. | Parallel intra-prediction encoding/decoding process utilizing PIPCM and/or PIDC for selected sections |
US10419760B2 (en) | 2014-09-29 | 2019-09-17 | Sony Interactive Entertainment Inc. | Picture quality oriented rate control for low-latency streaming applications |
US11212536B2 (en) | 2017-07-14 | 2021-12-28 | Sony Interactive Entertainment Inc. | Negative region-of-interest video coding |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8848799B2 (en) * | 2009-09-02 | 2014-09-30 | Sony Computer Entertainment Inc. | Utilizing thresholds and early termination to achieve fast motion estimation in a video encoder |
US8379718B2 (en) * | 2009-09-02 | 2013-02-19 | Sony Computer Entertainment Inc. | Parallel digital picture encoding |
FR2954036B1 (en) * | 2009-12-11 | 2012-01-13 | Thales Sa | METHOD AND SYSTEM FOR DETERMINING ENCODING PARAMETERS ON VARIABLE RESOLUTION FLOWS |
US20120249869A1 (en) * | 2009-12-14 | 2012-10-04 | Thomson Licensing | Statmux method for broadcasting |
US20110255594A1 (en) | 2010-04-15 | 2011-10-20 | Soyeb Nagori | Rate Control in Video Coding |
BR112012028184A2 (en) | 2010-05-07 | 2016-08-02 | Nippon Telegraph & Telephone | Video coding control method, video coding device and video coding program |
CA2798354C (en) * | 2010-05-12 | 2016-01-26 | Nippon Telegraph And Telephone Corporation | A video encoding bit rate control technique using a quantization statistic threshold to determine whether re-encoding of an encoding-order picture group is required |
US8711928B1 (en) | 2011-10-05 | 2014-04-29 | CSR Technology, Inc. | Method, apparatus, and manufacture for adaptation of video encoder tuning parameters |
US9432704B2 (en) * | 2011-11-06 | 2016-08-30 | Akamai Technologies Inc. | Segmented parallel encoding with frame-aware, variable-size chunking |
JP6080375B2 (en) | 2011-11-07 | 2017-02-15 | キヤノン株式会社 | Image encoding device, image encoding method and program, image decoding device, image decoding method and program |
US9438918B2 (en) * | 2012-04-23 | 2016-09-06 | Intel Corporation | Frame level rate control using motion estimated distortions |
US9398302B2 (en) * | 2013-03-08 | 2016-07-19 | Mediatek Inc. | Image encoding method and apparatus with rate control by selecting target bit budget from pre-defined candidate bit budgets and related image decoding method and apparatus |
GB2514777B (en) * | 2013-06-03 | 2018-12-19 | Displaylink Uk Ltd | Management of memory for storing display data |
US10038904B2 (en) * | 2013-10-25 | 2018-07-31 | Mediatek Inc. | Method and apparatus for controlling transmission of compressed picture according to transmission synchronization events |
US10356405B2 (en) * | 2013-11-04 | 2019-07-16 | Integrated Device Technology, Inc. | Methods and apparatuses for multi-pass adaptive quantization |
US9485456B2 (en) | 2013-12-30 | 2016-11-01 | Akamai Technologies, Inc. | Frame-rate conversion in a distributed computing system |
US10397574B2 (en) * | 2014-05-12 | 2019-08-27 | Intel Corporation | Video coding quantization parameter determination suitable for video conferencing |
US9386317B2 (en) | 2014-09-22 | 2016-07-05 | Sony Interactive Entertainment Inc. | Adaptive picture section encoding mode decision control |
US10097828B2 (en) * | 2014-12-11 | 2018-10-09 | Intel Corporation | Rate control for parallel video encoding |
US10171807B2 (en) * | 2015-01-29 | 2019-01-01 | Arris Enterprises Llc | Picture-level QP rate control for HEVC encoding |
US20160234496A1 (en) * | 2015-02-09 | 2016-08-11 | Qualcomm Incorporated | Near visually lossless video recompression |
US10015496B2 (en) | 2015-03-25 | 2018-07-03 | Samsung Display Co., Ltd. | Method and apparatus for temporal reference coding with light coding systems for display systems |
US9942552B2 (en) * | 2015-06-12 | 2018-04-10 | Intel Corporation | Low bitrate video coding |
US10356406B2 (en) * | 2016-01-19 | 2019-07-16 | Google Llc | Real-time video encoder rate control using dynamic resolution switching |
US20170280139A1 (en) * | 2016-03-22 | 2017-09-28 | Qualcomm Incorporated | Apparatus and methods for adaptive calculation of quantization parameters in display stream compression |
US10616583B2 (en) | 2016-06-30 | 2020-04-07 | Sony Interactive Entertainment Inc. | Encoding/decoding digital frames by down-sampling/up-sampling with enhancement information |
EP3513563A4 (en) * | 2016-10-18 | 2019-07-24 | Zhejiang Dahua Technology Co., Ltd | Methods and systems for video processing |
US10998922B2 (en) * | 2017-07-28 | 2021-05-04 | Mitsubishi Electric Research Laboratories, Inc. | Turbo product polar coding with hard decision cleaning |
CN109413427B (en) | 2017-08-17 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Video frame coding method and terminal |
US11871052B1 (en) * | 2018-09-27 | 2024-01-09 | Apple Inc. | Multi-band rate control |
CN110213585B (en) | 2018-10-31 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Video encoding method, video encoding device, computer-readable storage medium, and computer apparatus |
CN111193927B (en) * | 2018-11-14 | 2022-09-23 | 腾讯科技(深圳)有限公司 | Encoded data processing method, apparatus, computer device and storage medium |
CN109905711B (en) * | 2019-02-28 | 2021-02-09 | 深圳英飞拓智能技术有限公司 | Image processing method and system and terminal equipment |
CN110336581B (en) * | 2019-07-09 | 2020-11-13 | 北京遥感设备研究所 | Universal configurable MSK or QPSK direct sequence spread spectrum modulation system and method |
US11164339B2 (en) | 2019-11-12 | 2021-11-02 | Sony Interactive Entertainment Inc. | Fast region of interest coding using multi-segment temporal resampling |
CN111669594B (en) * | 2020-06-23 | 2022-12-02 | 浙江大华技术股份有限公司 | Video coding method and device and computer readable storage medium |
WO2022021422A1 (en) * | 2020-07-31 | 2022-02-03 | Oppo广东移动通信有限公司 | Video coding method and system, coder, and computer storage medium |
US11955067B2 (en) * | 2021-03-17 | 2024-04-09 | Samsung Display Co., Ltd. | Simplified rate control for an additive iterative compression system |
CN114554211A (en) * | 2022-01-14 | 2022-05-27 | 百果园技术(新加坡)有限公司 | Content adaptive video coding method, device, equipment and storage medium |
CN115219067B (en) * | 2022-09-20 | 2023-01-03 | 金乡县成启仓储服务有限公司 | Real-time state monitoring method for garlic storage |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5231484A (en) * | 1991-11-08 | 1993-07-27 | International Business Machines Corporation | Motion video compression system with adaptive bit allocation and quantization |
US20080004984A1 (en) * | 2006-06-23 | 2008-01-03 | Mark Sendo | System and method enabling children to shop on-line |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5650860A (en) * | 1995-12-26 | 1997-07-22 | C-Cube Microsystems, Inc. | Adaptive quantization |
JP3864461B2 (en) | 1996-08-30 | 2006-12-27 | ソニー株式会社 | Video data compression apparatus and method |
US6243497B1 (en) | 1997-02-12 | 2001-06-05 | Sarnoff Corporation | Apparatus and method for optimizing the rate control in a coding system |
JPH11331850A (en) * | 1998-03-16 | 1999-11-30 | Mitsubishi Electric Corp | Dynamic image coding system |
JP2001008215A (en) * | 1999-06-24 | 2001-01-12 | Victor Co Of Japan Ltd | Dynamic image encoder and method therefor |
CN1206864C (en) | 2002-07-22 | 2005-06-15 | 中国科学院计算技术研究所 | Association rate distortion optimized code rate control method and apparatus thereof |
US7492820B2 (en) | 2004-02-06 | 2009-02-17 | Apple Inc. | Rate control for video coder employing adaptive linear regression bits modeling |
US20050232497A1 (en) * | 2004-04-15 | 2005-10-20 | Microsoft Corporation | High-fidelity transcoding |
US8699561B2 (en) * | 2006-08-25 | 2014-04-15 | Sony Computer Entertainment Inc. | System and methods for detecting and handling errors in a multi-threaded video data decoder |
US8135063B2 (en) * | 2006-09-08 | 2012-03-13 | Mediatek Inc. | Rate control method with frame-layer bit allocation and video encoder |
US20080151998A1 (en) | 2006-12-21 | 2008-06-26 | General Instrument Corporation | Method and Apparatus for Providing Rate Control for Panel-Based Real Time Video Encoder |
JP2009017127A (en) * | 2007-07-03 | 2009-01-22 | Sony Corp | Coding device and coding method |
JP4936557B2 (en) | 2008-01-24 | 2012-05-23 | キヤノン株式会社 | Encoder |
- 2009-09-02: US, application US12/553,070, publication US8879623B2 (en), status: Active
- 2010-08-27: EP, application EP10174307.8A, publication EP2306735B1 (en), status: Active
- 2010-09-02: JP, application JP2010196686A, publication JP2011055504A (en), status: Pending
- 2010-09-02: CN, application CN2010102717364A, publication CN102006471B (en), status: Active
- 2010-09-02: CN, application CN201310363422.0A, publication CN103402099B (en), status: Active
- 2014-09-30: US, application US14/503,158, publication US20150016513A1 (en), status: Abandoned
- 2019-06-12: US, application US16/439,543, publication US20190297347A1 (en), status: Abandoned
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9872018B2 (en) | 2010-08-09 | 2018-01-16 | Sony Interactive Entertainment Inc. | Random access point (RAP) formation using intra refreshing technique in video coding |
US10419760B2 (en) | 2014-09-29 | 2019-09-17 | Sony Interactive Entertainment Inc. | Picture quality oriented rate control for low-latency streaming applications |
US11006112B2 (en) | 2014-09-29 | 2021-05-11 | Sony Interactive Entertainment Inc. | Picture quality oriented rate control for low-latency streaming applications |
US11509896B2 (en) | 2014-09-29 | 2022-11-22 | Sony Interactive Entertainment Inc. | Picture quality oriented rate control for low-latency streaming applications |
US10200716B2 (en) | 2015-06-25 | 2019-02-05 | Sony Interactive Entertainment Inc. | Parallel intra-prediction encoding/decoding process utilizing PIPCM and/or PIDC for selected sections |
US10178390B2 (en) | 2016-03-30 | 2019-01-08 | Sony Interactive Entertainment Inc. | Advanced picture quality oriented rate control for low-latency streaming applications |
CN105898329A (en) * | 2016-04-12 | 2016-08-24 | 乐视控股(北京)有限公司 | Code rate control method and code rate control device used for video coding |
US11212536B2 (en) | 2017-07-14 | 2021-12-28 | Sony Interactive Entertainment Inc. | Negative region-of-interest video coding |
Also Published As
Publication number | Publication date |
---|---|
US20110051806A1 (en) | 2011-03-03 |
CN102006471B (en) | 2013-09-25 |
EP2306735A1 (en) | 2011-04-06 |
CN103402099A (en) | 2013-11-20 |
JP2011055504A (en) | 2011-03-17 |
CN102006471A (en) | 2011-04-06 |
US20190297347A1 (en) | 2019-09-26 |
US8879623B2 (en) | 2014-11-04 |
CN103402099B (en) | 2016-12-07 |
EP2306735B1 (en) | 2018-10-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20190297347A1 (en) | Picture-level rate control for video encoding | |
US12041234B2 (en) | Video compression and transmission techniques | |
US9866838B2 (en) | Apparatus for dual pass rate control video encoding | |
JP5180294B2 (en) | Buffer-based rate control that utilizes frame complexity, buffer level, and intra-frame location in video encoding | |
Wang et al. | Rate-distortion optimization of rate control for H.264 with adaptive initial quantization parameter determination | |
US8804820B2 (en) | Rate control with look-ahead for video transcoding | |
US8837602B2 (en) | Content adaptive video encoder and coding method | |
KR101329860B1 (en) | METHOD FOR ρ-DOMAIN FRAME LEVEL BIT ALLOCATION FOR EFFECTIVE RATE CONTROL AND ENHANCED VIDEO ENCODING QUALITY | |
US7373004B2 (en) | Apparatus for constant quality rate control in video compression and target bit allocator thereof | |
JP4668767B2 (en) | Moving picture coding apparatus and moving picture coding program | |
KR100930344B1 (en) | Initial Quantization Parameter Determination Method | |
Tsai | Rate control for low-delay video using a dynamic rate table | |
JP4962609B2 (en) | Moving picture coding apparatus and moving picture coding program | |
Overmeire et al. | Constant quality video coding using video content analysis | |
Chen | An intra-rate estimation method for H.264 rate control | |
Wu et al. | Constant-quality rate control algorithm for multiple encoders with single video source | |
Liu et al. | An Improved MAD Prediction Method for H.264 Rate Control | |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: SONY COMPUTER ENTERTAINMENT INC., JAPAN; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: LEE, HUNG-JU; REEL/FRAME: 033909/0979; Effective date: 20091008 |
AS | Assignment | Owner name: SONY INTERACTIVE ENTERTAINMENT INC., JAPAN; Free format text: CHANGE OF NAME; ASSIGNOR: SONY COMPUTER ENTERTAINMENT INC.; REEL/FRAME: 039239/0343; Effective date: 20160401 |
STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |