WO2004054158A2 - Rate control with picture-based lookahead window - Google Patents

Rate control with picture-based lookahead window

Publication number
WO2004054158A2
Authority
WO
WIPO (PCT)
Prior art keywords
frame
value
complexity
encoding
algorithm
Prior art date
Application number
PCT/US2003/039184
Other languages
French (fr)
Other versions
WO2004054158A3 (en)
Inventor
Guoyao Yu
Zhi Zhou
Charles H. Van Dusen
Original Assignee
Tut Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tut Systems, Inc. filed Critical Tut Systems, Inc.
Priority to AU2003296418A priority Critical patent/AU2003296418B2/en
Priority to JP2004558627A priority patent/JP4434959B2/en
Priority to NZ540501A priority patent/NZ540501A/en
Priority to CA2507503A priority patent/CA2507503C/en
Priority to CN2003801057469A priority patent/CN1726709B/en
Priority to EP03812913A priority patent/EP1588557A4/en
Publication of WO2004054158A2 publication Critical patent/WO2004054158A2/en
Publication of WO2004054158A3 publication Critical patent/WO2004054158A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/115 Selection of the code volume for a coding unit prior to coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/124 Quantisation
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/152 Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H04N19/40 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Definitions

  • The present invention relates to compression coding of video signals, and more particularly to rate control with a picture-based lookahead window for dual-pass compression encoding/transcoding.
  • An uncompressed video stream can be described as a consecutive series of pictures, or frames.
  • An individual frame describes a particular setting at a particular instant in time.
  • a scene is a series of frames describing the same setting at consecutive moments in time.
  • the second frame of a scene shows the same setting as the first frame, slightly further in the future.
  • MPEG standards take advantage of that repetition of information using what is known as temporal encoding.
  • The encoder divides a video stream into sets of related pictures, known as groups of pictures (GOPs). Each frame within a GOP is labeled by the encoder as an intraframe, a predicted frame or a bi-directional frame.
  • An intraframe (I type frame) is encoded using only information from within that frame. No temporal encoding is used to compress the frame.
  • A predicted (P type) frame is encoded using information from within the frame and uses an earlier I frame or P frame as a reference for temporal compression. I and P frames are referred to as anchor frames.
  • a bi-directional (B type) frame is encoded using information from within the frame and can also use information from at least one earlier anchor frame and at least one later anchor frame.
  • The I frame is generally the most complex, followed by the P frames, and the B frames typically have the least complexity.
  • each GOP (which is referred to herein as a standard GOP) has a period (N) of 15 pictures, or frames, and includes only one I type picture, which is the first picture in the GOP.
  • the fourth, seventh, tenth and thirteenth pictures are P type pictures and the remaining ten pictures are B type pictures.
  • each standard GOP is composed of 5 sub-groups of 3 pictures each. Each sub-group is made up of an anchor picture and two B type pictures. The spacing between anchor pictures, in this case 3, is known as the intra-period (M) of the GOP.
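  • The standard GOP layout described above can be reproduced with a short sketch. The function name and its parameters are illustrative assumptions, not part of the patent text; the pictures are listed in display order, with N = 15 and M = 3 as defaults.

```python
from collections import Counter


def gop_picture_types(n=15, m=3):
    """Return the picture types of one GOP in display order.

    n is the GOP period (N); m is the spacing between anchor pictures (M).
    Both names are illustrative, chosen to mirror the text.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")  # the single intraframe opens the GOP
        elif i % m == 0:
            types.append("P")  # every m-th picture after the I picture is an anchor
        else:
            types.append("B")  # pictures between anchors
    return types


pattern = gop_picture_types()
counts = Counter(pattern)
print("".join(pattern))
print(dict(counts))
```

  • With the defaults, the P pictures land on the fourth, seventh, tenth and thirteenth positions, and the counts are one I, four P and ten B pictures.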
  • A video stream is further sub-divided in MPEG standards by defining a frame as being made up of a series of macro-blocks (MBs).
  • a macro-block contains all the information required to display an area of the picture representing 16x16 luminance pixels.
  • MPEG2 and H.264 specify the syntax of a valid bit stream and the ways in which a decoder must interpret such a bit stream to arrive at the intended output, the output being uncompressed digital video.
  • the MPEG standards do not, however, specify an encoder.
  • An encoder is defined as any device, hardware or software, capable of outputting a bit stream that will produce the desired output when input to an MPEG compliant decoder.
  • an uncompressed video signal is input to the encoder and encoded in accordance with the applicable compression standard and the newly encoded signal is output from the encoder, received by a decoder and decoded for viewing.
  • the decoder may include a buffer that receives the encoded data and provides data to the decoding process.
  • the encoder must ensure that the encoded signal is being output at such a rate that the decoder may be continuously decoding the encoded signal and transmitting the decoded signal. If the encoder transmits the signal too slowly, there will be gaps in the data transmitted by the decoder as the decoder waits for data from the encoder. If the encoder transmits the signal too quickly, the decoder may not be able to keep up, causing buffer overflow at the decoder and unacceptable information loss.
  • The process of managing an encoder's rate of transmission is known as rate control.
  • the encoder keeps track of decoder buffer fullness by use of a virtual buffer.
  • A simplistic method of rate control would be to assign a set number of bits to each picture, or frame, of the video signal to be encoded. However, this is not efficient, as the number of bits used to encode every picture in the video stream must then be large enough to accommodate the most complex frame possible, when in fact a simple frame, such as a picture of a blue sky, will require fewer bits to encode than a complex frame (a picture of a cloudy sunrise at the horizon).
  • a measure of a picture's complexity, prior to the picture being encoded, allows the encoder to make a better decision regarding how many bits should be used to encode the picture.
  • This method can be further improved if the encoder has knowledge of the complexity of frames that it will be encoding in the future.
  • Consider a video stream that begins by showing a clear blue sky and then pans down to show a sunset.
  • the initial frames will have a low measure of complexity and thus will be encoded using a relatively small number of bits.
  • the later frames contain much more complex information and will require a larger number of bits to encode. If, while deciding how many bits to allocate to the initial, simple, video frames, the encoder is made aware that it will soon need to allocate a large number of bits to encoding more complex frames, the encoder may further reduce the number of bits used to encode the simple frames and avoid or reduce the risk of overflowing a downstream decoder.
  • Another effective method of controlling the number of bits used to encode a picture is to modify the quantization step size dynamically for each MB within the frame. For an MB of generally uniform color and intensity, only a small number of possible pixel values are necessary and thus fewer bits will be required to describe it. The inverse is true for an MB containing a wide variety of color and intensity values, since the encoder will have to describe a wider range of pixel values.
  • each MB is assigned a quantization scale factor (Mquant) that is used to modify quantization step size.
  • The number of bits allocated to a certain picture for encoding is a function of that picture's complexity, relative to other pictures.
  • A complexity weight factor is assigned for each picture type ($X_{I2}$, $X_{P2}$ and $X_{B2}$ for I, P and B type pictures respectively).
  • $X_{I2}$, $X_{P2}$ and $X_{B2}$ represent the complexity measure for I, P and B pictures and may be calculated as:
  • $X_{I2} = S_{I2} Q_{I2}$, $X_{P2} = S_{P2} Q_{P2}$, $X_{B2} = S_{B2} Q_{B2}$
  • where $S_{I2}$, $S_{P2}$ and $S_{B2}$ are the number of bits for each picture and $Q_{I2}$, $Q_{P2}$ and $Q_{B2}$ are the average quantization parameters for all MBs in each picture (see below).
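  • As a small numeric illustration of the complexity measure (bits multiplied by the average quantization parameter), the sketch below uses made-up picture sizes and quantizer values; the function name and the numbers are assumptions for illustration only.

```python
def complexity(bits, avg_quant):
    """Complexity measure X = S * Q for one picture.

    bits is the number of bits used to encode the picture (S) and
    avg_quant is the average quantization parameter over its MBs (Q).
    """
    return bits * avg_quant


# Illustrative (assumed) statistics for one picture of each type.
x_i = complexity(bits=400_000, avg_quant=8.0)
x_p = complexity(bits=150_000, avg_quant=10.0)
x_b = complexity(bits=60_000, avg_quant=12.0)
print(x_i, x_p, x_b)
```

  • With these assumed numbers the I picture has the largest complexity and the B picture the smallest, which matches the ordering the text describes as typical.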
  • TM-5 bit allocation for a picture is targeted based on how much of the bit space allocated for the GOP is remaining, the type of picture being encoded, and the complexity statistics of recently encoded pictures of the same type.
  • the target bit allocation is the number of bits TM-5 anticipates will be necessary to encode the frame.
  • The virtual buffer tracks the fullness of the decoder's buffer on an MB by MB basis while a picture is being encoded. Encoding all the MBs up to, but not including, the jth MB should use a certain portion of the total targeted bits. This portion is equal to $B_{tar}$ multiplied by the number of MBs already encoded ($j-1$) and divided by the total number of MBs in the picture (MB_cnt).
  • The number of bits actually generated by encoding up to, but not including, the jth MB is equal to $B_{j-1}$.
  • The delta between the targeted and generated number of bits represents a change in the fullness of the virtual buffer after each MB is encoded ($d_j$) and is calculated prior to encoding the jth MB.
  • $d_j = d_0 + B_{j-1} - B_{tar} \cdot (j-1) / MB\_cnt$, where $d_0$ equals the fullness of the virtual buffer at the beginning of the current picture.
  • If the virtual buffer begins to overflow, the quantization step size is increased, leading to a smaller yield of bits for the subsequent MBs. Similarly, if the virtual buffer begins to underfill, the quantization step size is decreased, leading to a larger yield of bits for subsequent MBs. This measure of the virtual buffer fullness is used to generate the MB's reference quantization number ($Q_j$).
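  • The virtual buffer update described above can be sketched as follows. The function and argument names are hypothetical, and the picture size and bit counts are made-up values; the sketch only evaluates the fullness before the jth MB and does not attempt the mapping from fullness to the reference quantization number, which the text does not spell out.

```python
def buffer_fullness(d0, bits_so_far, b_tar, j, mb_cnt):
    """Virtual buffer fullness before encoding the j-th macro-block (1-based).

    d0          -- fullness at the start of the current picture
    bits_so_far -- bits actually generated by MBs 1 .. j-1
    b_tar       -- target bits for the whole picture
    mb_cnt      -- total number of macro-blocks in the picture
    """
    expected = b_tar * (j - 1) / mb_cnt  # share of the target that should be spent so far
    return d0 + bits_so_far - expected


# Assumed example: 1350 MBs, 135000 target bits, so 100 bits per MB on average.
d_11 = buffer_fullness(d0=10_000, bits_so_far=1_500, b_tar=135_000, j=11, mb_cnt=1_350)
print(d_11)
```

  • Here the first ten MBs produced 1500 bits against an expected 1000, so the fullness rises by 500, which would push the quantization step size upward for the following MBs.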
  • The macro-block quantization step size is further modulated as a function of spatial activity ($act_j$).
  • The macro-block is divided into four 8x8 sub-blocks and the spatial activity is measured for each sub-block. The smallest of the four measurements is then normalized ($N\_act_j$) against the average spatial activity ($avg\_act$) of the previously encoded picture.
  • The minimum spatial activity measurement is used because the quality of a macro-block is no better than its sub-block of highest visible distortion.
  • $N\_act_j = (2 \cdot act_j + avg\_act) / (act_j + 2 \cdot avg\_act)$
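  • The normalization above can be checked with a minimal sketch. The function name is an assumption, the sub-block activity values are invented, and the sketch assumes avg_act is greater than zero so that the denominator cannot vanish.

```python
def normalized_activity(sub_block_acts, avg_act):
    """Normalized spatial activity of one macro-block.

    sub_block_acts -- activity measurements of the four 8x8 sub-blocks
    avg_act        -- average activity of the previously encoded picture
                      (assumed > 0 in this sketch)
    """
    act = min(sub_block_acts)  # the smoothest sub-block drives the result
    return (2 * act + avg_act) / (act + 2 * avg_act)


flat = normalized_activity([0, 3, 9, 4], avg_act=10)      # one perfectly flat sub-block
typical = normalized_activity([5, 5, 5, 5], avg_act=5)    # activity equal to the average
busy = normalized_activity([90, 120, 95, 100], avg_act=10)
print(flat, typical, busy)
```

  • The result stays between 0.5 and 2: a macro-block with a flat sub-block gets a factor near 0.5 (finer quantization), while a busy macro-block gets a factor approaching 2 (coarser quantization).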
  • Video encoders/decoders (codecs) built based on newer standards eventually replace those built based on older standards in applications where specifications overlap, such as for bit-rate, resolution, etc. This replacement procedure takes a long period of time, since it is expensive to replace older video codecs with newer ones. Another reason older codecs continue to be used is that many video streams have already been compressed with the older algorithms, and may easily be decompressed by the older codecs.
  • a method of encoding frames of an uncompressed digital video stream comprising analyzing a first frame of the uncompressed digital video stream with a first algorithm to measure a first value of the first frame's complexity and to assign the first frame's picture type, estimating a second value of the first frame's complexity using the first measured value as a parameter, and encoding the first frame with a distinct second algorithm employing the second value of the first frame's complexity and the first frame's picture type as parameters.
  • a method of transcoding frames of a compressed digital video stream, each frame being encoded according to a first encoding algorithm and having a level of complexity, said method comprising decoding a first frame of the compressed digital video stream with a first decoding algorithm to produce a decoded version of the first frame and to measure a first value of the first frame's complexity and to determine the first frame's picture type, estimating a second value of the first frame's complexity using the first value as a parameter, and encoding the decoded version of the first frame with a distinct second encoding algorithm employing the second value of the first frame's complexity and the first frame's picture type as parameters.
  • an apparatus for encoding an uncompressed digital video input stream composed of a succession of frames, each frame having a plurality of characteristics associated therewith comprising an extraction means for receiving the succession of frames of the uncompressed digital video input stream and employing a first method to obtain measured values for the plurality of characteristics of a frame of the input stream and to assign a picture type to the frame, a delay means for receiving the succession of frames of the input stream and outputting the frames in delayed fashion relative to the frames of the input stream, a value storage means for storing the measured values and the picture type of a frame in the delay means, and an encoding means for receiving a frame from the delay means and encoding the frame, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the frame.
  • an apparatus for transcoding a compressed digital video input stream composed of a succession of encoded frames, each encoded frame having a plurality of characteristics associated therewith comprising a decoding means for receiving the succession of encoded frames of the compressed digital video input stream and employing a first method to obtain a succession of decoded frames and measured values for the plurality of characteristics of a decoded frame and to assign a picture type to the decoded frame, a delay means for receiving the succession of decoded frames of the input stream and outputting the decoded frames in delayed fashion relative to the encoded frames of the input stream, a value storage means for storing the measured values and the picture type of a decoded frame in the delay means, and an encoding means for receiving a decoded frame from the delay means and encoding the frame, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the frame.
  • An embodiment of the present invention provides rate control with a picture-based lookahead window for encoders/transcoders having mixed codecs in a dual-pass compressed video architecture.
  • a transcoder where the input video signal is a compressed video signal, statistics are extracted by using a simple compression decoder to produce the statistics from the compressed video signal; and in an encoder, where the input video signal is an uncompressed video signal, statistics are extracted by using a simple compression encoder to generate the statistics from the uncompressed video signal.
  • a trans-factor is calculated for a current picture based on previous pictures in a sliding "past" window to predict the complexity of the current picture, the trans-factor being a ratio of global complexity measures for the simple compression standard versus a sophisticated compression standard.
  • Bits for the current picture are then allocated based on the complexity of future pictures in the lookahead or "future" window. If future pictures are difficult to encode, then fewer bits are allocated to the current picture, and vice versa. This is effective for a scene change. Because the lookahead window takes into account the statistics of future pictures, i.e., pictures that have not yet been compressed according to the sophisticated compression standard, a more reasonable bit allocation and better quality are achieved.
  • the actual bits, the picture complexity and the trans-factor for the encoded picture are updated as the past and lookahead windows are shifted by one picture, i.e., the encoded picture moves into the past window and out of the lookahead window as a new picture is loaded into the lookahead window.
  • Fig. 1 is a block diagram view of a dual-pass encoder/transcoder architecture implementing rate control with a picture-based lookahead window according to the present invention.
  • Fig. 2 is a flow chart view of a rate control algorithm according to the present invention.
  • Fig. 3 is a conceptual view of virtual sliding windows according to the present invention.
  • Fig. 1 illustrates an encoder/transcoder having a simple compression decoder 12 for receiving and decoding a compressed video stream encoded according to a simple compression standard, such as MPEG2, to produce an uncompressed video signal and related statistics.
  • a simple compression encoder 14 receives an uncompressed video stream to generate related statistics.
  • the statistics are input to a lookahead window module 18 for processing by a rate control algorithm, described below, while the uncompressed video signal in either configuration (transcoder or encoder) is input to a storage and delay module 16.
  • the storage and delay module is a buffer memory that receives, delays and outputs the uncompressed video stream.
  • The lookahead window module contains the statistics for each picture in the storage and delay module 16, for example, the number of bits for the picture, the picture type and the average quantization step size over all the macro-blocks for the picture.
  • The lookahead window module 18 generates bit allocation data from the statistics for use by a sophisticated compression encoder 24, such as an H.264 encoder, in determining rate control for the sophisticated encoding process.
  • the storage and delay module compensates for the time required for the lookahead window module 18 to generate bit allocation data.
  • The delayed uncompressed video stream from the storage and delay module 16 is input to an adaptive pre-filter 20 to produce a filtered uncompressed video stream.
  • the filter may be a low-pass filter that attenuates high spatial frequencies in the images represented by the uncompressed video stream and thus serves to "blur" the uncompressed video stream so that it is easier to compress, i.e., is less complex and so requires fewer bits to compress.
  • the strength of the filtering may depend on a threshold or cut-off frequency above which spatial frequency components are attenuated and on the degree to which high spatial frequencies are attenuated.
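  • A rough sketch of an adjustable low-pass pre-filter follows. The patent text does not specify the filter, so the 3x3 box blur, the `strength` parameter and the function name are all assumptions chosen only to show how a single control value can vary the degree of attenuation.

```python
def prefilter(image, strength):
    """Blend each pixel with the mean of its 3x3 neighbourhood.

    image    -- list of rows of luminance values
    strength -- assumed control value in [0, 1]; 0 leaves the picture
                untouched, 1 applies the full box blur
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbours that fall inside the picture.
            window = [
                image[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            mean = sum(window) / len(window)
            row.append((1 - strength) * image[y][x] + strength * mean)
        out.append(row)
    return out


checker = [[0, 255], [255, 0]]
print(prefilter(checker, 0.0))  # unchanged
print(prefilter(checker, 1.0))  # fully blurred
```

  • A flat picture passes through unchanged at any strength, while the high-contrast checkerboard is pulled toward its mean as the strength grows, which is the "blurring" effect that makes a picture cheaper to encode.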
  • Both the delayed uncompressed video stream from the storage and delay module 16 and the filtered, uncompressed video stream are input to a switch 22 which selects one of the streams.
  • The selected uncompressed video stream from the switch 22 and the bit allocation data from the lookahead window module 18 are input to the sophisticated compression encoder 24 to produce a compressed video stream according to a sophisticated compressed video standard, such as H.264 (MPEG4, Part 10).
  • the sophisticated compression encoder 24 also provides a control signal to the adaptive pre-filter 20 and to the switch 22 that determines the "strength" of the filtering and which uncompressed video stream is to be encoded.
  • the strength of the filtering may be implemented as different filtering levels or may be continuous.
  • the adaptive pre-filter 20 may be switched off or set to a low strength for minimum filtering when the filtered uncompressed video stream is not selected for encoding by the sophisticated compression encoder 24.
  • the storage and delay module 16 stores multiple uncompressed images. Each image of the uncompressed video stream will ultimately be encoded by the sophisticated compression encoder 24 as an I, P or B type picture.
  • The type of picture (I, P or B) that a given uncompressed image will be encoded as is based on the statistics provided to the lookahead window module 18. Therefore, even though the images stored in the module 16 are not encoded, it will be convenient to refer to these images as I, P or B type pictures.
  • the number of images stored by the module 16 is limited by the size of the memory and an allowed maximum delay.
  • a storage length corresponding to at least two GOPs of the input video signal is desired.
  • the storage and delay module is designed to contain two standard GOPs of fifteen pictures each.
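  • The behaviour of the storage and delay module can be pictured as a fixed-length first-in, first-out buffer. The class below is a hypothetical sketch, not the patent's implementation; the default capacity of 30 corresponds to two standard GOPs of fifteen pictures each.

```python
from collections import deque


class StorageAndDelay:
    """FIFO that releases a frame only after `capacity` newer frames are stored."""

    def __init__(self, capacity=30):
        self.capacity = capacity
        self.frames = deque()

    def push(self, frame):
        """Store a frame; return the delayed frame once the buffer is full, else None."""
        self.frames.append(frame)
        if len(self.frames) > self.capacity:
            return self.frames.popleft()  # oldest frame leaves for encoding
        return None


delay = StorageAndDelay(capacity=3)
outputs = [delay.push(n) for n in range(5)]
print(outputs)
```

  • While the buffer fills, nothing is released; afterwards each incoming frame pushes out the oldest stored frame, so the frames still held in the buffer are the "future" pictures whose statistics are already known.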
  • The lookahead window module 18 sets a bit-rate target for the current picture being encoded based on the received statistics, which include picture types (I, P or B), picture size (in bytes), and average quantizer step sizes at picture levels.
  • A P-type picture may be complicated and need a high bit-rate for motion compensation in MPEG2 encoding if its corresponding original picture was recorded during a flash light off/on/off transition.
  • this P-type picture may be a simple picture to an H.264 encoder which is able to select one out of up to six reference pictures for motion prediction, and one of the references may be strongly correlated with this P-type picture, as indicated above.
  • the statistics of picture complexity obtained by the lookahead window module 18 may also be used for generating the control signal for the adaptive pre-filter 20 to control the strength of the low-pass filtering.
  • the strength of the adaptive pre-filter 20 may be increased so that the picture is heavily low-pass filtered, i.e., becomes softer and easier to encode.
  • the sophisticated compression encoder 24 employs the switch 22 to select either the delayed uncompressed video signal output from the storage and delay module 16 or the filtered video signal output by the adaptive pre-filter 20 based on the rate control information and on the virtual buffer fullness of the sophisticated compression encoder 24.
  • If the virtual buffer is in danger of overflowing, the amount of pre-filtering is increased so that the virtual buffer does not overflow, and the filtered uncompressed video is the video signal that is encoded. If there is no danger of virtual buffer overflow, then the current picture is slightly filtered or not filtered at all. In the latter event the uncompressed video signal from the storage and delay module 16 is used as the input for encoding. However, frequently and abruptly changing the filter strength and/or switching between the uncompressed video signal and the filtered uncompressed video signal within a GOP may lead to a motion compensating residue signal for P and B pictures. This is avoided by smoothly controlling the pre-filter 20 within a GOP.
  • the rate control algorithm used is based on the Test Model 5 (TM5) specification.
  • TM5 takes a complexity measure to allocate target bits for each picture and then sets a quantization parameter for each MB based on the fullness of the virtual buffer.
  • the rate control algorithm includes two parts:
  • Both processes are adaptive and a past sliding window and a future sliding window are maintained to update the statistics after each picture is encoded. Note that the past sliding window is located in the lookahead window module 18 and the future sliding window is located in the sophisticated compression encoder 24. Contrary to prior applications that used sliding windows which increment in terms of GOPs, the sliding windows of the present invention are picture-based, and move forward after encoding each picture.
  • the rate control algorithm has four steps: (a) statistics extraction; (b) complexity prediction; (c) bit allocation; and (d) statistics update.
  • VBR: variable bit-rate
  • CBR: H.264 constant bit-rate
  • Items 1 and 2 are used for calculation of the input video's complexity, while item 3 records the picture type that is used by the sophisticated compression encoder 24.
  • the current picture's complexity is predicted by that of the previous picture of the same type.
  • the current picture's complexity is predicted on the basis of the complexity of all pictures of the same type in the past window.
  • trans-factor is introduced to take into account the difference between the sophistication of the two standards and/or the two bit-rates.
  • the trans-factor is calculated as the average of previous simple/sophisticated ratios and is updated after the encoding of each picture. Because of the different properties of different picture types, the trans-factor is calculated independently for each picture type.
  • the complexity prediction algorithm has two steps:
  • A GOP can be described as containing $N_I$ I type pictures, $N_P$ P type pictures and $N_B$ B type pictures. For a standard GOP, as described above, $N_I = 1$, $N_P = 4$ and $N_B = 10$.
  • The storage and delay module 16 contains W GOPs. For this discussion, it will be assumed that the module 16 is designed to store 2 standard GOPs. $W_I$, $W_P$ and $W_B$ represent the total number of I, P and B type pictures respectively in the storage and delay module 16.
  • The trans-factor $T_{cur}$ of a picture in the uncompressed video stream is calculated by averaging the trans-factors of the previous $W_{type}$ pictures of the same type (I, P or B), the number of previous trans-factors that are averaged being equal to the total number of pictures of that type in the storage and delay module 16 ($W_I$, $W_P$, $W_B$).
  • the trans- factors for B type pictures are further adjusted by a weight factor (K B4 ) to take into account the different quality requirements for different picture types.
  • K B4 has been empirically determined and is a function of the ratio of the current GOP's I type simple complexity and the average simple complexity of the GOP's B type pictures.
  • K B4 is larger for a well-predicted sequence, i.e. a sequence without fast motion, and is smaller for a sequence with fast motion.
  • K B4 is set adaptively after encoding each GOP in accordance with the ratio X I /X B where X x and X B are the average simple complexity of all I and B pictures in the current GOP .
  • the sophisticated complexity X P4 for a P type picture may also adjusted by a weight factor (K P4 ) , but it has been found that this is not necessary in practice.
  • Bit allocation may be based on GOP-layer and picture-layer. Picture-layer breaks the GOP boundary and performs better than GOP-layer. This is particularly effective for scene changes in the video signal . Bit allocation has two steps.
  • Allocate target bits for current (kth) picture
  • [0055] the target size ($T_w$) in bits, for all the pictures currently referenced in the sliding lookahead window is calculated based on the number of pictures in the window ($W_F$), the constant bit rate (R), in bits per second, and the picture rate (F), in pictures per second.
  • $T_w = W_F (R/F)$
  • [0056] Then the targeted number of bits ($B_{4\_tar}(k)$) to be allocated for the kth picture is calculated by multiplying $T_w$ by the ratio of the current picture's complexity factor to the complexity factor of all the pictures in the sliding lookahead window.
  • This calculation essentially identifies the proportion of the target size ($T_w$) that should be used for the current picture.
  • the size of the current picture when encoded by the complex encoding algorithm is not permitted to be larger than the size of the current picture when encoded by the simple compression algorithm ($B_2(k)$).
  • if $B_{4\_tar}(k)$ does exceed $B_2(k)$, the size of the current picture when encoded is clamped to $B_2(k)$.
  • the number of bits targeted for the kth picture is still $B_{4\_tar}(k)$, but the smaller number $B_2(k)$ will be the upper limit of bits used when the encoding actually takes place.
  • the target window size is then modified to take the extra bits into account.
  • $T_w(k+1) = T_w(k) + B_{4\_tar}(k) - B_2(k)$
  • Bj is the number of bits generated by encoding all MBs in the picture up to and including j
  • MB_cnt is the number of MBs in the picture
  • T is the constant bit rate (CBR) per picture
  • $d_0$ is the initial fullness of the virtual buffer
  • $d_j$ is the fullness of the virtual buffer at MB j.
  • the reference quantization parameter $Q_j$ is then computed for MB j
  • $act_j = 1 + \min(vblk_1, vblk_2, \ldots, vblk_8)$
  • $\mathrm{N\_act}_j = \dfrac{2 \cdot act_j + \mathrm{avg\_act}}{act_j + 2 \cdot \mathrm{avg\_act}}$ where avg_act is the average value of $act_j$ for the last picture to be encoded. Then adjust $mquant_j$ as:
  • $mquant_j$ The final value of $mquant_j$ is clipped to a range [1 ... 51] and used for the quantization. Delta QP should be clipped to [-26, 26], as defined by H.264 semantics. Then encode one MB with $mquant_j$ and repeat this step until all MBs of the current picture are encoded.
  • Trans-factor is defined as the ratio of "global complexity measure" of corresponding simple and sophisticated compression standards pictures.
  • $T_B[\mathrm{current\_picture\_SN}] = X_{B2}/X_{B4}$ where $X_{I4}$, $X_{P4}$ and $X_{B4}$ represent the complexity measure for the I, P, B picture of the output, sophisticated compression standard (H.264) stream:
  • $T_w(k+1) = T_w(k) + R/F - S(k)$
  • This adjustment is done in order to maintain the proper level of virtual buffer fullness and makes $T_w(k+1)$ a more reasonable target size of the lookahead window for calculating the bit target for the (k+1)th picture. If the input sequence is not infinitely long, at the end of the sequence the lookahead window size $W_F$ and target size $T_w$ both decrease. After encoding each picture the trans-factor, $T_I$, $T_P$ or $T_B$, is updated and all the predicted complexity values in the future sliding window are updated as well.
  • the trans-factor to be used in determining the complexity for the current picture to encode is based upon the average of the trans-factors for the same picture type in the past window, while the bit allocation for the current picture is based on the overall complexity of the pictures in the lookahead window.
  • the actual sophisticated standard complexity is determined and entered in the past window while the oldest one is shifted out.
  • a new picture's statistics are loaded into the lookahead window to determine a new complexity for the window as the next picture to be encoded becomes the current picture .
  • the present invention provides rate control with a picture-based sliding window to simplify transcoding/encoding from a simple compression standard to a sophisticated compression standard by extracting statistics for a video signal using the simple compression standard, by using the extracted statistics and virtual buffer fullness to control a lowpass pre-filter for the uncompressed video signal, and by encoding the filtered or unfiltered uncompressed video signal using a trans-factor which is the ratio of global complexity measures for the simple and sophisticated compression generated standards pictures with a sliding window on a picture-by-picture basis, updating the trans-factor and sliding window for each picture.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method of encoding frames of an uncompressed digital video stream includes analyzing a frame of the uncompressed digital video stream with a first algorithm (MPEG2) to measure a first value of the frame's complexity and to assign the frame's picture type, and estimating a second value of the frame's complexity using the first measured value as a parameter. The frame is then encoded with a distinct second algorithm (H.264) employing the second value of the frame's complexity and the first frame's picture type as parameters.

Description

RATE CONTROL WITH PICTURE-BASED LOOKAHEAD WINDOW
BACKGROUND OF THE INVENTION
[0001] The present invention relates to compression coding of video signals, and more particularly to rate control with a picture-based lookahead window for dual-pass compression encoding/transcoding.
[0002] Two of the international standards for digital video compression are known as MPEG2 (H.262) and H.264 (MPEG 4 part 10) . There are several other standards to which the invention might be applied, such as H.261, MPEG1 and H.263, but the following description of an embodiment of the invention refers mainly to MPEG2 and H.264 so we will not discuss the other standards .
[0003] An uncompressed video stream can be described as a consecutive series of pictures, or frames. An individual frame describes a particular setting at a particular instant in time. A scene is a series of frames describing the same setting at consecutive moments in time. The second frame of a scene shows the same setting as the first frame, slightly further in the future. MPEG standards take advantage of that repetition of information using what is known as temporal encoding. In accordance with an MPEG video compression standard, such as MPEG2, the encoder divides a video stream into sets of related pictures, known as groups of pictures (GOPs). Each frame within a GOP is labeled by the encoder as an intraframe, a predicted frame or a bi-directional frame. An intraframe (I type frame) is encoded using only information from within that frame. No temporal encoding is used to compress the frame. A predicted (P type) frame is encoded using information from within the frame and uses an earlier I frame or P frame as a reference for temporal compression. I and P frames are referred to as anchor frames. A bi-directional (B type) frame is encoded using information from within the frame and can also use information from at least one earlier anchor frame and at least one later anchor frame. Within a GOP, the I frame is generally the most complex, followed by the P frames, and the B frames typically have the least complexity.
[0004] In one conventional implementation of MPEG2 , each GOP (which is referred to herein as a standard GOP) has a period (N) of 15 pictures, or frames, and includes only one I type picture, which is the first picture in the GOP. The fourth, seventh, tenth and thirteenth pictures are P type pictures and the remaining ten pictures are B type pictures. Thus, each standard GOP is composed of 5 sub-groups of 3 pictures each. Each sub-group is made up of an anchor picture and two B type pictures. The spacing between anchor pictures, in this case 3, is known as the intra-period (M) of the GOP. Thus a standard GOP, in display order, will appear as: I B B P B B P B B P B B P B B
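The layout of the standard GOP described above can be reproduced with a short sketch. This is only an illustration of the N = 15, M = 3 structure; the function name `gop_picture_types` is hypothetical and does not come from the patent.

```python
def gop_picture_types(n=15, m=3):
    """Return the display-order picture types of one closed GOP.

    n is the GOP period (pictures per GOP) and m is the spacing
    between anchor pictures (the intra-period).
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")      # the single I picture opens the GOP
        elif i % m == 0:
            types.append("P")      # every m-th picture is an anchor (P)
        else:
            types.append("B")      # everything else is bi-directional
    return types


standard_gop = gop_picture_types()
print(" ".join(standard_gop))
print({t: standard_gop.count(t) for t in "IPB"})
```

With the default arguments the P pictures land on the fourth, seventh, tenth and thirteenth positions, matching the description of the standard GOP.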
[0005] In this conventional implementation of MPEG2 , the standard GOP is closed, i.e. it does not make any predictions based on frames outside of the GOP.
[0006] A video stream is further sub-divided in MPEG standards by defining a frame as being made up of a series of macro-blocks (MBs) . A macro-block contains all the information required to display an area of the picture representing 16x16 luminance pixels.
[0007] MPEG2 and H.264 specify the syntax of a valid bit stream and the ways in which a decoder must interpret such a bit stream to arrive at the intended output, the output being uncompressed digital video. The MPEG standards do not, however, specify an encoder. An encoder is defined as any device, hardware or software, capable of outputting a bit stream that will produce the desired output when input to an MPEG compliant decoder. [0008] In a typical application of an encoder, an uncompressed video signal is input to the encoder and encoded in accordance with the applicable compression standard and the newly encoded signal is output from the encoder, received by a decoder and decoded for viewing. In order to accommodate variation in the rate at which data is received by the decoder, the decoder may include a buffer that receives the encoded data and provides data to the decoding process. The encoder must ensure that the encoded signal is being output at such a rate that the decoder may be continuously decoding the encoded signal and transmitting the decoded signal. If the encoder transmits the signal too slowly, there will be gaps in the data transmitted by the decoder as the decoder waits for data from the encoder. If the encoder transmits the signal too quickly, the decoder may not be able to keep up, causing buffer overflow at the decoder and unacceptable information loss. The process of managing an encoder's rate of transmission is known as rate control . The encoder keeps track of decoder buffer fullness by use of a virtual buffer. [0009] A simplistic method of rate control would be to assign a set number of bits to each picture, or frame, of the video signal to be encoded. 
However this is not efficient, as the number of bits used to encode every picture in the video stream must then be large enough to accommodate the most complex frame possible, when in fact a simple frame, such as a picture of a blue sky, will require fewer bits to encode than a complex frame (a picture of cloudy sunrise at the horizon) . A measure of a picture's complexity, prior to the picture being encoded, allows the encoder to make a better decision regarding how many bits should be used to encode the picture. This method can be further improved if the encoder has knowledge of the complexity of frames that it will be encoding in the future. Consider a video stream that begins showing a clear blue sky, and then pans down to show a sunset. The initial frames will have a low measure of complexity and thus will be encoded using a relatively small number of bits. The later frames, however, contain much more complex information and will require a larger number of bits to encode. If, while deciding how many bits to allocate to the initial, simple, video frames, the encoder is made aware that it will soon need to allocate a large number of bits to encoding more complex frames, the encoder may further reduce the number of bits used to encode the simple frames and avoid or reduce the risk of overflowing a downstream decoder.
[0010] Another effective method of controlling the number of bits used to encode a picture is by modifying the quantization step size dynamically for each MB within the frame. For an MB of generally uniform color and intensity only a small number of possible pixel values are necessary and thus fewer bits will be required to describe it. The inverse is true for an MB containing a wide variety of color and intensity values since the encoder will have to describe a wider range of pixel values. In accordance with this approach each MB is assigned a quantization scale factor (Mquant) that is used to modify quantization step size.
[0011] During the development of the MPEG2 standard it became necessary to design a generic rate control and quantization methodology with which to test bit-stream syntax, decoder designs and other aspects of the standard. This methodology was known as the Test Model and it evolved as the development of MPEG2 continued. The fifth and final version of the model (TM-5) was produced with the freezing of the MPEG2 standard. TM-5 is divided into three primary steps:
(a) target bit allocation; (b) rate control; and (c) adaptive quantization. a) Target Bit Allocation
[0012] As discussed above, the number of bits allocated to a certain picture for encoding is a function of that picture's complexity, relative to other pictures. For a particular GOP, a complexity weight factor is assigned for each picture type ($X_{I2}$, $X_{P2}$ and $X_{B2}$ for I, P and B type pictures respectively). $X_{I2}$, $X_{P2}$ and $X_{B2}$ represent the complexity measure for I, P, B pictures and may be calculated as:
$X_{I2} = S_{I2} Q_{I2}$
$X_{P2} = S_{P2} Q_{P2}$
$X_{B2} = S_{B2} Q_{B2}$
where $S_{I2}$, $S_{P2}$ and $S_{B2}$ are the number of bits for each picture and $Q_{I2}$, $Q_{P2}$ and $Q_{B2}$ are the average quantization parameters for all MBs in each picture (see below).
[0013] In TM-5, bit allocation for a picture is targeted based on how much of the bit space allocated for the GOP is remaining, the type of picture being encoded, and the complexity statistics of recently encoded pictures of the same type. The target bit allocation is the number of bits TM-5 anticipates will be necessary to encode the frame.
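As a minimal sketch of the complexity measure used in this step, each picture's complexity is the product of the bits it consumed and its average quantization parameter. The statistics below are invented placeholder values, and the names `complexity`, `stats` and `x2` are hypothetical.

```python
def complexity(bits, avg_quant):
    """Global complexity measure X = S * Q for one encoded picture."""
    return bits * avg_quant


# Hypothetical per-picture statistics: (bits used S, average quantizer Q).
stats = {
    "I": (400_000, 10.0),
    "P": (200_000, 12.0),
    "B": (80_000, 15.0),
}

# One complexity value per picture type, as in X_I2, X_P2 and X_B2.
x2 = {ptype: complexity(s, q) for ptype, (s, q) in stats.items()}
print(x2)
```

A picture that needed many bits even at a coarse quantizer scores high, which is why the product, rather than the bit count alone, serves as the complexity estimate.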
b) Rate Control
[0014] If there is a difference between the target bit allocation ($B_{tar}$) and the actual number of bits required to encode a picture ($B_{act}$), then there is a risk of under- or over-filling TM-5's virtual buffer. The virtual buffer tracks the fullness of the decoder's buffer on an MB by MB basis while a picture is being encoded. Encoding all the MBs up to, but not including, the jth MB should use a certain portion of the total targeted bits. This portion is equal to $B_{tar}$ multiplied by the number of MBs already encoded (j-1) and divided by the total number of MBs in the picture (MB_cnt). The number of bits actually generated by encoding up to, but not including, the jth MB is equal to $B_{j-1}$. The delta between the targeted and generated number of bits represents a change in the fullness of the virtual buffer after each MB is encoded ($d_j$) and is calculated prior to encoding the jth MB:
$d_j = d_0 + B_{j-1} - B_{tar} \cdot (j-1) / \mathrm{MB\_cnt}$
where $d_0$ equals the fullness of the virtual buffer at the beginning of the current picture.
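The virtual buffer update in this paragraph can be written as a one-line function. The sketch below assumes the formula as stated (initial fullness, plus bits generated so far, minus the pro-rata share of the target); the function name and all numeric values are hypothetical.

```python
def buffer_fullness(d0, bits_so_far, b_tar, j, mb_cnt):
    """Virtual buffer fullness evaluated just before encoding macro-block j.

    d0          -- fullness at the start of the current picture
    bits_so_far -- bits generated by macro-blocks 1 .. j-1
    b_tar       -- target bit allocation for the whole picture
    j           -- 1-based index of the macro-block about to be encoded
    mb_cnt      -- number of macro-blocks in the picture
    """
    return d0 + bits_so_far - b_tar * (j - 1) / mb_cnt


# Hypothetical picture: 1200 macro-blocks, 120000 target bits.
# Half way through, the encoder has spent 63000 bits instead of 60000.
d = buffer_fullness(d0=1000, bits_so_far=63_000, b_tar=120_000, j=601, mb_cnt=1200)
print(d)
```

When the encoder runs ahead of its budget the fullness rises above its starting value, which is the signal used to coarsen quantization for the remaining macro-blocks.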
[0015] If the virtual buffer begins to overflow, the quantization step size increases, leading to a smaller yield of bits for the subsequent MBs. Similarly, if the virtual buffer begins to underfill, the quantization step size is decreased, leading to a larger yield of bits for subsequent MBs. This measure of the virtual buffer fullness is used to generate the MB's reference quantization number (Qj) .
c) Adaptive Quantization
[0016] The macro-block quantization step size is further modulated as a function of spatial activity (actj) . The macro-block is divided into four 8x8 sub-blocks and the spatial activity is measured for each sub-block. The smallest of the four measurements is then normalized (N_actj) against the average spatial activity (avg_act) of the previously encoded picture. The minimum spatial activity measurement is used because the quality of a macro-block is no better than its sub-block of highest visible distortion.
N_actj = (2*actj + avg_act) / (actj + 2*avg_act)
[0017] The product of a MB's normalized spatial activity and its reference quantization parameter gives the MB's quantization scale factor (Mquantj).
Mquantj = Qj*N_actj
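The two adaptive quantization formulas above can be sketched as follows. The function names are hypothetical and the inputs are invented; the arithmetic follows the N_actj and Mquantj expressions as given.

```python
def normalized_activity(act_j, avg_act):
    """N_act_j = (2*act_j + avg_act) / (act_j + 2*avg_act)."""
    return (2 * act_j + avg_act) / (act_j + 2 * avg_act)


def mquant(q_j, act_j, avg_act):
    """Quantization scale factor: reference quantizer times normalized activity."""
    return q_j * normalized_activity(act_j, avg_act)


# A macro-block with average activity leaves the reference quantizer unchanged.
print(normalized_activity(100, 100))
# A busy macro-block (four times the average activity) is quantized more coarsely.
print(mquant(10, 400, 100))
```

Because the normalized activity stays between 0.5 and 2 for non-negative inputs, the modulation can at most halve or double the reference quantization parameter.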
Generally, the encoding algorithms recommended by newer compression standards are more efficient, but they are usually more complicated to implement. With the fast growth in computation speeds of central processing units (CPUs) and digital signal processing (DSP) chips, implementation of more and more sophisticated algorithms has become practically feasible. Video encoders/decoders (codecs) built based on newer standards eventually replace those built based on older standards in applications where specifications overlap, such as for bit-rate, resolution, etc. This replacement procedure takes a long period of time, since it is expensive to replace older video codecs with newer ones. Another reason older codecs continue to be used is that many video streams have already been compressed with the older algorithms, and may easily be decompressed by the older codecs. However where high coding efficiency is desired, there arises the mixed use of both older and newer codecs. In some applications it is desirable to re-transmit video streams compressed with an older codec at a new bit-rate that is lower than the older codec can achieve for the same video quality. Therefore to obtain higher compression efficiency a transcoder having mixed codecs (an older decoder and a newer encoder) is used. One good example is a transcoder that converts MPEG2 compressed video streams to H.264 compressed video streams.
[0018] It is recognized by the digital compression industry that dual-pass encoding with a lookahead window provides higher coding efficiency than single-pass encoding. For the emerging, more sophisticated, compression technologies, even single-pass encoding is expensive and the cost of dual-pass encoding is much higher than that of single-pass encoding. Using two sophisticated codecs for encoding/transcoding in a dual-pass architecture raises the cost of the encoder/transcoder by almost an order of magnitude over the older technology codecs.
[0019] What is desired is the ability to achieve higher coding efficiency for minimal cost in an encoder/transcoder using mixed codecs.
BRIEF SUMMARY OF THE INVENTION
[0020] In accordance with a first aspect of the invention there is provided a method of encoding frames of an uncompressed digital video stream, each frame having a level of complexity, said method comprising analyzing a first frame of the uncompressed digital video stream with a first algorithm to measure a first value of the first frame's complexity and to assign the first frame's picture type, estimating a second value of the first frame's complexity using the first measured value as a parameter, and encoding the first frame with a distinct second algorithm employing the second value of the first frame's complexity and the first frame's picture type as parameters.
[0021] In accordance with a second aspect of the invention there is a method of transcoding frames of a compressed digital video stream, each frame being encoded according to a first encoding algorithm and having a level of complexity, said method comprising decoding a first frame of the compressed digital video stream with a first decoding algorithm to produce a decoded version of the first frame and to measure a first value of the first frame's complexity and to determine the first frame's picture type, estimating a second value of the first frame's complexity using the first value as a parameter, and encoding the decoded version of the first frame with a distinct second encoding algorithm employing the second value of the first frame's complexity and the first frame's picture type as parameters. [0022] In accordance with a third aspect of the invention there is an apparatus for encoding an uncompressed digital video input stream composed of a succession of frames, each frame having a plurality of characteristics associated therewith, said apparatus comprising an extraction means for receiving the succession of frames of the uncompressed digital video input stream and employing a first method to obtain measured values for the plurality of characteristics of a frame of the input stream and to assign a picture type to the frame, a delay means for receiving the succession of frames of the input stream and outputting the frames in delayed fashion relative to the frames of the input stream, a value storage means for storing the measured values and the picture type of a frame in the delay means, and an encoding means for receiving a frame from the delay means and encoding the frame, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the frame.
[0023] In accordance with a fourth aspect of the invention there is an apparatus for transcoding a compressed digital video input stream composed of a succession of encoded frames, each encoded frame having a plurality of characteristics associated therewith, said apparatus comprising a decoding means for receiving the succession of encoded frames of the compressed digital video input stream and employing a first method to obtain a succession of decoded frames and measured values for the plurality of characteristics of a decoded frame and to assign a picture type to the decoded frame, a delay means for receiving the succession of decoded frames of the input stream and outputting the decoded frames in delayed fashion relative to the encoded frames of the input stream, a value storage means for storing the measured values and the picture type of a decoded frame in the delay means, and an encoding means for receiving a decoded frame from the delay means and encoding the frame, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the frame.
[0024] An embodiment of the present invention provides rate control with a picture-based lookahead window for encoders/transcoders having mixed codecs in a dual-pass compressed video architecture. In a transcoder, where the input video signal is a compressed video signal, statistics are extracted by using a simple compression decoder to produce the statistics from the compressed video signal; and in an encoder, where the input video signal is an uncompressed video signal, statistics are extracted by using a simple compression encoder to generate the statistics from the uncompressed video signal. A trans-factor is calculated for a current picture based on previous pictures in a sliding "past" window to predict the complexity of the current picture, the trans-factor being a ratio of global complexity measures for the simple compression standard versus a sophisticated compression standard. Bits for the current picture are then allocated based on the complexity of future pictures in the lookahead or "future" window. If future pictures are difficult to encode, then less bits are allocated to the current picture, and vice versa. This is effective for a scene change. Because the lookahead window takes into account the statistics of future pictures, i.e., pictures that have not yet been compressed according to the sophisticated compression standard, a more reasonable bit allocation and better quality is achieved. After encoding the current picture according to the sophisticated compression standard, the actual bits, the picture complexity and the trans-factor for the encoded picture are updated as the past and lookahead windows are shifted by one picture, i.e., the encoded picture moves into the past window and out of the lookahead window as a new picture is loaded into the lookahead window. 
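The interplay of the past window (trans-factor) and the lookahead window (bit allocation) might be sketched as below. This is an illustrative reading, not the patented implementation: the class and function names, the seed value for the trans-factor, and all numbers are assumptions, and the B-picture weight factor is omitted. The predicted sophisticated complexity is taken as the simple complexity divided by the trans-factor, which follows from defining the trans-factor as the simple-to-sophisticated complexity ratio.

```python
from collections import deque


class TransFactor:
    """Sliding average of simple/sophisticated complexity ratios for one picture type."""

    def __init__(self, window, seed=1.0):
        # `seed` is an assumed starting ratio used until real pictures are encoded.
        self.ratios = deque([seed], maxlen=window)

    def current(self):
        return sum(self.ratios) / len(self.ratios)

    def update(self, x_simple, x_sophisticated):
        # Called after a picture is encoded; the oldest ratio is shifted out.
        self.ratios.append(x_simple / x_sophisticated)


def predict_complexity(x_simple, trans_factor):
    """Estimate the sophisticated-standard complexity of a not-yet-encoded picture."""
    return x_simple / trans_factor


def window_budget(pictures_in_window, bit_rate, picture_rate):
    """Bits available to all pictures currently in the lookahead window."""
    return pictures_in_window * bit_rate / picture_rate


def target_bits(window_complexities, t_w):
    """Share of the budget for the current picture (first entry of the window)."""
    return t_w * window_complexities[0] / sum(window_complexities)


def next_window_budget(t_w, bit_rate, picture_rate, bits_used):
    """Shift the window by one picture: add one picture's budget, subtract bits spent."""
    return t_w + bit_rate / picture_rate - bits_used


# Hypothetical numbers, chosen only to exercise the functions.
R, F = 3_000_000, 30                 # bits per second, pictures per second
lookahead = [4.0, 1.0, 1.0, 2.0]     # predicted complexities, current picture first
t_w = window_budget(len(lookahead), R, F)
b_tar = target_bits(lookahead, t_w)
t_w_next = next_window_budget(t_w, R, F, bits_used=180_000)
print(t_w, b_tar, t_w_next)
```

Because the current picture's share is computed against the whole window, a run of complex pictures ahead automatically lowers the allocation for the current one, which is the behaviour the paragraph describes for scene changes.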
[0025] The objects, advantages and other novel features of the present invention are apparent from the following detailed description when read in conjunction with the appended claims and attached drawings.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0026] Fig. 1 is a block diagram view of a dual-pass encoder/transcoder architecture implementing rate control with a picture-based lookahead window according to the present invention.
[0027] Fig. 2 is a flow chart view of a rate control algorithm according to the present invention.
[0028] Fig. 3 is a conceptual view of virtual sliding windows according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0029] Fig. 1 illustrates an encoder/transcoder having a simple compression decoder 12 for receiving and decoding a compressed video stream encoded according to a simple compression standard, such as MPEG2, to produce an uncompressed video signal and related statistics. Alternatively a simple compression encoder 14 receives an uncompressed video stream to generate related statistics. The statistics are input to a lookahead window module 18 for processing by a rate control algorithm, described below, while the uncompressed video signal in either configuration (transcoder or encoder) is input to a storage and delay module 16. The storage and delay module is a buffer memory that receives, delays and outputs the uncompressed video stream. The lookahead window module contains the statistics for each picture in the storage and delay module 16, for example, the number of bits for the picture, the picture type and the average quantization step size over all the macro-blocks for the picture. The lookahead window module 18 generates bit allocation data from the statistics for use by a sophisticated compression encoder 24, such as an H.264 encoder, in determining rate control for the sophisticated encoding process. The storage and delay module compensates for the time required for the lookahead window module 18 to generate bit allocation data.
[0030] The delayed uncompressed video stream from the storage and delay module 16 is input to an adaptive pre-filter 20 to produce a filtered uncompressed video stream. The filter may be a low-pass filter that attenuates high spatial frequencies in the images represented by the uncompressed video stream and thus serves to "blur" the uncompressed video stream so that it is easier to compress, i.e., is less complex and so requires fewer bits to compress. The strength of the filtering may depend on a threshold or cut-off frequency above which spatial frequency components are attenuated and on the degree to which high spatial frequencies are attenuated.
[0031] Both the delayed uncompressed video stream from the storage and delay module 16 and the filtered, uncompressed video stream are input to a switch 22 which selects one of the streams . The selected uncompressed video stream from the switch 22 and the bit allocation data from the look ahead window module 18 are input to the sophisticated compression encoder 24 to produce a compressed video stream according to a sophisticated compressed video standard, such as H.264 (MPEG4, Part 10) . The sophisticated compression encoder 24 also provides a control signal to the adaptive pre-filter 20 and to the switch 22 that determines the "strength" of the filtering and which uncompressed video stream is to be encoded. The strength of the filtering may be implemented as different filtering levels or may be continuous. The adaptive pre-filter 20 may be switched off or set to a low strength for minimum filtering when the filtered uncompressed video stream is not selected for encoding by the sophisticated compression encoder 24.
[0032] By using a simple encoder/decoder 12/14 instead of a sophisticated encoder/decoder at the input, the implementation cost is reduced close to that of a single-pass sophisticated codec. However the information on complexity estimation for pictures in the lookahead window module 18 is not exactly the desired information for the sophisticated compression encoder 24. For example, a P-type picture needs high bit-rate for motion compensation in simple (MPEG2) compression encoding if its corresponding original picture was recorded during light off/on/off transition time. On the other hand this P-type picture may be a simple picture for the sophisticated (H.264) encoder. Despite this deficiency, a correlation of picture complexity estimation can be made, based on both the simple and sophisticated compression standards. In most cases a picture or a group of pictures (GOP) that is relatively complicated/simple for the simple compression standard is also relatively complicated/simple for the sophisticated compression encoder 24. The complexity statistics still indicate important relationships among pictures and macro-blocks (MBs), with the error being tolerable. Therefore compared to single-pass sophisticated coding, the pseudo dual-pass sophisticated coding is superior in video coding efficiency with only a slightly higher implementation cost.
[0033] The statistics of picture complexity are used for:
- estimation of a bit-rate target and selection of quantizer step sizes of macro-blocks for a current picture before second pass encoding; and
- control of the strength of the adaptive pre-filter 20 for a current GOP which includes the current picture before second pass encoding.
[0034] The larger the volume of statistics that are available to compute the bit allocation provided to the compression encoder 24, the better the video quality performance of the encoder/transcoder. Therefore the storage and delay module 16 stores multiple uncompressed images. Each image of the uncompressed video stream will ultimately be encoded by the sophisticated compression encoder 24 as an I, P or B type picture. The type of picture (I, P or B) that a given uncompressed image will be encoded as is based on the statistics provided to the lookahead window module 18. Therefore, even though the images stored in the module 16 are not encoded, it will be convenient to refer to these images as I, P or B type pictures. The number of images stored by the module 16 is limited by the size of the memory and an allowed maximum delay. A storage length corresponding to at least two GOPs of the input video signal is desired. For the purposes of this description, it is assumed that the storage and delay module is designed to contain two standard GOPs of fifteen pictures each.
[0035] The lookahead window module 18 sets a bit-rate target for the current picture being encoded based on the received statistics, which include picture types (I, P or B) , picture size (in bytes) , and average quantizer step sizes at picture levels.
[0036] The picture complexity in the sense of encoding is not the same for the two different compression standards. A P-type picture may be complicated and need a high bit-rate for motion compensation in MPEG2 encoding if its corresponding original picture was recorded during a flash off/on/off transition. On the other hand, this P-type picture may be a simple picture to an H.264 encoder, which is able to select one out of up to six reference pictures for motion prediction, and one of the references may be strongly correlated with this P-type picture, as indicated above.
[0037] In addition to setting the bit-rate target, the statistics of picture complexity obtained by the lookahead window module 18 may also be used for generating the control signal for the adaptive pre-filter 20 to control the strength of the low-pass filtering. If the rate control information indicates that the current picture is a difficult picture which needs more bit-rate to encode, the strength of the adaptive pre-filter 20 may be increased so that the picture is heavily low-pass filtered, i.e., becomes softer and easier to encode. The sophisticated compression encoder 24 employs the switch 22 to select either the delayed uncompressed video signal output from the storage and delay module 16 or the filtered video signal output by the adaptive pre-filter 20 based on the rate control information and on the virtual buffer fullness of the sophisticated compression encoder 24. For example, if the virtual buffer is approaching full and the rate information indicates that the current picture for encoding requires more bits than are available in the virtual buffer, then the amount of pre-filtering is increased so that the virtual buffer does not overflow and the filtered uncompressed video is the video signal that is encoded. If there is no danger of virtual buffer overflow, then the current picture is slightly filtered or not filtered at all. In the latter event the uncompressed video signal from the storage and delay module 16 is used as the input for encoding. However, frequently and abruptly changing the filter strength and/or switching between the uncompressed video signal and the filtered uncompressed video signal within a GOP may lead to a motion compensating residue signal for P and B pictures. This is avoided by smoothly controlling the pre-filter 20 within a GOP. If a picture is filtered, any other pictures which use it as a reference should also be filtered with at least the same filter strength.
[0038] The rate control algorithm used, by way of illustration, is based on the Test Model 5 (TM5) specification. TM5 takes a complexity measure to allocate target bits for each picture and then sets a quantization parameter for each macroblock (MB) based on the fullness of the virtual buffer. In the transcoder configuration all of the information about the input video signal is available from the encoded compressed video stream via the decoder 12, especially the statistics about the complexity of the input content. In the encoder configuration all the information about the input video signal is available from the uncompressed video stream via the simple encoder 14, especially the statistics about the complexity of the input content. The rate control algorithm includes two parts:
1. Take "past" statistics for complexity prediction.
2. Take "future" statistics for bit allocation.
Both processes are adaptive and a past sliding window and a future sliding window are maintained to update the statistics after each picture is encoded. Note that the past sliding window is located in the lookahead window 18 and the future sliding window is located in the sophisticated compression encoder 24. Contrary to prior applications that used sliding windows which increment in terms of GOPs, the sliding windows of the present invention are picture-based, and move forward after encoding each picture.
[0039] The rate control algorithm has four steps: (a) statistics extraction; (b) complexity prediction; (c) bit allocation; and (d) statistics update.
(a) Statistics Extraction
[0040] When transcoding from an MPEG2 variable bit-rate (VBR) stream to an H.264 constant bit-rate (CBR) stream, or encoding an uncompressed video stream to an H.264 CBR stream, the following information is collected:
1. Average quantization parameters (quantization step size) for each picture.
2. Output bits for each picture.
3. Picture type (I, P, B) for each picture.
Items 1 and 2 are used for calculation of the input video's complexity, while item 3 records the picture type that is used by the sophisticated compression encoder 24.
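As an illustration of how items 1 and 2 combine into a complexity value, the following minimal sketch stores the three collected statistics per picture and derives the complexity as the product of output bits and average quantizer step size, the same form the description later gives for the sophisticated-standard complexity (X = S*Q). The class name `PictureStats` and its field names are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureStats:
    """Per-picture statistics collected from the simple (MPEG2) pass.

    The field names are illustrative assumptions, not identifiers from the source.
    """
    picture_type: str   # item 3: "I", "P" or "B"
    bits: int           # item 2: output bits for the picture
    avg_quant: float    # item 1: average quantization step size

    @property
    def complexity(self) -> float:
        # Complexity taken as bits * average quantizer, i.e. X = S * Q.
        return self.bits * self.avg_quant


stats = PictureStats(picture_type="P", bits=120_000, avg_quant=8.0)
simple_complexity = stats.complexity
```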
(b) Complexity Prediction
[0041] Complexity prediction predicts the complexity of a current picture from a prior simple/sophisticated (MPEG2/H.264) complexity ratio and the input complexity of the current picture. In TM5 the current picture's complexity is predicted by that of the previous picture of the same type. In an embodiment of the invention, the current picture's complexity is predicted on the basis of the complexity of all pictures of the same type in the past window. However, since the statistics are based on a simple encoding format, an adjustment to the algorithm in the form of a scaling factor, referenced here as the trans-factor, is introduced to take into account the difference between the sophistication of the two standards and/or the two bit-rates. The trans-factor is calculated as the average of previous simple/sophisticated ratios and is updated after the encoding of each picture. Because of the different properties of different picture types, the trans-factor is calculated independently for each picture type.
[0042] The complexity prediction algorithm has two steps:
[0043] 1. Calculate a current trans-factor for a current picture by averaging previous trans-factors
[0044] At the beginning of a video sequence to be encoded/transcoded there are three initial values for the trans-factor, corresponding to the three picture types (I, P, B) respectively. The average trans-factor over a past sliding window is generally better than that of only one picture and takes into account pictures that have already been encoded by the sophisticated encoder 24 and are within the past window.
[0045] A GOP can be described as containing NI I type pictures, NP P type pictures and NB B type pictures. For a standard GOP, as described above:
N = 15
M = 3
NI = 1
NP = (N/M) - NI = (15/3) - 1 = 4
NB = N - NI - NP = 15 - 1 - 4 = 10
[0046] The storage and delay module 16 contains W GOPs. For this discussion, it will be assumed that the module 16 is designed to store 2 standard GOPs. WI, WP and WB represent the total number of I, P and B type pictures respectively in the storage and delay module 16.
WI = W(NI) = 2*(1) = 2
WP = W(NP) = 2*(4) = 8
WB = W(NB) = 2*(10) = 20
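The picture counts above follow directly from N, M and W. A minimal sketch of the arithmetic is given here; the function names are hypothetical, and the GOP is assumed to contain exactly one I picture as in the standard GOP of the description.

```python
def gop_picture_counts(n, m, n_i=1):
    """Return (NI, NP, NB) for a GOP of n pictures with anchor spacing m."""
    n_p = n // m - n_i        # NP = (N/M) - NI
    n_b = n - n_i - n_p       # NB = N - NI - NP
    return n_i, n_p, n_b


def window_picture_counts(w, n, m):
    """Return (WI, WP, WB) for a storage and delay module holding w GOPs."""
    return tuple(w * count for count in gop_picture_counts(n, m))


per_gop = gop_picture_counts(15, 3)
per_window = window_picture_counts(2, 15, 3)
```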
[0047] The trans-factor Tcur of a picture in the uncompressed video stream is calculated by averaging the trans-factors of the previous Wtype pictures of the same type (I, P or B), the number of previous trans-factors that are averaged being equal to the total number of pictures of that type in the storage and delay module 16 (WI, WP, WB).
$$T_{Icur} = \frac{1}{W_I}\sum_{i=1}^{W_I} T_I[j-i]$$
$$T_{Pcur} = \frac{1}{W_P}\sum_{i=1}^{W_P} T_P[j-i]$$
$$T_{Bcur} = \frac{1}{W_B}\sum_{i=1}^{W_B} T_B[j-i]$$
where j is the picture number of the current picture, counted among pictures of the same type.
[0048] For a storage and delay module 16 that contains two standard GOPs, as described above, two, eight or twenty previous trans-factors are averaged for an I, P or B type picture respectively.
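The per-type averaging over a past sliding window can be sketched as follows. This is only an illustration under stated assumptions: the class name `TransFactorWindow`, the initial values and the choice to average over however many entries are available before the window fills are all hypothetical and not specified by the source. The stored ratio is the simple complexity divided by the sophisticated complexity, as the description defines the trans-factor.

```python
from collections import deque


class TransFactorWindow:
    """Past sliding window of trans-factors, kept separately per picture type."""

    def __init__(self, sizes, initial):
        # sizes: window length per type, e.g. {"I": 2, "P": 8, "B": 20}
        # initial: start-of-sequence trans-factor per type (assumed values)
        self.initial = dict(initial)
        self.history = {t: deque(maxlen=n) for t, n in sizes.items()}

    def current(self, ptype):
        """Average of the stored trans-factors for this picture type."""
        past = self.history[ptype]
        if not past:
            return self.initial[ptype]
        return sum(past) / len(past)

    def update(self, ptype, x_simple, x_sophisticated):
        """Store the ratio measured for the picture that was just encoded."""
        self.history[ptype].append(x_simple / x_sophisticated)


window = TransFactorWindow({"I": 2, "P": 8, "B": 20},
                           {"I": 1.0, "P": 1.0, "B": 1.0})
start_value = window.current("I")
window.update("I", 200.0, 100.0)   # ratio 2.0
window.update("I", 400.0, 100.0)   # ratio 4.0
after_two = window.current("I")
window.update("I", 600.0, 100.0)   # ratio 6.0, oldest ratio (2.0) drops out
after_three = window.current("I")
```

Because the deque has a fixed maximum length, the oldest ratio is shifted out automatically when a new one is stored, which mirrors the picture-by-picture movement of the past window.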
[0049] 2. Predict current picture's complexity
[0050] For I and P type pictures, the current picture's sophisticated, or MPEG4, complexity (XI4, XP4 or XB4) is then predicted from the picture's simple, or MPEG2, complexity (XI2, XP2 or XB2) using the updated trans-factor (Tcur) to appropriately scale the simple complexity factor. The trans-factors for B type pictures are further adjusted by a weight factor (KB4) to take into account the different quality requirements for different picture types. This weight factor has been empirically determined and is a function of the ratio of the current GOP's I type simple complexity and the average simple complexity of the GOP's B type pictures.
XI4 = XI2/TIcur
XP4 = XP2/TPcur
XB4 = XB2/(TBcur*KB4)
[0051] KB4 is larger for a well-predicted sequence, i.e. a sequence without fast motion, and is smaller for a sequence with fast motion. KB4 is set adaptively after encoding each GOP in accordance with the ratio XI/XB, where XI and XB are the average simple complexity of all I and B pictures in the current GOP.
Table 1. Empirically Determined Values for KB4
[Table 1 appears only as an image (imgf000020_0001) in the source; its values are not reproduced here.]
[0052] In principle, the sophisticated complexity XP4 for a P type picture may also be adjusted by a weight factor (KP4), but it has been found that this is not necessary in practice.
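The prediction step can be illustrated with a short sketch that divides the simple complexity by the current trans-factor and, for B pictures only, by the additional weight KB4. The function name and the example value of KB4 are assumptions for illustration; the empirically determined values of Table 1 are not used here.

```python
def predict_complexity(x_simple, t_cur, picture_type, k_b4=1.0):
    """Predict the sophisticated-standard complexity of one picture.

    x_simple: complexity measured by the simple (MPEG2) pass
    t_cur: averaged trans-factor for this picture type
    k_b4: weight applied to B pictures only (example value, an assumption)
    """
    if picture_type == "B":
        return x_simple / (t_cur * k_b4)
    return x_simple / t_cur


x_p4 = predict_complexity(300.0, 1.5, "P")
x_b4 = predict_complexity(300.0, 1.5, "B", k_b4=1.25)
```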
(c) Bit Allocation
[0053] Bit allocation may be performed at the GOP layer or at the picture layer. Picture-layer allocation breaks the GOP boundary and performs better than GOP-layer allocation. This is particularly effective for scene changes in the video signal. Bit allocation has two steps.
[0054] 1. Allocate target bits for current (kth) picture
[0055] The target size (Tw), in bits, for all the pictures currently referenced in the sliding lookahead window is calculated based on the number of pictures in the window (WF), the constant bit rate (R), in bits per second, and the picture rate (F), in pictures per second.
Tw = WF(R/F)
[0056] Then the targeted number of bits (B4_tar(k)) to be allocated for the kth picture is calculated by multiplying Tw by the ratio of the current picture's complexity factor to the complexity factor of all the pictures in the sliding lookahead window.
$$B_{4\_tar}(k) = T_w \cdot \frac{X_4(k)}{\sum_{i \in \text{window}} X_4(i)}$$
[0057] This calculation essentially identifies the proportion of the target size (Tw) that should be used for the current picture. The size of the current picture when encoded by the complex encoding algorithm is not permitted to be larger than the size of the current picture when encoded by the simple compression algorithm (B2(k)). Thus, the size of the current picture when encoded is clamped to B2(k). If B4_tar(k) does exceed B2(k), the number of bits targeted for the kth picture is still B4_tar(k). However, it is already known that the smaller number B2(k) will be the upper limit of bits used when the encoding actually takes place. Thus there is a known surplus of bits in the target window. The target window size is then modified to take the extra bits into account.
Tw(k+1) = Tw(k) + B4_tar(k) - B2(k)
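The allocation step and the surplus adjustment can be sketched together. In this illustration the lookahead window is passed as a list of predicted sophisticated complexities whose first entry belongs to the current picture; that layout and the function names are assumptions, not details from the source.

```python
def initial_window_target(w_f, r, f):
    """Tw = WF * (R / F): target size in bits for the whole lookahead window."""
    return w_f * r / f


def allocate_bits(t_w, x4_window, b2_k):
    """Return (B4_tar(k), Tw for the next picture after the surplus adjustment).

    x4_window[0] is the current picture's predicted complexity; the remaining
    entries are the other pictures in the lookahead window.
    b2_k is the size of the current picture under the simple algorithm.
    """
    b4_tar = t_w * x4_window[0] / sum(x4_window)
    t_w_next = t_w
    if b4_tar > b2_k:
        # The encoded size is capped at B2(k), so the difference is a known
        # surplus that is returned to the window target.
        t_w_next = t_w + b4_tar - b2_k
    return b4_tar, t_w_next


t_w = initial_window_target(4, 3_000_000, 30)
target, t_w_after_surplus = allocate_bits(t_w, [4.0, 2.0, 1.0, 1.0], 150_000)
_, t_w_unchanged = allocate_bits(t_w, [4.0, 2.0, 1.0, 1.0], 250_000)
```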
[0058] 2. Adaptive quantization and encoding (TM5)
[0059] Before encoding MBj the fullness of the virtual buffer is computed for I, P, B independently:
$$d_j = d_0 + B_{j-1} - \frac{T\,(j-1)}{MB\_cnt}$$
where Bj is the number of bits generated by encoding all MBs in the picture up to and including j, MB_cnt is the number of MBs in the picture, T is the constant bit rate (CBR) per picture, d0 is the initial fullness of the virtual buffer and dj is the fullness of the virtual buffer at MBj. The reference quantization parameter Qj is then computed for MBj:
Qj = dj*51/r
where the reaction parameter, r, is
r = 2*R/F
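A minimal sketch of the virtual buffer fullness and reference quantization parameter calculation follows. The function name and argument names are hypothetical; `bits_so_far` stands for the bits generated by all macroblocks before MBj, and the numeric inputs in the usage lines are arbitrary example values.

```python
def reference_qp(d0, bits_so_far, j, t_picture, mb_cnt, r, f):
    """Return (dj, Qj) for macroblock j (1-based) of the current picture.

    d0: initial virtual buffer fullness for this picture type
    bits_so_far: bits generated by macroblocks 1 .. j-1
    t_picture: per-picture bit budget T
    mb_cnt: number of macroblocks in the picture
    r, f: constant bit rate (bits/s) and picture rate (pictures/s)
    """
    d_j = d0 + bits_so_far - t_picture * (j - 1) / mb_cnt
    reaction = 2.0 * r / f          # reaction parameter r = 2*R/F
    q_j = d_j * 51.0 / reaction
    return d_j, q_j


d_j, q_j = reference_qp(d0=50_000, bits_so_far=10_000, j=11,
                        t_picture=100_000, mb_cnt=1_000,
                        r=3_000_000, f=30)
```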
Adaptive quantization:
[0060] The spatial activity for MBj is computed from four luminance picture-organized sub-blocks (n = 1 ... 4) and four luminance field-organized sub-blocks (n = 5 ... 8) using the original pixel values:
actj = 1 + min(vblk1, vblk2, ..., vblk8)
where
$$vblk_n = \frac{1}{64}\sum_{k=1}^{64}\left(P_k^n - P\_mean_n\right)^2$$
and
$$P\_mean_n = \frac{1}{64}\sum_{k=1}^{64} P_k^n$$
where P is the pixel gray level. Then normalize actj:
N_actj = ((2*actj) + avg_act) / (actj + (2*avg_act))
where avg_act is the average value of actj for the last picture to be encoded. Then adjust mquantj as:
mquantj = Qj*N_actj
The final value of mquantj is clipped to the range [1 ... 51] and used for the quantization. Delta QP should be clipped to [-26, 26], as defined by H.264 semantics. Then encode one MB with mquantj and repeat this step until all MBs of the current picture are encoded.
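The activity measure, its normalization and the clipping of the final quantizer can be sketched for one 16x16 luminance macroblock. This is an illustration only: the sub-block variance follows the TM5 definition (variance over the 64 samples of each 8x8 sub-block), the macroblock is assumed to be a list of 16 rows of 16 samples, rounding to the nearest integer before clipping is an assumption, and all function names are hypothetical.

```python
def _variance(samples):
    """Variance of one 8x8 sub-block given as a flat list of 64 samples."""
    mean = sum(samples) / len(samples)
    return sum((p - mean) ** 2 for p in samples) / len(samples)


def spatial_activity(mb):
    """act = 1 + minimum variance over the eight luminance sub-blocks."""
    blocks = []
    # Four picture-organized 8x8 sub-blocks (the quadrants of the macroblock).
    for r0 in (0, 8):
        for c0 in (0, 8):
            blocks.append([mb[r][c] for r in range(r0, r0 + 8)
                           for c in range(c0, c0 + 8)])
    # Four field-organized 8x8 sub-blocks (even rows, then odd rows).
    for parity in (0, 1):
        for c0 in (0, 8):
            blocks.append([mb[r][c] for r in range(parity, 16, 2)
                           for c in range(c0, c0 + 8)])
    return 1.0 + min(_variance(b) for b in blocks)


def normalized_activity(act, avg_act):
    """N_act = (2*act + avg_act) / (act + 2*avg_act)."""
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def mquant(q_ref, act, avg_act):
    """Scale the reference QP by N_act and clip the result to [1, 51]."""
    return min(51, max(1, round(q_ref * normalized_activity(act, avg_act))))


flat = [[128] * 16 for _ in range(16)]
ramp = [[c for c in range(16)] for _ in range(16)]
interlaced = [[0 if r % 2 == 0 else 255] * 16 for r in range(16)]
```

The `interlaced` example shows why the field-organized sub-blocks matter: its picture-organized sub-blocks have a large variance, but each field is uniform, so the minimum variance is zero and the activity stays at 1.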
[0061] (d) Update picture complexity and trans-factor for just encoded picture
[0062] The picture complexity and trans-factor for the just encoded picture are updated and stored in the sliding past window for use with future pictures.
[0063] 1. Trans-factor is defined as the ratio of the "global complexity measure" of corresponding simple and sophisticated compression standards pictures.
TI[current_picture_SN] = XI2/XI4
TP[current_picture_SN] = XP2/XP4
TB[current_picture_SN] = XB2/XB4
where XI4, XP4 and XB4 represent the complexity measure for the I, P, B picture of the output, sophisticated compression standard (H.264) stream:
XI4 = SI4*QI4
XP4 = SP4*QP4
XB4 = SB4*QB4
and the definitions of SI4, SP4, SB4, QI4, QP4 and QB4 correspond to the definitions of the corresponding quantities under the simple compression standard (MPEG2).
[0064] 2. Since a picture's bit target (B4_tar) is a function of the target size of the look-ahead window, we adjust the target size of the look-ahead window, Tw, for the (k+1)th picture after the kth picture is encoded in order to account for any difference between the actual encoded size of the kth picture, S(k), and the average encoded picture size, R/F, where R is the constant bit rate and F is the frame rate.
Tw(k+1) = Tw(k) + R/F - S(k)
This adjustment is done in order to maintain the proper level of virtual buffer fullness and makes Tw(k+1) a more reasonable target size of the lookahead window for calculating the bit target for the (k+1)th picture. If the input sequence is not infinitely long, at the end of the sequence the lookahead window size WF and target size Tw both decrease. After encoding each picture the trans-factor, TI, TP or TB, is updated and all the predicted complexity values in the future sliding window are updated as well.
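The window target update after each encoded picture is a single line of arithmetic. The sketch below applies it over a few pictures to show that pictures smaller than the average R/F raise the target and larger ones lower it; the function name and the example sizes are hypothetical.

```python
def next_window_target(t_w, r, f, s_k):
    """Tw(k+1) = Tw(k) + R/F - S(k), with S(k) the actual encoded size."""
    return t_w + r / f - s_k


r, f = 3_000_000, 30            # average picture size R/F = 100000 bits
t_w = 400_000.0
targets = []
for encoded_size in (80_000, 130_000, 100_000):
    t_w = next_window_target(t_w, r, f, encoded_size)
    targets.append(t_w)
```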
[0065] As illustrated in Fig. 3 the trans-factor to be used in determining the complexity for the current picture to encode is based upon the average of the trans-factors for the same picture type in the past window, while the bit allocation for the current picture is based on the overall complexity of the pictures in the lookahead window. After each picture is encoded, then the actual sophisticated standard complexity is determined and entered in the past window while the oldest one is shifted out. A new picture's statistics are loaded into the lookahead window to determine a new complexity for the window as the next picture to be encoded becomes the current picture .
[0066] Thus the present invention provides rate control with a picture-based sliding window to simplify transcoding/encoding from a simple compression standard to a sophisticated compression standard by extracting statistics for a video signal using the simple compression standard, by using the extracted statistics and virtual buffer fullness to control a lowpass pre-filter for the uncompressed video signal, and by encoding the filtered or unfiltered uncompressed video signal using a trans-factor which is the ratio of global complexity measures for the simple and sophisticated compression generated standards pictures with a sliding window on a picture-by-picture basis, updating the trans-factor and sliding window for each picture.
[0067] It will be appreciated that the invention is not restricted to the particular embodiment that has been described, and that variations may be made therein without departing from the scope of the invention as defined in the appended claims and equivalents thereof. Unless the context indicates otherwise, a reference in a claim to the number of instances of an element, be it a reference to one instance or more than one instance, requires at least the stated number of instances of the element but is not intended to exclude from the scope of the claim a structure or method having more instances of that element than stated.

Claims

1. A method of encoding frames of an uncompressed digital video stream, each frame having a level of complexity, said method comprising: analyzing a first frame of the uncompressed digital video stream with a first algorithm to measure a first value of the first frame's complexity and to assign the first frame's picture type, estimating a second value of the first frame's complexity using the first measured value as a parameter, and encoding the first frame with a distinct second algorithm employing the second value of the first frame's complexity and the first frame's picture type as parameters.
2. A method according to claim 1, further comprising: analyzing the first frame with the second encoding algorithm to measure a third value of the first frame's complexity, and storing the picture type and a complexity ratio of the first frame.
3. A method according to claim 2, comprising, prior to encoding the first frame with the second algorithm, generating an encoded version of a past frame with the second algorithm, said encoded version of the past frame having a size, and estimating a value of a future frame's complexity, and wherein the step of encoding the first frame with the second algorithm further comprises: determining a target size for the first frame using the size of the past frame and the estimated value of the future frame's complexity as parameters, and encoding the first frame with the second algorithm employing the first frame's target size as a parameter.
4. A method according to claim 3, wherein the step of encoding the first frame with the second algorithm produces an encoded version of the first frame, and the step of determining the target size of the first frame comprises predicting the size of said encoded version of the first frame .
5. A method according to claim 4, wherein the first frame's complexity ratio is calculated by dividing the first value of the frame's complexity by the third value of the first frame's complexity.
6. A method according to claim 5, wherein the first frame is followed by a plurality of frames of the uncompressed video stream, and the method further comprises : analyzing each of the plurality of frames of the uncompressed digital video stream with the first algorithm to measure first values of each frame's complexity and to assign each frame's picture type, estimating a second value of each frame's complexity, using each frame's first value as a parameter, and encoding each of the plurality of frames with the second algorithm employing the second value and picture type of each frame as parameters .
7. A method according to claim 6, further comprising: analyzing each of the plurality of frames with the second encoding algorithm to measure a third value of the complexity for each frame, and storing the picture type and complexity ratio for each of the plurality of frames.
8. A method according to claim 7, wherein the plurality of frames is followed by a second frame, and the method further comprises : analyzing the second frame with the first algorithm to measure a first value of second frame's complexity and to assign the second frame's picture type, estimating a second value of the second frame's complexity using the first value as a parameter, and encoding the second frame with the second algorithm, employing the second value of the second frame's complexity and the second frame's picture type as parameters.
9. A method according to claim 8, comprising estimating the second frame's second value by dividing the second frame's first value by a translation factor.
10. A method according to claim 9 comprising calculating the second frame's translation factor by averaging the stored complexity ratios associated with a sub-set of the plurality of frames, the sub-set being the frames that have been encoded by the second algorithm and have the same picture type as the second frame .
11. A method according to claim 10, further comprising: analyzing the second frame with the second encoding algorithm to measure a third value of the second frame's complexity, and storing the picture type and complexity ratio of the second frame .
12. A method according to claim 1, wherein the step of estimating the first frame's second value comprises dividing the first value by a default value based on the first frame's picture type.
13. A method according to claim 1, further comprising: receiving an unfiltered version of the first frame, creating a filtered version of the first frame prior to encoding the first frame with the second algorithm, and selecting either the unfiltered or filtered version of the first frame for encoding by the second algorithm.
14. A method of transcoding frames of a compressed digital video stream, each frame being encoded according to a first encoding algorithm and having a level of complexity, said method comprising: decoding a first frame of the compressed digital video stream with a first decoding algorithm to produce a decoded version of the first frame and to measure a first value of the first frame's complexity and to determine the first frame's picture type, estimating a second value of the first frame's complexity using the first value as a parameter, and encoding the decoded version of the first frame with a distinct second encoding algorithm employing the second value of the first frame's complexity and the first frame's picture type as parameters .
15. A method according to claim 14, further comprising: analyzing the decoded version of the first frame with the second encoding algorithm to measure a third value of the first frame's complexity, and storing the picture type and a complexity ratio of the first frame.
16. A method according to claim 15, comprising, prior to encoding the decoded version of the first frame with the second algorithm, generating an encoded version of a past frame with the second algorithm, said encoded version of the past frame having a size, and estimating a value of a future frame's complexity, and wherein the step of encoding the decoded version of the first frame with the second algorithm further comprises : determining a target size for the decoded version of the first frame using the size of the past frame and the second value of the future frame's complexity as parameters, and encoding the decoded version of the first frame with the second algorithm employing the first frame's target size as a parameter.
17. A method according to claim 16, wherein the step of encoding the first frame with the second algorithm produces an encoded version of the first frame, and the step of determining the target size of the decoded version of the first frame comprises predicting the size of said encoded version of the first frame.
18. A method according to claim 17, wherein the first frame's complexity ratio is calculated by dividing the first value of the frame's complexity by the third value of the first frame's complexity.
19. A method according to claim 18, wherein the first frame is followed by a plurality of frames of the compressed video stream, and the method further comprises analyzing each of the plurality of frames of the compressed digital video stream with the first algorithm to produce a decoded version of each frame and to measure a first value of each frame's complexity and to determine each frame's picture type, estimating a second value of each frame's complexity using each frame's first value as a parameter, and encoding the decoded version of each of the plurality of frames with the second algorithm, employing the second value and picture type of each frame as parameters.
20. A method according to claim 19, further comprising: analyzing the decoded version of each of the plurality of frames with the second algorithm to measure a third value of the complexity of each frame and storing the picture type and complexity ratio for each of the plurality of frames.
21. A method according to claim 20, wherein the plurality of frames is followed by a second frame, and the method further comprises : analyzing the second frame with the first algorithm to produce a decoded version of the second frame and to measure a first value of the second frame's complexity and to determine the second frame's picture type, estimating a second value of the second frame's complexity using the first value as a parameter, and encoding the decoded version of the second frame with the second algorithm employing the second value of the second frame's complexity and the second frame's picture type as parameters .
22. A method according to claim 21, comprising estimating the second frame's second value by dividing the second frame's first value by a translation factor.
23. A method according to claim 22, comprising calculating the second frame's translation factor by averaging the stored complexity ratios associated with a sub- set of the plurality of frames, the sub-set being the frames that have been encoded by the second algorithm and have the same picture type as the second frame.
24. A method according to claim 23, further comprising: analyzing the decoded version of the second encoded frame with the second algorithm to measure a third value of the second frame's complexity, and storing the picture type and the complexity ratio of the second frame .
25. A method according to claim 14, wherein the step of estimating the first frame's second value comprises dividing the first value by a default value based on the first frame's picture type.
26. A method according to claim 14, further comprising: creating a filtered version of the first frame from an unfiltered, uncompressed version of the first frame prior to encoding the first frame with the second algorithm, and selecting either the unfiltered or filtered version of the first frame for encoding by the second algorithm.
27. An apparatus for encoding an uncompressed digital video input stream composed of a succession of frames, each frame having a plurality of characteristics associated therewith, said apparatus comprising: an extraction means for receiving the succession of frames of the uncompressed digital video input stream and employing a first method to obtain measured values for the plurality of characteristics of a frame of the input stream and to assign a picture type to the frame, a delay means for receiving the succession of frames of the input stream and outputting the frames in delayed fashion relative to the frames of the input stream, a value storage means for storing the measured values and the picture type of a frame in the delay means, and an encoding means for receiving a frame from the delay means and encoding the frame, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the frame .
28. An apparatus according to claim 27, wherein the value storage means further comprises a means for manipulating a measured value of a first characteristic of a frame to derive estimated values for a second characteristic of the frame .
29. An apparatus according to claim 28, wherein the encoding means receives the picture type, a measured value of a characteristic and an estimated value of a characteristic associated with said frame and uses said picture type, said measured value and said estimated value as parameters for creating the encoded version of said frame.
30. An apparatus according to claim 29, wherein the value storage means will determine a predicted value of the size of an encoded version of a frame in the delay means and wherein the encoding means adjusts the size of the encoded version of the frame received from the delay means in response to the predicted value.
31. An apparatus according to claim 30, wherein the encoding means further comprises a means for measuring a value of a frame's complexity, a means for storing the measured complexity value, and a means for storing the size of an encoded version of a frame.
32. An apparatus according to claim 31, further comprising: a filtering means for receiving an unfiltered frame from the delay means and creating a filtered frame and a switching means for receiving the unfiltered frame from the delay means, receiving the filtered frame from the filtering means and selectively transmitting either the unfiltered frame or the filtered frame to the encoding means .
33. An apparatus according to claim 32, wherein the switching means is responsive to the encoding means for selecting the unfiltered frame or the filtered frame for transmission to the encoding means.
34. An apparatus according to claim 27, wherein the extraction means comprises an encoding means for encoding the frames by a first encoding method and the encoding means for receiving the frames from the delay means encodes the frame by a second, distinct, encoding method.
35. An apparatus for transcoding a compressed digital video input stream composed of a succession of encoded frames, each encoded frame having a plurality of characteristics associated therewith, said apparatus comprising: a decoding means for receiving the succession of encoded frames of the compressed digital video input stream and employing a first method to obtain a succession of decoded frames and measured values for the plurality of characteristics of a decoded frame and to assign a picture type to the decoded frame, a delay means for receiving the succession of decoded frames of the input stream and outputting the decoded frames in delayed fashion relative to the encoded frames of the input stream, a value storage means for storing the measured values and the picture type of a decoded frame in the delay means, and an encoding means for receiving a decoded frame from the delay means and encoding the frame, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the frame .
36. An apparatus according to claim 35, wherein the value storage means further comprises a means for manipulating a measured value of a first characteristic of a decoded frame to derive estimated values for a second characteristic of the decoded frame.
37. An apparatus according to claim 36, wherein the encoding means receives the picture type, a measured value of a characteristic and an estimated value of a characteristic associated with said decoded frame and uses said picture type, said measured value and said estimated value as parameters for creating the encoded version of said decoded frame .
38. An apparatus according to claim 37, wherein the value storage means will determine a predicted value of the size of an encoded version of a decoded frame in the delay means and wherein the encoding means adjusts the size of the encoded version of the decoded frame received from the delay means in response to the predicted value .
39. An apparatus according to claim 38, wherein the encoding means further comprises a means for measuring a value of a decoded frame's complexity, a means for storing the measured complexity value, and a means for storing the size of an encoded version of a decoded frame.
40. An apparatus according to claim 35, further comprising: a filtering means for receiving an unfiltered decoded frame from the delay means and creating a filtered frame and a switching means for receiving the unfiltered frame from the delay means, receiving the filtered frame from the filtering means and selectively transmitting either the unfiltered decoded frame or the filtered frame to the encoding means .
41. An apparatus according to claim 40, wherein the switching means is responsive to the encoding means for selecting the unfiltered decoded frame or the filtered frame for transmission to the encoding means.
PCT/US2003/039184 2002-12-10 2003-12-09 Rate control with picture-based lookahead window WO2004054158A2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
AU2003296418A AU2003296418B2 (en) 2002-12-10 2003-12-09 Rate control with picture-based lookahead window
JP2004558627A JP4434959B2 (en) 2002-12-10 2003-12-09 Rate control with picture-based look-ahead window
NZ540501A NZ540501A (en) 2002-12-10 2003-12-09 Rate control with picture-based lookahead window
CA2507503A CA2507503C (en) 2002-12-10 2003-12-09 Rate control with picture-based lookahead window
CN2003801057469A CN1726709B (en) 2002-12-10 2003-12-09 Method and device for encoding image of uncompressed digital video frequency sequence
EP03812913A EP1588557A4 (en) 2002-12-10 2003-12-09 Rate control with picture-based lookahead window

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/316,483 US7099389B1 (en) 2002-12-10 2002-12-10 Rate control with picture-based lookahead window
US10/316,483 2002-12-10

Publications (2)

Publication Number Publication Date
WO2004054158A2 true WO2004054158A2 (en) 2004-06-24
WO2004054158A3 WO2004054158A3 (en) 2004-12-16

Family

ID=32505954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/039184 WO2004054158A2 (en) 2002-12-10 2003-12-09 Rate control with picture-based lookahead window

Country Status (9)

Country Link
US (1) US7099389B1 (en)
EP (1) EP1588557A4 (en)
JP (1) JP4434959B2 (en)
KR (1) KR101012600B1 (en)
CN (1) CN1726709B (en)
AU (1) AU2003296418B2 (en)
CA (1) CA2507503C (en)
NZ (1) NZ540501A (en)
WO (1) WO2004054158A2 (en)

Families Citing this family (61)

Publication number Priority date Publication date Assignee Title
FR2851111B1 (en) * 2003-02-10 2005-07-22 Nextream France DEVICE FOR ENCODING A VIDEO DATA STREAM
US8542733B2 (en) * 2003-06-26 2013-09-24 Thomson Licensing Multipass video rate control to match sliding window channel constraints
US7738554B2 (en) 2003-07-18 2010-06-15 Microsoft Corporation DC coefficient signaling at small quantization step sizes
US8218624B2 (en) 2003-07-18 2012-07-10 Microsoft Corporation Fractional quantization step sizes for high bit rates
US7602851B2 (en) * 2003-07-18 2009-10-13 Microsoft Corporation Intelligent differential quantization of video coding
US10554985B2 (en) 2003-07-18 2020-02-04 Microsoft Technology Licensing, Llc DC coefficient signaling at small quantization step sizes
US7330509B2 (en) * 2003-09-12 2008-02-12 International Business Machines Corporation Method for video transcoding with adaptive frame rate control
US7263126B2 (en) * 2003-09-15 2007-08-28 Sharp Laboratories Of America, Inc. System and method for transcoding with adaptive bit rate control
US7535959B2 (en) * 2003-10-16 2009-05-19 Nvidia Corporation Apparatus, system, and method for video encoder rate control
US7580461B2 (en) * 2004-02-27 2009-08-25 Microsoft Corporation Barbell lifting for wavelet coding
JP2005294977A (en) * 2004-03-31 2005-10-20 Ulead Systems Inc Two-pass video encoding method and system using sliding window
US7801383B2 (en) 2004-05-15 2010-09-21 Microsoft Corporation Embedded scalar quantizers with arbitrary dead-zone ratios
US8422546B2 (en) 2005-05-25 2013-04-16 Microsoft Corporation Adaptive video encoding using a perceptual model
JP4784188B2 (en) * 2005-07-25 2011-10-05 ソニー株式会社 Image processing apparatus, image processing method, and program
US8625914B2 (en) 2013-02-04 2014-01-07 Sony Corporation Image processing system, image processing method and program
US8009963B2 (en) * 2006-01-26 2011-08-30 Qualcomm Incorporated Adaptive filtering to enhance video bit-rate control performance
US7903733B2 (en) 2006-01-26 2011-03-08 Qualcomm Incorporated Adaptive filtering to enhance video encoder performance
US8503536B2 (en) 2006-04-07 2013-08-06 Microsoft Corporation Quantization adjustments for DC shift artifacts
US8130828B2 (en) 2006-04-07 2012-03-06 Microsoft Corporation Adjusting quantization to preserve non-zero AC coefficients
US8059721B2 (en) 2006-04-07 2011-11-15 Microsoft Corporation Estimating sample-domain distortion in the transform domain with rounding compensation
US7974340B2 (en) 2006-04-07 2011-07-05 Microsoft Corporation Adaptive B-picture quantization control
US7995649B2 (en) 2006-04-07 2011-08-09 Microsoft Corporation Quantization adjustment based on texture level
US8711925B2 (en) 2006-05-05 2014-04-29 Microsoft Corporation Flexible quantization
US9332274B2 (en) * 2006-07-07 2016-05-03 Microsoft Technology Licensing, Llc Spatially scalable video coding
US8411734B2 (en) 2007-02-06 2013-04-02 Microsoft Corporation Scalable multi-thread video decoding
US8238424B2 (en) 2007-02-09 2012-08-07 Microsoft Corporation Complexity-based adaptive preprocessing for multiple-pass video compression
US20080198932A1 (en) * 2007-02-21 2008-08-21 Nucore Technology Inc. Complexity-based rate control using adaptive prefilter
JP2008206060A (en) * 2007-02-22 2008-09-04 Sony Corp Recording apparatus, method, and program
US8498335B2 (en) 2007-03-26 2013-07-30 Microsoft Corporation Adaptive deadzone size adjustment in quantization
US8243797B2 (en) 2007-03-30 2012-08-14 Microsoft Corporation Regions of interest for quality adjustments
US8442337B2 (en) 2007-04-18 2013-05-14 Microsoft Corporation Encoding adjustments for animation content
US8331438B2 (en) 2007-06-05 2012-12-11 Microsoft Corporation Adaptive selection of picture-level quantization parameters for predicted video pictures
US9648325B2 (en) 2007-06-30 2017-05-09 Microsoft Technology Licensing, Llc Video decoding implementations for a graphics processing unit
US8189933B2 (en) 2008-03-31 2012-05-29 Microsoft Corporation Classifying and controlling encoding quality for textured, dark smooth and smooth video content
US8897359B2 (en) 2008-06-03 2014-11-25 Microsoft Corporation Adaptive quantization for enhancement layer video coding
EP2301251A4 (en) * 2008-06-25 2012-12-26 Ericsson Telefon Ab L M Row evaluation rate control
JP2010187337A (en) * 2009-02-13 2010-08-26 Toshiba Corp Moving image converting apparatus and moving image converting method
JP4746691B2 (en) * 2009-07-02 2011-08-10 株式会社東芝 Moving picture coding apparatus and moving picture coding method
US20120249869A1 (en) * 2009-12-14 2012-10-04 Thomson Licensing Statmux method for broadcasting
WO2011138900A1 (en) 2010-05-06 2011-11-10 日本電信電話株式会社 Video encoding control method and apparatus
BR112012028184A2 (en) 2010-05-07 2016-08-02 Nippon Telegraph & Telephone Video coding control method, video coding device and video coding program
CA2798354C (en) 2010-05-12 2016-01-26 Nippon Telegraph And Telephone Corporation A video encoding bit rate control technique using a quantization statistic threshold to determine whether re-encoding of an encoding-order picture group is required
US20110310955A1 (en) * 2010-06-22 2011-12-22 Lei Zhang Method and system for repetition based adaptive video compression
US8885729B2 (en) 2010-12-13 2014-11-11 Microsoft Corporation Low-latency video decoding
US9706214B2 (en) 2010-12-24 2017-07-11 Microsoft Technology Licensing, Llc Image and video decoding implementations
BR112013020069A2 (en) 2011-01-28 2016-10-25 Eye Io Llc encoder and methods for hvs model-based color conversion
JP2014511138A (en) 2011-01-28 2014-05-08 アイ アイオー,リミテッド・ライアビリティ・カンパニー Video stream encoding based on scene type
MX2013008757A (en) * 2011-01-28 2014-02-28 Eye Io Llc Adaptive bit rate control based on scenes.
TWI606722B (en) 2011-06-30 2017-11-21 微軟技術授權有限責任公司 Method, system, and computer-readable media for reducing latency in video encoding and decoding
US8731067B2 (en) 2011-08-31 2014-05-20 Microsoft Corporation Memory management for video decoding
US9819949B2 (en) 2011-12-16 2017-11-14 Microsoft Technology Licensing, Llc Hardware-accelerated decoding of scalable video bitstreams
US9094684B2 (en) 2011-12-19 2015-07-28 Google Technology Holdings LLC Method for dual pass rate control video encoding
US9762902B2 (en) * 2012-01-09 2017-09-12 Futurewei Technologies, Inc. Weighted prediction method and apparatus in quantization matrix coding
US9183261B2 (en) 2012-12-28 2015-11-10 Shutterstock, Inc. Lexicon based systems and methods for intelligent media search
US9183215B2 (en) 2012-12-29 2015-11-10 Shutterstock, Inc. Mosaic display systems and methods for intelligent media search
CA2952823A1 (en) * 2014-06-25 2015-12-30 Arris Enterprises Llc A method for using a decoder or look-ahead encoder to control an adaptive pre-filter
US9860535B2 (en) * 2015-05-20 2018-01-02 Integrated Device Technology, Inc. Method for time-dependent visual quality encoding for broadcast services
KR20170017573A (en) * 2015-08-07 2017-02-15 삼성전자주식회사 Image Data Processing method and electronic device supporting the same
KR101998303B1 (en) 2015-12-08 2019-10-01 네이버 주식회사 Method and system for managing sliding window for time machine function
CN110753242B (en) * 2019-11-01 2021-10-12 深圳市梦网视讯有限公司 Method and system for adjusting quantization parameters of transcoding slice source frame layer
CN116347080B (en) * 2023-03-27 2023-10-31 苏州利博特信息科技有限公司 Intelligent algorithm application system and method based on downsampling processing

Citations (1)

Publication number Priority date Publication date Assignee Title
US5757434A (en) 1993-11-30 1998-05-26 U.S. Philips Corporation Motion-compensated predictive encoder in which groups of pictures are each encoded with substantially the same number of bits

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US5343247A (en) * 1991-08-02 1994-08-30 U.S. Philips Corporation Filter circuit for preprocessing a video signal to be coded
WO1996025823A2 (en) * 1995-02-15 1996-08-22 Philips Electronics N.V. Method and device for transcoding video signals
JP3356004B2 (en) * 1997-05-30 2002-12-09 日本ビクター株式会社 Variable rate coding apparatus and method
US6654417B1 (en) * 1998-01-26 2003-11-25 Stmicroelectronics Asia Pacific Pte. Ltd. One-pass variable bit rate moving pictures encoding
US6181742B1 (en) * 1998-01-26 2001-01-30 International Business Machines Corporation Single pass target allocation for video encoding
US6310915B1 (en) * 1998-11-20 2001-10-30 Harmonic Inc. Video transcoder with bitstream look ahead for rate control and statistical multiplexing
KR100601615B1 (en) * 1999-08-20 2006-07-14 삼성전자주식회사 Apparatus for compressing video according to network bandwidth
KR100433516B1 (en) * 2000-12-08 2004-05-31 삼성전자주식회사 Transcoding method
US7058127B2 (en) * 2000-12-27 2006-06-06 International Business Machines Corporation Method and system for video transcoding
US6961376B2 (en) * 2002-06-25 2005-11-01 General Instrument Corporation Methods and apparatus for rate control during dual pass encoding

Non-Patent Citations (2)

Title
P H WESTERINK ET AL.: "Two-pass MPEG-2 variable-bit-rate encoding", IBM J. RES. DEVELOP., vol. 43, no. 4, 1999, pages 471 - 488, XP002395114
See also references of EP1588557A4

Cited By (11)

Publication number Priority date Publication date Assignee Title
JP2014014148A (en) * 2004-12-10 2014-01-23 Tut Systems Inc Method for determining value of quantization parameter
WO2006093306A1 (en) * 2005-03-03 2006-09-08 Pioneer Corporation Image encoding method conversion device and method
KR100837410B1 (en) 2006-11-30 2008-06-12 삼성전자주식회사 Method and apparatus for visually lossless image data compression
EP2194718A3 (en) * 2008-10-30 2014-03-26 ViXS Systems Inc. Video transcoding system with quality readjustment based on scene cost detection
US9407925B2 (en) 2008-10-30 2016-08-02 Vixs Systems, Inc. Video transcoding system with quality readjustment based on high scene cost detection and method for use therewith
WO2012178053A1 (en) * 2011-06-22 2012-12-27 Qualcomm Incorporated Quantization parameter prediction in video coding
EP3035684A1 (en) * 2011-06-22 2016-06-22 Qualcomm Incorporated Quantization parameter prediction in video coding
US10298939B2 (en) 2011-06-22 2019-05-21 Qualcomm Incorporated Quantization in video coding
WO2014146055A3 (en) * 2013-03-15 2015-02-26 Ning Lu Encoding of video data
US20220174277A1 (en) * 2019-03-11 2022-06-02 Telefonaktiebolaget Lm Ericsson (Publ) Video coding involving gop-based temporal filtering
US12113969B2 (en) * 2019-03-11 2024-10-08 Telefonaktiebolaget Lm Ericsson (Publ) Video coding involving GOP-based temporal filtering

Also Published As

Publication number Publication date
US7099389B1 (en) 2006-08-29
CA2507503C (en) 2011-06-28
EP1588557A4 (en) 2009-07-15
KR20050085451A (en) 2005-08-29
WO2004054158A3 (en) 2004-12-16
CA2507503A1 (en) 2004-06-24
CN1726709A (en) 2006-01-25
NZ540501A (en) 2006-06-30
EP1588557A2 (en) 2005-10-26
KR101012600B1 (en) 2011-02-10
JP4434959B2 (en) 2010-03-17
CN1726709B (en) 2010-09-29
AU2003296418B2 (en) 2008-10-23
JP2006509444A (en) 2006-03-16
AU2003296418A1 (en) 2004-06-30

Similar Documents

Publication Publication Date Title
CA2507503C (en) Rate control with picture-based lookahead window
CA2688249C (en) A buffer-based rate control exploiting frame complexity, buffer level and position of intra frames in video coding
US8077775B2 (en) System and method of adaptive rate control for a video encoder
US20190297347A1 (en) Picture-level rate control for video encoding
US9451250B2 (en) Bounded rate compression with rate control for slices
US5835149A (en) Bit allocation in a coded video sequence
US5990955A (en) Dual encoding/compression method and system for picture quality/data density enhancement
JP2001169281A (en) Device and method for encoding moving image
US7826530B2 (en) Use of out of order encoding to improve video quality
US8948242B2 (en) Encoding device and method and multimedia apparatus including the encoding device
KR20180122354A (en) Apparatus and methods for adaptive computation of quantization parameters in display stream compression
JP2000134617A (en) Image encoding device
Pan et al. Content adaptive frame skipping for low bit rate video coding
Pan et al. Variable frame rate encoding via active frame-skipping
Pan et al. Proactive frame-skipping decision scheme for variable frame rate video coding
Overmeire et al. Constant quality video coding using video content analysis
Abdullah et al. Constant Bit Rate For Video Streaming Over Packet Switching Networks
Khamiss et al. Constant Bit Rate For Video Streaming Over Packet Switching Networks
Notebaert et al. Rate-controlled requantization transcoding for H.264/AVC video streams
JP2001128175A (en) Device and method for converting image information and recording medium

Legal Events

Date Code Title Description
AK Designated states. Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
AL Designated countries for regional patents. Kind code of ref document: A2. Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase. Ref document number: 2003296418; Country of ref document: AU
WWE WIPO information: entry into national phase. Ref document number: 2507503; Country of ref document: CA
WWE WIPO information: entry into national phase. Ref document number: 2003812913; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 540501; Country of ref document: NZ
WWE WIPO information: entry into national phase. Ref document number: 1020057010394; Country of ref document: KR
WWE WIPO information: entry into national phase. Ref document number: 20038A57469; Country of ref document: CN. Ref document number: 2004558627; Country of ref document: JP
WWE WIPO information: entry into national phase. Ref document number: 1225/KOLNP/2005; Country of ref document: IN. Ref document number: 01225/KOLNP/2005; Country of ref document: IN
WWP WIPO information: published in national office. Ref document number: 1020057010394; Country of ref document: KR
WWP WIPO information: published in national office. Ref document number: 2003812913; Country of ref document: EP