WO2006004605A2 - Multi-pass video encoding - Google Patents

Multi-pass video encoding

Info

Publication number
WO2006004605A2
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
image
images
complexity
readable medium
Prior art date
Application number
PCT/US2005/022616
Other languages
English (en)
French (fr)
Other versions
WO2006004605A3 (en)
WO2006004605B1 (en)
Inventor
Xin Tong
Hsi-Jung Wu
Thomas Pun
Adriana Dumitras
Barin Haskell
Jim Normile
Original Assignee
Apple Computer, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/118,616 external-priority patent/US8406293B2/en
Priority claimed from US11/118,604 external-priority patent/US8005139B2/en
Application filed by Apple Computer, Inc. filed Critical Apple Computer, Inc.
Priority to CN2005800063635A priority Critical patent/CN1926863B/zh
Priority to EP05773224A priority patent/EP1762093A4/en
Priority to KR1020067017074A priority patent/KR100909541B1/ko
Priority to JP2007518338A priority patent/JP4988567B2/ja
Publication of WO2006004605A2 publication Critical patent/WO2006004605A2/en
Publication of WO2006004605A3 publication Critical patent/WO2006004605A3/en
Publication of WO2006004605B1 publication Critical patent/WO2006004605B1/en
Priority to HK07106057.0A priority patent/HK1101052A1/xx

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 ... using adaptive coding
    • H04N19/102 ... characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/126 Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/134 ... characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/142 Detection of scene cut or scene change
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/15 Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
    • H04N19/152 Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/169 ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 ... the unit being an image region, e.g. an object
    • H04N19/172 ... the region being a picture, frame or field
    • H04N19/176 ... the region being a block, e.g. a macroblock
    • H04N19/177 ... the unit being a group of pictures [GOP]
    • H04N19/189 ... characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192 ... the adaptation method, adaptation tool or adaptation type being iterative or recursive

Definitions

  • Video encoders encode a sequence of video images (e.g., video frames) by using a variety of encoding schemes.
  • Video encoding schemes typically encode video frames or portions of video frames (e.g., sets of pixels in the video frames) in terms of intraframes or interframes.
  • An intraframe encoded frame or pixel set is one that is encoded independently of other frames or pixel sets in other frames.
  • An interframe encoded frame or pixel set is one that is encoded by reference to one or more other frames or pixel sets in other frames.
  • When compressing video frames, some encoders implement a 'rate controller,' which provides a 'bit budget' for a video frame or a set of video frames that are to be encoded.
  • the bit budget specifies the number of bits that have been allocated to encode the video frame or set of video frames.
  • the rate controller attempts to generate the highest quality compressed video stream in view of certain constraints (e.g., a target bit rate, etc.).
  • a single-pass rate controller provides bit budgets for an encoding scheme that encodes a series of video images in one pass
  • a multi-pass rate controller provides bit budgets for an encoding scheme that encodes a series of video images in multiple passes.
  • Single-pass rate controllers are useful in real-time encoding situations, while multi-pass rate controllers are useful in non-real-time encoding situations.
  • Multi-pass rate controllers optimize the encoding for a particular bit rate based on a set of constraints. Few rate controllers to date consider the spatial or temporal complexity of frames, or of pixel sets within the frames, in controlling the bit rates of their encodings. Also, most multi-pass rate controllers do not adequately search the solution space for encoding solutions that use optimal quantization parameters for frames and/or pixel sets within frames in view of a desired bit rate.
  • There is thus a need in the art for a rate controller that uses novel techniques to consider the spatial or temporal complexity of video images and/or portions of video images, while controlling the bit rate for encoding a set of video images.
  • There is also a need for a multi-pass rate controller that adequately examines the encoding solutions to identify an encoding solution that uses an optimal set of quantization parameters for video images and/or portions of video images.
  • Some embodiments of the invention provide a multi-pass encoding method that encodes several images (e.g., several frames of a video sequence).
  • the method iteratively performs an encoding operation that encodes these images.
  • the encoding operation is based on a nominal quantization parameter, which the method uses to compute quantization parameters for the images.
  • the method uses several different nominal quantization parameters.
  • the method stops its iterations when it reaches a terminating criterion (e.g., it identifies an acceptable encoding of the images).
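The iterative search just described can be sketched as follows. This is only a minimal illustration, not the patent's actual procedure: the `encode_sequence` callback, the starting QP of 26, and the proportional QP-update step are all assumptions; the text above only specifies that each pass re-encodes with a different nominal quantization parameter until a terminating criterion (such as an acceptable encoding) is reached.

```python
def multipass_encode(frames, target_bitrate, tolerance=0.01, max_passes=10,
                     encode_sequence=None):
    """Sketch of the multi-pass search loop: re-encode the sequence with a
    new nominal QP each pass until the bit-rate error is within tolerance
    (the terminating criterion) or the pass budget runs out."""
    qp_nom = 26.0                    # hypothetical starting point (see pass 0)
    result = (qp_nom, None)
    for _ in range(max_passes):
        bitrate = encode_sequence(frames, qp_nom)   # one full encoding pass
        error = (bitrate - target_bitrate) / target_bitrate
        result = (qp_nom, bitrate)
        if abs(error) <= tolerance:  # acceptable encoding found
            break
        # Hypothetical update rule: raise QP when over budget, lower when under.
        qp_nom += 6.0 * error
    return result
```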
  • Some embodiments of the invention provide a method for encoding video sequences.
  • the method identifies a first attribute quantifying the complexity of a first image in the video. It also identifies a quantization parameter for encoding the first image based on the identified first attribute.
  • the method then encodes the first image based on the identified quantization parameter. In some embodiments, this method performs these three operations for several images in the video.
  • Some embodiments of the invention encode a sequence of video images based on "visual masking" attributes of the video images and/or portions of the video images.
  • Visual masking of an image or a portion of the image is an indication of how many coding artifacts can be tolerated in the image or image portion.
  • some embodiments compute a visual masking strength that quantifies the brightness energy of the image or the image portion.
  • the brightness energy is measured as a function of the average luma or pixel energy of the image or image portion.
  • the visual masking strength of an image or image portion might also quantify activity energy of the image or image portion.
  • the activity energy expresses the complexity of the image or image portion.
  • the activity energy includes a spatial component that quantifies the spatial complexity of the image or image portion, and/or a motion component that quantifies the amount of distortion that can be tolerated/masked due to motion between images.
  • Some embodiments of the invention provide a method for encoding video sequences.
  • the method identifies a visual-masking attribute of a first image in the video. It also identifies a quantization parameter for encoding the first image based on the identified visual-masking attribute.
  • the method then encodes the first image based on the identified quantization parameter.
  • Figure 1 presents a process that conceptually illustrates the encoding method of some embodiments of the invention.
  • Figure 2 conceptually illustrates a codec system of some embodiments.
  • Figure 3 is a flow chart illustrating the encoding process of some embodiments.
  • Figure 4a is a plot of the difference between the nominal removal time and the final arrival time of images versus image number, illustrating an underflow condition in some embodiments.
  • Figure 4b plots the same difference for the images shown in Figure 4a after the underflow condition is eliminated.
  • Figure 5 illustrates a process that the encoder uses to perform underflow detection in some embodiments.
  • Figure 6 illustrates a process the encoder utilizes to eliminate an underflow condition in a single segment of images in some embodiments.
  • Figure 7 illustrates an application of buffer underflow management in a video streaming application.
  • Figure 8 illustrates an application of buffer underflow management in an HD-DVD system.
  • Figure 9 presents a computer system with which one embodiment of the invention is implemented.
  • R_T represents a target bit rate, which is the desired bit rate for encoding a sequence of frames. Typically, this bit rate is expressed in units of bits/second, and is calculated from the desired final file size, the number of frames in the sequence, and the frame rate.
  • R_p represents the bit rate of the encoded bit stream at the end of a pass p.
  • E_p represents the percentage of error in the bit rate at the end of pass p. In some embodiments, this percentage is calculated as 100 * (R_p - R_T) / R_T.
  • ε represents the error tolerance in the final bit rate.
  • ε_C represents the error tolerance in the bit rate for the first QP search stage.
  • QP represents the quantization parameter.
  • QP_Nom(p) represents the nominal quantization parameter that is used in pass p encoding for a sequence of frames. The value of QP_Nom(p) is adjusted by the invention's multi-pass encoder in a first QP adjustment stage to reach the target bit rate.
  • MQP_p(k) represents the masked frame QP, which is the quantization parameter (QP) for a frame k in pass p. Some embodiments compute this value by using the nominal QP and frame-level visual masking.
  • MQP_MB(p)(k, m) represents the masked macroblock QP, which is the quantization parameter (QP) for an individual macroblock (with a macroblock index m) in a frame k and a pass p. Some embodiments compute MQP_MB(p)(k, m) by using MQP_p(k) and macroblock-level visual masking.
  • φ_F(k) represents a value referred to as the masking strength for frame k. The masking strength φ_F(k) is a measure of complexity for the frame and, in some embodiments, this value is used to determine how visible coding artifacts/noise would appear and to compute the MQP_p(k) of frame k.
  • φ_R(p) represents the reference masking strength in pass p. This masking strength is used to compute MQP_p(k) of frame k, and it is adjusted by the invention's multi-pass encoder in a second stage to reach the target bit rate.
  • φ_MB(k, m) represents the masking strength for a macroblock with an index m in frame k. The masking strength φ_MB(k, m) is a measure of complexity for the macroblock.
  • AMQP_p represents an average masked QP over frames in pass p. In some embodiments, this value is computed as the average of MQP_p(k) over all frames in a pass p.
  • Some embodiments of the invention provide an encoding method that achieves the best visual quality for encoding a sequence of frames at a given bit rate.
  • this method uses a visual masking process that assigns a quantization parameter QP to every macroblock. This assignment is based on the realization that coding artifacts/noise in brighter or spatially complex areas in an image or a video frame are less visible than those in darker or flat areas.
  • this visual masking process is performed as part of an inventive multi-pass encoding process.
  • This encoding process adjusts a nominal quantization parameter and controls the visual masking process through a reference masking strength parameter φ_R, in order to have the final encoded bit stream reach the target bit rate.
  • Adjusting the nominal quantization parameter and controlling the masking algorithm adjusts the QP values for each picture (i.e., each frame in typical video encoding schemes) and each macroblock within each picture.
  • In some embodiments, the multi-pass encoding process globally adjusts the nominal QP and φ_R for the entire sequence. In other embodiments, this process adjusts them locally for individual portions of the sequence.
  • The method has three stages of encoding: (1) an initial analysis stage that is performed in pass 0, (2) a first search stage that is performed in pass 1 through pass N_1, and (3) a second search stage that is performed in pass N_1+1 through pass N_1+N_2.
  • During the initial analysis stage, the method identifies an initial value for the nominal QP (QP_Nom(1), to be used in pass 1 of the encoding). During the initial analysis stage, the method also identifies a value of the reference masking strength φ_R, which is used in all the passes of the first search stage.
  • During the first search stage, the method performs N_1 iterations (i.e., N_1 passes) of an encoding process.
  • the process encodes the frame by using a particular quantization parameter MQP_p(k) and particular quantization parameters MQP_MB(p)(k, m) for individual macroblocks m within the frame k, where MQP_MB(p)(k, m) is computed using MQP_p(k).
  • the quantization parameter MQP_p(k) changes between passes, as it is derived from a nominal quantization parameter QP_Nom(p) that changes between passes.
  • at the end of each pass p, the process computes a nominal QP_Nom(p+1) for pass p+1.
  • the nominal QP_Nom(p+1) is based on the nominal QP value(s) and bit rate error(s) from previous pass(es).
  • in some embodiments, the nominal QP_Nom(p+1) value is computed differently at the end of each pass in the first search stage.
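The text above says only that QP_Nom(p+1) is derived from the nominal QP value(s) and bit-rate error(s) of previous passes, without giving a formula. One plausible concrete rule, shown here purely as an assumption, is a secant step on the logarithm of the bit rate, which falls roughly linearly with QP for typical encoders:

```python
import math

def next_nominal_qp(qp_prev, rate_prev, qp_curr, rate_curr, target_rate):
    """Hypothetical rule for QP_Nom(p+1): a secant step that estimates the
    QP at which the bit rate would hit the target, using the nominal QPs
    and resulting bit rates of the two most recent passes.  Works in
    log-bitrate, since bit rate decays roughly exponentially with QP."""
    f_prev = math.log(rate_prev / target_rate)
    f_curr = math.log(rate_curr / target_rate)
    if f_curr == f_prev:             # no slope information; keep current QP
        return qp_curr
    slope = (f_curr - f_prev) / (qp_curr - qp_prev)
    return qp_curr - f_curr / slope
```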
  • During the second search stage, the method performs N_2 iterations (i.e., N_2 passes) of the encoding process.
  • the process encodes each frame k during each pass p by using a particular quantization parameter MQP_p(k) and particular quantization parameters MQP_MB(p)(k, m) for individual macroblocks m within the frame k, where MQP_MB(p)(k, m) is derived from MQP_p(k).
  • the quantization parameter MQP_p(k) changes between passes.
  • this parameter changes as it is computed using a reference masking strength φ_R(p) that changes between passes.
  • the reference masking strength φ_R(p) is computed based on the reference masking strength value(s) and bit rate error(s) from previous pass(es).
  • this reference masking strength is computed to be a different value at the end of each pass in the second search stage.
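Similarly, φ_R(p+1) is said to be recomputed from earlier values and bit-rate errors, but no formula is given at this point. The multiplicative nudge below is a hypothetical stand-in; in particular, the sign convention (raising φ_R lowers the bit rate) and the gain are assumptions about how φ_R enters the masked-QP computation:

```python
def next_reference_masking_strength(phi_r, bitrate, target_rate, gain=0.5):
    """Hypothetical update for phi_R(p+1): nudge the reference masking
    strength in proportion to the relative bit-rate error of pass p.
    Positive error (over budget) raises phi_R; negative error lowers it."""
    error = (bitrate - target_rate) / target_rate
    return phi_r * (1.0 + gain * error)
```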
  • Although the multi-pass encoding process is described in conjunction with the visual masking process, one of ordinary skill in the art will realize that an encoder does not need to use both of these processes together.
  • For instance, the multi-pass encoding process can be used to encode a bitstream near a given target bit rate without visual masking, by ignoring φ_R and omitting the second search stage.
  • Given a nominal quantization parameter, the visual masking process first computes a masked frame quantization parameter (MQP) for each frame using the reference masking strength (φ_R) and the frame masking strength (φ_F). This process then computes a masked macroblock quantization parameter (MQP_MB) for each macroblock in the frame.
  • the reference masking strength (φ_R) in some embodiments is identified during the initial analysis stage and is used in all the passes of the first search stage.
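The section states that the masked frame QP is derived from the nominal QP together with φ_R and φ_F(k), and the macroblock QP from the frame QP and φ_MB, but it does not spell out the mapping at this point. The log-ratio form below is therefore only one plausible sketch: the β gains, the 0..51 clamp, and the formula itself are assumptions, chosen so that stronger masking (more tolerable artifacts) yields a higher QP.

```python
import math

def masked_frame_qp(qp_nom, phi_f, phi_r, beta_f=6.0, qp_min=0.0, qp_max=51.0):
    """Hypothetical masked frame QP MQP_p(k): a frame whose masking strength
    phi_F exceeds the reference phi_R tolerates more artifacts, so it gets a
    higher QP; a frame below the reference gets a lower QP."""
    mqp = qp_nom + beta_f * math.log2(phi_f / phi_r)
    return max(qp_min, min(qp_max, mqp))

def masked_macroblock_qp(mqp_frame, phi_mb, phi_f, beta_mb=6.0,
                         qp_min=0.0, qp_max=51.0):
    """Hypothetical masked macroblock QP MQP_MB(p)(k, m), derived from the
    frame's MQP and the macroblock's masking strength relative to the
    frame's own masking strength."""
    mqp = mqp_frame + beta_mb * math.log2(phi_mb / phi_f)
    return max(qp_min, min(qp_max, mqp))
```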
  • φ_F(k) = C * power(E*avgFrameLuma(k), α) * power(D*avgFrameSAD(k), β_F),   (A) where
  • avgFrameSAD(k) is the average of MbSAD(k, m) over all macroblocks in frame k;
  • MbSAD(k, m) is the sum of the values given by a function Calc4x4MeanRemovedSAD(4x4_block_pixel_values) for all 4x4 blocks in the macroblock with index m;
  • Calc4x4MeanRemovedSAD(4x4_block_pixel_values) { calculate the mean of pixel values in the given 4x4 block; subtract the mean from the pixel values and compute their absolute values; sum the absolute values obtained in the previous step; return the sum; }
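The Calc4x4MeanRemovedSAD pseudocode and equation (A) can be turned into running code as follows; the constants C, D, E and the exponents α and β_F are placeholders, since the section leaves their values unspecified:

```python
def calc_4x4_mean_removed_sad(block):
    """Mean-removed SAD of one 4x4 block (16 pixel values), per the
    pseudocode: subtract the block mean from each pixel, then sum the
    absolute values."""
    mean = sum(block) / len(block)
    return sum(abs(p - mean) for p in block)

def mb_sad(blocks_4x4):
    """MbSAD(k, m): sum of Calc4x4MeanRemovedSAD over all 4x4 blocks of a
    macroblock (16 blocks for a 16x16 macroblock)."""
    return sum(calc_4x4_mean_removed_sad(b) for b in blocks_4x4)

def frame_masking_strength(avg_frame_luma, avg_frame_sad,
                           C=1.0, D=1.0, E=1.0, alpha=0.5, beta_f=0.5):
    """Equation (A): phi_F(k) = C * (E*avgFrameLuma(k))^alpha *
    (D*avgFrameSAD(k))^beta_F, with placeholder constants/exponents."""
    return C * (E * avg_frame_luma) ** alpha * (D * avg_frame_sad) ** beta_f
```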
  • Activity_Attribute = G * power(D*Spatial_Activity_Attribute, exponent_beta) + H * power(E*Temporal_Activity_Attribute, exponent_gamma),   (C) where G, D, H, and E are constants.
  • the Temporal_Activity_Attribute quantifies the amount of distortion that can be tolerated (i.e., masked) due to motion between frames.
  • the Temporal_Activity_Attribute of a frame equals a constant times the sum of the absolute value of the motion compensated error signal of pixel regions defined within the frame.
  • the Temporal_Activity_Attribute is provided by equation (D) below: Temporal_Activity_Attribute = Σ_{j=-N}^{-1} (W_j * avgFrameSAD(j)) + Σ_{j=1}^{M} (W_j * avgFrameSAD(j)) + avgFrameSAD(0),   (D) where
  • avgFrameSAD expresses (as described above) the average macroblock SAD (MbSAD(k, m)) value in a frame
  • avgFrameSAD(0) is the avgFrameSAD for the current frame;
  • negative j indexes time instances before the current frame, and positive j indexes time instances after the current frame.
  • the variables N and M refer to the number of frames that are respectively before and after the current frame. Instead of simply selecting the values N and M based on a particular number of frames, some embodiments compute the values N and M based on particular durations of time before and after the time of the current frame. Correlating the motion masking to temporal durations is more advantageous than correlating the motion masking to a set number of frames. This is because the correlation of the motion masking with temporal durations is directly in line with the viewer's time-based visual perception. The correlation of such masking with the number of frames, on the other hand, suffers from variable display durations, as different displays present video at different frame rates.
  • Equation (D) "W” refers to a weighting factor, which, in some embodiment, decreases as the frame j gets further from the current frame. Also, in this equation, the first summation expresses the amount of motion that can be masked before the current frame, the second summation expresses the amount of motion that can be masked after the current frame, and the last expression (avgFrameSAD(O)) expresses the frame SAD of the current frame.
  • In some embodiments, the weighting factors are adjusted to account for scene changes. For instance, some embodiments account for an upcoming scene change within the look-ahead range (i.e., within the M frames) but not for any frames after that scene change; these embodiments might set the weighting factors to zero for frames within the look-ahead range that are after a scene change. Also, some embodiments do not account for frames prior to or on a scene change within the look-behind range (i.e., within the N frames); these embodiments might set the weighting factors to zero for frames within the look-behind range that relate to a previous scene or fall before the previous scene change.

3. Variations to the Second Approach

a) Limiting the Influence of Past and Future Frames on the Temporal_Activity_Attribute
  • Equation (D) above essentially expresses the Temporal_Activity_Attribute in the following terms: Temporal_Activity_Attribute = Past_Frame_Activity (PFA) + Future_Frame_Activity (FFA) + Current_Frame_Activity (CFA), where PFA equals Σ_{j=-N}^{-1} (W_j * avgFrameSAD(j)), FFA equals Σ_{j=1}^{M} (W_j * avgFrameSAD(j)), and CFA equals avgFrameSAD(0).
  • Some embodiments modify the calculation of the Temporal_Activity_Attribute so that neither the Past_Frame_Activity nor the Future_Frame_Activity unduly controls the value of the Temporal_Activity_Attribute. For instance, some embodiments initially define PFA to equal its weighted sum above, then determine whether the PFA value is bigger than a scalar times FFA; if so, these embodiments set PFA equal to an upper PFA limit value (e.g., a scalar times FFA).
  • Similarly, after initially defining the PFA and FFA values based on the weighted sums, some embodiments also determine whether the FFA value is bigger than a scalar times PFA. If so, these embodiments then set FFA equal to an upper FFA limit value (e.g., a scalar times PFA). In addition to setting FFA equal to an upper FFA limit value, some embodiments may perform a combination of setting PFA to zero and setting CFA to zero. Other embodiments may set either or both of FFA and CFA to a weighted combination of FFA, CFA, and PFA.
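The PFA/FFA limiting described above can be sketched as follows; the value of the limiting scalar is a placeholder:

```python
def combine_temporal_terms(pfa, ffa, cfa, limit_scalar=2.0):
    """Combine the past, future, and current activity terms of equation (D),
    clamping each of PFA and FFA so that neither unduly dominates the
    Temporal_Activity_Attribute.  limit_scalar is a placeholder value."""
    if pfa > limit_scalar * ffa:
        pfa = limit_scalar * ffa          # upper PFA limit
    if ffa > limit_scalar * pfa:
        ffa = limit_scalar * pfa          # upper FFA limit
    return pfa + ffa + cfa
```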
  • Equation (C) above essentially expresses the Activity_Attribute in the following terms: Activity_Attribute = Spatial_Activity + Temporal_Activity, where Spatial_Activity equals a scalar*(scalar*Spatial_Activity_Attribute)^exponent, and Temporal_Activity equals a scalar*(scalar*Temporal_Activity_Attribute)^exponent.
  • Some embodiments modify the calculation of the Activity_Attribute so that neither the Spatial_Activity nor the Temporal_Activity unduly controls the value of the Activity_Attribute. For instance, some embodiments initially define Spatial_Activity (SA) to equal a scalar*(scalar*Spatial_Activity_Attribute)^exponent, and Temporal_Activity (TA) to equal a scalar*(scalar*Temporal_Activity_Attribute)^exponent.
  • These embodiments then determine whether SA is bigger than a scalar times TA. If so, these embodiments set SA equal to an upper SA limit value (e.g., a scalar times TA). In addition to setting SA equal to an upper SA limit in such a case, some embodiments might also set the TA value to zero or to a weighted combination of TA and SA.
  • Similarly, after initially defining the SA and TA values based on the exponential equations, some embodiments also determine whether the TA value is bigger than a scalar times SA. If so, these embodiments then set TA equal to an upper TA limit value (e.g., a scalar times SA). In addition to setting TA equal to an upper TA limit in such a case, some embodiments might also set the SA value to zero or to a weighted combination of SA and TA.
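Putting the SA/TA split of equation (C) together with the limits just described gives a sketch like the following; all scalars and exponents are placeholders, since the section does not fix their values:

```python
def activity_attribute(spatial_attr, temporal_attr,
                       g=1.0, d=1.0, h=1.0, e=1.0,
                       beta=0.5, gamma=0.5, limit_scalar=2.0):
    """Activity_Attribute = Spatial_Activity + Temporal_Activity, where each
    term is scalar*(scalar*attribute)^exponent and neither term is allowed
    to exceed limit_scalar times the other."""
    sa = g * (d * spatial_attr) ** beta
    ta = h * (e * temporal_attr) ** gamma
    if sa > limit_scalar * ta:
        sa = limit_scalar * ta            # upper SA limit
    if ta > limit_scalar * sa:
        ta = limit_scalar * sa            # upper TA limit
    return sa + ta
```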
  • In some embodiments, the macroblock-level masking strength φ_MB(k, m) is given by: φ_MB(k, m) = A*power(C*avgMbLuma(k,m), α) * power(B*MbSAD(k, m), β_MB), where β_MB, α, A, B, and C are constants and/or are adapted to the local characteristics of the video.
  • the macroblock's Mb_Brightness_Attribute equals avgMbLuma(k, m).
  • Mb_Spatial_Activity_Attribute equals avgMbSAD(k). This Mb_Spatial_Activity_Attribute measures the amount of spatial innovations in a region of pixels within the macroblock that is being coded.
  • Mb_Activity_Attribute = F * power(D*Mb_Spatial_Activity_Attribute, exponent_beta) + G * power(E*Mb_Temporal_Activity_Attribute, exponent_gamma),   (H) where D, E, F, and G are constants.
  • the computation of the Mb_Temporal_Activity_Attribute for a macroblock can be analogous to the above-described computation of the Temporal_Activity_Attribute for a frame.
  • the Mb_Temporal_Activity_Attribute is provided by equation (I) below: Mb_Temporal_Activity_Attribute = Σ_{i=-N}^{-1} (W_i * MbSAD(i, m)) + Σ_{j=1}^{M} (W_j * MbSAD(j, m)) + MbSAD(0, m),   (I) where i and j index frames before and after the current frame.
  • the macroblock m in frame i or j can be the macroblock in the same location as the macroblock m in the current frame, or can be the macroblock in frame i or j that is initially predicted to correspond to the macroblock m in the current frame.
  • the Mb_Temporal_Activity_Attribute provided by equation (I) can be modified in an analogous manner to the modifications (discussed in Section III.A.3 above) of the frame Temporal_Activity_Attribute provided by equation (D). Specifically, the Mb_Temporal_Activity_Attribute provided by the equation (I) can be modified to limit the undue influence of macroblocks in the past and future frames.
  • the Mb_Activity_Attribute provided by equation (H) can be modified in an analogous manner to the modifications (discussed in Section III.A.3 above) of the frame Activity_Attribute provided by equation (C). Specifically, the Mb_Activity_Attribute provided by equation (H) can be modified to limit the undue influence of the Mb_Spatial_Activity_Attribute and the Mb_Temporal_Activity_Attribute.
  • the visual masking process can calculate the masked quantization parameters
  • T is a suitably chosen threshold
  • φ_F and φ_MB can be predetermined constants or can be adapted to the content
  • Figure 1 presents a process 100 that conceptually illustrates the multi-pass encoding method of some embodiments of the invention. As shown in this figure, the process 100 has three stages, which are described in the following three sub-sections.
  • the process 100 initially computes (at 105) the initial nominal quantization parameter QP_Nom(1) and the initial reference masking strength φ_R(1)
  • φ_R(0) can be some arbitrary value or a value computed from the content
  • the reference masking strength φ_R(1) is set to be equal to the average of the frame-level masking strengths
  • other ways of computing φ_R are also possible. For instance, it may be computed as the median or other statistic of the frame-level masking strengths
  • the initial nominal QP can be selected as an arbitrary value (e.g., 26).
  • a value can be selected that is known to produce an acceptable quality for the target bit rate based on coding experiments.
  • the initial nominal QP value can also be selected from a look-up table based on spatial resolution, frame rate, spatial/temporal complexity, and target bit rate.
  • this initial nominal QP value is selected from the table using a distance measure that depends on each of these parameters, or it may be selected using a weighted distance measure of these parameters.
  • This initial nominal QP value can also be set to the adjusted average of the frame QP values as they are selected during a fast encoding with a rate controller (without masking), where the average has been adjusted based on the bit rate percentage error E_0 for pass 0.
  • the initial nominal QP can also be set to a weighted adjusted average of the frame QP values, where the weight for each frame is determined by the percentage of macroblocks in this frame that are not coded as skipped macroblocks.
  • the initial nominal QP can be set to an adjusted average or an adjusted weighted average of the frame QP values as they are selected during a fast encoding with a rate controller (with masking), as long as the effect of changing the reference masking strength from φ_R(0) to φ_R(1) is taken into account.
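One of these options, the weighted adjusted average, might be sketched as follows. The weighting by non-skipped macroblocks is from the text, but the exact form of the E_0 adjustment is not specified there, so the linear correction below is purely an assumption:

```python
def initial_nominal_qp(frame_qps, nonskip_fractions, error_e0, gain=0.5):
    """Weighted average of per-frame QPs from a fast pass-0 encoding,
    weighted by each frame's fraction of non-skipped macroblocks,
    then adjusted by the pass-0 bit-rate percentage error E_0.
    The linear 'gain * error' adjustment is an illustrative assumption."""
    total_w = sum(nonskip_fractions)
    avg = sum(q * w for q, w in zip(frame_qps, nonskip_fractions)) / total_w
    # A positive error (too many bits spent) pushes the QP up, and vice versa.
    qp = avg + gain * error_e0
    return max(0, min(51, round(qp)))  # clip to the valid H.264 QP range
```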
  • the multi-pass encoding process 100 enters the first search stage.
  • the process 100 performs N1 encodings of the sequence, where N1 represents the number of passes through the first search stage.
  • the process uses a changing nominal quantization parameter with a constant reference masking strength.
  • the process 100 computes (at 107) a particular quantization parameter MQP_p(k) for each frame k and a particular quantization parameter MQP_MB(p)(k, m) for each individual macroblock m within the frame k.
  • the calculation of the parameters MQP_p(k) and MQP_MB(p)(k, m) for a given nominal quantization parameter QP_Nom(p) and reference masking strength φ_R(p) was described in Section III (where MQP_p(k) and MQP_MB(p)(k, m) are computed by using the functions CalcMQP and CalcMQPforMB, which were described above in Section III).
  • for the first pass, the nominal quantization parameter and the first-stage reference masking strength are the parameter QP_Nom(1) and the reference masking strength φ_R(1), which were computed during the initial analysis
  • the process encodes (at 110) the sequence based on the quantization parameter values computed at 107.
  • the encoding process 100 determines (at 115) whether it should terminate. Different embodiments have different criteria for terminating the overall encoding process. Examples of exit conditions that completely terminate the multi-pass encoding process include:
  • QP_Nom(p) is at the upper or lower bound of the valid range of QP values.
  • Some embodiments might use all of these exit conditions, while other embodiments might only use some of them. Yet other embodiments might use other exit conditions for terminating the encoding process.
  • In that case, the process 100 omits the second search stage and transitions to 145.
  • the process saves the bitstream from the last pass p as the final result, and then terminates.
  • When the process determines (at 115) that it should not terminate, it then determines (at 120) whether it should terminate the first search stage.
  • different embodiments have different criteria for terminating the first search stage. Examples of exit conditions that terminate the first search stage of the multi-pass encoding process include:
  • QP_Nom(p+1) is the same as QP_Nom(q) for some q ≤ p (in this case, the error in bit rate cannot be lowered any further by modifying the nominal QP).
  • Some embodiments might use all these exit conditions, while other embodiments might only use some of them. Yet other embodiments might use other exit conditions for terminating the first search stage.
  • the process 100 proceeds to the second search stage, which is described in the next sub-section.
  • When the process determines (at 120) that it should not terminate the first search stage, it updates (at 125) the nominal QP for the next pass in the first search stage (i.e., defines QP_Nom(p+1)).
  • In some embodiments, the nominal QP_Nom(p+1) is updated as follows. At the end of pass p, these embodiments define
  • QP_Nom(p+1) = InterpExtrap(0, E_q1, E_q2, QP_Nom(q1), QP_Nom(q2)), where InterpExtrap is a function that is further described below. Also, in the above equation, q1 and q2 are pass numbers with corresponding bit rate errors that are the lowest among all passes up to pass p, and q1, q2, and p have the relationship q1 < q2 ≤ p.
  • the nominal QP value is typically rounded to an integer value and clipped to lie within the valid range of QP values.
  • One of ordinary skill in the art will realize that other embodiments might compute the nominal QP_Nom(p+1) value differently than the approach described above.
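The InterpExtrap function referenced above can be read as linear interpolation/extrapolation through two (bit-rate error, QP) points, evaluated at zero error; a sketch under that assumption, including the rounding and clipping mentioned above:

```python
def interp_extrap(x, x1, x2, y1, y2):
    """Linearly interpolate/extrapolate through (x1, y1) and (x2, y2),
    evaluated at x. Called with x = 0 to estimate the nominal QP at
    which the bit-rate error would vanish."""
    if x2 == x1:                     # degenerate: no slope information
        return y1
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

def next_nominal_qp(e_q1, e_q2, qp_q1, qp_q2, qp_min=0, qp_max=51):
    """QP_Nom(p+1) = InterpExtrap(0, E_q1, E_q2, QP_Nom(q1), QP_Nom(q2)),
    rounded to an integer and clipped to the valid QP range."""
    qp = interp_extrap(0.0, e_q1, e_q2, qp_q1, qp_q2)
    return max(qp_min, min(qp_max, round(qp)))
```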
  • the process encodes (at 110) the sequence of frames based on these newly computed quantization parameters. From 110, the process then transitions to 115, which was described above.
  • When the process 100 determines (at 120) that it should terminate the first search stage, it transitions to 130.
  • the process 100 performs N2 encodings of the sequence, where N2 represents the number of passes through the second search stage. During each pass, the process uses the same nominal quantization parameter and a changing reference masking strength.
  • At 130, the process 100 computes a reference masking strength φ_R(p+1) for the next pass (i.e., pass p+1, which is pass N1+1 on the first pass through the second search stage). With this reference masking strength, the process 100 encodes the sequence of frames in 135.
  • Different embodiments compute (at 130) the reference masking strength φ_R(p+1) at the end of a pass p in different ways. Two alternative approaches are described below.
  • Some embodiments compute the reference masking strength φ_R(p) based on the bit rate errors and reference masking strengths of the previous passes:
  • φ_R(N1+m) = InterpExtrap(0, E_N1+m-2, E_N1+m-1, φ_R(N1+m-2), φ_R(N1+m-1))
  • Some embodiments that use AMQP compute a desired AMQP for pass p+1 based on the error in bit rate(s) and value(s) of AMQP from previous pass(es).
  • In some embodiments, at the end of pass N1, the process computes AMQP_N1+1, where
  • AMQP_N1+1 = InterpExtrap(0, E_N1-1, E_N1, AMQP_N1-1, AMQP_N1), when N1 > 1, and
  • φ_R(N1+1) = Search(AMQP_N1+1, φ_R(N1))
  • AMQP_N1+m = InterpExtrap(0, E_N1+m-2, E_N1+m-1, AMQP_N1+m-2, AMQP_N1+m-1), and
  • φ_R(N1+m) = Search(AMQP_N1+m, φ_R(N1+m-1))
  • the ⁇ R corresponding to the desired AMQP can be found using the Search function, which has the following pseudo code in some embodiments:
  • the numbers 10, 12 and 0.05 may be replaced with suitably chosen thresholds.
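The patent's Search pseudo code is not reproduced here; the sketch below assumes one plausible realization: a bounded bisection over φ_R for the value whose resulting AMQP falls within a tolerance of the desired one. The iteration budget, tolerance, bracketing, monotonicity assumption, and the amqp_of_phi callback are all assumptions:

```python
def search_phi_r(desired_amqp, phi_r_start, amqp_of_phi,
                 max_iters=12, tolerance=0.05):
    """Iteratively adjust phi_R until the AMQP it produces is within
    `tolerance` of the desired AMQP, or the iteration budget runs out.
    `amqp_of_phi` maps a candidate phi_R to the AMQP it would yield;
    the bracketing and bisection strategy are illustrative assumptions."""
    lo, hi = phi_r_start / 10.0, phi_r_start * 10.0
    best = phi_r_start
    for _ in range(max_iters):
        mid = (lo + hi) / 2.0
        amqp = amqp_of_phi(mid)
        if abs(amqp - desired_amqp) <= tolerance:
            return mid
        # Assume AMQP increases monotonically with masking strength.
        if amqp < desired_amqp:
            lo = mid
        else:
            hi = mid
        best = mid
    return best
```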
  • the process computes (at 132) a particular quantization parameter MQP_p(k) for each frame k and particular quantization parameters MQP_MB(p)(k, m) for individual macroblocks m within the frame k.
  • the process encodes (at 135) the frame sequence using the quantization parameters computed at 130. After 135, the process determines (at 140) whether it should terminate the second search stage. Different embodiments use different criteria for terminating the second search stage at the end of a pass p. Examples of such criteria are:
  • Some embodiments might use all of these exit conditions, while other embodiments might only use some of them. Yet other embodiments might use other exit conditions for terminating the second search stage.
  • the process 100 determines (at 140) that it should not terminate the second search stage, it returns to 130 to recompute the reference masking strength for the next pass of encoding. From 130, the process transitions to 132 to compute quantization parameters and then to 135 to encode the video sequence by using the newly computed quantization parameters.
  • When the process decides (at 140) to terminate the second search stage, it transitions to 145.
  • the process 100 saves the bitstream from the last pass p as the final result, and then terminates.
  • Some embodiments of the invention provide a multi-pass encoding process that examines various encodings of a video sequence for a target bit rate, in order to identify an optimal encoding solution with respect to the usage of an input buffer used by the decoder.
  • this multi-pass process follows the multi-pass encoding process 100 of Figure 1.
  • The decoder input buffer ("decoder buffer") usage will fluctuate to some degree during the decoding of an encoded sequence of images (e.g., frames) because of a variety of factors, such as fluctuation in the size of encoded images, the speed with which the decoder receives encoded data, the size of the decoder buffer, the speed of the decoding process, etc.
  • a decoder buffer underflow signifies the situation where the decoder is ready to decode the next image before that image has completely arrived at the decoder side.
  • the multi-pass encoder of some embodiments simulates the decoder buffer and re-encodes selected segments in the sequence to prevent decoder buffer underflow.
  • FIG. 2 conceptually illustrates a codec system 200 of some embodiments of the invention.
  • This system includes a decoder 205 and an encoder 210.
  • the encoder 210 has several components that enable it to simulate the operations of similar components of the decoder 205.
  • the decoder 205 has an input buffer 215, a decoding process 220, and an output buffer 225.
  • the encoder 210 simulates these modules by maintaining a simulated decoder input buffer 230, a simulated decoding process 235, and a simulated decoder output buffer 240.
  • Figure 2 is simplified to show the decoding process 220 and encoding process 245 as single blocks.
  • the simulated decoding process 235 and simulated decoder output buffer 240 are not utilized for buffer underflow management, and are therefore shown in this figure for illustration only.
  • the decoder maintains the input buffer 215 to smooth out variations in the rate and arrival time of incoming encoded images. If the decoder runs out of data (underflow) or fills up the input buffer (overflow), there will be visible decoding discontinuities as the picture decoding halts or incoming data is discarded. Both of these cases are undesirable.
  • the encoder 210 in some embodiments first encodes a sequence of images and stores them in a storage 255. For instance, the encoder 210 uses the multi-pass encoding process 100 to obtain a first encoding of the sequence of images. It then simulates the decoder input buffer 215 and re-encodes the images that would cause buffer underflow. After all buffer underflow conditions are removed, the re-encoded images are supplied to the decoder 205 through a connection 255, which may be a network connection (Internet, cable, PSTN lines, etc.), a non-network direct connection, a media (DVD, etc.), etc.
  • Figure 3 illustrates an encoding process 300, of the encoder of some embodiments. This process tries to find an optimal encoding solution that does not cause the decoder buffer to underflow. As shown in Figure 3, the process 300 identifies (at 302) a first encoding of the sequence of images that meets a desired target bit rate (e.g., the average bit rate for each image in the sequence meets a desired average target bit rate). For instance, the process 300 may use (at 302) the multi-pass encoding process 100 to obtain the first encoding of the sequence of images.
  • the encoding process 300 simulates (at 305) the decoder input buffer 215 by considering a variety of factors, such as the connection speed (i.e., the speed with which the decoder receives encoded data), the size of the decoder input buffer, the size of encoded images, the decoding process speed, etc.
  • the process 300 determines if any segment of the encoded images will cause a decoder input buffer to underflow. The techniques that the encoder uses to determine (and subsequently eliminate) the underflow condition are described further below. If the process 300 determines (at 310) that the encoded images do not create an underflow condition, the process ends.
  • When the process 300 determines (at 310) that a buffer underflow condition exists in any segment of the encoded images, it refines (at 315) the encoding parameters based on the value of these parameters from previous encoding passes. The process then re-encodes (at 320) the segment with underflow to reduce the segment's bit size. After re-encoding the segment, the process 300 examines (at 325) the segment to determine if the underflow condition is eliminated.
  • the process 300 transitions to 315 to further refine the encoding parameters to eliminate underflow.
  • the process specifies (at 330) the starting point for re-examining and re-encoding the video sequence as the frame after the end of the segment re-encoded in the last iteration at 320.
  • the process re-encodes the portion of the video sequence specified at 330, up to (and excluding) the first IDR frame following the underflow segment specified at 315 and 320.
  • the process transitions back to 305 to simulate the decoder buffer to determine whether the rest of the video sequence still causes buffer underflow after re-encoding.
  • the flow of the process 300 from 305 was described above.
  • the encoder simulates the decoder buffer conditions to determine whether any segment in the sequence of the encoded or re-encoded images causes underflow in the decoder buffer.
  • the encoder uses a simulation model that considers the size of encoded images, network conditions such as bandwidth, and decoder factors (e.g., input buffer size, initial and nominal time to remove images, decoding process time, display time of each image, etc.).
  • the MPEG-4 AVC Coded Picture Buffer (CPB) model is used to simulate the decoder input buffer conditions.
  • the CPB is the term used in MPEG-4 H.264 standard to refer to the simulated input buffer of the Hypothetical Reference Decoder (HRD).
  • the HRD is a hypothetical decoder model that specifies constraints on the variability of conforming streams that an encoding process may produce.
  • the CPB model is well known and is described in Section 1 below for convenience. A more detailed description of the CPB and HRD can be found in the Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC).
  • the following paragraphs describe how the decoder input buffer is simulated in some embodiments using the CPB model.
  • the time at which the first bit of image n begins to enter the CPB is referred to as the initial arrival time t_ai(n), which in this simulation equals the final arrival time t_af(n−1) of the previous image (with t_ai(0) = 0)
  • initial_cpb_removal_delay is the initial buffering period.
  • the encoder makes its own calculations of the nominal removal time as described below instead of reading them from an optional part of the bit stream as in the H.264 specification.
  • the nominal removal time of the image from the CPB is specified by t_r,n(0) = initial_cpb_removal_delay for the first image and t_r,n(n) = t_r,n(n−1) + t_{n−1} for subsequent images, where
  • t_r,n(n) is the nominal removal time of image n
  • t_i is the display duration of image i
  • the removal time of image n is specified as follows.
  • the encoder can simulate the decoder input buffer state and obtain the number of bits in the buffer at a given time instant.
  • the encoder can track how each individual image changes the decoder input buffer state via the difference between its nominal removal time and final arrival time (i.e., t_b(n) = t_r,n(n) − t_af(n)).
  • When t_b(n) is less than 0, the buffer is suffering from underflow; the encoder identifies an underflow segment as a contiguous run of images over which this deficit persists.
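Under simplifying assumptions (constant input bit rate, constant display duration, back-to-back image arrival), the buffer bookkeeping can be sketched as follows; the function name and these simplifications are illustrative, not the full H.264 CPB model:

```python
def simulate_cpb(image_sizes_bits, bit_rate, initial_delay, frame_duration):
    """Simplified CPB simulation: returns t_b(n) = t_r,n(n) - t_af(n),
    the slack between each image's nominal removal time and the time
    its last bit arrives. Assumes a constant input bit rate, constant
    display duration, and back-to-back arrival (t_ai(n) = t_af(n-1))."""
    slack = []
    t_af_prev = 0.0
    for n, bits in enumerate(image_sizes_bits):
        t_ai = t_af_prev                           # first bit enters the CPB
        t_af = t_ai + bits / bit_rate              # last bit arrives
        t_rn = initial_delay + n * frame_duration  # nominal removal time
        slack.append(t_rn - t_af)                  # t_b(n) < 0 => underflow
        t_af_prev = t_af
    return slack
```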
  • Figure 4 is a plot of the difference t_b(n) between the nominal removal time and the final arrival time of images, versus the image number, in some embodiments. The plot is drawn for an encoded sequence that contains underflow segments.
  • Figure 4a shows an underflow segment with arrows marking its beginning and end. Note that there is another underflow segment in Figure 4a that occurs after the first underflow segment, which is not explicitly marked by arrows for simplicity.
  • Figure 5 illustrates a process 500 that the encoder uses to perform the underflow detection operation at 305.
  • the process 500 first determines (at 505) the final arrival time, t_af, and the nominal removal time, t_r,n, of each image by simulating the decoder input buffer conditions as explained above. Note that since this process may be called several times during the iterative buffer underflow management, it receives an image number as the starting point and examines the sequence of images from this given starting image. For the first iteration, the starting point is the first image in the sequence.
  • the process 500 compares the final arrival time of each image at the decoder input buffer with the nominal removal time of that image by the decoder. If the process determines that there are no images with final arrival time after the nominal removal time (i.e., no underflow condition exits), the process exits. On the other hand, when an image is found for which the final arrival time is after the nominal removal time, the process determines that there is an underflow and transitions to 515 to identify the underflow segment.
  • the process 500 identifies the underflow segment as the segment of images over which the decoder buffer is continuously depleted, up to the next global minimum where the underflow condition starts to improve (i.e., t_b(n) does not decrease any further)
  • the beginning of the underflow segment is further adjusted to start with an I-frame, which is an intra-encoded image that marks the starting of a set of related inter-encoded images.
  • the encoder proceeds to eliminate the underflow. Section B below describes elimination of underflow in a single-segment case (i.e., when the entire sequence of encoded images only contains a single underflow segment). Section C then describes elimination of underflow for the multi-segment underflow cases.
  • underflow segment begins at the nearest local maximum preceding the zero-crossing point, and ends at the next global minimum between the zero-crossing point and the end of the sequence.
  • the end point of the segment could be followed by another zero-crossing point with the curve taking an ascending slope if the buffer recovers from the underflow.
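The segment-boundary rules just described (first descending zero-crossing, nearest preceding local maximum, following global minimum before recovery) can be sketched as follows; the tie-breaking details are assumptions:

```python
def find_underflow_segment(t_b):
    """Locate the first underflow segment in the t_b(n) series:
    find the first descending zero-crossing, back up to the nearest
    preceding local maximum, and end at the global minimum before the
    curve recovers (or the series ends). Returns (start, end) or None."""
    # First index where the curve crosses below zero.
    cross = next((n for n in range(1, len(t_b))
                  if t_b[n] < 0 <= t_b[n - 1]), None)
    if cross is None:
        return None
    # Nearest local maximum preceding the zero-crossing.
    start = cross - 1
    while start > 0 and t_b[start - 1] >= t_b[start]:
        start -= 1
    # Global minimum between the crossing and the recovery (or the end).
    recover = next((n for n in range(cross, len(t_b)) if t_b[n] >= 0),
                   len(t_b))
    end = min(range(cross, recover), key=lambda n: t_b[n])
    return start, end
```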
  • Figure 6 illustrates a process 600 that the encoder utilizes (at 315, 320, and 325) to eliminate the underflow condition in a single segment of images in some embodiments.
  • the process 600 estimates the total number of bits to reduce (ΔB) in the underflow segment by computing the product of the input bit rate into the buffer and the longest delay (e.g., the minimum t_b(n)) found at the end of the segment.
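That estimate translates directly into code; the function below is a literal sketch of the description, with ΔB computed as the product of the input bit rate and the worst (most negative) t_b(n) in the segment:

```python
def bits_to_reduce(t_b_segment, input_bit_rate):
    """Estimate the number of bits (delta_B) to remove from an
    underflow segment: the product of the buffer's input bit rate and
    the longest delay, i.e. the most negative t_b(n) in the segment."""
    longest_delay = -min(t_b_segment)   # min t_b(n) is the worst underflow
    return input_bit_rate * longest_delay
```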
  • AMQP is the average masked frame QP
  • B_T = B − ΔB, i.e., the target bit size B_T for the re-encoded segment is its current bit size B minus the estimated reduction ΔB
  • the process 600 uses the desired AMQP to modify the average masked frame QP, MQP(n), based on the masking strength φ_F(n), such that images with stronger masking absorb a larger share of the bit reduction
  • the process then re-encodes (at 620) the video segment based on the parameters defined at 315.
  • the process then examines (at 625) the segment to determine whether the underflow condition is eliminated.
  • Figure 4(b) illustrates the elimination of the underflow condition of Figure 4(a) after process 600 is applied to the underflow segment to re-encode it.
  • the process exits. Otherwise, it will transition back to 605 to further adjust encoding parameters to reduce total bit size.
  • the encoder searches for one underflow segment at a time, starting from the first zero-crossing point (i.e., at the lowest n) with a descending slope.
  • the underflow segment begins at the nearest local maximum preceding this zero-crossing point, and ends at the next global minimum between the zero-crossing point and the next zero-crossing point (or the end of the sequence if there is no more zero crossing).
  • the encoder hypothetically removes the underflow in this segment and estimates the updated buffer fullness by setting t_b(n) to zero at the end of the segment
  • the encoder then continues searching for the next segment using the modified buffer fullness. Once all underflow segments are identified as described above, the encoder derives the AMQPs and modifies the Masked frame QPs for each segment independently of the others just as in the single-segment case.
  • Some embodiments would not identify all the segments that cause underflow of the decoder's input buffer at once. Instead, these embodiments would perform the buffer simulation described above to identify a first segment that causes underflow. After identifying such a segment, these embodiments correct the segment to rectify the underflow condition and then resume encoding following the corrected portion. After encoding the remainder of the sequence, these embodiments repeat this process for the next underflow segment.
  • The decoder buffer underflow techniques described above apply to numerous encoding and decoding systems. Several examples of such systems are described below.
  • Figure 7 illustrates a network 705 connecting a video streaming server 710 and several client decoders 715-725. Clients are connected to the network 705 via links with different bandwidths such as 300 Kb/sec and 3 Mb/sec.
  • the video streaming server 710 is controlling streaming of encoded video images from an encoder 730 to the client decoders 715-725.
  • the streaming video server may decide to stream the encoded video images using the slowest bandwidth in the network (i.e., 300 Kb/sec) and the smallest client buffer size.
  • the streaming server 710 needs only one set of encoded images that are optimized for a target bit rate of 300 Kb/sec.
  • the server may generate and store different encodings that are optimized for different bandwidths and different client buffer conditions.
  • FIG. 8 illustrates another example of an application for decoder underflow management.
  • an HD-DVD player 805 is receiving encoded video images from an HD-DVD 840 that has stored encoded video data from a video encoder 810.
  • the HD-DVD player 805 has an input buffer 815, a set of decoding modules shown as one block 820 for simplicity, and an output buffer 825.
  • the output of the player 805 is sent to display devices such as TV 830 or computer display terminal 835.
  • the HD-DVD player may have a very high bandwidth, e.g. 29.4 Mb/sec.
  • the encoder ensures that the video images are encoded in a way that no segment in the sequence of images would be so large that it cannot be delivered to the decoder input buffer on time.
  • FIG. 9 presents a computer system with which one embodiment of the invention is implemented.
  • Computer system 900 includes a bus 905, a processor 910, a system memory 915, a read-only memory 920, a permanent storage device 925, input devices 930, and output devices 935.
  • the bus 905 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the computer system 900. For instance, the bus 905 communicatively connects the processor 910 with the read-only memory 920, the system memory 915, and the permanent storage device 925.
  • the processor 910 retrieves instructions to execute and data to process in order to execute the processes of the invention.
  • the read-only-memory (ROM) 920 stores static data and instructions that are needed by the processor 910 and other modules of the computer system.
  • the permanent storage device 925 is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the computer system 900 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 925.
  • the system memory 915 is a read-and-write memory device. However, unlike storage device 925, the system memory is a volatile read-and-write memory, such as a random access memory.
  • the system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 915, the permanent storage device 925, and/or the read-only memory 920.
  • the bus 905 also connects to the input and output devices 930 and 935.
  • the input devices enable the user to communicate information and select commands to the computer system.
  • the input devices 930 include alphanumeric keyboards and cursor- controllers.
  • the output devices 935 display images generated by the computer system.
  • the output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD).
  • bus 905 also couples computer 900 to a network 965 through a network adapter (not shown).
  • the computer can be a part of a network of computers (such as a local area network ("LAN”), a wide area network (“WAN”), or an Intranet) or a network of networks (such as the Internet).
  • Several embodiments described above compute the mean-removed SAD to obtain an indication of the image variance in a macroblock. Other embodiments, however, might identify the image variance differently. For example, some embodiments might predict an expected image value for the pixels of a macroblock. These embodiments then generate a macroblock SAD by subtracting this predicted value from the luminance value of the pixels of the macroblock, and summing the absolute values of the subtractions. In some embodiments, the predicted value is based not only on the values of the pixels in the macroblock but also on the values of the pixels in one or more of the neighboring macroblocks.
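A minimal sketch of the two SAD variants described above, in plain Python, with a flat pixel list standing in for a macroblock's luminance samples:

```python
def mean_removed_sad(pixels):
    """Mean-removed SAD for a macroblock: subtract the mean luminance
    from each pixel and sum the absolute differences, giving an
    indication of the image variance within the block."""
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)

def predicted_value_sad(pixels, predicted):
    """Variant described above: subtract a predicted expected value
    (possibly derived from neighboring macroblocks as well) instead
    of the block mean."""
    return sum(abs(p - predicted) for p in pixels)
```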
  • the embodiments described above use the derived spatial and temporal masking values directly.
  • Other embodiments apply a smoothing filter to successive spatial masking values and/or to successive temporal masking values before using them, in order to pick out the general trend of those values through the video images.
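The smoothing step might look like the following moving-average sketch; the window size and the filter shape are illustrative assumptions (the embodiments only require some smoothing filter):

```python
def smooth_masking_values(values, window=3):
    """Simple moving-average smoothing of successive masking values
    (spatial or temporal), to pick out their general trend through
    the video images. The window size is an illustrative choice."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```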
  • the invention is not to be limited by the foregoing illustrative details.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
PCT/US2005/022616 2004-06-27 2005-06-24 Multi-pass video encoding WO2006004605A2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN2005800063635A CN1926863B (zh) 2004-06-27 2005-06-24 多通路视频编码的方法
EP05773224A EP1762093A4 (en) 2004-06-27 2005-06-24 MULTIPASS VIDEO CODING
KR1020067017074A KR100909541B1 (ko) 2004-06-27 2005-06-24 멀티-패스 비디오 인코딩 방법
JP2007518338A JP4988567B2 (ja) 2004-06-27 2005-06-24 マルチパスのビデオ符号化
HK07106057.0A HK1101052A1 (en) 2004-06-27 2007-06-07 Method of multi-pass video encoding

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US58341804P 2004-06-27 2004-06-27
US60/583,418 2004-06-27
US64391805P 2005-01-09 2005-01-09
US60/643,918 2005-01-09
US11/118,604 2005-04-28
US11/118,616 US8406293B2 (en) 2004-06-27 2005-04-28 Multi-pass video encoding based on different quantization parameters
US11/118,616 2005-04-28
US11/118,604 US8005139B2 (en) 2004-06-27 2005-04-28 Encoding with visual masking

Publications (3)

Publication Number Publication Date
WO2006004605A2 true WO2006004605A2 (en) 2006-01-12
WO2006004605A3 WO2006004605A3 (en) 2006-05-04
WO2006004605B1 WO2006004605B1 (en) 2006-07-13

Family

ID=35783274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/022616 WO2006004605A2 (en) 2004-06-27 2005-06-24 Multi-pass video encoding

Country Status (6)

Country Link
EP (1) EP1762093A4 (ja)
JP (2) JP4988567B2 (ja)
KR (3) KR100909541B1 (ja)
CN (3) CN1926863B (ja)
HK (1) HK1101052A1 (ja)
WO (1) WO2006004605A2 (ja)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009045683A1 * 2007-09-28 2009-04-09 Athanasios Leontaris Video compression and tranmission techniques
US7822118B2 (en) 2002-11-08 2010-10-26 Apple Inc. Method and apparatus for control of rate-distortion tradeoff by mode selection in video encoders
WO2011084918A1 (en) * 2010-01-06 2011-07-14 Dolby Laboratories Licensing Corporation High performance rate control for multi-layered video coding applications
US8005139B2 (en) 2004-06-27 2011-08-23 Apple Inc. Encoding with visual masking
US8208536B2 (en) 2005-04-28 2012-06-26 Apple Inc. Method and apparatus for encoding using single pass rate controller
US8406293B2 (en) 2004-06-27 2013-03-26 Apple Inc. Multi-pass video encoding based on different quantization parameters
WO2013095627A1 (en) * 2011-12-23 2013-06-27 Intel Corporation Content adaptive high precision macroblock rate control
EP2951994A4 (en) * 2013-01-30 2016-10-12 Intel Corp CONTENT-ADVANCED BITRATE AND QUALITY CONTROL USING FRAME-HIERARCHY-SENSITIVE QUANTIFICATION FOR HIGH-EFFICIENT VIDEO-CORDING OF THE NEXT GENERATION
EP3044960A4 (en) * 2013-09-12 2017-08-02 Magnum Semiconductor, Inc. Methods and apparatuses including an encoding system with temporally adaptive quantization
US10313675B1 (en) 2015-01-30 2019-06-04 Google Llc Adaptive multi-pass video encoder control

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100918499B1 (ko) * 2007-09-21 2009-09-24 주식회사 케이티 멀티 패스 인코딩 장치 및 그 방법
EP2101503A1 (en) * 2008-03-11 2009-09-16 British Telecommunications Public Limited Company Video coding
CN102860010A (zh) 2010-05-06 2013-01-02 日本电信电话株式会社 视频编码控制方法及装置
JP5295429B2 (ja) 2010-05-07 2013-09-18 日本電信電話株式会社 動画像符号化制御方法,動画像符号化装置および動画像符号化プログラム
JP5286581B2 (ja) 2010-05-12 2013-09-11 日本電信電話株式会社 動画像符号化制御方法,動画像符号化装置および動画像符号化プログラム
KR101702562B1 (ko) 2010-06-18 2017-02-03 삼성전자 주식회사 멀티미디어 스트림 파일의 저장 파일 포맷, 저장 방법 및 이를 이용한 클라이언트 장치
US9402082B2 (en) * 2012-04-13 2016-07-26 Sharp Kabushiki Kaisha Electronic devices for sending a message and buffering a bitstream
CN102946542B (zh) * 2012-12-07 2015-12-23 杭州士兰微电子股份有限公司 已编著镜像视频区间码流重新编码及无缝接入方法和系统
US11153585B2 (en) 2017-02-23 2021-10-19 Netflix, Inc. Optimizing encoding operations when generating encoded versions of a media title
US10897618B2 (en) 2017-02-23 2021-01-19 Netflix, Inc. Techniques for positioning key frames within encoded video sequences
US10742708B2 (en) 2017-02-23 2020-08-11 Netflix, Inc. Iterative techniques for generating multiple encoded versions of a media title
US11166034B2 (en) 2017-02-23 2021-11-02 Netflix, Inc. Comparing video encoders/decoders using shot-based encoding and a perceptual visual quality metric
US10666992B2 (en) 2017-07-18 2020-05-26 Netflix, Inc. Encoding techniques for optimizing distortion and bitrate
CN109756733B (zh) * 2017-11-06 2022-04-12 Huawei Technologies Co., Ltd. Video data decoding method and apparatus

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05167998A (ja) * 1991-12-16 1993-07-02 Nippon Telegr & Teleph Corp <Ntt> Image coding control processing method
JP3627279B2 (ja) * 1995-03-31 2005-03-09 Sony Corporation Quantization apparatus and quantization method
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
FR2753330B1 (fr) * 1996-09-06 1998-11-27 Thomson Multimedia Sa Quantization method for video coding
JPH10304311A (ja) * 1997-04-23 1998-11-13 Matsushita Electric Ind Co Ltd Video encoding device and video decoding device
EP0940042B1 (en) * 1997-07-29 2005-07-27 Koninklijke Philips Electronics N.V. Variable bitrate video coding method and corresponding video coder
US6192075B1 (en) * 1997-08-21 2001-02-20 Stream Machine Company Single-pass variable bit-rate control for digital video coding
WO1999043163A2 (en) * 1998-02-20 1999-08-26 Koninklijke Philips Electronics N.V. Method and device for coding a sequence of pictures
US6278735B1 (en) * 1998-03-19 2001-08-21 International Business Machines Corporation Real-time single pass variable bit rate control strategy and encoder
US6289129B1 (en) * 1998-06-19 2001-09-11 Motorola, Inc. Video rate buffer for use with push dataflow
ES2259827T3 (es) * 1998-10-13 2006-10-16 Matsushita Electric Industrial Co., Ltd. Regulation of the computation and memory requirements of a compressed bitstream in a video decoder
US20020057739A1 (en) * 2000-10-19 2002-05-16 Takumi Hasebe Method and apparatus for encoding video
US6594316B2 (en) * 2000-12-12 2003-07-15 Scientific-Atlanta, Inc. Method and apparatus for adaptive bit rate control in an asynchronized encoding system
US6831947B2 (en) * 2001-03-23 2004-12-14 Sharp Laboratories Of America, Inc. Adaptive quantization based on bit rate prediction and prediction error energy
US7062429B2 (en) * 2001-09-07 2006-06-13 Agere Systems Inc. Distortion-based method and apparatus for buffer control in a communication system
JP3753371B2 (ja) * 2001-11-13 2006-03-08 KDDI Corporation Moving picture compression coding rate control apparatus
US7027982B2 (en) * 2001-12-14 2006-04-11 Microsoft Corporation Quality and rate control strategy for digital audio
KR100468726B1 (ko) * 2002-04-18 2005-01-29 Samsung Electronics Co., Ltd. Encoding apparatus and method for performing real-time variable bit rate control
JP2004166128A (ja) * 2002-11-15 2004-06-10 Pioneer Electronic Corp Image information encoding method, encoding apparatus, and encoding program
WO2005011255A2 (en) * 2003-06-26 2005-02-03 Thomson Licensing S.A. Multipass video rate control to match sliding window channel constraints

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None
See also references of EP1762093A4

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7822118B2 (en) 2002-11-08 2010-10-26 Apple Inc. Method and apparatus for control of rate-distortion tradeoff by mode selection in video encoders
US8355436B2 (en) 2002-11-08 2013-01-15 Apple Inc. Method and apparatus for control of rate-distortion tradeoff by mode selection in video encoders
US8406293B2 (en) 2004-06-27 2013-03-26 Apple Inc. Multi-pass video encoding based on different quantization parameters
US8594190B2 (en) 2004-06-27 2013-11-26 Apple Inc. Encoding with visual masking
US8005139B2 (en) 2004-06-27 2011-08-23 Apple Inc. Encoding with visual masking
US8811475B2 (en) 2004-06-27 2014-08-19 Apple Inc. Multi-pass video encoding solution for buffer underflow
US8208536B2 (en) 2005-04-28 2012-06-26 Apple Inc. Method and apparatus for encoding using single pass rate controller
US9445110B2 (en) 2007-09-28 2016-09-13 Dolby Laboratories Licensing Corporation Video compression and transmission techniques
CN101855910A (zh) * 2007-09-28 2010-10-06 Dolby Laboratories Licensing Corporation Video compression and transmission techniques
WO2009045683A1 (en) * 2007-09-28 2009-04-09 Athanasios Leontaris Video compression and transmission techniques
US8908758B2 (en) 2010-01-06 2014-12-09 Dolby Laboratories Licensing Corporation High performance rate control for multi-layered video coding applications
WO2011084918A1 (en) * 2010-01-06 2011-07-14 Dolby Laboratories Licensing Corporation High performance rate control for multi-layered video coding applications
CN102714725A (zh) * 2010-01-06 2012-10-03 Dolby Laboratories Licensing Corporation High performance rate control for multi-layered video coding applications
WO2013095627A1 (en) * 2011-12-23 2013-06-27 Intel Corporation Content adaptive high precision macroblock rate control
US9497241B2 (en) 2011-12-23 2016-11-15 Intel Corporation Content adaptive high precision macroblock rate control
EP2951994A4 (en) * 2013-01-30 2016-10-12 Intel Corp Content adaptive bitrate and quality control by using frame hierarchy sensitive quantization for high efficiency next generation video coding
EP3044960A4 (en) * 2013-09-12 2017-08-02 Magnum Semiconductor, Inc. Methods and apparatuses including an encoding system with temporally adaptive quantization
US10313675B1 (en) 2015-01-30 2019-06-04 Google Llc Adaptive multi-pass video encoder control

Also Published As

Publication number Publication date
CN102833538A (zh) 2012-12-19
KR20070011294A (ko) 2007-01-24
WO2006004605A3 (en) 2006-05-04
KR20090037475A (ko) 2009-04-15
JP2011151838A (ja) 2011-08-04
CN1926863B (zh) 2012-09-19
JP4988567B2 (ja) 2012-08-01
EP1762093A4 (en) 2011-06-29
KR100909541B1 (ko) 2009-07-27
CN1926863A (zh) 2007-03-07
CN102833539B (zh) 2015-03-25
EP1762093A2 (en) 2007-03-14
HK1101052A1 (en) 2007-10-05
KR100988402B1 (ko) 2010-10-18
CN102833539A (zh) 2012-12-19
WO2006004605B1 (en) 2006-07-13
CN102833538B (zh) 2015-04-22
JP5318134B2 (ja) 2013-10-16
JP2008504750A (ja) 2008-02-14
KR20090034992A (ko) 2009-04-08
KR100997298B1 (ko) 2010-11-29

Similar Documents

Publication Publication Date Title
US8005139B2 (en) Encoding with visual masking
US8406293B2 (en) Multi-pass video encoding based on different quantization parameters
WO2006004605A2 (en) Multi-pass video encoding
Guo et al. Optimal bit allocation at frame level for rate control in HEVC
US6529631B1 (en) Apparatus and method for optimizing encoding and performing automated steerable image compression in an image coding system using a perceptual metric
CA2688249C (en) A buffer-based rate control exploiting frame complexity, buffer level and position of intra frames in video coding
US20060233237A1 (en) Single pass constrained constant bit-rate encoding
WO2007099039A1 (en) Method and apparatus for determining in picture signal encoding the bit allocation for groups of pixel blocks in a picture
EP2027727A1 (en) Method and apparatus for adaptively determining a bit budget for encoding video pictures
EP4333433A1 (en) Video coding method and apparatus, and electronic device
Wu et al. Adaptive initial quantization parameter determination for H.264/AVC video transcoding
Chi et al. Region-of-interest video coding based on rate and distortion variations for H.263+
Zhang et al. A two-pass rate control algorithm for H.264/AVC high definition video coding
Overmeire et al. Constant quality video coding using video content analysis
Hoang Real-time VBR rate control of MPEG video based upon lexicographic bit allocation
Guan et al. A Novel Video Compression Algorithm Based on Wireless Sensor Network.
Chang et al. A two-layer characteristic-based rate control framework for low delay video transmission
Tun et al. A novel rate control algorithm for the Dirac video codec based upon the quality factor optimization

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 2005773224

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2007518338

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 1020067017074

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 200580006363.5

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

WWP Wipo information: published in national office

Ref document number: 1020067017074

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2005773224

Country of ref document: EP