WO2008136828A1 - Video coding mode selection using estimated coding costs - Google Patents

Video coding mode selection using estimated coding costs

Publication number
WO2008136828A1
Authority
WO
WIPO (PCT)
Prior art keywords
coding
transform coefficients
block
residual data
matrix
Prior art date
Application number
PCT/US2007/068307
Other languages
French (fr)
Inventor
Sitaraman Ganapathy Subramania
Fang Shi
Peisong Chen
Seyfullah Halit Oguz
Scott T. Swazey
Vinod Kaushik
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Incorporated
Priority to JP2010507374A (patent JP2010526515A)
Priority to EP07761930A (patent EP2156672A1)
Priority to KR1020097025315A (patent KR101166732B1)
Priority to PCT/US2007/068307 (patent WO2008136828A1)
Priority to KR1020127007471A (patent KR20120031529A)
Priority to CN2007800528186A (patent CN101663895B)
Publication of WO2008136828A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/134 Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/169 Adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Adaptive coding characterised by the coding unit, the region being a block, e.g. a macroblock
    • H04N19/189 Adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/19 Adaptive coding using optimisation based on Lagrange multipliers
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • the disclosure relates to video coding and, more particularly, to estimating coding costs to code video sequences.
  • Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, personal digital assistants (PDAs), laptop computers, desktop computers, video game consoles, digital cameras, digital recording devices, cellular or satellite radio telephones, and the like. Digital video devices can provide significant improvements over conventional analog video systems in processing and transmitting video sequences.
  • frames of a video sequence are divided into discrete blocks of pixels, and the blocks of pixels are coded based on differences with other blocks, which may be located within the same frame or in a different frame.
  • Some blocks of pixels, often referred to as "macroblocks," comprise a grouping of sub-blocks of pixels.
  • a 16x16 macroblock may comprise four 8x8 sub-blocks.
  • the sub-blocks may be coded separately.
  • the H.264 standard permits coding of blocks with a variety of different sizes, e.g., 16x16, 16x8, 8x16, 8x8, 4x4, 8x4, and 4x8.
  • sub-blocks of any size may be included within a macroblock, e.g., 2x16, 16x2, 2x2, 4x16, and 8x2.
  • a method for processing digital video data comprises identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized, estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
  • an apparatus for processing digital video data comprises a transform module that generates transform coefficients for residual data of a block of pixels, a bit estimate module that identifies one or more of the transform coefficients that will remain non-zero when quantized and estimates a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and a control module that estimates a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
  • an apparatus for processing digital video data comprises means for identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized, means for estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and means for estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
  • a computer-program product for processing digital video data comprises a computer readable medium having instructions thereon.
  • the instructions include code for identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized, code for estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and code for estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
  • FIG. 1 is a block diagram illustrating a video coding system that employs the coding cost estimate techniques described herein.
  • FIG. 2 is a block diagram illustrating an exemplary encoding module in further detail.
  • FIG. 3 is a block diagram illustrating another exemplary encoding module in further detail.
  • FIG. 4 is a flow diagram illustrating exemplary operation of an encoding module selecting an encoding mode based on estimated coding costs.
  • FIG. 5 is a flow diagram illustrating exemplary operation of an encoding module estimating the number of bits associated with coding the residual data of a block without quantizing or encoding of the residual data.
  • FIG. 6 is a flow diagram illustrating exemplary operation of an encoding module estimating the number of bits associated with coding the residual data of a block without encoding the residual data.
  • This disclosure describes techniques for video coding mode selection using estimated coding costs.
  • an encoding device may attempt to select a coding mode for coding blocks of pixels that codes the data of the blocks with high efficiency.
  • the encoding device may perform coding mode selection based on at least estimates of coding cost for at least a portion of the possible modes.
  • the encoding device estimates the coding cost for the different modes without actually coding the blocks.
  • the encoding device may estimate the coding cost for the modes without quantizing the data of the block for each mode. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to perform effective mode selection.
  • FIG. 1 is a block diagram illustrating a multimedia coding system 10 that employs coding cost estimate techniques as described herein.
  • Coding system 10 includes an encoding device 12 and a decoding device 14 connected by a transmission channel 16.
  • Encoding device 12 encodes one or more sequences of digital multimedia data and transmits the encoded sequences over transmission channel 16 to decoding device 14 for decoding and, possibly, presentation to a user of decoding device 14.
  • Transmission channel 16 may comprise any wired or wireless medium, or a combination thereof.
  • Encoding device 12 may form part of a broadcast network component used to broadcast one or more channels of multimedia data.
  • encoding device 12 may form part of a wireless base station, server, or any infrastructure node that is used to broadcast one or more channels of encoded multimedia data to wireless devices.
  • encoding device 12 may transmit the encoded data to a plurality of wireless devices, such as decoding device 14.
  • a single decoding device 14, however, is illustrated in FIG. 1 for simplicity.
  • encoding device 12 may comprise a handset that transmits locally captured video for video telephony or other similar applications.
  • Decoding device 14 may comprise a user device that receives the encoded multimedia data transmitted by encoding device 12 and decodes the multimedia data for presentation to a user.
  • decoding device 14 may be implemented as part of a digital television, a wireless communication device, a gaming device, a personal digital assistant (PDA), a laptop computer or desktop computer, a digital music and video device, such as those sold under the trademark "iPod," or a radiotelephone such as a cellular, satellite, or terrestrial-based radiotelephone, or other wireless mobile terminal equipped for video and/or audio streaming, video telephony, or both.
  • Decoding device 14 may be associated with a mobile or stationary device. In a broadcast application, encoding device 12 may transmit encoded video and/or audio to multiple decoding devices 14 associated with multiple users.
  • multimedia coding system 10 may support video telephony or video streaming according to the Session Initiation Protocol (SIP), the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.323 standard, the ITU-T H.324 standard, or other standards.
  • encoding device 12 may generate encoded multimedia data according to a video compression standard, such as Moving Picture Experts Group (MPEG)-2, MPEG-4, ITU-T H.263, or ITU-T H.264, which corresponds to MPEG-4, Part 10, Advanced Video Coding (AVC).
  • encoding device 12 and decoding device 14 may be integrated with an audio encoder and decoder, respectively, and include appropriate multiplexer-demultiplexer (MUX-DEMUX) modules, or other hardware, firmware, or software, to handle encoding of both audio and video in a common data sequence or separate data sequences.
  • MUX-DEMUX modules may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
  • this disclosure contemplates application to Enhanced H.264 video coding for delivering real-time multimedia services in terrestrial mobile multimedia multicast (TM3) systems using the Forward Link Only (FLO) Air Interface Specification, "Forward Link Only Air Interface Specification for Terrestrial Mobile Multimedia Multicast," published as Technical Standard TIA-1099, Aug. 2006 (the "FLO Specification").
  • the coding cost estimation techniques described in this disclosure are not limited to any particular type of broadcast, multicast, unicast, or point-to-point system.
  • encoding device 12 includes an encoding module 18 and a transmitter 20.
  • Encoding module 18 receives one or more input multimedia sequences that can include, in the case of video encoding, one or more frames of data and selectively encodes the frames of the received multimedia sequences.
  • Encoding module 18 receives the input multimedia sequences from one or more sources (not shown in FIG. 1).
  • encoding module 18 may receive the input multimedia sequences from one or more video content providers, e.g., via satellite.
  • encoding module 18 may receive the multimedia sequences from an image capture device (not shown in FIG. 1) integrated within encoding device 12 or coupled to encoding device 12.
  • encoding module 18 may receive the multimedia sequences from a memory or archive (not shown in FIG. 1) within encoding device 12 or coupled to encoding device 12.
  • the multimedia sequences may comprise live real-time or near real-time video, audio, or video and audio sequences to be coded and transmitted as a broadcast or on-demand, or may comprise pre-recorded and stored video, audio, or video and audio sequences to be coded and transmitted as a broadcast or on-demand.
  • at least a portion of the multimedia sequences may be computer-generated, such as in the case of gaming.
  • encoding module 18 encodes and transmits a plurality of coded frames to decoding device 14 via transmitter 20.
  • Encoding module 18 may encode the frames of the input multimedia sequences as intra-coded frames, inter-coded frames or a combination thereof.
  • Frames encoded using intra-coding techniques are coded without reference to other frames, and are often referred to as intra ("I") frames.
  • Frames encoded using inter-coding techniques are coded with reference to one or more other frames.
  • the inter-coded frames may include one or more predictive ("P") frames, bi-directional ("B") frames, or a combination thereof.
  • P frames are encoded with reference to at least one temporally prior frame while B frames are encoded with reference to at least one temporally future frame.
  • B frames may be encoded with reference to at least one temporally future frame and at least one temporally prior frame.
  • Encoding module 18 may be further configured to partition a frame into a plurality of blocks and encode each of the blocks separately.
  • encoding module 18 may partition the frame into a plurality of 16x16 blocks.
  • Some blocks, often referred to as “macroblocks,” comprise a grouping of sub-partition blocks (referred to herein as "sub-blocks").
  • a 16x16 macroblock may comprise four 8x8 sub-blocks, or other sub-partition blocks.
  • the H.264 standard permits encoding of blocks with a variety of different sizes, e.g., 16x16, 16x8, 8x16, 8x8, 4x4, 8x4, and 4x8.
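  • The partitioning described above can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation: the function name `partition`, the list-of-rows frame layout, and the assumption that the frame dimensions are multiples of the block size are all hypothetical.

```python
def partition(frame, size):
    """Split a 2-D list of pixel rows into size x size blocks in row-major order.

    Assumes (for illustration only) that both frame dimensions are
    multiples of `size`.
    """
    height, width = len(frame), len(frame[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in frame[top:top + size]])
    return blocks


# A 32x32 frame of dummy pixel values.
frame = [[(x + y) % 256 for x in range(32)] for y in range(32)]

macroblocks = partition(frame, 16)         # four 16x16 macroblocks
sub_blocks = partition(macroblocks[0], 8)  # four 8x8 sub-blocks of the first macroblock
```

  • The same helper is applied twice, once to the frame and once to a macroblock, which mirrors the way a 16x16 macroblock may itself comprise four 8x8 sub-blocks.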
  • encoding module 18 may be configured to divide the frame into several blocks and encode each of the blocks of pixels as intra-coded blocks or inter-coded blocks, each of which may be referred to generally as a block.
  • Encoding module 18 may support a plurality of coding modes. Each of the modes may correspond to a different combination of block sizes and coding techniques. In the case of the H.264 standard, for example, there are seven inter modes and thirteen intra modes.
  • the seven variable block-size inter modes are the 16x16 mode, 16x8 mode, 8x16 mode, 8x8 mode, 8x4 mode, 4x8 mode, and 4x4 mode; a SKIP mode is also available for inter-coded blocks.
  • the thirteen intra modes include an INTRA 4x4 mode for which there are nine possible interpolation directions and an INTRA 16x16 mode for which there are four possible interpolation directions.
  • encoding module 18 attempts to select the mode that codes the data of the blocks with high efficiency. To this end, encoding module 18 estimates, for each of the blocks, a coding cost for at least a portion of the modes. Encoding module 18 estimates the coding cost as a function of rate and distortion. In accordance with the techniques described herein, encoding module 18 estimates the coding cost for the modes without actually coding the blocks to determine the rate and distortion metrics. In this manner, encoding module 18 may select one of the modes based on at least the coding cost without performing the computationally complex coding of the data of the block for each mode.
  • encoding module 18 may estimate the coding cost for the modes without quantizing the data of the block for each mode. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to perform effective mode selection.
  • Encoding device 12 applies the selected mode to code the blocks of the frames and transmits the coded frames of data via transmitter 20.
  • Transmitter 20 may include appropriate modem and driver circuitry, software, and/or firmware to transmit encoded multimedia over transmission channel 16.
  • transmitter 20 includes RF circuitry to transmit wireless data carrying the encoded multimedia data.
  • Decoding device 14 includes a receiver 22 and a decoding module 24.
  • Decoding device 14 receives the encoded data from encoding device 12 via receiver 22.
  • receiver 22 may include appropriate modem and driver circuitry, software, and/or firmware to receive encoded multimedia over transmission channel 16, and may include RF circuitry to receive wireless data carrying the encoded multimedia data in wireless applications.
  • Decoding module 24 decodes the coded frames of data received via receiver 22.
  • Decoding device 14 may further present the decoded frame of data to a user via a display (not shown) that may be either integrated within decoding device 14 or provided as a discrete device coupled to decoding device 14 via a wired or wireless connection.
  • encoding device 12 and decoding device 14 each may include reciprocal transmit and receive circuitry so that each may serve as both a transmit device and a receive device for encoded multimedia and other information transmitted over transmission channel 16.
  • both encoding device 12 and decoding device 14 may transmit and receive multimedia sequences and thus participate in two-way communications.
  • the illustrated components of coding system 10 may be integrated as part of an encoder/decoder (CODEC).
  • encoding device 12 and decoding device 14 are exemplary of those applicable to implement the techniques described herein.
  • Encoding device 12 and decoding device 14 may include many other components, if desired.
  • encoding device 12 may include a plurality of encoding modules that each receive one or more sequences of multimedia data and encode the respective sequences of multimedia data in accordance with the techniques described herein.
  • encoding device 12 may further include at least one multiplexer to combine the segments of data for transmission.
  • encoding device 12 and decoding device 14 may include appropriate modulation, demodulation, frequency conversion, filtering, and amplifier components for transmission and reception of encoded video, including radio frequency (RF) wireless components and antennas, as applicable.
  • FIG. 2 is a block diagram illustrating an exemplary encoding module 30 in further detail.
  • Encoding module 30 may, for example, represent encoding module 18 of encoding device 12 of FIG. 1.
  • encoding module 30 includes a control module 32 that receives input frames of multimedia data of one or more multimedia sequences from one or more sources, and processes the frames of the received multimedia sequences.
  • control module 32 analyzes the incoming frames of the multimedia sequences and determines whether to encode or skip the incoming frames based on analysis of the frames.
  • encoding device 12 may encode the information contained in the multimedia sequences at a reduced frame rate using frame skipping to conserve bandwidth across transmission channel 16.
  • control module 32 may also be configured to determine whether to encode the frames as I frames, P frames, or B frames. Control module 32 may determine to encode an incoming frame as an I frame at the start of a multimedia sequence, at a scene change within the sequence, for use as a channel switch frame, or for use as an intra refresh frame. Otherwise, control module 32 encodes the frame as an inter-coded frame (i.e., a P frame or B frame) to reduce the amount of bandwidth associated with coding the frame.
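  • The frame-type decision above can be written as a small rule. This is only a sketch: the function name and all parameter names are hypothetical, and the `use_b_frames` flag stands in for whatever policy the encoder uses to choose between P and B frames, which the text does not specify.

```python
def choose_frame_type(index, scene_change=False, channel_switch=False,
                      intra_refresh=False, use_b_frames=False):
    """Return "I", "P", or "B" for the frame at position `index` in a sequence.

    An I frame is chosen at the start of the sequence, at a scene change,
    for a channel switch frame, or for an intra refresh frame. Otherwise
    the frame is inter-coded.
    """
    if index == 0 or scene_change or channel_switch or intra_refresh:
        return "I"
    # Hypothetical policy switch: the P-versus-B choice is an assumption here.
    return "B" if use_b_frames else "P"
```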
  • Control module 32 may be further configured to partition the frames into a plurality of blocks and select a coding mode, such as one of the H.264 coding modes described above, for each of the blocks.
  • encoding module 30 may estimate the coding cost for at least a portion of the modes to assist in selecting a most efficient one of the coding modes.
  • After selecting the coding mode for use in coding one of the blocks, encoding module 30 generates residual data for the block.
  • For a block selected to be intra-coded, spatial prediction module 34 generates the residual data for the block. Spatial prediction module 34 may, for example, generate a predicted version of the block via interpolation using one or more adjacent blocks and the interpolation directionality corresponding to the selected intra-coding mode. Spatial prediction module 34 may then compute a difference between the block of the input frame and the predicted block. This difference is referred to as residual data or residual coefficients.
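  • A minimal sketch of intra prediction and residual computation for a 4x4 block follows. It covers only three of the nine H.264 4x4 directions (vertical, horizontal, and DC), uses already-reconstructed neighbor pixels passed in as plain lists, and the function names and sample values are illustrative assumptions.

```python
def predict_4x4(above, left, mode):
    """Build a 4x4 prediction from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":      # copy the row above downward
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightward
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def residual(block, predicted):
    """Residual data: element-wise difference between the block and its prediction."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, predicted)]


above = [100, 102, 104, 106]
left = [100, 100, 100, 100]
block = [[101, 102, 105, 106],
         [100, 103, 104, 107],
         [100, 102, 104, 106],
         [99, 102, 103, 106]]

res_vertical = residual(block, predict_4x4(above, left, "vertical"))
```

  • For this sample block the vertical prediction leaves residual values of magnitude at most 1, which is why a direction matching the block's structure tends to be cheap to code.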
  • For a block selected to be inter-coded, motion estimation module 36 and motion compensation module 38 generate the residual data for the block.
  • motion estimation module 36 identifies at least one reference frame and searches for a block in the reference frame that is a best match to the block in the input frame.
  • Motion estimation module 36 computes a motion vector to represent an offset between the location of the block in the input frame and the location of the identified block in the reference frame.
  • Motion compensation module 38 computes a difference between the block of the input frame and the identified block in the reference frame to which the motion vector points. This difference is the residual data for the block.
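  • The motion estimation and compensation steps can be sketched as an exhaustive block-matching search. The choice of full search, of SAD as the matching criterion, and of integer-pixel motion vectors are assumptions made for illustration; the text does not say how the best match is found. All function names and the synthetic frames are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur_frame, ref_frame, top, left, size, search_range):
    """Return ((dx, dy), residual) for the best SAD match within +/- search_range."""
    cur = get_block(cur_frame, top, left, size)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, get_block(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    _, (dx, dy) = best
    ref = get_block(ref_frame, top + dy, left + dx, size)
    res = [[c - r for c, r in zip(crow, rrow)] for crow, rrow in zip(cur, ref)]
    return (dx, dy), res


# Synthetic 8x8 reference frame; the current frame is the reference
# shifted by 2 pixels horizontally and 1 pixel vertically.
ref_frame = [[7 * y * y + 3 * x * x + x * y for x in range(8)] for y in range(8)]
cur_frame = [[ref_frame[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
              for x in range(8)] for y in range(8)]

mv, res = motion_search(cur_frame, ref_frame, top=2, left=2, size=4, search_range=2)
```

  • Here the search recovers the offset between the block's position in the input frame and its match in the reference frame, and the residual is the difference between the two blocks.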
  • Encoding module 30 also includes a transform module 40, a quantization module 46 and an entropy encoder 48.
  • Transform module 40 transforms the residual data of the block in accordance with a transform function.
  • transform module 40 applies an integer transform, such as a 4x4 or 8x8 integer transform or a Discrete Cosine Transform (DCT), to the residual data to generate transform coefficients for the residual data.
  • Quantization module 46 quantizes the transform coefficients and provides the quantized transform coefficients to entropy encoder 48.
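  • The transform and quantization stages can be sketched as below. The matrix `CF` is the 4x4 forward core transform of H.264; the quantizer, however, is a deliberately simplified uniform quantizer with a single hypothetical step size, not the QP-dependent scaling that H.264 specifies.

```python
# H.264 4x4 forward core transform matrix.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(residual):
    """Compute CF * X * CF^T for a 4x4 residual block X."""
    return matmul(matmul(CF, residual), transpose(CF))


def quantize(coeffs, step):
    """Simplified uniform quantizer (assumption): divide and truncate toward zero."""
    return [[int(c / step) for c in row] for row in coeffs]


flat_residual = [[1] * 4 for _ in range(4)]
coeffs = forward_transform(flat_residual)
levels = quantize(coeffs, step=8)
```

  • A constant residual block concentrates all of its energy in the top-left (DC) coefficient, so only one quantized level is non-zero.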
  • Entropy encoder 48 encodes the quantized transform coefficients using a context-adaptive coding technique, such as context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC). As will be described in detail below, entropy encoder 48 applies a selected mode to code the data of the block.
  • Entropy encoder 48 may also encode additional data associated with the block. For example, in addition to the residual data, entropy encoder 48 may encode one or more motion vectors of the block, an identifier indicating the coding mode of the block, one or more reference frame indices, quantization parameter (QP) information, slice information of the block and the like. Entropy encoder 48 may receive this additional block data from other modules within encoding module 30. For example, the motion vector information may be received from motion estimation module 36 while the block mode information may be received from control module 32.
  • entropy encoder 48 may code at least a portion of this additional information using a fixed length coding (FLC) technique or a universal variable length coding (VLC) technique, such as Exponential-Golomb coding ("Exp-Golomb").
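  • Because Exp-Golomb code lengths follow directly from the value being coded, the bits spent on this kind of side information can be counted without running the entropy encoder. The sketch below shows the standard unsigned and signed Exp-Golomb mappings; the function names are illustrative.

```python
def ue_code(k):
    """Unsigned Exp-Golomb codeword for k >= 0, as a bit string."""
    binary = bin(k + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def ue_bits(k):
    """Length in bits of the unsigned Exp-Golomb codeword for k >= 0."""
    return 2 * (k + 1).bit_length() - 1


def se_to_ue(v):
    """Map a signed value to the unsigned index used by signed Exp-Golomb."""
    return 2 * v - 1 if v > 0 else -2 * v


def se_bits(v):
    """Length in bits of the signed Exp-Golomb codeword for v."""
    return ue_bits(se_to_ue(v))
```

  • For example, a motion vector difference component of -2 maps to unsigned index 4 and costs 5 bits.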
  • entropy encoder 48 may encode a portion of the additional block data using the context-adaptive coding techniques described above, i.e., CABAC or CAVLC.
  • control module 32 estimates a coding cost for at least a portion of the possible modes.
  • control module 32 may estimate the cost of coding the block in each of the possible coding modes. The cost may be estimated, for example, in terms of the number of bits associated with coding the block in a given mode versus the amount of distortion produced in that mode.
  • control module 32 may estimate the coding cost for twenty-two different coding modes (the inter- and intra-coding modes) for a block selected for inter-coding and thirteen different coding modes for a block selected for intra-coding.
  • control module 32 may use another mode selection technique to initially reduce the set of possible modes, and then utilize the techniques of this disclosure to estimate the coding cost for the remaining modes of the set. In other words, in some aspects, control module 32 may narrow down the number of mode possibilities before applying the cost estimate technique.
  • encoding module 30 estimates the coding costs for the modes without actually coding the data of the blocks for the different modes, thereby reducing computational overhead associated with the coding decision. In fact, in the example illustrated in FIG. 2, encoding module 30 may estimate the coding cost without quantizing the data of the block for the different modes. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to compute the coding cost. In particular, it is not necessary to encode the blocks using the various coding modes in order to select one of the modes.
  • control module 32 estimates the coding cost of each analyzed mode in accordance with the equation:
  • $J = D + \lambda_{mode} \cdot R$, (1)
  • where $D$ is a distortion metric of the block,
  • $\lambda_{mode}$ is a Lagrange multiplier of the respective mode, and
  • $R$ is a rate metric of the block.
  • the distortion metric (D) may, for example, comprise a sum of absolute difference (SAD), a sum of square difference (SSD), a sum of absolute transform difference (SATD), a sum of square transform difference (SSTD) or the like.
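As a minimal sketch of two of the distortion metrics named above, the following computes SAD and SSD between a block and its prediction. The function names and the list-of-rows block representation are illustrative assumptions, not part of the disclosure.

```python
def sad(block, prediction):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(b - p)
               for b_row, p_row in zip(block, prediction)
               for b, p in zip(b_row, p_row))


def ssd(block, prediction):
    """Sum of squared differences between two equally sized pixel blocks."""
    return sum((b - p) ** 2
               for b_row, p_row in zip(block, prediction)
               for b, p in zip(b_row, p_row))
```

SATD and SSTD would apply the same two sums to the transformed differences rather than to the pixel differences.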
  • the rate metric (R) may, for example, be a number of bits associated with coding the data in a given block. As described above, different types of block data may be coded using different coding techniques. Equation (1) may thus be rewritten in the form below:
  • $J = D + \lambda_{mode} \cdot (R_{context} + R_{non\_context})$, (2)
  • $R_{context}$ represents a rate metric for block data coded using context-adaptive coding techniques
  • $R_{non\_context}$ represents a rate metric for block data coded using non context-adaptive coding techniques.
  • the residual data may be coded using context-adaptive coding, such as CAVLC or CABAC.
  • Other block data such as motion vectors, block modes, and the like may be coded using a FLC or a universal VLC technique, such as Exp-Golomb.
  • equation (2) may be rewritten in the form:
  • $J = D + \lambda_{mode} \cdot (R_{residual} + R_{other})$, (3)
  • $R_{residual}$ represents a rate metric for coding the residual data using context-adaptive coding techniques, e.g., the number of bits associated with coding the residual data
  • $R_{other}$ represents a rate metric for coding the other block data using a FLC or universal VLC technique, e.g., the number of bits associated with coding the other block data.
  • encoding module 30 may determine the number of bits associated with coding block data using FLC or universal VLC, i.e., $R_{other}$, relatively easily. Encoding module 30 may, for example, use a code table to identify the number of bits associated with coding the block data using FLC or universal VLC.
  • the code table may, for example, include a plurality of codewords and the number of bits associated with coding each codeword. Determining the number of bits associated with coding the residual data ($R_{residual}$), however, presents a much more difficult task due to the adaptive nature of context-adaptive coding as a function of the context of the data.
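For Exp-Golomb codes, the codeword length follows directly from the value being coded, so the code table can be filled in, or replaced, by a closed-form length. The sketch below assumes the order-0 unsigned and signed Exp-Golomb mappings used by H.264; the function names are illustrative.

```python
def ue_bits(code_num):
    """Length in bits of the order-0 unsigned Exp-Golomb codeword for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # A codeword is N leading zeros, a 1, then N info bits, where
    # N = floor(log2(code_num + 1)).
    prefix_len = (code_num + 1).bit_length() - 1
    return 2 * prefix_len + 1


def se_bits(value):
    """Length in bits of the signed Exp-Golomb codeword for an integer value."""
    # Signed values map to code numbers 0, 1, 2, 3, 4, ... in the order
    # 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_bits(code_num)
```

For example, a motion vector difference of (3, -1) coded with the signed mapping would contribute `se_bits(3) + se_bits(-1)` bits to $R_{other}$.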
  • bit estimate module 42 may estimate the number of bits associated with coding the residual data using the context-adaptive coding techniques without actually coding the residual data.
  • bit estimate module 42 estimates the number of bits associated with coding the residual data using transform coefficients for the residual data.
  • encoding module 30 only needs to compute the transform coefficients for the residual data to estimate the number of bits associated with coding the residual data. Encoding module 30 therefore reduces the amount of computing resources and time required to determine the number of bits associated with coding the residual data by not quantizing the transform coefficients or encoding the quantized transform coefficients for each of the modes.
  • Bit estimate module 42 analyzes the transform coefficients output by transform module 40 to identify one or more transform coefficients that will remain non-zero after quantization. In particular, bit estimate module 42 compares the absolute value of each of the transform coefficients to a corresponding threshold. In some aspects, the corresponding thresholds may be computed as a function of a QP of encoding module 30. Bit estimate module 42 identifies, as the transform coefficients that will remain non-zero after quantization, the transform coefficients whose absolute values are greater than or equal to their corresponding thresholds.
  • Bit estimate module 42 estimates the number of bits associated with coding the residual data based on at least the transform coefficients identified to remain non-zero after quantization. In particular, bit estimate module 42 determines the number of non-zero transform coefficients that will survive quantization. Bit estimate module 42 also sums at least a portion of the absolute values of the transform coefficients identified to survive quantization. Bit estimate module 42 then estimates the rate metric for the residual data, i.e., the number of bits associated with coding the residual data, using the equation:
  • $R_{residual} = a_1 \cdot SATD + a_2 \cdot NZ_{est} + a_3$, (4)
  • $SATD$ is the sum of at least a portion of the absolute values of the non-zero transform coefficients predicted to survive quantization
  • $NZ_{est}$ is the estimated number of non-zero transform coefficients predicted to survive quantization
  • $a_1$, $a_2$, and $a_3$ are coefficients.
  • Coefficients $a_1$, $a_2$, and $a_3$ may be computed, for example, using least squares estimation.
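One way the least squares estimation could be carried out is sketched below. It assumes a set of training samples, each pairing an (SATD, NZ) measurement with the actual number of bits produced by the entropy coder for that block; how such samples are collected is not specified in the text, and the function name is hypothetical. The fit solves the 3x3 normal equations directly so that no external library is needed.

```python
def fit_rate_model(samples):
    """Fit bits ~ a1*SATD + a2*NZ + a3 by ordinary least squares.

    samples: iterable of (satd, nz, actual_bits) tuples.
    Returns (a1, a2, a3).
    """
    # Accumulate the normal equations (X^T X) a = X^T y with rows x = [satd, nz, 1].
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for satd, nz, bits in samples:
        x = (float(satd), float(nz), 1.0)
        for r in range(3):
            xty[r] += x[r] * bits
            for c in range(3):
                xtx[r][c] += x[r] * x[c]

    # Solve by Gauss-Jordan elimination with partial pivoting on the
    # augmented matrix [X^T X | X^T y].
    m = [xtx[r] + [xty[r]] for r in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine a unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[r][3] / m[r][r] for r in range(3))
```

The fitted coefficients would typically be computed offline and then held fixed while the encoder runs.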
  • Although the sum of the transform coefficients is a sum of absolute transform differences (SATD) in the example of equation (4), other difference metrics may be used, such as SSTDs.
  • Encoding module 30 computes a matrix of transform coefficients for the residual data. An exemplary matrix of transform coefficients is illustrated below.
  • the number of rows of the matrix of transform coefficients (A) is equal to the number of rows of pixels in the block and the number of columns of the matrix of transform coefficients is equal to the number of columns of pixels in the block.
  • the dimensions of the matrix of transform coefficients are 4x4 to correspond with the 4x4 block.
  • Each of the entries A (i,j) of the matrix of transform coefficients is the transform of the respective residual coefficients.
  • encoding module 30 compares the matrix of residual transform coefficients A to a matrix of threshold values to predict which of the transform coefficients of matrix A will remain non-zero after quantization.
  • An exemplary matrix of threshold values is illustrated below.
  • the matrix C may be computed as a function of a QP value.
  • the dimensions of matrix C are the same as the dimensions of matrix A.
  • the entries of matrix C may be computed based on the equation:
  • QBITS{QP} is a parameter that determines scaling as a function of QP
  • Level_Offset(i,j){QP} is a deadzone parameter for the entry at row i and column j of the matrix and is also a function of QP
  • Level_Scale(i,j){QP} is a multiplicative factor for the entry at row i and column j of the matrix and is also a function of QP
  • i corresponds to a row of the matrix
  • j corresponds to a column of the matrix
  • QP corresponds to a quantization parameter of encoding module 30.
  • the variables may be defined in the H.264 coding standard as a function of the operating QP.
  • encoding module 30 may be configured to operate within a range of QP values. In this case, encoding module 30 may pre-compute a plurality of comparison matrices that corresponds with each of the QP values in the range of QP values. Encoding module 30 selects the comparison matrix that corresponds with the QP of encoding module 30 to compare with the transform coefficient matrix.
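The pre-computation of one comparison matrix per QP can be sketched as follows. The threshold formula here is an assumption: it supposes a quantizer of the form `level = (|coef| * scale + offset) >> qbits`, under which a coefficient stays non-zero exactly when `|coef| * scale + offset >= 2**qbits`. The helper `params_for_qp` is hypothetical and stands in for whatever tables supply the scaling, offset, and scale values for a given QP.

```python
def threshold_matrix(qbits, level_offset, level_scale):
    """Per-position smallest integer |coefficient| that quantizes to a non-zero level.

    Assumes level = (|coef| * scale + offset) >> qbits (an assumption, see lead-in).
    """
    one = 1 << qbits
    return [[-(-(one - off) // sc)  # ceiling of (2**qbits - offset) / scale
             for off, sc in zip(off_row, sc_row)]
            for off_row, sc_row in zip(level_offset, level_scale)]


def precompute_thresholds(qp_values, params_for_qp):
    """Build a {QP: threshold matrix} lookup for every QP in the operating range.

    params_for_qp(qp) is a hypothetical callable returning
    (qbits, level_offset_matrix, level_scale_matrix) for that QP.
    """
    return {qp: threshold_matrix(*params_for_qp(qp)) for qp in qp_values}
```

At run time, selecting the comparison matrix is then a single dictionary lookup keyed by the operating QP.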
  • the result of the comparison between the matrix of transform coefficients A and the matrix of thresholds C is a matrix of ones and zeros.
  • a transform coefficient is identified as likely to remain non-zero when the absolute value of the transform coefficient of matrix A is greater than or equal to the corresponding threshold of matrix C.
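The comparison just described can be sketched as an element-wise test that yields the matrix of ones and zeros. The function name is illustrative; matrices are represented as lists of rows.

```python
def nonzero_mask(coeffs, thresholds):
    """Return M where M[i][j] is 1 if |A[i][j]| >= C[i][j], else 0."""
    return [[1 if abs(a) >= c else 0 for a, c in zip(a_row, c_row)]
            for a_row, c_row in zip(coeffs, thresholds)]
```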
  • bit estimate module 42 determines the number of transform coefficients that will survive quantization. In other words, bit estimate module 42 determines the number of transform coefficients identified as remaining non-zero after quantization. Bit estimate module 42 may determine the number of transform coefficients identified as remaining non-zero after quantization according to the equation:
  • $NZ_{est} = \sum_{i} \sum_{j} M(i,j)$
  • $NZ_{est}$ is the estimated number of non-zero transform coefficients
  • $M(i,j)$ is the value of the matrix M at row i and column j
  • NZ est is equal to 8.
  • Bit estimate module 42 also computes a sum of at least a portion of the absolute value of the transform coefficients estimated to survive quantization.
  • bit estimate module 42 may compute the sum of at least a portion of the absolute values of the transform coefficients according to the equation:
  • $SATD = \sum_{i} \sum_{j} M(i,j) \cdot abs(A(i,j))$
  • $SATD$ is the sum total of the transform coefficients identified as remaining non-zero after quantization
  • $M(i,j)$ is the value of the matrix M at row i and column j
  • $A(i,j)$ is the value of the matrix A at row i and column j
  • abs(x) is an absolute value function that computes the absolute value of x.
  • SATD is equal to 2361.
  • Other difference metrics may be used for the transform coefficients, such as SSTDs.
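Given the coefficient matrix A and the matrix of ones and zeros M, the count and the masked sum of absolute values described above can be sketched as below. The function name is illustrative.

```python
def count_and_sum(coeffs, mask):
    """Return (NZ_est, SATD) for coefficient matrix A and 0/1 mask M."""
    # NZ_est: how many coefficients are predicted to survive quantization.
    nz_est = sum(m for m_row in mask for m in m_row)
    # SATD: sum of |A(i,j)| over the positions where M(i,j) is 1.
    satd = sum(m * abs(a)
               for a_row, m_row in zip(coeffs, mask)
               for a, m in zip(a_row, m_row))
    return nz_est, satd
```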
  • bit estimate module 42 approximates the number of bits associated with coding the residual coefficients using equation (4) above.
  • Control module 32 may use the estimate of $R_{residual}$ to compute an estimate of the total coding cost of the mode.
  • Encoding module 30 may estimate the total coding cost for one or more other possible modes in the same manner, and then select the mode with the smallest coding cost. Encoding module 30 then applies the selected coding mode to code the block or blocks of the frame.
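The following sketch ties the pieces together: the linear residual-bit estimate, the Lagrangian cost that adds the bits for the other block data, and the choice of the lowest-cost mode. Function names and the mode labels in the usage note are hypothetical.

```python
def residual_bits(satd, nz_est, a1, a2, a3):
    """Linear estimate of the bits needed for the residual data."""
    return a1 * satd + a2 * nz_est + a3


def coding_cost(distortion, lagrange_multiplier, r_residual, r_other):
    """Cost J = D + lambda * (R_residual + R_other) for one mode."""
    return distortion + lagrange_multiplier * (r_residual + r_other)


def select_mode(costs):
    """Return the mode whose estimated cost is smallest.

    costs: dict mapping a mode identifier to its estimated cost J.
    """
    return min(costs, key=costs.get)
```

A caller would build a dictionary such as `{"intra16x16": j1, "inter8x8": j2}` by evaluating `coding_cost` once per candidate mode and then pass it to `select_mode`.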
  • the foregoing techniques may be implemented individually, or two or more of such techniques, or all of such techniques, may be implemented together in encoding device 12.
  • the components in encoding module 30 are exemplary of those applicable to implement the techniques described herein. Encoding module 30, however, may include many other components, if desired, as well as fewer components that combine the functionality of one or more of the modules described above.
  • the components in encoding module 30 may be implemented as one or more processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. Depiction of different features as modules is intended to highlight different functional aspects of encoding module 30 and does not necessarily imply that such modules must be realized by separate hardware or software components. Rather, functionality associated with one or more modules may be integrated within common or separate hardware or software components.
  • FIG. 3 is a block diagram illustrating another exemplary encoding module 50.
  • Encoding module 50 of FIG. 3 conforms substantially to encoding module 30 of FIG. 2, except bit estimate module 52 of encoding module 50 estimates the number of bits associated with coding the residual data after quantization of the transform coefficients for the residual data.
  • bit estimate module 52 estimates the number of bits associated with coding the residual coefficients using the equation:
  • $R_{residual} = a_1 \cdot SATQD + a_2 \cdot NZ_{TQ} + a_3$, (8) where $SATQD$ is the sum of the absolute values of the non-zero quantized transform coefficients, $NZ_{TQ}$ is the number of non-zero quantized transform coefficients, and $a_1$, $a_2$, and $a_3$ are coefficients. Coefficients $a_1$, $a_2$, and $a_3$ may be computed, for example, using least squares estimation. Although encoding module 50 quantizes the transform coefficients prior to estimating the number of bits associated with coding the residual data, encoding module 50 still estimates the coding costs for the modes without actually coding the data of the blocks. Thus, the amount of computationally intensive calculations is still reduced.
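For the variant of FIG. 3, the same linear model is applied to the quantized levels rather than to thresholded transform coefficients. A minimal sketch, assuming the quantized levels are already available as a matrix of integers and using illustrative function names:

```python
def quantized_stats(levels):
    """Return (SATQD, NZ_TQ) for a matrix of quantized transform coefficients."""
    magnitudes = [abs(v) for row in levels for v in row]
    satqd = sum(magnitudes)                      # zeros contribute nothing
    nz_tq = sum(1 for v in magnitudes if v != 0)
    return satqd, nz_tq


def residual_bits_quantized(levels, a1, a2, a3):
    """Linear bit estimate from quantized levels: a1*SATQD + a2*NZ_TQ + a3."""
    satqd, nz_tq = quantized_stats(levels)
    return a1 * satqd + a2 * nz_tq + a3
```

The coefficients passed here would be fitted separately from those used with unquantized coefficients, since the inputs are on a different scale.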
  • FIG. 4 is a flow diagram illustrating exemplary operation of an encoding module, such as encoding module 30 of FIG. 2 and/or encoding module 50 of FIG. 3, selecting an encoding mode based on at least the estimated coding costs. For exemplary purposes, however, FIG. 4 will be discussed in terms of encoding module 30.
  • Encoding module 30 selects a mode for which to estimate a coding cost (60).
  • Encoding module 30 generates a distortion metric for the current block (62).
  • Encoding module 30 may, for example, compute the distortion metric based on a comparison between the block and at least one reference block. In the case of a block selected to be intra-coded, the reference block may be an adjacent block within the same frame. For a block selected to be inter-coded, on the other hand, the reference block may be a block from an adjacent frame.
  • the distortion metric may be, for example, a SAD, SSD, SATD, SSTD, or other similar distortion metric.
  • encoding module 30 determines the number of bits associated with coding the portion of the data that is coded using non context-adaptive coding techniques (64). As described above, this data may include one or more motion vectors of the block, an identifier that indicates a coding mode of the block, one or more reference frame indices, QP information, slice information of the block and the like. Encoding module 30 may, for example, use a code table to identify the number of bits associated with coding the data using FLC, universal VLC or other non context-adaptive coding technique.
  • Encoding module 30 estimates and/or computes the number of bits associated with coding the portion of the data that is coded using context-adaptive coding techniques (66). In the context of the H.264 standard, for example, encoding module 30 may estimate the number of bits associated with coding the residual data using context-adaptive coding. Encoding module 30 may estimate the number of bits associated with coding the residual data without actually coding the residual data. In certain aspects, encoding module 30 may estimate the number of bits associated with coding the residual data without quantizing the residual data. For example, encoding module 30 may compute transform coefficients for the residual data and identify the transform coefficients likely to remain non-zero after quantization.
  • encoding module 30 uses these identified transform coefficients to estimate the number of bits associated with coding the residual data. In other aspects, encoding module 30 may quantize the transform coefficients and estimate the number of bits associated with coding the residual data based on at least the quantized transform coefficients. In either case, encoding module 30 saves time and processing resources by estimating the required number of bits. If there are sufficient computing resources, encoding module 30 may compute the actual number of bits required instead of estimating.
  • Encoding module 30 estimates and/or computes the total coding cost for coding the block in the selected mode (68). Encoding module 30 may estimate the total coding cost for coding the block based on the distortion metric, the bits associated with coding the portion of the data that is coded using non context-adaptive coding and the bits associated with coding the portion of the data that is coded using context-adaptive coding. For example, encoding module 30 may estimate the total coding cost for coding the block in the selected mode using equation (2) or (3) above.
  • Encoding module 30 determines whether there are any other coding modes for which to estimate the coding cost (70). As described above, encoding module 30 estimates the coding cost for at least a portion of the possible modes. In certain aspects, encoding module 30 may estimate the cost of coding the block in each of the possible coding modes. In the context of the H.264 standard, for example, encoding module 30 may estimate the coding cost for twenty-two different coding modes (the inter- and intra-coding modes) for a block selected for inter-coding and thirteen different coding modes for a block selected for intra-coding. In other aspects, encoding module 30 may use another mode selection technique to initially reduce the set of possible modes, and then utilize the techniques of this disclosure to estimate the coding cost for the reduced set of coding modes.
  • encoding module 30 selects the next coding mode and estimates the cost of coding the data in the selected coding mode. When there are no more coding modes for which to estimate the coding cost, encoding module 30 selects one of the modes to use for coding the block based on at least the estimated coding costs (72). In one example, encoding module 30 may select the coding mode that has the smallest estimated coding cost. Upon selection of the mode, encoding module 30 may apply the selected mode to code the particular block (74). The process may continue for additional blocks in a given frame. As an example, the process may continue until all the blocks in the frame have been coded using the coding mode selected in accordance with the techniques described herein. Moreover, the process may continue until blocks of a plurality of frames have been coded using a high efficiency mode.
  • FIG. 5 is a flow diagram illustrating exemplary operation of an encoding module, such as encoding module 30 of FIG. 2, estimating the number of bits associated with coding the residual coefficients of a block.
  • After selecting one of the coding modes for which to estimate the coding cost, encoding module 30 generates the residual data of the block for the selected mode (80).
  • For a block selected to be intra-coded, for example, spatial prediction module 34 generates the residual data for the block based on a comparison of the block with a predicted version of the block.
  • motion estimation module 36 and motion compensation module 38 compute the residual data for the block based on a comparison between the block and a corresponding block in a reference frame.
  • the residual data may have already been computed to generate the distortion metric of the block.
  • encoding module 30 may retrieve the residual data from a memory.
  • Transform module 40 transforms the residual coefficients of the block in accordance with a transform function to generate transform coefficients for the residual data (82).
  • Transform module 40 may, for example, apply a 4x4 or 8x8 integer transform or a DCT transform to the residual data to generate the transform coefficients for the residual data.
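As an illustration of a 4x4 integer transform, the sketch below applies the forward core transform matrix commonly associated with H.264, computing Y = Cf · X · Cf^T on a 4x4 residual block. Treating this particular matrix as the transform applied by transform module 40 is an assumption, and the function names are illustrative.

```python
# Forward 4x4 integer core transform matrix used in H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def forward_transform_4x4(residual):
    """Return the 4x4 matrix of transform coefficients Cf * X * Cf^T."""
    return matmul(matmul(CF, residual), transpose(CF))
```

Because all entries are small integers, the transform needs only additions, subtractions, and shifts in an optimized implementation; the plain matrix products here favor readability.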
  • Bit estimate module 42 compares one of the transform coefficients to a corresponding threshold to determine whether the transform coefficient is greater than or equal to the threshold (84). The threshold corresponding with the transform coefficient may be computed as a function of the QP of encoding module 30. If the transform coefficient is greater than or equal to the corresponding threshold, bit estimate module 42 identifies the transform coefficient as a coefficient that will remain non-zero after quantization (86). If the transform coefficient is less than the corresponding threshold, bit estimate module 42 identifies the transform coefficient as a coefficient that will become zero after quantization (88).
  • Bit estimate module 42 determines whether there are any additional transform coefficients for the residual data of the block (90). If there are additional transform coefficients of the block, bit estimate module 42 selects another one of the coefficients and compares it to a corresponding threshold. If there are no additional transform coefficients to analyze, bit estimate module 42 determines the number of coefficients identified to remain non-zero after quantization (92). Bit estimate module 42 also sums at least a portion of the absolute values of the transform coefficients identified to remain non-zero after quantization (94). Bit estimate module 42 estimates the number of bits associated with coding the residual data using the determined number of non-zero coefficients and the sum of the portion of the non-zero coefficients (96).
  • Bit estimate module 42 may, for example, estimate the number of bits associated with coding the residual data using equation (4) above. In this manner, the encoding module 30 estimates the number of bits associated with coding the residual data of the block in the selected mode without quantizing or encoding the residual data.
  • FIG. 6 is a flow diagram illustrating exemplary operation of an encoding module, such as encoding module 50 of FIG. 3, estimating the number of bits associated with coding the residual coefficients of a block.
  • After selecting one of the coding modes for which to estimate the coding cost, encoding module 50 generates the residual coefficients of the block (100).
  • spatial prediction module 34 computes the residual data for the block based on a comparison of the block with a predicted version of the block.
  • motion estimation module 36 and motion compensation module 38 compute the residual data for the block based on a comparison between the block and a corresponding block in a reference frame.
  • the residual coefficients may have already been computed to generate the distortion metric of the block.
  • Transform module 40 transforms the residual coefficients of the block in accordance with a transform function to generate transform coefficients for the residual data (102).
  • Transform module 40 may, for example, apply a 4x4 or 8x8 integer transform or a DCT transform to the residual data to generate transformed residual coefficients.
  • Quantization module 46 quantizes the transform coefficients in accordance with a QP of encoding module 50 (104).
  • Bit estimate module 52 determines the number of quantized transform coefficients that are non-zero (106). Bit estimate module 52 also sums the absolute values of the non-zero levels or quantized transform coefficients (108). Bit estimate module 52 estimates the number of bits associated with coding the residual data using the computed number of non-zero quantized transform coefficients and the sum of the non-zero quantized transform coefficients (110). Bit estimate module 52 may, for example, estimate the number of bits associated with coding the residual coefficients using equation (8) above. In this manner, the encoding module estimates the number of bits associated with coding the residual data of the block in the selected mode without encoding the residual data.
  • the instructions or code associated with a computer-readable medium of the computer program product may be executed by a computer, e.g., by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry.
  • such computer-readable media can comprise RAM, such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This disclosure describes techniques for coding mode selection using estimated coding costs. To provide high compression efficiency, for example, an encoding device may attempt to select a coding mode for coding blocks of pixels that codes the data of the blocks with high efficiency. To this end, the encoding device may perform coding mode selection based on estimates of coding cost for at least a portion of the possible modes. In accordance with the techniques described herein, the encoding device estimates the coding cost for the different modes without actually coding the blocks. In fact, in some aspects, the encoding device may estimate the coding cost for the modes without quantizing the data of the block for each mode. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to perform effective mode selection.

Description

VIDEO CODING MODE SELECTION USING ESTIMATED CODING COSTS
TECHNICAL FIELD
[0001] The disclosure relates to video coding and, more particularly, to estimating coding costs to code video sequences.
BACKGROUND
[0002] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, personal digital assistants (PDAs), laptop computers, desktop computers, video game consoles, digital cameras, digital recording devices, cellular or satellite radio telephones, and the like. Digital video devices can provide significant improvements over conventional analog video systems in processing and transmitting video sequences.
[0003] Different video coding standards have been established for coding digital video sequences. The Moving Picture Experts Group (MPEG), for example, has developed a number of standards including MPEG-1, MPEG-2 and MPEG-4. Other examples include the International Telecommunication Union (ITU)-T H.263 standard, and the ITU-T H.264 standard and its counterpart, ISO/IEC MPEG-4, Part 10, i.e., Advanced Video Coding (AVC). These video coding standards support improved transmission efficiency of video sequences by coding data in a compressed manner.
[0004] Many current techniques make use of block-based coding. In block-based coding, frames of a multimedia sequence are divided into discrete blocks of pixels, and the blocks of pixels are coded based on differences with other blocks, which may be located within the same frame or in a different frame. Some blocks of pixels, often referred to as "macroblocks," comprise a grouping of sub-blocks of pixels. As an example, a 16x16 macroblock may comprise four 8x8 sub-blocks. The sub-blocks may be coded separately. For example, the H.264 standard permits coding of blocks with a variety of different sizes, e.g., 16x16, 16x8, 8x16, 8x8, 4x4, 8x4, and 4x8. Further, by extension, sub-blocks of any size may be included within a macroblock, e.g., 2x16, 16x2, 2x2, 4x16, and 8x2.
SUMMARY
[0005] In certain aspects of this disclosure, a method for processing digital video data comprises identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized, estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
[0006] In certain aspects, an apparatus for processing digital video data comprises a transform module that generates transform coefficients for residual data of a block of pixels, a bit estimate module that identifies one or more of the transform coefficients that will remain non-zero when quantized and estimates a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and a control module that estimates a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
[0007] In certain aspects, an apparatus for processing digital video data comprises means for identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized, means for estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and means for estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
[0008] In certain aspects, a computer-program product for processing digital video data comprises a computer readable medium having instructions thereon. The instructions include code for identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized, code for estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients, and code for estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
[0009] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
[0010] FIG. 1 is a block diagram illustrating a video coding system that employs the coding cost estimate techniques described herein.
[0011] FIG. 2 is a block diagram illustrating an exemplary encoding module in further detail.
[0012] FIG. 3 is a block diagram illustrating another exemplary encoding module in further detail.
[0013] FIG. 4 is a flow diagram illustrating exemplary operation of an encoding module selecting an encoding mode based on estimated coding costs.
[0014] FIG. 5 is a flow diagram illustrating exemplary operation of an encoding module estimating the number of bits associated with coding the residual data of a block without quantizing or encoding of the residual data.
[0015] FIG. 6 is a flow diagram illustrating exemplary operation of an encoding module estimating the number of bits associated with coding the residual data of a block without encoding the residual data.
DETAILED DESCRIPTION
[0016] This disclosure describes techniques for video coding mode selection using estimated coding costs. To provide high compression efficiency, for example, an encoding device may attempt to select a coding mode for coding blocks of pixels that codes the data of the blocks with high efficiency. To this end, the encoding device may perform coding mode selection based on at least estimates of coding cost for at least a portion of the possible modes. In accordance with the techniques described herein, the encoding device estimates the coding cost for the different modes without actually coding the blocks. In fact, in some aspects, the encoding device may estimate the coding cost for the modes without quantizing the data of the block for each mode. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to perform effective mode selection.
[0017] FIG. 1 is a block diagram illustrating a multimedia coding system 10 that employs coding cost estimate techniques as described herein. Coding system 10 includes an encoding device 12 and a decoding device 14 connected by a transmission channel 16. Encoding device 12 encodes one or more sequences of digital multimedia data and transmits the encoded sequences over transmission channel 16 to decoding device 14 for decoding and, possibly, presentation to a user of decoding device 14. Transmission channel 16 may comprise any wired or wireless medium, or a combination thereof.
[0018] Encoding device 12 may form part of a broadcast network component used to broadcast one or more channels of multimedia data. As an example, encoding device 12 may form part of a wireless base station, server, or any infrastructure node that is used to broadcast one or more channels of encoded multimedia data to wireless devices. In this case, encoding device 12 may transmit the encoded data to a plurality of wireless devices, such as decoding device 14. A single decoding device 14, however, is illustrated in FIG. 1 for simplicity. Alternatively, encoding device 12 may comprise a handset that transmits locally captured video for video telephony or other similar applications.
[0019] Decoding device 14 may comprise a user device that receives the encoded multimedia data transmitted by encoding device 12 and decodes the multimedia data for presentation to a user. By way of example, decoding device 14 may be implemented as part of a digital television, a wireless communication device, a gaming device, a portable digital assistant (PDA), a laptop computer or desktop computer, a digital music and video device, such as those sold under the trademark "iPod," or a radiotelephone such as cellular, satellite or terrestrial-based radiotelephone, or other wireless mobile terminal equipped for video and/or audio streaming, video telephony, or both. Decoding device 14 may be associated with a mobile or stationary device. In a broadcast application, encoding device 12 may transmit encoded video and/or audio to multiple decoding devices 14 associated with multiple users.
[0020] In some aspects, for two-way communication applications, multimedia coding system 10 may support video telephony or video streaming according to the Session Initiation Protocol (SIP), International Telecommunication Union Standardization Sector (ITU-T) H.323 standard, ITU-T H.324 standard, or other standards. For one-way or two-way communication, encoding device 12 may generate encoded multimedia data according to a video compression standard, such as Moving Picture Experts Group (MPEG)-2, MPEG-4, ITU-T H.263, or ITU-T H.264, which corresponds to MPEG-4, Part 10, Advanced Video Coding (AVC). Although not shown in FIG. 1, encoding device 12 and decoding device 14 may be integrated with an audio encoder and decoder, respectively, and include appropriate multiplexer-demultiplexer (MUX-DEMUX) modules, or other hardware, firmware, or software, to handle encoding of both audio and video in a common data sequence or separate data sequences. If applicable, MUX-DEMUX modules may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
[0021] In certain aspects, this disclosure contemplates application to Enhanced H.264 video coding for delivering real-time multimedia services in terrestrial mobile multimedia multicast (TM3) systems using the Forward Link Only (FLO) Air Interface Specification, "Forward Link Only Air Interface Specification for Terrestrial Mobile Multimedia Multicast," published as Technical Standard TIA-1099, Aug. 2006 (the "FLO Specification"). However, the coding cost estimation techniques described in this disclosure are not limited to any particular type of broadcast, multicast, unicast, or point-to-point system.
[0022] As illustrated in FIG. 1, encoding device 12 includes an encoding module 18 and a transmitter 20. Encoding module 18 receives one or more input multimedia sequences that can include, in the case of video encoding, one or more frames of data and selectively encodes the frames of the received multimedia sequences. Encoding module 18 receives the input multimedia sequences from one or more sources (not shown in FIG. 1). In some aspects, encoding module 18 may receive the input multimedia sequences from one or more video content providers, e.g., via satellite. As another example, encoding module 18 may receive the multimedia sequences from an image capture device (not shown in FIG. 1) integrated within encoding device 12 or coupled to encoding device 12. Alternatively, encoding module 18 may receive the multimedia sequences from a memory or archive (not shown in FIG. 1) within encoding device 12 or coupled to encoding device 12. The multimedia sequences may comprise live real-time or near real-time video, audio, or video and audio sequences to be coded and transmitted as a broadcast or on-demand, or may comprise pre-recorded and stored video, audio, or video and audio sequences to be coded and transmitted as a broadcast or on-demand. In some aspects, at least a portion of the multimedia sequences may be computer-generated, such as in the case of gaming.
[0023] In any case, encoding module 18 encodes and transmits a plurality of coded frames to decoding device 14 via transmitter 20. Encoding module 18 may encode the frames of the input multimedia sequences as intra-coded frames, inter-coded frames or a combination thereof. Frames encoded using intra-coding techniques are coded without reference to other frames, and are often referred to as intra ("I") frames. Frames encoded using inter-coding techniques are coded with reference to one or more other frames. The inter-coded frames may include one or more predictive ("P") frames, bi-directional ("B") frames, or a combination thereof. P frames are encoded with reference to at least one temporally prior frame while B frames are encoded with reference to at least one temporally future frame. In some cases, B frames may be encoded with reference to at least one temporally future frame and at least one temporally prior frame.
[0024] Encoding module 18 may be further configured to partition a frame into a plurality of blocks and encode each of the blocks separately. As an example, encoding module 18 may partition the frame into a plurality of 16x16 blocks. Some blocks, often referred to as "macroblocks," comprise a grouping of sub-partition blocks (referred to herein as "sub-blocks"). As an example, a 16x16 macroblock may comprise four 8x8 sub-blocks, or other sub-partition blocks. For example, the H.264 standard permits encoding of blocks with a variety of different sizes, e.g., 16x16, 16x8, 8x16, 8x8, 4x4, 8x4, and 4x8. Further, by extension, sub-blocks of any size may be included within a macroblock, e.g., 2x16, 16x2, 2x2, 4x16, 8x2 and so on. Thus, encoding module 18 may be configured to divide the frame into several blocks and encode each of the blocks of pixels as intra-coded blocks or inter-coded blocks, each of which may be referred to generally as a block.
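The partitioning described in paragraph [0024] can be sketched as follows. This is a minimal illustration only, assuming the frame is held as a list of pixel rows whose dimensions are multiples of the block size. The function name `partition_into_blocks` and the sample frame are hypothetical and do not come from the disclosure.

```python
def partition_into_blocks(frame, block_size=16):
    """Split a 2-D list of pixel rows into block_size x block_size blocks.

    Returns a dict mapping (block_row, block_col) to a block (list of rows).
    Assumes the frame dimensions are multiples of block_size.
    """
    height, width = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top // block_size, left // block_size)] = [
                row[left:left + block_size]
                for row in frame[top:top + block_size]
            ]
    return blocks


# Hypothetical 32-row by 48-column frame: 2 x 3 = 6 macroblocks of 16x16.
frame = [[(x + y) % 256 for x in range(48)] for y in range(32)]
macroblocks = partition_into_blocks(frame, 16)

# A 16x16 macroblock may in turn be split into four 8x8 sub-blocks.
sub_blocks = partition_into_blocks(macroblocks[(0, 0)], 8)
```

The same helper is reused for the sub-partitioning because a macroblock is itself just a smaller grid of pixels.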
[0025] Encoding module 18 may support a plurality of coding modes. Each of the modes may correspond to a different combination of block sizes and coding techniques. In the case of the H.264 standard, for example, there are seven inter modes and thirteen intra modes. The seven variable block-size inter modes include a SKIP mode, 16x16 mode, 16x8 mode, 8x16 mode, 8x8 mode, 8x4 mode, 4x8 mode, and 4x4 mode. The thirteen intra modes include an INTRA 4x4 mode for which there are nine possible interpolation directions and an INTRA 16x16 mode for which there are four possible interpolation directions.
[0026] To provide high compression efficiency, in accordance with various aspects of this disclosure, encoding module 18 attempts to select the mode that codes the data of the blocks with high efficiency. To this end, encoding module 18 estimates, for each of the blocks, a coding cost for at least a portion of the modes. Encoding module 18 estimates the coding cost as a function of rate and distortion. In accordance with the techniques described herein, encoding module 18 estimates the coding cost for the modes without actually coding the blocks to determine the rate and distortion metrics. In this manner, encoding module 18 may select one of the modes based on at least the coding cost without performing the computationally complex coding of the data of the block for each mode. Conventional mode selection requires actual coding of the data using each of the modes to determine which mode to select. Thus, the techniques save time and computational resources by selecting the mode based on the coding cost without actually coding the data for each of the modes. In fact, in some aspects, encoding module 18 may estimate the coding cost for the modes without quantizing the data of the block for each mode. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to perform effective mode selection.
[0027] Encoding device 12 applies the selected mode to code the blocks of the frames and transmits the coded frames of data via transmitter 20. Transmitter 20 may include appropriate modem and driver circuitry software and/or firmware to transmit encoded multimedia over transmission channel 16. For wireless applications, transmitter 20 includes RF circuitry to transmit wireless data carrying the encoded multimedia data.
[0028] Decoding device 14 includes a receiver 22 and a decoding module 24. Decoding device 14 receives the encoded data from encoding device 12 via receiver 22. Like transmitter 20, receiver 22 may include appropriate modem and driver circuitry software and/or firmware to receive encoded multimedia over transmission channel 16, and may include RF circuitry to receive wireless data carrying the encoded multimedia data in wireless applications. Decoding module 24 decodes the coded frames of data received via receiver 22. Decoding device 14 may further present the decoded frame of data to a user via a display (not shown) that may be either integrated within decoding device 14 or provided as a discrete device coupled to decoding device 14 via a wired or wireless connection.
[0029] In some examples, encoding device 12 and decoding device 14 each may include reciprocal transmit and receive circuitry so that each may serve as both a transmit device and a receive device for encoded multimedia and other information transmitted over transmission channel 16. In this case, both encoding device 12 and decoding device 14 may transmit and receive multimedia sequences and thus participate in two-way communications. In other words, the illustrated components of coding system 10 may be integrated as part of an encoder/decoder (CODEC).
[0030] The components in encoding device 12 and decoding device 14 are exemplary of those applicable to implement the techniques described herein. Encoding device 12 and decoding device 14, however, may include many other components, if desired. For example, encoding device 12 may include a plurality of encoding modules that each receive one or more sequences of multimedia data and encode the respective sequences of multimedia data in accordance with the techniques described herein. In this case, encoding device 12 may further include at least one multiplexer to combine the segments of data for transmission. In addition, encoding device 12 and decoding device 14 may include appropriate modulation, demodulation, frequency conversion, filtering, and amplifier components for transmission and reception of encoded video, including radio frequency (RF) wireless components and antennas, as applicable. For ease of illustration, however, such components are not shown in FIG. 1.
[0031] FIG. 2 is a block diagram illustrating an exemplary encoding module 30 in further detail. Encoding module 30 may, for example, represent encoding module 18 of encoding device 12 of FIG. 1. As illustrated in FIG. 2, encoding module 30 includes a control module 32 that receives input frames of multimedia data of one or more multimedia sequences from one or more sources, and processes the frames of the received multimedia sequences. In particular, control module 32 analyzes the incoming frames of the multimedia sequences and determines whether to encode or skip the incoming frames based on analysis of the frames. In some aspects, encoding device 12 may encode the information contained in the multimedia sequences at a reduced frame rate using frame skipping to conserve bandwidth across transmission channel 16.
[0032] Moreover, for the incoming frames that will be encoded, control module 32 may also be configured to determine whether to encode the frames as I frames, P frames, or B frames. Control module 32 may determine to encode an incoming frame as an I frame at the start of a multimedia sequence, at a scene change within the sequence, for use as a channel switch frame, or for use as an intra refresh frame. Otherwise, control module 32 encodes the frame as an inter-coded frame (i.e., a P frame or B frame) to reduce the amount of bandwidth associated with coding the frame.
[0033] Control module 32 may be further configured to partition the frames into a plurality of blocks and select a coding mode, such as one of the H.264 coding modes described above, for each of the blocks. As will be described in detail below, encoding module 30 may estimate the coding cost for at least a portion of the modes to assist in selecting a most efficient one of the coding modes. After selecting the coding mode for use in coding one of the blocks, encoding module 30 generates residual data for the block. For a block selected to be intra-coded, spatial prediction module 34 generates the residual data for the block. Spatial prediction module 34 may, for example, generate a predicted version of the block via interpolation using one or more adjacent blocks and the interpolation directionality corresponding to the selected intra-coding mode. Spatial prediction module 34 may then compute a difference between the block of the input frame and the predicted block. This difference is referred to as residual data or residual coefficients.
[0034] For a block selected to be inter-coded, motion estimation module 36 and motion compensation module 38 generate the residual data for the block. In particular, motion estimation module 36 identifies at least one reference frame and searches for a block in the reference frame that is a best match to the block in the input frame. Motion estimation module 36 computes a motion vector to represent an offset between the location of the block in the input frame and the location of the identified block in the reference frame. Motion compensation module 38 computes a difference between the block of the input frame and the identified block in the reference frame to which the motion vector points. This difference is the residual data for the block.
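The inter-coding steps of paragraph [0034] can be illustrated with a small full-search block matcher. This is a sketch under stated assumptions: the disclosure does not specify the search strategy or the matching criterion, so the exhaustive search window, the use of SAD as the match measure, and the names `motion_search`, `get_block`, and `sad` are all hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left pixel is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur_frame, ref_frame, top, left, size=4, search_range=2):
    """Exhaustive search for the best match; returns (motion_vector, residual)."""
    cur = get_block(cur_frame, top, left, size)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the reference frame
            candidate = get_block(ref_frame, y, x, size)
            cost = sad(cur, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy), candidate)
    _, motion_vector, match = best
    # Residual data: input block minus the block the motion vector points to.
    residual = [[c - m for c, m in zip(row_c, row_m)]
                for row_c, row_m in zip(cur, match)]
    return motion_vector, residual


# Hypothetical frames: the current frame is the reference shifted by one pixel.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y][x + 1] if x < 7 else 0 for x in range(8)] for y in range(8)]
mv, residual = motion_search(cur, ref, top=2, left=2)
```

In this constructed example the best match lies one pixel to the right in the reference frame, so the motion vector is (1, 0) and the residual is all zeros.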
[0035] Encoding module 30 also includes a transform module 40, a quantization module 46 and an entropy encoder 48. Transform module 40 transforms the residual data of the block in accordance with a transform function. In some aspects, transform module 40 applies an integer transform, such as a 4x4 or 8x8 integer transform or a Discrete Cosine Transform (DCT), to the residual data to generate transform coefficients for the residual data. Quantization module 46 quantizes the transform coefficients and provides the quantized transform coefficients to entropy encoder 48. Entropy encoder 48 encodes the quantized transform coefficients using a context-adaptive coding technique, such as context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC). As will be described in detail below, entropy encoder 48 applies a selected mode to code the data of the block.
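As an illustration of the transform step, the sketch below applies the 4x4 forward integer core transform commonly described for H.264, computed as Cf · X · Cf^T. The disclosure only states that a 4x4 or 8x8 integer transform or a DCT may be used, so the choice of this particular matrix is an assumption, and the function names are hypothetical. The scaling that H.264 folds into the quantization stage is not modeled here.

```python
# Forward core transform matrix commonly given for the H.264 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform_4x4(residual):
    """Return the unscaled transform coefficients Cf * X * Cf^T."""
    return matmul(matmul(CF, residual), transpose(CF))


# A flat residual block has energy only in the DC coefficient.
flat = [[3] * 4 for _ in range(4)]
coeffs = forward_transform_4x4(flat)
```

For the flat block, only `coeffs[0][0]` is non-zero (16 times the sample value), which is the behavior expected of a DCT-like transform.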
[0036] Entropy encoder 48 may also encode additional data associated with the block. For example, in addition to the residual data, entropy encoder 48 may encode one or more motion vectors of the block, an identifier indicating the coding mode of the block, one or more reference frame indices, quantization parameter (QP) information, slice information of the block and the like. Entropy encoder 48 may receive this additional block data from other modules within encoding module 30. For example, the motion vector information may be received from motion estimation module 36 while the block mode information may be received from control module 32. In some aspects, entropy encoder 48 may code at least a portion of this additional information using a fixed length coding (FLC) technique or a universal variable length coding (VLC) technique, such as Exponential-Golomb coding ("Exp-Golomb"). Alternatively, entropy encoder 48 may encode a portion of the additional block data using the context-adaptive coding techniques described above, i.e., CABAC or CAVLC.
[0037] To assist control module 32 in selecting a mode for the block, control module 32 estimates a coding cost for at least a portion of the possible modes. In certain aspects, control module 32 may estimate the cost of coding the block in each of the possible coding modes. The cost may be estimated, for example, in terms of the number of bits associated with coding the block in a given mode versus the amount of distortion produced in that mode. In the case of the H.264 standard, for example, control module 32 may estimate the coding cost for twenty-two different coding modes (the inter- and intra-coding modes) for a block selected for inter-coding and thirteen different coding modes for a block selected for intra-coding. In other aspects, control module 32 may use another mode selection technique to initially reduce the set of possible modes, and then utilize the techniques of this disclosure to estimate the coding cost for the remaining modes of the set. In other words, in some aspects, control module 32 may narrow down the number of mode possibilities before applying the cost estimate technique. Advantageously, encoding module 30 estimates the coding costs for the modes without actually coding the data of the blocks for the different modes, thereby reducing computational overhead associated with the coding decision. In fact, in the example illustrated in FIG. 2, encoding module 30 may estimate the coding cost without quantizing the data of the block for the different modes. In this manner, the coding cost estimation techniques of this disclosure reduce the amount of computationally intensive calculations needed to compute the coding cost. In particular, it is not necessary to encode the blocks using the various coding modes in order to select one of the modes.
[0038] As will be described in more detail herein, control module 32 estimates the coding cost of each analyzed mode in accordance with the equation:
$$J = D + \lambda_{mode} \cdot R, \tag{1}$$
where J is the estimated coding cost, D is a distortion metric of the block, $\lambda_{mode}$ is a Lagrange multiplier of the respective mode, and R is a rate metric of the block. The distortion metric (D) may, for example, comprise a sum of absolute difference (SAD), sum of square difference (SSD), a sum of absolute transform difference (SATD), sum of square transform difference (SSTD) or the like. The rate metric (R) may, for example, be a number of bits associated with coding the data in a given block. As described above, different types of block data may be coded using different coding techniques. Equation (1) may thus be rewritten in the form below:
$$J = D + \lambda_{mode} \cdot (R_{context} + R_{non\_context}), \tag{2}$$
where $R_{context}$ represents a rate metric for block data coded using context-adaptive coding techniques and $R_{non\_context}$ represents a rate metric for block data coded using non context-adaptive coding techniques. In the H.264 standard, for example, the residual data may be coded using context-adaptive coding, such as CAVLC or CABAC. Other block data, such as motion vectors, block modes, and the like may be coded using a FLC or a universal VLC technique, such as Exp-Golomb. In this case, equation (2) may be rewritten in the form:
$$J = D + \lambda_{mode} \cdot (R_{residual} + R_{other}), \tag{3}$$
where $R_{residual}$ represents a rate metric for coding the residual data using context-adaptive coding techniques, e.g., the number of bits associated with coding the residual data, and $R_{other}$ represents a rate metric for coding the other block data using a FLC or universal VLC technique, e.g., the number of bits associated with coding the other block data.
[0039] In computing the estimated coding cost (J), encoding module 30 may determine the number of bits associated with coding block data using FLC or universal VLC, i.e., $R_{other}$, relatively easily. Encoding module 30 may, for example, use a code table to identify the number of bits associated with coding the block data using FLC or universal VLC. The code table may, for example, include a plurality of codewords and the number of bits associated with coding each codeword. Determining the number of bits associated with coding the residual data ($R_{residual}$), however, presents a much more difficult task due to the adaptive nature of context-adaptive coding as a function of the context of the data. To determine the precise number of bits associated with coding the residual data, or whatever data is being context-adaptive coded, encoding module 30 must transform the residual data, quantize the transformed residual data and encode the transform-quantized residual data. However, in accordance with the techniques of this disclosure, bit estimate module 42 may estimate the number of bits associated with coding the residual data using the context-adaptive coding techniques without actually coding the residual data.
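To show why the bit count for universal VLC data is easy to obtain, the sketch below computes Exp-Golomb codeword lengths directly and also builds a small lookup table of the kind the paragraph mentions. It relies on the standard order-0 Exp-Golomb construction, in which an unsigned value v uses 2·floor(log2(v + 1)) + 1 bits. The function names, the table size, and the example motion vector difference are hypothetical.

```python
def ue_bits(value):
    """Bit length of the unsigned order-0 Exp-Golomb codeword for value >= 0."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb needs a non-negative value")
    # (value + 1).bit_length() equals floor(log2(value + 1)) + 1.
    return 2 * (value + 1).bit_length() - 1


def se_bits(value):
    """Bit length of the signed Exp-Golomb codeword.

    Uses the usual mapping: k > 0 maps to 2k - 1, k <= 0 maps to -2k.
    """
    mapped = 2 * value - 1 if value > 0 else -2 * value
    return ue_bits(mapped)


# A small precomputed code table: value -> number of bits.
UE_TABLE = {v: ue_bits(v) for v in range(16)}

# Hypothetical motion vector difference (3, -1) coded with signed Exp-Golomb.
mvd_bits = se_bits(3) + se_bits(-1)
```

Because the length depends only on the value being coded, the contribution of such data to the rate can be read from a table without running an entropy coder.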
[0040] In the example illustrated in FIG. 2, bit estimate module 42 estimates the number of bits associated with coding the residual data using transform coefficients for the residual data. Thus, for each mode to be analyzed, encoding module 30 only needs to compute the transform coefficients for the residual data to estimate the number of bits associated with coding the residual data. Encoding module 30 therefore reduces the amount of computing resources and time required to determine the number of bits associated with coding the residual data by not quantizing the transform coefficients or encoding the quantized transform coefficients for each of the modes.
[0041] Bit estimate module 42 analyzes the transform coefficients output by transform module 40 to identify one or more transform coefficients that will remain non-zero after quantization. In particular, bit estimate module 42 compares each of the transform coefficients to a corresponding threshold. In some aspects, the corresponding thresholds may be computed as a function of a QP of encoding module 30. Bit estimate module 42 identifies, as the transform coefficients that will remain non-zero after quantization, the transform coefficients that are greater than or equal to their corresponding thresholds.
[0042] Bit estimate module 42 estimates the number of bits associated with coding the residual data based on at least the transform coefficients identified to remain non-zero after quantization. In particular, bit estimate module 42 determines the number of non-zero transform coefficients that will survive quantization. Bit estimate module 42 also sums at least a portion of the absolute values of the transform coefficients identified to survive quantization. Bit estimate module 42 then estimates the rate metric for the residual data, i.e., the number of bits associated with coding the residual data, using the equation:
$$R_{residual} = a_1 \cdot SATD + a_2 \cdot NZ_{est} + a_3, \tag{4}$$
where SATD is the sum of at least a portion of the absolute values of the non-zero transform coefficients predicted to survive quantization, $NZ_{est}$ is the estimated number of non-zero transform coefficients predicted to survive quantization, and $a_1$, $a_2$, and $a_3$ are coefficients. Coefficients $a_1$, $a_2$, and $a_3$ may be computed, for example, using least squares estimation. Although the sum of the transform coefficients is the sum of absolute transform differences (SATD) in the example of equation (4), other difference metrics may be used, such as SSTDs.
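The least squares estimation of the coefficients in equation (4) could look like the sketch below. It is an assumption-laden illustration: the disclosure does not say how the training data is gathered, so the samples here are hypothetical (SATD, NZest, measured bits) triples, and the normal-equation solver and the name `fit_rate_model` are not taken from the source.

```python
from fractions import Fraction


def fit_rate_model(samples):
    """Least squares fit of bits ~ a1 * SATD + a2 * NZ_est + a3.

    samples: iterable of (satd, nz_est, measured_bits) triples.
    Solves the normal equations (X^T X) a = X^T y by Gauss-Jordan elimination.
    Raises StopIteration if the samples do not determine a unique fit.
    """
    rows = [(Fraction(satd), Fraction(nz), Fraction(1)) for satd, nz, _ in samples]
    bits = [Fraction(b) for _, _, b in samples]
    # Augmented 3x4 matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * b for r, b in zip(rows, bits))]
        for i in range(3)
    ]
    for col in range(3):
        pivot_row = next(r for r in range(col, 3) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(3):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return tuple(float(aug[i][3]) for i in range(3))


# Hypothetical training triples, constructed to follow 0.25*SATD + 4*NZ + 6.
samples = [(100, 4, 47), (200, 6, 80), (40, 1, 20), (320, 9, 122)]
a1, a2, a3 = fit_rate_model(samples)
```

Exact rational arithmetic is used here only to keep the small example free of rounding effects; a numerical library routine would serve the same purpose on real training data.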
[0043] An exemplary computation of $R_{residual}$ for a 4x4 block is illustrated below. Similar computations may be performed for blocks of different sizes. Encoding module 30 computes a matrix of transform coefficients for the residual data. An exemplary matrix of transform coefficients is illustrated below.
[Matrix A: exemplary 4x4 matrix of transform coefficients (figure not reproduced)]
The number of rows of the matrix of transform coefficients (A) is equal to the number of rows of pixels in the block and the number of columns of the matrix of transform coefficients is equal to the number of columns of pixels in the block. Thus, in the example above, the dimensions of the matrix of transform coefficients are 4x4 to correspond with the 4x4 block. Each of the entries A(i,j) of the matrix of transform coefficients is the transform of the respective residual coefficients.
[0044] During quantization, the transform coefficients of matrix A that have smaller values tend to become zero after quantization. As such, encoding module 30 compares the matrix of residual transform coefficients A to a matrix of threshold values to predict which of the transform coefficients of matrix A will remain non-zero after quantization. An exemplary matrix of threshold values is illustrated below.
[Matrix C: exemplary 4x4 matrix of threshold values (figure not reproduced)]
The matrix C may be computed as a function of a QP value. The dimensions of matrix C are the same as the dimensions of matrix A. In the case of the H.264 standard, for example, the entries of matrix C may be computed based on the equation:
[Equation (5): C(i,j) expressed in terms of QBITS{QP}, Level_Offset(i,j){QP}, and Level_Scale(i,j){QP} (figure not reproduced)]
where QBITS{QP} is a parameter that determines scaling as a function of QP, Level_Offset(i,j){QP} is a deadzone parameter for the entry at row i and column j of the matrix and is also a function of QP, Level_Scale(i,j){QP} is a multiplicative factor for the entry at row i and column j of the matrix and is also a function of QP, i corresponds to a row of the matrix, j corresponds to a column of the matrix, and QP corresponds to a quantization parameter of encoding module 30. In the example equation (5), the variables may be defined in the H.264 coding standard as a function of the operating QP. Other equations may be used to determine which of the transform coefficients will survive quantization, and may be defined in other coding standards based on the quantization method adopted by the particular standard. In some aspects, encoding module 30 may be configured to operate within a range of QP values. In this case, encoding module 30 may pre-compute a plurality of comparison matrices, one corresponding to each of the QP values in the range of QP values. Encoding module 30 selects the comparison matrix that corresponds with the QP of encoding module 30 to compare with the transform coefficient matrix.
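One plausible way to obtain such thresholds is sketched below. It assumes a quantizer of the form level = (|A(i,j)| · Level_Scale + Level_Offset) >> QBITS, for which a coefficient stays non-zero exactly when |A(i,j)| is at least (2^QBITS − Level_Offset) / Level_Scale. Both the quantizer form and the resulting threshold expression are assumptions made for illustration and are not quoted from the disclosure. The numeric scale and offset values are likewise illustrative placeholders.

```python
def threshold_matrix(qbits, level_scale, level_offset):
    """Threshold C(i,j) above which a coefficient is assumed to survive.

    Derived from the assumed quantizer (|a| * scale + offset) >> qbits:
    the result is non-zero when |a| * scale + offset >= 2 ** qbits.
    """
    return [
        [(2 ** qbits - off) / scale for scale, off in zip(scale_row, off_row)]
        for scale_row, off_row in zip(level_scale, level_offset)
    ]


def quantized_level(coefficient, qbits, scale, offset):
    """Magnitude of the quantized level under the assumed quantizer."""
    return (abs(coefficient) * scale + offset) >> qbits


# Illustrative parameters for a single QP (placeholders, not normative values).
QBITS = 15
SCALE = [[13107, 8066, 13107, 8066],
         [8066, 5243, 8066, 5243],
         [13107, 8066, 13107, 8066],
         [8066, 5243, 8066, 5243]]
OFFSET = [[(1 << QBITS) // 3] * 4 for _ in range(4)]

C = threshold_matrix(QBITS, SCALE, OFFSET)
```

Since the thresholds depend only on QP, a table of such matrices can be built once for every QP in the operating range and then indexed at run time, which matches the pre-computation described in the paragraph above.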
[0045] The result of the comparison between the matrix of transform coefficients A and the matrix of thresholds C is a matrix of ones and zeros. In the example above, the comparison results in the matrix of ones and zeros illustrated below:
$$M = \left( abs(A(i,j)) \geq C(i,j) \right) \quad \forall\, i, j, QP$$
[Matrix M: resulting 4x4 matrix of ones and zeros (figure not reproduced)]
where the ones represent locations of transform coefficients identified as likely to survive quantization, i.e., likely to remain non-zero, and the zeros represent locations of transform coefficients not likely to survive quantization, i.e., likely to become zero. As described above, a transform coefficient is identified as likely to remain non-zero when the absolute value of the transform coefficient of matrix A is greater than or equal to the corresponding threshold of matrix C.
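The comparison can be sketched in a few lines. The matrices below are hypothetical stand-ins, since the figure values are not available in this text, and the name `survival_mask` is illustrative. The comparison uses "greater than or equal to", following the wording of the description.

```python
def survival_mask(coeffs, thresholds):
    """Return M, with 1 where abs(A(i,j)) >= C(i,j) and 0 elsewhere."""
    return [[1 if abs(a) >= c else 0 for a, c in zip(row_a, row_c)]
            for row_a, row_c in zip(coeffs, thresholds)]


# Hypothetical transform coefficients and thresholds (not the patent's figures).
A = [[326, -145, 12, 3],
     [-98, 40, -9, 2],
     [21, -11, 5, 0],
     [6, -3, 1, 0]]
C = [[20, 32, 20, 32],
     [32, 50, 32, 50],
     [20, 32, 20, 32],
     [32, 50, 32, 50]]

M = survival_mask(A, C)
```

No multiplication, shift, or entropy coding is needed for this step, only one comparison per coefficient.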
[0046] Using the resulting matrix of ones and zeros, bit estimate module 42 determines the number of transform coefficients that will survive quantization. In other words, bit estimate module 42 determines the number of transform coefficients identified as remaining non-zero after quantization. Bit estimate module 42 may determine the number of transform coefficients identified as remaining non-zero after quantization according to the equation:
$$NZ_{est} = \sum_{i=0}^{3} \sum_{j=0}^{3} M(i,j), \tag{6}$$
where $NZ_{est}$ is the estimated number of non-zero transform coefficients and M(i,j) is the value of the matrix M at row i and column j. In the example described above, $NZ_{est}$ is equal to 8.
[0047] Bit estimate module 42 also computes a sum of at least a portion of the absolute value of the transform coefficients estimated to survive quantization. In certain aspects, bit estimate module 42 may compute the sum of the at least a portion of absolute values of the transform coefficients according to the equation:
$$SATD = \sum_{i=0}^{3} \sum_{j=0}^{3} \left( M(i,j) \cdot abs(A(i,j)) \right), \tag{7}$$
where SATD is the sum total of the transform coefficients identified as remaining non-zero after quantization, M(i,j) is the value of the matrix M at row i and column j, A(i,j) is the value of the matrix A at row i and column j, and abs(x) is an absolute value function that computes the absolute value of x. In the example described above, SATD is equal to 2361. Other difference metrics may be used for the transform coefficients, such as SSTDs.
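Equations (6) and (7), together with the linear rate model, can be sketched as follows. The mask and coefficient matrices are hypothetical (they are not the matrices behind the values 8 and 2361 quoted above), the model coefficients are placeholders, and the function names are illustrative.

```python
def count_and_sum(mask, coeffs):
    """Return (NZ_est, SATD) per equations (6) and (7)."""
    nz_est = sum(m for row in mask for m in row)
    satd = sum(m * abs(a)
               for row_m, row_a in zip(mask, coeffs)
               for m, a in zip(row_m, row_a))
    return nz_est, satd


def residual_bits(satd, nz_est, a1, a2, a3):
    """Linear rate model: a1 * SATD + a2 * NZ_est + a3."""
    return a1 * satd + a2 * nz_est + a3


# Hypothetical mask M and transform coefficients A.
M = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
A = [[326, -145, 12, 3],
     [-98, 40, -9, 2],
     [21, -11, 5, 0],
     [6, -3, 1, 0]]

nz_est, satd = count_and_sum(M, A)

# Placeholder model coefficients; real values would come from a fit.
bits = residual_bits(satd, nz_est, a1=0.25, a2=4.0, a3=6.0)
```

Here four coefficients survive, their absolute values sum to 590, and the placeholder model yields an estimate of 169.5 bits.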
[0048] Using these values, bit estimate module 42 approximates the number of bits associated with coding the residual coefficients using equation (4) above. Control module 32 may use the estimate of $R_{residual}$ to compute an estimate of the total coding cost of the mode. Encoding module 30 may estimate the total coding cost for one or more other possible modes in the same manner, and then select the mode with the smallest coding cost. Encoding module 30 then applies the selected coding mode to code the block or blocks of the frame.
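The final selection step can be sketched as a minimum over estimated costs, with each cost formed as distortion plus the Lagrange multiplier times the sum of the residual and other bit counts. The mode names, distortion values, bit counts, and the single shared multiplier below are hypothetical; the description allows the multiplier to depend on the mode, which this simplified sketch does not model.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the mode with the smallest estimated cost.

    candidates: dict mapping mode name to
                (distortion, estimated_residual_bits, other_bits).
    Returns (best_mode, costs) where costs maps each mode to its cost J.
    """
    costs = {
        mode: distortion + lagrange_multiplier * (residual_bits + other_bits)
        for mode, (distortion, residual_bits, other_bits) in candidates.items()
    }
    best_mode = min(costs, key=costs.get)
    return best_mode, costs


# Hypothetical per-mode estimates for one block.
candidates = {
    "INTER_16x16": (900, 120, 10),
    "INTER_8x8": (700, 150, 34),
    "INTRA_4x4": (650, 210, 20),
}
best_mode, costs = select_mode(candidates, lagrange_multiplier=4.0)
```

In this example the mode with the lowest distortion is not selected, because its higher bit estimate outweighs the distortion gain once the rate term is weighted in.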
[0049] The foregoing techniques may be implemented individually, or two or more of such techniques, or all of such techniques, may be implemented together in encoding device 12. The components in encoding module 30 are exemplary of those applicable to implement the techniques described herein. Encoding module 30, however, may include many other components, if desired, as well as fewer components that combine the functionality of one or more of the modules described above. The components in encoding module 30 may be implemented as one or more processors, digital signal processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. Depiction of different features as modules is intended to highlight different functional aspects of encoding module 30 and does not necessarily imply that such modules must be realized by separate hardware or software components. Rather, functionality associated with one or more modules may be integrated within common or separate hardware or software components.
[0050] FIG. 3 is a block diagram illustrating another exemplary encoding module 50. Encoding module 50 of FIG. 3 conforms substantially to encoding module 30 of FIG. 2, except bit estimate module 52 of encoding module 50 estimates the number of bits associated with coding the residual data after quantization of the transform coefficients for the residual data. In particular, after quantization of the transform coefficients, bit estimate module 52 estimates the number of bits associated with coding the residual coefficients using the equation:
$$R_{residual} = a_1 \cdot SATQD + a_2 \cdot NZ_{TQ} + a_3, \tag{8}$$
where SATQD is the sum of the absolute values of the non-zero quantized transform coefficients, $NZ_{TQ}$ is the number of non-zero quantized transform coefficients, and $a_1$, $a_2$, and $a_3$ are coefficients. Coefficients $a_1$, $a_2$, and $a_3$ may be computed, for example, using least squares estimation. Although encoding module 50 quantizes the transform coefficients prior to estimating the number of bits associated with coding the residual data, encoding module 50 still estimates the coding costs for the modes without actually coding the data of the blocks. Thus, the amount of computationally intensive calculations is still reduced.
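The post-quantization variant of equation (8) needs no threshold comparison, because the surviving coefficients are known directly. A minimal sketch, using a hypothetical quantized block and placeholder model coefficients:

```python
def estimate_bits_after_quantization(quantized, a1, a2, a3):
    """Equation (8): a1 * SATQD + a2 * NZ_TQ + a3."""
    nonzero = [abs(q) for row in quantized for q in row if q != 0]
    satqd = sum(nonzero)   # sum of absolute non-zero quantized coefficients
    nz_tq = len(nonzero)   # number of non-zero quantized coefficients
    return a1 * satqd + a2 * nz_tq + a3


# Hypothetical quantized transform coefficients for a 4x4 block.
quantized = [[12, -4, 0, 0],
             [-3, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]

bits = estimate_bits_after_quantization(quantized, a1=2.0, a2=4.0, a3=6.0)
```

The coefficients a1, a2, and a3 here would generally differ from those of equation (4), since the inputs are quantized levels rather than unquantized transform coefficients.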
[0051] FIG. 4 is a flow diagram illustrating exemplary operation of an encoding module, such as encoding module 30 of FIG. 2 and/or encoding module 50 of FIG. 3, selecting an encoding mode based on at least the estimated coding costs. For exemplary purposes, however, FIG. 4 will be discussed in terms of encoding module 30. Encoding module 30 selects a mode for which to estimate a coding cost (60). Encoding module 30 generates a distortion metric for the current block (62). Encoding module 30 may, for example, compute the distortion metric based on a comparison between the block and at least one reference block. In the case of a block selected to be intra-coded, the reference block may be an adjacent block within the same frame. For a block selected to be inter-coded, on the other hand, the reference block may be a block from an adjacent frame. The distortion metric may be, for example, a SAD, SSD, SATD, SSTD, or other similar distortion metric.
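Two of the distortion metrics named above, SAD (sum of absolute differences) and SSD (sum of squared differences), can be sketched as follows. Representing blocks as lists of rows of pixel values is an assumption made for illustration.

```python
def sad(block, reference):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(p - r)
               for row_b, row_r in zip(block, reference)
               for p, r in zip(row_b, row_r))


def ssd(block, reference):
    """Sum of squared differences between two equally sized pixel blocks."""
    return sum((p - r) ** 2
               for row_b, row_r in zip(block, reference)
               for p, r in zip(row_b, row_r))
```

SATD and SSTD follow the same pattern, except that the differences are transformed before being summed.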
[0052] In the example of FIG. 4, encoding module 30 determines the number of bits associated with coding the portion of the data that is coded using non context-adaptive coding techniques (64). As described above, this data may include one or more motion vectors of the block, an identifier that indicates a coding mode of the block, one or more reference frame indices, QP information, slice information of the block and the like. Encoding module 30 may, for example, use a code table to identify the number of bits associated with coding the data using FLC, universal VLC or other non context-adaptive coding technique.
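Because non context-adaptive codes have fixed code lengths, the bit count can be read off directly rather than estimated. The sketch below assumes the universal VLC is the Exp-Golomb code family that H.264 uses for many syntax elements, and computes code lengths for unsigned and signed values. Which syntax elements are counted this way, and the helper `motion_vector_bits`, are assumptions for illustration.

```python
def ue_bits(code_num):
    """Length in bits of the unsigned Exp-Golomb code for code_num >= 0."""
    # Length is 2*floor(log2(code_num + 1)) + 1.
    return 2 * (code_num + 1).bit_length() - 1


def se_bits(value):
    """Length in bits of the signed Exp-Golomb code for an integer value."""
    # Signed values map to code numbers 0, 1, 2, 3, 4, ... for 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_bits(code_num)


def motion_vector_bits(mvd_x, mvd_y):
    """Bits for one motion vector difference (hypothetical helper)."""
    return se_bits(mvd_x) + se_bits(mvd_y)
```

A real encoder would more likely store these lengths in a lookup table indexed by the value, which is the code-table approach the paragraph above describes.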
[0053] Encoding module 30 estimates and/or computes the number of bits associated with coding the portion of the data that is coded using context-adaptive coding techniques (66). In the context of the H.264 standard, for example, encoding module 30 may estimate the number of bits associated with coding the residual data using context-adaptive coding. Encoding module 30 may estimate the number of bits associated with coding the residual data without actually coding the residual data. In certain aspects, encoding module 30 may estimate the number of bits associated with coding the residual data without quantizing the residual data. For example, encoding module 30 may compute transform coefficients for the residual data and identify the transform coefficients likely to remain non-zero after quantization. Using these identified transform coefficients, encoding module 30 estimates the number of bits associated with coding the residual data. In other aspects, encoding module 30 may quantize the transform coefficients and estimate the number of bits associated with coding the residual data based on at least the quantized transform coefficients. In either case, encoding module 30 saves time and processing resources by estimating the required number of bits. If there are sufficient computing resources, encoding module 30 may compute the actual number of bits required instead of estimating.
[0054] Encoding module 30 estimates and/or computes the total coding cost for coding the block in the selected mode (68). Encoding module 30 may estimate the total coding cost for coding the block based on the distortion metric, the bits associated with coding the portion of the data that is coded using non context-adaptive coding and the bits associated with coding the portion of the data that is coded using context-adaptive coding. For example, encoding module 30 may estimate the total coding cost for coding the block in the selected mode using equation (2) or (3) above.
[0055] Encoding module 30 determines whether there are any other coding modes for which to estimate the coding cost (70). As described above, encoding module 30 estimates the coding cost for at least a portion of the possible modes. In certain aspects, encoding module 30 may estimate the cost of coding the block in each of the possible coding modes. In the context of the H.264 standard, for example, encoding module 30 may estimate the coding cost for twenty-two different coding modes (the inter- and intra-coding modes) for a block selected for inter-coding and thirteen different coding modes for a block selected for intra-coding. In other aspects, encoding module 30 may use another mode selection technique to initially reduce the set of possible modes, and then utilize the techniques of this disclosure to estimate the coding cost for the reduced set of coding modes.
[0056] When there are more coding modes for which to estimate the coding cost, encoding module 30 selects the next coding mode and estimates the cost of coding the data in the selected coding mode. When there are no more coding modes for which to estimate the coding cost, encoding module 30 selects one of the modes to use for coding the block based on at least the estimated coding costs (72). In one example, encoding module 30 may select the coding mode that has the smallest estimated coding cost. Upon selection of the mode, encoding module 30 may apply the selected mode to code the particular block (74). The process may continue for additional blocks in a given frame. As an example, the process may continue until all the blocks in the frame have been coded using the coding mode selected in accordance with the techniques described herein. Moreover, the process may continue until blocks of a plurality of frames have been coded using a high efficiency mode.
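The loop of FIG. 4 (steps 60 through 72) can be sketched as follows. Equations (2) and (3) are not reproduced in this part of the description, so the sketch assumes a Lagrangian cost of the form J = D + λ(R_non-context + R_residual). The mode names, the input layout, and the parameter `lam` are hypothetical.

```python
def total_cost(distortion, non_context_bits, residual_bits, lam):
    """Assumed Lagrangian cost: distortion plus lambda times total bits."""
    return distortion + lam * (non_context_bits + residual_bits)


def select_mode(candidates, lam):
    """Return the mode with the smallest estimated cost, plus all costs.

    candidates maps a mode name to a tuple
    (distortion, non_context_bits, estimated_residual_bits).
    """
    costs = {mode: total_cost(d, r_nc, r_res, lam)
             for mode, (d, r_nc, r_res) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    modes = {
        "inter16x16": (120, 10, 40),
        "intra4x4": (90, 6, 70),
        "intra16x16": (150, 4, 20),
    }
    print(select_mode(modes, lam=1.0)[0])
```

Only the selected mode is then actually quantized and entropy coded, which is where the savings over coding every candidate mode would come from.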
[0057] FIG. 5 is a flow diagram illustrating exemplary operation of an encoding module, such as encoding module 30 of FIG. 2, estimating the number of bits associated with coding the residual coefficients of a block. After selecting one of the coding modes for which to estimate the coding cost, encoding module 30 generates the residual data of the block for the selected mode (80). For a block selected to be intra-coded, for example, spatial prediction module 34 generates the residual data for the block based on a comparison of the block with a predicted version of the block. Alternatively, for a block selected to be inter-coded, motion estimation module 36 and motion compensation module 38 compute the residual data for the block based on a comparison between the block and a corresponding block in a reference frame. In some aspects, the residual data may have already been computed to generate the distortion metric of the block. In this case, encoding module 30 may retrieve the residual data from a memory.
[0058] Transform module 40 transforms the residual coefficients of the block in accordance with a transform function to generate transform coefficients for the residual data (82). Transform module 40 may, for example, apply a 4x4 or 8x8 integer transform or a DCT transform to the residual data to generate the transform coefficients for the residual data. Bit estimate module 42 compares one of the transform coefficients to a corresponding threshold to determine whether the transform coefficient is greater than or equal to the threshold (84). The threshold corresponding with the transform coefficient may be computed as a function of the QP of encoding module 30. If the transform coefficient is greater than or equal to the corresponding threshold, bit estimate module 42 identifies the transform coefficient as a coefficient that will remain non-zero after quantization (86). If the transform coefficient is less than the corresponding threshold, bit estimate module 42 identifies the transform coefficient as a coefficient that will become zero after quantization (88).
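The description states only that each threshold may be computed as a function of the QP. One possible way to pre-compute a threshold matrix for every QP value is sketched below. It assumes the H.264 quantizer step size, which doubles for every increase of 6 in QP, and a dead-zone quantizer of the form level = floor(|c| / Qstep + f), so that a coefficient survives when |c| >= Qstep * (1 - f). The rounding offset `f`, the optional per-position `weights`, and all names are assumptions, and the position-dependent scaling of the integer transform is ignored unless supplied through `weights`.

```python
# H.264 quantizer step sizes for QP 0..5; the step doubles every 6 QP values.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a given QP (0..51)."""
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def threshold_matrix(qp, rounding_offset=1.0 / 3.0, size=4, weights=None):
    """Smallest coefficient magnitude per position that quantizes to non-zero.

    Assumes level = floor(|c| / qstep + rounding_offset); weights is an
    optional size x size matrix of per-position scale factors (hypothetical).
    """
    base = qstep(qp) * (1.0 - rounding_offset)
    if weights is None:
        weights = [[1.0] * size for _ in range(size)]
    return [[base * w for w in row] for row in weights]


# One threshold matrix per QP, computed once and looked up during encoding.
THRESHOLDS_BY_QP = {qp: threshold_matrix(qp) for qp in range(52)}
```

Looking up `THRESHOLDS_BY_QP[qp]` at encode time replaces a per-coefficient division with a single comparison.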
[0059] Bit estimate module 42 determines whether there are any additional transform coefficients for the residual data of the block (90). If there are additional transform coefficients of the block, bit estimate module 42 selects another one of the coefficients and compares it to a corresponding threshold. If there are no additional transform coefficients to analyze, bit estimate module 42 determines the number of coefficients identified to remain non-zero after quantization (92). Bit estimate module 42 also sums at least a portion of the absolute values of the transform coefficients identified to remain non-zero after quantization (94). Bit estimate module 42 estimates the number of bits associated with coding the residual data using the determined number of non-zero coefficients and the sum of the portion of the non-zero coefficients (96). Bit estimate module 42 may, for example, estimate the number of bits associated with coding the residual data using equation (4) above. In this manner, the encoding module 30 estimates the number of bits associated with coding the residual data of the block in the selected mode without quantizing or encoding the residual data.
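Steps 84 through 96 of FIG. 5 can be sketched as a single pass over the transform coefficients. Equation (4) is not reproduced in this part of the description, so the sketch assumes it has the same linear form as equation (8), applied to unquantized coefficients. It also assumes the comparison uses the coefficient magnitude and that all surviving coefficients are summed, although the text allows summing only a portion. All names are hypothetical.

```python
def estimate_residual_bits(coeffs, thresholds, a1, a2, a3):
    """Estimate residual bits without quantizing or entropy coding.

    coeffs and thresholds are equally sized matrices (lists of rows).
    A coefficient is treated as remaining non-zero after quantization
    when its magnitude is greater than or equal to its threshold.
    """
    nonzero_count = 0  # step 92: number of coefficients expected to survive
    abs_sum = 0        # step 94: sum of their absolute values
    for coeff_row, thr_row in zip(coeffs, thresholds):
        for c, t in zip(coeff_row, thr_row):
            if abs(c) >= t:
                nonzero_count += 1
                abs_sum += abs(c)
    # Step 96: assumed linear model analogous to equation (8).
    return a1 * abs_sum + a2 * nonzero_count + a3
```

The comparison against the threshold matrix is equivalent to building a matrix of ones and zeros and summing it, as in the matrix formulation recited in the claims.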
[0060] FIG. 6 is a flow diagram illustrating exemplary operation of an encoding module, such as encoding module 50 of FIG. 3, estimating the number of bits associated with coding the residual coefficients of a block. After selecting one of the coding modes for which to estimate the coding cost, encoding module 50 generates the residual coefficients of the block (100). For a block selected to be intra-coded, for example, spatial prediction module 34 computes the residual data for the block based on a comparison of the block with a predicted version of the block. Alternatively, for a block selected to be inter-coded, motion estimation module 36 and motion compensation module 38 compute the residual data for the block based on a comparison between the block and a corresponding block in a reference frame. In some aspects, the residual coefficients may have already been computed to generate the distortion metric of the block.
[0061] Transform module 40 transforms the residual coefficients of the block in accordance with a transform function to generate transform coefficients for the residual data (102). Transform module 40 may, for example, apply a 4x4 or 8x8 integer transform or a DCT transform to the residual data to generate transformed residual coefficients. Quantization module 46 quantizes the transform coefficients in accordance with a QP of encoding module 50 (104).
[0062] Bit estimate module 52 determines the number of quantized transform coefficients that are non-zero (106). Bit estimate module 52 also sums the absolute values of the non-zero levels or quantized transform coefficients (108). Bit estimate module 52 estimates the number of bits associated with coding the residual data using the computed number of non-zero quantized transform coefficients and the sum of the non-zero quantized transform coefficients (110). Bit estimate module 52 may, for example, estimate the number of bits associated with coding the residual coefficients using equation (8) above. In this manner, encoding module 50 estimates the number of bits associated with coding the residual data of the block in the selected mode without encoding the residual data.
[0063] Based on the teachings described herein, it should be apparent that an aspect disclosed herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in hardware, the techniques may be realized using digital hardware, analog hardware or a combination thereof. If implemented in software, the techniques may be realized at least in part by a computer-program product that includes a computer readable medium having instructions or code stored thereon. The instructions or code associated with a computer-readable medium of the computer program product may be executed by a computer, e.g., by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry.
[0064] By way of example, and not limitation, such computer-readable media can comprise RAM, such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
[0065] A number of aspects and examples have been described. However, various modifications to these examples are possible, and the principles presented herein may be applied to other aspects as well. These and other aspects are within the scope of the following claims.

Claims

What is claimed is:
1. A method for processing digital video data, the method comprising: identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized; estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients; and estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
2. The method of claim 1, wherein identifying the transform coefficients comprises comparing each of the transform coefficients to a corresponding one of a plurality of thresholds to identify the transform coefficients that will remain non-zero when quantized, wherein each of the plurality of thresholds is computed as a function of a quantization parameter (QP).
3. The method of claim 2, wherein comparing each of the transform coefficients to a corresponding one of a plurality of thresholds to identify the transform coefficients that will remain non-zero when quantized comprises identifying, as transform coefficients that will remain non-zero when quantized, the transform coefficients that are greater than or equal to their corresponding threshold.
4. The method of claim 2, further comprising: pre-computing a plurality of sets of thresholds, wherein each of the sets of thresholds corresponds to a different value of the QP; and selecting one of the plurality of sets of thresholds based on the value of the QP used for encoding the block of pixels.
5. The method of claim 1, wherein estimating the number of bits associated with coding the residual data comprises: determining a number of the transform coefficients identified as remaining non-zero when quantized; summing absolute values of at least one of the transform coefficients identified as remaining non-zero when quantized; and estimating the number of bits associated with coding of the residual data based on at least the determined number of non-zero transform coefficients and the sum of the absolute values of the at least one non-zero transform coefficient.
6. The method of claim 1, wherein estimating the number of bits associated with coding of the residual data comprises estimating a number of bits required to code the residual data in each of at least two block modes, and estimating the coding cost comprises estimating, in each of the at least two block modes, the coding cost based on at least the estimated number of bits in the respective one of the block modes, and further comprising selecting one of the block modes based on at least the estimated coding cost for each of the modes.
7. The method of claim 6, further comprising: estimating, for each of the modes, a total coding cost for coding the block of pixels using at least the estimated number of bits associated with coding of the residual data; selecting the one of the plurality of modes with a smallest estimated total coding cost; and applying the selected mode to code the block of pixels.
8. The method of claim 7, wherein estimating the total coding cost comprises: computing a distortion metric for the block of pixels; computing a number of bits associated with coding of non-residual data of the block of pixels; and estimating the total coding cost for coding the block of pixels based on at least the distortion metric, the number of bits associated with coding of the non-residual data and the number of bits associated with coding of the residual data.
9. The method of claim 1, further comprising: selecting a coding mode based on at least the estimated number of bits associated with coding of the residual data; quantizing the transform coefficients for the residual data after selecting the coding mode; encoding the quantized transform coefficients for the residual data; and transmitting the encoded coefficients for the residual data.
10. The method of claim 1, further comprising: generating a matrix of the transform coefficients, wherein a number of rows of the matrix of transform coefficients is equal to a number of rows of pixels in the block and a number of columns of the matrix of transform coefficients is equal to a number of columns of pixels in the block; comparing the matrix of transform coefficients to a matrix of thresholds, wherein the matrix of thresholds has a dimension the same as that of the matrix of transform coefficients, and further wherein the comparison results in a matrix of ones and zeros, where the zeros represent locations in the matrix of transform coefficients that will become zero after quantization and the ones represent locations in the matrix of transform coefficients that will remain non-zero after quantization; summing a number of ones in the matrix of ones and zeros to compute a number of the transform coefficients identified as remaining non-zero when quantized; summing absolute values of at least one of the transform coefficients in the matrix of transform coefficients that correspond to the location of the ones in the matrix of ones and zeros; and estimating the number of bits associated with coding of the residual data based on at least the number of non-zero transform coefficients and the sum of the at least one non-zero transform coefficient.
11. An apparatus for processing digital video data, the apparatus comprising: a transform module that generates transform coefficients for residual data of a block of pixels; a bit estimate module that identifies one or more of the transform coefficients that will remain non-zero when quantized and estimates a number of bits associated with coding of the residual data based on at least the identified transform coefficients; and a control module that estimates a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
12. The apparatus of claim 11, wherein the bit estimate module compares each of the transform coefficients to a corresponding one of a plurality of thresholds to identify the transform coefficients that will remain non-zero when quantized, wherein each of the plurality of thresholds is computed as a function of a quantization parameter (QP).
13. The apparatus of claim 12, wherein the bit estimate module identifies, as transform coefficients that will remain non-zero when quantized, the transform coefficients that are greater than or equal to their corresponding threshold.
14. The apparatus of claim 12, wherein the bit estimate module pre-computes a plurality of sets of thresholds, wherein each of the sets of thresholds corresponds to a different value of the QP, and selects one of the plurality of sets of thresholds based on the value of the QP used for encoding the block of pixels.
15. The apparatus of claim 11, wherein the bit estimate module determines a number of the transform coefficients identified as remaining non-zero when quantized, sums absolute values of at least one of the transform coefficients identified as remaining non-zero when quantized and estimates the number of bits associated with coding of the residual data based on at least the determined number of non-zero transform coefficients and the sum of the absolute values of the at least one non-zero transform coefficient.
16. The apparatus of claim 11, wherein: the bit estimate module estimates the number of bits associated with coding of the residual data in each of at least two block modes, and the control module estimates a coding cost for each of the at least two block modes based on at least the estimated number of bits in the respective one of the block modes and selects one of the block modes based on at least the estimated coding cost for each of the modes.
17. The apparatus of claim 16, wherein the control module estimates, for each of the modes, a total coding cost for coding the block of pixels using at least the estimated number of bits associated with coding of the residual data, selects the one of the plurality of modes with a smallest estimated total coding cost, and applies the selected mode to code the block of pixels.
18. The apparatus of claim 17, wherein the control module computes a distortion metric for the block of pixels, computes a number of bits associated with coding of non-residual data of the block of pixels and estimates the total coding cost for coding the block of pixels based on at least the distortion metric, the number of bits associated with coding of the non-residual data and the number of bits associated with coding of the residual data.
19. The apparatus of claim 11, further comprising: a control module that selects a coding mode based on at least the estimated number of bits associated with coding the residual data; a quantization module that quantizes the transform coefficients for the residual data after selection of the coding mode; an entropy encoding module that encodes the quantized transform coefficients for the residual data; and a transmitter that transmits the encoded coefficients for the residual data.
20. The apparatus of claim 11, wherein: the transform module generates a matrix of the transform coefficients, wherein a number of rows of the matrix of transform coefficients is equal to a number of rows of pixels in the block and a number of columns of the matrix of transform coefficients is equal to a number of columns of pixels in the block, and the bit estimate module compares the matrix of transform coefficients to a matrix of thresholds, wherein the matrix of thresholds has a dimension the same as that of the matrix of transform coefficients, and further wherein the comparison results in a matrix of ones and zeros, where the zeros represent locations in the matrix of transform coefficients that will become zero after quantization and the ones represent locations in the matrix of transform coefficients that will remain non-zero after quantization, further wherein the bit estimate module sums a number of ones in the matrix of ones and zeros to compute a number of the transform coefficients identified as remaining non-zero when quantized, sums absolute values of at least one of the transform coefficients in the matrix of transform coefficients that correspond to the location of the ones in the matrix of ones and zeros, and estimates the number of bits associated with coding of the residual data based on at least the number of non-zero transform coefficients and the sum of the at least one non-zero transform coefficient.
21. An apparatus for processing digital video data, the apparatus comprising: means for identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized; means for estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients; and means for estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
22. The apparatus of claim 21, wherein the identifying means compares each of the transform coefficients to a corresponding one of a plurality of thresholds to identify the transform coefficients that will remain non-zero when quantized, wherein each of the plurality of thresholds is computed as a function of a quantization parameter (QP).
23. The apparatus of claim 22, wherein the identifying means identifies, as transform coefficients that will remain non-zero when quantized, the transform coefficients that are greater than or equal to their corresponding threshold.
24. The apparatus of claim 22, further comprising: means for pre-computing a plurality of sets of thresholds, wherein each of the sets of thresholds corresponds to a different value of the QP; and means for selecting one of the plurality of sets of thresholds based on the value of the QP used for encoding the block of pixels.
25. The apparatus of claim 21, wherein the estimating means determines a number of the transform coefficients identified as remaining non-zero when quantized, sums absolute values of at least one of the transform coefficients identified as remaining non-zero when quantized, and estimates the number of bits associated with coding of the residual data based on at least the determined number of non-zero transform coefficients and the sum of the absolute values of the at least one non-zero transform coefficient.
26. The apparatus of claim 21, wherein the bits estimating means estimates a number of bits associated with coding of the residual data in each of at least two block modes and the coding cost estimating means estimates a coding cost for each of the at least two block modes based on at least the estimated number of bits in the respective one of the block modes, and further comprising means for selecting one of the block modes based on at least the estimated number of bits for each of the modes.
27. The apparatus of claim 26, further comprising means for estimating, for each of the modes, a total coding cost for coding the block of pixels using at least the estimated number of bits associated with coding of the residual data, wherein the selecting means selects the one of the plurality of modes with a smallest estimated total coding cost.
28. The apparatus of claim 27, wherein the coding cost estimating means computes a distortion metric for the block of pixels, computes a number of bits associated with coding of non-residual data of the block of pixels, and estimates the total coding cost for coding the block of pixels based on at least the distortion metric, the number of bits associated with coding of the non-residual data and the number of bits associated with coding of the residual data.
29. The apparatus of claim 21, further comprising: means for selecting a coding mode based on at least the estimated number of bits associated with coding of the residual data; means for quantizing the transform coefficients for the residual data after selecting the coding mode; means for encoding the quantized transform coefficients for the residual data; and means for transmitting the encoded coefficients for the residual data.
30. The apparatus of claim 21, further comprising means for generating a matrix of the transform coefficients, wherein a number of rows of the matrix of transform coefficients is equal to a number of rows of pixels in the block and a number of columns of the matrix of transform coefficients is equal to a number of columns of pixels in the block, and wherein: the identifying means compares the matrix of transform coefficients to a matrix of thresholds, wherein the matrix of thresholds has a dimension the same as that of the matrix of transform coefficients, and further wherein the comparison results in a matrix of ones and zeros, where the zeros represent locations in the matrix of transform coefficients that will become zero after quantization and the ones represent locations in the matrix of transform coefficients that will remain non-zero after quantization; and the estimating means sums a number of ones in the matrix of ones and zeros to compute a number of the transform coefficients identified as remaining non-zero when quantized, sums absolute values of at least one of the transform coefficients in the matrix of transform coefficients that correspond to the location of the ones in the matrix of ones and zeros, and estimates the number of bits associated with coding of the residual data based on at least the number of non-zero transform coefficients and the sum of the at least one non-zero transform coefficient.
31. A computer-program product for processing digital video data comprising a computer readable medium having instructions thereon, the instructions comprising: code for identifying one or more transform coefficients for residual data of a block of pixels that will remain non-zero when quantized; code for estimating a number of bits associated with coding of the residual data based on at least the identified transform coefficients; and code for estimating a coding cost for coding the block of pixels based on at least the estimated number of bits associated with coding the residual data.
32. The computer-program product of claim 31, wherein code for identifying the transform coefficients comprises code for comparing each of the transform coefficients to a corresponding one of a plurality of thresholds to identify the transform coefficients that will remain non-zero when quantized, wherein each of the plurality of thresholds is computed as a function of a quantization parameter (QP).
33. The computer-program product of claim 32, wherein code for comparing each of the transform coefficients to a corresponding one of a plurality of thresholds to identify the transform coefficients that will remain non-zero when quantized comprises code for identifying, as transform coefficients that will remain non-zero when quantized, the transform coefficients that are greater than or equal to their corresponding threshold.
34. The computer-program product of claim 32, further comprising: code for pre-computing a plurality of sets of thresholds, wherein each of the sets of thresholds corresponds to a different value of the QP; and code for selecting one of the plurality of sets of thresholds based on the value of the QP used for encoding the block of pixels.
35. The computer-program product of claim 31, wherein code for estimating the number of bits associated with coding of the residual data comprises: code for determining a number of the transform coefficients identified as remaining non-zero when quantized; code for summing absolute values of at least one of the transform coefficients identified as remaining non-zero when quantized; and code for estimating the number of bits associated with coding of the residual data based on at least the determined number of non-zero transform coefficients and the sum of the absolute values of the at least one non-zero transform coefficient.
36. The computer-program product of claim 31, wherein code for estimating the number of bits associated with coding of the residual data comprises code for estimating a number of bits associated with coding of the residual data in each of at least two block modes and code for estimating the coding cost comprises code for estimating the coding cost for each of the at least two block modes based on at least the estimated number of bits in the respective one of the block modes, and further comprising code for selecting one of the block modes based on at least the estimated number of bits for each of the modes.
37. The computer-program product of claim 36, further comprising: code for estimating, for each of the modes, a total coding cost for coding the block of pixels using at least the estimated number of bits associated with coding of the residual data; code for selecting the one of the plurality of modes with a smallest estimated total coding cost; and code for applying the selected mode to code the block of pixels.
38. The computer-program product of claim 37, wherein code for estimating the total coding cost comprises: code for computing a distortion metric for the block of pixels; code for computing a number of bits associated with coding of non-residual data of the block of pixels; and code for estimating the total coding cost for coding the block of pixels based on at least the distortion metric, the number of bits associated with coding of the non-residual data and the number of bits associated with coding of the residual data.
39. The computer-program product of claim 31, further comprising: code for selecting a coding mode based on at least the estimated number of bits associated with coding of the residual data; code for quantizing the transform coefficients for the residual data after selecting the coding mode; code for encoding the quantized transform coefficients for the residual data; and code for transmitting the encoded coefficients for the residual data.
40. The computer-program product of claim 31, further comprising: code for generating a matrix of the transform coefficients, wherein a number of rows of the matrix of transform coefficients is equal to a number of rows of pixels in the block and a number of columns of the matrix of transform coefficients is equal to a number of columns of pixels in the block; code for comparing the matrix of transform coefficients to a matrix of thresholds, wherein the matrix of thresholds has a dimension the same as that of the matrix of transform coefficients, and further wherein the comparison results in a matrix of ones and zeros, where the zeros represent locations in the matrix of transform coefficients that will become zero after quantization and the ones represent locations in the matrix of transform coefficients that will remain non-zero after quantization; and code for summing a number of ones in the matrix of ones and zeros to compute a number of the transform coefficients identified as remaining non-zero when quantized; code for summing absolute values of at least one of the transform coefficients in the matrix of transform coefficients that correspond to the location of the ones in the matrix of ones and zeros; and code for estimating the number of bits associated with coding of the residual data based on at least the number of non-zero transform coefficients and the sum of the at least one non-zero transform coefficients.
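The estimation steps recited in claims 35 and 40, and the mode selection of claims 37 and 38, can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function names, the `>=` threshold comparison, the linear bit model with weights `a1`, `a2`, `a3`, and the Lagrangian-style cost `distortion + lam * bits` are all assumptions made for illustration; the claims only require that the estimate be "based on at least" the listed quantities.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def nonzero_mask(coeffs: Matrix, thresholds: Matrix) -> List[List[int]]:
    """Compare coefficients to thresholds element by element.

    Returns a matrix of ones and zeros: 1 where the coefficient is expected
    to remain non-zero after quantization, 0 where it is expected to become
    zero. Using |c| >= t as the test is an assumption of this sketch.
    """
    return [
        [1 if abs(c) >= t else 0 for c, t in zip(c_row, t_row)]
        for c_row, t_row in zip(coeffs, thresholds)
    ]


def estimate_residual_bits(
    coeffs: Matrix,
    thresholds: Matrix,
    a1: float = 1.0,  # hypothetical weight on the non-zero count
    a2: float = 1.0,  # hypothetical weight on the sum of absolute values
    a3: float = 0.0,  # hypothetical constant offset
) -> float:
    """Estimate residual bits without quantizing or entropy coding."""
    mask = nonzero_mask(coeffs, thresholds)
    # Number of coefficients identified as remaining non-zero.
    count = sum(sum(row) for row in mask)
    # Sum of absolute values at the positions marked with a one.
    abs_sum = sum(
        abs(c)
        for c_row, m_row in zip(coeffs, mask)
        for c, m in zip(c_row, m_row)
        if m
    )
    # Linear model: an assumed form, not taken from the claims.
    return a1 * count + a2 * abs_sum + a3


def select_mode(
    candidates: Dict[str, Tuple[float, float, float]], lam: float = 1.0
) -> str:
    """Pick the mode with the smallest estimated total coding cost.

    Each candidate maps a mode name to
    (distortion, non_residual_bits, estimated_residual_bits).
    """
    costs = {
        mode: distortion + lam * (non_residual + residual)
        for mode, (distortion, non_residual, residual) in candidates.items()
    }
    return min(costs, key=costs.get)
```

With a 2x2 block of coefficients `[[10, -3], [2, -8]]` and a uniform threshold of 4, the mask is `[[1, 0], [0, 1]]`, so the count is 2 and the sum of absolute values is 18. The estimate is computed once per candidate mode, and only the selected mode goes through actual quantization and encoding, which matches the ordering in claim 39.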
PCT/US2007/068307 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs WO2008136828A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
JP2010507374A JP2010526515A (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding cost
EP07761930A EP2156672A1 (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs
KR1020097025315A KR101166732B1 (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs
PCT/US2007/068307 WO2008136828A1 (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs
KR1020127007471A KR20120031529A (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs
CN2007800528186A CN101663895B (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2007/068307 WO2008136828A1 (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs

Publications (1)

Publication Number Publication Date
WO2008136828A1 (en) 2008-11-13

Family

ID=39145223

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/068307 WO2008136828A1 (en) 2007-05-04 2007-05-04 Video coding mode selection using estimated coding costs

Country Status (5)

Country Link
EP (1) EP2156672A1 (en)
JP (1) JP2010526515A (en)
KR (2) KR20120031529A (en)
CN (1) CN101663895B (en)
WO (1) WO2008136828A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102126855B1 (en) * 2013-02-15 2020-06-26 한국전자통신연구원 Method and apparatus for coding mode decision
KR102229386B1 (en) * 2014-12-26 2021-03-22 한국전자통신연구원 Apparatus and methdo for encoding video
US11418788B2 (en) 2019-01-21 2022-08-16 Lg Electronics Inc. Method and apparatus for processing video signal
WO2023067822A1 (en) * 2021-10-22 2023-04-27 日本電気株式会社 Video encoding device, video decoding device, video encoding method, video decoding method, and video system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0646268A (en) * 1992-07-24 1994-02-18 Chinon Ind Inc Code quantity controller
FR2753330B1 (en) * 1996-09-06 1998-11-27 Thomson Multimedia Sa QUANTIFICATION METHOD FOR VIDEO CODING
NO318318B1 (en) * 2003-06-27 2005-02-28 Tandberg Telecom As Procedures for improved video encoding
JP2006140758A (en) * 2004-11-12 2006-06-01 Toshiba Corp Method, apparatus and program for encoding moving image
JP4146444B2 (en) * 2005-03-16 2008-09-10 株式会社東芝 Video encoding method and apparatus
CN100348051C (en) * 2005-03-31 2007-11-07 华中科技大学 An enhanced in-frame predictive mode coding method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN Q ET AL: "A fast bits estimation method for rate-distortion optimization in H.264/AVC", PROCEEDINGS OF THE PICTURE CODING SYMPOSIUM, XX, XX, 15 December 2004 (2004-12-15), pages 133 - 137, XP002348992 *
QIANG WANG ET AL: "Low Complexity RDO Mode Decision based on a Fast Coding-Bits Estimation Model for H.264/AVC", CIRCUITS AND SYSTEMS, 2005. ISCAS 2005. IEEE INTERNATIONAL SYMPOSIUM ON KOBE, JAPAN 23-26 MAY 2005, PISCATAWAY, NJ, USA,IEEE, 23 May 2005 (2005-05-23), pages 3467 - 3470, XP010816271, ISBN: 0-7803-8834-8 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8891615B2 (en) 2008-01-08 2014-11-18 Qualcomm Incorporated Quantization based on rate-distortion modeling for CABAC coders
US9008171B2 (en) 2008-01-08 2015-04-14 Qualcomm Incorporated Two pass quantization for CABAC coders
KR101187238B1 (en) 2008-02-21 2012-10-02 퀄컴 인코포레이티드 Two pass quantization for cabac coders
US11405616B2 (en) 2011-03-08 2022-08-02 Qualcomm Incorporated Coding of transform coefficients for video coding
CN107517384A (en) * 2011-06-16 2017-12-26 Ge视频压缩有限责任公司 Support the entropy code of pattern switching
CN107517384B (en) * 2011-06-16 2020-06-30 Ge视频压缩有限责任公司 Decoder, encoder, decoding method, encoding method, and storage medium
US9749661B2 (en) 2012-01-18 2017-08-29 Qualcomm Incorporated Sub-streams for wavefront parallel processing in video coding

Also Published As

Publication number Publication date
CN101663895A (en) 2010-03-03
KR101166732B1 (en) 2012-07-19
JP2010526515A (en) 2010-07-29
KR20100005240A (en) 2010-01-14
KR20120031529A (en) 2012-04-03
CN101663895B (en) 2013-05-01
EP2156672A1 (en) 2010-02-24

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase. Ref document number: 200780052818.6; Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 07761930; Country of ref document: EP; Kind code of ref document: A1
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase. Ref document number: 1869/MUMNP/2009; Country of ref document: IN
WWE Wipo information: entry into national phase. Ref document number: 2010507374; Country of ref document: JP
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 20097025315; Country of ref document: KR; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 2007761930; Country of ref document: EP