WO2007038727A2 - Video encoding method enabling highly efficient partial decoding of h.264 and other transform coded information - Google Patents


Info

Publication number
WO2007038727A2
Authority
WO
WIPO (PCT)
Prior art keywords
samples
multimedia
transform coefficients
reconstructed
multimedia data
Application number
PCT/US2006/037996
Other languages
French (fr)
Other versions
WO2007038727A3 (en)
Inventor
Peisong Chen
Seyfullah Halit Oguz
Original Assignee
Qualcomm Incorporated
Application filed by Qualcomm Incorporated
Priority to CN2006800430179A (patent CN101310536B)
Priority to EP06815757A (patent EP1941742A2)
Priority to JP2008533642A (patent JP2009510938A)
Publication of WO2007038727A2
Publication of WO2007038727A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
    • H04N19/895 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder in combination with error concealment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform

Definitions

  • The invention is directed to multimedia signal processing and, more particularly, to video encoding and decoding.
  • Multimedia signal processing systems may encode multimedia data using encoding methods based on international standards such as MPEG-X and H.26x standards. Such encoding methods generally are directed towards compressing the multimedia data for transmission and/or storage. Compression is broadly the process of removing redundancy from the data.
  • A video signal may be described in terms of a sequence of pictures, which include frames (an entire picture) or fields (e.g., an interlaced video signal comprises fields of alternating odd or even lines of a picture).
  • As used herein, the term "frame" refers to a picture, a frame, or a field.
  • Video encoding methods compress video signals by using lossless or lossy compression algorithms to compress each frame.
  • Intra-frame coding (herein referred to as intra-coding) refers to encoding a frame using only that frame.
  • Inter-frame coding (herein referred to as inter-coding) refers to encoding a frame based on other, "reference," frames.
  • Multimedia processors such as video encoders may encode a frame by partitioning it into blocks or "macroblocks" of, for example, 16x16 pixels. The encoder may further partition each macroblock into subblocks, and each subblock may in turn comprise additional subblocks. For example, subblocks of a macroblock may include 16x8 and 8x16 subblocks, subblocks of the 8x16 subblocks may include 8x8 subblocks, and so forth. As used herein, the term "block" refers to either a macroblock or a subblock.
  • One compression technology based on developing industry standards is commonly referred to as "H.264" video compression.
  • The H.264 technology defines the syntax of an encoded video bitstream together with the method of decoding this bitstream.
  • An input video frame is presented for encoding.
  • The frame is processed in units of macroblocks corresponding to the original image.
  • Each macroblock can be encoded in intra or inter mode.
  • A predicted macroblock is formed based on portions of an already reconstructed frame or on already reconstructed neighboring blocks in the same frame, known as causal neighbors.
  • In intra mode, the predicted macroblock is formed from causal samples in the current frame that have been previously encoded, decoded, and reconstructed.
  • A prediction formed from the multimedia samples of one or more causal neighboring macroblocks is subtracted from the current macroblock being encoded to produce a residual or difference macroblock, D.
  • This residual block D is transformed using a block transform and quantized to produce X, a set of quantized transform coefficients. These transform coefficients are re-ordered and entropy encoded.
  • The entropy encoded coefficients, together with other information for decoding the macroblock, become part of a compressed bitstream that is transmitted to a receiving device.
  • Error concealment has become critical when delivering multimedia content over error-prone networks such as wireless channels.
  • Error concealment schemes make use of the spatial and temporal correlation that exists in the video signal.
  • Recovery may occur during entropy decoding.
  • All or part of the data pertaining to one or more macroblocks or video slices could be lost.
  • Resynchronization of decoding can take place at the next slice, and missing blocks of the lost slice can be concealed using spatial concealment.
  • The decoded data available to a decoder device includes the causal neighbors that have already been decoded and reconstructed.
  • Spatial concealment typically uses causal neighbors to conceal the missing blocks.
  • One reason for using the causal neighbors to conceal the lost blocks is that out-of-order reconstruction of the next slice followed by concealment of the lost section of the current slice can be very inefficient, especially when using a highly pipelined video hardware decoder core.
  • However, the non-causal neighbors could offer valuable information for improved spatial concealment. What is needed is an efficient method for providing out-of-order reconstruction of non-causal neighboring multimedia samples.
  • A method of processing multimedia data includes receiving transform coefficients, where the transform coefficients are associated with the multimedia data.
  • The method further includes determining a set of multimedia samples to be reconstructed, determining a set of the received transform coefficients based on the multimedia samples to be reconstructed, and processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
  • A multimedia data processor is also provided.
  • The processor is configured to receive transform coefficients, where the transform coefficients are associated with multimedia data.
  • The processor is further configured to determine a set of multimedia samples to be reconstructed, determine a set of the received transform coefficients based on the multimedia samples to be reconstructed, and process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
  • An apparatus for processing multimedia data includes a receiver to receive transform coefficients, where the transform coefficients are associated with multimedia data.
  • The apparatus further includes a first determiner to determine a set of multimedia samples to be reconstructed, a second determiner to determine a set of the received transform coefficients based on the multimedia samples to be reconstructed, and a generator to process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
  • A machine-readable medium including instructions that, upon execution, cause a machine to process multimedia data is also provided.
  • The instructions cause the machine to receive transform coefficients, where the transform coefficients are associated with multimedia data.
  • The instructions further cause the machine to determine a set of multimedia samples to be reconstructed, determine a set of the received transform coefficients based on the multimedia samples to be reconstructed, and process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
  • FIG. 1 is a block diagram illustrating a multimedia communications system according to one aspect.
  • FIG. 2A is a block diagram illustrating an aspect of a decoder device that may be used in a system such as illustrated in FIG. 1.
  • FIG. 2B is a block diagram illustrating an example of a computer processor system of a decoder device that may be used in a system such as illustrated in FIG. 1.
  • FIG. 3 is a flowchart illustrating one example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1.
  • FIG. 4 is a flowchart illustrating in more detail another example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1.
  • FIG. 5 shows a detailed diagram of a 4x4 block and its surrounding causal neighbor pixels.
  • FIG. 6 shows a directivity mode diagram that illustrates nine directivity modes (0-8) which are used to describe a directivity characteristic of a block in H.264.
  • FIG. 7 illustrates one example of an intra-coded 4x4 pixel block immediately below and right of one or more slice boundaries.
  • FIG. 8 illustrates a nomenclature of neighbor pixels and pixels within an intra-coded 4x4 pixel block.
  • FIG. 9 illustrates one example of an intra-coded 16x16 Luma macroblock immediately below and right of a slice boundary.
  • FIG. 10 illustrates one example of an intra-coded 8x8 Chroma block immediately below and right of a slice boundary.
  • FIG. 11 illustrates a portion of multimedia samples located immediately below a slice boundary.
  • FIG. 12 is a block diagram illustrating another example of a decoder device that may be used in a system such as illustrated in FIG. 1.
  • FIG. 13 is a block diagram illustrating another example of a decoder device 150 that may be used in a system such as illustrated in FIG. 1.
  • Video signals may be characterized in terms of a series of pictures, frames, or fields.
  • "Frame" is a broad term that may encompass either frames of a progressive video signal or frames or fields of an interlaced video signal.
  • Aspects include systems and methods of improving processing in an encoder and a decoder in a multimedia transmission system.
  • Multimedia data may include one or more of motion video, audio, still images, or any other suitable type of audio-visual data.
  • Aspects include an apparatus and method of decoding video data in an efficient manner that provides improved error concealment by reconstructing non-causal multimedia samples and using the reconstructed samples to perform spatial concealment of lost or erroneous encoded multimedia data. For example, according to one aspect, generating reconstructed causal and/or non-causal neighboring samples before estimating multimedia concealment data for the lost or erroneous data has been found to improve the quality of the spatial concealment. In some examples, the reconstructed multimedia samples, together with the directivity indicators with which they were originally encoded, are used to estimate the multimedia concealment data. In another aspect, reconstructing only a subset of a matrix of multimedia samples for use in spatial error concealment has been found to further improve processing efficiency.
  • The reconstruction of the multimedia samples and the estimation of the multimedia concealment data are performed in a pre-processor.
  • The multimedia concealment data can then be communicated, together with the originally encoded non-causal multimedia data, to an efficient video core processor for decoding, further improving processing efficiency.
  • FIG. 1 is a functional block diagram illustrating a multimedia communications system 100 according to one aspect.
  • The system 100 includes an encoder device 110 in communication with a decoder device 150 via a network 140.
  • The encoder device receives a multimedia signal from an external source 102 and encodes that signal for transmission on the network 140.
  • The encoder device 110 comprises a processor 112 coupled to a memory 114 and a transceiver 116.
  • The processor 112 encodes data from the multimedia data source and provides it to the transceiver 116 for communication over the network 140.
  • The decoder device 150 comprises a processor 152 coupled to a memory 154 and a transceiver 156.
  • The processor 152 may include one or more of a general purpose processor, a digital signal processor, and an application specific hardware processor.
  • The memory 154 may include one or more of solid-state or disk-based storage or any readable and writeable random access memory device.
  • The transceiver 156 is configured to receive multimedia data over the network 140 and make it available to the processor 152 for decoding. In one example, the transceiver 156 includes a wireless transceiver.
  • The network 140 may comprise one or more of a wireline or wireless communication system, including one or more of an Ethernet, telephone (e.g., POTS), cable, power-line, and fiber optic systems, and/or a wireless system comprising one or more of a code division multiple access (CDMA or CDMA2000) communication system, a frequency division multiple access (FDMA) system, a time division multiple access (TDMA) system such as GSM/GPRS (General Packet Radio Service)/EDGE (enhanced data GSM environment), a TETRA (Terrestrial Trunked Radio) mobile telephone system, a wideband code division multiple access (WCDMA) system, a high data rate (1xEV-DO or 1xEV-DO Gold Multicast) system, an IEEE 802.11 system, a MediaFLO system, a DMB system, an orthogonal frequency division multiple access (OFDM) system, or a DVB-H system.
  • FIG. 2A is a functional block diagram illustrating an aspect of the decoder device 150 that may be used in a system such as the system 100 illustrated in FIG. 1.
  • The decoder 150 comprises a receiver element 202, a multimedia sample determiner element 204, a transform coefficient determiner element 206, a reconstructed sample generator element 208, and a multimedia concealment estimator element 210.
  • The receiver 202 receives encoded video data (e.g., data encoded by the encoder 110 of FIG. 1).
  • The receiver 202 may receive the encoded data over a wireline or wireless network such as the network 140 of FIG. 1.
  • The received data includes transform coefficients representing source multimedia data.
  • The transform coefficients represent the source data in a domain where the correlation between neighboring samples is significantly reduced. For example, images typically exhibit a high degree of correlation in the spatial domain, whereas the transform uses orthogonal basis functions, so the resulting coefficients are typically largely decorrelated, ideally exhibiting zero correlation.
  • Some examples of transforms that can be used for multimedia data include, but are not limited to, the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), the Hadamard (or Walsh-Hadamard) transform, discrete wavelet transforms, the DST (Discrete Sine Transform), the Haar transform, the Slant transform, the KL (Karhunen-Loeve) transform, and integer transforms such as the one used in H.264.
  • The transforms are used to transform a matrix or array of multimedia samples.
  • The received data also includes information indicating how the encoded blocks were encoded. Such information may include inter-coding reference information, such as motion vectors and frame sequence numbers, and intra-coding reference information, including block sizes, spatial prediction directivity indicators, and others. Some received data includes quantization parameters indicating how each transform coefficient was quantized, nonzero indicators indicating how many transform coefficients in the transformed matrix are nonzero, and others.
  • The multimedia sample determiner 204 determines which multimedia samples are to be reconstructed. In one aspect, the multimedia sample determiner 204 determines neighboring multimedia samples or pixels that are near to and/or border regions of multimedia data that are lost and can be concealed. In one example, the multimedia sample determiner identifies pixels adjacent to a border of a slice or other group of blocks where a portion of the data has been lost due to errors or channel loss. In some examples, the multimedia sample determiner 204 identifies the smallest number of pixels needed to reconstruct neighboring blocks that are spatially predicted from the determined pixels. For example, compressed multimedia data can comprise a block of transform coefficients resulting from a transformation of individual blocks (e.g., 8x8 pixel blocks and/or 4x4 pixel blocks) or matrices.
  • The multimedia sample determiner 204 can identify a specific subset of multimedia samples of the transformed block to be reconstructed, either for use in concealing the lost data or for use in reconstructing other encoded multimedia samples in other blocks predicted from those samples.
  • The determined multimedia samples can include non-causal samples and/or causal samples.
  • The transform coefficient determiner 206 determines a set of transform coefficients to be used to reconstruct some or all of the multimedia samples determined to be reconstructed by the multimedia sample determiner 204. The determination of which transform coefficients to use depends on the encoding method that was used to generate the transform coefficients. It also depends on which multimedia samples are being reconstructed and on whether any transform coefficients are zero (such coefficients need not be used). Details of which transform coefficients may be sufficient to reconstruct multimedia samples are discussed below.
  • The reconstructed sample generator 208 reconstructs the multimedia samples determined by the multimedia sample determiner 204.
  • The set of reconstructed samples can be a whole set, such as an entire NxN matrix of samples, where N is an integer.
  • Alternatively, the set of samples can be a subset of samples from an NxN matrix, such as a row, a column, part of a row or column, a diagonal, etc.
  • The reconstructed sample generator 208 uses the transform coefficients determined by the transform coefficient determiner 206 in reconstructing the samples.
  • The reconstructed sample generator 208 also uses information about the encoding method used to encode the transform coefficients when reconstructing the multimedia samples. Details of actions performed by the reconstructed sample generator 208 are discussed below.
  • The multimedia concealment estimator 210 uses the reconstructed samples calculated by the reconstructed sample generator 208 to form concealment multimedia samples that replace or conceal regions of multimedia data that were lost or corrupted by errors during transmission/reception.
  • In one aspect, the multimedia concealment estimator 210 uses reconstructed sample values to form the concealment multimedia data.
  • In another aspect, the multimedia concealment estimator 210 uses the reconstructed sample values and a received spatial prediction directivity mode indicator to estimate the multimedia concealment data. Further details of spatial error concealment can be found in Application No. 11/182,621 (now published patent application U.S. 2006/0013320), "METHODS AND APPARATUS FOR SPATIAL ERROR CONCEALMENT," which is assigned to the assignee hereof.
  • One or more of the elements of the decoder 150 of FIG. 2A may be rearranged and/or combined.
  • The elements may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. Details of the actions performed by the elements of the decoder 150 will be discussed in reference to the methods illustrated in FIGS. 3 and 4 below.
  • FIG. 2B is a block diagram illustrating an example of a computer processor system of a decoder device that may be used in a system such as illustrated in FIG. 1.
  • The decoder device 150 of this example includes a pre-processor element 220, a random access memory (RAM) element 222, a digital signal processor (DSP) element 224, and a video core element 226.
  • In one aspect, the pre-processor 220 is used to perform one or more of the actions performed by the various elements in FIG. 2A.
  • The pre-processor parses the video bitstream and writes the data to the RAM 222.
  • The pre-processor 220 implements the actions of the multimedia sample determiner 204, the transform coefficient determiner 206, the reconstructed sample generator 208, and the multimedia concealment estimator 210. By performing these less computationally intensive actions in the pre-processor 220, the more computationally intensive video decoding can be done, in causal order, in the highly efficient video core 226.
  • The DSP 224 retrieves the parsed video data stored in the RAM 222 and reorganizes it to be handled by the video core 226.
  • The video core 226 performs the dequantization (also known as rescaling or scaling), inverse transforming, and deblocking functions, as well as other video decompression functions.
  • The video core is typically implemented in a highly optimized and pipelined fashion. Because of this, video data is decoded fastest when it is decoded in causal order. By performing the out-of-order reconstruction of multimedia samples and the subsequent spatial concealment in the pre-processor, the causal order is maintained for decoding in the video core, allowing for improved overall decoding performance.
  • FIG. 3 is a flowchart illustrating one example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1.
  • The process 300 can be performed by a decoding device such as the examples shown in FIGS. 2A and 2B.
  • The process 300 enables reconstruction of selected multimedia samples.
  • The process 300 may be used to reconstruct multimedia samples in causal order, where other encoded multimedia data is predicted from the causal data and may require the causal data to be reconstructed before it can itself be reconstructed.
  • The process 300 may also be used to reconstruct multimedia samples in non-causal order.
  • The non-causal data is reconstructed in a manner that permits a subsequent, more efficient and timely reconstruction of all the multimedia data (both causal and non-causal).
  • The process 300 starts at block 305, where the decoder device receives transform coefficients associated with a multimedia data bitstream.
  • The decoder device may receive the transform coefficients over a wireline and/or wireless network such as the network 140 shown in FIG. 1.
  • The transform coefficients can represent multimedia samples including color and/or brightness parameters such as chrominance and luminance, respectively.
  • The transforms used to generate the transform coefficients may include, but are not limited to, the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), the Hadamard (or Walsh-Hadamard) transform, discrete wavelet transforms, the DST (Discrete Sine Transform), the Haar transform, the Slant transform, the KL (Karhunen-Loeve) transform, and integer transforms such as the one used in H.264.
  • When the transform coefficients are generated during encoding, the multimedia samples may be transformed in groups such as one-dimensional arrays and/or two-dimensional matrices.
  • The transform coefficients may be intra-coded and may or may not include spatial prediction.
  • The transform coefficients may represent a residual value, that is, the error of a predictor provided by a reference value.
  • The transform coefficients may be quantized.
  • The transform coefficients may be entropy encoded.
  • The receiver element 202 of FIG. 2A may perform the acts at block 305.
  • the process 300 continues at block 310 where the decoder device determines a set of multimedia samples to be reconstructed.
  • the multimedia samples to be reconstructed may include luminance (luma) and chrominance (chroma) samples.
  • the set of multimedia samples to be reconstructed are determined in response to loss of synchronization while decoding the multimedia bitstream being received at block 305.
  • the loss of synchronization may be caused by the erroneous reception or the loss of some or all of the encoded data corresponding to multimedia samples contained in a first slice of macroblocks.
  • the determined multimedia samples to be reconstructed may be contained in a second slice of macroblocks.
  • the second slice of macroblocks borders at least a part of the lost portion of the first slice of macroblocks.
  • the determined multimedia samples may be causal or non-causal with respect to the lost portion of multimedia samples, as discussed above.
  • the multimedia samples determined to be reconstructed at block 310 may enable reconstruction of other multimedia samples that border a lost portion of multimedia data to be concealed.
  • intra-coded macroblocks at the bottom of another slice of macroblocks may be spatially predicted in reference to the set of multimedia samples determined to be reconstructed at block 310. Therefore, by reconstructing this set of samples, which strongly correlates with the intra-coded blocks, the intra-coded blocks themselves can be reconstructed through a concealment process.
  • the multimedia samples determined to be reconstructed at block 310 may comprise samples located on or near a slice border. The samples to be reconstructed may comprise an entire matrix of associated multimedia samples that were transformed as a group during encoding.
  • the samples to be reconstructed may also comprise a portion of the matrix of associated multimedia samples such as a row, a column, a diagonal, or portions and/or combinations thereof.
  • the multimedia sample determiner 204 of FIG. 2A may perform the acts at block 310. Details of subsets of multimedia samples that may be reconstructed are discussed below.
  • the process 300 continues at block 315 where the decoder device determines a set of transform coefficients associated with the multimedia samples determined to be reconstructed at block 310.
  • the determination of which transform coefficients to use for reconstruction depends on the encoding method that was used to generate the transform coefficients.
  • the transform coefficient determination also depends on which multimedia samples are being reconstructed. For example, either the entire set of multimedia samples determined at block 310 or only a subset of it may be reconstructed.
  • the transform coefficient determination at block 315 also depends on whether any transform coefficients have a value of zero, since such coefficients need not be used. Details of which transform coefficients may be sufficient to reconstruct multimedia samples are discussed below.
  • the transform coefficient determiner 206 of FIG. 2A can perform the acts at block 315.
  • the process 300 proceeds to block 320.
  • the decoder device processes the set of determined transform coefficients in order to generate reconstructed multimedia samples.
  • the processing performed depends on the encoding methods that were used to generate the transform coefficients.
  • the processing includes inverse transforming the transform coefficients, but may also include other acts including, but not limited to, entropy decoding, dequantization (also called rescaling or scaling), etc. Details of examples of processing performed at block 320 are discussed below in reference to FIG. 4.
  • some or all acts of the process 300 are performed in a pre-processor such as the pre-processor 220 shown in FIG. 2B. It should be noted that some of the blocks of the process 300 may be combined, omitted, rearranged or any combination thereof.
  • FIG. 4 is a flowchart illustrating in more detail another example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1.
  • the example process 400 includes all of the actions performed at the blocks 305 to 320 contained in the process 300. The blocks 305, 310 and 315 remain unchanged from the examples shown in FIG. 3 and discussed above.
  • the block 320 of the process 300, where the transform coefficients are processed to generate reconstructed samples, is illustrated in more detail in the process 400, where it comprises four blocks 405, 410, 420 and 425.
  • the process 400 also includes additional blocks where concealment multimedia samples are estimated, block 430, and where transform coefficients, based on the estimated concealment multimedia samples, are generated, block 435.
  • the decoder device performs the actions at blocks 305, 310 and 315 in a similar fashion as discussed above.
  • the detailed example of the block 320 is shown where transform coefficients are associated with basis images in order to efficiently reconstruct the multimedia samples.
  • the decoder device partitions the transform coefficients into groups, where the groups of transform coefficients are associated with the multimedia samples determined to be reconstructed at block 310.
  • the groups of transform coefficients comprise the transform coefficients that modify (or weigh) a common basis image during an inverse transformation process in the reconstruction. Details of how transform coefficients are partitioned into groups are discussed below in relation to an example using H.264.
  • the decoder device calculates a weight value associated with each partitioned group based on the encoding method which generated the coefficients.
  • the weight is the sum of scaled transform coefficients of each group.
  • the scaling duplicates the inverse transform characteristics of the encoding method. Examples of scaling and calculating the weight value are discussed below in relation to the H.264 example.
  • Basis images are determined for each of the groups based on the encoding transform method.
  • Basis images are typically two-dimensional matrices that are mutually orthogonal, although one-dimensional arrays may also be utilized. Portions of the two-dimensional basis images are used, where the portions depend on which multimedia samples are being reconstructed (as determined at block 310).
  • the values calculated for each group at block 410 are used to modify (or weight) the associated basis images at block 425. By combining all the weighted basis images, multimedia samples are reconstructed at block 425. Details of blocks 420 and 425 are discussed below in reference to the H.264 example.
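As a minimal sketch of blocks 405 to 425 for one common case, the Python below reconstructs only the rightmost column of a 4x4 H.264 residual block, which is the column a block to its right would use as left neighbors. It assumes the coefficients have already been entropy decoded and rescaled (the 4x4 list `d`, with row = vertical frequency and column = horizontal frequency) and mirrors the integer butterfly of the H.264 inverse core transform, which processes rows first and then columns. Each group is one row of coefficients; its weight is that row's horizontal inverse transform evaluated only at the rightmost position, and the vertical inverse transform then combines the four weighted partial basis vectors. The function names are illustrative and not taken from the patent.

```python
def inv_butterfly(d0, d1, d2, d3):
    """One 4-point pass of the H.264-style integer inverse core transform."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def full_inverse(d):
    """Reference: full 4x4 inverse (rows, then columns, then (h + 32) >> 6)."""
    f = [inv_butterfly(*row) for row in d]  # horizontal pass on each row
    # Vertical pass on each column j of the intermediate matrix f.
    h = [inv_butterfly(*(f[i][j] for i in range(4))) for j in range(4)]
    return [[(h[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def rightmost_column(d):
    """Partial inverse: only the samples at x = 3 (the {d, h, l, p} column)."""
    # One weight per group (row of coefficients): the horizontal pass
    # evaluated only at the last output position, i.e. e0 - e3.
    weights = [(r[0] + r[2]) - (r[1] + (r[3] >> 1)) for r in d]
    # The vertical pass combines the four weighted partial basis vectors.
    h = inv_butterfly(*weights)
    return [(v + 32) >> 6 for v in h]


if __name__ == "__main__":
    d = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(rightmost_column(d))  # DC-only block: [1, 1, 1, 1]
```

The partial path needs four reduced row evaluations and one butterfly instead of eight butterflies, and because it follows the same integer operations as the full inverse it yields the same samples; a decoder could additionally skip any group whose coefficients are all zero.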
  • the process 400 continues at block 430, where the decoder device estimates concealment multimedia samples, in some examples, based on the reconstructed samples.
  • reconstructed sample values of the multimedia samples are used to form the concealment multimedia data.
  • the reconstructed sample values and a received spatial prediction directivity mode indicator are used to form the multimedia concealment data. Further details of spatial error concealment can be found in Application No. 11/182,621 (now published as U.S. patent application 2006/0013320), "METHODS AND APPARATUS FOR SPATIAL ERROR CONCEALMENT", which is assigned to the assignee hereof.
  • the estimated concealment multimedia samples are used directly and inserted into a frame buffer containing reconstructed data of the same frame to then be displayed.
  • the estimated concealment multimedia samples are transformed, in a manner replicating an encoding process, to generate transform coefficients representing the estimated concealment multimedia samples at block 435. These transformed coefficients are then inserted into the undecoded (still encoded) bitstream as if they were normal encoded samples.
  • the entire bitstream can then be forwarded to a video decoder core, such as the video core 226 in FIG. 2B, to be decoded.
  • all or part of the process 400 can be performed in a preprocessor such as the pre-processor 220 of FIG. 2B.
  • This method of performing the reconstruction and concealment estimation is especially useful for reconstructing non-causal portions, which are then used to conceal other portions of multimedia data that were lost due to channel errors. Details of methods used to improve the efficiency of the reconstruction of multimedia samples will now be discussed in relation to H.264 encoded multimedia bitstreams.
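To illustrate how reconstructed non-causal samples can feed the concealment estimate of block 430, the sketch below fills a lost region column by column, interpolating linearly between the reconstructed row just above the loss (causal) and the reconstructed row just below it (non-causal, for example the top row of the next slice). Linear interpolation is a stand-in chosen for illustration; it is not the spatial error concealment method of the referenced application, and the function name is hypothetical.

```python
def conceal_between_rows(top_row, bottom_row, height):
    """Fill `height` lost rows by vertical linear interpolation.

    top_row: reconstructed samples directly above the lost region (causal).
    bottom_row: reconstructed samples directly below it (non-causal).
    Returns a list of `height` rows of integer samples.
    """
    if len(top_row) != len(bottom_row):
        raise ValueError("boundary rows must have the same width")
    span = height + 1  # distance in rows from top_row to bottom_row
    rows = []
    for r in range(height):
        wb = r + 1      # weight of the bottom row grows with depth
        wt = span - wb  # weight of the top row shrinks accordingly
        rows.append([(t * wt + b * wb + span // 2) // span
                     for t, b in zip(top_row, bottom_row)])
    return rows


if __name__ == "__main__":
    print(conceal_between_rows([0, 40], [100, 40], 3))
    # [[25, 40], [50, 40], [75, 40]]
```

Without the non-causal bottom row, the same region could only be extrapolated from the causal side, which is the limitation that partial reconstruction of the next slice is meant to remove.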
  • H.264 uses spatial prediction to exploit the spatial correlation among neighboring blocks of pixels.
  • the spatial prediction modes use the causal neighbors to the left and above a 4x4, 8x8 or 16x16 pixel block for spatial prediction.
  • H.264 offers two modes of spatial prediction for luma values: one for 4x4 pixel blocks (herein referred to as intra-4x4 coding) and one for 16x16 pixel macroblocks (herein referred to as intra-16x16 coding). Note that other causal and non-causal neighboring samples may be used for spatial prediction.
  • FIG. 5 shows a detailed diagram of a 4x4 pixel block 502 and its surrounding causal neighbor pixels to the left and above, shown generally as 504.
  • the causal neighbor pixels 504 are used to generate various predictors, values and/or parameters describing the block 502 pixels.
  • the block 502 comprises pixels (p0-p15), and the causal neighbor pixels 504 are identified using reference indicators n3, n7, n11, n12, n13, n14, and n15, where each number corresponds to the similarly numbered pixel position of the block 502.
  • FIG. 6 shows a directivity mode diagram 600 that illustrates nine directivity modes (0-8) which are used to describe a directivity characteristic of an intra-coded block in H.264.
  • the nine directivity modes (or indicators) are used to describe a directivity characteristic of the spatial prediction of block 502. For example, mode 0 describes a vertical directivity characteristic, mode 1 describes a horizontal directivity characteristic, and mode 2 describes a DC characteristic where the average value of available causal neighboring pixels is used as a reference for the prediction.
  • only the causal neighboring pixels (those immediately above and to the left of the 4x4, 8x8 or 16x16 pixel block) that are in the same slice are used in calculating the average. For example, if the block being encoded borders another slice above, then only the pixels to the left are averaged. If the block being encoded borders other slices both to the left and above, then a value of 128 (half of the 8-bit sample range used in H.264) is used as the DC average.
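The DC rule described above can be sketched as follows for a 4x4 luma block. The rounding offsets follow the Intra_4x4 DC prediction rule of the H.264 specification; representing an unavailable neighbor set (for example, one lying in another slice) as `None` is an assumption of this sketch, and the function name is illustrative.

```python
def dc_predict_4x4(top=None, left=None, bit_depth=8):
    """Intra-4x4 DC prediction; `top` and `left` are 4-sample lists or None."""
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 4) >> 3  # average of all 8 neighbors
    elif left is not None:
        dc = (sum(left) + 2) >> 2             # block borders another slice above
    elif top is not None:
        dc = (sum(top) + 2) >> 2              # block borders another slice on the left
    else:
        dc = 1 << (bit_depth - 1)             # no neighbors: 128 for 8-bit video
    return [[dc] * 4 for _ in range(4)]


if __name__ == "__main__":
    print(dc_predict_4x4(left=[10, 20, 30, 40])[0])  # [25, 25, 25, 25]
```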
  • the modes illustrated in the directivity mode diagram 600 are used in the H.264 encoding process to generate prediction values for the block 502.
  • in intra-4x4 coding, the luma values can be encoded in reference to the pixels to the left and above the 4x4 block using any of the nine directivity modes.
  • in intra-16x16 coding, the luma values can be encoded in reference to the pixels to the left and above the entire 16x16 pixel block using four modes: i) vertical (mode 0), ii) horizontal (mode 1), iii) DC (mode 2), and iv) planar (mode 3).
  • in the planar prediction mode (mode 3), it is assumed that the luma values vary spatially and smoothly across the macroblock, and the reference is formed based on a planar equation. For chroma, intra prediction uses a single block size, 8x8.
  • the 8x8 chroma block can be predicted with the same four prediction types used in intra-16x16 coding, although H.264 numbers them differently for chroma: i) DC (mode 0), ii) horizontal (mode 1), iii) vertical (mode 2), and iv) planar (mode 3). Details of reconstructing the predicted blocks encoded in H.264 will now be discussed.
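The planar prediction mentioned above can be sketched for a 16x16 luma macroblock as below, following the plane prediction equations of the H.264 specification. `top` holds the 16 samples above the macroblock, `left` the 16 samples to its left, and `corner` the above-left sample; these argument names are illustrative.

```python
def plane_predict_16x16(top, left, corner, bit_depth=8):
    """Intra-16x16 plane prediction; returns pred[y][x] for x, y in 0..15."""
    def p_top(x):   # p[x, -1]; index -1 is the above-left corner sample
        return corner if x < 0 else top[x]

    def p_left(y):  # p[-1, y]; index -1 is the above-left corner sample
        return corner if y < 0 else left[y]

    # Horizontal and vertical gradients estimated from the neighbor samples.
    h = sum((k + 1) * (p_top(8 + k) - p_top(6 - k)) for k in range(8))
    v = sum((k + 1) * (p_left(8 + k) - p_left(6 - k)) for k in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    max_val = (1 << bit_depth) - 1
    # Evaluate the plane and clip each sample to the valid range.
    return [[min(max((a + b * (x - 7) + c * (y - 7) + 16) >> 5, 0), max_val)
             for x in range(16)] for y in range(16)]
```

With flat neighbors (all samples equal) both gradients vanish and the prediction is flat, which makes the planar mode a smooth generalization of the DC mode.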
  • the residual values can be reconstructed by inverse transformation of the transform coefficients, and each pixel is then reconstructed as the sum of its prediction value and its residual value (clipped to the valid sample range).
  • the prediction values p are obtained from causal neighboring pixels, depending on the spatial prediction mode used to encode them.
  • the following are observations affecting reconstruction of pixels within intra-4x4 coded macroblocks located immediately below a slice boundary (non-causal neighbors in H.264).
  • FIG. 7 shows one aspect of an intra-4x4 coded block immediately below a slice boundary.
  • the line AA' marks the mentioned slice boundary and the 4x4 block 702 is the current block being reconstructed.
  • the 9 neighboring pixels 704 above the slice boundary line AA' which could normally have been used for performing spatial prediction in the intra-4x4 coding, are not available since they are located on the other side of the slice boundary and hence they belong to another slice. Spatial prediction as well as any other predictive coding dependency across a slice boundary is not permitted in H.264 since slices act as resynchronization points.
  • FIG. 8 illustrates a nomenclature for the neighbor pixels and pixels within an intra-4x4 coded block. Since pixels above the slice boundary AA' are not available for spatial prediction, the neighboring pixels of block 702 available for prediction are the pixels {I, J, K, L}. This implies that the permissible intra-4x4 coding prediction modes for the 4x4 block 702 are: i) mode 1 (horizontal), ii) mode 2 (DC), and iii) mode 8 (horizontal-up). If the line BB' in FIG. 7 marked another slice boundary, then none of the pixels {I, J, K, L} or {M, A, B, C, D, E, F, G, H} would be available for spatial prediction.
  • in that case, the only permissible intra-4x4 coding prediction mode is mode 2 (DC), where the reference value for all the pixels of block 702 is 128.
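The mode restrictions above follow from which neighbor groups each intra-4x4 mode reads. The sketch below encodes that dependency using the FIG. 8 naming (A to D above, I to L to the left, M above-left) and filters the nine modes by availability; the table and function names are illustrative. Modes 3 and 7 also read E to H, but H.264 substitutes D for those samples when they are unavailable, so only A to D are treated as required here.

```python
# Neighbor groups each intra-4x4 mode reads (FIG. 8 naming):
# "top" = A..D, "left" = I..L, "corner" = M.
REQUIRED = {
    0: {"top"},                    # vertical
    1: {"left"},                   # horizontal
    2: set(),                      # DC (falls back to 128 with no neighbors)
    3: {"top"},                    # diagonal down-left
    4: {"top", "left", "corner"},  # diagonal down-right
    5: {"top", "left", "corner"},  # vertical-right
    6: {"top", "left", "corner"},  # horizontal-down
    7: {"top"},                    # vertical-left
    8: {"left"},                   # horizontal-up
}


def permissible_modes(top, left, corner):
    """Return the intra-4x4 modes usable with the given neighbor availability."""
    available = {name for name, ok in
                 (("top", top), ("left", left), ("corner", corner)) if ok}
    return sorted(mode for mode, need in REQUIRED.items() if need <= available)


if __name__ == "__main__":
    # Block 702 directly below slice boundary AA' (A..D and M unavailable):
    print(permissible_modes(top=False, left=True, corner=False))   # [1, 2, 8]
    # BB' is also a slice boundary:
    print(permissible_modes(top=False, left=False, corner=False))  # [2]
```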
  • the information for decoding and reconstructing some or all of the pixels of an intra-4x4 coded block located immediately below a slice boundary includes:
  • This sufficient data set can enable the reconstruction of all pixel values {a, b, c, ..., n, o, p} of the current 4x4 block in FIG. 8.
  • this data set is sufficient for reconstructing the values of the pixel subset {d, h, l, p}, which in turn may be used as the left neighbors {I, J, K, L} for the reconstruction of the next 4x4 block immediately to the right.
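Assuming every block in the top row of a slice was coded with intra-4x4 mode 1 (horizontal), the chaining described above can be sketched as follows: each block's rightmost column is its left neighbors plus its rightmost residual column, and that result becomes the left neighbors {I, J, K, L} of the next block. The residual columns are taken as given inputs here (they could come from a partial inverse transform), and the function name is hypothetical.

```python
def chain_rightmost_columns(left_neighbors, residual_columns, bit_depth=8):
    """Reconstruct {d, h, l, p} for a row of horizontally predicted 4x4 blocks.

    left_neighbors: {I, J, K, L} of the first block (4 samples).
    residual_columns: for each block, its rightmost residual column (4 values).
    """
    max_val = (1 << bit_depth) - 1
    left = list(left_neighbors)
    columns = []
    for residual in residual_columns:
        # Mode 1: every pixel in row y is predicted by the left neighbor left[y].
        column = [min(max(left[y] + residual[y], 0), max_val) for y in range(4)]
        columns.append(column)
        left = column  # {d, h, l, p} become {I, J, K, L} of the next block
    return columns


if __name__ == "__main__":
    print(chain_rightmost_columns([100] * 4, [[1, 2, 3, 4], [1, 1, 1, 1]]))
    # [[101, 102, 103, 104], [102, 103, 104, 105]]
```

Only one residual column per block is needed along the row, which is why the rightmost-column subset is enough to propagate reconstruction across the top of the slice.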
  • the following are observations affecting reconstruction of pixels within intra-16x16 coded macroblocks located immediately below a slice boundary (non-causal neighbors in H.264).
  • the interest is in the uppermost four 4x4 blocks (i.e., those with block indices b0, b1, b4, and b5 in FIG. 9) of an intra-16x16 coded macroblock located immediately below a slice boundary.
  • FIG. 9 shows one aspect of an intra-16x16 coded macroblock located below a slice boundary.
  • the line AA' marks the mentioned slice boundary, and the four 4x4 blocks labeled b0, b1, b4 and b5 constitute the portion of the 16x16 macroblock under consideration for reconstruction.
  • the 17 neighboring pixels above the line AA', which could normally have been used for performing the intra-16x16 spatial prediction, are not available since they are located on the other side of the slice boundary and hence they belong to another slice.
  • the permissible intra-16x16 coding spatial prediction modes for the current macroblock are i) mode 1 (horizontal), and ii) mode 2 (DC).
  • if the line BB' also marks a slice boundary, the only permissible intra-16x16 prediction mode is mode 2 (DC).
  • the topmost four neighboring pixels located immediately to the left of line BB' and below the line AA' are sufficient for decoding and reconstructing the topmost four 4x4 blocks within the current 16x16 macroblock. This is consistent with the above described framework enabling the decoding of the topmost four 4x4 blocks in intra-4x4 coded macroblocks.
  • the intra-16x16 coding of macroblocks which are located immediately below a slice boundary should be limited to the spatial prediction mode 1 (horizontal), unless they are located immediately to the right of a slice boundary, or at the left frame boundary. This allows for computationally efficient reconstruction of the rightmost four pixels of all the topmost 4x4 blocks in the row. This in turn allows for computationally efficient reconstruction of the topmost four pixels of all the topmost 4x4 blocks in the row.
  • FIG. 10 shows one aspect of an 8x8 chroma block located immediately below a slice boundary.
  • the line AA' marks the slice boundary and the two 4x4 blocks immediately below line AA' and to the right of line BB' constitute data for one of the two chroma channels (Cr and Cb).
  • the nine neighboring pixels above the slice boundary line AA' are not available for spatial prediction, in this example, since they are located on the other side of the slice boundary and hence they belong to another slice.
  • the availability of 8 neighboring pixels, those located immediately to the left of line BB', implies that the permissible chroma channel intra prediction modes for the current MB are limited to i) mode 0 (DC) and ii) mode 1 (horizontal).
  • if the line BB' also marks a slice boundary, the only permissible chroma channel intra prediction mode is mode 0 (DC).
  • the topmost four neighboring pixels located immediately to the left of line BB' may be needed for decoding and reconstructing the topmost two 4x4 chroma blocks within the current MB. It should be noted that there are two 8x8 chroma blocks corresponding to one 16x16 luma macroblock.
  • the intra-8x8 coding of chroma channels (Cr and Cb) of intra-coded macroblocks should be limited to the spatial prediction mode 1 (horizontal), unless they are located immediately to the right of a slice boundary, or at the left frame boundary.
  • transforms that may be partially decoded using these methods include, but are not limited to, the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), the Hadamard (or Walsh-Hadamard) transform, discrete wavelet transforms, the DST (Discrete Sine Transform), the Haar transform, the Slant transform, and the KL (Karhunen-Loeve) transform.
  • the transformations represented by equations (3) and (4) can each be thought of as two one-dimensional (1D) transforms resulting in a two-dimensional (2D) transform.
  • the [Y][T] matrix multiplication operation can be thought of as a 1D row transform and the [T]^T[Y] matrix multiplication operation can be thought of as a 1D column transform.
  • the combination forms a 2D transform.
  • Another way of thinking about the 2D transform of an NxN matrix [Y] is to perform N^2 inner products of [Y] with 2D basis images corresponding to the 2D transform characterized by the transform matrix [T], leading to a set of N^2 values identical to the set of transform coefficients.
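The equivalence between the separable form and the N^2 inner products can be checked numerically. The sketch below assumes [T] is the transpose of the H.264 4x4 forward core matrix, so that [T]^T[Y][T] is the forward core transform, and compares it with the inner products of [Y] against the 16 outer-product basis images; the helper names are illustrative.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]  # H.264 4x4 forward core matrix; [T] = CF transposed


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward_separable(y):
    t = transpose(CF)
    # [Y][T] transforms the rows; [T]^T(...) then transforms the columns.
    return matmul(transpose(t), matmul(y, t))


def forward_inner_products(y):
    n = len(y)
    # Coefficient (u, v) = <Y, basis image (u, v)>, where the basis image is
    # the outer product of the 1D basis vectors CF[u] (vertical) and CF[v] (horizontal).
    return [[sum(y[r][c] * CF[u][r] * CF[v][c] for r in range(n) for c in range(n))
             for v in range(n)] for u in range(n)]
```

Both functions return the same 16 coefficients; the separable form costs two small matrix products, while the inner-product form makes explicit which basis image each coefficient measures.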
  • Basis images of a given transform [T] can be calculated by setting one of the transform coefficients to one and setting all others to zero, and taking an inverse transform of the resulting coefficient matrix. For example, using a 4x4 transform coefficient matrix [w], setting the w11 coefficient to 1 and all others to zero, and using the H.264 integer transform [T_H], equation (4) results in:
  • the 16 basis images associated with the H.264 4x4 integer transformation process for residual 4x4 blocks can be determined to be as follows, where s_ij (for i, j ∈ {0, 1, 2, 3}) is the basis image associated with the ith horizontal and jth vertical frequency channel.
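Following the recipe in the text (one coefficient set to 1, all others set to 0, then inverse transformed), the sketch below generates basis images of the H.264 4x4 inverse core transform in exact rational arithmetic. It assumes the matrix form X = C W C^T with the ±1/2 entries of the inverse core transform and leaves out the integer shifts and the final (x + 32) >> 6 rescaling of a bit-exact decoder, so it shows the shape of each basis image rather than decoder output. Index u is the vertical and v the horizontal frequency, so the text's s_ij corresponds to basis_image(j, i); the names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Columns of CI are the 1D inverse basis vectors of the H.264 4x4 core transform.
CI = [[1, 1, 1, HALF],
      [1, HALF, -1, -1],
      [1, -HALF, -1, 1],
      [1, -1, 1, -HALF]]


def inverse_2d(w):
    """X = CI * W * CI^T for a 4x4 coefficient matrix W (row u, column v)."""
    return [[sum(CI[r][u] * w[u][v] * CI[c][v] for u in range(4) for v in range(4))
             for c in range(4)] for r in range(4)]


def basis_image(u, v):
    """Inverse transform of a coefficient matrix with a single 1 at (u, v)."""
    w = [[0] * 4 for _ in range(4)]
    w[u][v] = 1
    return inverse_2d(w)


if __name__ == "__main__":
    for row in basis_image(1, 1):
        print([str(x) for x in row])
```

Because the inverse transform is linear, any block of coefficients reconstructs as the sum of these 16 images, each weighted by its coefficient, which is the property the partial-decoding method exploits.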
  • FIG. 11 illustrates a portion of multimedia samples located immediately below a slice boundary.
  • the pixels may comprise luma and chroma values. The pixel positions
  • the intra-4x4 spatial prediction modes which could have been used to generate the prediction signal for this 4x4 block can be one of the following:
  • Intra-4x4 spatial prediction mode 1 (Horizontal):
  • the rescaling factors $v_{ij}$, $i, j \in \{0, 1, 2, 3\}$, which are used to scale the coefficients $z_{ij}$, $i, j \in \{0, 1, 2, 3\}$, in addition to their dependence on the quantization parameter, also possess the following position-related structure within a 4x4 matrix:

$$\begin{bmatrix} v_{00} & v_{10} & v_{20} & v_{30} \\ v_{01} & v_{11} & v_{21} & v_{31} \\ v_{02} & v_{12} & v_{22} & v_{32} \\ v_{03} & v_{13} & v_{23} & v_{33} \end{bmatrix}$$

  where three groups of rescaling factors, $[v_{00}, v_{20}, v_{02}, v_{22}]$, $[v_{11}, v_{31}, v_{13}, v_{33}]$ and $[v_{10}, v_{30}, v_{01}, v_{21}, v_{12}, v_{32}, v_{03}, v_{23}]$, each have the same value for a given quantization parameter $QP_Y$.
  • FIG. 12 is a functional block diagram illustrating another example of a decoder device 150 that may be used in a system such as illustrated in FIG. 1.
  • This aspect includes means for receiving transform coefficients, wherein the transform coefficients are associated with multimedia data, first determiner means for determining a set of multimedia samples to be reconstructed, second determiner means for determining a set of the received transform coefficients based on the multimedia samples to be reconstructed, and generator means for processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
  • the receiving means comprises a receiver 202
  • the first determiner means comprises a multimedia sample determiner 204
  • the second determiner means comprises a transform coefficient determiner 206
  • the generator means comprises a reconstructed sample generator 208.
  • FIG. 13 is a functional block diagram illustrating another example of a decoder device 150 that may be used in a system such as illustrated in FIG. 1.
  • This aspect includes means for receiving transform coefficients, wherein the transform coefficients are associated with multimedia data, first determiner means for determining a set of multimedia samples to be reconstructed, second determiner means for determining a set of the received transform coefficients based on the multimedia samples to be reconstructed, and generator means for processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
  • the receiving means comprises a module for receiving 1302, the first determiner means comprises a module for determining samples for reconstruction 1304, the second determiner means comprises a module for determining transform coefficients 1306, and the generator means comprises a module for processing transform coefficients 1308.
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core or ASIC core, or any other such configuration.
  • the steps of a method or algorithm described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, an optical storage medium, or any other form of storage medium known in the art.
  • An example storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integral to the processor.
  • the processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC).
  • the ASIC may reside in a wireless modem.
  • the processor and the storage medium may reside as discrete components in the wireless modem.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Discrete Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Detection And Prevention Of Errors In Transmission (AREA)

Abstract

Methods and apparatus to process multimedia data enabling efficient partial decoding of transform coded data are described. A decoder device receives transform coefficients, where the transform coefficients are associated with multimedia data. The decoder device determines a set of multimedia samples to be reconstructed. In one aspect, the set of samples to be reconstructed is a subset of a matrix of transformed multimedia samples. The decoder device determines a set of transform coefficients to be used to reconstruct the multimedia samples. In one aspect, the transform coefficients are used to scale partial basis images associated with the encoding method used to generate the transform coefficients, resulting in reconstructed multimedia samples.

Description

VIDEO ENCODING METHOD ENABLING HIGHLY EFFICIENT PARTIAL DECODING OF H.264 AND OTHER TRANSFORM CODED INFORMATION
CROSS-REFERENCE TO RELATED APPLICATIONS
Claim of Priority under 35 U.S.C. §119
[0001] The present Application for Patent claims priority to Provisional Application No. 60/721,377 entitled "ERROR CONCEALMENT" filed September 27, 2005, and assigned to the assignee hereof and hereby expressly incorporated by reference herein.
BACKGROUND
Field of the Invention
[0002] The invention is directed to multimedia signal processing and, more particularly, to video encoding and decoding.
Description of the Related Art
[0004] Multimedia signal processing systems, such as video encoders, may encode multimedia data using encoding methods based on international standards such as MPEG-X and H.26x standards. Such encoding methods generally are directed towards compressing the multimedia data for transmission and/or storage. Compression is broadly the process of removing redundancy from the data.
[0005] A video signal may be described in terms of a sequence of pictures, which include frames (an entire picture), or fields (e.g., an interlaced video signal comprises fields of alternating odd or even lines of a picture). As used herein, the term "frame" refers to a picture, a frame or a field. Video encoding methods compress video signals by using lossless or lossy compression algorithms to compress each frame. Intra-frame coding (herein referred to as intra-coding) refers to encoding a frame using only that frame. Inter-frame coding (herein referred to as inter-coding) refers to encoding a frame based on other, "reference," frames. For example, video signals often exhibit spatial redundancy, in which samples near each other in the same frame match, or at least approximately match, each other.
[0006] Multimedia processors, such as video encoders, may encode a frame by partitioning it into blocks or "macroblocks" of, for example, 16x16 pixels. The encoder may further partition each macroblock into subblocks. Each subblock may further comprise additional subblocks. For example, subblocks of a macroblock may include 16x8 and 8x16 subblocks. Subblocks of the 8x16 subblocks may include 8x8 subblocks, and so forth. As used herein, the term "block" refers to either a macroblock or a subblock.
[0007] One compression technology based on developing industry standards is commonly referred to as "H.264" video compression. The H.264 technology defines the syntax of an encoded video bitstream together with the method of decoding this bitstream. In one aspect of an H.264 encoding process, an input video frame is presented for encoding. The frame is processed in units of macroblocks corresponding to the original image. Each macroblock can be encoded in intra or inter mode. A predicted macroblock is formed based on portions of an already reconstructed frame or on already reconstructed neighboring blocks in the same frame, known as causal neighbors. In intra mode, the predicted macroblock is formed from causal samples in the current frame that have been previously encoded, decoded, and reconstructed. The prediction formed from the samples of one or more causal neighboring macroblocks is subtracted from the current macroblock being encoded to produce a residual or difference macroblock, D. This residual block D is transformed using a block transform and quantized to produce X, a set of quantized transform coefficients. These transform coefficients are re-ordered and entropy encoded. The entropy encoded coefficients, together with other information for decoding the macroblock, become part of a compressed bitstream that is transmitted to a receiving device.
[0008] Unfortunately, during the transmission process, errors in one or more macroblocks may be introduced. For example, one or more degrading transmission effects, such as signal fading, may cause the loss of data in one or more macroblocks. As a result, error concealment has become critical when delivering multimedia content over error prone networks such as wireless channels. Error concealment schemes make use of the spatial and temporal correlation that exists in the video signal. When errors are encountered, recovery may occur during entropy decoding. For example, when packet errors are encountered, all or part of the data pertaining to one or more macroblocks or video slices (groups of usually neighboring macroblocks) could be lost. When the video data of a slice is lost, resynchronization of decoding can take place at the next slice, and missing blocks of the lost slice can be concealed using spatial concealment.
[0009] Since the decoded data available to a decoder device includes the causal neighbors that have already been decoded and reconstructed, spatial concealment typically uses causal neighbors to conceal the missing blocks. One reason for using the causal neighbors to conceal the lost blocks is that out-of-order reconstruction of the next slice followed by concealment of the lost section of the current slice can be very inefficient, especially when using a highly pipelined video hardware decoder core. The non-causal neighbors could offer valuable information for improved spatial concealment. What is needed is an efficient method for providing out of order reconstruction of non-causal neighboring multimedia samples.
SUMMARY
[0010] The system, method, and devices of the invention each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of this invention as expressed by the claims which follow, its more prominent features will now be discussed briefly. After considering this discussion, and particularly after reading the section entitled "Detailed Description of Certain Aspects" one will understand how sample features of this invention provide advantages to multimedia encoding and decoding that include improved error concealment, and improved efficiency.
[0011] A method of processing multimedia data is provided. The method includes receiving transform coefficients, where the transform coefficients are associated with the multimedia data. The method further includes determining a set of multimedia samples to be reconstructed, determining a set of the received transform coefficients based on the multimedia samples to be reconstructed, and processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
[0012] A multimedia data processor is provided. The processor is configured to receive transform coefficients, where the transform coefficients are associated with multimedia data. The processor is further configured to determine a set of multimedia samples to be reconstructed, determine a set of the received transform coefficients based on the multimedia samples to be reconstructed, and process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
[0013] An apparatus for processing multimedia data is provided. The apparatus includes a receiver to receive transform coefficients, where the transform coefficients are associated with multimedia data. The apparatus further includes a first determiner to determine a set of multimedia samples to be reconstructed, a second determiner to determine a set of the received transform coefficients based on the multimedia samples to be reconstructed, and a generator to process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
[0014] A machine readable medium including instructions that upon executing cause a machine to process multimedia data is provided. The instructions cause the machine to receive transform coefficients, where the transform coefficients are associated with multimedia data. The instructions further cause the machine to determine a set of multimedia samples to be reconstructed, determine a set of the received transform coefficients based on the multimedia samples to be reconstructed, and process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is a block diagram illustrating a multimedia communications system according to one aspect.
[0016] FIG. 2A is a block diagram illustrating an aspect of a decoder device that may be used in a system such as illustrated in FIG. 1.
[0017] FIG. 2B is a block diagram illustrating an example of a computer processor system of a decoder device that may be used in a system such as illustrated in FIG. 1.
[0018] FIG. 3 is a flowchart illustrating one example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1.
[0019] FIG. 4 is a flowchart illustrating in more detail another example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1.
[0020] FIG. 5 shows a detailed diagram of a 4x4 block and its surrounding causal neighbor pixels.
[0021] FIG. 6 shows a directivity mode diagram that illustrates nine directivity modes (0-8) which are used to describe a directivity characteristic of a block in H.264.
[0022] FIG. 7 illustrates one example of an intra-coded 4x4 pixel block immediately below and right of one or more slice boundaries.
[0023] FIG. 8 illustrates a nomenclature of neighbor pixels and pixels within an intra-coded 4x4 pixel block.
[0024] FIG. 9 illustrates one example of an intra-coded 16x16 Luma macroblock immediately below and right of a slice boundary.
[0025] FIG. 10 illustrates one example of an intra-coded 8x8 Chroma block immediately below and right of a slice boundary.
[0026] FIG. 11 illustrates a portion of multimedia samples located immediately below a slice boundary.
[0027] FIG. 12 is a block diagram illustrating another example of a decoder device that may be used in a system such as illustrated in FIG. 1.
[0028] FIG. 13 is a block diagram illustrating another example of a decoder device 150 that may be used in a system such as illustrated in FIG. 1.
DETAILED DESCRIPTION OF CERTAIN ASPECTS
[0029] The following detailed description is directed to certain specific sample aspects of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings wherein like parts are designated with like numerals throughout.
[0030] Video signals may be characterized in terms of a series of pictures, frames, or fields. As used herein, the term "frame" is a broad term that may encompass either frames of a progressive video signal or frames or fields of an interlaced video signal.
[0031] Aspects include systems and methods of improving processing in an encoder and a decoder in a multimedia transmission system. Multimedia data may include one or more of motion video, audio, still images, or any other suitable type of audio-visual data. Aspects include an apparatus and method of decoding video data in an efficient manner that provides improved error concealment by reconstructing non-causal multimedia samples and using the reconstructed samples to perform spatial concealment of lost or erroneous encoded multimedia data. For example, it has been found according to one aspect that generating reconstructed causal and/or non-causal neighboring samples prior to estimating multimedia concealment data for the lost or erroneous data can improve the quality of the spatial concealment. In some examples, the reconstructed multimedia samples and the directivity indicators with which the reconstructed samples were originally encoded are used in the estimation of the multimedia concealment data. In another aspect, it has been found that reconstructing only a subset of a matrix of multimedia samples to be used in spatial error concealment can further improve the processing efficiency. In some examples, the reconstruction of the multimedia samples and the estimation of the multimedia concealment data are performed in a pre-processor.
The multimedia concealment data can then be communicated with the originally encoded non-causal multimedia data to be decoded in an efficient video core processor, further improving processing efficiency.
Multimedia Communications System
[0032] FIG. 1 is a functional block diagram illustrating a multimedia communications system 100 according to one aspect. The system 100 includes an encoder device 110 in communication with a decoder device 150 via a network 140. In one example, the encoder device receives a multimedia signal from an external source 102 and encodes that signal for transmission on the network 140.
[0033] In this example, the encoder device 110 comprises a processor 112 coupled to a memory 114 and a transceiver 116. The processor 112 encodes data from the multimedia data source and provides it to the transceiver 116 for communication over the network 140.
[0034] In this example, the decoder device 150 comprises a processor 152 coupled to a memory 154 and a transceiver 156. The processor 152 may include one or more of a general purpose processor and/or a digital signal processor and/or an application specific hardware processor. The memory 154 may include one or more of solid state or disk based storage or any readable and writeable random access memory device. The transceiver 156 is configured to receive multimedia data over the network 140 and make it available to the processor 152 for decoding. In one example, the transceiver 156 includes a wireless transceiver. The network 140 may comprise one or more of a wireline or wireless communication system, including one or more of an Ethernet, telephone (e.g., POTS), cable, power-line, and fiber optic systems, and/or a wireless system comprising one or more of a code division multiple access (CDMA or CDMA2000) communication system, a frequency division multiple access (FDMA) system, a time division multiple access (TDMA) system such as GSM/GPRS (General Packet Radio Service)/EDGE (enhanced data GSM environment), a TETRA (Terrestrial Trunked Radio) mobile telephone system, a wideband code division multiple access (WCDMA) system, a high data rate (1xEV-DO or 1xEV-DO Gold Multicast) system, an IEEE 802.11 system, a MediaFLO system, a DMB system, an orthogonal frequency division multiple access (OFDMA) system, or a DVB-H system.
[0035] FIG. 2A is a functional block diagram illustrating an aspect of the decoder device 150 that may be used in a system such as the system 100 illustrated in FIG. 1. In this aspect, the decoder 150 comprises a receiver element 202, a multimedia sample determiner element 204, a transform coefficient determiner element 206, a reconstructed sample generator element 208, and a multimedia concealment estimator element 210.
[0036] The receiver 202 receives encoded video data (e.g., data encoded by the encoder 110 of FIG. 1). The receiver 202 may receive the encoded data over a wireline or wireless network such as the network 140 of FIG. 1. In one aspect the received data includes transform coefficients representing source multimedia data. The transform maps the source samples into a domain in which the correlation between neighboring samples is significantly reduced. For example, images typically exhibit a high degree of correlation in the spatial domain, whereas the transform coefficients are, to a large extent, decorrelated from one another. Some examples of transforms that can be used for multimedia data include, but are not limited to, the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), the Hadamard (or Walsh-Hadamard) transform, discrete wavelet transforms, the DST (Discrete Sine Transform), the Haar transform, the Slant transform, the KL (Karhunen-Loeve) transform and integer transforms such as the one used in H.264. The transforms are used to transform a matrix or array of multimedia samples. Two dimensional matrices are commonly used, but one dimensional arrays may also be used. The received data also includes information indicating how the encoded blocks were encoded. Such information may include inter-coding reference information such as motion vectors and frame sequence numbers, and intra-coding reference information such as block sizes and spatial prediction directivity indicators, among others.
Some received data also includes quantization parameters indicating how coarsely the transform coefficients were quantized, nonzero indicators indicating how many transform coefficients in the transformed matrix are nonzero, and others.
[0037] The multimedia sample determiner 204 determines which multimedia samples are to be reconstructed. In one aspect the multimedia sample determiner 204 determines neighboring multimedia samples or pixels that are near to and/or border regions of multimedia data that have been lost and can be concealed. In one example the multimedia sample determiner identifies pixels adjacent to a border of a slice or other group of blocks where a portion of the data has been lost due to errors or channel loss. In some examples, the multimedia sample determiner 204 identifies the smallest number of pixels needed to reconstruct neighboring blocks that are spatially predicted from the determined pixels. For example, compressed multimedia data can comprise blocks of transform coefficients resulting from a transformation of individual blocks or matrices of pixels (e.g., 8x8 pixel blocks and/or 4x4 pixel blocks). The multimedia sample determiner 204 can identify a specific subset of the multimedia samples of a transformed block to be reconstructed, either to be used to conceal the lost data or to be used to reconstruct other encoded multimedia samples in other blocks predicted from those samples. The determined multimedia samples can include non-causal samples and/or causal samples.
[0038] The transform coefficient determiner 206 determines a set of transform coefficients to be used to reconstruct some or all of the multimedia samples determined to be reconstructed by the multimedia sample determiner 204. The determination of which transform coefficients to use depends on the encoding method that was used to generate the transform coefficients. The transform coefficient determination also depends on which multimedia samples are being reconstructed and whether there are transform coefficients with zero values (thereby negating the potential need to use them). Details of which transform coefficients may be sufficient to reconstruct multimedia samples are discussed below.
[0039] The reconstructed sample generator 208 reconstructs multimedia samples based on those samples determined by the multimedia sample determiner 204. The set of reconstructed samples can be a whole set, such as an entire NxN matrix of samples, where N is an integer. The set of samples can be a subset of samples from an NxN matrix such as a row, a column, part of a row or column, a diagonal, etc. The reconstructed sample generator 208 uses the transform coefficients determined by the transform coefficient determiner 206 in reconstructing the samples. The reconstructed sample generator 208 also uses information based on the encoding method used to encode the transform coefficients in reconstructing the multimedia samples. Details of actions performed by the reconstructed sample generator 208 are discussed below.
[0040] The multimedia concealment estimator 210 uses the reconstructed samples calculated by the reconstructed sample generator 208 to form concealment multimedia samples to replace or conceal regions of multimedia data that are lost or altered with errors during transmission/reception. The multimedia concealment estimator 210 uses reconstructed sample values in one aspect to form the concealment multimedia data. In another aspect the multimedia concealment estimator 210 uses the reconstructed sample values and a received spatial prediction directivity mode indicator in estimating the multimedia concealment data. Further details of spatial error concealment can be found in Application No. 11/182,621 (now published patent application U.S. 2006/0013320) "METHODS AND APPARATUS FOR SPATIAL ERROR CONCEALMENT" which is assigned to the assignee hereof.
[0041] In some aspects, one or more of the elements of the decoder 150 of FIG. 2A may be rearranged and/or combined. The elements may be implemented by hardware, software, firmware, middleware, microcode or any combination thereof. Details of the actions performed by the elements of the decoder 150 will be discussed in reference to the methods illustrated in FIGS. 3 and 4 below.
[0042] FIG. 2B is a block diagram illustrating an example of a computer processor system of a decoder device that may be used in a system such as illustrated in FIG. 1. The decoder device 150 of this example includes a pre-processor element 220, a random access memory (RAM) element 222, a digital signal processor (DSP) element 224, and a video core element 226.
[0043] The pre-processor 220 is used in one aspect to perform one or more of the actions performed by the various elements in FIG. 2A. The pre-processor parses the video bitstream and writes the data to the RAM 222. In addition, in one aspect, the preprocessor 220 implements the actions of the multimedia sample determiner 204, the transform coefficient determiner 206, the reconstructed sample generator 208 and the multimedia concealment estimator 210. By performing these more efficient, less computationally intensive actions in the preprocessor 220, the more computationally intensive video decoding can be done, in causal order, in the highly efficient video core 226.
[0044] The DSP 224 retrieves the parsed video data stored in the RAM 222 and reorganizes it to be handled by the video core 226. The video core 226 performs the dequantization (also known as rescaling or scaling), inverse transforming and deblocking functions as well as other video decompression functions. The video core is typically implemented in a highly optimized and pipelined fashion. Because of this, the video data can be decoded in the fastest manner when it is decoded in causal order. By performing the out-of-order reconstruction of multimedia samples and the subsequent spatial concealment in the pre-processor, the causal order is maintained for decoding in the video core allowing for improved overall decoding performance.
[0045] FIG. 3 is a flowchart illustrating one example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1. The process 300 can be performed by a decoding device such as the examples shown in FIGS. 2A and 2B. The process 300 enables reconstruction of selected multimedia samples. The process 300 may be used to reconstruct multimedia samples in a causal order where other encoded multimedia data is predicted from the causal data and may need reconstruction of the causal data prior to its own reconstruction. The process 300 may be used to reconstruct multimedia samples in non-causal order. In one aspect, the non-causal data is reconstructed in a manner so as to permit a subsequent reconstruction of all the multimedia data (both causal and non-causal) in a more efficient and timely manner.
[0046] The process 300 starts at block 305 where the decoder device receives transform coefficients associated with a multimedia data bitstream. The decoder device may receive the transform coefficients over a wireline and/or wireless network such as the network 140 shown in FIG. 1.
The transform coefficients can represent multimedia samples including color and/or brightness parameters such as chrominance and luminance, respectively. The transforms used to generate the transform coefficients may include, but are not limited to, the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), the Hadamard (or Walsh-Hadamard) transform, discrete wavelet transforms, the DST (Discrete Sine Transform), the Haar transform, the Slant transform, the KL (Karhunen-Loeve) transform and integer transforms such as one used in H.264. The multimedia samples may be transformed in groups such as one dimensional arrays and/or two dimensional matrices when the transform coefficients are generated during encoding. The transformed coefficients may be intra-coded and may or may not include spatial prediction. In the cases where spatial prediction was used in generating the transform coefficients, the transform coefficients may represent a residual value that is the error of a predictor provided by a reference value. The transform coefficients may be quantized. The transform coefficients may be entropy encoded. The receiver element 202 of FIG. 2A may perform the acts at block 305.
[0047] After receiving the transform coefficients, the process 300 continues at block 310 where the decoder device determines a set of multimedia samples to be reconstructed. The multimedia samples to be reconstructed may include luminance (luma) and chrominance (chroma) samples. In some examples, the set of multimedia samples to be reconstructed are determined in response to loss of synchronization while decoding the multimedia bitstream being received at block 305. The loss of synchronization may be caused by the erroneous reception or the loss of some or all of the encoded data corresponding to multimedia samples contained in a first slice of macroblocks. The determined multimedia samples to be reconstructed may be contained in a second slice of macroblocks. The second slice of macroblocks borders at least a part of the lost portion of the first slice of macroblocks. The determined multimedia samples may be causal or non-causal with respect to the lost portion of multimedia samples, as discussed above.
[0048] In one aspect, the multimedia samples determined to be reconstructed at block 310 may enable reconstruction of other multimedia samples that border a lost portion of multimedia data to be concealed. For example, intra-coded macroblocks at the bottom of another slice of macroblocks may be spatially predicted in reference to the set of multimedia samples determined to be reconstructed at block 310. Therefore, by reconstructing the determined set of multimedia samples, which strongly correlate with the intra-coded blocks, the intra-coded blocks themselves can be reconstructed through a concealment process. In another aspect, the multimedia samples determined to be reconstructed at block 310 may comprise samples located on or near a slice border. The samples to be reconstructed may comprise an entire matrix of associated multimedia samples that were transformed as a group during encoding. The samples to be reconstructed may also comprise a portion of the matrix of associated multimedia samples such as a row, a column, a diagonal, or portions and/or combinations thereof. The multimedia sample determiner 204 of FIG. 2A may perform the acts at block 310. Details of subsets of multimedia samples that may be reconstructed are discussed below.
[0049] The process 300 continues at block 315 where the decoder device determines a set of transform coefficients associated with the multimedia samples determined to be reconstructed at block 310. The determination of which transform coefficients to use for reconstruction depends on the encoding method that was used to generate the transform coefficients. The transform coefficient determination also depends on which multimedia samples are being reconstructed. For example, it may be determined that the entire set of multimedia samples determined at block 310 is to be reconstructed, or a subset may alternatively be determined to be reconstructed. The transform coefficient determination at block 315 also depends on whether there are transform coefficients with zero value (thereby negating the potential need to use them). Details of which transform coefficients may be sufficient to reconstruct multimedia samples are discussed below. The transform coefficient determiner 206 of FIG. 2A can perform the acts at block 315.
[0050] After determining the set of multimedia samples to be reconstructed at block 310, and determining the set of transform coefficients associated with the determined multimedia samples at block 315, the process 300 proceeds to block 320. At block 320, the decoder device processes the set of determined transform coefficients in order to generate reconstructed multimedia samples. The processing performed depends on the encoding methods that were used to generate the transform coefficients. The processing includes inverse transforming the transform coefficients, but may also include other acts including, but not limited to, entropy decoding, dequantization (also called rescaling or scaling), etc. Details of examples of processing performed at block 320 are discussed below in reference to FIG. 4.
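To make the processing at block 320 concrete for H.264, the following Python sketch implements the 4x4 inverse core transform butterfly (horizontal rows first, then vertical columns, then the (x + 32) >> 6 rounding) together with a reduced variant that computes only the right-hand column of residuals, i.e. the positions d, h, l and p of FIG. 8 that a block to the right would need. It assumes the coefficients have already been entropy decoded and dequantized (rescaled); the function names are hypothetical, and this is an illustration of partial inverse transformation rather than the patent's own implementation. Because the horizontal pass only has to produce one output per row, an all-zero coefficient row can be skipped outright.

```python
def _inv_1d(v):
    """One-dimensional H.264 4x4 inverse core transform butterfly."""
    e0 = v[0] + v[2]
    e1 = v[0] - v[2]
    e2 = (v[1] >> 1) - v[3]
    e3 = v[1] + (v[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_4x4_full(d):
    """Full residual block from dequantized coefficients d[row][col]."""
    rows = [_inv_1d(r) for r in d]                       # horizontal pass
    cols = [_inv_1d([rows[y][x] for y in range(4)])      # vertical pass, cols[x][y]
            for x in range(4)]
    return [[(cols[x][y] + 32) >> 6 for x in range(4)] for y in range(4)]


def inverse_4x4_right_column(d):
    """Residuals at x = 3 only (pixels d, h, l, p), bit-exact with the full version."""
    col = []
    for r in d:
        if not any(r):
            col.append(0)           # an all-zero coefficient row contributes nothing
            continue
        e0 = r[0] + r[2]
        e3 = r[1] + (r[3] >> 1)
        col.append(e0 - e3)         # horizontal output at x = 3 only
    return [(h + 32) >> 6 for h in _inv_1d(col)]
```

The reduced path needs four partial row butterflies and one column butterfly instead of eight full butterflies, and it reproduces the full result exactly because it performs the same integer operations in the same order.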
[0051] In some example systems, some or all acts of the process 300 are performed in a pre-processor such as the pre-processor 220 shown in FIG. 2B. It should be noted that some of the blocks of the process 300 may be combined, omitted, rearranged or any combination thereof.
[0052] FIG. 4 is a flowchart illustrating in more detail another example of a method of decoding a portion of a video stream in a system such as illustrated in FIG. 1. The example process 400 includes all of the actions performed at the blocks 305 to 320 contained in the process 300. The blocks 305, 310 and 315 remain unchanged from the examples shown in FIG. 3 and discussed above. The block 320 of the process 300, where the transform coefficients are processed to generate reconstructed samples, is illustrated in more detail in the process 400, where it comprises four blocks 405, 410, 420 and 425. The process 400 also includes additional blocks where concealment multimedia samples are estimated (block 430) and where transform coefficients based on the estimated concealment multimedia samples are generated (block 435).
[0053] The decoder device performs the actions at blocks 305, 310 and 315 in a similar fashion as discussed above. In this detailed example of block 320, transform coefficients are associated with basis images in order to efficiently reconstruct the multimedia samples. At block 405, the decoder device partitions the transform coefficients into groups, where the groups of transform coefficients are associated with the multimedia samples determined to be reconstructed at block 310. In one aspect, the groups of transform coefficients comprise the transform coefficients that modify (or weigh) a common basis image during an inverse transformation process in the reconstruction. Details of how transform coefficients are partitioned into groups are discussed below in relation to an example using H.264.
[0054] At block 410, the decoder device calculates a weight value associated with each partitioned group based on the encoding method which generated the coefficients. In one aspect, the weight is the sum of scaled transform coefficients of each group. The scaling duplicates the inverse transform characteristics of the encoding method. Examples of scaling and calculating the weight value are discussed below in relation to the H.264 example.
[0055] At block 420, basis images are determined for each of the groups based on the encoding transform method. Basis images are typically two dimensional orthogonal matrices, although one-dimensional arrays may also be utilized. Portions of the two dimensional basis images are used, where the portions depend on which multimedia samples are being reconstructed (as determined at block 310). The values calculated for each group at block 410 are used to modify (or weigh) the associated basis images at block 425. By combining all the weighted basis images, multimedia samples are reconstructed at block 425. Details of blocks 420 and 425 are discussed below in reference to the H.264 example.
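Blocks 405 through 425 can be illustrated with exact rational arithmetic. In the Python sketch below (hypothetical names), each coefficient w[i][j] weighs the basis image B_ij[y][x] = b_i[y] * b_j[x], built from the rows b_0 to b_3 of the H.264 4x4 inverse core transform with exact halves. To reconstruct only column x, the coefficients are partitioned by coefficient row i, each group is collapsed into one weight (a sum of scaled coefficients), and the column is the weighted sum of four one-dimensional basis vectors. The sketch deliberately ignores the integer truncation of the halves and the final (x + 32) >> 6 rounding that a bit-exact H.264 decoder performs, so it shows the structure of the computation rather than bit-exact output; it is one possible reading of the grouping step, not necessarily the one the patent intends.

```python
from fractions import Fraction as F

# Rows b_0..b_3 of the ideal 1-D inverse basis (H.264 core transform, exact halves).
BASIS = [
    [F(1), F(1), F(1), F(1)],
    [F(1), F(1, 2), F(-1, 2), F(-1)],
    [F(1), F(-1), F(-1), F(1)],
    [F(1, 2), F(-1), F(1), F(-1, 2)],
]


def full_by_basis_images(w):
    """Sum of all 16 weighted basis images B_ij[y][x] = BASIS[i][y] * BASIS[j][x]."""
    return [[sum(w[i][j] * BASIS[i][y] * BASIS[j][x]
                 for i in range(4) for j in range(4))
             for x in range(4)]
            for y in range(4)]


def column_by_groups(w, x):
    """Reconstruct only column x: one coefficient group, and one weight, per row i."""
    weights = [sum(w[i][j] * BASIS[j][x] for j in range(4)) for i in range(4)]
    return [sum(weights[i] * BASIS[i][y] for i in range(4)) for y in range(4)]
```

For one column, the grouped form uses 16 weight terms plus 16 combination terms (32 in total), compared with 64 terms when each of the four pixels is evaluated from all 16 weighted basis images.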
[0056] After generating the reconstructed multimedia samples, the process 400 continues at block 430, where the decoder device estimates concealment multimedia samples, in some examples, based on the reconstructed samples. In one aspect, reconstructed sample values of the multimedia samples are used to form the concealment multimedia data. In another aspect the reconstructed sample values and a received spatial prediction directivity mode indicator are used to form the multimedia concealment data. Further details of spatial error concealment can be found in Application No. 11/182,621 (now published patent application U.S. 2006/0013320) "METHODS AND APPARATUS FOR SPATIAL ERROR CONCEALMENT" which is assigned to the assignee hereof.
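The referenced application describes the actual concealment techniques. Purely to show why a reconstructed non-causal row is valuable, the following Python sketch uses a simple stand-in that is not the referenced method: each lost row is linearly interpolated, column by column, between the last causal row above the lost region and the first reconstructed non-causal row below it. With causal neighbors alone, only extrapolation from the top row would be possible. The function name and the rounding convention are assumptions.

```python
def conceal_between_rows(top_row, bottom_row, lost_rows):
    """Stand-in spatial concealment: fill `lost_rows` rows by per-column linear
    interpolation between a causal row above and a reconstructed non-causal row below."""
    n = lost_rows + 1
    concealed = []
    for k in range(1, n):
        # Row k lies k/n of the way from top_row to bottom_row; adding n // 2 rounds.
        concealed.append([(t * (n - k) + b * k + n // 2) // n
                          for t, b in zip(top_row, bottom_row)])
    return concealed
```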
[0057] In some examples, the estimated concealment multimedia samples are used directly and inserted into a frame buffer containing reconstructed data of the same frame to then be displayed. In other examples, the estimated concealment multimedia samples are transformed, in a manner replicating an encoding process, to generate transform coefficients representing the estimated concealment multimedia samples at block 435. These transformed coefficients are then inserted into the undecoded (still encoded) bitstream as if they were normal encoded samples. The entire bitstream can then be forwarded to a video decoder core, such as the video core 226 in FIG. 2B, to be decoded. In these examples, all or part of the process 400 can be performed in a preprocessor such as the pre-processor 220 of FIG. 2B. This method of performing the reconstruction and concealment estimation is especially useful for reconstructing non-causal portions which are then used to conceal other portions of multimedia data that were lost due to channel errors. Details of methods used to improve the efficiency of the reconstruction of multimedia samples will now be discussed in relation to H.264 encoded multimedia bitstreams.
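Block 435 re-encodes the estimated concealment samples. As a hedged illustration for H.264, the sketch below applies the unscaled 4x4 forward core transform Y = C_f X C_f^T. The quantization, scaling and entropy-coding stages that a real re-encoding path would also need are omitted, and the input X would be whatever residual block that path forms (for example, concealment samples minus a prediction). Helper names are hypothetical.

```python
# H.264 4x4 forward core transform matrix C_f.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_core_4x4(x):
    """Unscaled forward core transform Y = C_f * X * C_f^T (no quantization)."""
    return matmul(matmul(CF, x), transpose(CF))
```

For a flat input block only the DC coefficient is nonzero, and a purely vertical ramp produces energy only in the first coefficient column, which is the kind of energy compaction the transform is used for.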
High-Efficiency Partial Intra Decoding in H.264 Bitstreams
[0058] H.264 uses spatial prediction to exploit the spatial correlation among neighboring blocks of pixels. The spatial prediction modes use the causal neighbors to the left of and above a 4x4, 8x8 or 16x16 pixel block for spatial prediction. H.264 offers two modes of spatial prediction for luma values, one for 4x4 pixel blocks (herein referred to as intra-4x4 coding) and one for 16x16 pixel macroblocks (herein referred to as intra-16x16 coding). Note that other causal and non-causal neighboring samples may be used for spatial prediction.
[0059] FIG. 5 shows a detailed diagram of a 4x4 pixel block 502 and its surrounding causal neighbor pixels to the left and above, shown generally as 504. For example, during the H.264 encoding process, the causal neighbor pixels 504 are used to generate various predictors, values and/or parameters describing the block 502 pixels. The block 502 comprises pixels p0-p15, and the causal neighbor pixels 504 are identified using reference indicators n3, n7, n11, n12, n13, n14, and n15, where the number corresponds to the similar positions of the block 502 pixels.
[0060] The spatial prediction modes provided in H.264 use various directivity modes to spatially predict the block 502 from the various causal neighbor pixels 504. FIG. 6 shows a directivity mode diagram 600 that illustrates nine directivity modes (0-8) which are used to describe a directivity characteristic of an intra-coded block in H.264. The nine directivity modes (or indicators) are used to describe a directivity characteristic of the spatial prediction of block 502. For example, mode 0 describes a vertical directivity characteristic, mode 1 describes a horizontal directivity characteristic, and mode 2 describes a DC characteristic where the average value of available causal neighboring pixels is used as a reference for the prediction. In the DC mode, only the causal neighboring pixels (those immediately above and to the left of the 4x4, 8x8 or 16x16 pixel block) that are in the same slice are used in calculating the average. For example, if the block being encoded borders another slice above it, only the pixels to the left are averaged. If the block being encoded borders other slices both to the left and above, then a value of 128 is used as the DC average (half of the 8-bit range of values provided in H.264). The modes illustrated in the directivity mode diagram 600 are used in the H.264 encoding process to generate prediction values for the block 502.
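The DC rule above can be written out for the 4x4 luma case. This Python sketch follows the H.264 Intra_4x4 DC formulas, taking the top neighbors [A, B, C, D] and left neighbors [I, J, K, L] named in FIG. 8 (None when they belong to another slice or lie outside the picture); the function and parameter names are illustrative. The fallback generalizes 128 to 1 << (bit_depth - 1), which is the same value for 8-bit video.

```python
def intra4x4_dc(top=None, left=None, bit_depth=8):
    """Intra_4x4 DC predictor for one 4x4 block.

    top:  [A, B, C, D], or None if those pixels lie in another slice / outside the picture
    left: [I, J, K, L], or None under the same condition
    """
    if top is not None and left is not None:
        return (sum(top) + sum(left) + 4) >> 3
    if left is not None:            # block borders another slice above it
        return (sum(left) + 2) >> 2
    if top is not None:             # block borders another slice to its left
        return (sum(top) + 2) >> 2
    return 1 << (bit_depth - 1)     # 128 for 8-bit video
```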
[0061] In intra-4x4 coding of H.264, the luma values can be encoded in reference to the pixels to the left and above the 4x4 block using any of the nine directivity modes. In intra-16x16 coding, the luma values can be encoded in reference to the pixels to the left and above the entire 16x16 pixel block using four modes: i) vertical (mode 0), ii) horizontal (mode 1), iii) DC (mode 2), and iv) planar (mode 3). In the planar prediction mode, it is assumed that the luma values vary spatially and smoothly across the macroblock, and the reference is formed based on a planar equation. For chroma, there is one prediction block size, 8x8. In intra-8x8 chroma coding, the 8x8 block can be predicted with the same four prediction types used in intra-16x16 coding: vertical, horizontal, DC, and planar (H.264 numbers the chroma modes differently: DC is mode 0, horizontal mode 1, vertical mode 2, and planar mode 3). Details of reconstructing the predicted blocks encoded in H.264 will now be discussed.
[0062] The reconstructed signal within a predictive (intra or inter) coded 4x4 (luma or chroma) block can be expressed as:
$$ r = p + \Delta \qquad (1) $$
where r, p and Δ respectively denote the reconstructed signal (an approximation to the original uncompressed signal s), the prediction signal, and the compressed residual signal (an approximation to the original uncompressed residual signal s - p), all of which are integer-valued 4x4 matrices in this example. The residual values Δ can be reconstructed by inverse transformation of the transform coefficients. The prediction values p are obtained from causal neighboring pixels depending on the spatial prediction mode used to encode them.
[0063] The following are observations affecting reconstruction of pixels within intra-4x4 coded macroblocks located immediately below a slice boundary (non-causal neighbors in H.264). In a 16x16 macroblock, these blocks include the uppermost four 4x4 blocks located immediately below a slice boundary. For example, the blocks with indices b0, b1, b4 and b5 in the 16x16 pixel macroblock shown in FIG. 9 are representative of blocks immediately below a slice boundary AA'.
[0064] FIG. 7 shows one aspect of an intra-4x4 coded block immediately below a slice boundary. The line AA' marks the mentioned slice boundary and the 4x4 block 702 is the current block being reconstructed. The 9 neighboring pixels 704 above the slice boundary line AA', which could normally have been used for performing spatial prediction in the intra-4x4 coding, are not available since they are located on the other side of the slice boundary and hence belong to another slice. Spatial prediction, as well as any other predictive coding dependency across a slice boundary, is not permitted in H.264 since slices act as resynchronization points.
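The block indices used here follow the H.264 numbering of 4x4 luma blocks, in which the four 8x8 quadrants of a macroblock are numbered in raster order and the four 4x4 blocks inside each quadrant are numbered the same way. The short Python sketch below (hypothetical function name) maps a block position to its index and confirms that the top row, the one touching a slice boundary above the macroblock, consists of b0, b1, b4 and b5.

```python
def blk4x4_index(bx, by):
    """H.264 index of the 4x4 luma block at column bx, row by (each 0..3) of a macroblock.

    8x8 quadrants are numbered in raster order, and so are the 4x4 blocks inside each.
    """
    return 8 * (by // 2) + 4 * (bx // 2) + 2 * (by % 2) + (bx % 2)


# The top row of 4x4 blocks, i.e. the blocks immediately below a slice boundary AA'.
TOP_ROW = [blk4x4_index(bx, 0) for bx in range(4)]
```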
[0065] FIG. 8 illustrates a nomenclature for the neighbor pixels and the pixels within an intra-4x4 coded block. Since pixels above the slice boundary AA' are not available for spatial prediction, the neighboring pixels of block 702 available for prediction are the pixels {I, J, K, L}. This implies that the permissible intra-4x4 prediction modes for the 4x4 block 702 are: i) mode 1 (horizontal), ii) mode 2 (DC), and iii) mode 8 (horizontal-up). If the line BB' in FIG. 7 marked another slice boundary, then none of the pixels {I, J, K, L} or {M, A, B, C, D, E, F, G, H} would be available for spatial prediction. In this case, the only permissible intra-4x4 prediction mode is mode 2 (DC), where the prediction value for all the pixels of block 702 is 128 (for 8-bit video). [0066] Thus, in the most general case, the information for decoding and reconstructing some or all of the pixels of an intra-4x4 coded block located immediately below a slice boundary includes:
1. the intra-4x4 prediction mode indicator;
2. the residual information (quantized transform coefficients); and
3. the values of the 4 neighboring pixels {I, J, K, L in FIG. 8} located immediately to the left of the 4x4 block.
This data set is sufficient to reconstruct all pixel values {a, b, c, ..., n, o, p in FIG. 8} of the current 4x4 block. In particular, it is sufficient for reconstructing the values of the pixel subset {d, h, l, p}, which in turn may be used for the reconstruction of the next 4x4 block immediately to the right. [0067] The following are observations affecting reconstruction of pixels within intra-16x16 coded macroblocks located immediately below a slice boundary (non-causal neighbors in H.264). Here again, the interest is in the uppermost four 4x4 blocks (i.e., those with block indices b0, b1, b4 and b5 in FIG. 9) of an intra-16x16 coded macroblock located immediately below a slice boundary.
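Returning to the intra-4x4 case of [0065] and [0066], the mode restrictions and the three-item data set can be sketched as follows. The mode numbers are the H.264 intra-4x4 ones quoted in the text; `intra4x4_modes_below_slice_boundary` and the `PartialDecodeInput` container are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

# H.264 intra-4x4 prediction mode numbers that remain usable below a slice boundary.
HORIZONTAL, DC, HORIZONTAL_UP = 1, 2, 8


def intra4x4_modes_below_slice_boundary(left_available):
    """Modes an intra-4x4 block directly below a slice boundary may use ([0065])."""
    if left_available:
        # Only the left neighbours I, J, K, L can be referenced.
        return {HORIZONTAL, DC, HORIZONTAL_UP}
    # No usable neighbours: DC with the fixed prediction value 128 (8-bit video).
    return {DC}


@dataclass
class PartialDecodeInput:
    """Hypothetical container for the three items listed in [0066]."""
    mode: int                                 # intra-4x4 prediction mode indicator
    levels: list                              # 4x4 quantized transform coefficients
    left: list = field(default_factory=list)  # I, J, K, L, or empty if unavailable

    def is_valid(self):
        return self.mode in intra4x4_modes_below_slice_boundary(len(self.left) == 4)


print(intra4x4_modes_below_slice_boundary(False))                         # {2}
print(PartialDecodeInput(8, [[0] * 4] * 4, [90, 92, 94, 96]).is_valid())  # True
print(PartialDecodeInput(8, [[0] * 4] * 4).is_valid())                    # False
```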
[0068] FIG. 9 shows one aspect of an intra-16x16 coded macroblock located below a slice boundary. The line AA' marks the slice boundary, and the four 4x4 blocks labeled b0, b1, b4 and b5 constitute the portion of the 16x16 macroblock under consideration for reconstruction. The 17 neighboring pixels above the line AA', which could normally have been used for intra-16x16 spatial prediction, are not available, since they are located on the other side of the slice boundary and hence belong to another slice. The potential availability of the 16 neighboring pixels located immediately to the left of line BB' in this example implies that the permissible intra-16x16 spatial prediction modes for the current macroblock are i) mode 1 (horizontal) and ii) mode 2 (DC). When neither the 16 neighboring pixels located immediately to the left of line BB' nor the 17 pixels located above the line AA' are available, which would be the case, for example, if line BB' marks another slice boundary (or the left boundary of the video frame), the only permissible intra-16x16 prediction mode is mode 2 (DC).
[0069] When the current macroblock is encoded using intra-16x16 prediction mode 1 (horizontal), the topmost four neighboring pixels located immediately to the left of line BB' and below the line AA' are sufficient for decoding and reconstructing the topmost four 4x4 blocks within the current 16x16 macroblock. This is consistent with the framework described above for decoding the topmost four 4x4 blocks in intra-4x4 coded macroblocks.
[0070] However, when the current macroblock is encoded using intra-16x16 spatial prediction mode 2 (DC), and it is neither immediately to the right of a slice boundary nor on the left frame boundary, then all 16 neighboring pixels located immediately to the left of line BB' are used for decoding and reconstructing the topmost four 4x4 blocks within the current MB (as well as all others in the row). This is undesirable. In one aspect, it is therefore beneficial to avoid encoding with intra-16x16 spatial prediction mode 2 (DC) immediately below a slice boundary, so that only the topmost four neighboring pixels (e.g., the pixels I, J, K and L in FIG. 8) are needed to reconstruct the pixels below a slice boundary. [0071] In one aspect, intra-16x16 coding of macroblocks located immediately below a slice boundary should be limited to spatial prediction mode 1 (horizontal), unless they are located immediately to the right of a slice boundary or at the left frame boundary. This allows computationally efficient reconstruction of the rightmost column of four pixels in each of the topmost 4x4 blocks in the row, which in turn allows computationally efficient reconstruction of the topmost row of four pixels in each of those blocks.
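The intra-16x16 rules of [0068] to [0071] can be summarized in a short sketch: which modes remain below a slice boundary, and how many left-neighbour pixels the topmost four 4x4 blocks then depend on. The mode numbers are those given in [0061]; the function names and the `restrict` flag (modelling the optional encoder limitation of [0071]) are hypothetical.

```python
# H.264 intra-16x16 luma prediction mode numbers.
I16_VERTICAL, I16_HORIZONTAL, I16_DC, I16_PLANE = 0, 1, 2, 3


def intra16x16_modes_below_slice_boundary(left_available, restrict=True):
    """Permissible intra-16x16 modes for a macroblock directly below a slice boundary.

    Without the 17 upper neighbours only horizontal and DC remain ([0068]);
    restrict=True applies the encoder-side limitation of [0071].
    """
    if not left_available:
        return {I16_DC}  # left frame edge, or a slice boundary on the left as well
    return {I16_HORIZONTAL} if restrict else {I16_HORIZONTAL, I16_DC}


def left_pixels_needed_for_top_blocks(mode, left_available):
    """Left neighbours the topmost four 4x4 blocks depend on ([0069], [0070])."""
    if not left_available:
        return 0
    # Horizontal copies row by row; DC averages the whole 16-pixel left column.
    return 4 if mode == I16_HORIZONTAL else 16


print(intra16x16_modes_below_slice_boundary(True, restrict=False))  # {1, 2}
print(left_pixels_needed_for_top_blocks(I16_DC, True))              # 16
```

The second function makes the motivation for [0071] concrete: permitting DC raises the dependency from four left pixels to sixteen.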
[0072] FIG. 10 shows one aspect of an 8x8 chroma block located immediately below a slice boundary. The line AA' marks the slice boundary, and the two 4x4 blocks immediately below line AA' and to the right of line BB' constitute data for one of the two chroma channels (Cr or Cb). The nine neighboring pixels above the slice boundary line AA' are not available for spatial prediction in this example, since they are located on the other side of the slice boundary and hence belong to another slice. The availability of the 8 neighboring pixels located immediately to the left of line BB' implies that the permissible chroma intra prediction modes for the current MB are limited to i) mode 0 (DC) and ii) mode 1 (horizontal). When the line BB' is also a slice boundary or the left boundary of the video frame, neither the 8 neighboring pixels located immediately to the left of line BB' nor the 9 pixels located immediately above the line AA' are available for spatial prediction. In this case, the only permissible chroma intra prediction mode is mode 0 (DC). [0073] When the chroma channels of the current intra-coded macroblock are encoded using the intra-8x8 chroma horizontal prediction mode, the topmost four neighboring pixels located immediately to the left of line BB' may be needed for decoding and reconstructing the topmost two 4x4 chroma blocks within the current MB. Note that there are two 8x8 chroma blocks (one each for Cb and Cr, in 4:2:0 sampling) corresponding to one 16x16 luma macroblock.
[0074] Likewise, when the chroma channels (Cr and Cb) of the current intra-coded macroblock are encoded using intra-8x8 chroma prediction mode 0 (DC), the availability of the 8 neighboring pixels located immediately to the left of line BB' is adequate for decoding and reconstructing the topmost two 4x4 blocks. This is again consistent with the framework described above. [0075] In one aspect, the intra-8x8 coding of the chroma channels (Cr and Cb) of intra-coded macroblocks located immediately below a slice boundary should be limited to spatial prediction mode 1 (horizontal), unless they are located immediately to the right of a slice boundary or at the left frame boundary. This allows computationally efficient reconstruction of the rightmost column of four pixels in each of the topmost 4x4 blocks in the row, which in turn allows computationally efficient reconstruction of the topmost row of four pixels in each of those blocks. This is consistent with the framework described above for decoding the topmost four 4x4 blocks in the luma channel of intra-coded macroblocks (both intra-4x4 coded macroblocks and intra-16x16 coded macroblocks, with the limitations on the use of the intra-16x16 DC spatial prediction mode discussed above).
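For chroma, the same rules apply, but the mode numbers differ from the intra-16x16 luma numbering (DC is 0 for chroma, 2 for luma), which is an easy source of bugs. The sketch below encodes both numberings as stated in [0061] and [0072]; the dictionaries and function names are hypothetical, and `restrict` models the optional limitation of [0075].

```python
# H.264 chroma intra prediction numbering differs from intra-16x16 luma numbering.
LUMA16_MODE = {"vertical": 0, "horizontal": 1, "dc": 2, "plane": 3}
CHROMA_MODE = {"dc": 0, "horizontal": 1, "vertical": 2, "plane": 3}


def chroma_modes_below_slice_boundary(left_available, restrict=True):
    """Permissible chroma modes below a slice boundary ([0072]); restrict applies [0075]."""
    if not left_available:
        return {CHROMA_MODE["dc"]}
    if restrict:
        return {CHROMA_MODE["horizontal"]}
    return {CHROMA_MODE["dc"], CHROMA_MODE["horizontal"]}


def same_mode_in_chroma(luma16_mode_number):
    """Translate an intra-16x16 luma mode number to the chroma number of that type."""
    name = next(k for k, v in LUMA16_MODE.items() if v == luma16_mode_number)
    return CHROMA_MODE[name]


print(chroma_modes_below_slice_boundary(True, restrict=False))  # {0, 1}
print(same_mode_in_chroma(2))  # 0: luma DC is chroma mode 0
```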
Efficient Partial Decoding of Intra-coded Samples in H.264
[0076] It has been shown that partial decoding of the four rightmost pixels of 4x4 pixel blocks allows for decoding of some or all of the pixels of intra-coded blocks to the right of the initial 4x4 block. The problem of efficiently decoding the fourth, i.e., the last, column of the residual component of a 4x4 intra-coded block, which contributes to the reconstruction of the final pixel values for positions {d, h, l, p} in FIG. 8, will now be addressed. This example uses the basis images of the H.264 integer transform. However, basis images of other transforms could be manipulated in similar ways, allowing for similar efficient partial decoding. Other transforms that may be partially decoded using these methods include, but are not limited to, the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), the Hadamard (or Walsh-Hadamard) transform, discrete wavelet transforms, the DST (Discrete Sine Transform), the Haar transform, the Slant transform, and the KL (Karhunen-Loeve) transform.
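To illustrate that the idea is not specific to H.264, here is a minimal sketch using a swapped-in transform, the orthonormal NxN DCT-II in floating point (not the H.264 integer transform). It assumes the convention that coefficients are W = C X C^T, so the inverse is X = C^T W C; `inverse_column` reconstructs a single column from N grouped weights instead of all N^2 samples. All names are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th 1-D basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]


def inverse_full(coef, c):
    """Full inverse X = C^T W C; coef[v][h] has row v = vertical frequency."""
    n = len(c)
    return [[sum(c[v][y] * coef[v][h] * c[h][x] for v in range(n) for h in range(n))
             for x in range(n)] for y in range(n)]


def inverse_column(coef, c, x):
    """Only column x: n grouped weights, then a weighted sum of n basis vectors."""
    n = len(c)
    g = [sum(coef[v][h] * c[h][x] for h in range(n)) for v in range(n)]
    return [sum(g[v] * c[v][y] for v in range(n)) for y in range(n)]


random.seed(0)
n = 8
c = dct_matrix(n)
coef = [[random.uniform(-50, 50) for _ in range(n)] for _ in range(n)]
full = inverse_full(coef, c)
col = inverse_column(coef, c, n - 1)
print(max(abs(full[y][n - 1] - col[y]) for y in range(n)) < 1e-9)  # True
```

The grouped weights `g` play the same role as the parenthesized sums of equation (7) in the H.264 case.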
[0077] In general, a forward transformation of an NxN matrix [Y] of multimedia samples using a transformation matrix [T] resulting in a transform coefficient matrix [w] takes the form:
$$[w] = [T]^T [Y] [T] \qquad (3)$$
[0078] The corresponding inverse transformation to reconstruct the multimedia sample matrix [Y] is of the form:
$$[Y] = [T] [w] [T]^T \qquad (4)$$
[0079] The transformations represented by equations (3) and (4) can each be thought of as two one-dimensional (1D) transforms resulting in a two-dimensional (2D) transform. For example, the $[Y][T]$ matrix multiplication can be thought of as a 1D row transform and the $[T]^T[Y]$ matrix multiplication as a 1D column transform. The combination forms a 2D transform. Another way of thinking about the 2D transform of an NxN matrix [Y] is to perform $N^2$ inner products of [Y] with the 2D basis images corresponding to the 2D transform characterized by the transform matrix [T], leading to a set of $N^2$ values identical to the set of transform coefficients. [0080] Basis images of a given transform [T] can be calculated by setting one of the transform coefficients to one, setting all others to zero, and taking the inverse transform of the resulting coefficient matrix. For example, using a 4x4 transform coefficient matrix [w], setting the coefficient $w_{11}$ to 1 and all others to zero, and using the H.264 integer transform $[T_H]$, equation (4) results in:
[0081] By summing the 16 ($N^2$, where $N = 4$) matrices formed by using the individual transform coefficients (weights) in [w] to weigh (scale) the 16 basis images, the entire reconstructed matrix [Y] can be calculated. When the entire matrix is needed, this is not efficient compared to fast transform methods. However, a subset, such as a single row or column, can be reconstructed from the basis images more efficiently than with a fast transform.
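A minimal sketch of [0080] and [0081] in exact rational arithmetic: the four 1-D basis vectors of the H.264 4x4 inverse core transform generate all 16 basis images, and summing them weighted by the coefficients reproduces the separable (row pass, then column pass) inverse. This deliberately ignores the integer right shifts and the final normalization of a bit-exact decoder; the names `m`, `basis_image`, `inverse_by_basis_images` and `inverse_separable` are hypothetical.

```python
import random
from fractions import Fraction as F

# 1-D basis vectors of the H.264 4x4 inverse core transform, exact rationals.
# m[k][n]: sample n of the response to a single unit coefficient at frequency k.
m = [
    [F(1), F(1), F(1), F(1)],
    [F(1), F(1, 2), F(-1, 2), F(-1)],
    [F(1), F(-1), F(-1), F(1)],
    [F(1, 2), F(-1), F(1), F(-1, 2)],
]


def basis_image(i, j):
    """s_ij: i = horizontal frequency, j = vertical frequency."""
    return [[m[j][r] * m[i][c] for c in range(4)] for r in range(4)]


def inverse_by_basis_images(w):
    """Weighted sum of the 16 basis images, w[i][j] weighting s_ij."""
    y = [[F(0)] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if w[i][j] == 0:
                continue  # zero weights contribute nothing
            s = basis_image(i, j)
            for r in range(4):
                for c in range(4):
                    y[r][c] += w[i][j] * s[r][c]
    return y


def inverse_separable(w):
    """Same result from two 1-D passes: horizontal (row) pass, then vertical pass."""
    # Row pass: for vertical frequency j, t[j][c] = sum_i w[i][j] * m[i][c].
    t = [[sum(w[i][j] * m[i][c] for i in range(4)) for c in range(4)] for j in range(4)]
    # Column pass: y[r][c] = sum_j m[j][r] * t[j][c].
    return [[sum(m[j][r] * t[j][c] for j in range(4)) for c in range(4)] for r in range(4)]


random.seed(1)
w = [[random.randint(-20, 20) for _ in range(4)] for _ in range(4)]
print(inverse_by_basis_images(w) == inverse_separable(w))  # True
print(basis_image(1, 0)[0])  # first row of s10: 1, 1/2, -1/2, -1 as Fractions
```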
[0082] The 16 basis images associated with the H.264 4x4 integer transformation process for residual 4x4 blocks can be determined to be as follows, where $s_{ij}$ (for $i, j \in \{0,1,2,3\}$) is the basis image associated with the $i$th horizontal and $j$th vertical frequency channel.
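$$s_{00} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \qquad (6a)$$

$$s_{10} = \begin{bmatrix} 1 & 0.5 & -0.5 & -1 \\ 1 & 0.5 & -0.5 & -1 \\ 1 & 0.5 & -0.5 & -1 \\ 1 & 0.5 & -0.5 & -1 \end{bmatrix} \qquad (6b)$$

$$s_{20} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \qquad (6c)$$

$$s_{30} = \begin{bmatrix} 0.5 & -1 & 1 & -0.5 \\ 0.5 & -1 & 1 & -0.5 \\ 0.5 & -1 & 1 & -0.5 \\ 0.5 & -1 & 1 & -0.5 \end{bmatrix} \qquad (6d)$$

$$s_{01} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0.5 & 0.5 & 0.5 & 0.5 \\ -0.5 & -0.5 & -0.5 & -0.5 \\ -1 & -1 & -1 & -1 \end{bmatrix} \qquad (6e)$$

$$s_{11} = \begin{bmatrix} 1 & 0.5 & -0.5 & -1 \\ 0.5 & 0.25 & -0.25 & -0.5 \\ -0.5 & -0.25 & 0.25 & 0.5 \\ -1 & -0.5 & 0.5 & 1 \end{bmatrix} \qquad (6f)$$

$$s_{21} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 0.5 & -0.5 & -0.5 & 0.5 \\ -0.5 & 0.5 & 0.5 & -0.5 \\ -1 & 1 & 1 & -1 \end{bmatrix} \qquad (6g)$$

$$s_{31} = \begin{bmatrix} 0.5 & -1 & 1 & -0.5 \\ 0.25 & -0.5 & 0.5 & -0.25 \\ -0.25 & 0.5 & -0.5 & 0.25 \\ -0.5 & 1 & -1 & 0.5 \end{bmatrix} \qquad (6h)$$

$$s_{02} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \qquad (6i)$$

$$s_{12} = \begin{bmatrix} 1 & 0.5 & -0.5 & -1 \\ -1 & -0.5 & 0.5 & 1 \\ -1 & -0.5 & 0.5 & 1 \\ 1 & 0.5 & -0.5 & -1 \end{bmatrix} \qquad (6j)$$

$$s_{22} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \qquad (6k)$$

$$s_{32} = \begin{bmatrix} 0.5 & -1 & 1 & -0.5 \\ -0.5 & 1 & -1 & 0.5 \\ -0.5 & 1 & -1 & 0.5 \\ 0.5 & -1 & 1 & -0.5 \end{bmatrix} \qquad (6l)$$

$$s_{03} = \begin{bmatrix} 0.5 & 0.5 & 0.5 & 0.5 \\ -1 & -1 & -1 & -1 \\ 1 & 1 & 1 & 1 \\ -0.5 & -0.5 & -0.5 & -0.5 \end{bmatrix} \qquad (6m)$$

$$s_{13} = \begin{bmatrix} 0.5 & 0.25 & -0.25 & -0.5 \\ -1 & -0.5 & 0.5 & 1 \\ 1 & 0.5 & -0.5 & -1 \\ -0.5 & -0.25 & 0.25 & 0.5 \end{bmatrix} \qquad (6n)$$

$$s_{23} = \begin{bmatrix} 0.5 & -0.5 & -0.5 & 0.5 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ -0.5 & 0.5 & 0.5 & -0.5 \end{bmatrix} \qquad (6o)$$

$$s_{33} = \begin{bmatrix} 0.25 & -0.5 & 0.5 & -0.25 \\ -0.5 & 1 & -1 & 0.5 \\ 0.5 & -1 & 1 & -0.5 \\ -0.25 & 0.5 & -0.5 & 0.25 \end{bmatrix} \qquad (6p)$$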
[0083] A careful look at these 16 basis images reveals that their last columns contain only four distinct vectors, up to scale factors. Specifically, the last column of $s_{ij}$ equals the last column of $s_{0j}$ scaled by 1, -1, 1 or -0.5 for $i = 0, 1, 2, 3$ respectively. This should be intuitively clear, since the last column, being a 4x1 vector, lies in a four-dimensional vector space and hence can be expressed with exactly four basis vectors. [0084] When the quantized transform coefficients (i.e., levels) $z_{ij}$, $i, j \in \{0,1,2,3\}$, are received in the bitstream, they are rescaled (dequantized) to generate the coefficients $w'_{ij}$, $i, j \in \{0,1,2,3\}$. These dequantized transform coefficients can then be parsed into groups that are combined and multiplied with the last column (vector) of the basis images to emulate the inverse transformation process (i.e., to generate the weights that weigh the basis vectors in the synthesis process). This observation implies that the reconstruction expression for the last column of the 4x4 residual signal, $[\Delta_d\ \Delta_h\ \Delta_l\ \Delta_p]^T$, corresponding to the positions {d, h, l, p} in FIG. 8, can be written as:
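$$\begin{aligned}
[\Delta_d\ \Delta_h\ \Delta_l\ \Delta_p]^T ={}& (w'_{00} - w'_{10} + w'_{20} - w'_{30}/2)\,[1\ \ 1\ \ 1\ \ 1]^T \\
&+ (w'_{01} - w'_{11} + w'_{21} - w'_{31}/2)\,[1\ \ 0.5\ \ {-0.5}\ \ {-1}]^T \\
&+ (w'_{02} - w'_{12} + w'_{22} - w'_{32}/2)\,[1\ \ {-1}\ \ {-1}\ \ 1]^T \\
&+ (w'_{03} - w'_{13} + w'_{23} - w'_{33}/2)\,[0.5\ \ {-1}\ \ 1\ \ {-0.5}]^T
\end{aligned} \qquad (7)$$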
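Equation (7) can be checked against a full integer inverse transform. The sketch below assumes the H.264 4x4 residual inverse transform as we understand it: a horizontal 1-D butterfly over each row (with `>> 1` on the odd inputs), then a vertical pass over each column, then `(x + 32) >> 6`. Coefficients are stored as `c[v][h]`, i.e. `c[j][i]` holds $w'_{ij}$. Under that assumption the four parenthesized weights of equation (7), with the division by 2 done as an arithmetic right shift, are exactly the last outputs of the row pass, so the last column comes out bit-exact with a single vertical 1-D pass. Function names are hypothetical.

```python
import random


def idct4(d):
    """1-D H.264 4x4 inverse core transform (integer butterfly with >> 1)."""
    e0, e1 = d[0] + d[2], d[0] - d[2]
    e2, e3 = (d[1] >> 1) - d[3], d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def full_residual(c):
    """All 16 residual samples; row pass, column pass, then (x + 32) >> 6."""
    rows = [idct4(row) for row in c]
    cols = [idct4([rows[v][x] for v in range(4)]) for x in range(4)]
    return [[(cols[x][y] + 32) >> 6 for x in range(4)] for y in range(4)]


def last_column_residual(c):
    """Delta_d, Delta_h, Delta_l, Delta_p only, following equation (7)."""
    # Grouped weight per vertical frequency j: w'_0j - w'_1j + w'_2j - w'_3j / 2.
    g = [c[j][0] - c[j][1] + c[j][2] - (c[j][3] >> 1) for j in range(4)]
    # Weighting the four distinct last-column vectors is one vertical 1-D pass.
    return [(v + 32) >> 6 for v in idct4(g)]


random.seed(2)
for _ in range(1000):
    c = [[random.randint(-2048, 2047) for _ in range(4)] for _ in range(4)]
    assert last_column_residual(c) == [row[3] for row in full_residual(c)]

dc_only = [[64, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
print(last_column_residual(dc_only))  # [1, 1, 1, 1]
```

The partial path needs 4 grouped sums and one 1-D transform, versus eight 1-D transforms for the full block.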
[0085] Note that once the four scalar combinations of $w'_{ij}$ in the four sets of parentheses above are calculated, right shifts and additions/subtractions suffice to complete the scaling of each basis vector. The calculation of the reconstructed samples is then straightforward. At the far left side of a frame, or immediately to the right of a slice boundary, it is known that only spatial prediction mode 2 (DC) is permissible and that all pixels have a reference (prediction) value (see $p$ in equation (1) above) equal to 128. Thus the reconstructed samples $[r_d\ r_h\ r_l\ r_p]^T$ corresponding to the positions {d, h, l, p} for this first, leftmost block can be calculated as:
[0086]

$$[r_d\ r_h\ r_l\ r_p]^T = [\Delta_d\ \Delta_h\ \Delta_l\ \Delta_p]^T + [128\ 128\ 128\ 128]^T \qquad (8)$$

where the reconstructed residual values $[\Delta_d\ \Delta_h\ \Delta_l\ \Delta_p]^T$ are calculated with equation (7). The 4x4 blocks to the right of this block can then be reconstructed by using the appropriate reconstructed values from the block to the left to generate the prediction signal component $p$ in equation (1); the prediction values generated depend on which spatial prediction mode was used to encode the 4x4 block being reconstructed. Examples of calculating the prediction values for other 4x4 blocks positioned below a slice boundary are now discussed. [0087] FIG. 11 illustrates a portion of multimedia samples located immediately below a slice boundary. The pixels may comprise luma and chroma values. The pixel positions {q, r, s, t} represent previously reconstructed positions with pixel values $[r_q\ r_r\ r_s\ r_t]$ (e.g., calculated using equations (7) and (1) above). After reconstruction of the residual signal component values $[\Delta_d\ \Delta_h\ \Delta_l\ \Delta_p]^T$ for pixel positions {d, h, l, p}, the prediction signal component values $[p_d\ p_h\ p_l\ p_p]^T$ for the same set of positions are generated to finalize the reconstruction in accordance with equation (1). Given that the intra-4x4 coded block containing the pixels {d, h, l, p} is immediately below a slice boundary, the intra-4x4 spatial prediction mode used to generate the prediction signal for this block can be one of the following:
1. Intra-4x4 spatial prediction mode 1 (Horizontal):
With respect to FIG. 11, the prediction signal component values are given by:
$$[p_d\ p_h\ p_l\ p_p]^T = [r_q\ r_r\ r_s\ r_t]^T, \qquad (9)$$
comprising 0 additions, 0 arithmetic shifts, and 0 multiplications.
2. Intra-4x4 spatial prediction mode 2 (DC):
If pixels at locations {q, r, s, t} are not available, then the prediction signal component values are given by:
$$[p_d\ p_h\ p_l\ p_p]^T = [128\ 128\ 128\ 128]^T, \qquad (10)$$
comprising 0 additions, 0 arithmetic shifts, and 0 multiplications. If {q, r, s, t} are available, then the prediction signal component values are given by:
$$[p_d\ p_h\ p_l\ p_p]^T = [u\ u\ u\ u]^T, \qquad (11)$$
where $u = ((r_q + r_r + r_s + r_t) + 2) \gg 2$, comprising 4 additions, 1 arithmetic shift, and 0 multiplications.
3. Intra-4x4 spatial prediction mode 8 (Horizontal-Up):
The prediction signal component values are given by:
$$p_d = (r_r + 2r_s + r_t + 2) \gg 2, \qquad (12a)$$
$$p_h = (r_s + 3r_t + 2) \gg 2 = (r_s + 2r_t + r_t + 2) \gg 2, \qquad (12b)$$
$$p_l = p_p = r_t, \qquad (12c)$$
comprising 6 additions and 4 arithmetic shifts (computing $2r_s$ and $2r_t$ by shifts), or 8 additions and 2 arithmetic shifts (computing them by additions), with 0 multiplications in either case.
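Equations (8) through (12) can be chained along a row of 4x4 blocks below a slice boundary, as in the following sketch. It assumes 8-bit video, that the right-column residuals have already been obtained (for example via equation (7)), and that the first block has no left neighbour; the clipping to [0, 255] mirrors H.264 sample clipping. The function names are hypothetical.

```python
HORIZONTAL, DC, HORIZONTAL_UP = 1, 2, 8  # intra-4x4 mode numbers


def predict_right_column(mode, left=None):
    """[p_d, p_h, p_l, p_p] from left = [r_q, r_r, r_s, r_t], or None if unavailable."""
    if mode == DC:
        if left is None:
            return [128] * 4                  # equation (10), 8-bit video
        u = (sum(left) + 2) >> 2              # equation (11)
        return [u] * 4
    if left is None:
        raise ValueError("this mode needs the left neighbours")
    rq, rr, rs, rt = left
    if mode == HORIZONTAL:
        return [rq, rr, rs, rt]               # equation (9)
    if mode == HORIZONTAL_UP:
        pd = (rr + (rs << 1) + rt + 2) >> 2   # equation (12a)
        ph = (rs + (rt << 1) + rt + 2) >> 2   # equation (12b)
        return [pd, ph, rt, rt]               # equation (12c)
    raise ValueError(f"mode {mode} is not permissible below a slice boundary")


def reconstruct_right_columns(modes, residual_columns):
    """Apply equation (1) block by block, left to right, feeding each result forward."""
    left, out = None, []
    for mode, delta in zip(modes, residual_columns):
        pred = predict_right_column(mode, left)
        left = [min(max(p + d, 0), 255) for p, d in zip(pred, delta)]
        out.append(left)
    return out


print(predict_right_column(HORIZONTAL_UP, [100, 104, 108, 112]))  # [108, 111, 112, 112]
cols = reconstruct_right_columns([DC, HORIZONTAL, HORIZONTAL_UP],
                                 [[-28, -24, -20, -16], [0, 1, 2, 3], [0, 0, 0, 0]])
print(cols)  # [[100, 104, 108, 112], [100, 105, 110, 115], [110, 114, 115, 115]]
```

Only the rightmost column of each block is carried forward, which is what keeps the per-block cost to the handful of operations counted in equations (9) to (12).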
[0088] One more observation regarding the rescaling process (dequantizing $z_{ij}$, $i, j \in \{0,1,2,3\}$, to generate $w'_{ij}$) may reveal another source of significant computational savings. The rescaling factors $v_{ij}$, $i, j \in \{0,1,2,3\}$, which are used to scale $z_{ij}$, depend not only on the quantization parameter but also on position, with the following structure within a 4x4 matrix:

$$\begin{bmatrix} v_{00} & v_{10} & v_{20} & v_{30} \\ v_{01} & v_{11} & v_{21} & v_{31} \\ v_{02} & v_{12} & v_{22} & v_{32} \\ v_{03} & v_{13} & v_{23} & v_{33} \end{bmatrix}$$

where each of the three groups of rescaling factors $\{v_{00}, v_{20}, v_{02}, v_{22}\}$, $\{v_{11}, v_{31}, v_{13}, v_{33}\}$ and $\{v_{10}, v_{30}, v_{01}, v_{21}, v_{12}, v_{32}, v_{03}, v_{23}\}$ shares a single value for a given quantization parameter $QP_Y$. This can be exploited to reduce the number of multiplications needed to generate $w'_{ij}$ from $z_{ij}$. In the weighted basis vector sum of equation (7), which reconstructs the last column of the 4x4 residual signal, the first weight, which scales the basis vector $[1\ 1\ 1\ 1]^T$, contains the sum of $w'_{00}$ and $w'_{20}$ rather than the individual values of these two coefficients. Therefore, instead of calculating $w'_{00}$ and $w'_{20}$ individually and then summing them, which would commonly involve two integer multiplications, $z_{00}$ and $z_{20}$ can be added first and the sum rescaled with $v_{00} = v_{20}$, yielding the same value of $(w'_{00} + w'_{20})$ through one integer multiplication. [0089] Beyond these straightforward reductions in the computational steps of this partial decoding, fast algorithms can also be designed to calculate the desired last column and first (topmost) row of the 4x4 residual signal. [0090] Another practical fact that may further reduce the computation in this partial decoding process is that, most of the time, only a few (typically fewer than 5) of the maximum of 16 quantized coefficients within a residual signal block are actually non-zero.
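The rescaling shortcut of [0088] and the sparsity of [0090] can be sketched together. This assumes a flat rescaling of the form $w' = (z \cdot v) \ll \lfloor QP/6 \rfloor$ with the H.264 4x4 rescaling factors for the three position classes (FRExt scaling lists are ignored), and writes the division by 2 of equation (7) as an arithmetic right shift. Because $v_{0j} = v_{2j}$, the sum $z_{0j} + z_{2j}$ is rescaled with one multiplication, and zero levels are skipped. All names are hypothetical.

```python
import random

# H.264 4x4 rescaling factors v, indexed by QP % 6. Each row lists the shared
# value for the three position classes of [0088]:
#   class 0: (0,0) (2,0) (0,2) (2,2); class 1: (1,1) (3,1) (1,3) (3,3); class 2: the rest.
V = [(10, 16, 13), (11, 18, 14), (13, 20, 16),
     (14, 23, 18), (16, 25, 20), (18, 29, 23)]


def v_class(i, j):
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def rescale(level, i, j, qp):
    """Assumed flat rescaling w' = (z * v) << (qp // 6)."""
    return (level * V[qp % 6][v_class(i, j)]) << (qp // 6)


def grouped_weights_naive(z, qp):
    """Equation (7) weights after rescaling all 16 levels; z[i][j], i horizontal."""
    w = [[rescale(z[i][j], i, j, qp) for j in range(4)] for i in range(4)]
    return [w[0][j] - w[1][j] + w[2][j] - (w[3][j] >> 1) for j in range(4)]


def grouped_weights_fast(z, qp):
    """Same weights: z_0j + z_2j share v, so rescale their sum; skip zero levels."""
    mults, out = 0, []
    for j in range(4):
        scaled = []
        for level, i in ((z[0][j] + z[2][j], 0), (z[1][j], 1), (z[3][j], 3)):
            if level:
                mults += 1
                scaled.append(rescale(level, i, j, qp))
            else:
                scaled.append(0)
        even, odd1, odd3 = scaled
        out.append(even - odd1 - (odd3 >> 1))
    return out, mults


z = [[0] * 4 for _ in range(4)]
z[0][0], z[2][0], z[1][1] = 3, -1, 2   # a sparse block, as is typical
fast, mults = grouped_weights_fast(z, 28)
print(fast == grouped_weights_naive(z, 28), mults)  # True 2
```

For a dense block the fast path uses at most 12 multiplications instead of 16; for the sparse example it uses 2.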
The rescaling savings above, in conjunction with the sparsity of the quantized coefficients, can be used to further reduce, almost halving, the number of multiplications involved. [0091] Those of skill in the art will recognize that formulas similar to equation (7) above may be derived to reconstruct any column, row, diagonal, or any portion and/or combination thereof. For example, the top row values of the basis images (equations 6a to 6p above) could be combined with the corresponding transform coefficients $w'_{ij}$ to reconstruct the pixels just below a slice boundary (see pixel positions {A, B, C, D} in FIG. 11), which depend on the same four pixel positions {d, h, l, p} in the block to the left. Other subsets of multimedia samples that can be reconstructed using these methods will be apparent to those skilled in the art.
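The top-row variant mentioned in [0091] follows the same pattern: row 0 of $s_{ij}$ is the horizontal basis vector for frequency $i$ scaled by 1, 1, 1 or 1/2 for $j = 0, 1, 2, 3$, so one grouped weight per horizontal frequency suffices. The sketch below works in exact rational arithmetic only; a bit-exact H.264 decoder must also reproduce the standard's order of `>> 1` roundings, which this sketch does not attempt. The names `M`, `full_inverse` and `top_row` are hypothetical.

```python
import random
from fractions import Fraction as F

# 1-D basis vectors of the H.264 4x4 inverse core transform (exact rationals).
M = [
    [F(1), F(1), F(1), F(1)],
    [F(1), F(1, 2), F(-1, 2), F(-1)],
    [F(1), F(-1), F(-1), F(1)],
    [F(1, 2), F(-1), F(1), F(-1, 2)],
]


def full_inverse(w):
    """All 16 samples: sum of w[i][j] * s_ij, with s_ij[r][c] = M[j][r] * M[i][c]."""
    return [[sum(w[i][j] * M[j][r] * M[i][c] for i in range(4) for j in range(4))
             for c in range(4)] for r in range(4)]


def top_row(w):
    """Row 0 only, i.e. the samples just below the slice boundary."""
    # Row 0 of s_ij is M[j][0] * M[i], and M[j][0] = 1, 1, 1, 1/2.
    a = [w[i][0] + w[i][1] + w[i][2] + F(w[i][3]) / 2 for i in range(4)]
    return [sum(a[i] * M[i][c] for i in range(4)) for c in range(4)]


random.seed(4)
w = [[random.randint(-40, 40) for _ in range(4)] for _ in range(4)]
print(top_row(w) == full_inverse(w)[0])  # True
```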
[0092] FIG. 12 is a functional block diagram illustrating another example of a decoder device 150 that may be used in a system such as illustrated in FIG. 1. This aspect includes means for receiving transform coefficients, wherein the transform coefficients are associated with multimedia data, first determiner means for determining a set of multimedia samples to be reconstructed, second determiner means for determining a set of the received transform coefficients based on the multimedia samples to be reconstructed, and generator means for processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples. Some examples of this aspect include where the receiving means comprises a receiver 202, where the first determiner means comprises a multimedia sample determiner 204, where the second determiner means comprises a transform coefficient determiner 206 and where the generator means comprises a reconstructed sample generator 208.
[0093] FIG. 13 is a functional block diagram illustrating another example of a decoder device 150 that may be used in a system such as illustrated in FIG. 1. This aspect includes means for receiving transform coefficients, wherein the transform coefficients are associated with multimedia data, first determiner means for determining a set of multimedia samples to be reconstructed, second determiner means for determining a set of the received transform coefficients based on the multimedia samples to be reconstructed, and generator means for processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples. Some examples of this aspect include where the receiving means comprises a module for receiving 1302, where the first determiner means comprises a module for determining samples for reconstruction 1304, where the second determiner means comprises a module for determining transform coefficients 1306 and where the generator means comprises a module for processing transform coefficients 1308. [0094] Those of ordinary skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
[0095] Those of ordinary skill would further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, firmware, computer software, middleware, microcode, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed methods.
[0096] The various illustrative logical blocks, components, modules, and circuits described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core or ASIC core, or any other such configuration. [0097] The steps of a method or algorithm described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, an optical storage medium, or any other form of storage medium known in the art. An example storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). The ASIC may reside in a wireless modem. In the alternative, the processor and the storage medium may reside as discrete components in the wireless modem.
[0098] The previous description of the disclosed examples is provided to enable any person of ordinary skill in the art to make or use the disclosed methods and apparatus. Various modifications to these examples will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other examples and additional elements may be added.
[0099] Thus, methods and apparatus to perform highly efficient partial decoding of multimedia data have been described.

Claims

WHAT IS CLAIMED IS:
1. A method of processing multimedia data, comprising: receiving transform coefficients, wherein the transform coefficients are associated with the multimedia data; determining a set of multimedia samples to be reconstructed; determining a set of the received transform coefficients based on the multimedia samples to be reconstructed; and processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
2. The method of Claim 1, wherein processing comprises scaling the transform coefficients of the set.
3. The method of Claim 2, wherein scaling the transform coefficients comprises dequantizing.
4. The method of Claim 1, wherein the determined set of multimedia samples comprises multimedia samples upon which other multimedia samples are encoded in reference to.
5. The method of Claim 1, wherein the determined set of multimedia samples comprises multimedia samples in a first slice of multimedia data that border a second slice of multimedia data.
6. The method of Claim 1, wherein the received transformed coefficients are associated with a matrix of multimedia samples transformed as a set, and the reconstructed samples comprise a subset of the matrix of multimedia samples.
7. The method of Claim 1, wherein processing comprises partitioning the determined set of transform coefficients into a plurality of groups.
8. The method of Claim 7, wherein the processing further comprises calculating a value for each group, wherein the calculation is based on the encoding method which generated the transform coefficients.
9. The method of Claim 8, wherein the processing further comprises: determining an array for each group based on the encoding method which generated the transform coefficients; and generating the set of reconstructed samples of the multimedia data based on the values and the arrays.
10. The method of Claim 1, further comprising estimating a set of concealment multimedia samples based on the reconstructed samples.
11. The method of Claim 10, further comprising generating a set of transform coefficients corresponding to the set of estimated concealment multimedia samples.
12. The method of Claim 10, wherein the reconstructed samples are non-causal to the estimated set of concealment multimedia samples.
13. The method of Claim 1, further comprising: receiving a directivity mode indicator associated with each reconstructed sample; and estimating a set of concealment multimedia samples based on the reconstructed samples and the directivity mode indicators.
14. A multimedia data processor being configured to: receive transform coefficients, wherein the transform coefficients are associated with multimedia data; determine a set of multimedia samples to be reconstructed; determine a set of the received transform coefficients based on the multimedia samples to be reconstructed; and process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
15. The multimedia data processor of Claim 14, wherein the multimedia data processor is further configured to scale the determined set of transform coefficients.
16. The multimedia data processor of Claim 14, wherein the multimedia data processor is further configured to dequantize the determined set of transform coefficients.
17. The multimedia data processor of Claim 14, wherein the set of multimedia samples comprises multimedia samples upon which other multimedia samples are encoded in reference to.
18. The multimedia data processor of Claim 14, wherein the set of multimedia samples comprises multimedia samples in a first slice of multimedia data that border a second slice of multimedia data.
19. The multimedia data processor of Claim 14, wherein the received transformed coefficients are associated with a matrix of multimedia samples transformed as a set, and the reconstructed samples comprise a subset of the matrix of multimedia samples.
20. The multimedia data processor of Claim 14, wherein the multimedia data processor is further configured to partition the determined set of transform coefficients into a plurality of groups.
21. The multimedia data processor of Claim 20, wherein the multimedia data processor is further configured to calculate a value for each group, wherein the calculation is based on the encoding method which generated the transform coefficients.
22. The multimedia data processor of Claim 21, wherein the multimedia data processor is further configured to: determine an array for each group based on the encoding method which generated the transform coefficients; and generate the set of reconstructed samples of the multimedia data based on the values and the arrays.
23. The multimedia data processor of Claim 14, wherein the multimedia data processor is further configured to estimate a set of concealment multimedia samples based on the reconstructed samples.
24. The multimedia data processor of Claim 23, wherein the multimedia data processor is further configured to generate a set of transform coefficients corresponding to the set of estimated concealment multimedia samples.
25. The multimedia data processor of Claim 23, wherein the reconstructed samples are non-causal to the estimated set of concealment multimedia samples.
26. The multimedia data processor of Claim 14, wherein the multimedia data processor is further configured to: receive a directivity mode indicator associated with each reconstructed sample; and estimate a set of concealment multimedia samples based on the reconstructed samples and the directivity mode indicators.
27. An apparatus for processing multimedia data, comprising: a receiver to receive transform coefficients, wherein the transform coefficients are associated with multimedia data; a first determiner to determine a set of multimedia samples to be reconstructed; a second determiner to determine a set of the received transform coefficients based on the multimedia samples to be reconstructed; and a generator to process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
28. The apparatus of Claim 27, wherein the generator scales the determined set of transform coefficients.
29. The apparatus of Claim 27, wherein the generator dequantizes the determined set of transform coefficients.
30. The apparatus of Claim 27, wherein the determined set of multimedia samples comprises multimedia samples upon which other multimedia samples are encoded in reference to.
31. The apparatus of Claim 27, wherein the determined set of multimedia samples comprises multimedia samples in a first slice of multimedia data that border a second slice of multimedia data.
32. The apparatus of Claim 27, wherein the received transformed coefficients are associated with a matrix of multimedia samples transformed as a set, and the reconstructed samples comprise a subset of the matrix of multimedia samples.
33. The apparatus of Claim 27, wherein the generator partitions the determined set of transform coefficients into a plurality of groups.
34. The apparatus of Claim 33, wherein the generator calculates a value for each group, wherein the calculation is based on the encoding method which generated the transform coefficients.
35. The apparatus of Claim 34, wherein the generator determines an array for each group based on the encoding method which generated the transform coefficients, and generates the set of reconstructed samples of the multimedia data based on the values and the arrays.
36. The apparatus of Claim 27, further comprising an estimator to estimate a set of concealment multimedia samples based on the reconstructed samples.
37. The apparatus of Claim 36, wherein the estimator generates a set of transform coefficients corresponding to the set of estimated concealment multimedia samples.
38. The apparatus of Claim 36, wherein the reconstructed samples are non-causal to the estimated set of concealment multimedia samples.
39. The apparatus of Claim 27, wherein the receiver receives a directivity mode indicator associated with each reconstructed sample, and the apparatus further comprises an estimator to estimate a set of concealment multimedia samples based on the reconstructed samples and the directivity mode indicators.
40. An apparatus for processing multimedia data, comprising: means for receiving transform coefficients, wherein the transform coefficients are associated with multimedia data; first determiner means for determining a set of multimedia samples to be reconstructed; second determiner means for determining a set of the received transform coefficients based on the multimedia samples to be reconstructed; and generator means for processing the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
41. The apparatus of Claim 40, wherein the generator means scales the determined set of transform coefficients.
42. The apparatus of Claim 40, wherein the generator means dequantizes the determined set of transform coefficients.
43. The apparatus of Claim 40, wherein the set of multimedia samples comprises multimedia samples with reference to which other multimedia samples are encoded.
44. The apparatus of Claim 40, wherein the set of multimedia samples comprises multimedia samples in a first slice of multimedia data that border a second slice of multimedia data.
45. The apparatus of Claim 40, wherein the received transform coefficients are associated with a matrix of multimedia samples transformed as a set, and the reconstructed samples comprise a subset of the matrix of multimedia samples.
46. The apparatus of Claim 40, wherein the generator means partitions the determined set of transform coefficients into a plurality of groups.
47. The apparatus of Claim 46, wherein the generator means calculates a value for each group, wherein the calculation is based on the encoding method which generated the transform coefficients.
48. The apparatus of Claim 47, wherein the generator means determines an array for each group based on the encoding method which generated the transform coefficients, and generates the set of reconstructed samples of the multimedia data based on the values and the arrays.
49. The apparatus of Claim 40, further comprising means for estimating a set of concealment multimedia samples based on the reconstructed samples.
50. The apparatus of Claim 49, wherein the means for estimating generates a set of transform coefficients corresponding to the set of estimated concealment multimedia samples.
51. The apparatus of Claim 49, wherein the reconstructed samples are non-causal to the estimated set of concealment multimedia samples.
52. The apparatus of Claim 40, wherein the receiving means receives a directivity mode indicator associated with each reconstructed sample, and the apparatus further comprises means for estimating a set of concealment multimedia samples based on the reconstructed samples and the directivity mode indicators.
53. A machine readable medium comprising instructions that upon execution cause a machine to: receive transform coefficients, wherein the transform coefficients are associated with multimedia data; determine a set of multimedia samples to be reconstructed; determine a set of the received transform coefficients based on the multimedia samples to be reconstructed; and process the determined set of transform coefficients to generate reconstructed samples corresponding to the determined set of multimedia samples.
54. The machine readable medium of Claim 53, wherein the instructions further cause the machine to scale the determined set of transform coefficients.
55. The machine readable medium of Claim 53, wherein the instructions further cause the machine to dequantize the determined set of transform coefficients.
56. The machine readable medium of Claim 53, wherein the set of multimedia samples comprises multimedia samples with reference to which other multimedia samples are encoded.
57. The machine readable medium of Claim 53, wherein the set of multimedia samples comprises multimedia samples in a first slice of multimedia data that border a second slice of multimedia data.
58. The machine readable medium of Claim 53, wherein the received transform coefficients are associated with a matrix of multimedia samples transformed as a set, and the reconstructed samples comprise a subset of the matrix of multimedia samples.
59. The machine readable medium of Claim 53, wherein the instructions further cause the machine to partition the determined set of transform coefficients into a plurality of groups.
60. The machine readable medium of Claim 59, wherein the instructions further cause the machine to calculate a value for each group, wherein the calculation is based on the encoding method which generated the transform coefficients.
61. The machine readable medium of Claim 60, wherein the instructions further cause the machine to: determine an array for each group based on the encoding method which generated the transform coefficients; and generate the set of reconstructed samples of the multimedia data based on the values and the arrays.
62. The machine readable medium of Claim 53, wherein the instructions further cause the machine to estimate a set of concealment multimedia samples based on the reconstructed samples.
63. The machine readable medium of Claim 62, wherein the instructions further cause the machine to generate a set of transform coefficients corresponding to the set of estimated concealment multimedia samples.
64. The machine readable medium of Claim 62, wherein the reconstructed samples are non-causal to the estimated set of concealment multimedia samples.
65. The machine readable medium of Claim 53, wherein the instructions further cause the machine to: receive a directivity mode indicator associated with each reconstructed sample; and estimate a set of concealment multimedia samples based on the reconstructed samples and the directivity mode indicators.
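Claims 27 to 32 (mirrored in claims 40 to 45 and 53 to 58) describe scaling or dequantizing the received coefficients and then reconstructing only a chosen subset of a transformed block, for example the samples that other samples are encoded with reference to. The Python sketch below is an illustrative assumption, not the disclosed embodiment. It uses the H.264 4x4 integer inverse core transform (rows first, then columns) and flat-matrix dequantization for regular 4x4 residual blocks only (no scaling lists, no Intra16x16 or chroma DC path), and it treats the bottom row and right column of each 4x4 block as the samples to reconstruct, since those are the samples that neighbouring blocks below and to the right typically predict from. The helper names `dequantize`, `inv1d`, `inverse_full`, `inverse_edges` and `edge_blocks` are hypothetical, and the output is residual only; the prediction signal would still have to be added.

```python
from typing import List, Tuple

Block = List[List[int]]

# H.264 normAdjust4x4 values for flat scaling matrices, indexed by qP % 6.
# Column 0: i and j both even; column 1: both odd; column 2: mixed parity.
V = [
    [10, 16, 13],
    [11, 18, 14],
    [13, 20, 16],
    [14, 23, 18],
    [16, 25, 20],
    [18, 29, 23],
]


def dequantize(levels: Block, qp: int) -> Block:
    """Scale quantized levels to coefficients (flat matrix, regular 4x4 blocks)."""
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                k = 0
            elif i % 2 == 1 and j % 2 == 1:
                k = 1
            else:
                k = 2
            out[i][j] = (levels[i][j] * V[qp % 6][k]) << (qp // 6)
    return out


def inv1d(w0: int, w1: int, w2: int, w3: int) -> List[int]:
    """One 4-point H.264 inverse core transform (butterfly with >> 1)."""
    e0, e1 = w0 + w2, w0 - w2
    e2, e3 = (w1 >> 1) - w3, w1 + (w3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_full(d: Block) -> Block:
    """Reference path: all 16 residual samples, rows first, then columns."""
    f = [inv1d(*row) for row in d]
    cols = [inv1d(f[0][j], f[1][j], f[2][j], f[3][j]) for j in range(4)]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def inverse_edges(d: Block) -> Tuple[List[int], List[int]]:
    """Partial path: only the bottom row and the right column."""
    f = [inv1d(*row) for row in d]  # every row still feeds the bottom row
    bottom = []
    for j in range(4):
        g0, g1, g2, g3 = f[0][j], f[1][j], f[2][j], f[3][j]
        # Output index 3 of the column butterfly is e0 - e3.
        bottom.append(((g0 + g2) - (g1 + (g3 >> 1)) + 32) >> 6)
    # Right column: the full column butterfly, but for column 3 only.
    right = [(x + 32) >> 6 for x in inv1d(f[0][3], f[1][3], f[2][3], f[3][3])]
    return bottom, right


def edge_blocks(mb_size: int = 16, blk: int = 4) -> List[Tuple[int, int]]:
    """(row, col) of the blocks holding a macroblock's bottom row or right column."""
    n = mb_size // blk
    return [(bi, bj) for bi in range(n) for bj in range(n)
            if bi == n - 1 or bj == n - 1]


if __name__ == "__main__":
    levels = [[3, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    d = dequantize(levels, qp=28)
    print(inverse_edges(d), edge_blocks())
```

The bottom row still needs the full horizontal pass; in this sketch the saving comes from skipping the vertical pass for the nine samples outside the bottom row and right column, and from skipping the nine 4x4 blocks of a 16x16 macroblock that hold none of its bottom-row or right-column samples.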
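Claims 33 to 35 (and 46 to 48, 59 to 61) describe partitioning the coefficients into groups, computing one value per group, determining one array per group, and generating the samples from those values and arrays. One possible reading, offered here only as an assumption, groups the coefficients of a 4x4 block by horizontal frequency: every coefficient in column v of the coefficient matrix shares the same horizontal basis vector, so its vertical contribution to a target output row can be collapsed into a single number first. The sketch works in exact rational arithmetic and deliberately ignores the intermediate >> 1 and final (x + 32) >> 6 rounding of a real H.264 decoder; `BASIS`, `reconstruct_full` and `reconstruct_row_grouped` are hypothetical names.

```python
from fractions import Fraction
from typing import List

HALF = Fraction(1, 2)
# BASIS[k][n]: contribution of 1-D coefficient index k to output sample n for the
# H.264 4x4 inverse core transform, written without its intermediate rounding.
BASIS = [
    [1, 1, 1, 1],
    [1, HALF, -HALF, -1],
    [1, -1, -1, 1],
    [HALF, -1, 1, -HALF],
]


def reconstruct_full(w: List[List[int]]) -> List[List[Fraction]]:
    """Reference: x[i][j] = sum over u, v of w[u][v] * BASIS[u][i] * BASIS[v][j]."""
    return [
        [sum(w[u][v] * BASIS[u][i] * BASIS[v][j] for u in range(4) for v in range(4))
         for j in range(4)]
        for i in range(4)
    ]


def reconstruct_row_grouped(w: List[List[int]], r: int) -> List[Fraction]:
    """Rebuild only output row r from grouped coefficients."""
    # Group v = column v of the coefficient matrix (one horizontal frequency).
    # Value of group v: its vertical frequencies collapsed onto output row r.
    values = [sum(w[u][v] * BASIS[u][r] for u in range(4)) for v in range(4)]
    # Array of group v: the horizontal basis vector shared by the whole group.
    arrays = [BASIS[v] for v in range(4)]
    return [sum(values[v] * arrays[v][j] for v in range(4)) for j in range(4)]


if __name__ == "__main__":
    w = [[8, -2, 0, 1], [4, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0]]
    print(reconstruct_row_grouped(w, 3), reconstruct_full(w)[3])
```

For one output row, this grouped form uses 16 multiply-adds for the group values and 16 for the row, against 128 for a separable full inverse written in the same naive style.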
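Claims 36 to 39 (and 49 to 52, 62 to 65) add concealment: estimating lost samples from reconstructed ones, optionally guided by a directivity mode indicator, where the reconstructed samples may be non-causal (for example, samples of the block below, which is decoded later), and then generating transform coefficients for the estimate. The following is a minimal sketch under stated assumptions: the indicator is reduced to the hypothetical strings "vertical" and "horizontal", concealment is plain distance-weighted linear interpolation between the two bordering lines of reconstructed samples (a swapped-in technique, not necessarily the one disclosed), and coefficients come from the H.264 4x4 forward core transform before quantization. `conceal` and `forward_core` are hypothetical names.

```python
from typing import List

Block = List[List[int]]

# H.264 4x4 forward core transform matrix (integer DCT approximation).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def conceal(top: List[int], bottom: List[int], left: List[int], right: List[int],
            mode: str, n: int = 4) -> Block:
    """Estimate a lost n x n block by interpolating along the indicated direction.

    'vertical': each column is interpolated between the row above (causal) and
    the row below (non-causal). 'horizontal': each row between the left and
    right columns.
    """
    if mode not in ("vertical", "horizontal"):
        raise ValueError(f"unsupported directivity mode: {mode}")
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "vertical":
                a, b, t = top[j], bottom[j], i
            else:
                a, b, t = left[i], right[i], j
            # Sample t is t + 1 steps from a and n - t steps from b, so a gets
            # weight n - t and b gets weight t + 1 (rounded integer average).
            out[i][j] = ((n - t) * a + (t + 1) * b + (n + 1) // 2) // (n + 1)
    return out


def forward_core(x: Block) -> Block:
    """Y = CF * X * CF^T, the H.264 4x4 forward core transform before quantization."""
    tmp = [[sum(CF[i][k] * x[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


if __name__ == "__main__":
    est = conceal([0] * 4, [50] * 4, [0] * 4, [0] * 4, "vertical")
    print(est, forward_core(est))
```

A more elaborate interpolator, for example one following oblique intra-prediction directions, would change only the inner loop of `conceal`.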
PCT/US2006/037996 2005-09-27 2006-09-27 Video encoding method enabling highly efficient partial decoding of h.264 and other transform coded information WO2007038727A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN2006800430179A CN101310536B (en) 2005-09-27 2006-09-27 Video encoding method enabling highly efficient partial decoding of H.264 and other transform coded information
EP06815757A EP1941742A2 (en) 2005-09-27 2006-09-27 Video encoding method enabling highly efficient partial decoding of h.264 and other transform coded information
JP2008533642A JP2009510938A (en) 2005-09-27 2006-09-27 H.264 and other video coding methods enabling efficient partial decoding of transform coding information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US72137705P 2005-09-27 2005-09-27
US60/721,377 2005-09-27

Publications (2)

Publication Number Publication Date
WO2007038727A2 true WO2007038727A2 (en) 2007-04-05
WO2007038727A3 WO2007038727A3 (en) 2007-08-02

Family

ID=37835195

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/037996 WO2007038727A2 (en) 2005-09-27 2006-09-27 Video encoding method enabling highly efficient partial decoding of h.264 and other transform coded information

Country Status (7)

Country Link
EP (1) EP1941742A2 (en)
JP (2) JP2009510938A (en)
KR (1) KR100984650B1 (en)
CN (1) CN101310536B (en)
AR (1) AR055185A1 (en)
TW (1) TW200719726A (en)
WO (1) WO2007038727A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010036720A1 (en) * 2008-09-26 2010-04-01 Qualcomm Incorporated Determining availability of video data units
CN102334337A (en) * 2008-10-02 2012-01-25 Electronics and Telecommunications Research Institute Apparatus and method for coding/decoding image selectively using discrete cosine/sine transform
US8660176B2 (en) 2008-09-26 2014-02-25 Qualcomm Incorporated Resolving geometric relationships among video data units
US8724697B2 (en) 2008-09-26 2014-05-13 Qualcomm Incorporated Locating motion vectors for video data units
WO2021187855A1 (en) * 2020-03-17 2021-09-23 Samsung Electronics Co., Ltd. Method and apparatus for video encoding and decoding

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN113556563B (en) * 2010-04-13 2024-08-20 GE Video Compression, LLC Coding of significance maps and transform coefficient blocks
CN103636220B (en) * 2011-06-28 2017-10-13 HFI Innovation Inc. Method and apparatus for coding/decoding intra prediction mode

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2004064406A1 (en) * 2003-01-10 2004-07-29 Thomson Licensing S.A. Defining interpolation filters for error concealment in a coded image

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JPH01141481A (en) * 1987-11-27 1989-06-02 Matsushita Electric Ind Co Ltd Image processor
GB2263373B (en) * 1992-01-09 1995-05-24 Sony Broadcast & Communication Data error concealment
US5532837A (en) * 1992-12-18 1996-07-02 Matsushita Electric Industrial Co., Ltd. Digital video signal processing apparatus
US5621467A (en) * 1995-02-16 1997-04-15 Thomson Multimedia S.A. Temporal-spatial error concealment apparatus and method for video signal processors
JP3795960B2 (en) * 1996-07-18 2006-07-12 Sanyo Electric Co., Ltd. Image display device
JP2001086504A (en) * 1999-09-09 2001-03-30 Toshiba Digital Media Engineering Corp Mpeg video decoder
US6662329B1 (en) * 2000-03-23 2003-12-09 International Business Machines Corporation Processing errors in MPEG data as it is sent to a fixed storage device
EP2860978B1 (en) * 2002-05-28 2020-04-08 Dolby International AB Method and systems for image intra-prediction mode estimation, communication, and organization
CN1323553C (en) * 2003-01-10 2007-06-27 Thomson Licensing Spatial error concealment based on the intra-prediction modes transmitted in a coded stream

Non-Patent Citations (7)

Title
HIROSHI FUJIWARA ET AL: "AN ALL-ASIC IMPLEMENTATION OF A LOW BIT-RATE VIDEO CODEC" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 2, no. 2, 1 June 1992 (1992-06-01), pages 123-134, XP000280652 ISSN: 1051-8215 *
KORDASIEWICZ R ET AL: "Hardware implementation of the optimized transform and quantization blocks of H.264" ELECTRICAL AND COMPUTER ENGINEERING, 2004. CANADIAN CONFERENCE ON NIAGARA FALLS, ONT., CANADA 2-5 MAY 2004, PISCATAWAY, NJ, USA, IEEE, US, 2 May 2004 (2004-05-02), pages 943-946 Vol. 2, XP010733981 ISBN: 0-7803-8253-6 *
LE BUHAN C: "Software-embedded data retrieval and error concealment scheme for MPEG-2 video sequences" PROCEEDINGS OF THE CONFERENCE ON DIGITAL VIDEO COMPRESSION: ALGORITHMS AND TECHNOLOGIES 1996, SAN JOSE, CA, USA, 31 JAN.-2 FEB. 1996, vol. 2668, 31 January 1996 (1996-01-31), pages 384-391, XP002433614 SPIE - THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING SPIE-INT. SOC. OPT. ENG USA ISSN: 0277-786X *
MALVAR H S ET AL: "LOW-COMPLEXITY TRANSFORM AND QUANTIZATION IN H.264/AVC" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 13, no. 7, July 2003 (2003-07), pages 598-603, XP001051187 ISSN: 1051-8215 *
SUH J-W ET AL: "ERROR CONCEALMENT BASED ON DIRECTIONAL INTERPOLATION" IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 43, no. 3, 1 August 1997 (1997-08-01), pages 295-302, XP000742504 ISSN: 0098-3063 *
WAI-KUEN CHAM ET AL: "INTEGER SINUSOIDAL TRANSFORMS FOR IMAGE PROCESSING" INTERNATIONAL JOURNAL OF ELECTRONICS, TAYLOR AND FRANCIS LTD. LONDON, GB, vol. 70, no. 6, 1 June 1991 (1991-06-01), pages 1015-1030, XP000240218 ISSN: 0020-7217 *
WILSON KWOK ET AL: "MULTI-DIRECTIONAL INTERPOLATION FOR SPATIAL ERROR CONCEALMENT" IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 39, no. 3, 1 August 1993 (1993-08-01), pages 455-460, XP000396318 ISSN: 0098-3063 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
WO2010036720A1 (en) * 2008-09-26 2010-04-01 Qualcomm Incorporated Determining availability of video data units
US8634457B2 (en) 2008-09-26 2014-01-21 Qualcomm Incorporated Determining availability of video data units
US8660176B2 (en) 2008-09-26 2014-02-25 Qualcomm Incorporated Resolving geometric relationships among video data units
US8724697B2 (en) 2008-09-26 2014-05-13 Qualcomm Incorporated Locating motion vectors for video data units
CN102334337A (en) * 2008-10-02 2012-01-25 Electronics and Telecommunications Research Institute Apparatus and method for coding/decoding image selectively using discrete cosine/sine transform
US11176711B2 (en) 2008-10-02 2021-11-16 Intellectual Discovery Co., Ltd. Apparatus and method for coding/decoding image selectively using discrete cosine/sine transform
US11538198B2 (en) 2008-10-02 2022-12-27 Dolby Laboratories Licensing Corporation Apparatus and method for coding/decoding image selectively using discrete cosine/sine transform
WO2021187855A1 (en) * 2020-03-17 2021-09-23 Samsung Electronics Co., Ltd. Method and apparatus for video encoding and decoding

Also Published As

Publication number Publication date
KR100984650B1 (en) 2010-10-01
CN101310536A (en) 2008-11-19
JP2012231505A (en) 2012-11-22
TW200719726A (en) 2007-05-16
AR055185A1 (en) 2007-08-08
KR20080066714A (en) 2008-07-16
WO2007038727A3 (en) 2007-08-02
JP2009510938A (en) 2009-03-12
CN101310536B (en) 2010-06-02
EP1941742A2 (en) 2008-07-09

Similar Documents

Publication Publication Date Title
US9055298B2 (en) Video encoding method enabling highly efficient partial decoding of H.264 and other transform coded information
US10687075B2 (en) Sub-block transform coding of prediction residuals
US20210014509A1 (en) Signaling residual signs predicted in transform domain
EP2941004B1 (en) Method and apparatus for decoding intra prediction mode
EP2942956A2 (en) Image decoding apparatus
US20070025631A1 (en) Adaptive variable block transform system, medium, and method
US9852521B2 (en) Image coding device, image decoding device, methods thereof, and programs
CN110741645B (en) Blockiness reduction
WO2008004768A1 (en) Image encoding/decoding method and apparatus
EP1997317A1 (en) Image encoding/decoding method and apparatus
WO2008020687A1 (en) Image encoding/decoding method and apparatus
EP1941742A2 (en) Video encoding method enabling highly efficient partial decoding of h.264 and other transform coded information
CN110741636A (en) Transform block level scan order selection for video coding
US10015484B2 (en) Adaptive scan device and method for scanning thereof
EP3754983B1 (en) Early intra coding decision
WO2022146215A1 (en) Temporal filter
Kesireddy A new adaptive trilateral filter for in-loop filtering

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 200680043017.9

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application
ENP Entry into the national phase

Ref document number: 2008533642

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 2006815757

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 700/MUMNP/2008

Country of ref document: IN

WWE WIPO information: entry into national phase

Ref document number: 1020087010065

Country of ref document: KR