PARTIALLY PARALLEL DECODER FOR COMPRESSED HDTV VIDEO DATA
Related Applications
This application claims priority from U.S. Serial No. 60/057,361 which was filed on August 29, 1997.
Technical Field
The present invention relates to a system and method for decompressing digital video data, particularly for decoding MPEG video data in high-bit-rate applications.
Background
The Moving Picture Experts Group (MPEG) of the International Standards Organization (ISO) has produced the MPEG-2 Draft Standard for compression and decompression of video data. The standard is described in a publication entitled "Information Technology-Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262" ISO/IEC 13818-2 DIS, 3/94, which is available from the ISO and hereby incorporated herein by reference. One data format described in the MPEG-2 standard has been proposed as a standard for digitally encoded high definition television (HDTV) signals.
The MPEG-2 standard defines a method for very effective compression of video data by taking advantage of information redundancy both within each video frame and between video frames occurring near each other in time. This compression is achieved through a combination of motion compensation, discrete cosine transforms (DCT), coefficient quantization, and variable length encoding of many of these parameters and data, each of these processes being well known in the video data processing art.
MPEG-2 HDTV decoders for distribution and end user applications operate on compressed video Program Elementary Streams (PESs) at a 20 Mbit/sec transfer rate. For broadcast studio applications requiring production-type services to be performed on the video PESs, the MPEG-2 HDTV decoders may be required to operate on compressed PESs up to a High Data Rate of 622 Mbit/sec. Video PESs at this high rate are typically suited for editing, splicing, fading in or out, and other effects, and as such need high quality data containment with very little compression of the original data obtained from the video source, such as uncompressed video camera output.
For MPEG-2 HDTV decoders to operate at high data rates, the variable length decoder (VLD) must operate on streams at data rates several times faster than those used in some end user applications. To handle high bit rate streams, it has been proposed that decompression of data representing sub-images ("slices") within each video image be processed in parallel to yield corresponding blocks of final pel values, which blocks are combined to form the image to be displayed. In order to generate final pel values from MPEG-2 data using such a fully parallel process, the motion compensation portion of each parallel block must communicate final pel values to the corresponding portions of other parallel blocks that depend on those pel values in the motion compensation process. Further, such a fully parallel process requires redundant components for inverse quantization and inverse DCT processing, even for applications where a single instance of that circuitry may be adequate to process the full combined stream.
Disclosure of the Invention
It is one object of this invention to enable real-time decoding of high data rate MPEG-2-coded data with a decreased number of redundant components in the decoder.
It is another object of this invention to provide a decoder system that efficiently uses hardware components to enable high-speed decoding.
It is still another object of this invention to provide a partially parallel process in distinct components for simultaneously decoding variable-length code words for multiple portions of an image, then combining the decoded data substreams to form a composite decoded stream, then applying motion compensation vectors to the composite decoded stream.
Other features and advantages of the invention will appear from the following description with reference to the accompanying drawings, which illustrate an embodiment of the invention by way of a non-limitative example.
Brief Description of the Drawings
Fig. 1 is a block diagram of a television receiving system according to a preferred embodiment of the invention.
Fig. 2 is a block diagram of an MPEG-2 decoder according to a preferred embodiment of the invention.
Fig. 3 is a block diagram of a parallel variable length decoder (VLD) system and certain upstream logic according to a preferred embodiment of the invention.
Fig. 4 is a detailed block diagram of a slice engine (VLD) block for use in a system according to a preferred embodiment of the invention.
Mode(s) for Carrying Out the Invention
Fig. 1 provides an overall block diagram of an exemplary embodiment of a television receiving system that uses the present invention. Compressed television signal 90 (in this example, comprising MPEG-2 video and audio streams) from a television service provider is received by antenna 92 and forwarded to television receiving unit 104, which could be any device that uses or stores compressed video, including for example a television receiver, a video tape recorder (VTR), or an optical media reader or recorder. Within television receiving unit 104 is a television signal processor 94, which processes the compressed video input as required by the particular application.
Television signal processor 94 contains an MPEG decoder 96 that processes the MPEG-2 streams in compressed television signal 90. Decoder 96 comprises a video decoder 97 and an audio decoder 98 for converting the MPEG-2 video and audio streams, respectively, to a more easily usable format. Television signal processor 94 may send the output of decoders 96, 97, and 98 to a display component 100 and speakers 102, or that output may be recorded on magnetic or optical media (not shown).
Fig. 2 shows a variable length decoder (VLD) in the context of an MPEG-2 video decoder system. An MPEG-2 coded stream source 2 provides variable length encoded data to buffer 4 to begin the decoding process. VLD 6 accepts variable-length-encoded data from buffer 4 and converts it to fixed-length data structures, including for example motion vectors, DCT coefficients, DCT DC luminance and chrominance values, Macro Block (MB) type descriptors, and other data structures used in MPEG-2 coded streams or custom data streams transmitted therewith.
In the preferred embodiment shown in Fig. 2, VLD 6 forwards motion vectors to motion compensation unit 14 for later processing. Inverted, quantized DCT coefficients are sent to inverse scanner 8, which reorders the quantized coefficients. At inverse quantization block 10, quantized DCT coefficients are converted to actual DCT coefficients, which are transformed into pel characteristic data by inverse discrete cosine transform (IDCT) block 12. Motion compensation unit 14 adjusts the pel data output from IDCT 12 based on motion vectors received from VLD 6. The resulting motion-compensated pel data may be stored in frame store memory 16 for use in later-decoded predicted or bi-directionally interpolated images. This fully decoded data may also be transmitted to a display device (such as a television), a recording device (such as a video tape recorder), or other destination as required by the particular application.
Turning now to Fig. 3, we will discuss a preferred embodiment of the invention in the context of one possible application, namely decoding an MPEG-2 data stream that is received through an asynchronous transfer mode (ATM) interface. In this exemplary embodiment, an ATM interface 20 provides a data stream to serial-to-parallel converter 22 at a data rate of 622 MBits/second. Converter 22 forwards this data in byte form at 77.75 MBytes/second to ATM packet decoder 24, which removes ATM envelope data and passes the payload MPEG-2 transport stream to MPEG-2 Transport Stream Decoder 26. Transport Stream Decoder 26 separates the various portions of the MPEG-2 transport stream (for example, video, audio, auxiliary, and PCR) for appropriate processing. The remainder of this discussion will focus on decoding of the video stream.
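By way of illustration only, the separation performed by Transport Stream Decoder 26 might be sketched in Python as follows. The sketch groups whole transport packets by packet identifier (PID), relying on the MPEG-2 systems packet layout (188-byte packets, a 0x47 sync byte, and a 13-bit PID). The helper names, the PID values, and the dummy payloads are assumptions made for this example and do not appear in the embodiment. Adaptation fields, PCR extraction, and resynchronization are omitted.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188   # bytes per MPEG-2 transport packet
SYNC_BYTE = 0x47       # first byte of every transport packet


def split_by_pid(stream: bytes) -> dict[int, list[bytes]]:
    """Group whole transport packets by PID (hypothetical helper)."""
    by_pid = defaultdict(list)
    for start in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[start:start + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            continue  # a real decoder would resynchronize here
        # PID: low 5 bits of byte 1 followed by all 8 bits of byte 2.
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        by_pid[pid].append(packet)
    return dict(by_pid)


def make_packet(pid: int, fill: int) -> bytes:
    """Build a dummy packet for the example; the payload is not real video."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([fill]) * (TS_PACKET_SIZE - len(header))


VIDEO_PID, AUDIO_PID = 0x100, 0x101  # assumed values for illustration
stream = (make_packet(VIDEO_PID, 1)
          + make_packet(AUDIO_PID, 2)
          + make_packet(VIDEO_PID, 3))
streams = split_by_pid(stream)
```

In this sketch, `streams[VIDEO_PID]` holds the two video packets in arrival order, which is the portion of the stream the remainder of the discussion follows.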
The video portion of the MPEG stream is sent from Transport Stream Decoder 26 to buffer 28, and then to byte-to-word converter 30, where it is converted to 32-bit words. These words are transmitted at 19.4375 MWords/second to start codes detector 32, which scans the data stream for data record start codes as defined in the MPEG-2 standard. Start codes detector 32 transmits system and picture parameters to other portions of the decoder for use in final reconstruction of the encoded images. Slice-level data (as defined by the MPEG-2 standard) is sent to slice parser 34 with a 2-bit index indicating at which of the four simultaneously-transmitted bytes the slice begins.
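The scan performed by start codes detector 32 might be sketched in Python as follows. The sketch relies on the MPEG-2 video convention that every start code begins with the byte-aligned prefix 0x000001 and that slice start codes carry values 0x01 through 0xAF. The function name and the sample bytes are hypothetical. The 2-bit index is assumed here to be the byte offset of the start code prefix modulo 4, that is, its position within a 32-bit word; the embodiment does not spell out the exact reference point.

```python
SLICE_CODE_MIN, SLICE_CODE_MAX = 0x01, 0xAF  # slice_start_code value range
PREFIX = b"\x00\x00\x01"                     # start code prefix


def find_slice_starts(data: bytes) -> list[tuple[int, int]]:
    """Return (word_number, byte_index) for each slice start code in data."""
    hits = []
    pos = data.find(PREFIX)
    while pos != -1 and pos + 3 < len(data):
        if SLICE_CODE_MIN <= data[pos + 3] <= SLICE_CODE_MAX:
            # divmod splits the byte offset into a 32-bit word number and
            # the 2-bit index of the byte within that word.
            hits.append(divmod(pos, 4))
        pos = data.find(PREFIX, pos + 3)
    return hits


# Made-up stream: a sequence header code (0xB3), then two slice start codes.
sample = bytes.fromhex("000001b3" "aabb" "00000101" "cc" "00000102" "dd")
slice_starts = find_slice_starts(sample)
```

Here the sequence header code is skipped because 0xB3 lies outside the slice range, and each remaining hit pairs a word number with the 2-bit byte index that would accompany the slice data to slice parser 34.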
Slice parser 34 distributes MPEG-2 slice records to slice buffers 36-1 to 36-n. (A single slice buffer, slice engine, slice output buffer, or a parallel path including each of them will be referred to generically herein with the suffix "-i".) Slice parser 34 might use any distribution scheme appropriate for the application. For example, slice parser 34 might send one slice record to each slice buffer 36-i in a sequential rotation. Alternatively, slice parser 34 might monitor the size of each slice record as it is distributed, then use that information to balance the loads between all available slice buffers 36-i. In the case of MPEG-2 video data, data encoding different slices within the same image may have varied sizes, so the latter approach might be advantageous in most such applications.
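The two distribution schemes just described might be compared with the following Python sketch. The function names and the slice sizes are made up for illustration, and the load-balancing variant assumes that "load" means the total bytes assigned to a buffer so far; it ignores how quickly each slice engine drains its buffer.

```python
def round_robin(slice_sizes: list[int], n_buffers: int) -> list[int]:
    """Assign slice k to buffer k mod n (sequential rotation)."""
    return [k % n_buffers for k in range(len(slice_sizes))]


def least_loaded(slice_sizes: list[int], n_buffers: int) -> list[int]:
    """Assign each slice to the buffer with the fewest bytes assigned so far."""
    load = [0] * n_buffers
    assignment = []
    for size in slice_sizes:
        target = load.index(min(load))  # ties go to the lowest-numbered buffer
        assignment.append(target)
        load[target] += size
    return assignment


def max_load(slice_sizes: list[int], assignment: list[int], n_buffers: int) -> int:
    """Return the byte count of the most heavily loaded buffer."""
    load = [0] * n_buffers
    for size, buf in zip(slice_sizes, assignment):
        load[buf] += size
    return max(load)


sizes = [900, 100, 800, 100, 700, 100]  # made-up slice sizes in bytes
rr = round_robin(sizes, 2)
ll = least_loaded(sizes, 2)
```

With these sizes, sequential rotation places every large slice in the same buffer, while the size-aware scheme spreads them across both buffers, which is the effect the load-balancing alternative is meant to achieve.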
Note that the 2-bit index value is distributed by slice parser 34 with the slice data to slice buffers 36-i. In this exemplary embodiment, the slice buffers 36-i translate the index and slice data into a pure slice data stream, which is provided via a 32-bit-wide data path by each slice buffer 36-i to a corresponding slice engine (VLD) 38-i. Each slice engine 38-i can be designed to accept data at a lower clock rate of 38.88 MHz, with the combination of all slice engines 38-i being adequate to decode the entire stream of slice data.
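The data rates quoted for this exemplary embodiment can be cross-checked with a short Python calculation. The check assumes n = 16 slice engines (the count given for the exemplary embodiment) and assumes that each slice engine consumes one input bit per clock cycle, as the bit-serial design of Fig. 4 suggests. The variable names are illustrative only.

```python
from fractions import Fraction

line_rate_mbit = Fraction(622)        # Mbit/s delivered by ATM interface 20
byte_rate = line_rate_mbit / 8        # MBytes/s out of converter 22
word_rate = byte_rate / 4             # 32-bit MWords/s out of converter 30

engine_clock_mhz = Fraction("38.88")  # clock rate of each slice engine 38-i
n_engines = 16                        # engine count of the exemplary embodiment

# Assumes one bit per engine clock, so MHz converts directly to Mbit/s.
aggregate_mbit = engine_clock_mhz * n_engines

print(float(byte_rate), float(word_rate), float(aggregate_mbit))
```

Under these assumptions the combined capacity of the slice engines is at least the full line rate, and the line rate itself includes ATM and transport stream overhead that never reaches the slice engines.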
Each slice engine 38-i transmits data it has decoded to a corresponding slice output buffer 40-i, which feeds the decoded slices to commutator 42 for reassembly. Commutator 42 provides motion vectors to a motion compensator (for example, motion compensator 14 shown in Fig. 2), and provides MB address, run length, DCT level, and MB parameter data to downstream decoding elements, such as inverse scanner 8, inverse quantizer 10, and IDCT block 12.
It will be appreciated by those skilled in the art that the number of parallel decoder blocks 50-i may be selected as a system design parameter, based on required cost and performance levels. In the exemplary embodiment, n = 16 decoder blocks 50-i are used.
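One way the reassembly performed by commutator 42 might work is sketched below in Python. The embodiment does not state how the commutator restores stream order, so the sketch assumes that the sequence of path assignments made by slice parser 34 is available to the commutator and that each slice output buffer 40-i behaves as a first-in, first-out queue. The function name and the placeholder slice labels are hypothetical.

```python
from collections import deque


def commutate(assignment: list[int], output_buffers: list[deque]) -> list:
    """Reassemble decoded slices in stream order.

    assignment[k] is the index of the parallel path that handled slice k,
    so replaying it tells the commutator which output buffer to read next.
    """
    return [output_buffers[path].popleft() for path in assignment]


# Four slices were sent to two paths in the order 0, 1, 1, 0.
assignment = [0, 1, 1, 0]
output_buffers = [
    deque(["slice0", "slice3"]),  # path 0 decoded slices 0 and 3
    deque(["slice1", "slice2"]),  # path 1 decoded slices 1 and 2
]
frame = commutate(assignment, output_buffers)
```

Because only the order of reads depends on the assignment, the same approach would apply whether the slices were distributed by sequential rotation or by a load-balancing scheme.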
It will also be appreciated by those skilled in the art that, depending on available components and price/performance requirements, one may include an inverse scanner, inverse quantizer, and/or IDCT block in each parallel decoder block 50-i without undue experimentation. In this alternative design, each parallel decoder block 50-i can still process its own data stream without reliance on output from other parallel decoder blocks, thus avoiding the additional bus structures and complexity required by some parallel decoder systems in the prior art.
Fig. 4 describes a possible design for a slice engine 38-i in accordance with the present invention. Buffer/serializer 60 receives variable length coded data in a preferably 32-bit data stream from other receiving or decoding elements. It stores this data as necessary, and forwards it in a 1-bit-wide data stream to D-type flip flop 62, which is clocked at 38.88 MHz. Flip flop 62 passes this bit stream to address register 64, where it is combined with data from VLD type encoder 72 (discussed below) and data output drive 70 (also discussed below) to create an address for accessing 8k x 16-bit SRAM 66. SRAM 66 is initialized from a PROM (not shown) to contain the desired decoding trees and data. VLD type encoder 72 receives signals from a video data stream parser (which is not shown, but is easily designed by those skilled in the art based on specifications of the data streams being decoded) identifying the type of data being decoded. Encoder 72 provides a 4-bit VLD type code to address register 64, thereby selecting the appropriate decoding tree in SRAM 66.
The address constructed by address register 64 (13 bits in the present example embodiment) controls the address of SRAM 66 from which data is currently being retrieved. Portions of the data word at that address are output along reset line 74, and to data output drive 70 and escape code detector 68. If the end of the current variable-length code word has not yet been reached, then 8 bits of data that were received by data output drive 70 are fed back to provide a portion of the address in address register 64 for the next clock cycle. Address register 64 then proceeds along the appropriate branch of the decoding tree based on the next bit of input from flip flop 62, and continues until a full variable-length code word has been received. Data loaded into SRAM 66 is arranged such that, when a full variable-length code word has been received, data provided to data output drive 70 comprises the decoded code word, and data sent to escape code detector 68 allows that decoded code word to reach output buffers 76, 77, and 78. This data is accepted by buffer 76, buffer 77, or buffer 78 depending on block-type signals from a video data stream parser (not shown) or VLD type encoder 72. The output of buffer 76, buffer 77, and buffer 78 is processed to complete the remaining steps of decoding the MPEG-2 video stream.
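The bit-serial tree walk just described might be modeled in software with the following Python sketch. Several details are assumptions made for illustration. The 13-bit address is taken to be the 4-bit VLD type, then the 8 fed-back bits, then the 1 input bit, in that order. The top bit of each 16-bit word is taken to stand in for the signal on reset line 74. The code table is a toy prefix code and not an MPEG-2 table. Escape code handling and the three output buffers are omitted.

```python
DONE = 0x8000  # assumed flag bit: set when a full code word has been matched


def address(vld_type: int, state: int, bit: int) -> int:
    """Pack 4-bit type, 8-bit feedback state, and 1 input bit into 13 bits."""
    return (vld_type << 9) | (state << 1) | bit


def build_table(vld_type: int, codes: dict[str, int], table: list[int]) -> None:
    """Fill the table (standing in for SRAM 66) with one decoding tree."""
    states = {"": 0}  # code prefix -> tree node number; the root is node 0
    for code, symbol in codes.items():
        prefix = ""
        for i, ch in enumerate(code):
            addr = address(vld_type, states[prefix], int(ch))
            prefix += ch
            if i == len(code) - 1:
                table[addr] = DONE | symbol   # leaf: emit the symbol
            else:
                if prefix not in states:
                    states[prefix] = len(states)
                table[addr] = states[prefix]  # interior: next node number


def decode(bits: str, vld_type: int, table: list[int]) -> list[int]:
    """Walk the tree one input bit per step, as one bit arrives per clock."""
    out, state = [], 0
    for ch in bits:
        word = table[address(vld_type, state, int(ch))]
        if word & DONE:
            out.append(word & 0xFF)
            state = 0            # reset to the root for the next code word
        else:
            state = word & 0xFF  # feed the node number back into the address
    return out


sram = [0] * 8192  # 8k words, matching the 13-bit address
TOY_CODES = {"0": 10, "10": 20, "110": 30, "111": 40}  # not an MPEG-2 table
build_table(vld_type=3, codes=TOY_CODES, table=sram)
symbols = decode("0101100111", vld_type=3, table=sram)
```

In this model, changing `vld_type` selects a different 512-word region of the table, which mirrors how the 4-bit VLD type code selects a decoding tree in SRAM 66.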
It may be appreciated that although the exemplary embodiment given above has been described in terms of a hardware implementation of an MPEG-2 video decoder, the invention claimed herein may be implemented in software, a combination of hardware and software, or application-specific integrated circuits (ASICs). The variable-length decoder 38-i may be implemented using any known method, including for example a decoder that detects the length of the current variable-length code word, obtains the decoded data, and barrel-shifts the input stream to acquire the next variable length code word. Other variable-length decoders adaptable for use in a system using the present invention are disclosed in U.S. Patent Nos. 5,657,016 to Bakhmutsky et al. and 5,663,725 to Jang. It should be noted that some read/write signals, reset signals, and clock signals have been omitted from this description, but will be fully within the understanding of those skilled in the video processing art, and may be included without undue experimentation.