WO2009085788A1 - System, method and device for processing macroblock video data - Google Patents

System, method and device for processing macroblock video data

Info

Publication number
WO2009085788A1
Authority
WO
WIPO (PCT)
Prior art keywords
macroblock
video data
video
memory
partially decoded
Prior art date
Application number
PCT/US2008/087084
Other languages
French (fr)
Inventor
Eric J. Devolder
Brendan D. Donahe
Sandip J. Ladhani
Rens Ross
Erik M. Schlanger
Eric Swartzendruber
Original Assignee
Rmi Corporation
Priority date
Filing date
Publication date
Priority claimed from US11/967,697 (US8923384B2)
Priority claimed from US11/967,690 (US8462841B2)
Application filed by Rmi Corporation
Publication of WO2009085788A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video processing device (650) includes a bitstream accelerator module (106) and a video processing engine (108). In another form, a video processing device (650) includes a memory (610, 630) and a plurality of staged macroblock processing engines (612, 614, 616). The memory (610, 630) is operable to store partially decoded video data decoded from a stream of encoded video data. The plurality of staged macroblock processing engines (612, 614, 616) is coupled to the memory (610, 630) and is responsive to a request to process the partially decoded video data to generate a plurality of macroblocks of decoded video data. In another form, a first macroblock of decoded video data having a first location (926) within a first row (908) of a video frame (900) is generated, and a second macroblock of decoded video data having a second location (924) within a second row (910) of the video frame (900) is generated during the generation of the first macroblock.

Description

SYSTEM, METHOD AND DEVICE FOR PROCESSING MACROBLOCK VIDEO DATA
FIELD OF THE DISCLOSURE
The present disclosure relates to video processing and more particularly to a system, method and device to encode and decode video stream data.
BACKGROUND
High-definition (HD) video signals typically require a high-definition television or other capable device in order to be viewed. With an aspect ratio of 16:9 (1.78:1), HD video approaches current aspect ratios of regular widescreen film, typically recorded at 1.85:1 or 2.40:1 (sometimes quoted at 2.35:1). Standard-definition (SD) video differs from HD video by having an aspect ratio of 4:3 (1.33:1). Numerous video standards and formats have emerged to output HD and SD video. However, each format presents unique characteristics and specifications. As such, decoding and encoding digital video can be limited by the processing capabilities of video processing systems, which often support only one standard or the other.
Moreover, HD video requires significantly greater processing capability than SD because of HD's higher resolution. Video processing is typically carried out on macroblocks. A macroblock is a group of spatially adjacent pixels, usually forming a rectangular block, that is processed more or less together and somewhat separately from other pixels. An SD video system has a resolution of 720 by 480 pixels at a frame rate of 30 frames per second (fps). Thus, for a macroblock having 256 pixels, an SD system requires 1,350 macroblocks to be processed per frame and a total of 40,500 macroblocks to be processed per second. On the other hand, HD has a resolution of 1920 by 1080 pixels, which becomes 1920 by 1088 when rounding 1080 up to the nearest number divisible by 16, and thus for a macroblock of 256 pixels, an HD system requires 8,160 macroblocks to be processed per frame and a total of 244,800 macroblocks to be processed per second. These different processing requirements make it difficult to design a common video processing architecture that is useful for processing both SD and HD signals while providing sufficient processing power for HD systems.
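For illustration only, the following minimal sketch reproduces the macroblock arithmetic above, assuming 16 by 16 (256-pixel) macroblocks and frame dimensions rounded up to a multiple of the macroblock size; it is not part of the described device.

#include <stdio.h>

/* Round a pixel dimension up to the next multiple of the macroblock size. */
static int round_up(int pixels, int step) {
    return (pixels + step - 1) / step * step;
}

int main(void) {
    const int mb = 16;   /* a macroblock is 16 x 16 = 256 pixels */
    struct { const char *name; int w, h, fps; } fmt[] = {
        { "SD", 720, 480, 30 },
        { "HD", 1920, 1080, 30 },
    };
    for (int i = 0; i < 2; i++) {
        int per_frame = (round_up(fmt[i].w, mb) / mb) * (round_up(fmt[i].h, mb) / mb);
        printf("%s: %d macroblocks/frame, %d macroblocks/second\n",
               fmt[i].name, per_frame, per_frame * fmt[i].fps);
    }
    return 0;   /* prints 1,350 / 40,500 for SD and 8,160 / 244,800 for HD */
}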
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will hereinafter be described in conjunction with the following drawing figures, wherein like numerals denote like elements, and:
FIG. 1 is a block diagram of a video processing system according to one embodiment of the present invention;
FIG. 2 is a flow diagram for a method of partially decoding an encoded video data bitstream used by the bitstream accelerator module of FIG. 1;
FIG. 3 is a flow diagram for a method of decoding partially decoded video data used by the video processing engine of FIG. 1;
FIG. 4 is a flow diagram for a method of encoding raw video data using the video processing engine and the bitstream accelerator module of FIG. 1;
FIG. 5 is a functional block diagram of the bitstream accelerator module of FIG. 1;
FIG. 6 is a block diagram of a video processing system according to the present invention;
FIG. 7 is a block diagram of a process used by the macroblock processing engines of FIG. 6 to perform decoding and encoding operations;
FIG. 8 illustrates a portion of a partially processed video data frame useful in understanding how video processing engine 608 operates when using a single macroblock processing engine; and
FIG. 9 is a graphical illustration of staged macroblock processing of video data in accordance with a particular aspect of the disclosure.
DETAILED DESCRIPTION
The following description in combination with the Figures is provided to assist in understanding the teachings disclosed herein. The following discussion will focus on specific implementations and embodiments of the teachings. This focus is provided to assist in describing the teachings and should not be interpreted as a limitation on the scope or applicability of the teachings. However, other teachings can certainly be utilized in this application. The teachings can also be utilized in other applications and with several different types of architectures such as distributed computing architectures, client/server architectures, or middleware server architectures and associated components.
According to one aspect, a video processing device includes a bitstream accelerator module and a video processing engine. The bitstream accelerator module has an input for receiving a stream of encoded video data, and an output adapted to be coupled to a memory for storing partially decoded video data. The bitstream accelerator module partially decodes the stream of encoded video data according to a selected one of a plurality of video formats to provide the partially decoded video data. The video processing engine has an input adapted to be coupled to the memory for reading the partially decoded video data, and an output for providing decoded video data.
According to another aspect, one of a plurality of video formats is selected. In response to selecting a first video format, a stream of encoded video data is processed according to the first video format using a bitstream accelerator module to provide partially decoded video data in a predetermined output format. In response to selecting a second video format, the stream of encoded video data is processed according to the second video format using the bitstream accelerator module to provide the partially decoded video data in the predetermined output format. The partially decoded video data is processed to provide output video data.
According to a further aspect, a video processing system includes a host processor, a memory, a bitstream accelerator module, and a video processing engine. The host processor is operable to detect a request to process a stream of encoded video data received from a video source, wherein the stream of encoded video data is represented in a first video format. The memory is operable to store the stream of encoded video data. The bitstream accelerator module is responsive to the host processor to process the stream of encoded video data according to a selected one of a plurality of different video formats to provide partially decoded video data, and to store the partially decoded video data in the memory. The bitstream accelerator module is operable to use one of a plurality of firmwares corresponding to the first video format. The video processing engine is configured to access the memory to further process the partially decoded video data.
According to one aspect of the present invention, a video processing device includes a memory and a plurality of staged macroblock processing engines. The memory is operable to store partially decoded video data decoded from a stream of encoded video data. The plurality of staged macroblock processing engines is coupled to the memory and is responsive to a request to process the partially decoded video data to generate a plurality of macroblocks of decoded video data.
According to another aspect, a first macroblock of decoded video data having a first location within a first row of a video frame is generated, and a second macroblock of decoded video data having a second location within a second row of the video frame is generated during the generation of the first macroblock.
According to yet another aspect, a video processing system includes a host processor, a multiple format video processing engine, and a memory system. The host processor is operable to detect a request to process a frame of encoded video data. The multiple format video processing engine is coupled to the host processor and responsive thereto to process the frame of encoded video data. The multiple format video processing engine includes a plurality of staged macroblock processing engines that simultaneously processes multiple rows of partially decoded video data and previously generated macroblocks to provide a plurality of macroblocks of decoded video data. The memory system is coupled to the multiple format video processing engine and is responsive thereto to store the partially decoded video data and the previously generated macroblocks.
Now turning to the drawings, FIG. 1 is a block diagram of a video processing system 100 according to one embodiment of the present invention. Video processing system 100 can be used as part of a handheld electronic device, a computer system, a portable digital video player, a set-top video box, a digital video recorder, a video module or video card, or various other devices or systems, and such system can incorporate all, or portions of, video processing system 100.
Video processing system 100 processes video data in various video and audio formats, including, but not limited to, MPEG ("Moving Pictures Expert Group") 1, 2, and 4, MJPEG ("Motion JPEG"), DV ("Digital Video"), WMV ("Windows Media Video"), RM ("Real Media"), DivX, Sorenson 3, Quicktime 6, RP9, WMV9, Ogg, Ogg Theora, Dirac, H.264, MP3, WMA, or various other formats and coder/decoder specifications (codecs). In particular, video processing system 100 has an architecture that allows it to efficiently process video data in a variety of different formats, and to divide up the video processing task among different resources that are capable of performing their assigned tasks most efficiently. Video processing system 100 includes generally a memory 112, a first input video data source 116, a second input video data source 118, an nth input video data source 120, a video output display 122, and a video processing device 150. Each input video data source 116, 118, and 120 can be a different type of video source, such as a digital video disk (DVD), a digital cable signal, a satellite broadcast signal, an Internet Protocol (IP) signal, a hard disk drive (HDD) storing digital video data, a removable flash drive or various memory modules configured to store digital video data, a digital video recorder, a digital camera including digital video recording capabilities, or various other video sources or combinations of digital video sources.
Video output display 122 is a display device that presents video and/or audio information in a form perceptible to a human, and can be realized with any of a number of display types including cathode ray tube (CRT), liquid crystal display (LCD), plasma display, and the like.
Memory 112 is operably connected to video processing device 150. In one form, memory 112 is implemented as a Double-Data-Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) module that can be configured in various sizes, and can operate at one or more speeds (e.g. 133 MHz, 266 MHz, etc.). Memory 112 can be implemented with other types of memory such as static RAM, asynchronous DRAM, single data rate SDRAM, graphics double data rate (gDDR) SDRAM, and the like. When combined with an integrated memory controller, memory 112 is part of a larger memory module that facilitates read and write accesses to various memory locations within the memory.
Memory 112 includes several areas for storage of data in formats useful for decoding or encoding video data that could be in any of a variety of different video formats. Briefly, memory 112 includes a bitstream data area 124, a macroblock 126 formed by a coefficient data area 128 and a motion vector data area 130, a current reference frame area 132, a previous reference frame area 134, a partially encoded macroblock data area 136, and a raw video data area 138. Video processing device 150 uses these areas to facilitate the decoding and encoding tasks in a manner that will be described more fully below.
Video processing device 150 includes generally a host processor 102, a multiple format video processing engine (MFVPE) 104, a data bus 110, and an input/output (I/O) interface 114. In the illustrated embodiment, all the components of video processing device 150 are combined on a single integrated circuit. In an alternate embodiment memory 112 may also be combined on the same integrated circuit.
Host processor 102 operates as the central processing unit (CPU) of video processing system 100. Host processor 102 is capable of running an operating system and various software applications to control the processing of digital video data using MFVPE 104. In response to a video processing software application, host processor 102 issues requests to MFVPE 104 to process a video data stream according to a selected video format.
MFVPE 104 includes generally a bitstream accelerator module 106 and a video processing engine 108. Bitstream accelerator module 106 and video processing engine 108 are each connected to a data bus 110 and are operable to enable access to memory 112 and other resources coupled to data bus 110 through I/O interface 114. Application software running on host processor 102 is capable of flexibly using bitstream accelerator module 106 and video processing engine 108 to either decode an encoded video data stream, or encode raw video data.
As will be described more particularly with reference to FIG. 5 below, bitstream accelerator module 106 includes its own dedicated processor, separate from host processor 102, which is responsive to firmware corresponding to the selected video format, as well as hardware acceleration circuits for use by the dedicated processor. Bitstream accelerator module 106 includes internal memory for storing different sets of firmware, or "firmwares," corresponding to each supported video format, allowing bitstream accelerator module 106 to be updated from time to time to include revised or new video formats and codecs when available.
Moreover, offloading the decoding (and encoding) tasks from host processor 102 and separating various tasks between bitstream accelerator module 106 and video processing engine 108 allows video processing device 150 to perform computationally intensive video decoding (and encoding) tasks efficiently. In particular, bitstream accelerator module 106 performs tasks that relate to processing data received as (or provided as) a stream of encoded video data, in which the processing is generally sequential. Video processing engine 108, however, operates on sections of video data known as "macroblocks" that utilize video information from horizontally and vertically adjacent neighbor macroblocks of a video frame to recover the video information for a particular macroblock. Video processing engine 108 uses macroblocks that are not necessarily sequentially adjacent to an area of video information being decoded. Thus, bitstream accelerator module 106 and video processing engine 108 process the macroblocks in different orders based on their own assigned processing tasks.
Input/output (I/O) interface 114 is operable to connect input video sources 116, 118, and 120 and video output display 122 to data bus 110. I/O interface 114 is configurable to interface data bus 110 with one or more types of communication buses such as a universal serial bus (USB), a USB 2.0 bus, a personal computer interconnect (PCI) bus, a PCI-Express bus, a "Fire Wire" bus, a digital video bus, an iSCSI bus, and Ethernet, Gigabit Ethernet, or various other communication buses.
When decoding a stream of encoded video data, bitstream accelerator module 106 reads the data from memory 112, partially decodes it, and returns partially decoded video data to memory 112. Bitstream accelerator module 106 partially decodes the data by parsing and entropy decoding it. The algorithm for the parsing and entropy decoding operation is stored in the particular firmware that corresponds to the selected video format. Host processor 102 selects this video format and corresponding firmware by writing to a control register of bitstream accelerator module 106. To facilitate subsequent use by video processing engine 108, bitstream accelerator module 106 stores the partially decoded data in macroblock form using a common output format that remains constant between different video formats. Thus video processing engine 108 can efficiently read the macroblock data in the common output format regardless of the selected video format.
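For illustration only, one possible layout for such a common output format is sketched below. The field names and sizes are assumptions made for the example (16x16 macroblocks, 4:2:0 chroma, up to four motion vectors); they are not the format actually used by bitstream accelerator module 106.

#include <stdint.h>

#define MB_COEFFS (16 * 16 + 2 * 8 * 8)     /* one luma block plus two chroma blocks */

typedef struct {
    uint16_t mb_x, mb_y;                    /* macroblock location within the frame  */
    uint16_t mb_type;                       /* intra/inter, partitioning, etc.       */
    uint16_t quant;                         /* quantization parameter                */
    int16_t  motion_vectors[4][2];          /* up to four (x, y) motion vectors      */
    int16_t  coefficients[MB_COEFFS];       /* entropy-decoded residual coefficients */
} partially_decoded_mb;

/* Because this layout does not change with the selected firmware, the video
   processing engine can read every macroblock the same way, regardless of
   which video format the bitstream accelerator module decoded. */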
When encoding a stream of raw video data, video processing engine 108 reads the raw video data and encodes it to form partially encoded macroblocks. Bitstream accelerator module 106 reads the partially encoded macroblocks from memory 112, entropy encodes them, and outputs a stream of encoded video data to memory 112. Bitstream accelerator module 106 and video processing engine 108 communicate indirectly through the use of shared memory within memory module 112, and through reading and writing a control register, not illustrated in FIG. 1.
For example, the bitstream accelerator module 106 can load a first firmware to decode a first bitstream having a first format. Bitstream accelerator module 106 decodes the first bitstream and outputs the first bitstream as partially decoded macroblock data to memory 112. Upon detection of a request to decode a second bitstream data type, bitstream accelerator module 106 loads a different firmware that enables it to decode the second bitstream data type. The second bitstream data type is decoded into a format that is the same or similar to the partially decoded macroblock data, and stored within memory 112. In this manner video processing engine 108 can access memory 112, and read the partially decoded macroblocks using the same (or substantially the same) format, thereby reducing the processing needed to output decoded video data.
Bitstream accelerator module 106 writes partially decoded macroblock data to the memory module 112 in multiple regions before video processing engine 108 further processes the macroblock to output decoded video data. Video processing engine 108 generates decoded video data without having specific knowledge of the order that the bitstream accelerator module 106 decoded the bitstream data.
Bitstream accelerator module 106 writes a series of partially decoded macroblocks to memory 112 using a macroblock map or table to allow arbitrary ordering. As such, bitstream accelerator module 106 writes partially decoded macroblock data to various locations within memory 112 and is not bound to a specific memory location or range of memory locations. In this manner, the bitstream accelerator module 106 can run ahead, processing the bitstream data before video processing engine 108 outputs decoded video data and thereby reducing the latency of processing the bitstream data to generate decoded video data.
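For illustration only, the sketch below shows one way such a macroblock map could be organized: a per-frame table, indexed by macroblock position, recording where in memory each partially decoded macroblock was written. The sizes and names are hypothetical, not the module's actual tables.

#include <stdint.h>

#define MB_COLS 120   /* e.g. 1920 / 16 */
#define MB_ROWS  68   /* e.g. 1088 / 16 */

typedef struct {
    /* Byte offset of each macroblock's partially decoded data within the
       shared memory region; macroblocks may be written in any order and at
       any free location, and the map records where each one landed. */
    uint32_t offset[MB_ROWS][MB_COLS];
} macroblock_map;

/* Writer side: the bitstream accelerator records where a macroblock was placed. */
static void map_record(macroblock_map *map, int row, int col, uint32_t offset) {
    map->offset[row][col] = offset;
}

/* Reader side: the video processing engine locates a macroblock by its frame
   position without knowing the order in which it was decoded. */
static uint32_t map_lookup(const macroblock_map *map, int row, int col) {
    return map->offset[row][col];
}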
Video processing system 100 can process video data having different video data formats and output decoded video data to the video output display 122. For example, the first input video data source 116 can be configured to input multiple digital video data files, with each digital video data file having the same digital video data format such as a standard definition (SD) digital video data format. In another form, the second input video data source 118 provides video data in a high definition (HD) digital video data format, such as an H.264 digital video data format. Thus video processing system 100 detects the type of video data format and initiates decoding of that specific type of video.
For example, a bitstream is loaded within memory 112, and MFVPE 104 subsequently processes the bitstream using bitstream accelerator module 106 and video processing engine 108. The MFVPE 104 including the bitstream accelerator module 106 can accelerate processing of bitstreams by performing bitstream parsing and entropy decode/encode portions of a video decode process for various video formats. The bitstream accelerator module 106 decodes macroblock data and outputs the partially decoded macroblock data in a format that can be efficiently processed by the video processing engine 108. For example, the bitstream accelerator module 106 can store or access various video specification data. The video specification data can define fields having formatted data, information for a specific type of video, syntax element, coefficient data, motion vector data, or various other specification data for a specific type of bitstream.
Host processor 102 reads header information to identify a firmware to be employed by the bitstream accelerator module 106. For example, the bitstream accelerator module 106 can access a specific codec to process an identified bitstream having a specific video format that can be decoded using a specific codec. Upon processing using this codec at the bitstream accelerator module 106 to generate partially decoded macroblock data, the partially decoded macroblocks can be written to the memory module 112 and accessed by the video processing engine 108.
Video processing engine 108 includes a hardwired processor that reads and writes video data to and from memory 112, and performs macroblock decoding and encoding functions to output a current frame of video data. For example, the partially decoded macroblock data output by bitstream accelerator module 106 can be translated into a format configured to enable efficient decoding and encoding of the video data using video processing engine 108. The partially decoded macroblock data can be stored within memory module 112 and accessed using video processing engine 108 including the hardwired processor. In this manner, additional firmware, software, etc. need not be loaded into the video processing engine 108 prior to processing the partially decoded macroblock data allowing for efficient processing of macroblock data by the video processing engine 108.
Thus MFVPE 104 can be used as a multi-standard video processor operable to support decoding up to HD resolution video and encoding up to SD resolution video. MFVPE 104, and various portions such as bitstream accelerator module 106, video processing engine 108, or combinations thereof, can be clocked at various speeds and in one form can be clocked at a speed of at least 200 MHz to ensure that the performance of MFVPE 104 is sufficient to process SD and HD video data. For example, NTSC SD video has a resolution of 720x480 at a frame rate of 30 frames per second (480p) and includes 1,350 macroblocks of video data per frame, and 40,500 macroblocks of video data per second. PAL SD video has a larger resolution (720x576) at a lower frame rate (25 fps), resulting in the same macroblock rate as NTSC SD video. As such, MFVPE 104 can be configured to operate at a given clock frequency sufficient to process a macroblock in less than 4,938 cycles.
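As a quick check of the figures above, the following snippet computes the per-macroblock cycle budget at a 200 MHz clock; the macroblock rates are taken directly from the text.

#include <stdio.h>

int main(void) {
    const double clock_hz   = 200e6;    /* 200 MHz clock                    */
    const double sd_mb_rate = 40500.0;  /* NTSC (and PAL) SD macroblocks/s  */
    const double hd_mb_rate = 244800.0; /* HD macroblocks/s                 */
    printf("SD budget: %.0f cycles per macroblock\n", clock_hz / sd_mb_rate);
    printf("HD budget: %.0f cycles per macroblock\n", clock_hz / hd_mb_rate);
    return 0;   /* prints roughly 4,938 cycles for SD and 817 cycles for HD */
}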
FIG. 2 is a flow diagram for a method of partially decoding an encoded video data bitstream used by bitstream accelerator module 106 of FIG. 1. The method begins at block 200. At block 202 processing of bitstream video data is initiated. At block 204, a bitstream in the form of encoded video data is copied from a video source to memory module 112. Upon copying the bitstream (or a portion thereof) to memory module 112, the method proceeds to block 206 in which header information of the bitstream is processed to determine a video type or format, which is identified at step 208. Then, at block 210, a firmware associated with the selected video type is identified using the header information of the bitstream. Upon identifying the video type and its corresponding firmware, the method proceeds to block 212, in which the appropriate firmware is loaded into bitstream accelerator module 106. The method then proceeds to block 214, which detects a clock rate. For example, a current clock rate of bitstream accelerator module 106 and video processing engine 108 is detected. Then block 216 determines a clock rate sufficient to process the bitstream. Decision block 218 determines whether the clock rate should be altered to accommodate the processing needs of the selected bitstream. If so, then the clock rate is altered at block 220. For example, the bitstream may be an HD bitstream that requires a relatively high clock rate, and the clock rate may be increased at block 220. Alternatively, the bitstream data may be an SD bitstream and the clock rate may be decreased at block 220 to conserve power, processor use, etc. as desired.
Upon altering or maintaining a clock rate, the method proceeds to block 222 in which the bitstream is read from off-chip memory by bitstream accelerator module 106. Proceeding to block 224, bitstream accelerator module 106 parses the bitstream data by separating it into smaller portions or elements. For example, the bitstream parsing process determines a syntax element in the bitstream, and extracts the correct number of bits from the bitstream to represent the specific syntax element. The dedicated processor in bitstream accelerator module 106 parses the bitstream into syntax elements according to the firmware previously loaded at step 212. The firmware can issue requests to specialized processing circuits to assist with decoding syntax elements. The method then proceeds to block 226, in which bitstream accelerator module 106 performs entropy decode.
Then at block 228 bitstream accelerator module 106 writes the partially decoded video data to memory 112 in the form of macroblocks. For example, bitstream accelerator module 106 outputs macroblocks that include coefficient information (e.g. runs and levels), motion vectors, header information, and various combinations thereof, to memory 112.
The method then proceeds to decision block 230, which determines if additional bitstream video should be processed. If additional bitstream data of the current bitstream, or the same bitstream type, is available to be processed, the method returns to block 222 and repeats the parsing and decoding process. If at decision block 230, additional bitstream data is not available to process, then the method proceeds to decision block 232, which determines whether another bitstream is available to process. If another bitstream is available, flow returns to block 204. If another bitstream is not available, the method ends at block 236.
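For illustration only, the following outline condenses the FIG. 2 flow into code. Every helper (read_header_format, load_firmware, and so on) is a hypothetical stub standing in for the corresponding blocks described above; it is not an actual driver API.

#include <stdbool.h>

typedef enum { FMT_MPEG2, FMT_VC1, FMT_H264 } video_format;

/* Hypothetical stubs for the operations described above. */
static video_format read_header_format(void)          { return FMT_H264; }
static void load_firmware(video_format fmt)            { (void)fmt; }
static void adjust_clock_rate(video_format fmt)        { (void)fmt; }
static bool read_bitstream_chunk(void)                 { static int chunks = 4; return chunks-- > 0; }
static void parse_and_entropy_decode(void)             { }
static void write_partially_decoded_macroblocks(void)  { }

int main(void) {
    video_format fmt = read_header_format();    /* blocks 206-210: identify type and firmware */
    load_firmware(fmt);                          /* block 212                                  */
    adjust_clock_rate(fmt);                      /* blocks 214-220                             */
    while (read_bitstream_chunk()) {             /* blocks 222 and 230                         */
        parse_and_entropy_decode();              /* blocks 224-226                             */
        write_partially_decoded_macroblocks();   /* block 228                                  */
    }
    return 0;
}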
FIG. 3 is a flow diagram for a method of decoding macroblock data used by video processing engine 108 of FIG. 1. Video processing engine 108 can implement the method of FIG. 3 using software or firmware running on a general purpose processor, special purpose circuitry, or various combinations thereof.
The method begins generally at block 300. At block 302, video processing engine 108 reads a control register to determine whether to begin a decoding operation. For example, host processor 102 or bitstream accelerator module 106 initiates the decoding operation by writing to the control register. At block 304, video processing engine 108 accesses memory module 112 to read partially decoded macroblock data, which it will further decode to generate decoded video data to output as a video frame. After reading the partially decoded macroblock data, video processing engine 108 initiates further decoding of the partially decoded macroblock data at block 306. In order to further decode the macroblock data for a current macroblock location, video processing engine 108 may require additional macroblock data from neighboring locations within the frame that have already been decoded. Accordingly video processing engine 108 reads this additional macroblock data from one or more previously generated macroblock locations within the video frame as needed. Video processing engine 108 uses coefficient runs and levels of the partially decoded macroblock at step 310, and uses motion vectors for the current macroblock frame at step 312. Then at block 314 it uses header information, and at block 316 it writes the decoded macroblock data to memory 112 as video data of a current frame. The current frame is populated with previously decoded macroblock data, if any, and is updated with new macroblocks as they are decoded by video processing engine 108.
The method then proceeds to decision block 318, which determines whether additional partially decoded macroblock data is available to be decoded. If so, the method proceeds to block 320 which repeats the flow starting at block 304. If not, the method proceeds to block 322 at which video processing engine 108 monitors the control register for additional macroblock data to decode.
FIG. 4 is a flow diagram for a method of encoding raw video data using video processing engine 108 and bitstream accelerator module 106 of FIG. 1. The method begins generally at block 400. At block 402, video processing engine 108 initiates macroblock data encoding to convert raw video data into encoded macroblocks. The method next proceeds to block 404 in which video processing engine 108 reads raw video data from memory 112. At block 406 video processing engine 108 encodes the raw video data to form partially encoded macroblock data. Then at block 408 it outputs the partially encoded macroblock data to memory module 112. At block 410 bitstream accelerator module 106 detects the availability of partially encoded macroblock data for encoding. Either host processor 102 or video processing engine 108 sets a control register in bitstream accelerator module 106 to indicate that partially encoded macroblock data is available to be encoded.
Upon detecting an availability of the partially encoded macroblock data, at block 412 bitstream accelerator module 106 reads the partially encoded macroblock data from memory 112. Then at block 414 it determines a video data type which it then uses to encode the partially encoded macroblock data.
The method then proceeds to block 416 at which bitstream accelerator module 106 performs entropy encoding of the partially encoded macroblock. Then at block 418 it outputs the encoded macroblock data as bitstream data to memory 112. At decision block 420 bitstream accelerator module 106 determines whether additional video data is available to encode. If so, the method returns to block 404 and repeats. If not, the method ends at block 422.
FIG. 5 is a functional block diagram of a bitstream accelerator module 500 that may be used to implement bitstream accelerator module 106 of FIG. 1. Generally bitstream accelerator module 500 includes a processor 502, a Variable Length Code (VLC)/context adaptive VLC (CAVLC) engine 504, a context adaptive binary arithmetic coding (CABAC) engine 506, a Motion Vector (MV) predictor engine 508, a neighbor block engine 510, an instruction RAM buffer 512, a data RAM buffer 514, an input shift first in first out (FIFO) buffer 516, a coefficient output shift FIFO buffer 518, and an MV output shift FIFO buffer 520. Each buffer 512, 514, 516, 518, and 520 and neighbor block engine 510 are coupled to a direct memory access (DMA) controller 522 that may be configured to access remote memory autonomously through an interface to data bus 110 (not shown in FIG. 5).
Processor 502 is a dedicated processor that includes several features that give it the flexibility to perform different video codecs efficiently. It has a three-stage execution pipeline, is capable of performing up to one instruction per cycle, and operates with a register file having thirty-two 16-bit registers. The instruction set of processor 502 includes both general-purpose data processing instructions and specialized instructions that are particularly useful for video decoding and encoding operations. The instruction set also supports both 8- and 16-bit data types. Processor 502 has an input coupled to Instruction RAM 512 for receiving instructions that are part of the selected codec firmware. It also has a bidirectional connection to Data RAM 514 that it uses as a high-speed scratchpad area. Processor 502 also has an input connected to input shift FIFO 516, and an output connected to MV output shift FIFO 520.
Processor 502 is bidirectionally connected to VLC/CAVLC engine 504 and CABAC engine 506 and controls these specialized processing circuits to efficiently implement the selected video processing codec. VLC/CAVLC engine 504 converts variable-length code words into corresponding values or run-level pairs. VLC/CAVLC engine 504 supports multiple levels of tables to allow flexibility in describing the number of bitstream bits each table can decode. As such, performance can be increased and the tables can be compacted, thereby conserving space within data RAM buffer 514. For example, VLC tables can consume from 1 to 8 bits of the bitstream at a time. As such, the "wider" the VLC table (e.g. 8 bits), the more quickly the VLC/CAVLC engine 504 can resolve code words (but the more data RAM buffer space is used). Conversely, the "narrower" the VLC table, the more compact the table (but the more VLC cycles needed to resolve code words).
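For illustration only, the sketch below shows a table-driven VLC decode of the kind described, using a 2-bit-wide single-level table for a toy three-symbol code. The bit reader and table contents are invented for the example and are not the engine's actual tables; a wider table resolves longer code words in one lookup at the cost of 2^width entries, while a narrower table is smaller but may require chained lookups.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define WIDTH 2   /* bits consumed per table lookup */

/* Minimal MSB-first bit reader (illustrative only). */
typedef struct { const uint8_t *data; size_t pos; } bitreader;

static unsigned peek_bits(const bitreader *br, int n) {
    unsigned v = 0;
    for (int i = 0; i < n; i++) {
        size_t b = br->pos + i;
        v = (v << 1) | ((br->data[b >> 3] >> (7 - (b & 7))) & 1u);
    }
    return v;
}
static void skip_bits(bitreader *br, int n) { br->pos += n; }

/* Each entry covers a WIDTH-bit prefix and gives the decoded symbol plus the
   number of bits the code word actually used. Toy code: "0"->0, "10"->1, "11"->2. */
typedef struct { int symbol; int bits_used; } vlc_entry;
static const vlc_entry table2[1 << WIDTH] = {
    { 0, 1 },  /* 00 -> prefix "0" */
    { 0, 1 },  /* 01 -> prefix "0" */
    { 1, 2 },  /* 10               */
    { 2, 2 },  /* 11               */
};

int main(void) {
    const uint8_t stream[] = { 0x5A };          /* bits: 0 10 11 0 10 */
    bitreader br = { stream, 0 };
    for (int i = 0; i < 5; i++) {               /* decode five symbols */
        const vlc_entry *e = &table2[peek_bits(&br, WIDTH)];
        skip_bits(&br, e->bits_used);
        printf("symbol %d\n", e->symbol);       /* prints 0 1 2 0 1 */
    }
    return 0;
}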
CABAC engine 506 performs syntax element decoding, including calculations for various high definition processes. CABAC engine 506 references items in neighbor block engine 510 when performing context adaptive arithmetic that forms the "ctxIdx" calculations. CABAC engine 506 can also perform an entire loop for extracting coefficients from the bitstream and offloads the task of decoding macroblock coefficients from processor 502.
In response to host processor 102 programming a control register (not shown in FIG. 5), processor 502 loads specialized codec firmware into the instruction RAM buffer 512 and the data RAM buffer 514. The codec firmware includes instructions associated with decoding a stream of encoded video data, encoding a stream of raw video data, or both. Processor 502 performs bit-stream parsing for syntax elements. Bitstream data is provided to the bitstream accelerator module 500 through the input shift FIFO buffer 516. The input shift FIFO buffer 516 also provides bitstream data to the CABAC module 506, and the VLC/CAVLC module 504.
DMA controller 522 implements a "ping-pong" DMA operation with input shift FIFO buffer 516 that allows a first portion of input shift FIFO 516 to be loaded while contents of a second portion of the input shift FIFO 516 can be accessed for processing. Input shift FIFO 516 can be managed or loaded by an external processor. In the illustrated embodiment, processor 502, VLC/CAVLC engine 504, and CABAC engine 506 have a direct interface to the input shift FIFO buffer 516 to process bits from the bitstream buffers.
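A software-only illustration of the ping-pong arrangement is sketched below: while one half of the buffer is consumed, the other half is refilled, and the roles then swap. The dma_refill() helper merely simulates the DMA controller; it is not a real driver call, and in hardware the refill would proceed concurrently with processing.

#include <stdio.h>
#include <string.h>

#define HALF_SIZE 8

static unsigned char source[32];      /* stand-in for the bitstream in remote memory */
static size_t src_pos;

/* Stand-in for the DMA controller: copy the next chunk into one half. */
static size_t dma_refill(unsigned char *half) {
    size_t n = sizeof(source) - src_pos;
    if (n > HALF_SIZE) n = HALF_SIZE;
    memcpy(half, source + src_pos, n);
    src_pos += n;
    return n;
}

int main(void) {
    unsigned char fifo[2][HALF_SIZE];                /* ping and pong halves */
    for (size_t i = 0; i < sizeof(source); i++) source[i] = (unsigned char)i;

    int active = 0;
    size_t filled = dma_refill(fifo[active]);        /* prime the first half */
    while (filled > 0) {
        /* Refill the idle half (concurrently with processing, in hardware). */
        size_t next = dma_refill(fifo[active ^ 1]);
        for (size_t i = 0; i < filled; i++)          /* "process" the active half */
            printf("%02x ", fifo[active][i]);
        printf("\n");
        active ^= 1;                                 /* swap ping and pong */
        filled = next;
    }
    return 0;
}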
Coefficient output shift FIFO 518 and MV output shift FIFO 520 are written by the VLC/CAVLC engine 504, CABAC engine 506, and processor 502. If FIFO 518 and/or FIFO 520 becomes full, bitstream accelerator module 500 stalls further writes until room becomes available.
Thus bitstream accelerator module 500 supports the efficient bitstream processing of video data during both the decoding of encoded video data, and encoding of partially encoded macroblocks. It includes a dedicated processor that is especially adapted for video processing applications, as well as a variety of hardware modules that support the movement of data from and back to memory and various video processing functions. While the above invention has been described in the context of a preferred embodiment, various modifications will be apparent to those skilled in the art. For example, various portions of the description herein describe decoding video data. However, it should be understood that one skilled in the art can use the teachings of the invention to decode and encode video data, audio data, or any combination thereof. Accordingly, it is intended by the appended claims to cover all modifications of the invention that fall within the true scope of the invention.
Now turning to FIGs. 6-9, another embodiment of the present disclosure is presented. FIG. 6 is a block diagram of a video processing system 600 according to the present invention. Video processing system 600 can be used as part of a handheld electronic device, computer system, a portable digital video player, a set-top video box, a digital video recorder, a video module or video card, or various other devices or systems, or any combination thereof, that can incorporate all, or portions of, video processing system 600.
Video processing system 600 processes video data in various video and audio formats, including, but not limited to, MPEG ("Moving Pictures Expert Group") 1, 2, and 4, MJPEG ("Motion JPEG"), DV ("Digital Video"), WMV ("Windows Media Video"), RM ("Real Media"), DivX, Sorenson 3, Quicktime 6, RP9, WMV9, Ogg, Ogg Theora, Dirac, H.264, MP3, WMA, or various other formats and coder/decoder specifications (codecs). In particular, video processing system 600 has an architecture that allows it to efficiently process video data in a variety of different formats, and to perform video processing tasks such as HD decoding efficiently using multiple, staged video processing engines.
Video processing system 600 includes generally a first input video data source 622, a second input video data source 624, an nth input video data source 626, a video output display 628, a remote memory 630, and a video processing device 650. Each input video data source 622, 624, and 626 can be a different type of video source, such as a digital video disk (DVD), a digital cable signal, a satellite broadcast signal, an Internet Protocol (IP) signal, a hard disk drive (HDD) storing digital video data, a removable flash drive or various memory modules configured to store digital video data, a digital video recorder, a digital camera including digital video recording capabilities, or various other video sources or combinations of digital video sources.
Video output display 628 is a display device that presents video and/or audio information in a form perceptible to a human, and can be realized with any of a number of display types including cathode ray tube (CRT), liquid crystal display (LCD), plasma display, and the like.
Remote memory 630 is operably connected to video processing device 650. In one form, remote memory 630 is implemented as a Double-Data-Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) module that can be configured in various sizes, and can operate at one or more speeds (e.g. 133 MHz, 266 MHz, etc.). Remote memory 630 can be implemented with other types of memory such as static RAM, asynchronous DRAM, single data rate SDRAM, graphics double data rate (gDDR) SDRAM, and the like. If combined with an integrated memory controller, remote memory 630 is part of a larger memory module that facilitates read and write accesses to various memory locations within the memory. Note that remote memory 630 is "remote" in the sense that it is at a lower level of the memory hierarchy than other memory local to video processing device 650. This feature will be described further below. Remote memory 630 includes areas for storing data in formats useful for decoding or encoding video data. Briefly, video processing device 650 uses remote memory 630 to store both partially decoded video data and decoded macroblocks for the current video frame. As part of the decoding task, video processing device 650 examines both the partially decoded video data of a current macroblock as well as horizontally and vertically adjacent neighbor macroblocks to recover the video information for the current macroblock. Thus as shown in FIG. 6, remote memory 630 includes a first area 632 for storing partially decoded video data, and a second area 634 for storing a decoded video frame.
Video processing device 650 includes generally a host processor 602, a multiple format video processing engine (MFVPE) 604, a data bus 618, and an input/output (I/O) interface 620. In the illustrated embodiment, all the components of video processing device 650 are combined on a single integrated circuit. In an alternate embodiment, remote memory 630 may also be combined with other elements of video processing device 650 on the same integrated circuit.
Host processor 602 operates as the central processing unit (CPU) of video processing system 600. Host processor 602 is capable of running an operating system and various software applications to control the processing of digital video data using MFVPE 604. In response to a video processing software application, host processor 602 issues requests to MFVPE 604 to process a video data stream according to a selected video format.
Input/output (I/O) interface 620 is operable to connect input video sources 622, 624, and 626 and video output display 628 to data bus 618. I/O interface 620 is configurable to interface data bus 618 with one or more types of communication buses such as a universal serial bus (USB), a USB 2.0 bus, a personal computer interconnect (PCI) bus, a PCI-Express bus, a "Fire Wire" bus, a digital video bus, an iSCSI bus, and Ethernet, Gigabit Ethernet, or various other communication buses.
MFVPE 604 includes generally a bitstream accelerator module 606 and a video processing engine 608. Bitstream accelerator module 606 and video processing engine 608 are each connected to data bus 618 and are operable to access remote memory 630 and other resources coupled to data bus 618 through I/O interface 620. Application software running on host processor 602 is capable of flexibly using MFVPE 604 to either decode an encoded video data stream, or encode raw video data.
Bitstream accelerator module 606 accelerates the process of bit stream parsing and entropy decode of a video bit stream. Bitstream accelerator module 606 includes its own dedicated processor, separate from host processor 602, which is responsive to firmware corresponding to the selected video format, as well as hardware acceleration circuits for use by the dedicated processor. Bitstream accelerator module 606 includes internal memory for storing different sets of firmware, or "firmwares," corresponding to each supported video format, allowing bitstream accelerator module 606 to be updated from time to time to include revised or new video formats and codecs when available.
Video processing engine 608 includes a local memory 610, a first macroblock processing engine (MPE) 612, a second MPE 614, and a third MPE 616. One or any combination of the MPEs 612, 614, and 616 can be employed to process macroblock data. For example, video processing engine 608 uses only one macroblock processing engine to decode or encode data in the SD format, but uses all three macroblock processing engines to decode data in the HD format. Additionally, video processing engine 608 can utilize two or more additional MPEs as needed or desired to process video data. Video processing engine 608 uses local memory 610 to store a portion of the processed macroblocks of a video frame before finally storing them in remote memory 630.
Generally, processing (decoding or encoding) of video data takes place as follows. Host processor 602 selects the video format and corresponding firmware by writing to a control register of bitstream accelerator module 606. When decoding a stream of encoded video data, bitstream accelerator module 606 reads the data from remote memory 630, partially decodes it, and returns partially decoded video data to remote memory 630. Bitstream accelerator module 606 partially decodes the data by parsing and entropy decoding it. The algorithm for the parsing and entropy decoding operation varies according to the selected video format, and bitstream accelerator module 606 loads firmware that corresponds to the selected video format to control how it processes the encoded video data. To facilitate subsequent use by video processing engine 608, bitstream accelerator module 606 stores the partially decoded data in remote memory 630 in macroblock form.
Video processing engine 608 operates on macroblocks and uses video information from neighbor (i.e. horizontally and vertically adjacent) macroblocks of the current macroblock to recover the video information. In particular video processing engine 608 uses three macroblock processing engines to decode the partially decoded video data efficiently. The multiple macroblock processing engines operate simultaneously to decode adjacent rows of partially decoded video data, and use local memory 610 to temporarily store video data that will be used by one or more adjacent macroblock processing engines. They also operate on macroblocks associated with diagonally adjacent macroblock locations in adjacent rows of the video frame to take advantage of using the previously decoded macroblocks while they are present in local memory 610. By using multiple, staged macroblock processing engines and a memory hierarchy employing both high-speed local memory and slower remote memory, video processing engine 608 reduces overhead that would otherwise be associated with moving neighbor macroblock data to and from main memory. Video processing engine 608 does this by keeping decoded macroblock data at a higher level of the memory hierarchy while it is still needed.
When encoding a stream of raw video data, video processing engine 608 reads the raw video data and encodes it to form partially encoded macroblocks. Bitstream accelerator module 606 reads the partially encoded macroblocks from remote memory 630, entropy encodes them, and outputs a stream of encoded video data to memory 630.
The operation of each macroblock processing engine can be better understood with reference to FIG. 7, which is a block diagram of a process 700 used by the macroblock processing engines of FIG. 6 to perform decoding and encoding operations. Process 700 may be implemented in hardware, in software or firmware, or in some combination thereof. Process 700 includes generally a run-length re-order block 704, a coefficient predictor 706, a multiplexer 708, an inverse quantization and transform process 712, a summing device 714, an intra prediction search block 718, an intra prediction (H.264) block 720, a motion estimator 724, a motion compensator 728, a multiplexer 730, a summing device 732, a transform/quantize block 734, a run length encode/re-order block 736, and an in-loop filter 740. Run-length re-order block 704 has an input for receiving partially decoded video data from bitstream accelerator module 606 of FIG. 6, and an output. Coefficient predictor 706 has an input connected to the output of run-length re-order block 704, and an output, and it performs coefficient prediction as needed. Multiplexer 708 has a first input connected to an output of transform/quantize block 734, a second input connected to the output of coefficient predictor 706, and an output. Inverse quantization and transform process 712 has an input connected to the output of multiplexer 708, and an output. Summing device 714 has a first input connected to the output of inverse quantization and transform process 712, a second input, and an output. Intra prediction search block 718 has a first input connected to the output of summing device 714, a second input for receiving the current frame/macroblock, and an output. Intra prediction (H.264) block 720 has a first input connected to the output of summing device 714, a second input connected to the output of intra prediction search block 718, and an output. Motion estimator 724 has a first input coupled to remote memory 630 for receiving a current frame/macroblock, a second input for receiving a reference data input 726, and an output. Motion compensator 728 has a first input coupled to the output of motion estimator 724, a second input for receiving the reference data input 726, and an output. Multiplexer 730 has a first input connected to the output of intra prediction (H.264) block 720, a second input connected to the output of motion compensator 728, and an output connected to the second input of summing device 714. Summing device 732 has a first input connected to the output of multiplexer 730, a second input for receiving the current frame/macroblock, and an output. Transform/quantize block 734 has an input connected to the output of summing device 732, and an output. Run length encode/re-order block 736 has an input connected to the output of transform/quantize block 734, and an output connected to local memory 610. In-loop filter 740 has an input connected to the output of summing device 714, and an output for providing a macroblock output to local memory 610 or remote memory 630.
Each macroblock processing engine performs the decoding operation of process 700 as follows. As described above, bitstream accelerator module 606 of FIG. 6 performs entropy decoding and derives motion vectors to form partially decoded video data. MFVPE 604 uses a direct memory access (DMA) controller to transfer the partially decoded video data to the appropriate macroblock processing engine. The coefficients are run length decoded and re-ordered in block 704. Then coefficient prediction is performed as needed in block 706. Finally, the coefficients are inverse quantized and inverse transformed in block 712 to form residuals. Intra macroblock predictions are computed in block 720 using neighbor data partially fetched from the memory system. Inter macroblock predictions are computed in block 728 using reference data from neighbor macroblocks fetched from the memory system. Pixels are produced by adding the predictions, from the output of either block 720 or block 728, to the residuals coming from block 712. These pixels are then filtered (for the H.264 and VC-1 formats) in block 740 and output to the memory system. If the macroblocks will be used as reference frames, they are written to local memory 610 for use by another macroblock processing engine. Otherwise, they are written to remote memory 630.
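For illustration only, the reconstruction step (residual plus prediction, clamped to the valid pixel range) can be sketched as follows; the 16x16 luma-only layout and function names are assumptions made for the example, not the engine's actual implementation.

#include <stdint.h>

#define MB_DIM 16

static uint8_t clip_pixel(int v) {
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* out = clip(prediction + residual), one call per macroblock. */
static void reconstruct_macroblock(const int16_t residual[MB_DIM][MB_DIM],
                                   const uint8_t prediction[MB_DIM][MB_DIM],
                                   uint8_t out[MB_DIM][MB_DIM]) {
    for (int y = 0; y < MB_DIM; y++)
        for (int x = 0; x < MB_DIM; x++)
            out[y][x] = clip_pixel(prediction[y][x] + residual[y][x]);
    /* The reconstructed macroblock would then be in-loop filtered (block 740)
       and written to local memory 610 if it is needed as a reference, or to
       remote memory 630 otherwise. */
}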
Each macroblock processing engine performs the encoding operation of process 700 as follows. Current macroblock data is transferred, by use of the DMA controller, into blocks 718 and 724. Block 718 performs an intra prediction search using neighbor data. Block 718 passes the chosen mode information to block 720, in which predictions are made. Block 724 performs motion estimation using reference data, and then the chosen macroblock types are passed to block 728, which makes predictions. The difference between the prediction, either intra- or inter-macroblock, and the current macroblock is determined by summing device 732. This difference is then transformed and quantized in block 734. The quantized data is run length encoded and re-ordered in block 736, and written to the memory system in block 738. The reconstructed macroblocks are also used to form reference data. They are fed back into block 712 through multiplexer 708, added to the original prediction in adder 714, filtered in block 740, and written to the memory system to be later used as reference data.
It should be apparent that the particular decoding and encoding operations performed by the macroblock processing engines are specific to the supported video formats and could vary to support new video formats.
FIG. 8 illustrates a portion 800 of a partially processed video data frame useful in understanding how video processing engine 608 operates when using a single macroblock processing engine. Portion 800 includes processed macroblocks labeled "P" such as representative processed macroblocks 802, and unprocessed macroblocks illustrated as blank cells such as macroblocks 804. Portion 800 includes a first row of macroblocks 806 that has been previously processed, and a second row of macroblocks 808 that includes both processed and unprocessed (i.e. only partially decoded) macroblocks. Portion 800 also includes additional rows of unprocessed macroblocks 810, 812, 814, 816, 818, and 820. Since each macroblock processing engine operates left to right and top to bottom within a video data frame, it should be apparent that any rows of the complete video data frame at or above row 806 have been previously processed, and all rows at or below row 810 have not yet been processed.
Portion 800 includes a macroblock location 822 that is currently being processed by macroblock processing engine 612, labeled "MPE1," a first neighbor macroblock location 824 labeled "P-N" located adjacent to and above macroblock location 822, and a second neighbor macroblock location 826 also labeled "P-N" located adjacent to and to the left of macroblock location 822. Macroblock processing engine 612 utilizes both the partially decoded video data for the next macroblock of interest and the immediately adjacent macroblocks in the vertical and horizontal directions. Note that FIG. 8 shows the macroblocks according to their corresponding locations in the video frame, but they may be stored in memory in an arbitrary location. After macroblock processing engine 612 completes processing macroblock 822, it proceeds to a subsequent macroblock to the right along row 808, namely macroblock 828. After macroblock processing engine 612 completes processing row 808, it begins with the first macroblock at the left side of row 810, and so on until it processes all of the video data frame.
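For illustration only, if the frame's macroblocks are indexed in raster order, the left and above neighbors used by a single macroblock processing engine reduce to simple index arithmetic, as sketched below; the frame width and variable names are illustrative.

#include <stdio.h>

int main(void) {
    const int mb_cols = 120;                     /* e.g. 1920 / 16 macroblocks per row */
    int row = 2, col = 5;                        /* current macroblock location        */
    int cur   = row * mb_cols + col;             /* raster-order index                 */
    int left  = (col > 0) ? cur - 1       : -1;  /* neighbor to the left, if any       */
    int above = (row > 0) ? cur - mb_cols : -1;  /* neighbor directly above, if any    */
    printf("current %d, left %d, above %d\n", cur, left, above);
    return 0;
}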
FIG. 9 illustrates a portion 900 of a partially processed video data frame useful in understanding how video processing engine 608 uses multiple staged macroblock processing engines. Portion 900 includes processed macroblocks labeled "P", such as representative processed macroblocks 902, and unprocessed macroblocks illustrated as blank cells, such as macroblocks 904. Portion 900 includes a first row of macroblocks 906 that has been previously processed, and rows of macroblocks 908, 910, and 912 that include both processed and unprocessed (i.e., partially decoded) macroblocks. Portion 900 also includes additional rows of macroblocks 914, 916, 918, and 920. Since each macroblock processing engine operates left to right and top to bottom within a video data frame, it should be apparent that all macroblocks in rows of the complete video data frame at or above row 906 have been previously processed, and all macroblocks in rows at or below row 914 have not yet been processed.
Portion 900 also includes a macroblock location 926 that is currently being processed by macroblock processing engine 612, labeled "MPE 1," a second macroblock location 924 that is currently being processed by macroblock processing engine 614, labeled "MPE 2," and a third macroblock location 922 that is currently being processed by macroblock processing engine 616, labeled "MPE 3." Video processing engine 608 processes macroblock locations 922, 924, and 926 simultaneously, using both the partially decoded video data for each corresponding location and the macroblocks neighboring that location. In particular, macroblock processing engine 612 accesses neighbor macroblocks corresponding to locations 928 and 930, macroblock processing engine 614 accesses neighbor macroblocks corresponding to locations 930 and 932, and macroblock processing engine 616 accesses neighbor macroblocks corresponding to locations 932 and 934.
Macroblock processing engines 612, 614, and 616 are "staged" in that they simultaneously decode macroblock data corresponding to locations in the video data frame that are spaced apart at intervals. The particular interval used by video processing engine 608 is one row below and one column to the left. Thus macroblock processing engine 614 decodes a macroblock having a location in the video data frame that is one row below and one column to the left of the location currently being processed by macroblock processing engine 612. Likewise, macroblock processing engine 616 decodes a macroblock having a location in the video data frame that is one row below and one column to the left of the location being processed by macroblock processing engine 614. Moreover, the macroblock processing engines begin processing the macroblocks in their assigned rows at successively later times, and so in this respect they are also staged. In alternate embodiments that use more than three macroblock processing engines, the additional macroblock processing engines would follow the same pattern of staging.
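The staged schedule can be visualized with the small simulation below. It assumes an idealized timing of one macroblock per step per engine, which is a simplification of the hardware; engine numbering, row assignment, and step counts are illustrative only.

```cpp
#include <cstdio>

int main() {
    const int engines = 3, mb_cols = 6;   // three staged engines, a short example row length
    // Engine e handles row e and starts e steps later, so at any instant it sits
    // one row below and one column to the left of the engine above it.
    for (int step = 0; step < mb_cols + engines - 1; ++step) {
        for (int e = 0; e < engines; ++e) {
            const int col = step - e;                 // engine e lags engine e-1 by one column
            if (col < 0 || col >= mb_cols) continue;  // engine idle before its start or after its row
            std::printf("step %d: MPE%d processes (row %d, col %d)\n", step, e + 1, e, col);
        }
    }
    return 0;
}
```

With this offset, the above neighbor that an engine needs has just been completed by the engine one row up, which is what allows the recently decoded macroblocks to be served from local memory as described next.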
By using multiple staged macroblock processing engines, video processing device 650 is able to store recently decoded macroblocks in local memory 610, from which they can be accessed by another macroblock processing engine soon thereafter without having to access remote memory 630. Thus video processing device 650 saves the overhead of moving the decoded macroblock into and out of remote memory 630. Moreover, the other macroblock processing engine is able to access the recently decoded macroblock from local memory 610 faster than from remote memory 630, because local memory 610 is at a higher level of the memory hierarchy. Using multiple staged macroblock processing engines also allows video processing device 650 to meet the higher demands of HD decoding while operating at clock speeds attainable in today's integrated circuit manufacturing processes.
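The local/remote split can be sketched as a small sliding window over recently decoded macroblocks. The window size, the map-based stores, and the Macroblock type below are assumptions for illustration and do not reflect the device's actual memory layout or eviction policy.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <utility>

struct Macroblock { /* reconstructed pixels, metadata, ... */ };
using MbLocation = std::pair<int, int>;   // (row, column) in the frame

class MacroblockStore {
public:
    explicit MacroblockStore(std::size_t local_capacity) : capacity_(local_capacity) {}

    // Called when an engine finishes a macroblock.
    void commit(MbLocation loc, Macroblock mb) {
        local_[loc] = std::move(mb);
        order_.push_back(loc);
        while (order_.size() > capacity_) {                 // oldest block is no longer a near neighbor
            const MbLocation victim = order_.front();
            order_.pop_front();
            remote_[victim] = std::move(local_[victim]);    // write back to remote memory
            local_.erase(victim);
        }
    }

    // Neighbor fetch: fast path from local memory, slow path from remote memory.
    const Macroblock& fetch(MbLocation loc) const {
        const auto it = local_.find(loc);
        return it != local_.end() ? it->second : remote_.at(loc);
    }

private:
    std::size_t capacity_;
    std::deque<MbLocation> order_;               // insertion order used for eviction
    std::map<MbLocation, Macroblock> local_;     // stands in for local memory 610
    std::map<MbLocation, Macroblock> remote_;    // stands in for remote memory 630
};
```

A wider window, as discussed below for longer staging intervals, would correspond to simply constructing the store with a larger capacity.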
Once a macroblock is no longer needed soon by any of the staged macroblock processing engines, video processing engine 608 moves it to the decoded video frame area in remote memory 630. For example, macroblock 826 will not be needed again for a considerable amount of time, so video processing engine 608 stores it in remote memory 630. When macroblock processing engine 612 later needs it as a neighbor macroblock while processing macroblocks for row 816, video processing engine 608 reads it from remote memory 630.
In other embodiments, the staging intervals could be altered to fit more complex decoding schemes. For example, the first macroblock processing engine could access multiple neighbor macroblocks in the prior row. In this example, the staged macroblock processing engines would be staged at longer intervals, such as two macroblocks apart. Video processing engine 608 would use a wider window for storing macroblocks in local memory 610. The wider window would be useful when using macroblock processing engines that do not process neighbor data at precisely the same time.

While the invention has been described in the context of a preferred embodiment, various modifications will be apparent to those skilled in the art. For example, when using three macroblock processing engines to process HD video data, it may be necessary for video processing device 650 to incorporate another host processor like host processor 602 to perform additional processing tasks. Accordingly, it is intended by the appended claims to cover all modifications of the invention that fall within the true scope of the invention.

Claims

WHAT IS CLAIMED IS:
1. A video processing device comprising: a bitstream accelerator module (106) having an input for receiving a stream of encoded video data, and an output adapted to be coupled to a memory for storing partially decoded video data, wherein the bitstream accelerator module (106) partially decodes the stream of encoded video data according to a selected one of a plurality of video formats to provide the partially decoded video data; and a video processing engine (108) having an input adapted to be coupled to the memory (112) for reading the partially decoded video data, and an output for providing decoded video data.
2. The video processing device of claim 1, wherein the bitstream accelerator module (106) comprises an embedded processor (504) responsive to a first request to decode the stream of encoded video data according to a first video format using a first firmware, and to a second request to decode the stream of encoded video data according to a second video format using a second firmware.
3. The video processing device as in any of claims 1 and 2, wherein the bitstream accelerator module (106) stores the partially decoded video data as a plurality of macroblocks.
4. The video processing device as in any of claims 1, 2, and 3 wherein the video processing engine (108) is further configured to determine a process to further decode the partially decoded video data according to the selected one of the plurality of video formats.
5. The video processing device as in any of claims 1 and 2-4, wherein the bitstream accelerator module (106) and the video processing engine (108) are combined on a single integrated circuit.
6. The video processing device as in any of claims 5 and 2-4, wherein the memory is external to the single integrated circuit.
7. The video processing device as in any of claims 1-6, wherein the bitstream accelerator module (106) is configured to: parse and entropy decode a first bitstream stored within the memory to provide the partially decoded video data; and write the partially decoded video data to the memory (112).
8. The video processing device as in any of claims 1 and 2-7, wherein: in response to a first video format being selected, the bitstream accelerator module processes the stream of encoded video data according to the first video format to provide the partially decoded video data in a predetermined output format; and in response to a second video format being selected, the bitstream accelerator module (106) processes the stream of encoded video data according to the second video format to provide the partially decoded video data in the predetermined output format, the bitstream accelerator module (106) writing the partially decoded video data to the memory (112) in the predetermined output format.
9. The video processing device as in any of claims 1 and 2-8, wherein: the bitstream accelerator module (106) is further configured to store the partially decoded video data in a first order; and the video processing engine (108) is further configured to process the partially decoded video data in a second order different from the first order.
10. A method of processing video data comprising: selecting one of a plurality of video formats (208); in response to selecting a first video format, processing a stream of encoded video data according to the first video format using a bitstream accelerator module to provide partially decoded video data in a predetermined output format (210, 212); in response to selecting a second video format, processing the stream of encoded video data according to the second video format using the bitstream accelerator module to provide the partially decoded video data in the predetermined output format (210, 212); and processing the partially decoded video data to provide output video data.
11. The method of claim 10, further comprising: in response to selecting the first video format, processing the stream of encoded video data in the bitstream accelerator module using a first firmware (212); and in response to selecting the second video format, processing the stream of encoded video data in the bitstream accelerator module using a second firmware (212).
12. The method of claim 11, further comprising: parsing and entropy decoding the stream of encoded video data using the first firmware to provide the partially decoded video data (224, 226); and writing the partially decoded video data to a memory in the predetermined output format (228).
13. The method of claim 12, further comprising: parsing and entropy decoding the stream of encoded video data using the second firmware to provide the partially decoded video data (224, 226); and writing the partially decoded video data to the memory in the predetermined output format (228).
14. The method as in any of claims 11-13, further comprising: detecting a request to process the partially decoded video data stored within the memory (302); reading the partially decoded video data from the memory (304); processing the partially decoded video data to generate a decoded macroblock (306); and writing the decoded macroblock to the memory (316).
15. The method as in any of claims 10 and 11-13, further comprising: reading raw video data from a memory (404); encoding the raw video data to generate a partially encoded macroblock (406); writing the partially encoded macroblock to the memory (408); entropy encoding of the partially encoded macroblock using the bitstream accelerator module (416); and outputting an encoded bitstream to the memory (418).
16. The method as in any of claims 10 and 11-15, further comprising: detecting a neighbor macroblock of a first macroblock location (410); reading a portion of the neighbor macroblock from the memory (412); using the portion of the neighbor macroblock to generate a first macroblock corresponding to the first macroblock location (414-416); and writing the first macroblock to the memory (418).
17. A video processing system comprising: a host processor (102) operable to detect a request to process a stream of encoded video data received from a video source, wherein the stream of encoded video data is represented in a first video format; a memory operable to store the stream of encoded video data (112); a bitstream accelerator module (106) responsive to the host processor to process the stream of encoded video data according to a selected one of a plurality of different video formats to provide partially decoded video data and to store the partially decoded video data in the memory, the bitstream accelerator module operable to use one of a plurality of firmwares corresponding to the first video format; and a video processing engine (108) configured to access the memory to further process the partially decoded video data.
18. The video processing system of claim 17, wherein the bitstream accelerator module (106) is further configured to write the partially decoded video data to the memory in macroblocks.
19. The video processing system of claim 18, wherein the bitstream accelerator module (106) is coupled to and responsive to the host processor to select the one of the plurality of firmwares corresponding to the first video format.
20. The video processing system of claim 18 wherein the host processor (102), the bitstream accelerator module (106), and the video processing engine (108) are combined together on a single integrated circuit.
21. A video processing device (650) comprising: a memory (610, 630) operable to store partially decoded video data decoded from a stream of encoded video data; and a plurality of staged macroblock processing engines (612, 614, 616) coupled to the memory (610, 630) and responsive to a request to process the partially decoded video data to generate a plurality of macroblocks of decoded video data.
22. The video processing device (650) of claim 21, wherein the plurality of staged macroblock processing engines (612, 614, 616) includes: a first macroblock processing engine (612) configurable to process a first row of partially decoded video data (908); a second macroblock processing engine (614) configurable to process a second row of partially decoded video data (910), wherein the second row of partially decoded video data (910) is adjacent to the first row of partially decoded video data (908); and a third macroblock processing engine (616) operable to process a third row of partially decoded video data (912), wherein the third row of partially decoded video data (912) is adjacent to the second row of partially decoded video data (910).
23. The video processing device (650) of claim 22, wherein: the first macroblock processing engine (612) is configurable to generate a first macroblock having a first location (926) within the first row of partially decoded video data (908); the second macroblock processing engine (614) is configurable to generate a second macroblock having a second location (924) within the second row of partially decoded video data (910), wherein the second location (924) is aligned diagonally to the first location (926); and the third macroblock processing engine (616) is configurable to process a third macroblock having a third location (922) within the third row of partially decoded video data (912), wherein the third location (922) is aligned diagonally to the first location (926) and the second location (924).
24. The video processing device (650) of claim 21, wherein the plurality of staged macroblock processing engines (612, 614, 616) includes: a first macroblock processing engine (612) configurable to generate a first macroblock using a first neighbor macroblock (928) generated by a third macroblock processing engine (616); a second macroblock processing engine (614) configurable to generate a second macroblock using a second neighbor macroblock (930) generated by the first macroblock processing engine (612); and the third macroblock processing engine (616) configurable to generate a third macroblock using a third neighbor macroblock (932) generated by the second macroblock processing engine (614).
25. The video processing device (650) of claim 24, wherein the memory (610, 630) comprises a local memory (610) and a remote memory (630), and wherein: the first neighbor macroblock (928) is stored within the remote memory (630) after the first macroblock processing engine (612) uses the first neighbor macroblock (928); the second neighbor macroblock (930) is stored within the local memory (610) after the second macroblock processing engine (614) uses the second neighbor macroblock (930); and the third neighbor macroblock (932) is stored within the local memory (610) after the third macroblock processing engine (616) uses the third neighbor macroblock (932).
26. The video processing device (650) of claim 24, wherein: the first macroblock processing engine (612) initiates storing the first macroblock (926) within the local memory (610) in response to completing processing; the second macroblock processing engine (614) initiates storing the second macroblock (924) within the local memory (610) in response to completing processing; and the third macroblock processing engine (616) initiates storing the third macroblock (922) within the remote memory (630) in response to completing processing.
27. The video processing device (650) of claim 24, wherein: the first macroblock processing engine (612) is configurable to process a first row of partially decoded video data to generate a first plurality of decoded macroblocks; the second macroblock processing engine (614) is configurable to process a second row of partially decoded video data to generate a second plurality of decoded macroblocks; and the third macroblock processing engine (616) is configurable to process a third row of partially decoded video data to generate a third plurality of decoded macroblocks.
28. The video processing device (650) of claim 21, wherein the plurality of staged macroblock processing engines (612, 614, 616) comprises: a first macroblock processing engine (612) configurable to process partially decoded video data to generate a first plurality of decoded macroblocks for a first row (908) of a video frame (900); and a second macroblock processing engine (614) configurable to process partially decoded video data including the first plurality of decoded macroblocks to generate a second plurality of decoded macroblocks for a second row (910) of the video frame (900), the second macroblock processing engine (614) initiating processing of the partially decoded video data including the first plurality of decoded macroblocks a first period later than the first macroblock processing engine (612) initiates processing of the first row of partially decoded video data.
29. A method of processing video data comprising: generating a first macroblock of decoded video data having a first location (926) within a first row (908) of a video frame (900); and generating a second macroblock of decoded video data having a second location (924) within a second row (910) of the video frame (900) during the generating of the first macroblock.
30. The method of claim 29, wherein the generating of the first macroblock comprises: reading a first neighbor macroblock having a location (928) in the video frame (900) above the first location (926), the location (928) of the first neighbor macroblock in a row (906) above the first row (908) of the video frame; reading a second neighbor macroblock having a location (930) in the video frame (900) adjacent to the first location (926) within the first row (908) of the video frame (900); and generating the first macroblock using the first neighbor macroblock and the second neighbor macroblock.
31. The method of claim 30, wherein: the reading of the first neighbor macroblock comprises reading the first neighbor macroblock from a remote memory (630); and the reading of the second neighbor macroblock comprises reading the second neighbor macroblock from a local memory (610).
32. The method of claim 30, wherein the generating of the second macroblock of decoded video data includes: reading the second neighbor macroblock from a local memory (610); reading a third neighbor macroblock having a location (932) in the video frame (900) adjacent to the second location (924) from the local memory (610); and generating the second macroblock using the second neighbor macroblock and the third neighbor macroblock.
33. The method as in any of claims 29-32 further comprising: generating a third macroblock of decoded video data having a third location (922) within a third row of the video frame (900) during the generating of the first macroblock and the generating of the second macroblock.
34. The method of claim 33 further comprising: reading the third neighbor macroblock from a local memory (610); reading a fourth neighbor macroblock having a location (934) in the video frame (900) adjacent the third location (922) from the local memory (610); and generating the third macroblock using the third neighbor macroblock and the fourth neighbor macroblock.
35. The method as in any of claims 29-34 further comprising: writing the first and second macroblocks to the local memory (610); and writing the third macroblock to a remote memory (630).
36. The method as in any of claims 29-35 further comprising: generating a plurality of additional macroblocks of decoded video data in the first row (908) of the video frame (900) while generating a plurality of additional macroblocks of decoded video data in the second row (910) of the video frame (900).
37. A video processing system (600) comprising: a host processor (602) operable to detect a request to process a frame of encoded video data; a multiple format video processing engine (604) coupled to the host processor (602) and responsive thereto to process the frame of encoded video data, wherein the multiple format video processing engine (604) comprises a plurality of staged macroblock processing engines (612, 614, 616) that simultaneously processes multiple rows of partially decoded video data and previously generated macroblocks to provide a plurality of macroblocks of decoded video data; and a memory system (610, 630) coupled to the multiple format video processing engine (604) and responsive thereto to store the partially decoded video data and the previously generated macroblocks.
38. The video processing system (600) of claim 37, wherein the plurality of staged macroblock processing engines (612, 614, 616) comprises: a first macroblock processing engine (612) configurable to process a first row of partially decoded video data (908); a second macroblock processing engine (614) configurable to process a second row of partially decoded video data (910), wherein the second row of partially decoded video data (910) is adjacent to the first row of partially decoded video data (908); and a third macroblock processing engine (616) operable to process a third row of partially decoded video data (912), wherein the third row of partially decoded video data (912) is adjacent to the second row of partially decoded video data (910).
39. The video processing system (600) as in any of claims 37 and 38, wherein the memory system (610, 630) comprises: a local memory (610); and a remote memory (630), wherein the multiple format video processing engine (604) uses the remote memory (630) to store the partially decoded video data and uses the local memory (610) to store a portion of the previously generated macroblocks.
40. The video processing system (600) as in any of claims 37-39, wherein the multiple format video processing engine (604) further comprises a bitstream accelerator module (606) coupled to the host processor (602) and responsive thereto and to a stream of encoded video data to provide the multiple rows of partially decoded video data.
PCT/US2008/087084 2007-12-31 2008-12-17 System, method and device for processing macroblock video data WO2009085788A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/967,697 US8923384B2 (en) 2007-12-31 2007-12-31 System, method and device for processing macroblock video data
US11/967,690 2007-12-31
US11/967,697 2007-12-31
US11/967,690 US8462841B2 (en) 2007-12-31 2007-12-31 System, method and device to encode and decode video data having multiple video data formats

Publications (1)

Publication Number Publication Date
WO2009085788A1 true WO2009085788A1 (en) 2009-07-09

Family

ID=40824652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/087084 WO2009085788A1 (en) 2007-12-31 2008-12-17 System, method and device for processing macroblock video data

Country Status (1)

Country Link
WO (1) WO2009085788A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8462841B2 (en) 2007-12-31 2013-06-11 Netlogic Microsystems, Inc. System, method and device to encode and decode video data having multiple video data formats
CN117063468A (en) * 2021-03-30 2023-11-14 高通股份有限公司 Video processing using multiple bit stream engines

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040028141A1 (en) * 1999-11-09 2004-02-12 Vivian Hsiun Video decoding system having a programmable variable-length decoder
US20070201559A1 (en) * 2006-02-24 2007-08-30 Freescale Semiconductor Inc. Flexible macroblock odering with reduced data traffic and power consumption
US20070253491A1 (en) * 2006-04-27 2007-11-01 Yoshiyuki Ito Image data processing apparatus, image data processing method, program for image data processing method, and recording medium recording program for image data processing method

Similar Documents

Publication Publication Date Title
US8462841B2 (en) System, method and device to encode and decode video data having multiple video data formats
EP1775961B1 (en) Video decoding device and method for motion compensation with sequential transfer of reference pictures
US8923384B2 (en) System, method and device for processing macroblock video data
US10230991B2 (en) Signal-processing apparatus including a second processor that, after receiving an instruction from a first processor, independently controls a second data processing unit without further instruction from the first processor
US9392292B2 (en) Parallel encoding of bypass binary symbols in CABAC encoder
JP2012508485A (en) Software video transcoder with GPU acceleration
JP2006197587A (en) System and method for decoding dual video
US20060133512A1 (en) Video decoder and associated methods of operation
US20170019679A1 (en) Hybrid video decoding apparatus for performing hardware entropy decoding and subsequent software decoding and associated hybrid video decoding method
US20070014367A1 (en) Extensible architecture for multi-standard variable length decoding
JP2004356851A (en) Compression apparatus for moving picture and imaging apparatus employing the same
US10728557B2 (en) Embedded codec circuitry for sub-block based entropy coding of quantized-transformed residual levels
US8406306B2 (en) Image decoding apparatus and image decoding method
US10798419B2 (en) Embedded codec circuitry for sub-block based encoding of quantized prediction residual levels
WO2009085788A1 (en) System, method and device for processing macroblock video data
JP5182285B2 (en) Decoding method and decoding apparatus
JP2002112268A (en) Compressed image data decoding apparatus
US20100278237A1 (en) Data processing circuit and processing method with multi-format image coding and decoding function
US7336834B2 (en) Image processing system
JP2010135885A (en) Image coding apparatus and method
US20030123555A1 (en) Video decoding system and memory interface apparatus
US20060129729A1 (en) Local bus architecture for video codec
JP2010074705A (en) Transcoding apparatus
KR100247977B1 (en) Video decoder having an extensible memory
JP2010268094A (en) Image decoder and image decoding method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08867755

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08867755

Country of ref document: EP

Kind code of ref document: A1