WO2008037113A1 - Apparatus and method for processing video data - Google Patents

Apparatus and method for processing video data

Info

Publication number
WO2008037113A1
WO2008037113A1 PCT/CN2006/002518
Authority
WO
WIPO (PCT)
Prior art keywords
unit
data
module
instruction
filter
Prior art date
Application number
PCT/CN2006/002518
Other languages
French (fr)
Inventor
Shilin Wang
Huaping Liu
Zesheng Yuan
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Priority to PCT/CN2006/002518 priority Critical patent/WO2008037113A1/en
Priority to CN200680055930.0A priority patent/CN101513067B/en
Publication of WO2008037113A1 publication Critical patent/WO2008037113A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44004Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/438Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving encoded video stream packets from an IP network
    • H04N21/4382Demodulation or channel decoding, e.g. QPSK demodulation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/79Processing of colour television signals in connection with recording
    • H04N9/80Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/804Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components

Definitions

  • This invention relates to an apparatus and a method for processing video data.
  • the processing can be performed in the context of decoding video data.
  • The decoding procedure mainly includes four stages: entropy or bit-stream decoding, inverse transformation and inverse quantization, motion compensation, and de-blocking filtering (except for MPEG2).
  • For supporting high-resolution HD video, a high-performance decoding process is required.
  • All current video standards use macroblocks (MBs), particularly MBs of 16x16 pixels, as the luma processing unit.
  • The MB can be divided into sixteen sub-blocks of 4x4 pixels.
  • The corresponding colour or chroma data unit (Cb and Cr) is the 8x8 pixel block, which can be divided into sixteen 2x2 pixel blocks.
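The block-partitioning arithmetic above can be checked with a short sketch; `sub_block_count` is an invented helper, not part of the patent:

```python
# A 16x16 luma MB splits into sixteen 4x4 sub-blocks, and each 8x8
# chroma block (Cb or Cr) splits into sixteen 2x2 blocks.
def sub_block_count(block_size, sub_size):
    """Number of sub_size x sub_size blocks in a block_size x block_size block."""
    return (block_size // sub_size) ** 2

assert sub_block_count(16, 4) == 16   # luma: sixteen 4x4 sub-blocks per MB
assert sub_block_count(8, 2) == 16    # chroma: sixteen 2x2 blocks per 8x8 unit
```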
  • Conventionally, the MBs are processed one by one, i.e. processing of a new MB begins after the previous MB is finished, and each processing block handles one MB at a time.
  • Entropy decoding E for a MB comprises decoding the non-residual syntax elements 10a and decoding the residual syntax elements 10b.
  • Then, inverse transformation and inverse quantization ITIQ are performed 10c.
  • In the next step, motion compensation MC, the prediction data are computed 10d and the picture data are reconstructed 10e.
  • The single blocks work simultaneously, but all on the same MB. Each block starts working when it has enough input data from the previous block.
  • The duration of the process per MB is the cycle number c10 from decoding the first MB-level syntax to getting the reconstructed data for the last sub-block.
  • The same steps 11a-11e are performed for the next MB, wherein the first step of decoding 11a is executed after the last step of reconstructing the current MB 10e is finished.
  • the present invention provides a universal, modular and decentralized processing flow that enables high performance processing of video data according to a plurality of encoding standards. Moreover, the single function blocks can be used for a plurality of coding formats and standards.
  • each of the different video standards has its special features.
  • the proposed architecture uses a combination of hardware and firmware (i.e. software that is not modified during normal operation and that is adapted to interact with particular hardware) to meet the requirements of different applications.
  • the firmware implements the different video standard algorithms, while the hardware provides a modular platform that is adapted for the implementation. That means, it is possible to add some firmware code to support a particular video standard, and it is possible to remove some firmware code to make it not support a particular video standard. Thus, it is possible to adapt the decoder later to new standards.
  • the interface between hardware and firmware is the instruction set.
  • The hardware architecture comprises elements of a conventional RISC processor and re-programmable video processing function blocks, which are embedded into the structure of the RISC processor. That means e.g. that the video processing function blocks use the same channels for inter-block communication as the conventional RISC processing blocks, such as arithmetic-logic unit (ALU), fetch unit, queue unit etc.
  • the video decoding function blocks are sub-units within a specialized RISC processor.
  • RISC is a processor design philosophy that uses a simple set of instructions, such that a sequence of these simple instructions takes about the same amount of time to execute as a corresponding, more complex instruction on a complex instruction set computer (CISC).
  • the single function blocks of the architecture can be re-programmed to comply with new formats and standards.
  • The multi-standard decoder adaptable for all current video standards uses 4x4 pixel blocks for luma and 2x2 pixel blocks for chroma (Cb and Cr) as the minimum processing unit. Although blocks of this size are not employed in some video standards, it is possible to support the minimum processing unit also for those video standards, including MPEG2.
  • the function blocks are controlled in a decentralized manner.
  • A device for decoding video data comprises at least: means for providing decoded instructions; a queuing unit for receiving the decoded instructions and result data, and for providing instructions on an instruction bus; an arithmetic-logic unit (ALU) and a data cache unit, receiving instructions through the instruction bus and providing data to the queuing unit; a motion compensation unit; an ITIQ unit for performing inverse transformation (namely inverse DCT) and inverse quantization; an entropy decoding unit; and a filter unit, wherein the motion compensation unit, the ITIQ unit, the entropy decoding unit and the filter unit receive instructions through the instruction bus and provide data to the queuing unit.
  • Fig.1 a conventional video data processing flow
  • Fig.2 a pipelined video data processing flow
  • Fig.3 the stages of pipelined instruction execution
  • Fig.4 the position of macroblocks within a picture
  • Fig.5 an architecture comprising video processing modules embedded in a RISC processor
  • Fig.6 details of the motion compensation module.
  • the present invention uses a dedicated architecture and a corresponding instruction set.
  • the instruction set can be divided into two parts, namely the general instructions similar to the conventional RISC (reduced instruction set computer) instructions, and the specialized instructions dedicated to video decoding.
  • the general instructions are mainly used for controlling the decoding procedure, and the specialized instructions are mainly used for processing the computation during the decoding procedure.
  • the instructions are 32 bit wide.
  • the video data to be processed and the instructions are stored in SDRAMs.
  • the architecture according to the invention uses a pipeline for instruction processing. As shown in Fig.3, any instruction execution can be divided into the following five stages:
  • Fetch: fetch the instruction from the SDRAM
  • Decode: translate the instruction's format into the internal format
  • Issue: send the instruction to the related function module
  • Execute: perform the operation in the function module
  • Write back: return the result to the registers
  • A first instruction i1 starts with being fetched.
  • In the next phase c2, it is translated into the internal format, while the next instruction i2 is being fetched.
  • The fetched first instruction i1 is stored in a pipeline.
  • In the next phase c3, while the two previous instructions i1, i2 are in the pipeline, a new instruction i3 starts.
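The overlap of instructions in the pipeline can be sketched as follows. This is an illustrative model only; the stage names beyond Fetch and Decode are assumptions inferred from the five hardware parts named later (fetch, decode, issue, execute, result return):

```python
# Ideal five-stage pipeline: instruction i enters stage s in cycle i+s+1,
# so several instructions are in flight in the same cycle.
STAGES = ["fetch", "decode", "issue", "execute", "writeback"]

def pipeline_schedule(num_instructions):
    """Return {cycle: [(instruction, stage), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1
            schedule.setdefault(cycle, []).append((f"i{i + 1}", stage))
    return schedule

# In phase c2, i1 is being decoded while i2 is being fetched:
assert pipeline_schedule(3)[2] == [("i1", "decode"), ("i2", "fetch")]
```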
  • Fig.2 shows a generalized pipelined video data processing flow according to one aspect of the invention.
  • the currently processed pixel data are copied into a pixel buffer for faster access.
  • Input data are processed in an entropy decoding stage E by first decoding the non-residual data 20a and then decoding the residual data 20b, for which the decoded non-residual data are required. While decoded data are output of the residual data decoding procedure 20b, they are successively passed (through the queuing unit, not shown here) to the next step 20c of inverse transformation and inverse quantization ITIQ.
  • the entropy decoding stage E waits for a certain time after it has processed its data 20b and before it starts processing new data 21a, to prevent buffer overflow due to slower units, e.g. motion compensation MC.
  • At least the specialized video function modules can hold two or more MBs to be processed in parallel. If only two MBs in parallel are supported, the buffer for storing MVs and residual data in the related modules stores the MVs and residual data for the two MBs. Simultaneous processing of three or more MBs can be supported if additional buffer space is available within the modules.
  • the hardware architecture can include five parts: an instruction fetch part, an instruction decoding part, an instruction issuing part, an instruction execution part and a result return part.
  • the architecture is shown in Fig.5.
  • the instruction fetch part includes an instruction cache interface module 51, an instruction cache module 52 and the actual fetch module 53 including a program counter PC.
  • the instruction decoding part includes the decoding module 54, and the instruction issuing part includes a queue module 55.
  • the instruction execution part includes a data cache module 57, a data cache interface module 58, an ALU module 59, a motion compensation module 510, a motion compensation interface module 511, an Inverse Transform/Inverse Quantization (ITIQ) module 512, an entropy decoder module 513, an entropy decoder interface module 514, a de-blocking filter module 515, a filter interface module 516 and a result arbiter module 56.
  • The result arbiter module 56 sends intermediate results, i.e. results that require further processing, via the intermediate result bus to the queue module.
  • The input data come from an SDRAM via the "visiting SDRAM bus", and the final results are returned to the same SDRAM using the same bus. Alternatively, a separate bus could be used for the returning data.
  • the result return part includes a visiting bus arbiter module 517.
  • The instruction cache module 52 is mainly responsible for providing the instructions in this architecture. Through it, the instructions can be accessed faster than directly through the external SDRAM, since it stores instructions in an internal SRAM. The next instruction is determined by a program counter PC within the fetch module 53. If the access hits, i.e. if the determined instruction is cached in the SRAM of the instruction cache 52, the instruction cache module 52 sends the instruction data back. If the access misses, which means that the desired instruction does not exist in the SRAM of the instruction cache, then a command for getting the corresponding instruction from the SDRAM is issued to the instruction cache interface module 51. After the instruction is acquired from the instruction cache interface module 51, the instruction data are provided to the instruction cache module 52.
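The hit/miss behaviour described for the instruction cache can be sketched as below. The class and names are invented for illustration; the real module is hardware, and the dictionary stands in for the interface module 51 and the external SDRAM:

```python
# Minimal model of the instruction cache: serve hits from internal SRAM,
# fill misses from the backing SDRAM and keep a copy for later accesses.
class InstructionCache:
    def __init__(self, backing_sdram):
        self.sram = {}                 # internal SRAM: address -> instruction
        self.sdram = backing_sdram     # stands in for interface module 51 + SDRAM

    def fetch(self, pc):
        if pc in self.sram:            # access hits: instruction is cached
            return self.sram[pc]
        instruction = self.sdram[pc]   # access misses: get it from the SDRAM
        self.sram[pc] = instruction    # cache it in the SRAM
        return instruction

cache = InstructionCache({0: "ADD", 4: "LOAD"})
assert cache.fetch(0) == "ADD"   # miss, fetched from SDRAM
assert cache.fetch(0) == "ADD"   # hit, served from SRAM
```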
  • the fetch module 53 is responsible for determining the PC value according to the procedure of the program execution.
  • The PC value is sent to the instruction cache module 52. If a jump or branch instruction is met, the PC value in the fetch module 53 is changed accordingly; otherwise, it is automatically increased by a defined increment.
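The program-counter update rule can be sketched as follows. The increment of 4 bytes is an assumption based on the 32-bit instruction width stated earlier; the instruction encoding is invented for illustration:

```python
# PC update in the fetch module: taken jumps/branches redirect the PC,
# everything else advances it by a fixed increment.
INCREMENT = 4  # assumed: 32-bit (4-byte) instructions

def next_pc(pc, instruction, branch_taken=False):
    if instruction["op"] in ("jump", "branch") and branch_taken:
        return instruction["target"]   # PC is changed accordingly
    return pc + INCREMENT              # automatic increase

assert next_pc(100, {"op": "add"}) == 104
assert next_pc(100, {"op": "jump", "target": 400}, branch_taken=True) == 400
```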
  • the decode module 54 decodes the instruction, i.e. it transfers the external format into an internal instruction format.
  • the external format depends on the firmware, while the internal format is used by the function module that will receive the instruction.
  • After being decoded into the internal format by the decode module 54, the instructions are sent to the queue module 55, where they are stored, in principle in a FIFO (first-in-first-out) manner, in an operation queue 550, waiting to be issued to the function modules.
  • The queue module 55 further comprises general registers 551 and specialized registers 552. When the function module corresponding to the first instruction in the queue is not busy, and all of the related source registers' values for this instruction are prepared, the instruction is put on the issue bus IB, along with the data read from the general registers 551 and the specialized registers 552. Some instructions on the issue bus IB, however, may require no further data to be provided.
  • the general registers 551 provide data on a general data bus GDB, which is e.g. 32 bit wide, and the specialized data registers provide data on a special data bus SDB, which is e.g. 128 bit wide.
  • every function module monitors the common issue bus IB and accepts instructions that are directed to it. Instructions can be conventional RISC processor instructions and can be addressed as in conventional RISC processors, e.g. by an address portion within the instruction. After execution in the respective functional module, the result is sent back via an intermediate result bus IRB to the queue module 55, and the queue module updates its destination registers.
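The issue rule of the queue module can be sketched as below. This is an illustrative model under assumed data structures; a real implementation would be hardware scoreboarding, not Python:

```python
# The head instruction is issued only when its target function module is
# idle and all of its source registers hold prepared values.
from collections import deque

def try_issue(queue, module_busy, register_ready):
    """Pop and return the head instruction if it can be issued, else None."""
    if not queue:
        return None
    instr = queue[0]
    if module_busy[instr["module"]]:                       # module busy: wait
        return None
    if not all(register_ready[r] for r in instr["sources"]):
        return None                                        # operands not ready
    return queue.popleft()                                 # issued on the bus IB

q = deque([{"module": "MC", "sources": ["r1", "r2"]}])
busy = {"MC": False}
ready = {"r1": True, "r2": True}
assert try_issue(q, busy, ready)["module"] == "MC"
```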
  • the queue module 55 can in a way be regarded as the control centre of the architecture. Though the processing is more decentralized than in conventional video decoding systems, the queue module controls the instruction flow.
  • The RISC processor elements that control the decoding process, e.g. the queue, are directly involved in the decoding process, so that only little communication between modules is necessary for the assignment of new data and instructions to the function modules.
  • The data cache module 57 contains an SRAM to enable faster access to the picture data than directly through the external SDRAM. This module is mainly responsible for performing data load and store operations. When it captures from the issue bus IB an instruction for accessing the data cache, it calculates the access address according to the data of the instruction. For each data access, it first checks whether the data exist in its SRAM. If the access of a store operation hits, the data in the SRAM of the data cache module 57 are updated. If the access of a load operation hits, the data are read and sent to the intermediate result bus IRB.
  • The entropy module 513 is the start point of the decoding procedure, obtaining all the elements for reconstructing the pictures from the encoded bit-stream. It decodes from the bit-stream the syntax elements according to the utilized video standard, including e.g. differential motion vector (mvd), reference index, residual data etc. This module performs various computations, including motion vector computation according to the mvd, computing the intra mode according to pred_mode_flag and intra_luma_pred_mode, and computing the neighbour information for decoding the syntax elements.
  • the entropy module may automatically read the bit-stream to be decoded from an external SDRAM according to an address, which the programmer can set in the instruction.
  • the entropy module 513 works together with the entropy interface module 514 to obtain the bit-stream from the SDRAM. If the entropy module is idle because it has currently no bit-stream data to process, it may send a request for data to the entropy interface module 514.
  • the entropy interface module either sends back the required data to the entropy module 513, or if it has no data to provide then it may send a request for data to the SDRAM.
  • the motion compensation (MC) module 510 includes two parts or sub-modules (not shown in Fig.5): intra MC for intra prediction and inter MC for inter prediction.
  • For intra prediction, the prediction mode and residual data, which the entropy module decoded from the compressed bit-stream before, are sent to the intra MC sub-module.
  • the intra MC sub-module is invoked by an instruction, calculates the prediction of a current 4x4 block, adds the prediction and residual data and thus gets the motion compensated (i.e. reconstructed) data for the block.
  • the inter MC sub-module performs the inter motion compensation.
  • This part needs to find the appropriate integer samples based on the motion vectors and reference index (refidx) of a sub-block (4x4 block for luma, 2x2 block for Cb and Cr chroma). Then the fractional prediction samples are derived through interpolation.
  • the MC interface module 511 provides the reference data for inter prediction in the inter MC sub-module. If currently required reference data for the inter MC sub-module are not available in the buffers of the MC interface module 511, the MC interface module 511 sends a request to the SDRAM to obtain those data. After the required data were returned to the MC interface module, they are stored in buffers and sent to the MC module 510.
  • The Inverse Transform and Inverse Quantization (ITIQ) module 512 is responsible for inverse scanning, inverse transformation and inverse quantization operations on 4x4 pixel sub-blocks of the residual data. It returns its result via the intermediate result bus IRB into the queue module 55.
  • the data that are required by the ITIQ module are provided by the respective instruction.
  • The filter module 515 is applied to every decoded macroblock (MB) for reducing blocking distortion.
  • the filter smoothes block edges, thus improving the appearance of the decoded frames.
  • The filter module 515 can deal with the filter process of a MB (not mbaff) or a MB pair (mbaff). It receives through the instructions the required data of a current MB for filtering, such as MVs, "non-zero" information, frame or field flag, the pixel data etc. For the mbaff mode case, it reads those data of the other MB from a filter interface module 516.
  • The filter interface module 516 is for storing and providing the neighbour MVs and the pixel data of the neighbouring 4x4 sub-blocks, and for storing the loop-filtered and finally processed data into the SDRAM. If the neighbour information and filtered data were stored into the SDRAM directly, the process would be very slow. Therefore these data are stored into a buffer within the filter interface module 516, and then stored in the SDRAM using a burst write function, e.g. when the buffer is full. Thus the SDRAM efficiency is improved significantly. A burst read operation from the SDRAM to the interface module can in principle also be used.
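The burst-write idea can be sketched as below; the buffer size and class names are illustrative, and a Python list stands in for the external SDRAM:

```python
# Filtered data accumulate in an internal buffer and are written to the
# SDRAM in one burst when the buffer is full, instead of one word at a time.
BURST_SIZE = 8  # illustrative buffer depth

class FilterInterface:
    def __init__(self, sdram):
        self.buffer = []
        self.sdram = sdram              # list standing in for external SDRAM

    def store(self, data):
        self.buffer.append(data)
        if len(self.buffer) >= BURST_SIZE:
            self.flush()

    def flush(self):
        self.sdram.extend(self.buffer)  # one burst write instead of many
        self.buffer.clear()

sdram = []
iface = FilterInterface(sdram)
for word in range(10):
    iface.store(word)
assert sdram == list(range(8))          # one full burst written so far
assert iface.buffer == [8, 9]           # remainder still buffered
```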
  • Several function modules return intermediate results, which require further processing and which are sent back to the queue module.
  • these modules are an arithmetic-logic unit (ALU) 59, the data cache 57, the entropy module 513, the ITIQ 512, and the MC block 510.
  • Since the queue module can accept only one result at a time, a result bus arbiter module 56 selects one result at a time and transfers it via the intermediate result bus IRB to the queue module 55.
  • the result bus arbiter module may have internal buffers to store the results received from the function blocks while waiting for the intermediate result bus IRB.
  • the visiting bus arbiter module 517 selects one bus request to be active at a time, according to predefined priorities for the different interface modules.
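The fixed-priority selection can be sketched as below. The priority order and module names are invented for illustration; the patent only states that predefined priorities exist:

```python
# Visiting bus arbiter: among pending SDRAM requests, the interface
# module with the highest predefined priority wins the bus.
PRIORITY = ["entropy_if", "mc_if", "filter_if", "icache_if", "dcache_if"]

def grant(pending_requests):
    """Return the highest-priority requester, or None if none is pending."""
    for module in PRIORITY:            # fixed, predefined priority order
        if module in pending_requests:
            return module
    return None

assert grant({"filter_if", "mc_if"}) == "mc_if"
```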
  • the fetch module 53 fetches an instruction from the instruction cache module 52 according to the program counter in the fetch module.
  • the instruction is sent via the instruction decoder module 54 to the instruction queue module 55.
  • the instructions in the instruction queue module 55 are issued to the related function module according to the respectively required operation.
  • the function module performs its processing according to the instruction.
  • The operation result is returned via the intermediate result bus IRB to the register file.
  • The function modules may send request signals to their respectively related interface module if required data are missing.
  • the video decoding specific function modules such as entropy decoder 513, ITIQ 512, motion compensation 510 and de-blocking filter 515, can be configured depending on a particular application to perform the actual operation required for decoding the respective coding format.
  • the configuration can be based on firmware or software.
  • the motion compensation block can perform certain operations for decoding according to MPEG-4 Video standard, and other operations according to the AVC standard.
  • The decoding procedure is controlled by the program, which always uses the same, defined instruction set. Further, the SDRAM storage space is shared by the program, the input bit-stream, the output decoding result and temporary data created during program execution. Before the decoding, some parts of the bit-stream are automatically put into the SDRAM by the related hardware.
  • New parts of the bit-stream are successively stored in the SDRAM little by little automatically. During the decoding procedure, the decoder uses the bit-stream little by little. At the same time, the reconstructed data, being the picture data computed by the decoding architecture, are stored into the SDRAM. The different stages of the processing however use separate areas of the SDRAM.
  • the entropy module 513 and the entropy interface module 514 automatically read the bit-stream from a fixed SDRAM space according to the corresponding address in the entropy interface module 514.
  • the address is increased by hardware, wherein the address will continue at the minimum address after the maximum address of the bit-stream address space in the SDRAM is reached.
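The wrap-around addressing can be sketched as follows; the window bounds and step are illustrative values, not taken from the patent:

```python
# The bit-stream read address advances through a fixed SDRAM window and
# continues at the minimum address once the maximum address is passed.
MIN_ADDR = 0x1000  # illustrative window bounds
MAX_ADDR = 0x2000

def advance(addr, step):
    addr += step
    if addr >= MAX_ADDR:               # past the end of the window:
        addr = MIN_ADDR + (addr - MAX_ADDR)  # continue at the minimum address
    return addr

assert advance(0x1FF8, 0x10) == 0x1008
```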
  • the de-blocking filter module 515 and the filter interface module 516 store the decoded result into a fixed SDRAM space automatically according to a corresponding address that is provided by the program.
  • The decoding procedure controlled by the firmware can be divided into three steps. The first step is to decode the parameters on picture or slice level: if those parameters of the picture or slice level (such as QP, weighted prediction parameters, picture size, slice type etc.) are useful for decoding the other syntax elements, they will be stored in so-called global registers. The global registers are connected with the function modules, and control the instruction execution. The second step is to decode the syntax elements on MB level: these elements are decoded one by one. Like the picture or slice level parameters, these elements (such as macroblock type, frame or field flag) are stored into the global registers if they will control the other function modules.
  • This architecture allows that the entropy module, ITIQ module, MC module and filter module are working in parallel on different MBs.
  • The third step is post-decoding: after decoding all the elements of a macroblock, the firmware computes the next MB position in the whole picture. For the last MB of the picture, the firmware will do the DPB (decoded picture buffer) management, and then continue with decoding the next picture.
  • the basic processing unit is the MB.
  • the position of a MB in the whole picture is defined as shown in Fig.4, assuming the size of the picture is M*N.
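Computing the next MB position from a raster-scan index is simple arithmetic; the sketch below assumes raster-scan order and a picture width that is a multiple of 16, which the patent does not spell out:

```python
# Position of a macroblock (in pixels) within an M*N picture, given its
# raster-scan index. MB_SIZE matches the 16x16 luma macroblock.
MB_SIZE = 16

def mb_position(mb_index, picture_width):
    mbs_per_row = picture_width // MB_SIZE
    mb_x = (mb_index % mbs_per_row) * MB_SIZE   # horizontal pixel offset
    mb_y = (mb_index // mbs_per_row) * MB_SIZE  # vertical pixel offset
    return mb_x, mb_y

# A 1920-pixel-wide picture has 120 MBs per row, so MB 121 starts row two:
assert mb_position(121, 1920) == (16, 16)
```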
  • the execution within each of the functional modules is similar. Taking the MC module 510 as an example, the execution can be divided into the following steps, cf. Fig.5.
  • the instruction (in internal format, i.e. decoded) which was received in the queue module 55 from the decode module, and if necessary queued in the operation queue 550, is sent into the motion compensation module 510.
  • the instruction brings some data along, e.g. motion vectors and/or the residual data. These data may be stored in an internal buffer MCBUF.
  • the MC module 510 begins to execute the instruction. During execution, if the required data are available within the internal buffer MCBUF of the MC module 510 (e.g. from the previous MB) , those data will be used immediately. If the reference data are missing, the MC module sends a request signal to the MC interface module 511. If the MC interface module 511 finds those data in its internal buffer, then it returns these data to the MC module 510. Otherwise, the MC interface module 511 sends a request to the visiting bus arbiter module 517 which connects to the external SDRAM. The visiting bus arbiter 517 gets requests from all the interface modules, and selects one to visit the SDRAM and get the data.
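The data-lookup cascade in this step (module buffer, then interface buffer, then SDRAM) can be sketched as below; dictionaries stand in for the hardware buffers and the function name is invented:

```python
# Reference-data lookup: check the MC module's own buffer MCBUF first,
# then the MC interface module's buffer, and only then fetch from the
# SDRAM (which in hardware goes through the visiting bus arbiter 517).
def get_reference_data(addr, mcbuf, iface_buf, sdram):
    if addr in mcbuf:                  # already held, e.g. from a previous MB
        return mcbuf[addr], "MCBUF"
    if addr in iface_buf:              # hit in the MC interface module 511
        return iface_buf[addr], "interface"
    data = sdram[addr]                 # fetched from the external SDRAM
    iface_buf[addr] = data             # keep a copy for later requests
    return data, "SDRAM"

data, source = get_reference_data(7, {}, {}, {7: "pixels"})
assert (data, source) == ("pixels", "SDRAM")
```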
  • the motion compensation result is sent to the result arbiter module 56, which gets all the results from the function modules and selects one after the other for returning to the queue module 55.
  • the result data after execution are written back to the registers 551,552 in the queue module 55, and the value in the registers 551,552 of the queue module 55 is updated.
  • An advantage of the invention is that the idle time of processing blocks is reduced. This leads to an improved efficiency, namely either less power consumption with a similar performance, or increased performance with comparable power consumption.
  • An improved device for decoding video data comprises common elements of a RISC processor, including instruction providing unit, queuing unit and ALU, and special video processing modules, wherein the video processing modules are embedded in the RISC processor, so that they also receive instructions through the instruction bus and provide data to the queuing unit like the common RISC processor elements.
  • The special video processing modules include a MC unit, means for performing IDCT and inverse quantization, an entropy decoding unit and a filter unit.
  • The invention is advantageous for video decoding products, particularly for HD-resolution decoders implemented in a modular fashion, both in hardware and software, such as e.g. multi-standard decoders for H.264/AVC, VC-1, MPEG-2 etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Video decoding includes very similar processing steps for different standards. The processing can work independently and in parallel in separate modules. Known multi-standard video decoders suffer from bottlenecks resulting from centrally organized processing. An improved apparatus for decoding video data comprises common elements of a RISC processor, including instruction providing unit (51,52,53, 54), queuing unit (55) and ALU (59), and special video processing modules, wherein the video processing modules are embedded in the RISC processor, so that they also receive instructions through the instruction bus (IB) and provide (IRB) data to the queuing unit (55), like the common RISC processor elements. The special video processing modules include a motion compensation unit (510), means (512) for performing IDCT and inverse quantization, an entropy decoding unit (513) and a filter unit (515).

Description

Apparatus and method for processing video data
Field of the invention
This invention relates to an apparatus and a method for processing video data. In particular, the processing can be performed in the context of decoding video data.
Background
For today's video standards, e.g. MPEG2, AVS, VC-1 and H.264, the decoding procedure mainly includes four stages: entropy or bit-stream decoding, inverse transformation and inverse quantization, motion compensation, and de-blocking filtering (except for MPEG2). For supporting high-resolution HD video, a high-performance decoding process is required. All current video standards use macroblocks (MBs), particularly MBs of 16x16 pixels, as the luma processing unit. The MB can be divided into sixteen sub-blocks of 4x4 pixels. The corresponding colour or chroma data unit (Cb and Cr) is the 8x8 pixel block, which can be divided into sixteen 2x2 pixel blocks.
It is desirable to have a decoder chip that can process all current standards. The traditional approach is to put the individual decoding cores into one chip. However, the gate count of that chip will be high: though function blocks for different standards are similar, the processing details differ. Therefore function blocks for different standards are usually implemented in parallel. Further, programmable architectures exist in which the actual video processing is performed by software programs or in which the function blocks for video processing are controlled by separate processing cores. This requires a high amount of control information between the function blocks and the processing cores, usually on shared data buses.
Conventionally, the MBs are processed one by one, i.e. processing of a new MB begins after the previous MB is finished, and each processing block handles one MB at a time. This is depicted in Fig.1. Entropy decoding E for a MB comprises decoding the non-residual 10a and decoding the residual syntax element 10b. Then, inverse transformation and inverse quantization ITIQ are performed 10c. In the next step motion compensation MC, the prediction data are computed 10d and the picture data are reconstructed 10e. The single blocks work simultaneously, but all on the same MB. Each block starts working when it has enough input data from the previous block. The duration of the process per MB is the cycle number c10 from decoding the first MB level syntax to getting the reconstructed data for the last sub-block. The same steps 11a-11e are performed for the next MB, wherein the first step of decoding 11a is executed after the last step of reconstructing the current MB 10e is finished.
Summary of the Invention
In order to reduce the gate count of a multi-standard video decoding chip, a uniform architecture is desirable that can support the decoding of several video standards. Further, known video processing systems suffer from the bottlenecks that result from centrally organized processing stages with shared data busses, shared memories and centralized control units that reduce the processing performance. The present invention provides a universal, modular and decentralized processing flow that enables high performance processing of video data according to a plurality of encoding standards. Moreover, the single function blocks can be used for a plurality of coding formats and standards.
Each of the different video standards has its special features. In order to support all of the video standards, the proposed architecture uses a combination of hardware and firmware (i.e. software that is not modified during normal operation and that is adapted to interact with particular hardware) to meet the requirements of different applications. The firmware implements the different video standard algorithms, while the hardware provides a modular platform that is adapted for the implementation. That means, it is possible to add some firmware code to support a particular video standard, and it is possible to remove some firmware code to make it not support a particular video standard. Thus, it is possible to adapt the decoder later to new standards. The interface between hardware and firmware is the instruction set.
According to one aspect of the invention, the hardware architecture comprises elements of a conventional RISC processor and re-programmable video processing function blocks, which are embedded into the structure of the RISC processor. That means e.g. that the video processing function blocks use the same channels for inter-block communication as the conventional RISC processing blocks, such as arithmetic-logic unit (ALU), fetch unit, queue unit etc. In principle, the video decoding function blocks are sub-units within a specialized RISC processor. RISC is a processor design philosophy that uses a simple set of instructions which, however, takes about the same amount of time to execute as a corresponding, more complex set of instructions on a complex instruction set computer (CISC).
In one embodiment of the invention, the single function blocks of the architecture can be re-programmed to comply with new formats and standards.
According to one aspect of the invention the multi-standard decoder adaptable for all current video standards uses 4x4 pixel blocks for luma and 2x2 pixel blocks for chroma (Cb and Cr) as the minimum processing unit. Although blocks of this size are not employed in some video standards, it is possible to support the minimum processing unit also for those video standards, including MPEG-2.
According to one aspect of the invention, the function blocks are controlled in a decentralized manner.
According to one aspect of the invention, a device for decoding video data comprises at least means for providing decoded instructions, a queuing unit for receiving the decoded instructions and receiving result data, and for providing instructions on an instruction bus, an arithmetic-logic unit (ALU) and a data cache unit receiving instructions through the instruction bus and providing data to the queuing unit, a motion compensation unit, an ITIQ unit for performing inverse transformation (namely inverse DCT) and inverse quantization, an entropy decoding unit, and a filter unit, wherein the motion compensation unit, the ITIQ unit, the entropy decoding unit and the filter unit receive instructions through the instruction bus and provide data to the queuing unit. Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.
Brief description of the drawings
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in
Fig.1 a conventional video data processing flow;
Fig.2 a pipelined video data processing flow;
Fig.3 pipeline stages of instruction execution;
Fig.4 the position of macroblocks within a picture;
Fig.5 an architecture comprising video processing modules embedded in a RISC processor; and
Fig.6 details of the motion compensation module.
Detailed description of the invention
The present invention uses a dedicated architecture and a corresponding instruction set. The instruction set can be divided into two parts, namely the general instructions similar to the conventional RISC (reduced instruction set computer) instructions, and the specialized instructions dedicated to video decoding. The general instructions are mainly used for controlling the decoding procedure, and the specialized instructions are mainly used for processing the computation during the decoding procedure. Exemplarily, the instructions are 32 bit wide.
The video data to be processed and the instructions are stored in SDRAMs. The architecture according to the invention uses a pipeline for instruction processing. As shown in Fig.3, any instruction execution can be divided into the following five stages:
Fetch: fetch the instruction from the SDRAM;
Decode: translate the instruction's format into the internal format;
Issue: issue the instruction into the function modules;
Execute: execute the instruction for the function modules;
Return: return the execution result.
E.g. in one phase c1 a first instruction i1 starts with being fetched. In the next phase c2 it is translated into the internal format, while the next instruction i2 is being fetched. In this phase, the fetched first instruction i1 is stored in a pipeline. In the next phase c3, while the two previous instructions i1, i2 are in the pipeline, a new instruction i3 starts.
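The overlap of the five stages can be sketched with a toy schedule. This is an idealized model assuming one new instruction per cycle and no stalls; real execution stalls on busy function modules or operands that are not yet prepared.

```python
STAGES = ["Fetch", "Decode", "Issue", "Execute", "Return"]

def pipeline_schedule(n_instructions):
    """Cycle in which each instruction occupies each pipeline stage,
    assuming an ideal pipeline: one stage per cycle and one new
    instruction fetched per cycle."""
    return {i: {s: i + k for k, s in enumerate(STAGES)}
            for i in range(n_instructions)}
```

In this model, instruction i1 (index 1) is decoded in phase c2 (cycle 2) while i2 is being fetched, matching the example above.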
Fig.2 shows a generalized pipelined video data processing flow according to one aspect of the invention. The currently processed pixel data are copied into a pixel buffer for faster access. Input data are processed in an entropy decoding stage E by first decoding the non-residual data 20a and then decoding the residual data 20b, for which the decoded non-residual data are required. While decoded data are output from the residual data decoding procedure 20b, they are successively passed (through the queuing unit, not shown here) to the next step 20c of inverse transformation and inverse quantization ITIQ. In this example, the entropy decoding stage E waits for a certain time after it has processed its data 20b and before it starts processing new data 21a, to prevent buffer overflow due to slower units, e.g. motion compensation MC. At least the specialized video function modules can hold two or more MBs to be processed in parallel. If only two MBs in parallel are supported, the buffer for storing MVs and residual data in the related modules stores the MVs and residual data for the two MBs. Simultaneous processing of three or more MBs can be supported if additional buffer space is available within the modules.
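The throughput benefit of letting the modules work on different MBs can be estimated with a simple model. The model is idealized: it assumes full overlap between stages and buffers that never overflow, which the waiting mechanism described above is meant to guarantee.

```python
def sequential_cycles(durations, n_mbs):
    """Total cycles when each MB passes all stages before the next MB
    starts (the conventional flow of Fig.1)."""
    return n_mbs * sum(durations)

def pipelined_cycles(durations, n_mbs):
    """Total cycles when the stages work on different MBs in parallel;
    steady-state throughput is then limited by the slowest stage."""
    return sum(durations) + (n_mbs - 1) * max(durations)
```

For example, with stage durations of 3, 2 and 5 cycles per MB, four MBs take 40 cycles sequentially but only 25 cycles pipelined in this model.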
In the following, the hardware architecture according to the invention is described. Corresponding to the pipeline stages of Fig.3, the architecture can include five parts: an instruction fetch part, an instruction decoding part, an instruction issuing part, an instruction execution part and a result return part. The architecture is shown in Fig.5.
The instruction fetch part includes an instruction cache interface module 51, an instruction cache module 52 and the actual fetch module 53 including a program counter PC. The instruction decoding part includes the decoding module 54, and the instruction issuing part includes a queue module 55. The instruction execution part includes a data cache module 57, a data cache interface module 58, an ALU module 59, a motion compensation module 510, a motion compensation interface module 511, an Inverse Transform/Inverse Quantization (ITIQ) module 512, an entropy decoder module 513, an entropy decoder interface module 514, a de-blocking filter module 515, a filter interface module 516 and a result arbiter module 56. The result arbiter module 56 sends intermediate results, i.e. results from the other blocks of the execution part, to the queuing stage 55 before the next processing step is executed. The input data come from an SDRAM via the "visiting SDRAM bus", and the final results are returned to the same SDRAM using the same bus. Alternatively, a separate bus could be used for the returning data. The result return part includes a visiting bus arbiter module 517.
In the following, the mentioned functional modules are described.
The instruction cache module 52 is mainly responsible for providing the instructions in this architecture. Through it, the instructions can be accessed faster than directly from the external SDRAM, since it stores instructions in an internal SRAM. The next instruction is determined by a program counter PC within the fetch module 53. If the access hits, i.e. if the determined instruction is cached in the SRAM of the instruction cache 52, the instruction cache module 52 sends the instruction data back. If the access misses, which means that the desired instruction does not exist in the SRAM of the instruction cache, then a command for getting the corresponding instruction from the SDRAM is issued to the instruction cache interface module 51. After the instruction is acquired from the instruction cache interface module 51, the instruction data are provided to the instruction cache module 52.
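The hit/miss behaviour described here can be sketched as a toy direct-mapped cache. The line count, the mapping and the SDRAM model are illustrative assumptions; the text only specifies that hits are served from the internal SRAM and misses go through the interface module 51.

```python
class InstructionCache:
    """Toy direct-mapped instruction cache in front of an SDRAM
    (modelled as a dict); organization details are assumptions."""
    def __init__(self, sdram, n_lines=64):
        self.sdram = sdram          # stands in for the cache interface module
        self.n_lines = n_lines
        self.lines = {}             # line index -> (tag, instruction word)
        self.misses = 0

    def fetch(self, pc):
        idx, tag = pc % self.n_lines, pc // self.n_lines
        if self.lines.get(idx, (None, None))[0] != tag:   # miss: fill line
            self.misses += 1
            self.lines[idx] = (tag, self.sdram[pc])
        return self.lines[idx][1]                          # hit: serve SRAM copy
```

Repeated fetches of the same PC are then served without touching the SDRAM again.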
The fetch module 53 is responsible for determining the PC value according to the procedure of the program execution. The PC value is sent to the instruction cache module 52. If a jump or branch instruction is encountered, the PC value in the fetch module 53 is changed accordingly; otherwise, it is automatically increased by a defined increment. The decode module 54 decodes the instruction, i.e. it translates the external format into an internal instruction format. The external format depends on the firmware, while the internal format is used by the function module that will receive the instruction.
After being decoded into the internal format by the decode module 54, the instructions are sent to the queue module 55, where they are stored, in principle in a FIFO (first-in-first-out) manner, in an operation queue 550 waiting to be issued to the function modules. The queue module 55 further comprises general registers 551 and specialized registers 552. When the function module corresponding to the instruction at the head of the queue is not busy, and all of the related source registers' values for this instruction are prepared, the instruction is put on the issue bus IB, along with the data read from the general registers 551 and the specialized registers 552. Some instructions on the issue bus IB however may require no further data to be provided. The general registers 551 provide data on a general data bus GDB, which is e.g. 32 bit wide, and the specialized data registers provide data on a special data bus SDB, which is e.g. 128 bit wide. At the same time, every function module monitors the common issue bus IB and accepts instructions that are directed to it. Instructions can be conventional RISC processor instructions and can be addressed as in conventional RISC processors, e.g. by an address portion within the instruction. After execution in the respective functional module, the result is sent back via an intermediate result bus IRB to the queue module 55, and the queue module updates its destination registers. Thus, the queue module 55 can in a way be regarded as the control centre of the architecture. Though the processing is more decentralized than in conventional video decoding systems, the queue module controls the instruction flow.
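The issue condition — head-of-queue instruction, idle target module, ready source registers — can be sketched as follows. The field names `module` and `src` are illustrative, not taken from the text.

```python
from collections import deque

def issue_ready(queue, busy, reg_ready):
    """Issue decision for the head of the operation queue: the
    instruction is put on the issue bus only if its target function
    module is idle and all of its source registers hold valid data.
    Returns the issued instruction, or None if nothing can issue."""
    if not queue:
        return None
    head = queue[0]
    if busy.get(head["module"]) or not all(reg_ready[r] for r in head["src"]):
        return None
    return queue.popleft()
```

Instructions behind a blocked head stay queued, preserving the FIFO order described above.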
Advantageously, the RISC processor elements that control the decoding process, e.g. the queue, are directly involved in the decoding process, so that only little communication between modules is necessary for the assignment of new data and instructions to the function modules.
The data cache module 57 contains an SRAM to enable faster access to the picture data than directly through the external SDRAM. This module is mainly responsible for performing data load and store operations. When it captures from the issue bus IB an instruction for accessing the data cache, it calculates the access address according to the data of the instruction. For each data access, it first checks if the data exists in its SRAM. If the access of a store operation hits, the data in the SRAM of the data cache module 57 are updated. If the access of a load operation hits, the data are read and sent to the intermediate result bus IRB.
If the access misses, which means that the desired data does not exist in the SRAM of the data cache module 57, a command for getting the corresponding data is issued to the data cache interface module 58, which sends a request signal to the SDRAM to get the required data. After the data cache interface module 58 has acquired the data from the SDRAM, the data are updated into the data cache SRAM and sent to the intermediate result bus IRB.

The entropy module 513 is the start point of the decoding procedure, obtaining all the elements for reconstructing the pictures from the encoded bit-stream. It decodes from the bit-stream the syntax elements according to the utilized video standard, including e.g. differential motion vector (mvd), reference index, residual data etc. This module performs various computations including motion vector computation according to the mvd, computing the intra-mode according to pred_mode_flag and intra_luma_pred_mode, and computing the neighbour information for decoding the syntax elements.
The entropy module may automatically read the bit-stream to be decoded from an external SDRAM according to an address, which the programmer can set in the instruction. The entropy module 513 works together with the entropy interface module 514 to obtain the bit-stream from the SDRAM. If the entropy module is idle because it currently has no bit-stream data to process, it may send a request for data to the entropy interface module 514. The entropy interface module either sends back the required data to the entropy module 513, or, if it has no data to provide, it may send a request for data to the SDRAM.
The motion compensation (MC) module 510 includes two parts or sub-modules (not shown in Fig.5): intra MC for intra prediction and inter MC for inter prediction. For the intra prediction, the prediction mode and residual data, which the entropy module decoded from the compressed bit-stream before, are sent to the intra MC sub-module. The intra MC sub-module is invoked by an instruction, calculates the prediction of a current 4x4 block, adds the prediction and residual data and thus gets the motion compensated (i.e. reconstructed) data for the block.
The inter MC sub-module performs the inter motion compensation. When decoding, this part needs to find appropriate integer samples based on motion vectors and reference index (refidx) of a sub-block (4x4 block for luma, 2x2 block for Cb and Cr chroma). Then the fractional prediction samples are derived through interpolation.
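The derivation of fractional samples by interpolation can be sketched with a simple bilinear filter between the four surrounding integer samples. This is a simplification: the actual standards use longer filters, e.g. a 6-tap filter for half-pel luma positions in H.264. Quarter-pel fractional offsets are assumed here.

```python
def interpolate(ref, x, y, frac_x, frac_y, bits=2):
    """Bilinear fractional-sample prediction from the four integer
    samples around (x + frac_x/4, y + frac_y/4), with rounding.
    `ref` is a 2D list of integer sample values."""
    s = 1 << bits                       # 4 positions per pel (quarter-pel)
    a, b = ref[y][x], ref[y][x + 1]
    c, d = ref[y + 1][x], ref[y + 1][x + 1]
    top = a * (s - frac_x) + b * frac_x # horizontal interpolation, top row
    bot = c * (s - frac_x) + d * frac_x # horizontal interpolation, bottom row
    return (top * (s - frac_y) + bot * frac_y + s * s // 2) >> (2 * bits)
```

At integer positions (frac_x = frac_y = 0) the function simply returns the integer sample.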
The MC interface module 511 provides the reference data for inter prediction in the inter MC sub-module. If currently required reference data for the inter MC sub-module are not available in the buffers of the MC interface module 511, the MC interface module 511 sends a request to the SDRAM to obtain those data. After the required data were returned to the MC interface module, they are stored in buffers and sent to the MC module 510.
The Inverse Transform and Inverse Quantization (ITIQ) module 512 is responsible for inverse scanning, inverse transformation and inverse quantization operations on 4x4 pixel sub-blocks of the residual data. It returns its result via the intermediate result bus IRB into the queue module 55. The data that are required by the ITIQ module are provided by the respective instruction.
The filter module 515 is applied to every decoded macroblock (MB) for reducing blocking distortion. The filter smoothes block edges, thus improving the appearance of the decoded frames. The filter module 515 can deal with the filter process of a MB (not mbaff) or a MB pair (mbaff). It receives the required data of a current MB for filtering, such as MVs, "non-zero" information, frame or field flag, the pixel data etc. through the instructions. For the mbaff mode case, it reads those data of the other MB from a filter interface module 516.
The filter interface module 516 is for storing and providing the neighbour MVs and the pixel data of the neighbour 4x4 sub-block, and for storing the loop-filtered and finally processed data into the SDRAM. If the neighbour information and filtered data were stored into the SDRAM directly, the process would be very slow. Therefore these data are stored into a buffer within the filter interface module 516, and then stored in the SDRAM using a burst write function, e.g. when the buffer is full. Thus the SDRAM efficiency is improved significantly. A burst read operation from the SDRAM to the interface module can in principle also be used.
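The burst-write buffering can be sketched as follows; the burst length and the dict-based SDRAM model are illustrative assumptions.

```python
class BurstWriteBuffer:
    """Collects output words and writes them to the SDRAM (a dict here)
    in one burst when the buffer is full, instead of one word at a
    time, modelling the filter interface module's write path."""
    def __init__(self, sdram, burst_len=8):
        self.sdram, self.burst_len = sdram, burst_len
        self.pending = []           # buffered (address, word) pairs
        self.bursts = 0             # number of burst transactions issued

    def write(self, addr, word):
        self.pending.append((addr, word))
        if len(self.pending) == self.burst_len:
            self.flush()

    def flush(self):
        for addr, word in self.pending:
            self.sdram[addr] = word
        if self.pending:
            self.bursts += 1
        self.pending = []
```

Seven single-word writes with a burst length of four thus cost one burst transaction, plus a final flush, rather than seven separate SDRAM accesses.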
Several function modules return intermediate results, which require further processing and which are sent back to the queue module. In this example, these modules are an arithmetic-logic unit (ALU) 59, the data cache 57, the entropy module 513, the ITIQ 512, and the MC block 510. But since the queue module can accept only one result at a time, a result bus arbiter module 56 selects one result at a time and transfers it via the intermediate result bus IRB to the queue module 55. The result bus arbiter module may have internal buffers to store the results received from the function blocks while waiting for the intermediate result bus IRB.
There are several modules that need to access the external SDRAM, such as instruction cache interface 51, data cache interface 58, MC interface 511, entropy interface 514 and filter interface 516. The requests from all of these blocks to the SDRAM cannot be served at the same time. Therefore the visiting bus arbiter module 517 selects one bus request to be active at a time, according to predefined priorities for the different interface modules.
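A fixed-priority scheme, as one possible realization of the predefined priorities, can be sketched as follows. The priority order shown is an assumption, not taken from the text.

```python
def grant(requests, priority):
    """Fixed-priority arbitration for the SDRAM visiting bus: among the
    interface modules currently requesting (requests maps module name
    to a bool), the one earliest in the priority list wins."""
    for module in priority:
        if requests.get(module):
            return module
    return None
```

The same pattern could also model the result bus arbiter 56, which likewise selects one of several pending results at a time.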
In the following, the decoding procedure according to the previously mentioned phases is described. First, the fetch module 53 fetches an instruction from the instruction cache module 52 according to the program counter in the fetch module. Second, the instruction is sent via the instruction decoder module 54 to the instruction queue module 55. Third, the instructions in the instruction queue module 55 are issued to the related function module according to the respectively required operation. In the fourth phase, the function module performs its processing according to the instruction. Fifth, the operation result is returned via the intermediate result bus IRB to the register file 551,552 in the instruction queue module 55. When executing an instruction, the function modules may send requirement signals to their respectively related interface module if required data are missing.
The video decoding specific function modules, such as entropy decoder 513, ITIQ 512, motion compensation 510 and de-blocking filter 515, can be configured depending on a particular application to perform the actual operation required for decoding the respective coding format. The configuration can be based on firmware or software. For example, the motion compensation block can perform certain operations for decoding according to MPEG-4 Video standard, and other operations according to the AVC standard.
Whatever video standard is supported with this architecture, the decoding procedure is controlled by the program, which always uses the same, defined instruction set. Further, the SDRAM storage space is shared by the program, the input bit-stream, the output decoding result and temporary data created during program execution. Before the decoding, some parts of the bit-stream are automatically put into the SDRAM by the related hardware. New parts of the bit-stream are successively and automatically stored in the SDRAM, little by little. During the decoding procedure, the decoder consumes the bit-stream little by little. At the same time, the reconstructed data, being the picture data computed by the decoding architecture, are stored into the SDRAM. The different stages of the processing however use separate areas of the SDRAM.
When the reconstructed data in the SDRAM are needed for display or other purposes, those data are output by hardware circuitry automatically. If those data are useful for decoding further pictures, they remain in the SDRAM. Otherwise the related space in the SDRAM is overwritten with new picture data.
During the decoding procedure, the entropy module 513 and the entropy interface module 514 automatically read the bit-stream from a fixed SDRAM space according to the corresponding address in the entropy interface module 514. The address is increased by hardware, wherein the address will continue at the minimum address after the maximum address of the bit-stream address space in the SDRAM is reached. The de-blocking filter module 515 and the filter interface module 516 store the decoded result into a fixed SDRAM space automatically according to a corresponding address that is provided by the program.
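The wrap-around addressing of the bit-stream window in the SDRAM can be sketched as:

```python
def next_bitstream_addr(addr, step, base, size):
    """Advance the bit-stream read address within its fixed SDRAM
    window [base, base + size), wrapping back to the minimum address
    once the maximum address of the window is passed."""
    return base + (addr - base + step) % size
```

For a window of 0x100 bytes at base 0x1000, reading 8 bytes from address 0x10FC wraps back to 0x1004, matching the behaviour described above.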
In this architecture, the decoding procedure controlled by the firmware can be divided into three steps. The first step is to decode the parameters on picture or slice level: if those parameters of the picture or slice level (such as QP, weighted prediction parameters, picture size, slice type etc.) are useful for decoding the other syntax elements, they will be stored in so-called global registers. The global registers are connected with the function modules, and control the instruction execution. The second step is to decode the syntax elements on MB level: these elements are decoded one by one. Like the picture or slice level parameters, these elements (such as macro-block type, frame or field flag) are stored into the global registers if they will control the other function modules. This architecture allows the entropy module, ITIQ module, MC module and filter module to work in parallel on different MBs.
The third step is post decoding: after decoding all the elements of a macro-block, the firmware computes the next MB position in the whole picture. For the last MB of the picture, the firmware will do the DPB (decoded picture buffer) management, and then continue with decoding the next picture.
For all current video standards including MPEG-2, H.264, AVS and VC-1, the basic processing unit is the MB. The position of a MB in the whole picture is defined as shown in Fig.4, assuming the size of the picture is M*N. The execution within each of the functional modules is similar. Taking the MC module 510 as an example, the execution can be divided into the following steps, cf. Fig.5.
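The next-MB computation performed by the firmware in the post-decoding step can be sketched for an M*N-pixel picture as follows; raster-scan order and 16x16 MBs are assumed here.

```python
def next_mb_position(mb_x, mb_y, pic_width, pic_height):
    """Raster-scan position (top-left pixel) of the next macroblock.
    Returns None after the last MB of the picture, at which point the
    firmware proceeds to the next picture."""
    mb_x += 16
    if mb_x >= pic_width:               # end of MB row: wrap to next row
        mb_x, mb_y = 0, mb_y + 16
    if mb_y >= pic_height:              # past the last row: picture done
        return None
    return (mb_x, mb_y)
```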
First, the instruction (in internal format, i.e. decoded) which was received in the queue module 55 from the decode module, and if necessary queued in the operation queue 550, is sent into the motion compensation module 510. The instruction brings some data along, e.g. motion vectors and/or the residual data. These data may be stored in an internal buffer MCBUF.
Second, after getting the instruction and the related data, the MC module 510 begins to execute the instruction. During execution, if the required data are available within the internal buffer MCBUF of the MC module 510 (e.g. from the previous MB) , those data will be used immediately. If the reference data are missing, the MC module sends a request signal to the MC interface module 511. If the MC interface module 511 finds those data in its internal buffer, then it returns these data to the MC module 510. Otherwise, the MC interface module 511 sends a request to the visiting bus arbiter module 517 which connects to the external SDRAM. The visiting bus arbiter 517 gets requests from all the interface modules, and selects one to visit the SDRAM and get the data.
Third, when the requested data are returned from the SDRAM, they are stored in the MC interface module 511 and returned to the MC module 510. Fourth, after its computation the motion compensation result is sent to the result arbiter module 56, which gets all the results from the function modules and selects one after the other for returning to the queue module 55. Fifth, the result data after execution are written back to the registers 551,552 in the queue module 55, and the value in the registers 551,552 of the queue module 55 is updated.
For those modules that have no related interface module, such as ALU 59 or ITIQ 512, the execution has only three steps, namely the first, fourth and fifth of the above description.
An advantage of the invention is that the idle time of processing blocks is reduced. This leads to an improved efficiency, namely either less power consumption with a similar performance, or increased performance with comparable power consumption.
The present invention prevents the bottlenecks resulting from centrally organized processing of known multi-standard video decoders. An improved device for decoding video data comprises common elements of a RISC processor, including instruction providing unit, queuing unit and ALU, and special video processing modules, wherein the video processing modules are embedded in the RISC processor, so that they also receive instructions through the instruction bus and provide data to the queuing unit like the common RISC processor elements. The special video processing modules include a MC unit, means for performing IDCT and inverse quantization, an entropy decoding unit and a filter unit.
The invention is advantageous for video decoding products, particularly for HD resolution decoders implemented in a modular fashion, whether in hardware or software, such as e.g. multi-standard decoders for H.264, VC-1, MPEG-2, AVC etc.

Claims
1. Device for decoding video data, comprising means (51,52,53,54) for providing decoded instructions; a queuing unit (55) for receiving the decoded instructions and receiving result data (IRB), and for providing instructions on an instruction bus (IB); an arithmetic-logic unit (59) and a data cache unit (57) receiving instructions through the instruction bus (IB) and providing (IRB) data to the queuing unit (55); a motion compensation unit (510); ITIQ means (512) for performing inverse transformation and inverse quantization; an entropy decoding unit (513); and a filter unit (515), wherein the motion compensation unit (510), the ITIQ means (512), the entropy decoding unit (513) and the filter unit (515) receive instructions through said instruction bus (IB) and provide (IRB) data to said queuing unit (55).
2. Device according to claim 1, wherein the motion compensation unit (510), the ITIQ means (512), the entropy decoding unit (513) and the filter unit (515) are capable of simultaneously processing data of two or more macroblocks.
3. Device according to claim 1 or 2, wherein each of the motion compensation unit (510), the ITIQ means (512), the entropy decoding unit (513) and the filter unit (515) can simultaneously process video data blocks of different size.
4. Device according to one of the claims 1-3, wherein the queuing unit (55) comprises an operation queue (550) for instructions and at least two data queues (551,552), wherein the two data queues (551,552) have different width.
5. Device according to one of the claims 1-4, wherein each of the motion compensation unit (510), the ITIQ means (512), the entropy decoding unit (513) and the filter unit (515) has means for detecting that it has free processing capacity, and upon said detecting requests a new instruction from the queuing unit (55) .
6. Device according to one of the claims 1-5, further comprising a result arbiter module (56) for providing said result data (IRB) to the queue module (55), wherein the result arbiter module receives data from the data cache unit (57), the arithmetic-logic unit (59), the motion compensation unit (510), the ITIQ means (512), the entropy decoding unit (513) and the filter unit (515), and wherein the result arbiter module comprises means for selecting one of said results at a time.
7. Device according to one of the claims 1-6, wherein the video processing unit is a 4x4 pixel block for luma data and a 2x2 pixel block for chroma data.
8. Device according to one of the claims 1-7, wherein the filter module (515) is a de-blocking filter that has a first mode for filtering single macroblocks and a second mode for filtering macroblock pairs, the device further comprising a filter interface module (516), wherein for said second mode macroblock data of a second macroblock are read from the filter interface module (516).
9. Device according to one of the claims 1-8, further comprising a bus arbiter module (517) for connecting to an external memory, the bus arbiter module (517) having means for selecting one of a plurality of bus requests from different interface modules according to predefined priorities.
10. Device according to one of the claims 1-9, wherein the entropy decoding unit (513), the ITIQ means (512), the motion compensation unit (510) and the filter unit (515) can be firmware configured to perform their respective operation adapted to different video coding formats.
PCT/CN2006/002518 2006-09-25 2006-09-25 Apparatus and method for processing video data WO2008037113A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2006/002518 WO2008037113A1 (en) 2006-09-25 2006-09-25 Apparatus and method for processing video data
CN200680055930.0A CN101513067B (en) 2006-09-25 2006-09-25 Equipment for processing video data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2006/002518 WO2008037113A1 (en) 2006-09-25 2006-09-25 Apparatus and method for processing video data

Publications (1)

Publication Number Publication Date
WO2008037113A1

Family

ID=39229695

Country Status (2)

Country Link
CN (1) CN101513067B (en)
WO (1) WO2008037113A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103379330A (en) * 2012-04-26 2013-10-30 展讯通信(上海)有限公司 Code stream data decoding pretreatment method and decoding method, processor and decoder
CN114339044B (en) * 2021-12-29 2024-06-18 天津天地伟业智能安全防范科技有限公司 High-throughput snapshot method and device based on message queue

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003179923A (en) * 2001-12-12 2003-06-27 Nec Corp Decoding system for dynamic image compression coded signal and method for decoding, and program for decoding
EP1351512A2 (en) * 2002-04-01 2003-10-08 Broadcom Corporation Video decoding system supporting multiple standards
EP1475972A2 (en) * 2003-05-08 2004-11-10 Matsushita Electric Industrial Co., Ltd. Apparatus and method for moving picture decoding device with parallel processing

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010034204A1 (en) * 2008-09-25 2010-04-01 Mediatek Inc. Adaptive interpolation filter for video coding
US8548041B2 (en) 2008-09-25 2013-10-01 Mediatek Inc. Adaptive filter
US9762925B2 (en) 2008-09-25 2017-09-12 Mediatek Inc. Adaptive interpolation filter for video coding
TWI586149B (en) * 2014-08-28 2017-06-01 Apple Inc Video encoder, method and computing device for processing video frames in a block processing pipeline
US9762919B2 (en) 2014-08-28 2017-09-12 Apple Inc. Chroma cache architecture in block processing pipelines
US10205957B2 (en) * 2015-01-30 2019-02-12 Mediatek Inc. Multi-standard video decoder with novel bin decoding

Also Published As

Publication number Publication date
CN101513067B (en) 2012-02-01
CN101513067A (en) 2009-08-19

Similar Documents

Publication Publication Date Title
US8369420B2 (en) Multimode filter for de-blocking and de-ringing
Zhou et al. Implementation of H.264 decoder on general-purpose processors with media instructions
US8116379B2 (en) Method and apparatus for parallel processing of in-loop deblocking filter for H.264 video compression standard
US7747088B2 (en) System and methods for performing deblocking in microprocessor-based video codec applications
US8516026B2 (en) SIMD supporting filtering in a video decoding system
US20060115002A1 (en) Pipelined deblocking filter
US20120328000A1 (en) Video Decoding System Supporting Multiple Standards
KR101158345B1 (en) Method and system for performing deblocking filtering
EP1673942A1 (en) Method and apparatus for processing image data
US9161056B2 (en) Method for low memory footprint compressed video decoding
WO2003047265A2 (en) Multiple channel video transcoding
JP2007295423A (en) Processing apparatus and method of image data, program for processing method of image data, and recording medium with program for processing method of image data recorded thereon
US20090010326A1 (en) Method and apparatus for parallel video decoding
Cheng et al. An in-place architecture for the deblocking filter in H.264/AVC
US8036269B2 (en) Method for accessing memory in apparatus for processing moving pictures
US20100321579A1 (en) Front End Processor with Extendable Data Path
Engelhardt et al. FPGA implementation of a full HD real-time HEVC main profile decoder
Shafique et al. Optimizing the H.264/AVC video encoder application structure for reconfigurable and application-specific platforms
JP2006157925A (en) Pipeline deblocking filter
WO2008037113A1 (en) Apparatus and method for processing video data
Koziri et al. Implementation of the AVS video decoder on a heterogeneous dual-core SIMD processor
WO2002087248A2 (en) Apparatus and method for processing video data
Kuo et al. An H.264/AVC full-mode intra-frame encoder for 1080HD video
JP3861607B2 (en) Image signal decoding apparatus
EP1351512A2 (en) Video decoding system supporting multiple standards

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200680055930.0

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06791107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06791107

Country of ref document: EP

Kind code of ref document: A1