CN101072351B - Systems and methods of video compression deblocking - Google Patents


Publication number
CN101072351B
Authority
CN
Grant status
Grant
Prior art keywords
pixel
filter
pixels
logic circuit
predetermined
Application number
CN 200710110359
Other languages
Chinese (zh)
Other versions
CN101072351A (en)
Inventor
扎伊尔德·荷圣
Original Assignee
威盛电子股份有限公司

Abstract

An exemplary video decoder comprises an entropy decoder, a spatial decoder, combining logic, and an in-loop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The in-loop deblocking filter receives the combined picture. The in-loop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to filter each of the remaining pixel groups in the plurality after the predefined pixel group, according to a corresponding set of taps in a plurality of sets of taps, if the predefined pixel group meets a criterion.

Description

Deblocking filter, video decoder, and graphics processing unit

Technical Field

[0001] The present invention relates to image compression and decompression, and in particular to a graphics processing unit with image compression and decompression features.

Background

[0002] Personal computers and consumer electronics products are used for a variety of entertainment applications. These applications fall roughly into two categories: those that use computer-generated graphics, such as computer games; and those that use compressed video streams, such as programs pre-recorded on digital video discs (DVDs) or digital programming delivered to a set-top box by cable or satellite operators. The second category also includes the encoding of analog video streams, as performed by a digital video recorder (DVR).

[0003] Computer-generated graphics are typically produced by a graphics processing unit (GPU). A GPU is a specialized microprocessor found in computer game consoles and some personal computers. It is optimized to rapidly render three-dimensional primitive objects such as triangles and quadrilaterals. These primitives are described by a plurality of vertices, where each vertex has attributes (e.g., color), and textures can be applied to the primitives. The result of rendering is a two-dimensional array of pixels shown on a computer display or monitor.

[0004] Encoding and decoding video streams involves a different kind of computation, for example discrete cosine transform (DCT), motion estimation, motion compensation, and deblocking filtering. These computations are typically handled by a general-purpose central processing unit (CPU) combined with specialized hardware logic such as an application-specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. A single computing platform that handles both computer-generated graphics and video encoding/decoding is therefore desirable.

Summary

[0005] The embodiments disclosed herein provide systems and methods for video compression deblocking. An exemplary deblocking filter for video decoding comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a criterion; logic configured to filter the pixels of the predefined pixel group first when the criterion is met; and logic configured to, when the criterion is met, sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps, wherein the criterion is determined by a predefined set of calculations and comparisons, and the predefined calculations and comparisons constitute a set of taps.

[0006] An exemplary video decoder comprises an entropy decoder, a spatial decoder, combining logic, and an in-loop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The in-loop deblocking filter receives the combined picture. The in-loop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to, when the predefined pixel group meets a criterion, filter each of the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps, wherein the criterion is determined by a predefined set of calculations and comparisons, and the predefined calculations and comparisons constitute a set of taps.

[0007] An exemplary graphics processing unit comprises a host processor interface and a video acceleration unit. The host processor interface receives at least one video acceleration instruction. The video acceleration unit responds to the at least one video acceleration instruction. The video acceleration unit comprises an in-loop deblocking filter. The in-loop deblocking filter comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a first criterion; logic configured to filter the pixels of the predefined pixel group first when the first criterion is met; and logic configured to, when the first criterion is met, sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps, wherein the first criterion is determined by a predefined set of calculations and comparisons, and the predefined calculations and comparisons constitute a set of taps.

[0008] Brief Description of the Drawings

[0009] FIG. 1 is a block diagram of an exemplary computing platform for graphics and video encoding and/or decoding.

[0010] FIG. 2 is a block diagram of the video decoder 160 of FIG. 1.

[0011] FIG. 3 illustrates the sub-block pixel arrangement for a VC-1 filter.

[0012] FIG. 4 is a listing of hardware description pseudocode for the VC-1 in-loop deblocking filter hardware acceleration logic 400 of FIG. 1.

[0013] FIG. 5 is a listing of hardware description language code for the line acceleration logic 500 of FIG. 4.

[0014] FIGS. 6A to 6D form a block diagram of the line acceleration logic of FIGS. 4 and 5.

[0015] FIG. 7 is a data flow diagram for the graphics processing unit 120 of FIG. 1.

[0016] FIG. 8 is a block diagram of the 16x16 macroblock used in H.264.

[0017] [Description of Main Reference Numerals]

[0018] 100: system; 110: general-purpose CPU; 120: graphics processing unit (GPU); 130: memory; 140: bus; 150: video acceleration unit (VPU); 160: software decoder; 170: video acceleration driver.

[0019] 205: incoming bit stream; 210: entropy decoder; 215: spatial decoder; 220: inverse quantizer; 230: inverse discrete cosine transform; 235: picture; 245: motion vectors; 250: motion compensation; 255: previously decoded pictures; 265: prediction picture; 270: spatial compensation; 280: adder; 290: deblocking filter; 295: decoded picture.

[0020] 310, 320: two adjacent 4x4 sub-blocks; 330: vertical edge.

[0021] 400: in-loop deblocking filter hardware acceleration logic; 410: module definition section; 420: iteration loop section; 430: section testing the Vertical parameter; 440: section comparing the loop variable with 3; 450: instantiation section.

[0022] 500: line acceleration logic; 510: module definition section; 520: pixel value computation section; 530: section comparing the loop variable with 3; 540: section testing DO_FILTER; 550: state update section.

[0023] 605, 610, 615, 620: multiplexers; 625, 630, 679: subtractors; 635, 640, 655, 680: logic blocks; 645, 650: adders; 660, 665, 670: registers; 671: P4 register output; 673: P5 register output; 681: subtractor; 685: adder; 687, 689, 691, 693: multiplexers; 697: OR gate.

[0024] 710: command stream processor; 720: instructions; 730: instruction data; 740: execution unit pool; 750: texture filter unit; 760: texture cache; 770: post-packer.

Detailed Description

[0025] Computing platform for video encoding/decoding

[0026] FIG. 1 is a block diagram of an exemplary computing platform for graphics and video encoding and/or decoding. System 100 includes a general-purpose CPU 110 (hereinafter the host processor), a graphics processing unit (GPU) 120, memory 130, and a bus 140. GPU 120 includes a video acceleration unit (VPU) 150, which accelerates video encoding and/or decoding as described below. The video acceleration functions of GPU 120 are available as instructions that execute on GPU 120.

[0027] Software decoder 160 and video acceleration driver 170 reside in memory 130, and at least a portion of decoder 160 and video acceleration driver 170 executes on host processor 110. Through a host processor interface 180 provided by video acceleration driver 170, decoder 160 can also issue video acceleration instructions to GPU 120. In this way, system 100 performs video encoding and/or decoding through host processor software that issues video acceleration instructions to GPU 120, and GPU 120 responds to these instructions by accelerating portions of decoder 160.

[0028] In some embodiments, only a small portion of decoder 160 executes on the host processor, while most of decoder 160 is executed by GPU 120 with minimal driver overhead. In this way, frequently executed, computationally intensive blocks are offloaded to GPU 120, while more complex operations are performed by host processor 110. In some embodiments, one of the computationally intensive functions implemented by GPU 120 is in-loop deblocking filter hardware acceleration logic 400, also referred to as in-loop deblocking filter 400 or deblocking filter 400, which is described later in connection with FIG. 4. Another example of a computationally intensive function is determining the boundary strength (BS) for each filter.

[0029] The architecture described above thus allows flexibility between two modes of operation: executing decoder 160 on host processor 110, with particular functions (e.g., deblocking or boundary strength computation) performed by shader programs executed on macroblocks; or executing most of decoder 160 on GPU 120 to take advantage of pipelining and parallelism. In some embodiments in which decoder 160 executes on GPU 120, the deblocking process is a thread that is synchronized with the other aspects of decoder 160.

[0030] FIG. 1 omits a number of conventional components that are well known to those skilled in the art and are not necessary to explain the video acceleration features of GPU 120.

[0031] Video decoder

[0032] FIG. 2 is a block diagram of the video decoder 160 of FIG. 1. In the particular embodiment illustrated in FIG. 2, decoder 160 implements the ITU H.264 video compression specification. However, those skilled in the art will understand that the decoder 160 of FIG. 2 is a basic representation of a video decoder that also illustrates the operation of other types of decoders similar to H.264, such as those for the SMPTE VC-1 and MPEG-2 specifications. Furthermore, although decoder 160 is shown as part of GPU 120, those skilled in the art will also understand that the portions of decoder 160 disclosed herein can be implemented outside a GPU, for example as standalone logic or as part of an application-specific integrated circuit (ASIC).

[0033] The incoming bit stream 205 is first processed by entropy decoder 210. Entropy coding takes advantage of statistical redundancy: some patterns occur more often than others, so the more common patterns are represented by shorter codes. Entropy coding includes Huffman coding and run-length encoding. After entropy decoding, the data is processed by spatial decoder 215, which takes advantage of the fact that neighboring pixels in a picture are often the same or related, so that only the differences need to be encoded. In this exemplary embodiment, spatial decoder 215 comprises an inverse quantizer 220 and an inverse discrete cosine transform (IDCT) function 230. The output of IDCT function 230 can be considered a picture (235) composed of pixels.
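The redundancy idea behind run-length encoding can be shown with a minimal Python sketch. It is an illustration only and not the entropy coder of any particular compression specification; the function names and the sample data are assumptions introduced here.

```python
from itertools import groupby


def rle_encode(symbols):
    """Collapse runs of equal symbols into (symbol, run_length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original sequence."""
    out = []
    for sym, count in pairs:
        out.extend([sym] * count)
    return out


# Hypothetical sample: long runs of zeros compress into a few pairs.
coefficients = [5, 0, 0, 0, 0, 3, 0, 0]
pairs = rle_encode(coefficients)
restored = rle_decode(pairs)
```

Eight input symbols become four pairs here, and decoding restores the input exactly, which is the property an entropy decoder such as 210 relies on.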

[0034] Picture 235 is processed in smaller sub-units called macroblocks. The H.264 video compression specification uses a macroblock size of 16x16 pixels, while other compression specifications may use other sizes. Macroblocks within picture 235 are combined with information from previously decoded pictures, a process called inter prediction, or with information from other macroblocks in picture 235, a process called intra prediction. The incoming bit stream 205 is decoded by entropy decoder 210, and inter or intra prediction is applied depending on the type of picture.

[0035] When inter prediction is applied, entropy decoder 210 produces motion vectors 245 as output. Motion vectors 245 are used for temporal coding, which takes advantage of the fact that many pixels often keep the same value across a sequence of pictures. The changes from one picture to another are encoded as motion vectors 245. Motion compensation block 250 combines one or more previously decoded pictures 255 with motion vectors 245 to produce a prediction picture (265). When intra prediction is applied, spatial compensation block 270 combines information from neighboring macroblocks with macroblocks within picture 235 to produce a prediction picture (275).

[0036] Combiner 280 adds picture 235 to the output of mode selector 285. Mode selector 285 uses the entropy-decoded bit stream to determine whether combiner 280 uses the prediction picture (265) produced by motion compensation block 250 or the prediction picture (275) produced by spatial compensation block 270.
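The select-then-add behavior of elements 285 and 280 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: pictures are plain nested lists, the mode decision is passed in as a boolean named `use_inter` (hypothetical), and results are clamped to the 8-bit range, which the text does not specify.

```python
def combine(residual, inter_pred, intra_pred, use_inter):
    """Add the selected prediction picture to the residual picture.

    use_inter stands in for the decision the mode selector derives from
    the entropy-decoded bit stream. Clamping to 0..255 is an assumption.
    """
    prediction = inter_pred if use_inter else intra_pred
    return [
        [max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(residual, prediction)
    ]


# Hypothetical 2x2 pictures.
residual = [[2, -3], [0, 300]]
inter_pred = [[100, 100], [100, 100]]
intra_pred = [[50, 50], [50, 50]]
combined_inter = combine(residual, inter_pred, intra_pred, use_inter=True)
combined_intra = combine(residual, inter_pred, intra_pred, use_inter=False)
```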

[0037] The encoding process introduces artifacts such as discontinuities along macroblock edges and along sub-block edges within macroblocks. As a result, "edges" appear in the decoded frame where none existed in the original. Deblocking filter 290 is applied to the combined picture output by combiner 280 to remove these edge artifacts. The decoded picture 295 produced by deblocking filter 290 is stored for use in decoding subsequent pictures.

[0038] As discussed in connection with FIG. 1, portions of decoder 160 execute on host processor 110, while decoder 160 also takes advantage of the video acceleration instructions provided by GPU 120. In particular, some embodiments of deblocking filter 290 use one or more instructions provided by GPU 120 to implement filtering at a relatively low computational cost.

[0039] Deblocking filter

[0040] Deblocking filter 290 is a multi-tap filter that adjusts pixel values at sub-block edges based on neighboring pixel values. Different embodiments of deblocking filter 290 can be used depending on the compression specification implemented by decoder 160. Each specification uses different filter parameters, such as the sub-block size, the number of pixels updated by the filtering operation, and how often the filter is applied (e.g., every N rows or every M columns). Each specification also uses a different filter length configuration. Multi-tap filters are understood by those skilled in the art, so specific tap configurations are not discussed here. An embodiment of the deblocking filter specified by VC-1 is described in connection with FIG. 4. First, the sub-block pixel arrangement of the VC-1 filter is described in connection with FIG. 3.

[0041] FIG. 3 shows two adjacent 4x4 sub-blocks (310, 320), identified by rows R1-R4 and columns C1-C8. The vertical edge 330 between the two sub-blocks runs along columns C4 and C5. The VC-1 filter operates on each 4x4 sub-block. For the left sub-block, the VC-1 filter examines a predefined group of pixels (P1, P2, P3) in a predefined row (R3). If this group meets a specific criterion, another pixel in the same predefined row, P4, is updated. The criterion is determined by a particular set of calculations and comparisons on the pixels of the predefined group. Those skilled in the art will understand that these calculations and comparisons can also be viewed as a set of taps; the detailed calculations and comparisons are discussed later in connection with FIG. 5. The updated value is likewise based on computations performed on the pixels of the predefined group. The VC-1 filter processes the right sub-block in an analogous manner, determining whether pixels P6, P7, and P8 meet the criterion and updating P5 if they do. In other words, for a predefined row (R3), the VC-1 filter computes values for a predefined group of pixels (edge pixels P4 and P5) from the values of other predefined groups of pixels in the same row: the value of P4 depends on P1, P2, and P3, and the value of P5 depends on P6, P7, and P8.
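The pixel arrangement can be made concrete with a small Python sketch. It assumes that P1 through P8 are the pixels of one row in columns C1 through C8 (consistent with P4 and P5 being the edge pixels in C4 and C5), and it uses made-up pixel values; the function name `split_row` is introduced here for illustration only.

```python
# Two adjacent 4x4 sub-blocks stored as 4 rows (R1-R4) of 8 columns (C1-C8).
# Values are made up so that the pixel in row r, column c is 10*r + c.
block = [[10 * r + c for c in range(1, 9)] for r in range(1, 5)]

PREDEFINED_ROW = 3  # R3, 1-based as in the text


def split_row(block, row_number):
    """Return (P1..P3, P4, P5, P6..P8) for one row crossing the edge."""
    row = block[row_number - 1]
    p1_p3 = row[0:3]   # examined pixels in the left sub-block
    p4 = row[3]        # edge pixel in column C4
    p5 = row[4]        # edge pixel in column C5
    p6_p8 = row[5:8]   # examined pixels in the right sub-block
    return p1_p3, p4, p5, p6_p8


left, p4, p5, right = split_row(block, PREDEFINED_ROW)
```

With this layout, a filter for the left sub-block reads `left` and writes `p4`, and a filter for the right sub-block reads `right` and writes `p5`.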

[0042] The VC-1 filter conditionally updates the same predefined group of pixels in the remaining rows, based on the values computed for the predefined group of pixels (edge pixels P4 and P5) in the predefined row (R3). Thus, P4 in R1 is updated based on P1, P2, and P3 in R1, but only if P4 and P5 in R3 were updated. Similarly, P5 in R1 is updated based on P6, P7, and P8 in R1, but only if P4 and P5 in R3 were updated. Rows 2 and 4 are handled in a similar manner.

[0043] Viewed another way, some pixels in the predefined third row are filtered, or updated, when other pixels in the third row meet a criterion. The filter involves performing comparisons and calculations on these other pixels. If the other pixels in the third row do meet the criterion, the corresponding pixels in the remaining rows are filtered in an analogous manner, as described above. Some embodiments of the deblocking filter 290 disclosed herein use an inventive technique that filters the third row first and then filters the other rows. These techniques are described in more detail in connection with FIGS. 4, 5, and 6A-6D.
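The ordering described above (third row first, remaining rows only if the third row qualifies) can be sketched in Python as follows. This is a control-flow illustration under stated assumptions, not the VC-1 filter itself: `toy_filter` is a hypothetical stand-in for the per-row calculations and comparisons, which the text defers to FIG. 5, and its threshold of 8 is arbitrary.

```python
def deblock_edge(lines, filter_line, predefined=2):
    """Filter the predefined line first; filter the rest only if it qualifies.

    lines: pixel lines crossing the edge (index 2 is the third line).
    filter_line: callable(line) -> (new_line, criterion_met).
    """
    result = [list(line) for line in lines]
    result[predefined], criterion_met = filter_line(lines[predefined])
    if criterion_met:
        for i in range(len(lines)):
            if i != predefined:
                result[i], _ = filter_line(lines[i])
    return result


def toy_filter(line):
    """Hypothetical stand-in: average the edge pixels P4, P5 if they are close."""
    p4, p5 = line[3], line[4]
    if abs(p4 - p5) >= 8:          # arbitrary threshold, for illustration only
        return list(line), False
    mean = (p4 + p5) // 2
    return line[:3] + [mean, mean] + line[5:], True


rows = [[10, 10, 10, 10, 14, 14, 14, 14] for _ in range(4)]
filtered = deblock_edge(rows, toy_filter)
```

If the third row fails the criterion, no row is touched, so the per-row work for the other three rows is skipped entirely.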

[0044] Although FIG. 3 illustrates row-by-row processing of a vertical edge, those skilled in the art will understand that the same figure rotated 90 degrees illustrates column-by-column processing of a horizontal edge. Those skilled in the art will also understand that, although VC-1 uses the third of four rows as the predefined row that determines the conditional update of the other rows, the principles disclosed herein also apply to embodiments that use a different predefined row (e.g., the first or the second) and to embodiments in which a sub-block has a different number of rows. Likewise, although VC-1 examines the values of a group of neighboring pixels to set the value of the pixel being updated, the principles disclosed herein also apply to embodiments in which other pixels are examined and other pixels are set. As one example, P2 and P3 may be examined to determine the updated value of P4. As another example, P3 may be set according to the values of P2 and P4.

[0045] The video acceleration unit 150 in GPU 120 implements hardware acceleration logic for an in-loop deblocking filter (IDF), such as the in-loop deblocking filter specified by VC-1. A GPU instruction exposes this hardware acceleration logic, as described later. The conventional approach to implementing the VC-1 in-loop deblocking filter processes every row/column in parallel, because the same pixel computations are performed on each row/column of the sub-block. This conventional approach filters two adjacent 4x4 sub-blocks per cycle, but requires an increased gate count. In contrast, the inventive approach used by VC-1 in-loop deblocking filter hardware acceleration logic 400 processes the third row/column of pixels first and, if those pixels meet the required criterion, then processes the remaining three rows/columns sequentially. This approach uses fewer gates than the conventional approach, which replicates the functionality for each row/column. With sequential row processing, VC-1 in-loop deblocking filter acceleration logic 400 filters two adjacent 4x4 sub-blocks over four sequential row passes rather than in a single cycle. This longer filtering time is consistent with the instruction cycle of GPU 120; the faster filtering of the conventional approach is in fact faster than required, at the cost of wasted gates.

[0046] FIG. 4 is a listing of hardware description pseudocode for VC-1 in-loop deblocking filter hardware acceleration logic 400. Although pseudocode is used rather than an actual hardware description language (HDL) such as Verilog or VHDL, those skilled in the art should be familiar with such pseudocode. Such persons will understand that, when expressed in an actual HDL, the code can be compiled and then synthesized into an arrangement of logic gates that forms part of video acceleration unit 150. They will also understand that these logic gates can be implemented in a variety of technologies, for example an application-specific integrated circuit (ASIC), a programmable gate array (PGA), or a field-programmable gate array (FPGA).

[0047] Section 410 of the code is the module definition. VC-1 in-loop deblocking filter hardware acceleration logic 400 has several input parameters. The sub-block to be filtered is specified by the Block parameter. If the Vertical parameter is True, acceleration logic 400 treats the Block parameter as a 4x8 block (see FIG. 3) and performs vertical edge filtering. If the Vertical parameter is False, acceleration logic 400 treats the Block parameter as an 8x4 block (see FIG. 3) and performs horizontal edge filtering.

[0048] Section 420 of the code begins an iteration loop and sets the value of the loop variable. On the first pass through the loop, the loop variable is set to 3, so the third line is processed first. Subsequent iterations set the loop variable to 1, 2, and 4. Using this variable, VC-1 in-loop deblocking filter hardware acceleration logic 400 iterates four times, processing 8 pixels each time, where a line may be a horizontal row or a vertical column. Each line is processed by line acceleration logic 500 (see FIG. 5). In some embodiments, line acceleration logic 500 is implemented as an HDL submodule, which is described in connection with FIG. 5.

[0049] Section 430 tests the Vertical parameter to determine whether vertical or horizontal edge filtering is performed. Depending on the result, the 8 elements of the Line array variable are initialized from a row of the 4x8 input block or from a column of the 8x4 input block.

[0050] Section 440 determines whether line 3 is being processed by comparing the loop parameter with 3. If the loop parameter is 3, two other control variables, ProcessingPixel3 and FILTER_OTHER_3, are set to true. If the loop parameter is not 3, ProcessingPixel3 is set to false.

[0051] Section 450 instantiates another HDL module, VC1_IDC_Filter_Line, which applies the filter to the current line. (As described in connection with Figure 3, the line filter updates edge pixel values based on neighboring pixel values.) The parameters provided to this submodule include the control variables ProcessingPixel3 and FILTER_OTHER_3 and the loop parameter. In one embodiment, the VC-1 in-loop deblocking filter hardware acceleration logic 400 has an additional input parameter, a quantization value, and this quantization parameter is also provided to the submodule.

[0052] After the submodule has processed the line, the VC-1 in-loop deblocking filter hardware acceleration logic 400 continues the iteration loop at section 420 with the updated loop parameter value. In this way, the filter is applied to line 3 of the input block, then to line 1, line 2, and line 4.
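The loop of sections 420 through 450 can be sketched in software as follows. This is a minimal illustration, not the pseudocode of Figure 4: the function name `deblock_subblock`, the `filter_line` callback and its signature are hypothetical, and the way FILTER_OTHER_3 is carried between iterations is an assumption based on the description above.

```python
from typing import Callable, List, Tuple

Line = List[int]
# Hypothetical signature for the line submodule:
# (line, processing_pixel3, filter_other_3) -> (filtered line, updated filter_other_3)
LineFilter = Callable[[Line, bool, bool], Tuple[Line, bool]]

LINE_ORDER = (3, 1, 2, 4)  # line 3 is filtered first, then lines 1, 2 and 4


def deblock_subblock(block: List[List[int]], vertical: bool,
                     filter_line: LineFilter) -> List[List[int]]:
    """Filter one sub-block: 4 rows x 8 columns if vertical, else 8 rows x 4 columns."""
    out = [row[:] for row in block]
    filter_other_3 = True
    for index in LINE_ORDER:
        i = index - 1
        if vertical:
            line = out[i][:]                      # a row of the 4x8 block
        else:
            line = [out[r][i] for r in range(8)]  # a column of the 8x4 block
        processing_pixel3 = index == 3
        if processing_pixel3:
            filter_other_3 = True                 # set before the reference line
        line, filter_other_3 = filter_line(line, processing_pixel3, filter_other_3)
        if vertical:
            out[i] = line                         # write the row back
        else:
            for r in range(8):
                out[r][i] = line[r]               # write the column back
    return out
```

Passing the line filter in as a callback keeps this sketch independent of how an individual line is filtered.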

[0053] Figure 5 is a listing of hardware description language code for line acceleration logic 500, which implements the submodule described above. Section 510 of the code is the module definition. Line acceleration logic 500 has several input parameters. The line to be filtered is given by the Line input parameter. ProcessingPixel3 is an input parameter that higher-level logic sets to true if the line is the third row or third column. The FILTER_OTHER_3 parameter is initially set to true by higher-level logic and is then adjusted by line acceleration logic 500 according to pixel values.

[0054] Section 520 performs various pixel value computations as specified by VC-1. (Because these computations can be understood by reference to the VC-1 specification, they are not described in detail.) Section 530 tests the ProcessingPixel3 parameter provided by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400. If ProcessingPixel3 is true, section 530 initializes the control variable DO_FILTER to its default value, true. The results of the intermediate computations in section 520 are then used to determine whether the other 3 lines should also be processed. If the pixel computation results indicate that the other 3 lines are not to be processed, DO_FILTER is set to false.

[0055] If ProcessingPixel3 is false, section 540 uses the input parameter FILTER_OTHER_3 (set by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400) to set the value of DO_FILTER. Section 550 tests the DO_FILTER variable and, if it is true, updates the edge pixels P4 and P5 of the Line variable (see Figure 3).

[0056] Section 560 tests the ProcessingPixel3 parameter and updates FILTER_OTHER_3 accordingly. The FILTER_OTHER_3 variable is used to convey state information between different instances of this module. If ProcessingPixel3 is true, section 550 updates the FILTER_OTHER_3 parameter with the value of DO_FILTER. This technique allows the higher-level module that instantiates this module (i.e., VC1_InloopFilter) to pass the FILTER_OTHER_3 value updated by one instance of the lower-level VC_1_INLOOPFILTER_LINE module to another instance of VC_1_INLOOPFILTER_LINE.
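The control decisions of sections 530 through 560 can be summarized by the following sketch. The function name `line_control` and the boolean `pixel_test_passes`, which stands in for the outcome of the pixel computations of section 520, are hypothetical names introduced only for illustration.

```python
from typing import Tuple


def line_control(processing_pixel3: bool, filter_other_3: bool,
                 pixel_test_passes: bool) -> Tuple[bool, bool]:
    """Return (do_filter, updated filter_other_3) for one line."""
    if processing_pixel3:
        # Line 3 decides for itself, starting from the default value True.
        do_filter = True
        if not pixel_test_passes:
            do_filter = False
    else:
        # Lines 1, 2 and 4 inherit the decision made for line 3.
        do_filter = filter_other_3
    if processing_pixel3:
        # Publish the decision so the higher-level module can pass it on
        # to the other instances of the line module.
        filter_other_3 = do_filter
    return do_filter, filter_other_3
```

In this sketch the pixel test only matters while line 3 is being processed; for the other lines the inherited FILTER_OTHER_3 value alone decides whether P4 and P5 are updated.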

[0057] Those skilled in the art will understand that the pseudocode of Figure 5 can be synthesized in various ways to produce an arrangement of logic gates that implements line acceleration logic 500. One such arrangement is illustrated in Figures 6A-6D, which together form a block diagram of line acceleration logic 500. Those skilled in the art should be familiar with the VC-1 in-loop deblocking filter algorithm and with logic circuit structures. The elements of Figure 6A are therefore not described in detail; instead, selected features of line acceleration logic 500 are described.

[0058] Those skilled in the art will understand that the computations involved in the VC-1 in-loop deblocking filter include the following, where P1-P8 refer to pixel positions in the row or column being processed.

[0059] A0 = (2*(P3-P6) - 5*(P4-P5) + 4) >> 3

[0060] A1 = (2*(P1-P4) - 5*(P2-P3) + 4) >> 3

[0061] A2 = (2*(P5-P8) - 5*(P6-P7) + 4) >> 3

[0062] clip = (P4-P5) / 2

[0063] Each of the first 3 computations involves 3 subtractions, 2 multiplications, 1 addition and 1 right shift. The part of line acceleration logic 500 shown in Figure 6A uses shared logic to compute A0, A1 and A2 sequentially, rather than using a dedicated, separate logic block for each of A0, A1 and A2. By avoiding duplicated logic blocks and using multiplexers to process the inputs one after another, the gate count and/or the power consumption is reduced.

[0064] Multiplexers 605, 610 and 620 select different inputs from pixel registers P1-P8 in different clock cycles, and these inputs are provided to the shared logic blocks. Logic blocks 625 and 630 each perform a subtraction. Logic block 635 implements the multiplication by 2 as a left shift of 1 bit. The multiplication by 5 is implemented as a left shift followed by adder 645. Adder 650 sums the output of left shifter 635, the constant 4, and the negated output of adder 645. Finally, logic block 655 performs a right shift of 3 bits.

[0065] In the first clock cycle, the input T = 1 is provided to each of multiplexers 605, 610 and 615, and the value of A1 is computed and stored in register 660. In the second clock cycle, the input T = 2 is provided to each of multiplexers 605, 610 and 615, and the value of A2 is computed and stored in register 665. In the third clock cycle, the input T = 3 is provided to each of multiplexers 605, 610 and 615, and the value of A0 is computed and stored in register 670. The values A1, A2 and A0 stored in registers 660, 665 and 670 are used by the part of line acceleration logic 500 shown in Figure 6B, described below. The output of the P4 register (671) and the output of the P5 register (673) are used by the part of line acceleration logic 500 shown in Figure 6C, described below.
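The time-multiplexed datapath can be modeled as a single function that is evaluated once per clock cycle. The sketch below is an illustration only: the name `shared_datapath` is hypothetical, and the mapping from T to the four selected pixels is an assumption inferred from the formulas for A0, A1 and A2, not taken from Figure 6A.

```python
from typing import Sequence


def shared_datapath(pixels: Sequence[int], t: int) -> int:
    """Compute A1 (t=1), A2 (t=2) or A0 (t=3) from pixels P1..P8.

    pixels[0] holds P1 and pixels[7] holds P8.
    """
    # Assumed multiplexer selection: index of the first pixel of a
    # four-pixel window (P1..P4 for A1, P5..P8 for A2, P3..P6 for A0).
    base = {1: 0, 2: 4, 3: 2}[t]
    outer = pixels[base] - pixels[base + 3]      # first subtractor
    inner = pixels[base + 1] - pixels[base + 2]  # second subtractor
    times2 = outer << 1                          # multiply by 2: shift left 1 bit
    times5 = (inner << 2) + inner                # multiply by 5: shift, then add
    # Final adder: 2*outer + 4 - 5*inner, followed by a right shift of 3 bits.
    return (times2 + 4 - times5) >> 3


# Three cycles produce the three values that would be held in registers.
line = [10, 12, 15, 40, 90, 100, 101, 99]
a1, a2, a0 = (shared_datapath(line, t) for t in (1, 2, 3))
```

Python's `>>` is an arithmetic shift on negative integers, which this sketch assumes matches the intended hardware behavior.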

[0066] Those skilled in the art will also understand that the VC-1 in-loop deblocking filter involves the following additional computations:

[0067] D = 5 * ((sign(A0) * A3) - A0) / 8

[0068] if (CLIP > 0)

[0069] {

[0070] if (D < 0)

[0071] D = 0

[0072] if (D > CLIP)

[0073] D = CLIP

[0074] }

[0075] else

[0076] {

[0077] if (D > 0)

[0078] D = 0

[0079] if (D < CLIP)

[0080] D = CLIP

[0081] }
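The computation of D and its clamping against CLIP can be written as a runnable sketch. Two details are assumptions, since the listing above does not settle them: the division by 8 is taken to truncate toward zero, and sign() is taken to return 1 for zero. The helper names are hypothetical.

```python
def sign(x: int) -> int:
    """Assumed convention: -1 for negative values, +1 otherwise."""
    return -1 if x < 0 else 1


def div_trunc(n: int, d: int) -> int:
    """Integer division truncating toward zero (an assumption, see above)."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def compute_d(a0: int, a3: int, clip: int) -> int:
    """Compute D and clamp it to the range between 0 and CLIP."""
    d = div_trunc(5 * (sign(a0) * a3 - a0), 8)
    if clip > 0:
        if d < 0:
            d = 0
        if d > clip:
            d = clip
    else:
        if d > 0:
            d = 0
        if d < clip:
            d = clip
    return d
```

Whatever the sign of CLIP, the result lies between 0 and CLIP, so the correction applied to P4 and P5 never exceeds half of their difference.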

[0082] The part of line acceleration logic 500 shown in Figure 6B receives inputs from the part shown in Figure 6A and computes D (675). Referring again to Figure 6A, CLIP (677) is produced as follows: pixels P4 and P5 are subtracted by logic block 679, and the result is right-shifted (an integer division by 2) by logic block 680 to produce CLIP 677. Returning to Figure 6B, A1 is available from register 660 in the first cycle, A2 is available from register 665 in the second cycle, and A0 is available from register 670 in the third cycle. In the fourth cycle, therefore, the part of line acceleration logic 500 shown in Figure 6B computes D (675) according to the equations above.

[0083] Line acceleration logic 500 uses D (675) to update pixel positions P4 and P5. Specifically, P4 = P4 - D and P5 = P5 + D. Although Figures 6A and 6B were described above in terms of a single row or column (e.g., a single set of pixel positions P1-P8), the computation for line 3 of a sub-block affects the behavior for the other 3 lines of that sub-block. Line acceleration logic 500 implements this behavior with a novel approach. The individual filter computations are all carried out up front, in parallel, as described in connection with Figures 6A and 6B, and the part of line acceleration logic 500 shown in Figures 6C and 6D conditionally selects which positions are updated. In other words, the VC-1 in-loop deblocking filter hardware acceleration logic 400 determines whether the original value or the new value is written back. In contrast, in known approaches the VC-1 in-loop deblocking filter uses a loop, so that the individual filter computations are executed conditionally.

[0084] As described earlier, Figure 4 shows that the pseudocode for line acceleration logic 500 operates within a loop: the instantiation section 450 appears inside the iteration section 420. In addition, the instantiation of line acceleration logic 500 uses 2 parameters, ProcessingPixel3 and FILTER_OTHER_3. Line acceleration logic 500 uses these parameters to perform the conditional update of pixels P4 and P5 as follows. Referring to Figure 6C, register P4 is written with the result of subtractor 681, whose inputs are P4 (671) and either 0 or D (675), depending on the value of DO_FILTER (683). Similarly, register P5 is written with the result of adder 685, whose inputs are P5 (673) and either 0 or D (675), depending on the value of DO_FILTER (683). The updated value of P4 is therefore either the original value of P4 (if DO_FILTER is false) or P4 - D. Likewise, the updated value of P5 is either the original value of P5 (if DO_FILTER is false) or P5 + D.
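The write-back just described always performs the subtraction and the addition; only the second operand is selected. A minimal sketch, with the hypothetical function name `update_edge`:

```python
from typing import Tuple


def update_edge(p4: int, p5: int, d: int, do_filter: bool) -> Tuple[int, int]:
    """Return the values written back to the P4 and P5 registers."""
    operand = d if do_filter else 0  # selection between D and 0
    return p4 - operand, p5 + operand
```

For example, `update_edge(40, 90, 3, True)` yields `(37, 93)`, while the same call with `do_filter=False` writes back the original `(40, 90)`.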

[0085] Those skilled in the art will understand that, when line 3 of a sub-block is processed, the condition for updating P4 with P4 - D is:

[0086] (ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0)

[0087] DO_FILTER 683 is computed by the part of line acceleration logic 500 shown in Figure 6D, which checks these conditions. Multiplexer 687 provides one input to OR gate 697, selecting a true output if ABS(A0) < PQUANT and a false output otherwise. Multiplexer 689 provides another input to OR gate 697, selecting a true output if A3 < ABS(A0) and a false output otherwise. Multiplexer 691 provides another input to OR gate 697, selecting a true output if CLIP != 0 and a false output otherwise.

[0088] DO_FILTER 683 is provided by multiplexer 693, which uses the control input Processing_Pixel_3 (695) to select either the output of OR gate 697 or the input signal FILTER_OTHER_3 (699). The inputs Processing_Pixel_3 (695) and FILTER_OTHER_3 (699) were described earlier in connection with Figure 4 and the pseudocode of the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400, which instantiates line acceleration logic 500. Returning to Figure 4, Processing_Pixel_3 (695) is set to true when the third row or column is processed (the first iteration) and to false otherwise. Based on the conditions on PQUANT, ABS(A0) and CLIP, the intermediate variable DO_FILTER records whether or not P4 and P5 are updated. Finally, the value of FILTER_OTHER_3 (699) is set from this intermediate variable DO_FILTER. The result of the parts of line acceleration logic 500 shown in Figures 6C and 6D is that, every 4 cycles, pixel positions P4 and P5 in 4 adjacent rows or columns are either set to their filtered values (according to A0-A3, PQUANT, CLIP and so on) or rewritten with their original values.
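The selection logic of Figure 6D, as described in the text, reduces to three comparisons, an OR, and a two-way selection. The sketch below follows the condition exactly as stated above; the function name `do_filter_signal` is hypothetical, and the sketch is not a substitute for the filter condition in the VC-1 specification itself.

```python
def do_filter_signal(a0: int, a3: int, clip: int, pquant: int,
                     processing_pixel_3: bool, filter_other_3: bool) -> bool:
    """Model of the DO_FILTER signal as the text describes it."""
    cond_quant = abs(a0) < pquant  # first input of the OR gate
    cond_a3 = a3 < abs(a0)         # second input of the OR gate
    cond_clip = clip != 0          # third input of the OR gate
    or_output = cond_quant or cond_a3 or cond_clip
    # Final selection: line 3 uses its own test result, the other
    # lines reuse the decision carried in FILTER_OTHER_3.
    return or_output if processing_pixel_3 else filter_other_3
```

When line 3 is processed, the returned value would also become the new FILTER_OTHER_3, so the three remaining lines reuse the same decision.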

[0089] The VC-1 deblocking acceleration unit 400 thus combines parallel and sequential processing in a novel way, as described above. Parallel processing provides faster execution and reduces latency. Although parallelization increases the gate count, this increase is offset by the sequential processing described above. Known approaches that do not use such sequential processing simply increase the gate count.

[0090] Some embodiments of graphics processing unit (GPU) 120 include a hardware acceleration unit for H.264 deblocking, and this deblocking function is made available through GPU instructions. GPU 120 is described in detail in connection with Figure 8, with emphasis on the particular choice of GPU instructions that provide H.264 deblocking acceleration.

[0091] Graphics processor

[0092] Rationale for multiple deblocking instructions

[0093] The instruction set of GPU 120 includes instructions that the software-executed portion of decoder 160 can use to accelerate the deblocking filter. Described here is a novel technique in which not one but multiple GPU instructions are provided to accelerate a particular deblocking filter. The in-loop deblocking filter 290 is inherently sequential, in that a particular filter must filter pixels in a certain order (for example, H.264 specifies left to right and then top to bottom). Previously filtered or updated pixels are therefore used as inputs when later pixels are filtered. A host processor operates on pixel values stored in conventional memory, which allows back-to-back reads and writes of pixels. However, this sequential nature is a poor fit when in-loop deblocking filter 290 uses a GPU to accelerate part of the filtering process. A conventional GPU stores pixels in a texture cache, and the GPU pipeline design does not allow back-to-back reads and writes of the texture cache.

[0094] Some embodiments of GPU 120 disclosed herein provide multiple GPU instructions that can be used together to accelerate a particular deblocking filter. Some of these instructions use the texture cache as the source of pixel data, while others use the GPU execution units as the data source. In-loop deblocking filter 290 uses these different GPU instructions in appropriate combination to achieve back-to-back reads and writes of pixels. An overview of the data flow through GPU 120 is given next, followed by an explanation of the deblocking acceleration instructions provided by GPU 120 and of how in-loop deblocking filter 290 uses these instructions.

[0095] GPU flow

[0096] Figure 7 is a data flow diagram of GPU 120, in which the command flow is shown by the arrows on the left of Figure 7 and the image or graphics flow is shown by the arrows on the right. Figure 7 omits a number of elements known to those skilled in the art that are not necessary to explain the in-loop deblocking features of GPU 120. Command stream processor 710 receives an instruction 720 from a system bus (not shown) and decodes the instruction, producing command data 730, such as vertex data. GPU 120 supports conventional graphics processing instructions as well as instructions that accelerate video encoding and/or decoding.

[0097] Conventional graphics processing instructions involve tasks such as vertex shading, geometry shading, and pixel shading. Command data 730 is therefore applied to a pool 740 of shader execution units. The shader execution units use texture filter unit (TFU) 750 as needed to apply textures to pixels. Texture data is cached in texture cache 760, which is backed by main memory (not shown).

[0098] Some instructions are sent to video accelerator 150, whose operation is described below. The resulting data is then processed by post-packer 770, which compresses the data. After post-processing, the data produced by the video acceleration unit is provided to execution unit pool 740.

[0099] The execution of video encode/decode acceleration instructions, such as the deblocking filter instructions described above, differs in several respects from that of the conventional graphics instructions described above. First, video acceleration instructions are executed by video acceleration unit 150 rather than by the shader execution units. Second, video acceleration instructions do not use texture data as such.

[0100] However, the image data used by video acceleration instructions and the texture data used by graphics instructions are both 2-dimensional arrays. GPU 120 takes advantage of this by using texture filter unit 750 to load image data for video acceleration unit 150, which allows texture cache 760 to cache some of the image data on which video acceleration unit 150 operates. For this reason, as shown in Figure 7, video acceleration unit 150 is located between texture filter unit 750 and post-packer 770.

[0101] Texture filter unit 750 examines the command data 730 extracted from instruction 720. Command data 730 also provides TFU 750 with the coordinates of the desired image data within texture cache 760. In one embodiment, these coordinates are specified as U,V pairs, which should be familiar to those skilled in the art. When instruction 720 is a video acceleration instruction, the extracted command data also instructs texture filter unit 750 to bypass the texture filters (not shown) within texture filter unit 750.

[0102] In this way, texture filter unit 750 is leveraged by video acceleration instructions to load image data for video acceleration unit 150. Video acceleration unit 150 receives image data from texture filter unit 750 on a data path and command data 730 on a command path, and operates on the image data according to command data 730. The image data output by video acceleration unit 150 is fed back to execution unit pool 740 after being processed by post-packer 770.

[0103] Deblocking instructions

[0104] The embodiments of GPU 120 described herein provide hardware acceleration for the VC-1 deblocking filter and the H.264 deblocking filter. The VC-1 deblocking filter is accelerated by one GPU instruction ("IDF_VC-1"), while the H.264 deblocking filter is accelerated by three GPU instructions ("IDF_H264_0", "IDF_H264_1", "IDF_H264_2").

[0105] As described earlier, each GPU instruction is decoded and parsed into command data 730, which can be viewed as a set of parameters specific to each instruction, shown in Table 1. The IDF_H264_x instructions share some common parameters, while other parameters are unique to each instruction. Those skilled in the art will understand that these parameters can be encoded using a variety of opcodes and instruction formats, so these topics are not discussed here.

[0106] Table 1: Parameters of the IDF_H264 instructions

[0107]

Figure CN101072351BD00141

[0108]

Figure CN101072351BD00151

[0109] Several input parameters are used in combination to determine the address of the 4x4 block fetched by texture filter unit 750. The BaseAddress parameter indicates the start of the texture data within the texture cache. The coordinates of the top-left block within this region are given by the BaseAddress parameter. The PictureHeight and PictureWidth input parameters are used to determine the extent of the block, i.e., its lower-left coordinates. Finally, a video picture may be progressive or interlaced. If it is interlaced, it is made up of two fields (top and bottom). Texture filter unit 750 uses FieldFlag and TopFieldFlag to handle interlaced images properly.

[0110] The deblocked 8x4x8-bit output is provided in the destination register and is also written back to execution unit pool 740. Writing the deblocked output back to execution unit pool 740 is a "modify in place" operation, which is necessary in some decoder implementations, for example H.264, where pixel values in the blocks to the right and below are computed from previous results. The VC-1 decoder, unlike H.264, does not have this dependency. In VC-1, every 8x8 boundary is filtered (vertical first, then horizontal). All vertical edges can therefore be processed substantially in parallel, with the 4x4 edges filtered later. Parallelization can be exploited because only two pixels (one on each side of the edge) are updated, and these pixels are not used in computing other edges. Because deblocked data is written back to execution unit pool 740 rather than to texture cache 760, different IDF_H264_x instructions are provided, which fetch sub-blocks from different locations. This can be seen in Table 1, in the descriptions of the BlockAddress, Data Block 1 and Data Block 2 parameters. The IDF_H264_0 instruction fetches the entire 8x4x8-bit sub-block from texture cache 760. The IDF_H264_1 instruction fetches half of the sub-block from texture cache 760 and half from execution unit pool 740.

[0111] The use of the IDF_H264_x instructions by decoder 160 is described in detail in connection with Figure 8. Described next is the processing by which texture filter unit 750 and execution unit pool 740 transform the fetched pixel data before supplying pixel data to video acceleration unit 150.

[0112] Transformation of image data

[0113] The instruction parameters described above provide texture filter unit 750 with the coordinates of the sub-block address to be fetched from texture cache 760 or from execution unit pool 740. The image data comprises luma (Y) and chroma (Cb, Cr) planes. The YC flag input parameter defines whether the Y plane or the CbCr plane is to be processed.

[0114] When luma (Y) data is processed, as indicated by the YC flag parameter, texture filter unit 750 fetches the sub-block and provides its 128 bits as input to the VC-1 in-loop deblocking filter hardware acceleration logic 400 (e.g., the Block input parameter of the VC-1 accelerator example of Figure 4). The resulting data is written to the destination registers as a register quad (i.e., DST, DST+1, DST+2, DST+3).

[0115] When chroma data is processed, as indicated by the YC flag parameter, the Cb and Cr blocks are processed one after the other by the VC-1 in-loop deblocking filter hardware acceleration logic 400. The resulting data is written to texture cache 760. In some embodiments, this write occurs in every cycle, with 256 bits written per cycle.

[0116] Some embodiments of the video acceleration unit use interleaved CbCr planes, each stored at half width and half height. In these embodiments, texture filter unit 750 de-interleaves the CbCr sub-block data for video acceleration unit 150 into a buffer used for communication between texture filter unit 750 and video acceleration unit 150. Specifically, texture filter unit 750 writes two 4x4 Cb blocks into the buffer and then writes two 4x4 Cr blocks into the buffer. The 8x4 Cb block is processed first by the VC-1 in-loop deblocking filter hardware acceleration logic 400, and the resulting data is written to texture cache 760. The 8x4 Cr block is then processed by the VC-1 in-loop deblocking filter hardware acceleration logic 400, and the resulting data is written to texture cache 760. Video acceleration unit 150 uses the CbCr flag parameter to manage this sequential processing.
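Separating the two chroma components can be pictured with the short sketch below. It assumes, purely for illustration, that the stored CbCr data alternates Cb and Cr samples within each row, with Cb first; the source does not specify the memory layout, and the function name `deinterleave_cbcr` is hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]


def deinterleave_cbcr(rows: Block) -> Tuple[Block, Block]:
    """Split rows of [Cb, Cr, Cb, Cr, ...] samples into a Cb block and a Cr block."""
    cb = [row[0::2] for row in rows]  # even positions: Cb samples (assumed)
    cr = [row[1::2] for row in rows]  # odd positions: Cr samples (assumed)
    return cb, cr
```

After this split, the Cb block and the Cr block can be handed to the filter one after the other, as the paragraph above describes.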

[0117] Software decoder use of the deblocking instructions

[0118] As described earlier in connection with FIG. 1, the decoder 160 executes on the host processor 110 but also uses the video acceleration instructions provided by the graphics processing unit 120. In particular, embodiments of the H.264 in-loop deblocking filter 290 use a particular combination of IDF_H264_x instructions to process the edges in the order specified by H.264, fetching some sub-blocks from the texture cache 760 and others from the execution unit pool 740. When combined appropriately, these IDF_H264_x instructions achieve pixel-by-pixel reads and writes.

[0119] FIG. 8 is a block diagram of a 16x16 macroblock as used in H.264. The macroblock is divided into sixteen 4x4 sub-blocks, each of which undergoes deblocking. The sub-blocks in FIG. 8 are identified by row and column (e.g., R1, C2). H.264 specifies that the vertical edges are processed first and the horizontal edges second, in the edge order (a-h) shown in FIG. 8.
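
The division of the macroblock into sub-blocks can be illustrated with a short Python sketch. The function name `split_macroblock` and the dictionary layout are hypothetical choices made for illustration; only the 16x16 size, the 4x4 sub-block size, and the (row, column) naming come from the text.

```python
def split_macroblock(mb):
    """Map (row, col), both 1-based as in (R1, C2), to a 4x4 sub-block.

    `mb` is a 16x16 list of lists of pixel values.
    """
    sub_blocks = {}
    for r in range(4):
        for c in range(4):
            sub_blocks[(r + 1, c + 1)] = [
                mb[4 * r + y][4 * c:4 * c + 4] for y in range(4)
            ]
    return sub_blocks


# Example macroblock whose pixel value encodes its position (y * 16 + x).
mb = [[y * 16 + x for x in range(16)] for y in range(16)]
blocks = split_macroblock(mb)
```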

[0120] Accordingly, the deblocking filter is applied to the edge between a pair of sub-blocks, and the sub-block pairs are filtered in this order:

[0121] edge a = [block to left of R1, C1] | [R1, C1]; [block to left of R2, C1] | [R2, C1];

[0122] [block to left of R3, C1] | [R3, C1]; [block to left of R4, C1] | [R4, C1];

[0123] edge b = [R1, C1] | [R1, C2]; [R2, C1] | [R2, C2];

[0124] [R3, C1] | [R3, C2]; [R4, C1] | [R4, C2];

[0125] edge c = [R1, C2] | [R1, C3]; [R2, C2] | [R2, C3];

[0126] [R3, C2] | [R3, C3]; [R4, C2] | [R4, C3];

[0127] edge d = [R1, C3] | [R1, C4]; [R2, C3] | [R2, C4];

[0128] [R3, C3] | [R3, C4]; [R4, C3] | [R4, C4];

[0129] edge e = [block above R1, C1] | [R1, C1]; [block above R1, C2] | [R1, C2];

[0130] [block above R1, C3] | [R1, C3]; [block above R1, C4] | [R1, C4];

[0131] edge f = [R1, C1] | [R2, C1]; [R1, C2] | [R2, C2];

[0132] [R1, C3] | [R2, C3]; [R1, C4] | [R2, C4];

[0133] edge g = [R2, C1] | [R3, C1]; [R2, C2] | [R3, C2];

[0134] [R2, C3] | [R3, C3]; [R2, C4] | [R3, C4];

[0135] edge h = [R3, C1] | [R4, C1]; [R3, C2] | [R4, C2];

[0136] [R3, C3] | [R4, C3]; [R3, C4] | [R4, C4]
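
The filtering order listed above follows a regular pattern, which the Python sketch below generates. It assumes that every vertical edge pairs horizontally adjacent sub-blocks within the same row and that every horizontal edge pairs vertically adjacent sub-blocks within the same column. The function name `edge_pairs` and the use of index 0 for a neighboring block outside the macroblock are hypothetical conventions.

```python
def edge_pairs():
    """List (edge, first_block, second_block) tuples in filtering order.

    Blocks are (row, col) with 1-based indices. A column or row index of 0
    denotes the neighboring block to the left of or above the macroblock.
    """
    pairs = []
    # Vertical edges a-d: for each edge, walk down rows R1..R4.
    for i, edge in enumerate("abcd"):
        for row in range(1, 5):
            pairs.append((edge, (row, i), (row, i + 1)))
    # Horizontal edges e-h: for each edge, walk across columns C1..C4.
    for i, edge in enumerate("efgh"):
        for col in range(1, 5):
            pairs.append((edge, (i, col), (i + 1, col)))
    return pairs


order = edge_pairs()
```

Under these assumptions the list holds 32 pairs: four per edge, with all vertical edges preceding all horizontal edges.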

[0137] For the first pair of sub-blocks, both are loaded from the texture cache 760, because no pixels have yet been changed by applying the filter. Although the filter for the first vertical edge (a) may change pixel values in (R1, C1), the vertical edge in the second row shares no pixels with the vertical edge in the first row. The second pair of sub-blocks (edge b) is therefore also loaded from the texture cache 760. Because the vertical edges in two adjacent rows do not share pixels, the same holds for the third pair (edge c) and the fourth pair (edge d) of sub-blocks.

[0138] The particular IDF_H264_x instruction issued by the in-loop deblocking filter 290 determines the location from which the pixel data is loaded. The sequence of IDF_H264_x instructions used by the in-loop deblocking filter 290 to process the first group of vertical edges (a-d) is:

[0139] IDF_H264_0 SRC1 = address of (R1, C1);

[0140] IDF_H264_0 SRC1 = address of (R2, C1);

[0141] IDF_H264_0 SRC1 = address of (R3, C1);

[0142] IDF_H264_0 SRC1 = address of (R4, C1);

[0143] Next, the in-loop deblocking filter 290 processes the second vertical edge (b), starting at (R1, C2). The leftmost 4 pixels of the 8x4 sub-block designated (R1, C2) overlap the rightmost pixels of the (R1, C1) sub-block. These overlapping pixels have been processed, and possibly updated, by the vertical edge filter for (R1, C1), and are therefore read from the execution unit pool 740 rather than the texture cache 760. The rightmost 4 pixels of the (R1, C2) sub-block, however, have not yet been filtered and are therefore read from the texture cache 760. The same holds for sub-blocks (R2, C2) through (R4, C2). The in-loop deblocking filter 290 accomplishes this by issuing the following sequence of IDF_H264_x instructions to process the second group of vertical edges:

[0144] IDF_H264_1 SRC1 = address of (R1, C2);

[0145] IDF_H264_1 SRC1 = address of (R2, C2);

[0146] IDF_H264_1 SRC1 = address of (R3, C2);

[0147] IDF_H264_1 SRC1 = address of (R4, C2);

[0148] Processing of the third group of vertical edges starts at (R1, C3). The leftmost 4 pixels of the (R1, C3) 8x4 sub-block overlap the rightmost pixels of the (R1, C2) sub-block and are therefore read from the execution unit pool 740 rather than the texture cache 760.

[0149] The rightmost 4 pixels of the (R1, C3) sub-block, however, have not yet been filtered and are therefore read from the texture cache 760. The same holds for sub-blocks (R2, C3) through (R4, C3). A similar situation arises for the last group of vertical edges. Accordingly, the in-loop deblocking filter 290 issues the following sequence of IDF_H264_x instructions to process the remaining two groups of vertical edges:

[0150] IDF_H264_1 SRC1 = address of (R1, C3);

[0151] IDF_H264_1 SRC1 = address of (R2, C3);

[0152] IDF_H264_1 SRC1 = address of (R3, C3);

[0153] IDF_H264_1 SRC1 = address of (R4, C3);

[0154] IDF_H264_1 SRC1 = address of (R1, C4);

[0155] IDF_H264_1 SRC1 = address of (R2, C4);

[0156] IDF_H264_1 SRC1 = address of (R3, C4);

[0157] IDF_H264_1 SRC1 = address of (R4, C4);
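
The vertical-edge instruction sequences above, together with the source of each half of the 8x4 block, can be summarized in a short Python sketch. It is a descriptive model rather than driver code: the function name `vertical_edge_instructions`, the tuple layout, and the use of plain strings for the opcodes and data sources are hypothetical.

```python
TEXTURE_CACHE = "texture cache 760"
EU_POOL = "execution unit pool 740"


def vertical_edge_instructions():
    """Return (opcode, (row, col), left_source, right_source) tuples.

    Column 1 uses IDF_H264_0: neither half has been filtered yet, so both
    halves come from the texture cache. Columns 2-4 use IDF_H264_1: the left
    4 pixels overlap the previously filtered column and come from the
    execution unit pool, while the right 4 pixels come from the texture cache.
    """
    seq = []
    for col in range(1, 5):
        opcode = "IDF_H264_0" if col == 1 else "IDF_H264_1"
        left = TEXTURE_CACHE if col == 1 else EU_POOL
        for row in range(1, 5):
            seq.append((opcode, (row, col), left, TEXTURE_CACHE))
    return seq


seq = vertical_edge_instructions()
```

The sketch yields 16 instructions, four IDF_H264_0 followed by twelve IDF_H264_1, in the same column-by-column order as the listings.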

[0158] The horizontal edges (e-h) are processed next. At this point, the deblocking filter has been applied to every sub-block in the macroblock, so every pixel may have been updated. Each sub-block submitted for horizontal edge filtering is therefore read from the execution unit pool 740 rather than the texture cache 760. Accordingly, the in-loop deblocking filter 290 issues the following sequence of IDF_H264_x instructions to process the horizontal edges:

[0159] IDF_H264_2 SRC1 = address of (R1, C1);

[0160] IDF_H264_2 SRC1 = address of (R2, C1);

[0161] IDF_H264_2 SRC1 = address of (R3, C1);

[0162] IDF_H264_2 SRC1 = address of (R4, C1);

[0163] IDF_H264_2 SRC1 = address of (R1, C2);

[0164] IDF_H264_2 SRC1 = address of (R2, C2);

[0165] IDF_H264_2 SRC1 = address of (R3, C2);

[0166] IDF_H264_2 SRC1 = address of (R4, C2);

[0167] IDF_H264_2 SRC1 = address of (R1, C3);

[0168] Any process description or block in a flowchart should be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps in the process. Those skilled in the software arts will appreciate that alternative implementations are also included within the scope of the disclosure. In these alternative implementations, functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

[0169] The systems and methods disclosed herein can be implemented in software, hardware, or a combination thereof. In some embodiments, the system and/or method is implemented in software that is stored in a memory and executed by a suitable processor located in a computing device (including, without limitation, a microprocessor, microcontroller, network processor, reconfigurable processor, or extensible processor). In other embodiments, the system and/or method is implemented in logic, including, without limitation, a programmable logic device (PLD), a programmable gate array (PGA), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In still other embodiments, the logic described herein is implemented in a graphics processor or graphics processing unit (GPU).

[0170] The systems and methods disclosed herein can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device. Such instruction execution systems include any computer-based system, processor-containing system, or other system that can fetch and execute the instructions from the instruction execution system. In the context of this disclosure, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system. The computer-readable medium can be, for example and without limitation, a system or propagation medium based on electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology.

[0171] Specific, non-limiting examples of a computer-readable medium using electronic technology include: an electrical (electronic) connection having one or more wires; random access memory (RAM); read-only memory (ROM); and erasable programmable read-only memory (EPROM or flash memory). A specific, non-limiting example using magnetic technology is a portable computer diskette. Specific, non-limiting examples using optical technology include an optical fiber and a portable compact disc read-only memory (CD-ROM).

[0172] Although the invention is illustrated and described herein as embodied in one or more specific examples, it should not be limited to the details shown, since various modifications and structural changes may be made without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, the appended claims should be construed broadly and in a manner consistent with the scope of the invention, as set forth in the claims that follow.

Claims (10)

  1. A deblocking filter for video decoding, comprising: a first logic circuit configured to determine whether pixels of a predetermined pixel group among a plurality of pixel groups meet a condition; a second logic circuit configured to filter the pixels of the predetermined pixel group first when the condition is met; and a third logic circuit configured, when the condition is met, to filter each remaining pixel group of the plurality of pixel groups sequentially according to a corresponding set of filter units among a plurality of sets of filter units, wherein the condition is determined by a set of predetermined calculations and comparisons, the predetermined calculations and comparisons being a set of filter units, wherein the plurality of pixel groups form a plurality of adjacent 4x4 square pixel blocks and the predetermined pixel group consists of edge pixels of the adjacent 4x4 square pixel blocks, and wherein, when P3 of the 4x4 square pixel blocks is processed, the condition for updating P4 is: (ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0), where A0 = (2*(P3-P6) - 5*(P4-P5) + 4) >> 3; A1 = (2*(P1-P4) - 5*(P2-P3) + 4) >> 3; A2 = (2*(P5-P8) - 5*(P6-P7) + 4) >> 3; A3[I] = min(A1[I], A2[I]); and clip = (P4-P5)/2; P1 through P8 denote the positions, within the row or column being processed, of the pixels in two adjacent 4x4 square pixel blocks; ABS denotes taking the absolute value of A0; PQUANT is a preset value; A0 denotes the interpolated value of the pixels in the row or column being processed; A1 and A2 each denote the interpolated value of the pixels of the edge being processed; the output is true if ABS(A0) < PQUANT and false otherwise; the output is true if A3 < ABS(A0) and false otherwise; and the output is true if CLIP != 0 and false otherwise.
  2. The deblocking filter of claim 1, wherein the third logic circuit further comprises: a fourth logic circuit configured to update a first predetermined set of pixels in each remaining pixel group according to a second predetermined set of pixels in each remaining pixel group.
  3. The deblocking filter of claim 1, wherein the second logic circuit further comprises: a fifth logic circuit configured to filter the pixels of the predetermined pixel group first, in parallel, when the condition is met.
  4. The deblocking filter of claim 1, wherein the deblocking filter is applied to an edge between a pair of sub-blocks to remove edge artifacts.
  5. The deblocking filter of claim 1, wherein the deblocking filter uses an appropriate combination of a plurality of graphics processing instructions to achieve pixel-by-pixel reads and writes.
  6. A video decoder comprising: an entropy decoder that receives an incoming coded bit stream; a spatial decoder that receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels; a first logic circuit configured to combine a current picture with a prediction picture to produce a combined picture; and an in-loop deblocking filter that receives the combined picture, the in-loop deblocking filter comprising: a second logic circuit configured to filter a predetermined pixel group; and a third logic circuit configured, when the predetermined pixel group meets a condition, to filter each remaining pixel group of a plurality of pixel groups according to a corresponding set of filter units among a plurality of sets of filter units, wherein the condition is determined by a set of predetermined calculations and comparisons, the predetermined calculations and comparisons being a set of filter units, wherein the plurality of pixel groups form a plurality of adjacent 4x4 square pixel blocks and the predetermined pixel group consists of edge pixels of the adjacent 4x4 square pixel blocks, and wherein, when P3 of the 4x4 square pixel blocks is processed, the condition for updating P4 is: (ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0), where A0 = (2*(P3-P6) - 5*(P4-P5) + 4) >> 3; A1 = (2*(P1-P4) - 5*(P2-P3) + 4) >> 3; A2 = (2*(P5-P8) - 5*(P6-P7) + 4) >> 3; A3[I] = min(A1[I], A2[I]); and clip = (P4-P5)/2; P1 through P8 denote the positions, within the row or column being processed, of the pixels in two adjacent 4x4 square pixel blocks; ABS denotes taking the absolute value of A0; PQUANT is a preset value; A0 denotes the interpolated value of the pixels in the row or column being processed; A1 and A2 each denote the interpolated value of the pixels of the edge being processed; the output is true if ABS(A0) < PQUANT and false otherwise; the output is true if A3 < ABS(A0) and false otherwise; and the output is true if CLIP != 0 and false otherwise.
  7. The video decoder of claim 6, wherein the second logic circuit further comprises: a fourth logic circuit configured to filter the pixels of the predetermined pixel group first, in parallel, when the condition is met.
  8. The video decoder of claim 6, wherein the third logic circuit further comprises: a fifth logic circuit configured to update a first predetermined set of pixels in each remaining pixel group according to a second predetermined set of pixels in each remaining pixel group.
  9. A graphics processing unit comprising: a host processor interface that receives at least one video acceleration instruction; and a video acceleration unit that responds to the at least one video acceleration instruction, the video acceleration unit comprising an in-loop deblocking filter, the in-loop deblocking filter comprising: a first logic circuit configured to determine whether pixels of a predetermined pixel group of a plurality of pixel groups meet a first condition; a second logic circuit configured to filter the pixels of the predetermined pixel group first when the first condition is met; and a third logic circuit configured, when the first condition is met, to filter the remaining pixel groups of the plurality of pixel groups sequentially according to a corresponding set of filter units among a plurality of sets of filter units, wherein the condition is determined by a set of predetermined calculations and comparisons, the predetermined calculations and comparisons being a set of filter units, wherein the plurality of pixel groups form a plurality of adjacent 4x4 square pixel blocks and the predetermined pixel group consists of edge pixels of the adjacent 4x4 square pixel blocks, and wherein, when P3 of the 4x4 square pixel blocks is processed, the condition for updating P4 is: (ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0), where A0 = (2*(P3-P6) - 5*(P4-P5) + 4) >> 3; A1 = (2*(P1-P4) - 5*(P2-P3) + 4) >> 3; A2 = (2*(P5-P8) - 5*(P6-P7) + 4) >> 3; A3[I] = min(A1[I], A2[I]); and clip = (P4-P5)/2; P1 through P8 denote the positions, within the row or column being processed, of the pixels in two adjacent 4x4 square pixel blocks; ABS denotes taking the absolute value of A0; PQUANT is a preset value; A0 denotes the interpolated value of the pixels in the row or column being processed; A1 and A2 each denote the interpolated value of the pixels of the edge being processed; the output is true if ABS(A0) < PQUANT and false otherwise; the output is true if A3 < ABS(A0) and false otherwise; and the output is true if CLIP != 0 and false otherwise.
  10. The graphics processing unit of claim 9, wherein the third logic circuit further comprises: a fourth logic circuit configured to update a first predetermined set of pixels in each remaining pixel group according to a second predetermined set of pixels in each remaining pixel group.
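
As an illustration only, the update condition recited in claims 1, 6, and 9 can be evaluated with the Python sketch below. It follows the formulas as literally recited and makes several assumptions that the claims do not settle: the `[I]` index on A1, A2, and A3 is ignored, `>> 3` is taken as an arithmetic right shift, and the division in `clip` truncates toward zero. The function name `update_condition` is hypothetical.

```python
def update_condition(p, pquant):
    """Evaluate the P4 update condition for eight pixels P1..P8.

    `p` is a sequence of eight integers (p[0] is P1, ..., p[7] is P8).
    """
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    a0 = (2 * (p3 - p6) - 5 * (p4 - p5) + 4) >> 3
    a1 = (2 * (p1 - p4) - 5 * (p2 - p3) + 4) >> 3
    a2 = (2 * (p5 - p8) - 5 * (p6 - p7) + 4) >> 3
    a3 = min(a1, a2)  # literal reading of A3 = min(A1, A2); index [I] ignored
    clip = int((p4 - p5) / 2)  # assumption: division truncates toward zero
    return (abs(a0) < pquant) or (a3 < abs(a0)) or (clip != 0)


flat = update_condition([100] * 8, 4)
step = update_condition([10, 10, 10, 10, 90, 90, 90, 90], 4)
ramp = update_condition([70, 60, 60, 50, 50, 50, 50, 30], 1)
```

With these inputs, `flat` and `step` evaluate to true, while `ramp` evaluates to false because none of the three comparisons holds.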
CN 200710110359 2006-06-16 2007-06-13 Systems and methods of video compression deblocking CN101072351B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US81462306 2006-06-16 2006-06-16
US60/814,623 2006-06-16

Publications (2)

Publication Number Publication Date
CN101072351A (en) 2007-11-14
CN101072351B (en) 2012-11-21

Family

ID=38880763

Family Applications (6)

Application Number Title Priority Date Filing Date
CN 200710110359 CN101072351B (en) 2006-06-16 2007-06-13 Systems and methods of video compression deblocking
CN 200710111955 CN101083763B (en) 2006-06-16 2007-06-18 Programmable video processing unit and the video data processing method
CN 200710111956 CN101083764B (en) 2006-06-16 2007-06-18 Programmable video processing unit with video data processing method
CN 200710110194 CN101068365B (en) 2006-06-16 2007-06-18 Method for judging moving vector for describing refrence square moving and the storage media
CN 200710110193 CN101068353B (en) 2006-06-16 2007-06-18 Graph processing unit and method for calculating absolute difference and total value of macroblock
CN 200710110192 CN101068364B (en) 2006-06-16 2007-06-18 Video encoder and graph processing unit

Country Status (1)

Country Link
CN (6) CN101072351B (en)

Also Published As

Publication number Publication date Type
CN101068365A (en) 2007-11-07 application
CN101068364A (en) 2007-11-07 application
CN101072351A (en) 2007-11-14 application
CN101068364B (en) 2010-12-01 grant
CN101068353A (en) 2007-11-07 application
CN101083764B (en) 2014-04-02 grant
CN101083763B (en) 2012-02-08 grant
CN101068353B (en) 2010-08-25 grant
CN101068365B (en) 2010-08-25 grant
CN101083764A (en) 2007-12-05 application
CN101083763A (en) 2007-12-05 application

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model