CN101072351A

CN101072351A - Systems and methods of video compression deblocking

Info

Publication number: CN101072351A
Application number: CNA2007101103594A
Authority: CN
Inventors: 扎伊尔德·荷圣
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2006-06-16
Filing date: 2007-06-13
Publication date: 2007-11-14
Anticipated expiration: 2027-06-13
Also published as: CN101068353A; TW200821986A; TWI482117B; CN101083763A; TW200816082A; TW200816820A; CN101068353B; TW200803525A; CN101068365A; TWI348654B; TW200803527A; CN101072351B; CN101068364B; TWI383683B; TWI350109B; TWI444047B; CN101083764A; CN101083764B; CN101083763B; TWI395488B

Abstract

An exemplary video decoder comprises: an entropy decoder; a spatial decoder; combining logic; and an inloop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The inloop deblocking filter receives the combined picture. The inloop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to filter each of the remaining pixel groups in the plurality after the predefined pixel group, according to a corresponding set of taps in a plurality of sets of taps, if the predefined pixel group meets a criterion.

Description

Deblocking filter, video decoder, and graphics processing unit

Technical field

The invention relates to image compression and decompression, and in particular to a graphics processing unit with image compression and decompression features.

Background technology

Personal computers and consumer electronics products are used for a variety of entertainment purposes. These can be roughly divided into two categories: those that use computer-generated graphics, such as computer games; and those that use compressed video streams, such as programs pre-recorded onto digital video disc (DVD) or digital programming delivered to a set-top box by a cable or satellite provider. The second category also includes encoding of analog video streams, as performed, for example, by a digital video recorder (DVR).

Computer-generated graphics are usually produced by a graphics processing unit (GPU). A GPU is a specialized microprocessor found in computer game consoles and some personal computers. A GPU is optimized to rapidly render three-dimensional primitive objects such as triangles and quadrilaterals. These primitives are described by a plurality of vertices, where each vertex has attributes (for example, color), and textures can be applied to the primitives. The result of rendering is a two-dimensional array of pixels, which is presented on a computer display or monitor.

Encoding and decoding of video streams involves different kinds of computation, for example discrete cosine transform, motion estimation, motion compensation, and deblocking filtering. These computations are usually handled by a general-purpose central processing unit (CPU) combined with special-purpose hardware logic such as an application-specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. What is needed is a single computing platform that can handle both computer graphics and video encoding and decoding.

Summary of the invention

Embodiments disclosed herein provide systems and methods for video compression deblocking. An exemplary deblocking filter for video decoding comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a criterion; logic configured to filter the pixels of the predefined pixel group first when the criterion is met; and logic configured, when the criterion is met, to sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps.

An exemplary video decoder comprises an entropy decoder, a spatial decoder, combining logic, and an inloop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines the current picture with a prediction picture to produce a combined picture. The inloop deblocking filter receives the combined picture. The inloop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured, when the predefined pixel group meets a criterion, to filter each of the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps.

An exemplary graphics processing unit comprises a host processing interface and a video acceleration unit. The host processing interface receives at least one video acceleration instruction. The video acceleration unit responds to the at least one video acceleration instruction. The video acceleration unit comprises an inloop deblocking filter. The inloop deblocking filter comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a first criterion; logic configured to filter the pixels of the predefined pixel group first when the first criterion is met; and logic configured, when the first criterion is met, to sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps.

Description of drawings

Fig. 1 is a block diagram of an exemplary computing platform for graphics and video encoding and/or decoding.

Fig. 2 is a block diagram of the video decoder 160 of Fig. 1.

Fig. 3 illustrates the sub-block pixel arrangement for the VC-1 filter.

Fig. 4 is a listing of hardware description pseudo-code for the VC-1 inloop deblocking filter hardware acceleration logic 400 of Fig. 1.

Fig. 5 is a listing of hardware description language code for the line acceleration logic 500 of Fig. 4.

Figs. 6A to 6D together form a block diagram of the line acceleration logic of Figs. 4 and 5.

Fig. 7 is a data flow diagram of the graphics processing unit 120 of Fig. 1.

Fig. 8 is a block diagram of the 16x16 macroblock used by H.264.

[Description of main reference numerals]

100～system, 110～general-purpose CPU, 120～graphics processing unit (GPU), 130～memory, 140～bus, 150～video acceleration unit (VPU), 160～software decoder, 170～video acceleration driver.

205～input bit stream, 210～entropy decoder, 215～spatial decoder, 220～inverse quantizer, 230～inverse discrete cosine transform, 235～picture, 245～motion vector, 250～motion compensation, 255～previously decoded picture, 265～prediction picture, 270～spatial compensation, 280～adder, 290～deblocking filter, 295～decoded picture.

310, 320～two adjacent 4x4 sub-blocks, 330～vertical edge.

400～inloop deblocking filter hardware acceleration logic, 410～module definition section, 420～iteration loop section, 430～Vertical parameter test section, 440～section comparing the loop index with 3, 450～instantiation section.

500～line acceleration logic, 510～module definition section, 520～pixel value computation section, 530～section comparing the loop index with 3, 540～DO_FILTER test section, 550～state update section.

605, 610, 615, 620～multiplexers; 625, 630, 679～subtractors; 635, 640, 655, 680～logic blocks; 645, 650～adders; 660, 665, 670～registers; 671～output of P4 register; 673～output of P5 register; 681～subtractor; 685～adder; 687, 689, 691, 693～multiplexers; 697～OR gate.

710～command stream processor, 720～instructions, 730～instruction data, 740～execution unit pool, 750～texture filter unit, 760～texture cache, 770～post-packer.

Embodiment

Computing platform for video encoding and decoding

Fig. 1 is a block diagram of an exemplary computing platform for graphics and video encoding and/or decoding. System 100 comprises a general-purpose CPU 110 (hereinafter the host processor), a graphics processing unit (GPU) 120, memory 130, and a bus 140. GPU 120 includes a video acceleration unit (VPU) 150 that accelerates video encoding and/or decoding, as described later. The video acceleration functions of GPU 120 are exposed as instructions that execute on GPU 120.

Software decoder 160 and video acceleration driver 170 reside in memory 130, and at least a portion of decoder 160 and video acceleration driver 170 executes on host processor 110. Through a host interface 180 provided by video acceleration driver 170, decoder 160 can also issue video acceleration instructions to GPU 120. In this way, system 100 performs video encoding and/or decoding by having host processor software issue video acceleration instructions to GPU 120, and GPU 120 responds to these instructions by accelerating portions of decoder 160.

In some embodiments, only a small portion of decoder 160 executes on host processor 110, while the majority of decoder 160 is executed by GPU 120 with very little driver overhead. In this manner, frequently executed, computationally intensive blocks are offloaded to GPU 120, while more complex operations are performed by host processor 110. In some embodiments, one of the computationally intensive functions implemented by GPU 120 is the inloop deblocking filter hardware acceleration logic 400, also referred to as inloop deblocking filter 400 or deblocking filter 400, which is described later in connection with Fig. 4. Another example of a computationally intensive function is determining the boundary strength (BS) for each filter.

This architecture allows flexibility: decoder 160 can execute on host processor 110 while certain functions (for example, deblocking or computing boundary strength) are performed by running shader programs on macroblocks; or most of decoder 160 can execute on GPU 120, taking advantage of pipelining and parallelism. In some embodiments in which decoder 160 executes on GPU 120, the deblocking process is a thread that is synchronized with the other parts of decoder 160.

Fig. 1 omits several conventional components, well known to those skilled in the art, that are not essential to explaining the video acceleration features of GPU 120.

Video Decoder

Fig. 2 is a block diagram of the video decoder 160 of Fig. 1. In the particular embodiment shown in Fig. 2, decoder 160 implements the ITU H.264 video compression standard. However, those skilled in the art will recognize that decoder 160 of Fig. 2 is a basic representation of a video decoder that also illustrates the operation of other types of decoders similar to H.264, for example SMPTE VC-1 and MPEG-2. Moreover, although shown as part of GPU 120, portions of decoder 160 disclosed herein can also be implemented outside a GPU, for example as standalone logic or as part of an application-specific integrated circuit (ASIC).

The incoming bit stream 205 is first processed by entropy decoder 210. Entropy coding takes advantage of statistical redundancy: some patterns occur more often than others, so the frequently occurring ones are represented with shorter codes. Entropy coding techniques include Huffman coding and run-length encoding. After entropy decoding, the data is processed by spatial decoder 215, which takes advantage of the fact that neighboring pixels in a picture are usually identical or related, so that only the differences need to be encoded. In this exemplary embodiment, spatial decoder 215 comprises an inverse quantizer 220 and an inverse discrete cosine transform (IDCT) function 230. The output of IDCT function 230 can be viewed as a picture (235) made up of a number of pixels.

Picture 235 is processed in smaller sub-divisions called macroblocks. The H.264 video compression standard uses a macroblock size of 16x16 pixels, and other compression standards may use other sizes. Macroblocks in picture 235 are combined with information from previously decoded pictures, a process called inter prediction, or with information from other macroblocks of picture 235, a process called intra prediction. The incoming bit stream 205 is decoded by entropy decoder 210, and inter or intra prediction is used according to the type of each picture.

When inter prediction is used, entropy decoder 210 outputs motion vectors 245. Motion vectors 245 are used for temporal coding, which takes advantage of the fact that many pixels usually keep the same value across a series of pictures. The change from one picture to another is encoded as motion vectors 245. Motion compensation block 250 combines motion vectors 245 with one or more previously decoded pictures 255 to produce a prediction picture (265). When intra prediction is used, spatial compensation block 270 combines information derived from neighboring macroblocks with macroblocks in picture 235 to produce a prediction picture (275).

Combiner 280 adds picture 235 to the output of mode selector 285. Mode selector 285 uses the entropy-decoded bit stream to determine whether combiner 280 uses the prediction picture (265) produced by motion compensation block 250 or the prediction picture (275) produced by spatial compensation block 270.
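As a rough illustration of the combining step, the sketch below adds a residual picture to whichever prediction the mode selector picks. The function name `combine`, the string values for the mode, and the clamping to an 8-bit range are assumptions made for illustration; the text does not specify them.

```python
def combine(residual, inter_pred, intra_pred, mode):
    """Add the residual picture to the selected prediction, pixel by pixel.

    mode is "inter" or "intra" (a hypothetical encoding of the mode selector).
    Clamping to 0..255 assumes 8-bit samples, which the text does not state.
    """
    prediction = inter_pred if mode == "inter" else intra_pred
    return [
        [max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(residual, prediction)
    ]


residual = [[3, -2], [0, 5]]
inter_pred = [[100, 100], [254, 10]]
intra_pred = [[50, 1], [0, 0]]
print(combine(residual, inter_pred, intra_pred, "inter"))
print(combine(residual, inter_pred, intra_pred, "intra"))
```

The same addition is used in both modes; only the source of the prediction changes.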

The encoding process introduces artifacts such as discontinuities along macroblock edges and along sub-block edges within a macroblock. As a result, "edges" appear in the decoded frame that were not present in the original. Deblocking filter 290 is applied to the combined picture output by combiner 280 to remove these edge artifacts. The decoded picture 295 produced by deblocking filter 290 is stored for use in decoding subsequent pictures.

As discussed in connection with Fig. 1, portions of decoder 160 execute on host processor 110, and decoder 160 also takes advantage of the video acceleration instructions provided by GPU 120. In particular, in some embodiments deblocking filter 290 uses one or more instructions provided by GPU 120 to perform filtering at relatively low computational cost.

Deblocking filter

Deblocking filter 290 is a multi-tap filter that adjusts pixel values at sub-block edges based on neighboring pixel values. Different embodiments of deblocking filter 290 can be used depending on the compression standard implemented by decoder 160. Each standard uses different filter parameters, for example the size of the sub-block, the number of pixels updated by the filtering operation, and how often the filter is applied (for example, every N rows or every M columns). In addition, each standard uses a different configuration of filter taps. Multi-tap filters will be understood by those skilled in the art, so specific tap configurations are not discussed here. An embodiment of the deblocking filter specified by VC-1 is described in connection with Fig. 4. First, the sub-block pixel arrangement for the VC-1 filter is described in connection with Fig. 3.

Fig. 3 shows two adjacent 4x4 sub-blocks (310, 320), identified by rows R1-R4 and columns C1-C8. The vertical edge 330 between the two sub-blocks lies along columns C4 and C5. The VC-1 filter operates on each 4x4 sub-block. For the leftmost sub-block, the VC-1 filter examines a predefined group of pixels (P1, P2, P3) in a predefined row (R3). If the predefined group of pixels meets specific criteria, another pixel, P4, in the same predefined row is updated. The criteria are determined by a set of computations on the pixels in the predefined group and comparisons against specific values. Those skilled in the art will understand that these computations and comparisons can be viewed more generally as a set of taps; the detailed computations are discussed later in connection with Fig. 5. The updated value is also based on computations performed on the pixels in the predefined group. The VC-1 filter handles the rightmost sub-block in an analogous manner, determining whether pixels P6, P7, P8 meet the criteria and updating P5 if they do. In other words, the VC-1 filter computes values for one predefined group of pixels in a predefined row (R3), namely the edge pixels P4 and P5, based on the values of another predefined group of pixels in the same row: the value of P4 depends on P1, P2, P3, and the value of P5 depends on P6, P7, P8.

The VC-1 filter conditionally updates the same predefined group of pixels in the remaining rows, depending on the values computed for the predefined group of pixels (edge pixels P4, P5) in the predefined row (R3). Thus, P4 in R1 is updated based on P1, P2, P3 in R1, but only if P4 and P5 in R3 were updated. Similarly, P5 in R1 is updated based on P6, P7, P8 in R1, but only if P4 and P5 in R3 were updated. Rows R2 and R4 are handled in the same way.

Viewed another way, some pixels in the predefined third row are filtered, or updated, when other pixels in the third row meet the criteria. The filter involves comparisons and computations performed on these other pixels. If the other pixels in the third row do meet the criteria, the corresponding pixels in the remaining rows are filtered in an analogous manner, as described above. Some embodiments of the deblocking filter 290 disclosed herein use an inventive technique that filters the third row first and then filters the other rows. This technique is described in more detail in connection with Figs. 4, 5, and 6A-6D.
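The gating described above can be sketched as follows. This is a minimal illustration, not the VC-1 algorithm itself: the criterion `small_step` and the update rule `soften` are placeholders invented for this example, and the assumption that each remaining row also applies its own test is taken from the "analogous manner" wording rather than stated outright.

```python
def filter_block(block, meets_criteria, new_edge_pair):
    """Filter one vertical edge across a 4-row by 8-pixel block, in place.

    block[r] holds pixels P1..P8 of row R(r+1); the edge lies between
    P4 (index 3) and P5 (index 4). Both callbacks are placeholders.
    Returns True if row R3 was updated.
    """
    r3 = block[2]
    if not meets_criteria(r3):
        return False            # R3 fails, so no row in the block is touched
    r3[3], r3[4] = new_edge_pair(r3)
    for r in (0, 1, 3):         # rows R1, R2, R4, visited only after R3 passes
        row = block[r]
        if meets_criteria(row):
            row[3], row[4] = new_edge_pair(row)
    return True


# Placeholder criterion and update rule, chosen only to make the sketch run.
def small_step(row):
    return 0 < abs(row[3] - row[4]) < 8


def soften(row):
    d = (row[3] - row[4]) // 4
    return row[3] - d, row[4] + d


smooth = [[10, 10, 10, 10, 14, 14, 14, 14] for _ in range(4)]
print(filter_block(smooth, small_step, soften), smooth[0])
```

If row R3 contains a large step that fails the placeholder test, the function returns False and every row is left unchanged, including rows that would have passed on their own.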

Although Fig. 3 illustrates row-by-row processing of a vertical edge, those skilled in the art will understand that the same figure, rotated 90 degrees, also illustrates column-by-column processing of a horizontal edge. Those skilled in the art will also recognize that, although VC-1 uses the third of four rows as the predefined row that determines the conditional update of the other rows, the principles disclosed herein also apply to embodiments that use a different predefined row (for example, the first row or the second row) and to embodiments in which the sub-block has a different number of rows. Similarly, although VC-1 examines the values of a neighboring group of pixels to set the value of the pixel being updated, the principles disclosed herein also apply to embodiments in which other pixels are examined and other pixels are set. As one example, P2 and P3 could be examined to determine the updated value of P4. As another example, P3 could be set based on the values of P2 and P4.

Video acceleration unit 150 in GPU 120 implements hardware acceleration logic for an inloop deblocking filter (IDF), for example the inloop deblocking filter specified by the VC-1 standard. A GPU instruction exposes this hardware acceleration logic, as explained later. The conventional approach to implementing the VC-1 inloop deblocking filter is to process all rows/columns in parallel, because the same pixel computations are performed for each row/column of the sub-block. The conventional approach filters two adjacent 4x4 sub-blocks in every cycle, but requires an increased gate count. In contrast, the inventive approach used by VC-1 inloop deblocking filter hardware acceleration logic 400 processes the third row/column of pixels first and, if those pixels meet the required criteria, then processes the remaining three rows/columns sequentially. This approach uses fewer logic gates than the conventional approach, which replicates the logic for each row/column. VC-1 inloop deblocking filter acceleration logic 400 processes one row per cycle, in sequence, to filter two adjacent 4x4 sub-blocks. This longer filtering time is consistent with the instruction cycle of GPU 120; the faster filtering of the conventional approach is in fact faster than needed, and so wastes gates.

Fig. 4 is a listing of hardware description pseudo-code for VC-1 inloop deblocking filter hardware acceleration logic 400. Although pseudo-code is used rather than an actual hardware description language (HDL) such as Verilog or VHDL, those skilled in the art should be familiar with this pseudo-code. They will understand that, when expressed in an actual HDL, the code would be compiled and then synthesized into an arrangement of logic gates forming part of video acceleration unit 150. They will also recognize that these gates can be implemented in various technologies, for example an application-specific integrated circuit (ASIC), a programmable gate array (PGA), or a field-programmable gate array (FPGA).

Section 410 of the code is the module definition. VC-1 inloop deblocking filter hardware acceleration logic 400 has several input parameters. The sub-block to be filtered is specified by the Block parameter. If the Vertical parameter is True, acceleration logic 400 treats the Block parameter as a 4x8 block (see Fig. 3) and performs vertical edge filtering. If the Vertical parameter is False, acceleration logic 400 treats the Block parameter as an 8x4 block (see Fig. 3) and performs horizontal edge filtering.

Section 420 of the code begins the iteration loop and sets the value of the loop index variable. On the first pass through the loop, the loop index is set to 3, so that the third line is processed first. Subsequent iterations set the loop index to 1, 2, and 4. With these values, VC-1 inloop deblocking filter hardware acceleration logic 400 iterates four times, processing one line of 8 pixels each time, where a line can be a horizontal row or a vertical column, and each line is processed by line acceleration logic 500 (see Fig. 5). In some embodiments, line acceleration logic 500 is implemented as an HDL submodule, as described in connection with Fig. 5.

Section 430 tests the Vertical parameter to determine whether to perform vertical or horizontal edge filtering. Depending on the result, the 8 elements of the Line array variable are initialized from a row of the 4x8 input block or from a column of the 8x4 input block.

Section 440 determines whether the third line is being processed by comparing the loop index with 3. If the loop index is 3, two control variables, ProcessingPixel3 and FILTER_OTHER_3, are set to True. If the loop index is not 3, ProcessingPixel3 is set to False.

Section 450 instantiates another HDL module, VC1_IDC_Filter_Line, which applies the filter to the current line. (As described in connection with Fig. 3, the line filter updates edge pixel values based on neighboring pixel values.) The parameters supplied to this submodule include the control variables ProcessingPixel3 and FILTER_OTHER_3 and the loop index. In one embodiment, VC-1 inloop deblocking filter hardware acceleration logic 400 has an additional input parameter, a quantization value, which is also supplied to the submodule.

After the submodule has processed the line, VC-1 inloop deblocking filter hardware acceleration logic 400 continues the iteration loop at section 420 with the next value of the loop index. In this way, the filter is applied to the third line of the input block, then to the first, second, and fourth lines.
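The control flow of Fig. 4 can be approximated in software as below. This is a sketch under stated assumptions, not the HDL listing: the names `filter_edge`, `record_only`, and `visited` are invented here, the block is modeled as nested lists, and the line filter is a stub that only records the order of the calls.

```python
def filter_edge(block, vertical, filter_line):
    """Visit the four 8-pixel lines of a block in the order 3, 1, 2, 4.

    vertical=True : block has 4 rows of 8 pixels; each line is a row.
    vertical=False: block has 8 rows of 4 pixels; each line is a column.
    filter_line(line, processing_pixel3, filter_other_3) must return the
    (possibly updated) line and the (possibly updated) FILTER_OTHER_3 flag.
    """
    filter_other_3 = True
    for loop_index in (3, 1, 2, 4):
        i = loop_index - 1
        line = list(block[i]) if vertical else [row[i] for row in block]
        if loop_index == 3:
            processing_pixel3 = True
            filter_other_3 = True
        else:
            processing_pixel3 = False
        line, filter_other_3 = filter_line(line, processing_pixel3, filter_other_3)
        if vertical:
            block[i] = line                      # write the row back
        else:
            for row, value in zip(block, line):  # write the column back
                row[i] = value
    return block


visited = []


def record_only(line, processing_pixel3, filter_other_3):
    # Stub line filter: records the first pixel of each line and the flag.
    visited.append((line[0], processing_pixel3))
    return line, filter_other_3


block_8x4 = [[10 * r + c for c in range(4)] for r in range(8)]
filter_edge(block_8x4, vertical=False, filter_line=record_only)
print(visited)
```

With the 8x4 block above, the recorded first pixels are those of columns 3, 1, 2, and 4 in that order, and only the first call sees the flag set.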

Fig. 5 is a listing of hardware description language code for line acceleration logic 500, which implements the submodule described above. Section 510 of the code is the module definition. Line acceleration logic 500 has several input parameters. The line to be filtered is specified by the Line input parameter. ProcessingPixel3 is an input parameter that the higher-level logic sets to True if the line is the third row or third column. The FILTER_OTHER_3 parameter is set to True by the higher-level logic and is adjusted by line acceleration logic 500 according to the pixel values.

Section 520 performs the various pixel value computations specified by VC-1. (These computations can be understood by reference to the VC-1 specification and are not described in detail here.) Section 530 tests the ProcessingPixel3 parameter supplied by the higher-level VC-1 inloop deblocking filter hardware acceleration logic 400. If ProcessingPixel3 is True, section 530 initializes the control variable DO_FILTER to its default value of True. The results of the intermediate computations in section 520 are then used to determine whether the other three lines should also be processed. If the pixel computation results indicate that the other three lines should not be processed, DO_FILTER is set to False.

If ProcessingPixel3 is False, section 540 uses the input parameter FILTER_OTHER_3 (set by the higher-level VC-1 inloop deblocking filter hardware acceleration logic 400) to set the value of DO_FILTER. Section 550 tests the DO_FILTER variable and, if it is True, updates the edge pixels P4 and P5 of the Line variable (see Fig. 3).

Section 560 tests the ProcessingPixel3 parameter and updates FILTER_OTHER_3 accordingly. The FILTER_OTHER_3 variable carries state information between different instances of this module. If ProcessingPixel3 is True, section 550 updates the FILTER_OTHER_3 parameter with the value of DO_FILTER. This technique lets the higher-level module that instantiates this module (that is, VC1_InloopFilter) pass the FILTER_OTHER_3 value updated by one instance of the lower-level VC_1_INLOOPFILTER_LINE module to another instance of VC_1_INLOOPFILTER_LINE.
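The way DO_FILTER and FILTER_OTHER_3 interact can be sketched as a plain function, where each call stands in for one instance of the line module. The callbacks `passes` and `delta` are hypothetical stand-ins for the VC-1 pixel computations, which the text leaves to the specification; the edge update itself subtracts the delta from P4 and adds it to P5.

```python
def filter_line(line, processing_pixel3, filter_other_3, passes, delta):
    """Model one instance of the line module.

    line holds P1..P8 (P4 at index 3, P5 at index 4).
    passes(line) -> bool and delta(line) -> int are placeholders.
    Returns (line, filter_other_3).
    """
    if processing_pixel3:
        do_filter = passes(line)      # default True, cleared if the test fails
    else:
        do_filter = filter_other_3    # inherit the decision made on line 3
    if do_filter:
        d = delta(line)
        line = line[:3] + [line[3] - d, line[4] + d] + line[5:]
    if processing_pixel3:
        filter_other_3 = do_filter    # publish the decision to later instances
    return line, filter_other_3


# Placeholder test and delta, chosen only to exercise the control flow.
def passes(line):
    return abs(line[3] - line[4]) < 8


def delta(line):
    return 1


line3, flag = filter_line([0, 0, 0, 10, 12, 0, 0, 0], True, True, passes, delta)
line1, flag = filter_line([0, 0, 0, 10, 90, 0, 0, 0], False, flag, passes, delta)
print(line3[3:5], line1[3:5], flag)
```

In this sketch a line other than the third never runs its own test; it simply follows the flag, which matches the description of section 540 above.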

Those skilled in the art will understand that the pseudo-code of Fig. 5 can be synthesized in various ways to produce a gate arrangement that implements line acceleration logic 500. One such arrangement is illustrated in Figs. 6A to 6D, which together form a block diagram of line acceleration logic 500. Those skilled in the art should be familiar with the VC-1 inloop deblocking filter algorithm and with logic structures, so the elements of these figures are not described exhaustively. Instead, selected features of line acceleration logic 500 are described in detail.

Those skilled in the art will understand that the VC-1 inloop deblocking filter involves the following computations, where P1-P8 refer to pixel positions in the row/column being processed.

A0 = (2*(P3-P6) - 5*(P4-P5) + 4) >> 3

A1 = (2*(P1-P4) - 5*(P2-P3) + 4) >> 3

A2 = (2*(P5-P8) - 5*(P6-P7) + 4) >> 3

CLIP = (P4-P5)/2

Each of the first three computations involves 3 subtractions, 2 multiplications, 1 addition, and 1 right shift. The portion of line acceleration logic 500 in Fig. 6A computes A0, A1, and A2 sequentially using shared logic, rather than using separate dedicated logic blocks for each of A0, A1, and A2. By avoiding duplicated logic blocks and using multiplexers to process the inputs in sequence, gate count and/or power consumption is reduced.
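Because A0, A1, and A2 share one shape, a single helper can evaluate all three. The sketch below is a worked example only; the names `edge_term` and `a_values` are invented here, and Python's `>>` is used on the assumption that the hardware shift is an arithmetic (sign-preserving) shift.

```python
def edge_term(a, b, c, d):
    """(2*(a - b) - 5*(c - d) + 4) >> 3, the shape shared by A0, A1 and A2."""
    return (2 * (a - b) - 5 * (c - d) + 4) >> 3


def a_values(p):
    """p is a sequence of eight pixels, p[0] = P1 ... p[7] = P8."""
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    a0 = edge_term(p3, p6, p4, p5)
    a1 = edge_term(p1, p4, p2, p3)
    a2 = edge_term(p5, p8, p6, p7)
    return a0, a1, a2


# A flat region on each side of a step of 10 between P4 and P5.
print(a_values([10, 10, 10, 10, 20, 20, 20, 20]))
# A smooth ramp across the edge.
print(a_values([10, 12, 14, 16, 18, 20, 22, 24]))
```

For the step, only A0 is non-zero, since A1 and A2 each look at one flat side; for the ramp, all three terms are zero.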

Multiplexers 605, 610 and 620 select different inputs from pixel registers P1-P8 in different clock cycles, and these inputs are supplied to the shared logic blocks.

Logic blocks 625 and 630 each perform a subtraction. Logic block 635 implements the multiplication by 2 by performing a left shift by 1. The multiplication by 5 is performed by a left shift followed by adder 645. Adder 650 sums the output of shifter 635, the constant 4, and the negated output of adder 645. Finally, logic block 655 performs a right shift by 3.

In the first clock cycle, input T=1 is supplied to each of multiplexers 605, 610 and 615, and the value of A1 is computed and stored in register 660. In the second clock cycle, input T=2 is supplied to each of multiplexers 605, 610 and 615, and the value of A2 is computed and stored in register 665. In the third clock cycle, input T=3 is supplied to each of multiplexers 605, 610 and 615, and the value of A0 is computed and stored in register 670. The values A1, A2, and A0 stored in registers 660, 665, and 670 are used by the portion of line acceleration logic 500 in Fig. 6B, as explained below. The output of the P4 register (671) and the output of the P5 register (673) are used by the portion of line acceleration logic 500 in Fig. 6C, as explained below.

Those skilled in the art will also understand that the VC-1 inloop deblocking filter involves the following additional computations:
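The time-shared datapath can be modeled as one function that is called three times with different inputs selected by T. This is an illustrative sketch: the table `MUX_SELECT` and the function names are invented here, and the split of the multiplication by 5 into a left shift by 2 plus an addition is an assumption about how the shift-and-add stage works.

```python
# Pixel numbers routed to the two subtractors at each time step T.
MUX_SELECT = {
    1: (1, 4, 2, 3),   # T=1 -> A1 uses P1, P4, P2, P3
    2: (5, 8, 6, 7),   # T=2 -> A2 uses P5, P8, P6, P7
    3: (3, 6, 4, 5),   # T=3 -> A0 uses P3, P6, P4, P5
}


def shared_datapath(a, b, c, d):
    diff_ab = a - b                     # first subtractor
    diff_cd = c - d                     # second subtractor
    times2 = diff_ab << 1               # multiply by 2 as a left shift by 1
    times5 = (diff_cd << 2) + diff_cd   # multiply by 5 as shift-and-add (assumed)
    return (times2 + 4 - times5) >> 3   # three-input sum, then right shift by 3


def run_three_cycles(pixels):
    """pixels[0] = P1 ... pixels[7] = P8. Returns the stored (A1, A2, A0)."""
    stored = []
    for t in (1, 2, 3):
        a, b, c, d = (pixels[n - 1] for n in MUX_SELECT[t])
        stored.append(shared_datapath(a, b, c, d))
    return tuple(stored)


print(run_three_cycles([10, 10, 10, 10, 20, 20, 20, 20]))
```

One copy of the arithmetic serves all three results, at the cost of three steps instead of one, which is the trade-off the text describes.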

D = 5*((sign(A0)*A3) - A0)/8
if (CLIP > 0)
{
    if (D < 0)
        D = 0
    if (D > CLIP)
        D = CLIP
}
else
{
    if (D > 0)
        D = 0
    if (D < CLIP)
        D = CLIP
}

The portion of line acceleration logic 500 in Fig. 6B receives inputs from the portion in Fig. 6A and computes D (675). Referring again to Fig. 6A, CLIP (677) is produced as follows: logic block 679 subtracts pixels P4 and P5, and logic block 680 shifts the result right (integer division by 2) to produce CLIP 677. Returning to Fig. 6B, A1 is available from register 660 in the first cycle, A2 is available from register 665 in the second cycle, and A0 is available from register 670 in the third cycle. Thus, in the fourth cycle, the portion of line acceleration logic 500 in Fig. 6B computes D (675) according to the equation above.
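The computation of D and its clamping against CLIP can be written out as a small function. This is a sketch with two labeled assumptions: A3 is not defined in this text, so it is taken as an input rather than derived, and the division by 8 is assumed to round toward zero as in C. The helper names are invented for illustration.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trunc_div(n, d):
    """Integer division rounding toward zero (assumed, C-style); d must be > 0."""
    q = abs(n) // d
    return q if n >= 0 else -q


def compute_d(a0, a3, clip):
    """Compute D from A0 and A3, then clamp it between 0 and CLIP."""
    d = trunc_div(5 * (sign(a0) * a3 - a0), 8)
    if clip > 0:
        if d < 0:
            d = 0
        if d > clip:
            d = clip
    else:
        if d > 0:
            d = 0
        if d < clip:
            d = clip
    return d


print(compute_d(-4, 1, 3))    # D and CLIP agree in sign, D within range
print(compute_d(-4, 1, -3))   # D and CLIP disagree in sign, D forced to 0
print(compute_d(-40, 0, 2))   # D larger than CLIP, limited to CLIP
```

The clamp keeps D between 0 and CLIP, so the correction never exceeds half the step between P4 and P5 and never pushes the two pixels apart.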

Row acceleration logic circuit 500 uses D (675) to update pixel positions P4 and P5. Specifically, P4 = P4 - D and P5 = P5 + D. Although Fig. 6A and Fig. 6B were described above in terms of a single row/column (for example, a single group of pixel positions P0-P8), the operation on the third row/column of a sub-block affects the behavior of the other three rows/columns of that sub-block. Row acceleration logic circuit 500 implements this behavior in a novel way. The independent filtering operations described in connection with Fig. 6A and Fig. 6B run from start to finish in parallel, and the portion of row acceleration logic circuit 500 shown in Fig. 6C and Fig. 6D then conditionally selects which positions are updated. In other words, hardware acceleration logic 400 for the VC-1 in-loop deblocking filter decides whether the original value or the new value is written back. By contrast, conventional VC-1 in-loop deblocking filters use a loop, so that the individual filtering operations are themselves executed conditionally.

As described earlier, the pseudo-code of Fig. 4 shows row acceleration logic circuit 500 operating in a loop: instantiation section 450 appears inside iteration section 420. Each instantiation of row acceleration logic circuit 500 takes two parameters, ProcessingPixel3 and FILTER_OTHER_3. Row acceleration logic circuit 500 uses these parameters to perform the conditional update of pixels P4 and P5 as follows. Referring to Fig. 6C, register P4 is written with the result of subtracter 681, whose inputs are P4 (671) and either 0 or D (675), depending on the value of DO_FILTER (683). Similarly, register P5 is written with the result of adder 685, whose inputs are P5 (673) and either 0 or D (675), depending on the value of DO_FILTER (683). Thus the updated value of P4 is either the original P4 value (if DO_FILTER is false) or P4 - D. Likewise, the updated value of P5 is either the original P5 value (if DO_FILTER is false) or P5 + D.
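The unconditional write with a selected operand can be illustrated with a short Python sketch. The function name `update_p4_p5` is hypothetical, and any clamping of the results to the valid pixel range is left out because the passage does not describe it.

```python
def update_p4_p5(p4, p5, d, do_filter):
    # The second operand of the subtracter and the adder is either D or 0,
    # selected by DO_FILTER, so both registers are always written.
    delta = d if do_filter else 0
    new_p4 = p4 - delta  # subtracter 681
    new_p5 = p5 + delta  # adder 685
    return new_p4, new_p5
```

Selecting the operand rather than gating the write means the datapath does the same work every cycle; only the value fed to the arithmetic changes.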

Those skilled in the art will recognize that, when the third row of a sub-block is processed, the criterion for updating P4 with P4 - D is:

(ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0)

DO_FILTER 683 is calculated by the portion of row acceleration logic circuit 500 shown in Fig. 6D, which tests these conditions. Multiplexer 687 provides one input to OR gate 697, selecting a true output if ABS(A0) < PQUANT and a false output otherwise. Multiplexer 689 provides another input to OR gate 697, selecting a true output if A3 < ABS(A0) and a false output otherwise. Multiplexer 691 provides a further input to OR gate 697, selecting a true output if CLIP != 0 and a false output otherwise.

DO_FILTER 683 is provided by multiplexer 693, which uses control input Processing_Pixel_3 (695) to select between the output of OR gate 697 and input signal FILTER_OTHER_3 (699). Inputs Processing_Pixel_3 (695) and FILTER_OTHER_3 (699) were described earlier in connection with the pseudo-code of Fig. 4 for hardware acceleration logic 400 of the VC-1 in-loop deblocking filter, the higher-level logic that instantiates row acceleration logic circuit 500. Returning to Fig. 4, Processing_Pixel_3 (695) is set to true when the third row/column is processed (the first iteration) and to false otherwise. Based on the conditions on PQUANT, ABS(A0) and CLIP, the intermediate variable DO_FILTER records whether P4/P5 are updated. The value of FILTER_OTHER_3 (699) is then set from this intermediate variable DO_FILTER. The net effect of the portions of row acceleration logic circuit 500 shown in Fig. 6C and Fig. 6D is that, every four cycles, the P4 and P5 pixel positions of four adjacent rows/columns are either set to their filtered values (according to variables such as A0-A3, PQUANT and CLIP) or rewritten with their original values.
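The selection performed by multiplexer 693 and its effect on the four rows/columns of a sub-block can be sketched as below. This is a simplified model under stated assumptions, not the logic of Fig. 6C and Fig. 6D: the function names and the dictionary layout of each row are hypothetical, the values A0, A3, CLIP and D are taken as already computed, and the order in which the three remaining rows are visited is assumed.

```python
def criterion(a0, a3, pquant, clip):
    # Output of OR gate 697: the three comparisons as the text lists them.
    return abs(a0) < pquant or a3 < abs(a0) or clip != 0


def select_do_filter(processing_pixel_3, or_output, filter_other_3):
    # Multiplexer 693: the OR-gate output is used only for the third row/column.
    return or_output if processing_pixel_3 else filter_other_3


def filter_four_rows(rows, pquant):
    # rows: four dicts with keys a0, a3, clip, d, p4, p5 (hypothetical layout).
    result = [None] * 4
    filter_other_3 = False
    # The third row/column (index 2) is handled in the first iteration.
    for index in (2, 0, 1, 3):
        row = rows[index]
        or_output = criterion(row["a0"], row["a3"], pquant, row["clip"])
        do_filter = select_do_filter(index == 2, or_output, filter_other_3)
        if index == 2:
            filter_other_3 = do_filter  # remembered for the other three
        delta = row["d"] if do_filter else 0
        result[index] = (row["p4"] - delta, row["p5"] + delta)
    return result
```

In this model the decision taken for the third row/column alone determines whether the other three are written with filtered or original values, which is the behavior the passage attributes to FILTER_OTHER_3.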

VC-1 deblocking acceleration unit 400 thus combines parallel and sequential processing in a novel way, as described above. Parallel processing provides faster execution and lower latency. Although parallelization increases the gate count, the increase is offset by the sequential processing described above. Conventional approaches that do not use such sequential processing simply increase the gate count.

Some embodiments of graphics processing unit 120 include a hardware acceleration unit for H.264 deblocking, and this deblocking function is made available through graphics processing unit instructions. Graphics processing unit 120 is described in detail in connection with Fig. 8, with emphasis on the graphics processing unit instructions selected to provide H.264 deblocking acceleration.

Graphics processing unit

Rationale for multiple deblocking instructions

The instruction set of graphics processing unit 120 includes instructions that the portion of decoder 160 executing in software can use to accelerate the deblocking filter. The novel technique described here provides not one but several graphics processing unit instructions to accelerate a particular deblocking filter. In-loop deblocking filter 290 is inherently sequential: a given filter must filter pixels in a prescribed order (for example, H.264 specifies left to right and then top to bottom). Consequently, previously filtered, that is updated, pixels serve as inputs when later pixels are filtered. A host processor operates on pixel values stored in conventional memory, which allows back-to-back pixel reads and writes. This sequential nature, however, does not fit well when in-loop deblocking filter 290 uses the graphics processing unit to accelerate part of the filtering. A conventional graphics processing unit stores pixels in a texture cache, and the graphics processing unit pipeline is not designed for back-to-back reads and writes of the texture cache.

Some embodiments of graphics processing unit 120 disclosed here provide multiple graphics processing unit instructions that can be used together to accelerate a particular deblocking filter. Some of these instructions use the texture cache as the source of pixel data, and some use the graphics processing unit execution units as the data source. In-loop deblocking filter 290 uses these different instructions in appropriate combination to achieve back-to-back pixel reads and writes. What follows is an overview of the data flow through graphics processing unit 120, then a description of the deblocking acceleration instructions that graphics processing unit 120 provides, and then an explanation of how in-loop deblocking filter 290 uses these instructions.

Graphics processing unit flow

Fig. 7 is a diagram of the data flow in graphics processing unit 120, in which the instruction flow is shown by the arrows on the left of Fig. 7 and the image or graphics flow by the arrows on the right. Fig. 7 omits several elements that are known to those skilled in the art and are not needed to explain the in-loop deblocking features of graphics processing unit 120. Command stream processor 710 receives instruction 720 from the system bus (not shown), decodes the instruction, and produces command data 730, such as vertex data. Graphics processing unit 120 supports conventional graphics processing instructions as well as instructions that accelerate video encoding and/or decoding.

Conventional graphics processing instructions involve tasks such as vertex shading, geometry shading and pixel shading. Command data 730 is therefore supplied to pool 740 of shader execution units. The shader execution units use texture filter unit (TFU) 750 as needed to apply texture to pixels. Texture data is fetched from texture cache 760, which is backed by main memory (not shown).

Some instructions are directed to video accelerator unit 150, whose operation is described below. The resulting data is then processed by post-packer 770, which compresses the data. After post-processing, the data produced by the video accelerator unit is provided to execution unit pool 740.

Execution of video encoding/decoding acceleration instructions, such as the deblocking filter instructions mentioned above, differs from execution of the conventional graphics instructions in several respects. First, video acceleration instructions are executed by video accelerator unit 150 rather than by the shader execution units. Second, video acceleration instructions do not use texture data as such.

However, both the image data used by video acceleration instructions and the texture data used by graphics instructions are two-dimensional arrays. Graphics processing unit 120 takes advantage of this similarity, using texture filter unit 750 to load image data for video accelerator unit 150, which allows texture cache 760 to cache some of the image data on which video accelerator unit 150 operates. Accordingly, as shown in Fig. 7, video accelerator unit 150 is located between texture filter unit 750 and post-packer 770.

Texture filter unit 750 examines the command data 730 extracted from instruction 720. Command data 730 also gives TFU 750 the coordinates of the desired image data within texture cache 760. In one embodiment these coordinates are specified as a U, V pair, which will be familiar to those skilled in the art. When instruction 720 is a video acceleration instruction, the extracted command data also directs texture filter unit 750 to bypass the texture filters (not shown) within texture filter unit 750.

In this manner, texture filter unit 750 is used by video acceleration instructions to load image data for video accelerator unit 150. Video accelerator unit 150 receives image data from texture filter unit 750 on the data path and command data 730 on the command path, and operates on the image data according to command data 730. The image data output by video accelerator unit 150 is fed back to execution unit pool 740 after being processed by post-packer 770.

Deblocking instructions

The embodiments of graphics processing unit 120 described here provide hardware acceleration for both the VC-1 deblocking filter and the H.264 deblocking filter. The VC-1 deblocking filter is accelerated by one graphics processing unit instruction ("IDF_VC-1"), and the H.264 deblocking filter is accelerated by three graphics processing unit instructions ("IDF_H264_0", "IDF_H264_1", "IDF_H264_2").

As described earlier, each graphics processing unit instruction is decoded and parsed into command data 730, which can be viewed as a set of parameters specific to each instruction, as shown in Table 1. The IDF_H264_x instructions share some common parameters, while other parameters are specific to each instruction. Those skilled in the art will understand that these parameters can be encoded using a variety of opcodes and instruction formats, so these topics are not discussed here.

Table 1: Parameters of the IDF_H264 instructions

Parameter	Size	Operand	Description
FieldFlag (Input)	1 bit		If FieldFlag == 1, field picture; otherwise frame picture
TopFieldFlag (Input)	1 bit		If TopFieldFlag == 1, top-field picture; otherwise bottom-field picture (applies when FieldFlag is set)
PictureWidth (Input)	16 bits		For example, 1920 for HDTV
PictureHeight (Input)	16 bits		For example, 1080 for 30P HDTV
YC Flag	1 bit	Control-2	Y plane or chroma plane
Field Direction	1 bit	Control-1	
CBCR Flag	1 bit	Control-1	Cb or Cr
BaseAddress (Input)	32 bits, unsigned		For IDF_H264_0 and IDF_H264_1: base address of the sub-block in texture memory
BlockAddress (Input)	13.3 format, fractional part ignored	SRC1[0:15] = U, SRC1[31:16] = V	For IDF_H264_0: texture coordinates of the whole sub-block (relative to the base address)
			For IDF_H264_1: texture coordinates of the remaining sub-block (relative to the base address)
			Not used in IDF_H264_2
DataBlock1	4 × 4 × 8 bits	SRC2[127:0]	Not used in IDF_H264_0
			For IDF_H264_1: first or left part of the sub-block, according to FilterDirection as encoded in the Control2 parameter
			For IDF_H264_2: first (even) register pair
DataBlock2	4 × 4 × 8 bits	SRC2[255:128]	Not used in IDF_H264_0 or IDF_H264_1
			For IDF_H264_2: second (odd) register pair
Sub-block (Output)	128 bits		Deblocked 8 × 4 × 8-bit sub-block (128 bits)

Several input parameters are used in combination to determine the address of the 4 × 4 block fetched by texture filter unit 750. The BaseAddress parameter indicates the start of the texture data within the texture cache. The coordinate of the upper-left block within this region is given by the BlockAddress parameter. The PictureHeight and PictureWidth input parameters are used to determine the extent of the block, i.e., the lower-left coordinate. Finally, the video picture can be progressive or interlaced. If it is interlaced, it consists of two fields (top and bottom). Texture filter unit 750 uses FieldFlag and TopFieldFlag to handle interlaced pictures properly.
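As an illustration of how the BlockAddress operand could be unpacked, the sketch below extracts integer U and V coordinates from a 32-bit SRC1 value. It rests on several assumptions that the text does not confirm: that SRC1[0:15] denotes the low 16 bits and SRC1[31:16] the high 16 bits, that "13.3 format" means 13 integer bits followed by 3 fractional bits, and that the values are unsigned. The function name `unpack_block_address` is hypothetical.

```python
def unpack_block_address(src1):
    # Low 16 bits hold U, high 16 bits hold V (assumed bit ordering).
    u_fixed = src1 & 0xFFFF
    v_fixed = (src1 >> 16) & 0xFFFF
    # Drop the 3 fractional bits of the assumed 13.3 fixed-point format.
    return u_fixed >> 3, v_fixed >> 3


# Example: U = 5 with a non-zero fraction, V = 9 with a zero fraction.
example_src1 = ((9 << 3) << 16) | ((5 << 3) | 0b111)
u, v = unpack_block_address(example_src1)
```

Because the fractional part is ignored, any two SRC1 values that differ only in the low 3 bits of each half select the same block.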

The 8 × 4 × 8-bit deblocked output is provided in the destination register and is also written back to execution unit pool 740. Writing the deblocked output back to execution unit pool 740 is a "modify in place" operation, which is necessary in some decoder implementations, for example H.264, where pixel values in the blocks to the right and below are computed from previous results. The VC-1 decoder, however, is not constrained in this way as H.264 is. In VC-1, each 8×8 boundary is filtered (vertical first, then horizontal). All vertical edges can therefore be filtered essentially in parallel, with the 4 × 4 edges filtered afterwards. This parallelism can be exploited because only two pixels (one on each side of the edge) are updated, and these pixels are not used in computing other edges. Since the deblocked data is written back to execution unit pool 740 rather than to texture cache 760, different IDF_H264_x instructions are provided so that the sub-block can be fetched from different locations. This can be seen in Table 1, in the descriptions of the BlockAddress, DataBlock1 and DataBlock2 parameters. The IDF_H264_0 instruction fetches the whole 8 × 4 × 8-bit sub-block from texture cache 760. The IDF_H264_1 instruction fetches half of the sub-block from texture cache 760 and the other half from execution unit pool 740.

The way decoder 160 uses the IDF_H264_x instructions is described in detail in connection with Fig. 8. Described next is how texture filter unit 750 and execution unit pool 740 transform the fetched pixel data before it is supplied to video accelerator unit 150.

Transformation of image data

The instruction parameters described above give texture filter unit 750 the coordinates of the sub-block address to be fetched from texture cache 760 or from execution unit pool 740. The image data comprises luma (Y) and chroma (Cb, Cr) planes. The YC flag input parameter defines whether the Y plane or the CbCr plane is processed.

When luma (Y) data is processed, as indicated by the YC flag parameter, texture filter unit 750 fetches the sub-block and provides the 128 bits as input to hardware acceleration logic 400 for the VC-1 in-loop deblocking filter (for example, as the block input parameter of the VC-1 accelerator instance of Fig. 4). The resulting data is written to the destination registers as a register quad (that is, DST, DST+1, DST+2, DST+3).

When chroma data is processed, as indicated by the YC flag parameter, the Cb and Cr blocks are processed consecutively by hardware acceleration logic 400 for the VC-1 in-loop deblocking filter. The resulting data is written to texture cache 760. In some embodiments this write occurs in each cycle, with 256 bits written per cycle.

Some embodiments of the video accelerator unit use an interleaved CbCr plane, with each component stored at half width and half height. In these embodiments, texture filter unit 750 de-interleaves the CbCr sub-block data for video accelerator unit 150 into a buffer used for communication between texture filter unit 750 and video accelerator unit 150. Specifically, texture filter unit 750 writes two 4 × 4 Cb blocks into this buffer and then writes two 4 × 4 Cr blocks into this buffer. The 8 × 4 Cb block is processed first by hardware acceleration logic 400 for the VC-1 in-loop deblocking filter, and the resulting data is written to texture cache 760. The 8 × 4 Cr block is then processed by hardware acceleration logic 400, and the resulting data is written to texture cache 760. Video accelerator unit 150 uses the CbCr flag parameter to manage this sequential processing.
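The de-interleaving step can be pictured with a small Python sketch. It assumes that each row of the interleaved plane alternates Cb and Cr samples with Cb first, which the text does not specify, and the function name `deinterleave_cbcr` is hypothetical.

```python
def deinterleave_cbcr(rows):
    # rows: list of rows, each a list of samples laid out as Cb, Cr, Cb, Cr, ...
    # (the Cb-first ordering is an assumption).
    cb_block = [row[0::2] for row in rows]
    cr_block = [row[1::2] for row in rows]
    return cb_block, cr_block


# A 2-row example with four Cb/Cr pairs per row.
interleaved = [
    [10, 200, 11, 201, 12, 202, 13, 203],
    [20, 210, 21, 211, 22, 212, 23, 213],
]
cb, cr = deinterleave_cbcr(interleaved)
```

After the split, the Cb block and the Cr block can each be handed to the filter in turn, which matches the sequential Cb-then-Cr processing described above.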

Use of the deblocking instructions by the software decoder

As explained earlier in connection with Fig. 1, decoder 160 executes on host processor 110 but also makes use of the video acceleration instructions provided by graphics processing unit 120. In particular, embodiments of H.264 in-loop deblocking filter 290 use particular combinations of IDF_H264_x instructions to process edges in the order prescribed by H.264, fetching some sub-blocks from texture cache 760 and others from execution unit pool 740. Used in appropriate combination, these IDF_H264_x instructions achieve back-to-back pixel reads and writes.

Fig. 8 is a block diagram of a 16 × 16 macroblock as used in H.264. The macroblock is divided into sixteen 4×4 sub-blocks, each of which undergoes deblocking. The 4×4 sub-blocks in Fig. 8 can be identified by row and column (for example, R1, C2). H.264 specifies that vertical edges are processed first and horizontal edges afterwards, in the edge order (a-h) shown in Fig. 8.

The deblocking filter is therefore applied to the edge between a pair of sub-blocks, and the sub-block pairs are filtered in this order:

edge a = [block to left of R1,C1]|[R1,C1]; [block to left of R2,C1]|[R2,C1];
         [block to left of R3,C1]|[R3,C1]; [block to left of R4,C1]|[R4,C1]
edge b = [R1,C1]|[R1,C2]; [R2,C1]|[R2,C2];
         [R3,C1]|[R3,C2]; [R4,C1]|[R4,C2]
edge c = [R1,C2]|[R1,C3]; [R2,C2]|[R2,C3];
         [R3,C2]|[R3,C3]; [R4,C2]|[R4,C3]
edge d = [R1,C3]|[R1,C4]; [R2,C3]|[R2,C4];
         [R3,C3]|[R3,C4]; [R4,C3]|[R4,C4]
edge e = [block to top of R1,C1]|[R1,C1]; [block to top of R1,C2]|[R1,C2];
         [block to top of R1,C3]|[R1,C3]; [block to top of R1,C4]|[R1,C4]
edge f = [R1,C1]|[R2,C1]; [R1,C2]|[R2,C2];
         [R1,C3]|[R2,C3]; [R1,C4]|[R2,C4]
edge g = [R2,C1]|[R3,C1]; [R2,C2]|[R3,C2];
         [R2,C3]|[R3,C3]; [R2,C4]|[R3,C4]
edge h = [R3,C1]|[R4,C1]; [R3,C2]|[R4,C2];
         [R3,C3]|[R4,C3]; [R3,C4]|[R4,C4]
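The edge ordering listed above follows a regular pattern, which the following Python sketch reproduces. The function name `edge_pairs` and the string labels are hypothetical and are used only to make the pattern explicit; the sketch assumes that each vertical edge pairs a sub-block with its left neighbor in the same row and each horizontal edge pairs a sub-block with the neighbor above it in the same column.

```python
def edge_pairs():
    names = "abcdefgh"
    edges = {}
    # Vertical edges a-d: one per column, pairing each sub-block with the
    # sub-block to its left in the same row.
    for col in range(1, 5):
        pairs = []
        for row in range(1, 5):
            if col == 1:
                left = f"left of R{row},C1"
            else:
                left = f"R{row},C{col - 1}"
            pairs.append((left, f"R{row},C{col}"))
        edges[names[col - 1]] = pairs
    # Horizontal edges e-h: one per row, pairing each sub-block with the
    # sub-block above it in the same column.
    for row in range(1, 5):
        pairs = []
        for col in range(1, 5):
            if row == 1:
                top = f"top of R1,C{col}"
            else:
                top = f"R{row - 1},C{col}"
            pairs.append((top, f"R{row},C{col}"))
        edges[names[4 + row - 1]] = pairs
    return edges
```

Iterating over the returned dictionary in insertion order yields edges a through h, that is, all vertical edges followed by all horizontal edges.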

For the first pair of sub-blocks, everything is loaded from texture cache 760, because no pixels have yet been changed by applying the filter. Although the filter for the first vertical edge (a) can change pixel values in (R1, C1), the vertical edge in the second row shares no pixels with the vertical edge in the first row. Therefore the second pair of sub-blocks (edge b) is also loaded from texture cache 760. Since the vertical edges in two adjacent rows do not share pixels, the same holds for the third pair (edge c) and the fourth pair (edge d) of sub-blocks.

The particular IDF_H264_x instruction issued by in-loop deblocking filter 290 determines the location from which pixel data is loaded. The sequence of IDF_H264_x instructions used by in-loop deblocking filter 290 to process the first group of vertical edges (a-d) is:

IDF_H264_0 SRC1 = address of (R1,C1);
IDF_H264_0 SRC1 = address of (R2,C1);
IDF_H264_0 SRC1 = address of (R3,C1);
IDF_H264_0 SRC1 = address of (R4,C1);

Next, in-loop deblocking filter 290 processes the second vertical edge (b), starting with (R1, C2). The leftmost four pixels of the 8 × 4 sub-block defined at (R1, C2) overlap the rightmost pixels of sub-block (R1, C1). These overlapping pixels were processed, and possibly updated, by the vertical edge filter for (R1, C1), and are therefore read from execution unit pool 740 rather than from texture cache 760. The rightmost four pixels of sub-block (R1, C2), however, have not yet been filtered and are therefore read from texture cache 760. The same applies to sub-blocks (R2, C2) through (R4, C2). In-loop deblocking filter 290 achieves this result by issuing the following sequence of IDF_H264_x instructions to process the second group of vertical edges:

IDF_H264_1 SRC1 = address of (R1,C2);
IDF_H264_1 SRC1 = address of (R2,C2);
IDF_H264_1 SRC1 = address of (R3,C2);
IDF_H264_1 SRC1 = address of (R4,C2);

Processing of the third group of vertical edges starts with (R1, C3). The leftmost four pixels of the 8×4 block at (R1, C3) overlap the rightmost pixels of sub-block (R1, C2) and are therefore read from execution unit pool 740 rather than from texture cache 760.

The rightmost four pixels of sub-block (R1, C3), however, have not yet been filtered and are therefore read from texture cache 760. The same applies to sub-blocks (R2, C3) through (R4, C3). A similar situation arises for the last group of vertical edges. In-loop deblocking filter 290 therefore processes the remaining two groups of vertical edges by issuing the following sequence of IDF_H264_x instructions:

IDF_H264_1 SRC1 = address of (R1,C3);
IDF_H264_1 SRC1 = address of (R2,C3);
IDF_H264_1 SRC1 = address of (R3,C3);
IDF_H264_1 SRC1 = address of (R4,C3);
IDF_H264_1 SRC1 = address of (R1,C4);
IDF_H264_1 SRC1 = address of (R2,C4);
IDF_H264_1 SRC1 = address of (R3,C4);
IDF_H264_1 SRC1 = address of (R4,C4);

Next, the horizontal edges (e-h) are processed. By this point the deblocking filter has been applied to every sub-block in the macroblock, so every pixel may have been updated. Each sub-block submitted for horizontal edge filtering is therefore read from execution unit pool 740 rather than from texture cache 760. In-loop deblocking filter 290 thus issues the following sequence of IDF_H264_x instructions to process the horizontal edges:

IDF_H264_2 SRC1 = address of (R1,C1);
IDF_H264_2 SRC1 = address of (R2,C1);
IDF_H264_2 SRC1 = address of (R3,C1);
IDF_H264_2 SRC1 = address of (R4,C1);
IDF_H264_2 SRC1 = address of (R1,C2);
IDF_H264_2 SRC1 = address of (R2,C2);
IDF_H264_2 SRC1 = address of (R3,C2);
IDF_H264_2 SRC1 = address of (R4,C2);
IDF_H264_2 SRC1 = address of (R1,C3);
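The choice of instruction per sub-block in the listings above can be summarized in a short Python sketch. The function name `instruction_sequence` and the tuple representation are hypothetical. The sketch copies the traversal order of the listings (rows R1-R4 within each column) and assumes that the horizontal-edge listing, which ends at (R1,C3) above, continues in the same pattern for the remaining sub-blocks.

```python
def instruction_sequence():
    sequence = []
    # Vertical edges: column 1 is read entirely from the texture cache
    # (IDF_H264_0); columns 2-4 take half of each sub-block from the
    # execution unit pool (IDF_H264_1).
    for col in range(1, 5):
        opcode = "IDF_H264_0" if col == 1 else "IDF_H264_1"
        for row in range(1, 5):
            sequence.append((opcode, row, col))
    # Horizontal edges: every sub-block may already have been updated,
    # so all data comes from the execution unit pool (IDF_H264_2).
    for col in range(1, 5):
        for row in range(1, 5):
            sequence.append(("IDF_H264_2", row, col))
    return sequence


# Print the sequence in the same form as the listings.
for opcode, row, col in instruction_sequence():
    print(f"{opcode} SRC1 = address of (R{row},C{col});")
```

Under these assumptions one macroblock takes 4 IDF_H264_0, 12 IDF_H264_1 and 16 IDF_H264_2 instructions.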

Any blocks in the program descriptions or flow charts should be understood as representing modules, segments or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Those skilled in the software art will recognize that alternative implementations are also included within the scope of the disclosure. In these alternative implementations, functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

The systems and methods disclosed here can be implemented in software, hardware, or a combination of the two. In some embodiments, the system and/or method is implemented in software that is stored in memory and executed by a suitable processor located in a computing device (including but not limited to a microprocessor, microcontroller, network processor, reconfigurable processor, or extensible processor). In other embodiments, the system and/or method is implemented in logic circuitry, including but not limited to a programmable logic device (PLD), programmable gate array (PGA), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC). In still other embodiments, this logic is implemented in a graphics processor or graphics processing unit (GPU).

The systems and methods disclosed here can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device. Such an instruction execution system includes any computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system and execute them. In this disclosure, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transport the program for use by, or in connection with, the instruction execution system. The computer-readable medium can be, for example but not limited to, a system or propagation medium based on electronic, magnetic, optical, electromagnetic, infrared or semiconductor technology.

Specific (non-limiting) examples of a computer-readable medium using electronic technology include: an electrical (electronic) connection having one or more wires; a random access memory (RAM); a read-only memory (ROM); and an erasable programmable read-only memory (EPROM or flash memory). Specific (non-limiting) examples of a computer-readable medium using magnetic technology include a portable computer diskette. Specific (non-limiting) examples of a computer-readable medium using optical technology include an optical fiber and a portable compact disc read-only memory (CD-ROM).

Although the invention is illustrated and described here as embodied in one or more specific examples, it should not be limited to the details shown, since many modifications and structural changes can be made without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, the appended claims should be construed broadly and in a manner consistent with the scope of the invention, as set forth in the claims that follow.

Claims

1. A deblocking filter for video decoding, comprising:

a first logic circuit configured to determine whether pixels of a predefined pixel group among a plurality of pixel groups meet a criterion;

a second logic circuit configured to filter the pixels of the predefined pixel group first when the criterion is met; and

a third logic circuit configured, when the criterion is met, to filter each remaining pixel group in the plurality of pixel groups in sequence, according to a corresponding set of taps in a plurality of sets of taps.

2. The deblocking filter according to claim 1, wherein the plurality of pixel groups forms a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.

3. The deblocking filter according to claim 1, wherein the plurality of pixel groups forms a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.

4. The deblocking filter according to claim 1, wherein the third logic circuit further comprises:

a fourth logic circuit configured to update a first predefined pixel group in each remaining pixel group according to a second predefined pixel group in each remaining pixel group.

5. The deblocking filter according to claim 1, wherein the second logic circuit further comprises:

a fifth logic circuit configured to filter the pixels of the predefined pixel group first, in parallel, when the criterion is met.

6. The deblocking filter according to claim 1, wherein the deblocking filter is applied to the edges of pairs of sub-blocks to remove edge artifacts.

7. The deblocking filter according to claim 1, wherein the deblocking filter uses a plurality of graphics processing instructions in appropriate combination to achieve back-to-back pixel reads and writes.

8. The deblocking filter according to claim 1, wherein the deblocking filter is defined according to the VC-1 standard.

9. A video decoder, comprising:

an entropy decoder that receives an incoming coded bit stream;

a spatial decoder that receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels;

a first logic circuit configured to combine a current picture with a prediction picture to produce a combined picture; and an in-loop deblocking filter that receives the combined picture, the in-loop deblocking filter comprising:

a second logic circuit configured to filter a predefined pixel group; and

a third logic circuit configured, when the predefined pixel group meets a criterion, to filter each remaining pixel group in the plurality of pixel groups according to a corresponding set of taps in a plurality of sets of taps.

10. The video decoder according to claim 9, wherein the plurality of pixel groups forms a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.

11. The video decoder according to claim 9, wherein the plurality of pixel groups forms a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.

12. The video decoder according to claim 9, wherein the second logic circuit further comprises:

a fourth logic circuit configured to filter the pixels of the predefined pixel group first, in parallel, when the criterion is met.

13. The video decoder according to claim 9, wherein the third logic circuit further comprises:

a fifth logic circuit configured to update a first predefined pixel group in each remaining pixel group according to a second predefined pixel group in each remaining pixel group.

14. The video decoder according to claim 9, wherein the deblocking filter is defined according to the VC-1 standard.

15. A graphics processing unit, comprising:

a host processor interface that receives at least one video acceleration instruction; and

a video accelerator unit responsive to the at least one video acceleration instruction, the video accelerator unit comprising an in-loop deblocking filter, the in-loop deblocking filter comprising:

a first logic circuit configured to determine whether pixels of a predefined pixel group among a plurality of pixel groups meet a first criterion;

a second logic circuit configured to filter the pixels of the predefined pixel group first when the first criterion is met; and

a third logic circuit configured, when the first criterion is met, to filter the remaining pixel groups in the plurality of pixel groups in sequence, according to a corresponding set of taps in a plurality of sets of taps.

16. The graphics processing unit according to claim 15, wherein the plurality of pixel groups forms a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.

17. The graphics processing unit according to claim 15, wherein the plurality of pixel groups forms a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.

18. The video decoder according to claim 15, wherein the third logic circuit further comprises:

19. The video decoder according to claim 15, wherein the deblocking filter is defined according to the VC-1 standard.