CN101072351B - Systems and methods of video compression deblocking - Google Patents

Publication number
CN101072351B
Authority
CN
China
Prior art keywords
pixel
square
filter
logical circuit
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007101103594A
Other languages
Chinese (zh)
Other versions
CN101072351A (en
Inventor
扎伊尔德·荷圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Publication of CN101072351A publication Critical patent/CN101072351A/en
Application granted granted Critical
Publication of CN101072351B publication Critical patent/CN101072351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)

Abstract

An exemplary video decoder comprises: an entropy decoder; a spatial decoder; combining logic; and an in-loop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The in-loop deblocking filter receives the combined picture. The in-loop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to filter each of the remaining pixel groups in the plurality after the predefined pixel group, according to a corresponding set of taps in a plurality of sets of taps, if the predefined pixel group meets a criterion.

Description

Deblocking filter, video decoder, and GPU
Technical field
The invention relates to image compression and decompression, and in particular to a GPU having image compression and decompression features.
Background technology
Personal computers and consumer electronics are used for a variety of entertainment. These entertainment uses can be roughly divided into two types: those that use computer-generated graphics, such as computer games; and those that use compressed video streams, such as programs pre-recorded on digital video disc (DVD), or digital programming provided to a set-top box by a cable or satellite provider. The second type also includes the encoding of analog video streams, such as that performed by a digital video recorder (DVR).
Computer-generated graphics are typically produced by a graphics processing unit (GPU). A GPU is a specialized microprocessor found on computer game consoles and some personal computers. A GPU is optimized to rapidly render three-dimensional primitive objects such as triangles, quadrilaterals, etc. These primitives are described by a number of vertices, where each vertex has attributes (e.g., color), and textures can be applied to the primitives. The result of rendering is a two-dimensional array of pixels, which appears on a computer display or monitor.
Encoding and decoding of video streams involves different kinds of computation, for example discrete cosine transform, motion estimation, motion compensation, and deblocking filtering. These computations are typically handled by a general-purpose central processing unit (CPU) in combination with specialized hardware logic such as an application-specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. A single computing platform that handles both computer graphics and video encoding/decoding is therefore needed.
Summary of the invention
Embodiments disclosed herein provide systems and methods of video compression deblocking. An exemplary deblocking filter for video decoding comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a criterion; logic configured to filter the pixels of the predefined pixel group first when the criterion is met; and logic configured to, when the criterion is met, sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps; wherein the criterion is determined by a predefined set of calculations and comparisons, and the predefined calculations and comparisons constitute a set of taps.
An exemplary video decoder comprises an entropy decoder, a spatial decoder, combining logic, and an in-loop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The in-loop deblocking filter receives the combined picture. The in-loop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to, when the predefined pixel group meets a criterion, filter each of the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps; wherein the criterion is determined by a predefined set of calculations and comparisons, and the predefined calculations and comparisons constitute a set of taps.
An exemplary graphics processing unit comprises a host processor interface and a video acceleration unit. The host processor interface receives at least one video acceleration instruction. The video acceleration unit responds to the at least one video acceleration instruction. The video acceleration unit comprises an in-loop deblocking filter. The in-loop deblocking filter comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a first criterion; logic configured to filter the pixels of the predefined pixel group first when the first criterion is met; and logic configured to, when the first criterion is met, sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps; wherein the first criterion is determined by a predefined set of calculations and comparisons, and the predefined calculations and comparisons constitute a set of taps.
Description of drawings
Fig. 1 is a block diagram of an exemplary computing platform for graphics and video encoding and/or decoding.
Fig. 2 is a block diagram of the video decoder 160 of Fig. 1.
Fig. 3 illustrates the sub-block pixel arrangement for the VC-1 filter.
Fig. 4 is a listing of hardware description pseudo-code for the VC-1 in-loop deblocking filter hardware acceleration logic circuit 400 of Fig. 1.
Fig. 5 is a listing of hardware description language code for the line acceleration logic circuit 500 of Fig. 4.
Figs. 6A to 6D form a block diagram of the line acceleration logic circuit of Figs. 4 and 5.
Fig. 7 is a data flow diagram of the GPU 120 of Fig. 1.
Fig. 8 is a block diagram of the 16x16 macroblock used in H.264.
[Description of main reference numerals]
100~system, 110~general-purpose CPU, 120~graphics processor (GPU), 130~memory, 140~bus, 150~video acceleration unit (VPU), 160~software decoder, 170~video acceleration driver.
205~incoming bit stream, 210~entropy decoder, 215~spatial decoder, 220~inverse quantizer, 230~inverse discrete cosine transform, 235~picture, 245~motion vector, 250~motion compensation, 255~previously decoded picture, 265~prediction picture, 270~spatial compensation, 280~combiner (adder), 290~deblocking filter, 295~decoded picture.
310-320~two adjacent 4x4 sub-blocks, 330~vertical boundary.
400~in-loop deblocking filter hardware acceleration logic circuit, 410~module definition section, 420~iteration loop section, 430~test Vertical parameter section, 440~compare loop parameter with 3 section, 450~instantiation section.
500~line acceleration logic circuit, 510~module definition section, 520~pixel value calculation section, 530~compare loop parameter with 3 section, 540~test DO_FILTER section, 550~update state section.
605-610-615-620~multiplexer, 625-630-679~subtracter, 635-640-655-680~logic block, 645-650~adder, 660-665-670~buffer, 671~output of P4 buffer, 673~output of P5 buffer, 681~subtracter, 685~adder, 687-689-691-693~multiplexer, 697~OR gate.
710~command stream processor, 720~instruction, 730~instruction data, 740~execution unit pool, 750~texture filter unit, 760~texture cache, 770~post-packer.
Embodiment
Computing platform for video encoding and decoding
Fig. 1 is a block diagram of an exemplary computing platform for graphics and video encoding and/or decoding. System 100 comprises a general-purpose CPU 110 (hereinafter the host processor), a graphics processor (GPU) 120, memory 130, and a bus 140. GPU 120 includes a video acceleration unit (VPU) 150 that accelerates video encoding and/or decoding, as described later. The video acceleration functions of GPU 120 are available as instructions that execute on GPU 120.
Software decoder 160 and video acceleration driver 170 reside in memory 130, and at least a portion of decoder 160 and video acceleration driver 170 executes on host processor 110. Through a host processor interface 180 provided by video acceleration driver 170, decoder 160 can also issue video acceleration instructions to GPU 120. In this way, system 100 performs video encoding and/or decoding through host processor software that issues video acceleration instructions to GPU 120, and GPU 120 responds to these instructions by accelerating portions of decoder 160.
In some embodiments, only a small portion of decoder 160 executes on the host processor, while most of decoder 160 is executed by GPU 120 with very little driver overhead. In this way, computationally intensive blocks that are executed frequently are offloaded to GPU 120, while more complex operations are performed by host processor 110. In some embodiments, one computationally intensive function implemented by GPU 120 is the in-loop deblocking filter hardware acceleration logic 400, also referred to as in-loop deblocking filter 400 or deblocking filter 400, which is described later in connection with Fig. 4. Another example of a computationally intensive function is determining the boundary strength (BS) of each filter.
This architecture thus allows flexibility: decoder 160 can run on host processor 110 while certain functions (e.g., deblocking or computing boundary strength) are carried out by shader programs operating on macroblocks; or most of decoder 160 can run on GPU 120, taking advantage of pipelining and parallelism. In some embodiments in which decoder 160 runs on GPU 120, the deblocking process is a thread that is synchronized with the other parts of decoder 160.
Several conventional components that are well known to those skilled in the art, and that are not necessary to explain the video acceleration features of GPU 120, are omitted from Fig. 1.
Video Decoder
Fig. 2 is a block diagram of the video decoder 160 of Fig. 1. In the particular embodiment illustrated in Fig. 2, decoder 160 implements the ITU H.264 video compression standard. However, those skilled in the art should recognize that the decoder 160 of Fig. 2 is a basic representation of a video decoder that also illustrates the operation of other decoder types similar to H.264, such as SMPTE VC-1 and MPEG-2. Furthermore, although shown as part of GPU 120, those skilled in the art should also appreciate that portions of the decoder 160 disclosed herein can be implemented outside a GPU, for example as standalone logic, as part of an application-specific integrated circuit (ASIC), etc.
The incoming bit stream 205 is first processed by entropy decoder 210. Entropy coding takes advantage of statistical redundancy: some patterns occur more often than others, so the frequently occurring ones are represented by shorter codes. Entropy coding includes Huffman coding and run-length encoding. After entropy decoding, the data is processed by spatial decoder 215, which takes advantage of the fact that neighboring pixels in a picture are usually identical or related, so that only the differences need to be encoded. In this example embodiment, spatial decoder 215 comprises an inverse quantizer 220 and an inverse discrete cosine transform (IDCT) function 230. The output of IDCT function 230 can be viewed as a picture (235) made up of a number of pixels.
Picture 235 is processed in smaller sub-units called macroblocks. The H.264 video compression standard uses a macroblock size of 16x16 pixels; other compression standards may use other sizes. Macroblocks in picture 235 are combined either with information from previously decoded pictures, a process called inter prediction, or with information from other macroblocks of picture 235, a process called intra prediction. The incoming bit stream 205 is decoded by entropy decoder 210, and inter or intra prediction is used depending on the type of each picture.
When inter prediction is used, entropy decoder 210 produces motion vectors 245 as output. Motion vectors 245 are used for temporal encoding, which takes advantage of the fact that many pixels usually keep the same value across a series of pictures. The change from one picture to the next is encoded as motion vectors 245. Motion compensation block 250 combines motion vectors 245 with one or more previously decoded pictures 255 to produce a prediction picture (265). When intra prediction is used, spatial compensation block 270 combines information from neighboring macroblocks with a macroblock in picture 235 to produce a prediction picture (275).
Combiner 280 adds picture 235 to the output of mode selector 285. Mode selector 285 uses information in the entropy-decoded bit stream to determine whether combiner 280 uses the prediction picture (265) produced by motion compensation block 250 or the prediction picture (275) produced by spatial compensation block 270.
The encoding process introduces artifacts such as discontinuities along macroblock edges and along sub-block edges within a macroblock. As a result, "edges" appear in the decoded frame where none existed originally. Deblocking filter 290 is applied to the combined picture output by combiner 280 to remove these edge artifacts. The decoded picture 295 produced by deblocking filter 290 is stored for use in decoding subsequent pictures.
As discussed in connection with Fig. 1, portions of decoder 160 execute on host processor 110, and decoder 160 also takes advantage of video acceleration instructions provided by GPU 120. In particular, in some embodiments deblocking filter 290 uses one or more instructions provided by GPU 120 to implement filtering at relatively low computational cost.
Deblocking filter
Deblocking filter 290 is a multi-tap filter that adjusts pixel values at sub-block edges based on neighboring pixel values. Different embodiments of deblocking filter 290 can be used depending on the compression standard implemented by decoder 160. Each standard uses different filter parameters, for example the size of the sub-block, the number of pixels updated by the filtering operation, and how often the filter is applied (e.g., every N rows or every M columns). In addition, each standard uses a different configuration of filter taps. Multi-tap filters should be understood by those skilled in the art, so specific tap configurations are not discussed here. An embodiment of the deblocking filter specified by the VC-1 standard is described in connection with Fig. 4. First, the sub-block pixel arrangement for the VC-1 filter is described in connection with Fig. 3.
Fig. 3 shows two adjacent 4x4 sub-blocks (310, 320), identified by rows R1-R4 and columns C1-C8. The vertical boundary 330 between the two sub-blocks runs along columns C4 and C5. The VC-1 filter operates on each 4x4 sub-block. For the left sub-block, the VC-1 filter examines a predefined group of pixels (P1, P2, P3) in a predefined row (R3). If the predefined group of pixels meets a specific criterion, another pixel, P4, in the same predefined row is updated. The criterion is determined by a specific set of calculations and comparisons on the pixels in the predefined group. Those skilled in the art will understand that these calculations and comparisons can also be viewed as a set of taps; the detailed calculations and comparisons are discussed later in connection with Fig. 5. The updated value is likewise based on calculations performed on the pixels in the predefined group. The VC-1 filter handles the right sub-block in an analogous manner, determining whether pixels P6, P7, P8 meet the criterion and updating P5 if they do. In other words, the VC-1 filter computes the values of one predefined group of pixels in the predefined row (R3), namely the edge pixels P4 and P5, from the values of another predefined group of pixels in the same row: the value of P4 depends on P1, P2, P3, and the value of P5 depends on P6, P7, P8.
The VC-1 filter conditionally updates the same predefined pixels in the remaining rows, depending on the values computed for the predefined pixel group (edge pixels P4, P5) of the predefined row (R3). Thus P4 in R1 is updated based on P1, P2, P3 in R1, but only if P4 and P5 in R3 were updated. Likewise, P5 in R1 is updated based on P6, P7, P8 in R1, but only if P4 and P5 in R3 were updated. The second and fourth rows are handled in a similar manner.
Viewed another way, some pixels in the predefined third row are filtered, or updated, when other pixels in the third row meet a criterion. The filtering involves performing comparisons and calculations on these other pixels. If the other pixels in the third row do meet the criterion, the corresponding pixels in the remaining rows are filtered in an analogous manner, as described above. Some embodiments of the deblocking filter 290 disclosed herein use an inventive technique in which the third row is filtered first and the other rows are filtered afterwards. These inventive techniques are explained in more detail in connection with Figs. 4, 5, and 6A-6D.
Although Fig. 3 illustrates row-wise processing of a vertical edge, those skilled in the art should understand that the same figure, rotated 90 degrees, also describes column-wise processing of a horizontal edge. Those skilled in the art will also recognize that, although VC-1 uses the third of four rows as the predefined row that determines the conditional update of the other rows, the principles disclosed herein also apply to embodiments that use another predefined row (e.g., the first row, the second row, etc.), and to embodiments in which a sub-block has a different number of rows. Likewise, although VC-1 examines the values of a group of neighboring pixels to set the value of the pixel to be updated, the principles disclosed herein also apply to embodiments in which other pixels are examined and other pixels are set. As one example, P2 and P3 could be examined to determine the updated value of P4. As another example, P3 could be set according to the values of P2 and P4.
Video acceleration unit 150 in GPU 120 implements hardware acceleration logic for an in-loop deblocking filter (IDF), for example the in-loop deblocking filter specified by the VC-1 standard. GPU instructions expose this hardware acceleration logic, as described later. The conventional way to implement the VC-1 in-loop deblocking filter is to process every row/column in parallel, because the same pixel calculations are performed on each row/column of a sub-block. This conventional approach filters two adjacent 4x4 sub-blocks every cycle, but requires an increased gate count. In contrast, the inventive approach used by VC-1 in-loop deblocking filter hardware acceleration logic 400 processes the pixels of the third row/column first and, if these pixels meet the required criterion, then processes the remaining three rows/columns sequentially. This inventive approach uses fewer logic gates than the conventional approach, which replicates the functionality for each row/column. Because it processes rows sequentially, VC-1 in-loop deblocking filter acceleration logic 400 takes several cycles to filter two adjacent 4x4 sub-blocks. This longer filtering time is consistent with the instruction cycle of GPU 120, whereas the faster filtering of the conventional approach is in fact faster than necessary and wastes gates.
Fig. 4 is a listing of hardware description pseudo-code for VC-1 in-loop deblocking filter hardware acceleration logic 400. Although pseudo-code is used rather than an actual hardware description language (HDL) such as Verilog or VHDL, those skilled in the art should be familiar with such pseudo-code. They should understand that, when expressed in an actual HDL, the code is compiled and then synthesized into an arrangement of logic gates that forms part of video acceleration unit 150. They should also recognize that these gates can be implemented in various technologies, for example an application-specific integrated circuit (ASIC), a programmable gate array (PGA), or a field-programmable gate array (FPGA).
Section 410 of the code is a module definition. VC-1 in-loop deblocking filter hardware acceleration logic 400 has several input parameters. The sub-block to be filtered is specified by the Block parameter. If the Vertical parameter is True, acceleration logic 400 treats the Block parameter as a 4x8 block (see Fig. 3) and performs vertical edge filtering. If the Vertical parameter is False, acceleration logic 400 treats the Block parameter as an 8x4 block (see Fig. 3) and performs horizontal edge filtering.
Section 420 of the code begins an iteration loop that sets the value of the loop parameter variable. The first time through the loop, the loop parameter is set to 3, so the third line is processed first. Subsequent loop iterations set the loop parameter to 1, 2, and 4. With these parameters, VC-1 in-loop deblocking filter hardware acceleration logic 400 iterates four times, processing a line of 8 pixels each time, where a line can be a horizontal row or a vertical column, and each line is processed by line acceleration logic 500 (see Fig. 5). In some embodiments, line acceleration logic 500 is implemented as an HDL submodule, described in connection with Fig. 5.
Section 430 tests the Vertical parameter to determine whether to perform horizontal or vertical edge filtering. Depending on the result, the 8 elements of the Line array variable are initialized from a row of the 4x8 input block or from a column of the 8x4 input block.
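The line selection performed by section 430 can be sketched in software. This is only an illustrative Python model, not the pseudo-code of Fig. 4: the function name `extract_line`, the list-of-rows block layout, and the 1-based `index` are assumptions introduced here.

```python
def extract_line(block, vertical, index):
    """Return the 8 pixels that cross the edge for one line of the block.

    block    -- list of rows; assumed to be 4 rows x 8 columns when vertical
                is True and 8 rows x 4 columns when vertical is False
    vertical -- True to filter a vertical edge, False for a horizontal edge
    index    -- 1-based line number, 1..4 (hypothetical convention)
    """
    if vertical:
        # Vertical edge: a line is one horizontal row of the 4x8 block.
        return list(block[index - 1])
    # Horizontal edge: a line is one vertical column of the 8x4 block.
    return [row[index - 1] for row in block]
```

In both cases the result is an 8-element line whose fourth and fifth entries are the edge pixels P4 and P5, so the same line filter can serve both edge directions.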
Section 440 compares the loop parameter with 3 to determine whether the third line is being processed. If the loop parameter is 3, two control variables, ProcessingPixel3 and FILTER_OTHER_3, are set to true. If the loop parameter is not 3, ProcessingPixel3 is set to false.
Section 450 instantiates another HDL module, VC1_IDC_Filter_Line, which applies the filter to the current line. (As described in connection with Fig. 3, this line filter updates edge pixel values based on neighboring pixel values.) The parameters supplied to this submodule include the control variables ProcessingPixel3 and FILTER_OTHER_3 and the loop parameter. In one embodiment, VC-1 in-loop deblocking filter hardware acceleration logic 400 has an additional input parameter, a quantization value, and this quantization parameter is also supplied to the submodule.
After the submodule has processed the line, VC-1 in-loop deblocking filter hardware acceleration logic 400 continues the iteration loop at section 420 with the updated value of the loop parameter. In this way, the filter is applied to the third line of the input block first, then to the first, second, and fourth lines.
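The iteration order and the two control variables described for Fig. 4 can be modelled as follows. This is a minimal Python sketch under stated assumptions, not the HDL itself: `filter_block`, `loop_index`, and the `filter_line` callback signature are hypothetical names introduced for illustration.

```python
def filter_block(lines, filter_line):
    """Filter four 8-pixel lines in the order 3, 1, 2, 4.

    lines       -- list of four lists of 8 pixels (line 1 is lines[0])
    filter_line -- hypothetical callable taking
                   (line, processing_pixel3, filter_other_3) and returning
                   (new_line, filter_other_3)
    """
    filter_other_3 = True
    for loop_index in (3, 1, 2, 4):
        # Only the first iteration handles the third line.
        processing_pixel3 = (loop_index == 3)
        if processing_pixel3:
            filter_other_3 = True
        new_line, filter_other_3 = filter_line(
            lines[loop_index - 1], processing_pixel3, filter_other_3)
        lines[loop_index - 1] = new_line
    return lines
```

The value of `filter_other_3` returned by the third-line call is what the later calls for lines 1, 2, and 4 receive, which is how the decision made on the third line reaches the other three.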
Fig. 5 is a listing of hardware description language code for line acceleration logic 500, which implements the submodule described above. Section 510 of the code is a module definition. Line acceleration logic 500 has several input parameters. The line to be filtered is given by the Line input parameter. ProcessingPixel3 is an input parameter that the higher-level logic sets to true if this line is the third row or the third column. The parameter FILTER_OTHER_3 is set to true by the higher-level logic and is adjusted by line acceleration logic 500 according to the pixel values.
Section 520 performs the various pixel value calculations specified by VC-1. (Because these calculations can be understood by reference to the VC-1 standard, they are not described in detail.) Section 530 tests the ProcessingPixel3 parameter supplied by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400. If ProcessingPixel3 is true, section 530 initializes the control variable DO_FILTER to its default value of true. Various intermediate results of the calculations in section 520 are then used to determine whether the other 3 lines should also be processed. If the pixel calculations indicate that the other 3 lines should not be processed, DO_FILTER is set to false.
If ProcessingPixel3 is false, section 540 uses the input parameter FILTER_OTHER_3 (set by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400) to set the value of DO_FILTER. Section 550 tests the DO_FILTER variable and, if DO_FILTER is true, updates the edge pixels P4 and P5 (see Fig. 3) of the Line variable.
Section 560 tests the ProcessingPixel3 parameter and updates FILTER_OTHER_3 accordingly. The FILTER_OTHER_3 variable carries state information between different instances of this module. If ProcessingPixel3 is true, section 550 updates the FILTER_OTHER_3 parameter with the value of DO_FILTER. This technique allows the higher-level module that instantiates this module (i.e., VC1_InloopFilter) to pass the FILTER_OTHER_3 value updated by one instance of the lower-level VC_1_INLOOPFILTER_LINE module to another instance of VC_1_INLOOPFILTER_LINE.
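The control flow described for the line module (DO_FILTER and FILTER_OTHER_3) can be sketched as below. This is a hedged Python model of the description only: the detailed pixel tests are abstracted into a hypothetical `line_passes` predicate and the edge update into a hypothetical `update_edge` function, neither of which appears in the source.

```python
def filter_line_control(line, processing_pixel3, filter_other_3,
                        line_passes, update_edge):
    """Model the control flow of one line-filter instance.

    line_passes -- hypothetical predicate: True if the pixel calculations
                   on this line say that filtering should proceed
    update_edge -- hypothetical function returning the line with P4 and
                   P5 updated
    """
    if processing_pixel3:
        # Third line: default to filtering, then let the pixel
        # calculations veto it.
        do_filter = True
        if not line_passes(line):
            do_filter = False
    else:
        # Other lines: inherit the decision made on the third line.
        do_filter = filter_other_3

    if do_filter:
        line = update_edge(line)

    if processing_pixel3:
        # Hand the decision on to the instances for lines 1, 2, and 4.
        filter_other_3 = do_filter
    return line, filter_other_3
```

A function with this shape could be bound to concrete `line_passes` and `update_edge` implementations and used as the per-line callback of a loop that visits the lines in the order 3, 1, 2, 4.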
Those skilled in the art will understand that the pseudo-code of Fig. 5 can be synthesized in various ways to produce an arrangement of logic gates that implements line acceleration logic 500. One such arrangement is illustrated in Figs. 6A-6D, which together form a block diagram of line acceleration logic 500. Those skilled in the art should be familiar with the VC-1 in-loop deblocking filter algorithm and with logic structures. The elements of Figs. 6A-6D are therefore not described in detail; instead, selected features of line acceleration logic 500 are described.
Those skilled in the art will understand that the VC-1 in-loop deblocking filter involves the following calculations, where P1-P8 refer to the pixel positions in the row/column being processed.
A0=(2*(P3-P6)-5*(P4-P5)+4)>>3
A1=(2*(P1-P4)-5*(P2-P3)+4)>>3
A2=(2*(P5-P8)-5*(P6-P7)+4)>>3
clip=(P4-P5)/2
Each of the first three calculations involves 3 subtractions, 2 multiplications, 1 addition, and 1 right shift. The portion of line acceleration logic 500 shown in Fig. 6A computes A0, A1, and A2 sequentially using shared logic, rather than using separate dedicated logic blocks for A0, A1, and A2. By avoiding duplicated logic blocks and using multiplexers to process the inputs sequentially, gate count and/or power consumption is reduced.
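A worked example may make the four expressions above concrete. The Python sketch below evaluates them directly; the function name `edge_terms` is an assumption, and realising the division by 2 as a right shift follows the hardware description later in the text (it differs from truncating division only when P4-P5 is odd and negative).

```python
def edge_terms(p):
    """Evaluate A0, A1, A2 and clip for one line.

    p -- list of 8 integer pixel values, P1..P8 stored at p[0]..p[7]
    """
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    a0 = (2 * (p3 - p6) - 5 * (p4 - p5) + 4) >> 3
    a1 = (2 * (p1 - p4) - 5 * (p2 - p3) + 4) >> 3
    a2 = (2 * (p5 - p8) - 5 * (p6 - p7) + 4) >> 3
    clip = (p4 - p5) >> 1  # assumption: /2 realised as a right shift
    return a0, a1, a2, clip
```

For a line with a sharp step at the edge, `edge_terms([10, 10, 10, 10, 30, 30, 30, 30])` returns `(8, 0, 0, -10)`: A0 measures the step across the boundary, while A1 and A2 are zero because each side is flat.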
Multiplexer 605,610 and 620 is to be used for selecting different inputs from pixel buffer P-8 in the different sequential cycle, and these inputs provide to each shared logic circuit box. Logical circuit square 625 and 630 is respectively carried out subtraction.Logical circuit square 635 multiply by 2 through execution 1 realization that moves to left.Multiply by be by move to left 1 carry out, the back connects adder 645.To the move to left output, constant 4 of device 635 and the negative of 645 outputs of adder 650 adds together.At last, logical circuit square 655 is carried out and is moved to right 3.
In the 1st sequential cycle, input T=1 provides to each multiplexer 605,610 and 615, and calculates the value of A1 and have buffer 660.In the 2nd sequential cycle, input T=2 provides to each multiplexer 605,610 and 615, and calculates the value of A2 and have buffer 665.In the 3rd sequential cycle, input T=3 provides to each multiplexer 605,610 and 615, and calculates the value of A0 and have buffer 670.Exist value A1, A2, the A3 of buffer 660,665,670 to be used by the part row acceleration logic circuit 500 of Fig. 6 B, will be in the back explanation.The output of the output of P4 buffer (671) and P5 buffer (673) will be used by the part row acceleration logic circuit 500 of Fig. 6 C, will be in the back explanation.
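The time-multiplexed datapath can be modelled in software to show how one set of subtracters, shifters, and adders serves all three terms. This Python sketch is illustrative only: the `MUX_SELECT` table, the function names, and the choice of a left shift by 2 plus an add for the multiplication by 5 are assumptions, not a transcription of Fig. 6A.

```python
# Input selection per clock cycle T. Each tuple is (a, b, c, d) for the
# expression 2*(a - b) - 5*(c - d), given as 1-based pixel positions.
MUX_SELECT = {
    1: (1, 4, 2, 3),  # T=1 computes A1
    2: (5, 8, 6, 7),  # T=2 computes A2
    3: (3, 6, 4, 5),  # T=3 computes A0
}

def shared_datapath(p, t):
    """Run the single shared datapath for clock cycle t on line p."""
    a, b, c, d = (p[i - 1] for i in MUX_SELECT[t])
    diff_ab = a - b                     # first subtracter
    diff_cd = c - d                     # second subtracter
    times2 = diff_ab << 1               # multiply by 2: left shift by 1
    times5 = (diff_cd << 2) + diff_cd   # multiply by 5: shift plus add (assumed)
    total = times2 + 4 - times5         # add constant 4, subtract the x5 term
    return total >> 3                   # final right shift by 3

def compute_terms(p):
    """Fill the three result buffers over three cycles."""
    buffers = {}
    for t, name in ((1, "A1"), (2, "A2"), (3, "A0")):
        buffers[name] = shared_datapath(p, t)
    return buffers
```

The trade-off mirrors the text: three passes through one datapath take three cycles, whereas three copies of the datapath would finish in one cycle at roughly three times the logic.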
Those skilled in the art will also appreciate that the VC-1 in-loop deblocking filter involves the following additional computations:
D=5*((sign(A0)*A3)-A0)/8
if(CLIP>0)
{
    if(D<0)
        D=0
    if(D>CLIP)
        D=CLIP
}
else
{
    if(D>0)
        D=0
    if(D<CLIP)
        D=CLIP
}
The part row acceleration logic circuit 500 of Fig. 6 B receives input from the part row acceleration logic circuit 500 of Fig. 6 A, and calculates D (675).With reference to Fig. 6 A, CLIP (677) is that following generation: pixel P4 and P5 are subtracted each other by logical circuit square 679 once more, and this result moves to right (integer division is with 2) to produce CLIP 677 by logical circuit square 680.Get back to Fig. 6 B, A1 can obtain from buffer 660 in the period 1, and A2 can obtain from buffer 665 in second round, and A0 can obtain from buffer 670 in the period 3.Thereby in the period 4, the part row acceleration logic circuit 500 of the 6th figure calculates D (675) according to above-mentioned equation.
Row acceleration logic circuit 500 uses D (675) to update pixel positions P4 and P5. Specifically, P4=P4-D and P5=P5+D. Although Fig. 6A and Fig. 6B were explained above for a single row/column (for example, a single group of pixel positions P0-P8), the computation for the 3rd row/column of a sub-block affects the behavior of the other 3 rows/columns of that sub-block. Row acceleration logic circuit 500 implements this behavior in an inventive way. The individual filtering operations are all performed first, in parallel, as explained in connection with Fig. 6A and Fig. 6B; the portion of row acceleration logic circuit 500 shown in Fig. 6C and Fig. 6D then conditionally selects which positions are updated. In other words, VC-1 in-loop deblocking filter hardware acceleration logic circuit 400 decides whether the original value or the new value is written back. By contrast, a conventional VC-1 in-loop deblocking filter uses a loop, so that the individual filtering operations themselves are performed conditionally.
As explained earlier, the pseudo-code of Fig. 4 shows row acceleration logic circuit 500 being used in a loop: instantiation section 450 appears inside iteration section 420. This instantiation of row acceleration logic circuit 500 uses 2 parameters, ProcessingPixel3 and FILTER_OTHER_3. Row acceleration logic circuit 500 uses these parameters to update pixels P4 and P5 conditionally, as follows. Referring to Fig. 6C, register P4 is written with the result of subtractor 681, whose inputs are P4 (671) and either 0 or D (675), depending on the value of DO_FILTER (683). Likewise, register P5 is written with the result of adder 685, whose inputs are P5 (673) and either 0 or D (675), depending on the value of DO_FILTER (683). The updated value of P4 is therefore either the original value of P4 (if DO_FILTER is false) or P4-D. Likewise, the updated value of P5 is either the original value of P5 (if DO_FILTER is false) or P5+D.
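The conditional write-back described for Fig. 6C can be sketched in a few lines. The function name write_back is hypothetical; the sketch only mirrors the selection between 0 and D that the text attributes to DO_FILTER.

```python
def write_back(p4, p5, d, do_filter):
    """Always write P4 and P5; DO_FILTER selects D or 0 as the operand."""
    delta = d if do_filter else 0   # selection between 0 and D
    new_p4 = p4 - delta             # role of subtractor 681
    new_p5 = p5 + delta             # role of adder 685
    return new_p4, new_p5
```

Because the write always happens and only the operand changes, the filter result can be computed unconditionally and the decision deferred until DO_FILTER is known.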
Those skilled in the art will recognize that, when the 3rd row/column of a sub-block is processed, the condition for updating P4 with P4-D is:
(ABS(A0)<PQUANT) OR (A3<ABS(A0)) OR (CLIP!=0)
DO_FILTER 683 is computed by the portion of row acceleration logic circuit 500 in Fig. 6D, which tests these conditions. Multiplexer 687 supplies one input to OR gate 697, selecting a true output if ABS(A0)<PQUANT and false otherwise. Multiplexer 689 supplies another input to OR gate 697, selecting a true output if A3<ABS(A0) and false otherwise. Multiplexer 691 supplies a further input to OR gate 697, selecting a true output if CLIP!=0 and false otherwise.
DO_FILTER 683 is supplied by multiplexer 693, which uses control input Processing_Pixel_3 (695) to select either the output of OR gate 697 or the input signal FILTER_OTHER_3 (699). The inputs Processing_Pixel_3 (695) and FILTER_OTHER_3 (699) were explained earlier in connection with the pseudo-code of Fig. 4, which illustrates the higher-level VC-1 in-loop deblocking filter hardware acceleration logic circuit 400 that instantiates row acceleration logic circuit 500. Returning to Fig. 4, Processing_Pixel_3 (695) is set to true when the 3rd row/column is processed (the 1st iteration) and to false otherwise. Based on the conditions on PQUANT, ABS(A0), and CLIP, the intermediate variable DO_FILTER records whether or not P4/P5 are updated. The final value of FILTER_OTHER_3 (699) is set from this intermediate variable DO_FILTER. The result of the portions of row acceleration logic circuit 500 in Fig. 6C and Fig. 6D is that, every 4 cycles, the P4 and P5 pixel positions of 4 adjacent rows/columns are either set to their filtered values (according to variables such as A0-A3, PQUANT, and CLIP) or rewritten with their original values.
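The selection of DO_FILTER over the four rows/columns of a sub-block can be sketched as below. This is an illustrative model only: the function names are hypothetical labels borrowed from the reference numerals, the condition is coded exactly as stated in the text (three comparisons combined by OR), and the 3rd row/column is assumed to come first in the input list because the text says it is handled in the 1st iteration.

```python
def or_gate_697(a0, a3, clip, pquant):
    """Condition as stated in the text: three comparisons combined by OR."""
    return (abs(a0) < pquant) or (a3 < abs(a0)) or (clip != 0)


def mux_693(processing_pixel_3, or_gate_output, filter_other_3):
    """Select the OR-gate output for the 3rd line, else FILTER_OTHER_3."""
    return or_gate_output if processing_pixel_3 else filter_other_3


def do_filter_per_line(lines, pquant):
    """lines: four (a0, a3, clip) tuples, 3rd row/column first.

    Returns one DO_FILTER flag per line, in processing order.
    """
    filter_other_3 = False  # placeholder until the 1st iteration sets it
    flags = []
    for index, (a0, a3, clip) in enumerate(lines):
        processing_pixel_3 = (index == 0)
        do_filter = mux_693(processing_pixel_3,
                            or_gate_697(a0, a3, clip, pquant),
                            filter_other_3)
        if processing_pixel_3:
            filter_other_3 = do_filter  # recorded for the other 3 lines
        flags.append(do_filter)
    return flags
```

In this model the decision made for the 3rd row/column is simply reused for the remaining three, regardless of what their own comparisons would yield.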
As described above, this VC-1 deblocking acceleration unit 400 inventively combines parallel and sequential processing. Parallel processing provides faster execution and reduces latency. Although parallelization increases the gate count, the increase is offset by the sequential processing described above. A conventional approach that does not use such sequential processing merely increases the gate count.
Some embodiments of GPU 120 include a hardware acceleration unit for H.264 deblocking, and this deblocking function is made available through GPU instructions. GPU 120 is described in detail in connection with Fig. 8, with emphasis on the particular choice of GPU instructions that provide the H.264 deblocking acceleration function.
Graphics processing unit
Rationale for multiple deblocking instructions
The instruction set of GPU 120 includes instructions that the portion of decoder 160 executing in software can use to accelerate the deblocking filter. The inventive technique explained here provides not one but multiple GPU instructions to accelerate a particular deblocking filter. In-loop deblocking filter 290 is inherently sequential: a particular filter must filter pixels in a given order (for example, H.264 specifies left to right, then top to bottom). Consequently, previously filtered or updated pixels are used as inputs when later pixels are filtered. A host processor operates on pixel values stored in conventional memory, which allows back-to-back pixel reads and writes. This sequential nature is a poor fit, however, when in-loop deblocking filter 290 uses the GPU to accelerate part of the filtering. A conventional GPU stores pixels in a texture cache, and the GPU pipeline design does not allow back-to-back reads and writes of the texture cache.
Some embodiments of GPU 120 disclosed here provide multiple GPU instructions that can be used together to accelerate a particular deblocking filter. Some of these instructions use the texture cache as the source of pixel data, and some use the GPU execution units as the data source. In-loop deblocking filter 290 uses these different GPU instructions in appropriate combination to achieve back-to-back pixel reads and writes. The data flow through GPU 120 is summarized next, followed by an explanation of the deblocking acceleration instructions provided by GPU 120 and of how in-loop deblocking filter 290 uses them.
GPU flow
Fig. 7 is a data flow diagram of GPU 120, in which the command flow is shown by the arrows on the left of Fig. 7 and the image or graphics flow by the arrows on the right. Fig. 7 omits several elements known to those skilled in the art that are not essential to explaining the in-loop deblocking features of GPU 120. Command stream processor 710 receives instruction 720 from the system bus (not shown) and decodes it, producing command data 730, for example vertex data. GPU 120 supports conventional graphics processing instructions as well as instructions that accelerate video encoding and/or decoding.
Conventional graphics processing instructions involve tasks such as vertex shading, geometry shading, and pixel shading. Command data 730 is therefore supplied to pool 740 of shader execution units. The shader execution units use texture filter unit (TFU) 750 as needed to apply textures to pixels. Texture data is fetched from texture cache 760, which is backed by main memory (not shown).
Some instructions are passed to video accelerator 150, whose operation is explained later. The resulting data is then processed by post-packer 770, which compresses the data. After this post-processing, the data produced by the video acceleration unit is supplied to execution unit pool 740.
The execution of video encoding/decoding acceleration instructions, such as the deblocking filter instructions described above, differs in several respects from that of the conventional graphics instructions described above. First, video acceleration instructions are executed by video acceleration unit 150 rather than by the shader execution units. Second, video acceleration instructions do not use texture data as such.
However, both the image data used by video acceleration instructions and the texture data used by graphics instructions are two-dimensional arrays. GPU 120 takes advantage of this similarity, using texture filter unit 750 to load image data for video acceleration unit 150, so that texture cache 760 caches some of the image data on which video acceleration unit 150 operates. Accordingly, as shown in Fig. 7, video acceleration unit 150 is located between texture filter unit 750 and post-packer 770.
Texture filter unit 750 examines command data 730 extracted from instruction 720. Command data 730 also provides TFU 750 with the coordinates of the desired image data in texture cache 760. In one embodiment, these coordinates are specified as U,V pairs, which will be familiar to those skilled in the art. When instruction 720 is a video acceleration instruction, the extracted command data also directs texture filter unit 750 to bypass the texture filters (not shown) within texture filter unit 750.
In this way, video acceleration instructions make use of texture filter unit 750 to load image data for video acceleration unit 150. Video acceleration unit 150 receives image data from texture filter unit 750 on the data path and command data 730 on the command path, and operates on the image data according to command data 730. The image data output by video acceleration unit 150 is fed back to execution unit pool 740 after being processed by post-packer 770.
Deblocking instructions
The embodiments of GPU 120 described here provide hardware acceleration for both the VC-1 deblocking filter and the H.264 deblocking filter. The VC-1 deblocking filter is accelerated by one GPU instruction ("IDF_VC-1"), and the H.264 deblocking filter by three GPU instructions ("IDF_H264_0", "IDF_H264_1", "IDF_H264_2").
As explained earlier, each GPU instruction is decoded and parsed into command data 730, which can be viewed as a set of parameters specific to each instruction, as shown in Table 1. The IDF_H264_x instructions share some common parameters, while other parameters are specific to each instruction. Those skilled in the art will understand that these parameters can be encoded using a variety of opcodes and instruction formats, so those topics are not discussed here.
Table 1: Parameters of the IDF_H264 instructions
Figure G071B0359420070628D000141
Several input parameters are used in combination to determine the address of the 4x4 block to be fetched by texture filter unit 750. The BaseAddress parameter indicates the start of the texture data in the texture cache. The coordinate of the upper-left block within this region is given together with the BaseAddress parameter. The PictureHeight and PictureWidth input parameters are used to determine the extent of the block, i.e. the lower-left coordinate. Finally, a video picture can be progressive or interlaced. If it is interlaced, it consists of two fields (top and bottom). Texture filter unit 750 uses FieldFlag and TopFieldFlag to handle interlaced images properly.
The deblocked 8x4x8-bit output is written to the destination register and is also written back to execution unit pool 740. Writing the deblocked output back to execution unit pool 740 is a "modify in place" operation, which is necessary in some decoder implementations, for example H.264, where the pixel values in the blocks to the right and below are computed from previous results. The VC-1 decoder, unlike H.264, does not have this constraint. In VC-1, every 8x8 boundary is filtered (first vertical, then horizontal). All vertical edges can therefore be filtered essentially in parallel, with the 4x4 edges filtered afterward. Parallelization is possible because only two pixels (one on each side of the edge) are updated, and these pixels are not used in computing other edges. Because the deblocked data is written back to execution unit pool 740 rather than to texture cache 760, different IDF_H264_x instructions are provided that fetch the sub-block from different locations. This can be seen in Table 1, in the descriptions of the BlockAddress, Data Block 1, and Data Block 2 parameters. The IDF_H264_0 instruction fetches the entire 8x4x8-bit sub-block from texture cache 760. The IDF_H264_1 instruction fetches half of the sub-block from texture cache 760 and the other half from execution unit pool 740.
The use of the IDF_H264_x instructions by decoder 160 is detailed in connection with Fig. 8. Described first is how texture filter unit 750 and execution unit pool 740 transform the fetched pixel data before supplying it to video acceleration unit 150.
Transformation of image data
The instruction parameters described above provide texture filter unit 750 with the coordinates of the address of the sub-block to be fetched from texture cache 760 or from execution unit pool 740. The image data comprises a luminance (Y) plane and chrominance (Cb, Cr) planes. The YC flag input parameter defines whether the Y plane or the CbCr plane is to be processed.
When luminance (Y) data is processed, as indicated by the YC flag parameter, texture filter unit 750 fetches the sub-block and supplies its 128 bits as input to VC-1 in-loop deblocking filter hardware acceleration logic circuit 400 (for example, as the block input parameter of the VC-1 accelerator instantiation of Fig. 4). The resulting data is written to the destination registers as a register quad (i.e., DST, DST+1, DST+2, DST+3).
When chrominance data is processed, as indicated by the YC flag parameter, the Cb and Cr blocks are processed consecutively by VC-1 in-loop deblocking filter hardware acceleration logic circuit 400. The resulting data is written to texture cache 760. In some embodiments, this write occurs every cycle, with 256 bits written per cycle.
Some video acceleration unit embodiments use an interleaved CbCr plane, with each component stored at half width and half height. In these embodiments, texture filter unit 750 de-interleaves the CbCr sub-block data for video acceleration unit 150 into the buffer used for communication between texture filter unit 750 and video acceleration unit 150. Specifically, texture filter unit 750 writes two 4x4 Cb blocks into this buffer and then writes two 4x4 Cr blocks into it. The 8x4 Cb block is processed first by VC-1 in-loop deblocking filter hardware acceleration logic circuit 400, and the resulting data is written to texture cache 760. The 8x4 Cr block is then processed by VC-1 in-loop deblocking filter hardware acceleration logic circuit 400, and the resulting data is written to texture cache 760. Video acceleration unit 150 uses the CbCr flag parameter to manage this sequential processing.
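The de-interleaving step can be illustrated with a short sketch. The text does not give the memory layout, so the layout here is an assumption: each row alternates Cb and Cr samples, Cb first. The function name deinterleave_cbcr is hypothetical.

```python
def deinterleave_cbcr(rows):
    """Split rows of interleaved samples into separate Cb and Cr blocks.

    Assumed layout per row: Cb0, Cr0, Cb1, Cr1, ...
    """
    cb = [row[0::2] for row in rows]  # even positions: Cb samples
    cr = [row[1::2] for row in rows]  # odd positions: Cr samples
    return cb, cr
```

For example, four interleaved rows of sixteen samples yield one 8x4 Cb block and one 8x4 Cr block, which can then be filtered one after the other as the text describes.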
Use of the deblocking instructions by the software decoder
As explained earlier in connection with Fig. 1, decoder 160 executes on host processor 110 but also uses the video acceleration instructions provided by GPU 120. In particular, embodiments of H.264 in-loop deblocking filter 290 use particular combinations of IDF_H264_x instructions to process edges in the order specified by H.264, fetching some sub-blocks from texture cache 760 and others from execution unit pool 740. Used in the proper combination, these IDF_H264_x instructions achieve back-to-back pixel reads and writes.
Fig. 8 is a block diagram of a 16x16 macroblock as used in H.264. The macroblock is divided into sixteen 4x4 sub-blocks, each of which undergoes deblocking. The sub-blocks in Fig. 8 can be identified by row and column (for example, R1,C2). H.264 specifies that vertical edges are processed first and horizontal edges second, in the edge order (a-h) shown in Fig. 8.
Accordingly, the deblocking filter is applied to the edge between a pair of sub-blocks, and the sub-block pairs are filtered in this order:
edge a = [block to left of R1,C1] | [R1,C1]; [block to left of R2,C1] | [R2,C1];
         [block to left of R3,C1] | [R3,C1]; [block to left of R4,C1] | [R4,C1]
edge b = [R1,C1] | [R1,C2]; [R2,C1] | [R2,C2];
         [R3,C1] | [R3,C2]; [R4,C1] | [R4,C2]
edge c = [R1,C2] | [R1,C3]; [R2,C2] | [R2,C3];
         [R3,C2] | [R3,C3]; [R4,C2] | [R4,C3]
edge d = [R1,C3] | [R1,C4]; [R2,C3] | [R2,C4];
         [R3,C3] | [R3,C4]; [R4,C3] | [R4,C4]
edge e = [block above R1,C1] | [R1,C1]; [block above R1,C2] | [R1,C2];
         [block above R1,C3] | [R1,C3]; [block above R1,C4] | [R1,C4]
edge f = [R1,C1] | [R2,C1]; [R1,C2] | [R2,C2];
         [R1,C3] | [R2,C3]; [R1,C4] | [R2,C4]
edge g = [R2,C1] | [R3,C1]; [R2,C2] | [R3,C2];
         [R2,C3] | [R3,C3]; [R2,C4] | [R3,C4]
edge h = [R3,C1] | [R4,C1]; [R3,C2] | [R4,C2];
         [R3,C3] | [R4,C3]; [R3,C4] | [R4,C4]
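The edge ordering listed above follows a regular pattern, which the sketch below enumerates. The function name edge_pairs and the coordinate convention are hypothetical: a sub-block is written as (row, column), and an index of 0 stands for the neighbouring block outside the macroblock (to the left for column 0, above for row 0).

```python
def edge_pairs():
    """Return the sub-block pairs for edges a-h, in filtering order."""
    edges = {}
    # Vertical edges a-d: pair (r, k) | (r, k + 1) for each row r.
    for k, name in enumerate("abcd"):
        edges[name] = [((r, k), (r, k + 1)) for r in range(1, 5)]
    # Horizontal edges e-h: pair (k, c) | (k + 1, c) for each column c.
    for k, name in enumerate("efgh"):
        edges[name] = [((k, c), (k + 1, c)) for c in range(1, 5)]
    return edges
```

Iterating over the returned dictionary visits the edges in the order a through h, i.e. all vertical edges before any horizontal edge.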
For the 1st pair of sub-blocks, both are loaded from texture cache 760, because no pixels have yet been changed by applying the filter. Although the filter for the 1st vertical edge (a) may change the pixel values of (R1,C1), the vertical edge of the 2nd row shares no pixels with the vertical edge of the 1st row. Therefore, the 2nd pair of sub-blocks (edge b) is also loaded from texture cache 760. Since the vertical edges of two adjacent rows do not share pixels, the same holds for the 3rd pair (edge c) and the 4th pair (edge d) of sub-blocks.
The particular IDF_H264_x instruction issued by in-loop deblocking filter 290 determines the location from which pixel data is loaded. The sequence of IDF_H264_x instructions used by in-loop deblocking filter 290 to process the 1st group of vertical edges (a-d) is:
IDF_H264_0 SRC1 = address of (R1,C1);
IDF_H264_0 SRC1 = address of (R2,C1);
IDF_H264_0 SRC1 = address of (R3,C1);
IDF_H264_0 SRC1 = address of (R4,C1);
Next, in-loop deblocking filter 290 processes the 2nd vertical edge (b), starting with (R1,C2). The leftmost 4 pixels of the 8x4 block designated (R1,C2) overlap the rightmost pixels of sub-block (R1,C1). These overlapping pixels were processed, and possibly updated, by the vertical edge filter for (R1,C1), and are therefore read from execution unit pool 740 rather than from texture cache 760. The rightmost 4 pixels of sub-block (R1,C2), however, have not yet been filtered and are therefore read from texture cache 760. The same holds for sub-blocks (R2,C2) through (R4,C2). In-loop deblocking filter 290 achieves this result by issuing the following sequence of IDF_H264_x instructions to process the 2nd group of vertical edges:
IDF_H264_1 SRC1 = address of (R1,C2);
IDF_H264_1 SRC1 = address of (R2,C2);
IDF_H264_1 SRC1 = address of (R3,C2);
IDF_H264_1 SRC1 = address of (R4,C2);
Processing of the 3rd group of vertical edges starts with (R1,C3). The leftmost 4 pixels of the 8x4 block (R1,C3) overlap the rightmost pixels of sub-block (R1,C2) and are therefore read from execution unit pool 740 rather than from texture cache 760.
The rightmost 4 pixels of sub-block (R1,C3), however, have not yet been filtered and are therefore read from texture cache 760. The same holds for sub-blocks (R2,C3) through (R4,C3). A similar situation arises for the last group of vertical edges. In-loop deblocking filter 290 therefore processes the remaining 2 groups of vertical edges by issuing the following sequence of IDF_H264_x instructions:
IDF_H264_1 SRC1 = address of (R1,C3);
IDF_H264_1 SRC1 = address of (R2,C3);
IDF_H264_1 SRC1 = address of (R3,C3);
IDF_H264_1 SRC1 = address of (R4,C3);
IDF_H264_1 SRC1 = address of (R1,C4);
IDF_H264_1 SRC1 = address of (R2,C4);
IDF_H264_1 SRC1 = address of (R3,C4);
IDF_H264_1 SRC1 = address of (R4,C4);
The horizontal edges (e-h) are processed next. By this point the deblocking filter has been applied to every sub-block in the macroblock, so any pixel may have been updated. Therefore, every sub-block submitted for horizontal edge filtering is read from execution unit pool 740 rather than from texture cache 760. In-loop deblocking filter 290 accordingly issues the following sequence of IDF_H264_x instructions to process the horizontal edges:
IDF_H264_2 SRC1 = address of (R1,C1);
IDF_H264_2 SRC1 = address of (R2,C1);
IDF_H264_2 SRC1 = address of (R3,C1);
IDF_H264_2 SRC1 = address of (R4,C1);
IDF_H264_2 SRC1 = address of (R1,C2);
IDF_H264_2 SRC1 = address of (R2,C2);
IDF_H264_2 SRC1 = address of (R3,C2);
IDF_H264_2 SRC1 = address of (R4,C2);
IDF_H264_2 SRC1 = address of (R1,C3);
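The instruction sequences above can be summarized by a small generator. This is a sketch under stated assumptions, not decoder 160 itself: the function name idf_h264_sequence and the tuple representation are hypothetical, and the horizontal-edge sequence is assumed to continue the pattern shown over all sixteen sub-blocks, which goes beyond the instructions actually listed.

```python
# Where each instruction reads its pixel data, as described in the text.
SOURCES = {
    "IDF_H264_0": "whole sub-block from the texture cache",
    "IDF_H264_1": "half from the texture cache, half from the execution unit pool",
    "IDF_H264_2": "from the execution unit pool",
}


def idf_h264_sequence():
    """Return (opcode, row, column) tuples for one 16x16 macroblock."""
    sequence = []
    # Vertical edges: column C1 has no previously filtered pixels,
    # columns C2-C4 overlap pixels already updated by the previous column.
    for col in range(1, 5):
        opcode = "IDF_H264_0" if col == 1 else "IDF_H264_1"
        for row in range(1, 5):
            sequence.append((opcode, row, col))
    # Horizontal edges: every pixel may already have been updated
    # (assumed to cover all 16 sub-blocks in the same order).
    for col in range(1, 5):
        for row in range(1, 5):
            sequence.append(("IDF_H264_2", row, col))
    return sequence
```

Under these assumptions a macroblock needs 4 IDF_H264_0, 12 IDF_H264_1, and 16 IDF_H264_2 instructions, and the choice of opcode alone tells the hardware where the possibly updated pixels are to be found.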
Any blocks in process descriptions or flowcharts should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Those skilled in the art of software development will recognize that alternative implementations are also included within the scope of the disclosure. In these alternative implementations, functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The systems and methods disclosed here can be implemented in software, hardware, or a combination thereof. In some embodiments, the system and/or method is implemented in software that is stored in memory and executed by a suitable processor located in a computing device (including, without limitation, a microprocessor, microcontroller, network processor, reconfigurable processor, or extensible processor). In other embodiments, the system and/or method is implemented in logic, including, without limitation, a programmable logic device (PLD), programmable gate array (PGA), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC). In still other embodiments, the logic is implemented in a graphics processor or graphics processing unit (GPU).
The systems and methods disclosed here can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device. Such an instruction execution system includes any computer-based system, processor-containing system, or other system that can fetch and execute the instructions. In this disclosure, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system. The computer-readable medium can be, for example and without limitation, a system or propagation medium based on electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology.
Specific (non-limiting) examples of a computer-readable medium using electronic technology include: an electrical (electronic) connection having one or more wires; a random access memory (RAM); a read-only memory (ROM); and an erasable programmable read-only memory (EPROM or flash memory). A specific (non-limiting) example of a computer-readable medium using magnetic technology is a portable computer diskette. Specific (non-limiting) examples of a computer-readable medium using optical technology include an optical fiber and a portable compact disc read-only memory (CD-ROM).
Although the invention is illustrated and described here as embodied in one or more specific examples, it is not intended to be limited to the details shown, since various modifications and structural changes may be made without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention, as set forth in the claims that follow.

Claims (10)

1. A deblocking filter for video decoding, comprising:
a first logic circuit for determining whether the pixels of a predetermined pixel group among a plurality of pixel groups meet a condition;
a second logic circuit configured to first filter the pixels of the predetermined pixel group when the condition is met; and
a third logic circuit configured to, when the condition is met, sequentially filter each remaining pixel group of the plurality of pixel groups according to a corresponding set of taps among a plurality of sets of taps,
wherein the condition is determined by a set of predetermined calculations and comparisons, the predetermined calculations and comparisons being one set of taps,
wherein the plurality of pixel groups form a plurality of adjacent 4x4 pixel blocks, and the predetermined pixel group is the edge pixels of the adjacent 4x4 pixel blocks,
when P3 of the 4x4 pixel block is processed, the condition for updating P4 is:
(ABS(A0)<PQUANT) OR (A3<ABS(A0)) OR (CLIP!=0)
Wherein,
A0=(2*(P3-P6)-5*(P4-P5)+4)>>3
A1=(2*(P1-P4)-5*(P2-P3)+4)>>3
A2=(2*(P5-P8)-5*(P6-P7)+4)>>3
A3[I]=min(A1[I],A2[I])
clip=(P4-P5)/2,
P1 to P8 denote the positions of the pixels in the row/column being processed in 2 adjacent 4x4 pixel blocks; ABS(A0) denotes the absolute value of A0; PQUANT is a preset value; A0 denotes the interpolation of the pixels in the row/column being processed, and A1 and A2 respectively denote the interpolation of the pixels at the edges being processed,
if ABS(A0)<PQUANT, a true output is selected, and otherwise false; if A3<ABS(A0), a true output is selected, and otherwise false; if CLIP!=0, a true output is selected, and otherwise false.
2. The deblocking filter according to claim 1, wherein the third logic circuit further comprises:
a fourth logic circuit configured to update a first predetermined pixel group in each remaining pixel group according to a second predetermined pixel group in each remaining pixel group.
3. The deblocking filter according to claim 1, wherein the second logic circuit further comprises:
a fifth logic circuit configured to first filter the pixels of the predetermined pixel group in parallel when the condition is met.
4. The deblocking filter according to claim 1, wherein the deblocking filter is applied to the edge between a pair of sub-blocks to remove edge artifacts.
5. The deblocking filter according to claim 1, wherein the deblocking filter uses an appropriate combination of a plurality of graphics processing instructions to achieve back-to-back pixel reads and writes.
6. A video decoder, comprising:
an entropy decoder that receives an input coded bit stream;
a spatial decoder that receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels;
a first logic circuit configured to combine a current picture with a predicted picture to produce a combined picture; and
an in-loop deblocking filter that receives the combined picture, the in-loop deblocking filter comprising:
a second logic circuit configured to filter a predetermined pixel group; and
a third logic circuit configured to, when the predetermined pixel group meets a condition, filter each remaining pixel group of a plurality of pixel groups according to a corresponding set of taps among a plurality of sets of taps,
wherein the condition is determined by a set of predetermined calculations and comparisons, the predetermined calculations and comparisons being one set of taps,
wherein the plurality of pixel groups form a plurality of adjacent 4x4 pixel blocks, and the predetermined pixel group is the edge pixels of the adjacent 4x4 pixel blocks,
when P3 of the 4x4 pixel block is processed, the condition for updating P4 is:
(ABS(A0)<PQUANT) OR (A3<ABS(A0)) OR (CLIP!=0)
Wherein,
A0=(2*(P3-P6)-5*(P4-P5)+4)>>3
A1=(2*(P1-P4)-5*(P2-P3)+4)>>3
A2=(2*(P5-P8)-5*(P6-P7)+4)>>3
A3[I]=min(A1[I],A2[I])
clip=(P4-P5)/2,
P1 to P8 denote the positions of the pixels in the row/column being processed in 2 adjacent 4x4 pixel blocks; ABS(A0) denotes the absolute value of A0; PQUANT is a preset value; A0 denotes the interpolation of the pixels in the row/column being processed, and A1 and A2 respectively denote the interpolation of the pixels at the edges being processed,
if ABS(A0)<PQUANT, a true output is selected, and otherwise false; if A3<ABS(A0), a true output is selected, and otherwise false; if CLIP!=0, a true output is selected, and otherwise false.
7. The video decoder according to claim 6, wherein the second logic circuit further comprises:
a fourth logic circuit configured to first filter the pixels of the predetermined pixel group in parallel when the condition is met.
8. The video decoder according to claim 6, wherein the third logic circuit further comprises:
a fifth logic circuit configured to update a first predetermined pixel group in each remaining pixel group according to a second predetermined pixel group in each remaining pixel group.
9. a GPU comprises:
Main Processing Interface receives at least one video assisted instruction; And
The video accelerator module is used for this at least one video assisted instruction of response, and this video accelerator module comprises deblocking effect filter in the loop, and deblocking effect filter comprises in this loop:
First logical circuit is arranged to judge whether the intended pixel crowd's of a plurality of pixel groups pixel reaches first condition;
Second logical circuit is arranged to when reaching this first condition, and elder generation is to this intended pixel crowd's pixel filter; And
The 3rd logical circuit is arranged to when reaching this first condition, according to the respective sets filter unit in many group filter units, and in proper order to pixel group filtering remaining in these a plurality of pixel groups,
Wherein this condition is decide by predetermined calculating and set relatively, calculating that this is predetermined and be one group of filter unit relatively,
Wherein these a plurality of pixel groups form a plurality of vicinity 4 * 4 type square pixel square, and the intended pixel crowd is the edge pixel of 4 * 4 type square pixel square of vicinity,
And wherein, when P3 of the 4×4 pixel blocks is processed, the condition for updating P4 is:
((ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0))
Wherein,
A0 = (2*(P3-P6) - 5*(P4-P5) + 4) >> 3
A1 = (2*(P1-P4) - 5*(P2-P3) + 4) >> 3
A2 = (2*(P5-P8) - 5*(P6-P7) + 4) >> 3
A3[I] = min(A1[I], A2[I])
CLIP = (P4-P5)/2,
where P1 to P8 denote the positions of the pixels in the row/column being processed across two adjacent 4×4 pixel blocks; ABS(A0) denotes the absolute value of A0; PQUANT is a preset value; A0 denotes the interpolation of the pixels in the row/column being processed; and A1 and A2 respectively denote the interpolations of the pixels at the edges being processed;
If ABS(A0) < PQUANT, the true output is selected, and otherwise the false output; if A3 < ABS(A0), the true output is selected, and otherwise the false output; if CLIP != 0, the true output is selected, and otherwise the false output.
10. The GPU according to claim 9, wherein the third logic circuit further comprises:
A fourth logic circuit configured to update a first predefined pixel group in each of the remaining pixel groups according to a second predefined pixel group in each of the remaining pixel groups.
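
As an illustration only, and not part of the claims, the update condition recited in claim 9 can be sketched in Python for a single row or column of eight pixels P1 to P8 spanning two adjacent 4×4 blocks. The function names `interp_terms` and `update_condition` are hypothetical. Two details are assumptions rather than statements from the claims: the right shift is taken as an arithmetic shift for negative values, and the division by 2 in CLIP is taken to truncate toward zero. A3 is computed as min(A1, A2), as the claim text appears to state, and the three comparisons are combined with OR for the same reason.

```python
def interp_terms(p):
    """Return (A0, A3, CLIP) for pixels P1..P8 given as a sequence of 8 ints."""
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    # Interpolation across the block edge (between P4 and P5).
    a0 = (2 * (p3 - p6) - 5 * (p4 - p5) + 4) >> 3
    # Interpolations inside the left/top block and the right/bottom block.
    a1 = (2 * (p1 - p4) - 5 * (p2 - p3) + 4) >> 3
    a2 = (2 * (p5 - p8) - 5 * (p6 - p7) + 4) >> 3
    # As written in the claim: minimum of A1 and A2.
    a3 = min(a1, a2)
    # Assumption: integer division truncating toward zero.
    clip = int((p4 - p5) / 2)
    return a0, a3, clip


def update_condition(p, pquant):
    """Evaluate the three comparisons of the claim, combined with OR."""
    a0, a3, clip = interp_terms(p)
    return (abs(a0) < pquant) or (a3 < abs(a0)) or (clip != 0)


if __name__ == "__main__":
    flat = [10] * 8                              # no edge at all
    step = [10, 10, 10, 10, 20, 20, 20, 20]      # sharp step at the block edge
    print(interp_terms(flat), update_condition(flat, 0))
    print(interp_terms(step), update_condition(step, 2))
```

Each of the three comparisons corresponds to one true/false selection described in the claim, so a hardware realization could evaluate them in parallel and combine the results afterwards.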
CN2007101103594A 2006-06-16 2007-06-13 Systems and methods of video compression deblocking Active CN101072351B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US81462306P 2006-06-16 2006-06-16
US60/814,623 2006-06-16

Publications (2)

Publication Number Publication Date
CN101072351A CN101072351A (en) 2007-11-14
CN101072351B true CN101072351B (en) 2012-11-21

Family

ID=38880763

Family Applications (6)

Application Number Title Priority Date Filing Date
CN2007101103594A Active CN101072351B (en) 2006-06-16 2007-06-13 Systems and methods of video compression deblocking
CN2007101101936A Active CN101068353B (en) 2006-06-16 2007-06-18 Graph processing unit and method for calculating absolute difference and total value of macroblock
CN200710111956.9A Active CN101083764B (en) 2006-06-16 2007-06-18 Programmable video processing unit and video data processing method
CN2007101101940A Active CN101068365B (en) 2006-06-16 2007-06-18 Method for judging moving vector for describing refrence square moving and the storage media
CN2007101101921A Active CN101068364B (en) 2006-06-16 2007-06-18 Video encoder and graph processing unit
CN2007101119554A Active CN101083763B (en) 2006-06-16 2007-06-18 Programmable video processing unit and video data processing method

Country Status (2)

Country Link
CN (6) CN101072351B (en)
TW (6) TWI444047B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8705622B2 (en) 2008-04-10 2014-04-22 Qualcomm Incorporated Interpolation filter support for sub-pixel resolution in video coding
US9077971B2 (en) 2008-04-10 2015-07-07 Qualcomm Incorporated Interpolation-like filtering of integer-pixel positions in video coding
US9967590B2 (en) 2008-04-10 2018-05-08 Qualcomm Incorporated Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter
EP2359590A4 (en) * 2008-12-15 2014-09-17 Ericsson Telefon Ab L M Method and apparatus for avoiding quality deterioration of transmitted media content
CN101901588B (en) * 2009-05-31 2012-07-04 比亚迪股份有限公司 Method for smoothly displaying image of embedded system
CN102164284A (en) * 2010-02-24 2011-08-24 富士通株式会社 Video decoding method and system
US8295619B2 (en) * 2010-04-05 2012-10-23 Mediatek Inc. Image processing apparatus employed in overdrive application for compressing image data of second frame according to first frame preceding second frame and related image processing method thereof
TWI395490B (en) * 2010-05-10 2013-05-01 Univ Nat Central Electrical-device-implemented video coding method
US8681162B2 (en) * 2010-10-15 2014-03-25 Via Technologies, Inc. Systems and methods for video processing
EP2661879B1 (en) 2011-01-03 2019-07-10 HFI Innovation Inc. Method of filter-unit based in-loop filtering
CN106162186B (en) * 2011-01-03 2020-06-23 寰发股份有限公司 Loop filtering method based on filtering unit
KR101567467B1 (en) * 2011-05-10 2015-11-09 미디어텍 인크. Method and apparatus for reduction of in-loop filter buffer
RU2619706C2 (en) 2011-06-28 2017-05-17 Самсунг Электроникс Ко., Лтд. Method and device for encoding video, and method and device for decoding video which is accompanied with internal prediction
TWI612802B (en) * 2012-03-30 2018-01-21 Jvc Kenwood Corp Image decoding device, image decoding method
US9953455B2 (en) 2013-03-13 2018-04-24 Nvidia Corporation Handling post-Z coverage data in raster operations
US10154265B2 (en) 2013-06-21 2018-12-11 Nvidia Corporation Graphics server and method for streaming rendered content via a remote graphics processing service
CN105872553B (en) * 2016-04-28 2018-08-28 中山大学 A kind of adaptive loop filter method based on parallel computation
US20180174359A1 (en) * 2016-12-15 2018-06-21 Mediatek Inc. Frame difference generation hardware in a graphics system
CN111028133B (en) * 2019-11-21 2023-06-13 中国航空工业集团公司西安航空计算技术研究所 Graphic command pre-decoding device based on SystemVerilog

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050013494A1 (en) * 2003-07-18 2005-01-20 Microsoft Corporation In-loop deblocking filter
US6882688B1 (en) * 1998-12-11 2005-04-19 Matsushita Electric Industrial Co., Ltd. Deblocking filter arithmetic apparatus and deblocking filter arithmetic method
WO2005122588A1 (en) * 2004-06-14 2005-12-22 Tandberg Telecom As Method for chroma deblocking
US20060078048A1 (en) * 2004-10-13 2006-04-13 Gisle Bjontegaard Deblocking filter
CN1774722A (en) * 2003-03-17 2006-05-17 高通股份有限公司 Method and apparatus for improving video quality of low bit-rate video
EP1659803A1 (en) * 2003-08-19 2006-05-24 Matsushita Electric Industrial Co., Ltd. Method for encoding moving image and method for decoding moving image

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3578498B2 (en) * 1994-12-02 2004-10-20 株式会社ソニー・コンピュータエンタテインメント Image information processing device
US5627657A (en) * 1995-02-28 1997-05-06 Daewoo Electronics Co., Ltd. Method for sequentially displaying information recorded on interactive information recording medium
US6064450A (en) * 1995-12-06 2000-05-16 Thomson Licensing S.A. Digital video preprocessor horizontal and vertical filters
JP3876392B2 (en) * 1996-04-26 2007-01-31 富士通株式会社 Motion vector search method
JPH10145753A (en) * 1996-11-15 1998-05-29 Sony Corp Receiver and its method
US6496537B1 (en) * 1996-12-18 2002-12-17 Thomson Licensing S.A. Video decoder with interleaved data processing
US6177922B1 (en) * 1997-04-15 2001-01-23 Genesis Microship, Inc. Multi-scan video timing generator for format conversion
JP3870491B2 (en) * 1997-07-02 2007-01-17 松下電器産業株式会社 Inter-image correspondence detection method and apparatus
US6487249B2 (en) * 1998-10-09 2002-11-26 Matsushita Electric Industrial Co., Ltd. Efficient down conversion system for 2:1 decimation
US6573905B1 (en) * 1999-11-09 2003-06-03 Broadcom Corporation Video and graphics system with parallel processing of graphics windows
CN1112714C (en) * 1998-12-31 2003-06-25 上海永新彩色显象管有限公司 Kinescope screen washing equipment and method
CN1132432C (en) * 1999-03-23 2003-12-24 三洋电机株式会社 video decoder
KR100677082B1 (en) * 2000-01-27 2007-02-01 삼성전자주식회사 Motion estimator
JP4461562B2 (en) * 2000-04-04 2010-05-12 ソニー株式会社 Playback apparatus and method, and signal processing apparatus and method
US6717988B2 (en) * 2001-01-11 2004-04-06 Koninklijke Philips Electronics N.V. Scalable MPEG-2 decoder
US7940844B2 (en) * 2002-06-18 2011-05-10 Qualcomm Incorporated Video encoding and decoding techniques
CN1332560C (en) * 2002-07-22 2007-08-15 上海芯华微电子有限公司 Method based on difference between block bundaries and quantizing factor for removing block effect without additional frame memory
US6944224B2 (en) * 2002-08-14 2005-09-13 Intervideo, Inc. Systems and methods for selecting a macroblock mode in a video encoder
US7336720B2 (en) * 2002-09-27 2008-02-26 Vanguard Software Solutions, Inc. Real-time video coding/decoding
US7027515B2 (en) * 2002-10-15 2006-04-11 Red Rock Semiconductor Ltd. Sum-of-absolute-difference checking of macroblock borders for error detection in a corrupted MPEG-4 bitstream
FR2849331A1 (en) * 2002-12-20 2004-06-25 St Microelectronics Sa METHOD AND DEVICE FOR DECODING AND DISPLAYING ACCELERATED ON THE ACCELERATED FRONT OF MPEG IMAGES, VIDEO PILOT CIRCUIT AND DECODER BOX INCORPORATING SUCH A DEVICE
US6922492B2 (en) * 2002-12-27 2005-07-26 Motorola, Inc. Video deblocking method and apparatus
US7660352B2 (en) * 2003-04-04 2010-02-09 Sony Corporation Apparatus and method of parallel processing an MPEG-4 data stream
US7274824B2 (en) * 2003-04-10 2007-09-25 Faraday Technology Corp. Method and apparatus to reduce the system load of motion estimation for DSP
NO319007B1 (en) * 2003-05-22 2005-06-06 Tandberg Telecom As Video compression method and apparatus
US20050105621A1 (en) * 2003-11-04 2005-05-19 Ju Chi-Cheng Apparatus capable of performing both block-matching motion compensation and global motion compensation and method thereof
US7292283B2 (en) * 2003-12-23 2007-11-06 Genesis Microchip Inc. Apparatus and method for performing sub-pixel vector estimations using quadratic approximations
CN1233171C (en) * 2004-01-16 2005-12-21 北京工业大学 A simplified loop filtering method for video coding
US20050262276A1 (en) * 2004-05-13 2005-11-24 Ittiam Systamc (P) Ltd. Design method for implementing high memory algorithm on low internal memory processor using a direct memory access (DMA) engine
US20060002479A1 (en) * 2004-06-22 2006-01-05 Fernandes Felix C A Decoder for H.264/AVC video
US8116379B2 (en) * 2004-10-08 2012-02-14 Stmicroelectronics, Inc. Method and apparatus for parallel processing of in-loop deblocking filter for H.264 video compression standard
CN1750660A (en) * 2005-09-29 2006-03-22 威盛电子股份有限公司 Method for calculating moving vector

Also Published As

Publication number Publication date
CN101068353A (en) 2007-11-07
TW200821986A (en) 2008-05-16
TWI482117B (en) 2015-04-21
CN101083763A (en) 2007-12-05
TW200816082A (en) 2008-04-01
TW200816820A (en) 2008-04-01
CN101068353B (en) 2010-08-25
TW200803525A (en) 2008-01-01
CN101068365A (en) 2007-11-07
TWI348654B (en) 2011-09-11
TW200803527A (en) 2008-01-01
CN101068364B (en) 2010-12-01
CN101072351A (en) 2007-11-14
TWI383683B (en) 2013-01-21
TWI350109B (en) 2011-10-01
TWI444047B (en) 2014-07-01
CN101083764A (en) 2007-12-05
CN101083764B (en) 2014-04-02
CN101083763B (en) 2012-02-08
TWI395488B (en) 2013-05-01
CN101068364A (en) 2007-11-07
TW200803528A (en) 2008-01-01
CN101068365B (en) 2010-08-25

Similar Documents

Publication Publication Date Title
CN101072351B (en) Systems and methods of video compression deblocking
US8369419B2 (en) Systems and methods of video compression deblocking
US8243815B2 (en) Systems and methods of video compression deblocking
KR102189213B1 (en) Partial decoding for arbitrary viewing angles and reduced line buffers for virtual reality video
US6046773A (en) Apparatus and method for decoding video images
US20120307004A1 (en) Video decoding with 3d graphics shaders
CN103918273B (en) It is determined that the method for the binary code word for conversion coefficient
CN107409233A (en) Image processing apparatus and image processing method
WO2008067500A2 (en) Parallel deblocking filter for h.264 video codec
KR20130140066A (en) Video coding methods and apparatus
CN105556964A (en) Content adaptive bi-directional or functionally predictive multi-pass pictures for high efficiency next generation video coding
CN102804165A (en) Front end processor with extendable data path
JP2005309900A (en) Image processor and image processing method
US5774676A (en) Method and apparatus for decompression of MPEG compressed data in a computer system
CN109246430A (en) 360 degree of video fast intra-mode predictions of virtual reality and CU, which are divided, shifts to an earlier date decision
US20230063062A1 (en) Hardware codec accelerators for high-performance video encoding
Jiang et al. FIPIP: A novel fine-grained parallel partition based intra-frame prediction on heterogeneous many-core systems
CN109963158A (en) A kind of high definition video decoding method based on GPU parallel computation
Han et al. Efficient video decoding on GPUs by point based rendering
Kopperundevi et al. Methods to develop high throughput hardware architectures for HEVC Deblocking Filter using mixed pipelined-block processing techniques
Li et al. Transform coding on programmable stream processors
Rosa et al. FPGA prototyping strategy for a H. 264/AVC video decoder
Naresh et al. FPGA IMPLEMENTATION OF DEBLOCKING FILTER CUSTOM INSTRUCTION HARDWARE ON NIOS-II BASED SOC
US20090175345A1 (en) Motion compensation method and apparatus
CN116114245A (en) Parallel processing of video frames during video encoding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant