CN101072351A - Systems and methods of video compression deblocking - Google Patents
- Publication number
- CN101072351A (application number CN200710110359A)
- Authority
- CN
- China
- Prior art keywords
- pixel
- square
- filter
- logical circuit
- pixel group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
- Image Generation (AREA)
Abstract
An exemplary video decoder comprises: an entropy decoder; a spatial decoder; combining logic; and an in-loop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The in-loop deblocking filter receives the combined picture. The in-loop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to filter each of the remaining pixel groups in the plurality after the predefined pixel group, according to a corresponding set of taps in a plurality of sets of taps, if the predefined pixel group meets a criterion.
Description
Technical field
The invention relates to image compression and decompression, and in particular to a graphics processing unit with image compression and decompression features.
Background art
Personal computers and consumer electronics are used for many kinds of entertainment. These entertainment products fall roughly into two categories: those that use computer-generated graphics, such as computer games; and those that use compressed video streams, such as programs pre-recorded on digital video discs (DVD), or digital programming delivered to a set-top box by a cable or satellite provider. The second category also includes the encoding of analog video streams, as performed by a digital video recorder (DVR).
Computer-generated graphics are typically produced by a graphics processing unit (GPU). A GPU is a specialized microprocessor found in computer game consoles and some personal computers. A GPU is optimized to render three-dimensional primitive objects, such as triangles and quadrilaterals, quickly. These primitives are described by a number of vertices, each of which has attributes (for example, color), and textures can be applied to the primitives. The result of rendering is a two-dimensional array of pixels, which is shown on a computer display or monitor.
Encoding and decoding video streams involves several kinds of computation, for example discrete cosine transform, motion estimation, motion compensation and deblocking filtering. These computations are typically handled by a general-purpose central processing unit (CPU) combined with specialized hardware logic, such as an application-specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. What is needed is a single computing platform that handles both computer graphics and video encoding and decoding.
Summary of the invention
Embodiments disclosed herein provide systems and methods for video compression deblocking. An exemplary deblocking filter for video decoding comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a criterion; logic configured to filter the pixels of the predefined pixel group first when the criterion is met; and logic configured to filter the remaining pixel groups in the plurality sequentially, according to a corresponding set of taps in a plurality of sets of taps, when the criterion is met.
An exemplary video decoder comprises: an entropy decoder, a spatial decoder, combining logic and an in-loop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The in-loop deblocking filter receives the combined picture. The in-loop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to filter each of the remaining pixel groups in the plurality, according to a corresponding set of taps in a plurality of sets of taps, when the predefined pixel group meets a criterion.
An exemplary graphics processing unit comprises a host processor interface and a video acceleration unit. The host processor interface receives at least one video acceleration instruction. The video acceleration unit is responsive to the at least one video acceleration instruction. The video acceleration unit comprises an in-loop deblocking filter. The in-loop deblocking filter comprises: logic configured to determine whether the pixels of a predefined pixel group among a plurality of pixel groups meet a first criterion; logic configured to filter the pixels of the predefined pixel group first when the first criterion is met; and logic configured to filter the remaining pixel groups in the plurality sequentially, according to a corresponding set of taps in a plurality of sets of taps, when the first criterion is met.
Description of drawings
Fig. 1 is a block diagram of an exemplary computing platform for graphics and for video encoding and/or decoding.
Fig. 2 is a block diagram of video decoder 160 of Fig. 1.
Fig. 3 illustrates the sub-block pixel arrangement used by the VC-1 filter.
Fig. 4 is a listing of hardware description pseudo-code for the VC-1 in-loop deblocking filter hardware acceleration logic 400 of Fig. 1.
Fig. 5 is a listing of hardware description language code for the line acceleration logic 500 of Fig. 4.
Figs. 6A to 6D together form a block diagram of the line acceleration logic of Figs. 4 and 5.
Fig. 7 is a data flow diagram of graphics processing unit 120 of Fig. 1.
Fig. 8 is a block diagram of the 16x16 macroblock used in H.264.
[Description of main reference numerals]
100~system, 110~general-purpose CPU, 120~graphics processing unit (GPU), 130~memory, 140~bus, 150~video acceleration unit (VPU), 160~software decoder, 170~video acceleration driver.
205~incoming bit stream, 210~entropy decoder, 215~spatial decoder, 220~inverse quantizer, 230~inverse discrete cosine transform, 235~picture, 245~motion vector, 250~motion compensation, 255~previously decoded picture, 265~prediction picture, 270~spatial compensation, 280~adder, 290~deblocking filter, 295~decoded picture.
310, 320~two adjacent 4x4 sub-blocks, 330~vertical boundary.
400~in-loop deblocking filter hardware acceleration logic, 410~module definition section, 420~iteration loop section, 430~section that tests the Vertical parameter, 440~section that compares the loop parameter with 3, 450~instantiation section.
500~line acceleration logic, 510~module definition section, 520~pixel value computation section, 530~section that compares the loop parameter with 3, 540~section that tests DO_FILTER, 550~state update section.
605, 610, 615, 620~multiplexers; 625, 630, 679~subtractors; 635, 640, 655, 680~logic blocks; 645, 650~adders; 660, 665, 670~registers; 671~output of the P4 register; 673~output of the P5 register; 681~subtractor; 685~adder; 687, 689, 691, 693~multiplexers; 697~OR gate.
710~command stream processor, 720~instructions, 730~instruction data, 740~pool of execution units, 750~texture filter unit, 760~texture cache, 770~post-packer.
Detailed description
Computing platform for video encoding and decoding
Fig. 1 is a block diagram of an exemplary computing platform for graphics and for video encoding and/or decoding. System 100 includes general-purpose CPU 110 (hereinafter the host processor), graphics processing unit (GPU) 120, memory 130 and bus 140. GPU 120 includes video acceleration unit (VPU) 150, which accelerates video encoding and/or decoding, as described later. The video acceleration functions of GPU 120 are made available as instructions that execute on GPU 120.
In some embodiments, only a small part of decoder 160 executes on the host processor, while most of decoder 160 is executed by GPU 120 with very little driver overhead. In this way, frequently executed, computationally intensive blocks are offloaded to GPU 120, while more complex operations are performed by host processor 110. In some embodiments, one computationally intensive function implemented by GPU 120 is in-loop deblocking filter hardware acceleration logic 400, also called in-loop deblocking filter 400 or deblocking filter 400, which is described later in connection with Fig. 4. Another example of a computationally intensive function is determining the boundary strength (BS) for each filter.
This architecture allows flexibility in how the work is divided: decoder 160 can execute on host processor 110 and perform certain specific functions (for example, deblocking or computing boundary strength) by running shader programs on macroblocks; or most of decoder 160 can execute on GPU 120, taking advantage of pipelining and parallelism. In some embodiments in which decoder 160 executes on GPU 120, the deblocking process is a thread that is synchronized with the other parts of decoder 160.
Fig. 1 omits a number of conventional components, well known to those skilled in the art, that are not needed to explain the video acceleration features of GPU 120.
Video decoder
Fig. 2 is a block diagram of video decoder 160 of Fig. 1. In the particular embodiment shown in Fig. 2, decoder 160 implements the ITU H.264 video compression standard. However, those skilled in the art will recognize that decoder 160 of Fig. 2 is a basic representation of a video decoder, which also illustrates the operation of other decoder types similar to H.264, such as SMPTE VC-1 and MPEG-2. In addition, although decoder 160 is shown as part of GPU 120, those skilled in the art will also appreciate that the portions of decoder 160 disclosed herein can be implemented outside a GPU, for example as standalone logic or as part of an application-specific integrated circuit (ASIC).
Incoming bit stream 205 is first processed by entropy decoder 210. Entropy coding takes advantage of statistical redundancy: some patterns occur more often than others, so the frequent ones are represented with shorter codes. Entropy coding includes Huffman coding and run-length encoding. After entropy decoding, the data is processed by spatial decoder 215, which takes advantage of the fact that neighbouring pixels in a picture are usually identical or related, so that only the differences need to be encoded. In this exemplary embodiment, spatial decoder 215 comprises inverse quantizer 220 and inverse discrete cosine transform (IDCT) function 230. The output of IDCT function 230 can be regarded as a picture (235) made up of a number of pixels.
Picture 235 is processed in smaller subdivisions called macroblocks. The H.264 video compression standard uses a macroblock size of 16x16 pixels; other compression standards may use other sizes. Macroblocks in picture 235 are combined either with information from previously decoded pictures, a process called inter prediction, or with information from other macroblocks of picture 235, a process called intra prediction. Incoming bit stream 205 is decoded by entropy decoder 210, and inter or intra prediction is applied according to the type of each picture.
When inter prediction is used, entropy decoder 210 produces motion vectors 245 as output. Motion vectors 245 are used for temporal coding, which takes advantage of the fact that many pixels usually keep the same value across a series of pictures. The change from one picture to the next is encoded as motion vector 245. Motion compensation block 250 combines motion vector 245 with one or more previously decoded pictures 255 to produce prediction picture (265). When intra prediction is used, spatial compensation block 270 combines information from neighbouring macroblocks with the macroblock in picture 235 to produce prediction picture (275).
The encoding process introduces artifacts such as discontinuities along macroblock edges and along sub-block edges within a macroblock. As a result, "edges" appear in the decoded frame that were not present in the original. Deblocking filter 290 is applied to the combined picture output by adder 280 to remove these edge artifacts. Decoded picture 295 produced by the deblocking filter is stored for use in decoding subsequent pictures.
As discussed in connection with Fig. 1, part of decoder 160 executes on host processor 110, and decoder 160 also takes advantage of the video acceleration instructions provided by GPU 120. In particular, in some embodiments deblocking filter 290 uses one or more instructions provided by GPU 120 to implement the filter at relatively low computational cost.
Deblocking filter
Fig. 3 shows two adjacent 4x4 sub-blocks (310, 320), identified by rows R1-R4 and columns C1-C8. Vertical boundary 330 between the two sub-blocks lies between columns C4 and C5. The VC-1 filter operates on each 4x4 sub-block. For the leftmost sub-block, the VC-1 filter examines a predefined group of pixels (P1, P2, P3) in a predefined row (R3). If the predefined group meets a specific criterion, another pixel in the same predefined row, P4, is updated. The criterion is determined by a specific set of computations and comparisons on the pixels in the predefined group. Those skilled in the art will understand that these computations and comparisons can be viewed as a set of taps; the detailed computations are discussed later in connection with Fig. 5. The updated value is likewise based on computations performed on the pixels in the predefined group. The VC-1 filter handles the rightmost sub-block in an analogous manner: it determines whether pixels P6, P7 and P8 meet the criterion and, if so, updates P5. In other words, the VC-1 filter computes the values of one group of predefined pixels in a predefined row (R3), namely edge pixels P4 and P5, from the values of another group of predefined pixels in the same row: the value of P4 depends on P1, P2 and P3, and the value of P5 depends on P6, P7 and P8.
The VC-1 filter conditionally updates the same predefined pixels in the remaining rows, depending on the values computed for the predefined group of pixels (edge pixels P4, P5) in the predefined row (R3). Thus P4 in R1 is updated based on P1, P2 and P3 in R1, but only if P4 and P5 in R3 are also updated. Similarly, P5 in R1 is updated based on P6, P7 and P8 in R1, but only if P4 and P5 in R3 are also updated. Rows 2 and 4 are handled in the same way.
Viewed another way, some pixels in the predefined third row are filtered, or updated, when other pixels in the third row meet a criterion. The filter involves comparisons and computations performed on those other pixels. If the other pixels in the third row do meet the criterion, the corresponding pixels in each of the remaining rows are filtered in an analogous manner, as described above. Some embodiments of deblocking filter 290 disclosed herein use an inventive technique that filters the third row first and then filters the other rows. This technique is described in more detail in connection with Figs. 4, 5 and 6A-6D.
Although Fig. 3 illustrates row-by-row processing of a vertical edge, those skilled in the art will understand that the same figure, rotated by 90 degrees, illustrates column-by-column processing of a horizontal edge. Those skilled in the art will also recognize that, although VC-1 uses the third of four rows as the predefined row that determines whether the other rows are conditionally updated, the principles disclosed herein also apply to embodiments that use a different predefined row (for example, the first row or the second row), and to embodiments in which the sub-block has a different number of rows. Similarly, although VC-1 examines the values of a group of adjacent pixels to set the updated value of a pixel, the principles disclosed herein also apply to embodiments in which other pixels are examined and other pixels are set. As one example, P2 and P3 could be examined to determine the updated value of P4. As another example, P3 could be set according to the values of P2 and P4.
Fig. 4 is a listing of hardware description pseudo-code for VC-1 in-loop deblocking filter hardware acceleration logic 400. Although pseudo-code is used rather than an actual hardware description language (HDL) such as Verilog or VHDL, those skilled in the art should be familiar with such pseudo-code. They will understand that, when expressed in an actual HDL, this code is compiled and then synthesized into an arrangement of logic gates that forms part of video acceleration unit 150. They will also recognize that these gates can be implemented in various technologies, such as an application-specific integrated circuit (ASIC), a programmable gate array (PGA) or a field-programmable gate array (FPGA).
Section 410 of the code is the module definition. VC-1 in-loop deblocking filter hardware acceleration logic 400 has several input parameters. The sub-block to be filtered is specified by the Block parameter. If the Vertical parameter is true, acceleration logic 400 treats the Block parameter as a 4x8 block (see Fig. 3) and performs vertical edge filtering. If the Vertical parameter is false, acceleration logic 400 treats the Block parameter as an 8x4 block (see Fig. 3) and performs horizontal edge filtering.
Section 420 of the code begins an iteration loop and sets the value of the loop parameter variable. On the first pass through the loop the loop parameter is set to 3, so the third line is processed first. Subsequent iterations set the loop parameter to 1, 2 and 4. With these parameters, VC-1 in-loop deblocking filter hardware acceleration logic 400 iterates four times, processing one line of 8 pixels each time, where a line can be a horizontal row or a vertical column, and each line is processed by line acceleration logic 500 (see Fig. 5). In some embodiments, line acceleration logic 500 is implemented as an HDL submodule, which is described in connection with Fig. 5.
Section 430 tests the Vertical parameter to determine whether horizontal or vertical edge filtering is to be performed. Depending on the result, the 8 elements of the Line array variable are initialized from a row of the 4x8 input block or from a column of the 8x4 input block.
After the submodule has processed the line, VC-1 in-loop deblocking filter hardware acceleration logic 400 continues the iteration loop at section 420 with the updated loop parameter. In this way, the filter is applied to the 3rd line of the input block, then to the 1st, 2nd and 4th lines.
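The iteration order and line extraction described for sections 420 and 430 can be sketched in Python as follows. This is only an illustrative sketch, not the pseudo-code of Fig. 4: the function name `lines_in_filter_order` is hypothetical, and the sketch assumes the block is stored as a list of rows, with a 4x8 block meaning 4 rows of 8 pixels and an 8x4 block meaning 8 rows of 4 pixels.

```python
def lines_in_filter_order(block, vertical):
    """Yield (line_number, eight_pixels), visiting line 3 first, then 1, 2, 4.

    Assumption: `block` is a list of rows. For vertical-edge filtering it has
    4 rows of 8 pixels and each line is a row; for horizontal-edge filtering
    it has 8 rows of 4 pixels and each line is a column.
    """
    for n in (3, 1, 2, 4):                       # loop parameter values, 1-based
        if vertical:
            line = list(block[n - 1])            # one row crossing the vertical edge
        else:
            line = [row[n - 1] for row in block] # one column crossing the horizontal edge
        yield n, line


if __name__ == "__main__":
    demo = [[10 * r + c for c in range(8)] for r in range(4)]
    for number, pixels in lines_in_filter_order(demo, vertical=True):
        print(number, pixels)
```

Visiting line 3 first matters because its result decides whether the other three lines are updated at all.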
Fig. 5 is a listing of hardware description language code for line acceleration logic 500, which implements the submodule described above. Section 510 of the code is the module definition. Line acceleration logic 500 has several input parameters. The line to be filtered is specified by the Line input parameter. ProcessingPixel3 is an input parameter that the higher-level logic sets to true if the line is the 3rd row or the 3rd column. The FILTER_OTHER_3 parameter is set to true by the higher-level logic and is adjusted by line acceleration logic 500 according to the pixel values.
Section 520 performs the various pixel value computations specified by VC-1. (These computations are not described in detail here, because they can be understood by reference to the VC-1 standard.) Section 530 tests the ProcessingPixel3 parameter supplied by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400. If ProcessingPixel3 is true, section 530 initializes control variable DO_FILTER to its default value, true. The results of the computations in section 520 are then used to decide whether the other 3 lines are also to be processed. If the pixel computation results indicate that the other 3 lines are not to be processed, DO_FILTER is set to false.
If ProcessingPixel3 is false, section 540 uses input parameter FILTER_OTHER_3 (set by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400) to set the value of DO_FILTER. Section 550 tests the DO_FILTER variable and, if DO_FILTER is true, updates edge pixels P4 and P5 (see Fig. 3) in the Line variable.
Section 560 tests the ProcessingPixel3 parameter and updates FILTER_OTHER_3 accordingly. The FILTER_OTHER_3 variable carries state information between different instantiations of this module. If ProcessingPixel3 is true, section 550 updates the FILTER_OTHER_3 parameter with the value of DO_FILTER. This technique lets the higher-level module that instantiates this module (namely VC1_InloopFilter) pass the FILTER_OTHER_3 value updated by one instantiation of the lower-level VC_1_INLOOPFILTER_LINE module to another instantiation of VC_1_INLOOPFILTER_LINE.
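The way ProcessingPixel3, FILTER_OTHER_3 and DO_FILTER interact can be sketched in Python as below. This is a hedged sketch of the control flow only, not the code of Fig. 5: `filter_line`, `criterion` and `update_edge` are hypothetical names, and the two callbacks stand in for the VC-1 pixel computations that the text leaves to the standard.

```python
def filter_line(line, processing_pixel3, filter_other_3, criterion, update_edge):
    """Process one 8-pixel line; return (new_line, filter_other_3).

    `criterion(line)` and `update_edge(line)` are hypothetical stand-ins for
    the VC-1 computations: the first decides whether filtering applies, the
    second returns the line with edge pixels P4 and P5 updated.
    """
    if processing_pixel3:
        # Third line: DO_FILTER defaults to true, cleared if the criterion fails.
        do_filter = bool(criterion(line))
    else:
        # Other lines: reuse the decision made while processing the third line.
        do_filter = filter_other_3

    new_line = update_edge(list(line)) if do_filter else list(line)

    if processing_pixel3:
        # Publish the decision so later instantiations can see it.
        filter_other_3 = do_filter
    return new_line, filter_other_3


if __name__ == "__main__":
    crit = lambda l: l[0] > 0
    upd = lambda l: l[:3] + [l[3] - 1, l[4] + 1] + l[5:]
    state = True
    for n, pixels in ((3, [1] * 8), (1, [5] * 8)):
        result, state = filter_line(pixels, n == 3, state, crit, upd)
        print(n, result, state)
```

Returning the flag rather than mutating shared state mirrors how the higher-level module passes FILTER_OTHER_3 from one instantiation to the next.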
Those skilled in the art will understand that the pseudo-code of Fig. 5 can be synthesized in various ways to produce an arrangement of logic gates that implements line acceleration logic 500. One such arrangement is illustrated in Figs. 6A-6D, which together form a block diagram of line acceleration logic 500. Those skilled in the art should be familiar with the VC-1 in-loop deblocking filter algorithm and with logic circuit structures. The elements of Figs. 6A-6D are therefore not all described in detail; instead, selected features of line acceleration logic 500 are described.
Those skilled in the art will understand that the computations involved in the VC-1 in-loop deblocking filter include the following, where P1-P8 refer to the pixel positions in the row/column being processed.
A0=(2*(P3-P6)-5*(P4-P5)+4)>>3
A1=(2*(P1-P4)-5*(P2-P3)+4)>>3
A2=(2*(P5-P8)-5*(P6-P7)+4)>>3
clip=(P4-P5)/2
Each of the first three computations involves 3 subtractions, 2 multiplications, 1 addition and 1 right shift. The portion of line acceleration logic 500 shown in Fig. 6A uses shared logic to compute A0, A1 and A2 sequentially, rather than using a dedicated logic block for each of A0, A1 and A2. Avoiding duplicated logic blocks, and using multiplexers to process each set of inputs in turn, reduces gate count and/or power consumption.
Multiplexers 605, 610, 615 and 620 select different inputs from pixel registers P1-P8 in different clock cycles, and these inputs are supplied to the shared logic blocks. Logic blocks 625 and 630 each perform a subtraction. Logic block 635 performs the multiplication by 2 as a left shift by 1. The multiplication by 5 is performed by a left shift followed by adder 645. Adder 650 adds together the output of shifter 635, the constant 4 and the negated output of adder 645. Finally, logic block 655 performs a right shift by 3.
In the 1st clock cycle, input T=1 is supplied to each of multiplexers 605, 610 and 615, and the value of A1 is computed and stored in register 660. In the 2nd clock cycle, input T=2 is supplied to each of multiplexers 605, 610 and 615, and the value of A2 is computed and stored in register 665. In the 3rd clock cycle, input T=3 is supplied to each of multiplexers 605, 610 and 615, and the value of A0 is computed and stored in register 670. The values A1, A2 and A0 stored in registers 660, 665 and 670 are used by the portion of line acceleration logic 500 shown in Fig. 6B, described later. The output of the P4 register (671) and the output of the P5 register (673) are used by the portion of line acceleration logic 500 shown in Fig. 6C, described later.
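The shared-datapath idea can be modelled in Python as a single function evaluated three times with different inputs selected by T. This is a behavioural sketch under stated assumptions, not a description of the actual gates: the names `shared_datapath`, `MUX_SELECT` and `compute_a_values` are hypothetical, the mapping of individual operations to blocks 625-655 is only suggested in comments, and the multiplication by 5 is modelled as (x << 2) + x.

```python
def shared_datapath(x, y, z, w):
    """Compute (2*(x - y) - 5*(z - w) + 4) >> 3 with shifts and adds only."""
    diff_a = x - y                        # first subtractor
    diff_b = z - w                        # second subtractor
    times2 = diff_a << 1                  # multiply by 2: shift left by 1
    times5 = (diff_b << 2) + diff_b       # multiply by 5: shift left, then add
    return (times2 + 4 - times5) >> 3     # final adder, then shift right by 3


# Hypothetical selector table: which pixel positions (1-based) feed the
# datapath in each clock cycle T, following the A1, A2, A0 equations.
MUX_SELECT = {1: ("A1", (1, 4, 2, 3)),
              2: ("A2", (5, 8, 6, 7)),
              3: ("A0", (3, 6, 4, 5))}


def compute_a_values(p):
    """Run three 'cycles' over the 8-pixel line `p`; return the register values."""
    registers = {}
    for t in (1, 2, 3):
        name, (i, j, k, l) = MUX_SELECT[t]
        registers[name] = shared_datapath(p[i - 1], p[j - 1], p[k - 1], p[l - 1])
    return registers


if __name__ == "__main__":
    print(compute_a_values([100, 100, 100, 100, 60, 60, 60, 60]))
```

Python's `>>` on negative integers is an arithmetic shift, which is the behaviour this sketch assumes for the hardware shifter.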
Those skilled in the art will also appreciate that the VC-1 in-loop deblocking filter involves the following additional computation:
D=5*((sign(A0)*A3)-A0)/8
if(CLIP>0)
{
if(D<0)
D=0
if(D>CLIP)
D=CLIP
}
else
{
if(D>0)
D=0
if(D<CLIP)
D=CLIP
}
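The computation of D and its clamping against CLIP can be written as a small Python function. This is a minimal sketch with assumptions: the text above does not define A3, so it is taken here as an input (in the VC-1 standard it is the smaller of |A1| and |A2|); the division by 8 is assumed to truncate toward zero as in C, which differs from a plain arithmetic right shift for negative values; and the helper names `sign`, `div_trunc` and `compute_d` are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def div_trunc(n, d):
    """Integer division truncating toward zero (assumed C-style semantics)."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def compute_d(a0, a3, clip):
    """D = 5*((sign(A0)*A3) - A0)/8, then limited to the range between 0 and CLIP."""
    d = div_trunc(5 * (sign(a0) * a3 - a0), 8)
    if clip > 0:
        d = min(max(d, 0), clip)   # if D<0 then 0; if D>CLIP then CLIP
    else:
        d = max(min(d, 0), clip)   # if D>0 then 0; if D<CLIP then CLIP
    return d


if __name__ == "__main__":
    print(compute_d(-15, 0, 20))
```

In both branches D ends up between 0 and CLIP, so the edge pixels never move by more than CLIP in either direction.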
The part row acceleration logic circuit 500 of Fig. 6 B receives input from the part row acceleration logic circuit 500 of Fig. 6 A, and calculates D (675).Referring again to Fig. 6 A, CLIP (677) is that following generation: pixel P4 and P5 are subtracted each other by logical circuit square 679, and this result moves to right (integer division is with 2) to produce CLIP 677 by logical circuit square 680.Get back to Fig. 6 B, A1 can obtain from buffer 660 in the period 1, and A2 can obtain from buffer 665 in second round, and A0 can obtain from buffer 670 in the period 3.Thereby in the period 4, the part row acceleration logic circuit 500 of the 6th figure calculates D (675) according to above-mentioned equation.
Row acceleration logic circuit 500 utilizes (675) to upgrade the location of pixels of P4, P5.Especially, P4=P4-D and P5=P5+D.Although Fig. 6 is A, Fig. 6 B before illustrated in conjunction with single row/row (for example single group of location of pixels P0-P8) that the computing meeting of a sub-block the 3rd row/row influenced the behavior of other 3 row/row of this sub-block.Row acceleration logic circuit 500 utilizes an initiative method to realize this behavior.When independent filtering operation from the foremost-abreast-finish,, be shown in the position that part row acceleration logic circuit 500 selections with good conditionsi of Fig. 6 C, Fig. 6 D will be upgraded in conjunction with the explanation of Fig. 6 A, Fig. 6 B.In other words, the hardware-accelerated logical circuit 400 of deblocking effect filter judges it is that value is originally write back or new value is write back in the VC-1 loop.Relatively, known method, deblocking effect filter uses circulation in the VC-1 loop, so independent filtering operation is carried out conditionally.
As previously described, the pseudo-code of Fig. 4 interpreting line acceleration logic circuit 500 so running in circulation: in repeating section 420, example section (instantiation section) 450 occurred.The example of this layman's acceleration logic circuit 500 is used 2 parameters, ProcessingPixel3 and FILTER_OTHER_3.The following execution pixel of these parameters P4, P5 renewal with good conditionsi with row acceleration logic circuit 500.Referring to Fig. 6 C, buffer P4 writes the result of subtracter 681, and wherein subtracter 681 is input as P4 (671), is 0 or D (675), decides according to the value of DO_FILTER (683).Similarly, buffer P5 writes the result of adder 685, and wherein adder 685 is input as P5 (673), is 0 or D (675), decides according to the value of DO_FILTER (683).Thereby the updating value of P4 is P4 value originally (if DO_FILTER is for false), or P4-D.Similarly, the updating value of P5 is P5 value originally (if DO_FILTER is for false), or P5+D.
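The conditional write-back can be illustrated with a few lines of Python. This is only a behavioural sketch: `write_back` is a hypothetical name, and the `operand` variable models the multiplexer that feeds either 0 or D into the subtractor and the adder.

```python
def write_back(p4, p5, d, do_filter):
    """Return the values written to registers P4 and P5.

    The subtractor and adder always run; DO_FILTER only selects whether
    their second operand is D or 0, so no branch around the arithmetic
    is needed.
    """
    operand = d if do_filter else 0   # multiplexer: select D or 0
    return p4 - operand, p5 + operand


if __name__ == "__main__":
    print(write_back(100, 60, 9, True))    # filtered values
    print(write_back(100, 60, 9, False))   # original values written back
```

Selecting the operand instead of skipping the operation is what lets the filtering arithmetic for every line run in parallel before the third-line decision is known.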
Those skilled in the art will recognize that, when the 3rd row of a sub-block is processed, the criterion for updating P4 with P4-D is:
((ABS(A0)<PQUANT) OR (A3<ABS(A0)) OR (CLIP!=0))
VC-1 deblocking acceleration unit 400 inventively combines parallel and sequential processing, as described above. Parallel processing provides faster execution and reduces latency. Although parallelization increases gate count, the increase is offset by the sequential processing described earlier. A conventional approach that does not use this sequential processing simply increases gate count.
Some embodiments of GPU 120 include a hardware acceleration unit for H.264 deblocking, and this deblocking function is made available through GPU instructions. GPU 120 is described in detail in connection with Fig. 8, with emphasis on the choice of GPU instructions that provide the H.264 deblocking acceleration function.
Graphic process unit
The principle of multiple deblocking effect instruction
The instruction set of Graphics Processing Unit 120 is included in the partial decoding of h device of carrying out in the software 160 and can be used to quicken deblocking effect filter.Illustrate that at this initiative technology provides not only one multiple graphics processing unit instruction to quicken specific deblocking effect filter.Deblocking effect filter 290 is exactly in proper order originally in the loop, thereby specific filter must be with a graded to pixel filter (for example H.264 regulation be from left to right followed from top to bottom).Thereby previous pixel that filter or that upgraded is brought as input when the pixel of filter back.Master processor processes is stored in the pixel value of known as memory device, and this makes pixel read one by one, write.Yet this essence in proper order can't suitably cooperate when deblocking effect filter in the loop 290 uses Graphics Processing Unit accelerating part Filtering Processing.Known Graphics Processing Unit is stored in texture quick with pixel and gets (texture cache), and the design of this gpu pipeline is not deferred to one by one (back-to-back) and read, writes texture quick and get.
Some embodiments of graphics processing unit 120 disclosed here provide multiple graphics processing unit instructions that can be used together to accelerate a particular deblocking filter. Some of these instructions use the texture cache as the source of pixel data, and others use the graphics processing unit execution units as the data source. In-loop deblocking filter 290 uses these different instructions in appropriate combination to achieve back-to-back pixel reads and writes. An overview of the data flow through graphics processing unit 120 follows, then a description of the deblocking acceleration instructions it provides and of how in-loop deblocking filter 290 uses them.
Graphics processing unit data flow
Fig. 7 is a diagram of data flow in graphics processing unit 120, in which command flow is shown by the arrows on the left of Fig. 7 and image or graphics flow is shown by the arrows on the right. Fig. 7 omits several elements that are familiar to those skilled in the art and are not needed to explain the in-loop deblocking features of graphics processing unit 120. Command stream processor 710 receives an instruction 720 from the system bus (not shown) and decodes the instruction, producing command data 730, such as vertex data. Graphics processing unit 120 supports conventional graphics processing instructions as well as instructions that accelerate video encoding and/or decoding.
Conventional graphics processing instructions involve tasks such as vertex shading, geometry shading, and pixel shading. Command data 730 is therefore supplied to a pool 740 of shader execution units. The shader execution units use a texture filter unit (TFU) 750 as needed to apply texture to pixels. Texture data is fetched from texture cache 760, which is backed by main memory (not shown).
Execution of video encoding and decoding acceleration instructions, such as the deblocking filter instructions described above, differs from execution of the conventional graphics instructions in several respects. First, video acceleration instructions are executed by video acceleration unit 150 rather than by the shader execution units. Second, video acceleration instructions do not use texture data as such.
However, the image data used by video acceleration instructions and the texture data used by graphics instructions are both two-dimensional arrays. Graphics processing unit 120 takes advantage of this similarity, using texture filter unit 750 to load image data for video acceleration unit 150, so that texture cache 760 caches some of the image data operated on by video acceleration unit 150. For this reason, as shown in Fig. 7, video acceleration unit 150 is located between texture filter unit 750 and post-packer 770.
Texture filter unit 750 examines the command data 730 extracted from instruction 720. Command data 730 also gives texture filter unit 750 the coordinates of the desired image data within texture cache 760. In one embodiment, these coordinates are specified as U,V pairs, which should be familiar to those skilled in the art. When instruction 720 is a video acceleration instruction, the extracted command data also directs texture filter unit 750 to bypass the texture filters (not shown) within texture filter unit 750.
In this way, texture filter unit 750 is used by video acceleration instructions to load image data for video acceleration unit 150. Video acceleration unit 150 receives image data from texture filter unit 750 on the data path and command data 730 on the command path, and operates on the image data according to command data 730. The image data output by video acceleration unit 150 is fed back to execution unit pool 740 after being processed by post-packer 770.
Deblocking instructions
Embodiments of graphics processing unit 120 described here provide hardware acceleration for the VC-1 deblocking filter and the H.264 deblocking filter. The VC-1 deblocking filter is accelerated by one graphics processing unit instruction ("IDF_VC-1"), and the H.264 deblocking filter is accelerated by three graphics processing unit instructions ("IDF_H264_0", "IDF_H264_1", "IDF_H264_2").
As described earlier, each graphics processing unit instruction is decoded and parsed into command data 730, which can be viewed as a set of parameters specific to each instruction, as shown in Table 1. The IDF_H264_x instructions share some common parameters, while others are specific to each instruction. Those skilled in the art will understand that these parameters can be encoded using a variety of opcodes and instruction formats, so these topics are not discussed here.
Table 1: Parameters of the IDF_H264 instructions
Parameter | Size | Operand | Description |
FieldFlag (Input) | 1 bit | | If FieldFlag == 1 then field picture, otherwise frame picture |
TopFieldFlag (Input) | 1 bit | | If TopFieldFlag == 1 then top-field picture, otherwise bottom-field picture (applies when FieldFlag is set) |
PictureWidth (Input) | 16 bits | | For example, 1920 for HDTV |
PictureHeight (Input) | 16 bits | | For example, 1080 for 30P HDTV |
YC Flag | 1 bit | Control-2 | Y plane or chroma plane |
Field Direction | 1 bit | Control-1 | |
CBCR Flag | 1 bit | Control-1 | Cb or Cr |
BaseAddress (Input) | 32 bits, unsigned | | For IDF_H264_0 and IDF_H264_1: base address of the sub-block in texture memory |
BlockAddress (Input) | 13.3 format, fractional part ignored | SRC1[0:15] = U, SRC1[31:16] = V | For IDF_H264_0: texture coordinates of the whole sub-block (relative to the base address). For IDF_H264_1: texture coordinates of the remaining part of the sub-block (relative to the base address). Not used in IDF_H264_2 |
DataBlock1 | 4 × 4 × 8 bits | SRC2[127:0] | Not used in IDF_H264_0. For IDF_H264_1: first or left half of the sub-block, according to FilterDirection as encoded in the Control2 parameter. For IDF_H264_2: first (even) register pair |
DataBlock2 | 4 × 4 × 8 bits | SRC2[255:128] | Not used in IDF_H264_0 or IDF_H264_1. For IDF_H264_2: second (odd) register pair |
Sub-block (Output) | 128 bits | | Deblocked 8 × 4 × 8-bit sub-block (128 bits) |
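As a rough illustration of the BlockAddress row of Table 1, the sketch below packs a U,V coordinate pair into a 32-bit SRC1 word. It assumes that "13.3" means an unsigned 16-bit field with 13 integer bits and 3 fractional bits, that bit 0 is the least significant bit, and that U occupies the low half and V the high half. The function names are hypothetical and are not part of the instruction set described here.

```python
FRACTION_BITS = 3    # assumed: 13.3 fixed point = 13 integer bits + 3 fractional bits
FIELD_MASK = 0xFFFF  # each coordinate occupies one 16-bit half of SRC1


def pack_block_address(u, v):
    """Pack integer texel coordinates (u, v) into an assumed 32-bit SRC1 layout."""
    if not (0 <= u < (1 << 13) and 0 <= v < (1 << 13)):
        raise ValueError("coordinate does not fit in 13 integer bits")
    u_fixed = u << FRACTION_BITS  # fractional bits are left at zero
    v_fixed = v << FRACTION_BITS
    return (v_fixed << 16) | u_fixed


def unpack_block_address(src1):
    """Recover (u, v), discarding the fractional bits as Table 1 indicates."""
    u = (src1 & FIELD_MASK) >> FRACTION_BITS
    v = ((src1 >> 16) & FIELD_MASK) >> FRACTION_BITS
    return u, v


word = pack_block_address(1916, 1076)
print(hex(word), unpack_block_address(word))
```

With 13 integer bits, coordinates up to 8191 fit, which covers the 1920 × 1080 picture sizes given as examples in the table.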
Several input parameters are used in combination to determine the address of the 4×4 block to be fetched by texture filter unit 750. The BaseAddress parameter indicates the start of the texture data within the texture cache. The BlockAddress parameter gives the coordinates of the upper-left corner of the block within this region. The PictureHeight and PictureWidth input parameters are used to determine the extent of the block, that is, the lower-left coordinates. Finally, a video picture can be progressive or interlaced. If interlaced, it is made up of two fields (top and bottom). Texture filter unit 750 uses FieldFlag and TopFieldFlag to handle interlaced images properly.
The deblocked 8×4×8-bit output is provided in a destination register and is also written back to execution unit pool 740. Writing the deblocked output back to execution unit pool 740 is a "modify in place" operation, which is necessary in some decoder implementations, such as H.264, where pixel values in blocks to the right and below are computed from previous results. The VC-1 decoder, however, is not constrained in this way as H.264 is. In VC-1, every 8×8 boundary is filtered (vertical first, then horizontal). All vertical edges can therefore be filtered essentially in parallel, with the 4×4 edges filtered afterward. Parallelization is possible because only two pixels (one on each side of the edge) are updated, and these pixels are not used to compute other edges. Because the deblocked data is written back to execution unit pool 740 rather than to texture cache 760, different IDF_H264_x instructions are provided so that the sub-block can be fetched from different locations. This can be seen in Table 1, in the descriptions of the BlockAddress, DataBlock1, and DataBlock2 parameters. The IDF_H264_0 instruction fetches the entire 8×4×8-bit sub-block from texture cache 760. The IDF_H264_1 instruction fetches half of the sub-block from texture cache 760 and the other half from execution unit pool 740.
The use of the IDF_H264_x instructions by decoder 160 is described in detail in connection with Fig. 8. Described next is how texture filter unit 750 and execution unit pool 740 transform the fetched pixel data before supplying it to video acceleration unit 150.
Transformation of image data
The instruction parameters described above give texture filter unit 750 the coordinates of the sub-block to fetch, either from texture cache 760 or from execution unit pool 740. Image data comprises luma (Y) and chroma (Cb, Cr) planes. The YC flag input parameter specifies whether the Y plane or the CbCr plane is to be processed.
When luma (Y) data is processed, as indicated by the YC flag parameter, texture filter unit 750 fetches the sub-block and provides its 128 bits as input to the VC-1 in-loop deblocking filter hardware acceleration logic 400 (for example, as the block input parameter of the VC-1 accelerator example of Fig. 4). The resulting data is written to the destination registers as a register quad (that is, DST, DST+1, DST+2, DST+3).
When chroma data is processed, as indicated by the YC flag parameter, the Cb and Cr blocks are processed consecutively by the VC-1 in-loop deblocking filter hardware acceleration logic 400. The resulting data is written to texture cache 760. In some embodiments, this write occurs every cycle, with 256 bits written per cycle.
Some embodiments of the video acceleration unit use interleaved CbCr planes, each stored at half width and half height. In these embodiments, texture filter unit 750 de-interleaves the CbCr sub-block data for video acceleration unit 150 into a buffer used to communicate between texture filter unit 750 and video acceleration unit 150. Specifically, texture filter unit 750 writes two 4×4 Cb blocks into the buffer, then writes two 4×4 Cr blocks into the buffer. The 8×4 Cb block is processed first by the VC-1 in-loop deblocking filter hardware acceleration logic 400, and the resulting data is written to texture cache 760. The 8×4 Cr block is then processed by the VC-1 in-loop deblocking filter hardware acceleration logic 400, and the resulting data is written to texture cache 760. Video acceleration unit 150 uses the CbCr flag parameter to manage this sequential processing.
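The de-interleaving step can be sketched as follows. The text does not state the exact memory layout, so this example assumes the simplest one: Cb and Cr samples alternate within each row (Cb, Cr, Cb, Cr, ...). The function name `deinterleave_cbcr` is hypothetical.

```python
def deinterleave_cbcr(rows):
    """Split rows of interleaved [Cb, Cr, Cb, Cr, ...] samples into two planes.

    The alternating layout is an assumption made for this sketch.
    """
    cb = [row[0::2] for row in rows]  # even positions hold Cb
    cr = [row[1::2] for row in rows]  # odd positions hold Cr
    return cb, cr


# Four rows of 16 interleaved samples yield one 8x4 Cb block and one 8x4 Cr
# block (width x height). Cr samples are offset by 100 so they are easy to spot.
interleaved = [
    [(x // 2) + (100 if x % 2 else 0) + 10 * y for x in range(16)]
    for y in range(4)
]
cb_block, cr_block = deinterleave_cbcr(interleaved)
print(cb_block[0], cr_block[0])
```

Each 8×4 result corresponds to the pair of 4×4 blocks that the text describes being written to the buffer, Cb first and Cr second.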
Use of the deblocking instructions by the software decoder
As explained earlier in connection with Fig. 1, decoder 160 executes on host processor 110 but also uses the video acceleration instructions provided by graphics processing unit 120. In particular, an H.264 embodiment of in-loop deblocking filter 290 uses particular combinations of the IDF_H264_x instructions to process edges in the order specified by H.264, fetching some sub-blocks from texture cache 760 and others from execution unit pool 740. Used in appropriate combination, these IDF_H264_x instructions achieve back-to-back pixel reads and writes.
Fig. 8 is a block diagram of a 16×16 macroblock as used in H.264. The macroblock is divided into sixteen 4×4 sub-blocks, each of which is deblocked. The sub-blocks in Fig. 8 can be identified by row and column (for example, R1,C2). H.264 specifies that vertical edges are processed first and horizontal edges second, in the edge order (a-h) shown in Fig. 8.
The deblocking filter is therefore applied to the edge between each pair of sub-blocks, and the sub-block pairs are filtered in this order:
edge a=[block to left of R1,C1]|[R1,C1];[block to left of R2,C1]|[R2,C1];
[block to left of R3,C1]|[R3,C1];[block to left of R4,C1]|[R4,C1]
edge b=[R1,C1]|[R1,C2];[R2,C1]|[R2,C2];
[R3,C1]|[R3,C2];[R4,C1]|[R4,C2]
edge c=[R1,C2]|[R1,C3];[R2,C2]|[R2,C3];
[R3,C2]|[R3,C3];[R4,C2]|[R4,C3]
edge d=[R1,C3]|[R1,C4];[R2,C3]|[R2,C4];
[R3,C3]|[R3,C4];[R4,C3]|[R4,C4]
edge e=[block to top of R1,C1]|[R1,C1];[block to top of R1,C2]|[R1,C2];
[block to top of R1,C3]|[R1,C3];[block to top of R1,C4]|[R1,C4]
edge f=[R1,C1]|[R2,C1];[R1,C2]|[R2,C2];
[R1,C3]|[R2,C3];[R1,C4]|[R2,C4]
edge g=[R2,C1]|[R3,C1];[R2,C2]|[R3,C2];
[R2,C3]|[R3,C3];[R2,C4]|[R3,C4]
edge h=[R3,C1]|[R4,C1];[R3,C2]|[R4,C2];
[R3,C3]|[R4,C3];[R3,C4]|[R4,C4]
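The edge order above follows a regular pattern that can be generated rather than listed by hand. The sketch below assumes that each vertical edge pairs the sub-blocks on either side of it within the same row, and each horizontal edge pairs the sub-blocks above and below it within the same column. The function names are hypothetical.

```python
ROWS = COLS = 4  # a 16x16 macroblock holds a 4x4 grid of 4x4 sub-blocks


def vertical_edge(col):
    """Sub-block pairs (left, right) along the vertical edge on the left of column col."""
    pairs = []
    for r in range(1, ROWS + 1):
        left = f"left of R{r},C1" if col == 1 else f"R{r},C{col - 1}"
        pairs.append((left, f"R{r},C{col}"))
    return pairs


def horizontal_edge(row):
    """Sub-block pairs (top, bottom) along the horizontal edge on top of row row."""
    pairs = []
    for c in range(1, COLS + 1):
        top = f"top of R1,C{c}" if row == 1 else f"R{row - 1},C{c}"
        pairs.append((top, f"R{row},C{c}"))
    return pairs


def edge_order():
    """Vertical edges a-d first, then horizontal edges e-h, keyed by letter."""
    edges = {}
    for i, letter in enumerate("abcd"):
        edges[letter] = vertical_edge(i + 1)
    for i, letter in enumerate("efgh"):
        edges[letter] = horizontal_edge(i + 1)
    return edges


edges = edge_order()
for letter, pairs in edges.items():
    print(letter, pairs)
```

Because Python dictionaries preserve insertion order, iterating over `edges` visits the edges in the a-h order described for Fig. 8.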
For the first pair of sub-blocks, both are loaded from texture cache 760, because no pixels have yet been changed by applying the filter. Although the filter for the first vertical edge (a) may change pixel values in (R1,C1), the vertical edge in the second row shares no pixels with the vertical edge in the first row. The second pair of sub-blocks (edge b) is therefore also loaded from texture cache 760. Because vertical edges in two adjacent rows do not share pixels, the same holds for the third pair (edge c) and the fourth pair (edge d) of sub-blocks.
The particular IDF_H264_x instruction issued by in-loop deblocking filter 290 determines where the pixel data is loaded from. The sequence of IDF_H264_x instructions that in-loop deblocking filter 290 uses to process the first group of vertical edges (a-d) is:
IDF_H264_0 SRC1=address of(R1,C1);
IDF_H264_0 SRC1=address of(R2,C1);
IDF_H264_0 SRC1=address of(R3,C1);
IDF_H264_0 SRC1=address of(R4,C1);
Next, in-loop deblocking filter 290 processes the second vertical edge (b), starting from (R1,C2). The 4 leftmost pixels of the 8×4 sub-block defined for (R1,C2) overlap the rightmost pixels of the (R1,C1) sub-block. These overlapping pixels were processed, and possibly updated, by the vertical edge filter for (R1,C1), and are therefore read from execution unit pool 740 rather than from texture cache 760. However, the 4 rightmost pixels of the (R1,C2) sub-block have not yet been filtered, and are therefore read from texture cache 760. The same holds for sub-blocks (R2,C2) through (R4,C2). To achieve this, in-loop deblocking filter 290 processes the second group of vertical edges by issuing the following sequence of IDF_H264_x instructions:
IDF_H264_1 SRC1=address of (R1,C2);
IDF_H264_1 SRC1=address of (R2,C2);
IDF_H264_1 SRC1=address of (R3,C2);
IDF_H264_1 SRC1=address of (R4,C2);
The third group of vertical edges is processed starting from (R1,C3). The 4 leftmost pixels of the 8×4 block for (R1,C3) overlap the rightmost pixels of the (R1,C2) sub-block, and are therefore read from execution unit pool 740 rather than from texture cache 760.
However, the 4 rightmost pixels of the (R1,C3) sub-block have not yet been filtered, and are therefore read from texture cache 760. The same holds for sub-blocks (R2,C3) through (R4,C3). A similar situation arises for the last group of vertical edges. In-loop deblocking filter 290 therefore processes the remaining two groups of vertical edges by issuing the following sequence of IDF_H264_x instructions:
IDF_H264_1 SRC1=address of(R1,C3);
IDF_H264_1 SRC1=address of(R2,C3);
IDF_H264_1 SRC1=address of(R3,C3);
IDF_H264_1 SRC1=address of(R4,C3);
IDF_H264_1 SRC1=address of(R1,C4);
IDF_H264_1 SRC1=address of(R2,C4);
IDF_H264_1 SRC1=address of(R3,C4);
IDF_H264_1 SRC1=address of(R4,C4);
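The vertical-edge instruction sequences listed above follow a simple rule: the first column uses IDF_H264_0 and the remaining columns use IDF_H264_1. The sketch below reproduces that sequence and records where each half of the 8×4 sub-block is read from, as described in the text. The function name and the source labels are descriptive strings chosen for this example, not hardware identifiers.

```python
TEXTURE_CACHE = "texture cache 760"
EU_POOL = "execution unit pool 740"


def vertical_edge_instructions(rows=4, cols=4):
    """Return (opcode, sub-block, left-half source, right-half source) tuples."""
    sequence = []
    for c in range(1, cols + 1):
        for r in range(1, rows + 1):
            block = f"R{r},C{c}"
            if c == 1:
                # Nothing has been filtered yet: both halves come from the cache.
                sequence.append(("IDF_H264_0", block, TEXTURE_CACHE, TEXTURE_CACHE))
            else:
                # The left half was possibly updated by the previous edge.
                sequence.append(("IDF_H264_1", block, EU_POOL, TEXTURE_CACHE))
    return sequence


seq = vertical_edge_instructions()
for opcode, block, left_src, right_src in seq:
    print(f"{opcode} SRC1=address of({block})  left: {left_src}, right: {right_src}")
```

Only the vertical edges are generated here; the horizontal edges use IDF_H264_2 and are described separately.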
The horizontal edges (e-h) are processed next. By this point the deblocking filter has been applied to every sub-block in the macroblock, so any pixel may have been updated. Each sub-block submitted for horizontal edge filtering is therefore read from execution unit pool 740 rather than from texture cache 760. In-loop deblocking filter 290 thus issues the following sequence of IDF_H264_x instructions to process the horizontal edges:
IDF_H264_2 SRC1=address of(R1,C1);
IDF_H264_2 SRC1=address of(R2,C1);
IDF_H264_2 SRC1=address of(R3,C1);
IDF_H264_2 SRC1=address of(R4,C1);
IDF_H264_2 SRC1=address of(R1,C2);
IDF_H264_2 SRC1=address of(R2,C2);
IDF_H264_2 SRC1=address of(R3,C2);
IDF_H264_2 SRC1=address of(R4,C2);
IDF_H264_2 SRC1=address of(R1,C3);
Any block in a process description or flowchart should be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps in the process. Those skilled in the art of software development will recognize that alternative implementations are also included within the scope of the disclosure. In these alternative implementations, functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The systems and methods disclosed here can be implemented in software, hardware, or a combination thereof. In some embodiments, the system and/or method is implemented in software that is stored in a memory and executed by a suitable processor located in a computing device (including but not limited to a microprocessor, microcontroller, network processor, reconfigurable processor, or extensible processor). In other embodiments, the system and/or method is implemented in logic, including but not limited to a programmable logic device (PLD), programmable gate array (PGA), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC). In still other embodiments, the logic is implemented in a graphics processor or graphics processing unit (GPU).
The systems and methods disclosed here can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device. Such an instruction execution system includes any computer-based system, processor-containing system, or other system that can fetch instructions from the instruction execution system and execute them. In the context of this disclosure, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system. The computer-readable medium can be, for example and without limitation, a system or propagation medium based on electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology.
Specific examples (non-limiting) of a computer-readable medium using electronic technology include: an electrical connection (electronic) having one or more wires; random access memory (RAM); read-only memory (ROM); and erasable programmable read-only memory (EPROM or flash memory). A specific example (non-limiting) of a computer-readable medium using magnetic technology is a portable computer diskette. Specific examples (non-limiting) of a computer-readable medium using optical technology include optical fiber and portable compact disc read-only memory (CD-ROM).
Although the invention is illustrated and described here as embodied in one or more specific examples, it should not be limited to the details shown, since various modifications and structural changes may be made without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, the appended claims should be construed broadly and in a manner consistent with the scope of the invention, as set forth in the claims that follow.
Claims (19)
1. A deblocking filter for video decoding, comprising:
a first logic circuit configured to determine whether the pixels of a predefined pixel group in a plurality of pixel groups meet a criterion;
a second logic circuit configured to first filter the pixels of the predefined pixel group if the criterion is met; and
a third logic circuit configured to sequentially filter each remaining pixel group in the plurality of pixel groups, according to a corresponding set of filter taps in a plurality of sets of filter taps, if the criterion is met.
2. The deblocking filter according to claim 1, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.
3. The deblocking filter according to claim 1, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.
4. The deblocking filter according to claim 1, wherein the third logic circuit further comprises:
a fourth logic circuit configured to update a first predefined group of pixels in each remaining pixel group according to a second predefined group of pixels in each remaining pixel group.
5. The deblocking filter according to claim 1, wherein the second logic circuit further comprises:
a fifth logic circuit configured to first filter the pixels of the predefined pixel group in parallel if the criterion is met.
6. The deblocking filter according to claim 1, wherein the deblocking filter is applied to the edge between a pair of sub-blocks to remove edge artifacts.
7. The deblocking filter according to claim 1, wherein the deblocking filter uses an appropriate combination of a plurality of graphics processing instructions to achieve back-to-back pixel reads and writes.
8. The deblocking filter according to claim 1, wherein the deblocking filter is defined according to the VC-1 standard.
9. A video decoder, comprising:
an entropy decoder that receives an incoming coded bit stream;
a spatial decoder that receives the output of the entropy decoder and produces an encoded picture comprising a plurality of pixels;
a first logic circuit configured to combine a current picture with a prediction picture to produce a combined picture; and
an in-loop deblocking filter that receives the combined picture, the in-loop deblocking filter comprising:
a second logic circuit configured to filter a predefined pixel group; and
a third logic circuit configured to filter each remaining pixel group in the plurality of pixel groups, according to a corresponding set of filter taps in a plurality of sets of filter taps, if the predefined pixel group meets a criterion.
10. The video decoder according to claim 9, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.
11. The video decoder according to claim 9, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.
12. The video decoder according to claim 9, wherein the second logic circuit further comprises:
a fourth logic circuit configured to first filter the pixels of the predefined pixel group in parallel if the criterion is met.
13. The video decoder according to claim 9, wherein the third logic circuit further comprises:
a fifth logic circuit configured to update a first predefined group of pixels in each remaining pixel group according to a second predefined group of pixels in each remaining pixel group.
14. The video decoder according to claim 9, wherein the deblocking filter is defined according to the VC-1 standard.
15. A graphics processing unit, comprising:
a host processor interface that receives at least one video acceleration instruction; and
a video acceleration unit responsive to the at least one video acceleration instruction, the video acceleration unit comprising an in-loop deblocking filter, the in-loop deblocking filter comprising:
a first logic circuit configured to determine whether the pixels of a predefined pixel group in a plurality of pixel groups meet a first criterion;
a second logic circuit configured to first filter the pixels of the predefined pixel group if the first criterion is met; and
a third logic circuit configured to sequentially filter the remaining pixel groups in the plurality of pixel groups, according to a corresponding set of filter taps in a plurality of sets of filter taps, if the first criterion is met.
16. The graphics processing unit according to claim 15, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.
17. The graphics processing unit according to claim 15, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.
18. The video decoder according to claim 15, wherein the third logic circuit further comprises:
a fourth logic circuit configured to update a first predefined group of pixels in each remaining pixel group according to a second predefined group of pixels in each remaining pixel group.
19. The video decoder according to claim 15, wherein the deblocking filter is defined according to the VC-1 standard.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US81462306P | 2006-06-16 | 2006-06-16 | |
US60/814,623 | 2006-06-16 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101072351A (en) | 2007-11-14 |
CN101072351B (en) | 2012-11-21 |
Family
ID=38880763
Family Applications (6)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2007101103594A Active CN101072351B (en) | 2006-06-16 | 2007-06-13 | Systems and methods of video compression deblocking |
CN2007101101936A Active CN101068353B (en) | 2006-06-16 | 2007-06-18 | Graph processing unit and method for calculating absolute difference and total value of macroblock |
CN200710111956.9A Active CN101083764B (en) | 2006-06-16 | 2007-06-18 | Programmable video processing unit and video data processing method |
CN2007101101940A Active CN101068365B (en) | 2006-06-16 | 2007-06-18 | Method for judging moving vector for describing refrence square moving and the storage media |
CN2007101101921A Active CN101068364B (en) | 2006-06-16 | 2007-06-18 | Video encoder and graph processing unit |
CN2007101119554A Active CN101083763B (en) | 2006-06-16 | 2007-06-18 | Programmable video processing unit and video data processing method |
Family Applications After (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2007101101936A Active CN101068353B (en) | 2006-06-16 | 2007-06-18 | Graph processing unit and method for calculating absolute difference and total value of macroblock |
CN200710111956.9A Active CN101083764B (en) | 2006-06-16 | 2007-06-18 | Programmable video processing unit and video data processing method |
CN2007101101940A Active CN101068365B (en) | 2006-06-16 | 2007-06-18 | Method for judging moving vector for describing refrence square moving and the storage media |
CN2007101101921A Active CN101068364B (en) | 2006-06-16 | 2007-06-18 | Video encoder and graph processing unit |
CN2007101119554A Active CN101083763B (en) | 2006-06-16 | 2007-06-18 | Programmable video processing unit and video data processing method |
Country Status (2)
Country | Link |
---|---|
CN (6) | CN101072351B (en) |
TW (6) | TWI444047B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254297A (en) * | 2010-10-15 | 2011-11-23 | 威盛电子股份有限公司 | Multiple shader system and processing method thereof |
CN106162186A (en) * | 2011-01-03 | 2016-11-23 | 联发科技股份有限公司 | Loop filter method based on filter unit |
US10567751B2 (en) | 2011-01-03 | 2020-02-18 | Hfi Innovation Inc. | Method of filter-unit based in-loop filtering |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8705622B2 (en) | 2008-04-10 | 2014-04-22 | Qualcomm Incorporated | Interpolation filter support for sub-pixel resolution in video coding |
US9077971B2 (en) | 2008-04-10 | 2015-07-07 | Qualcomm Incorporated | Interpolation-like filtering of integer-pixel positions in video coding |
US9967590B2 (en) | 2008-04-10 | 2018-05-08 | Qualcomm Incorporated | Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter |
EP2359590A4 (en) * | 2008-12-15 | 2014-09-17 | Ericsson Telefon Ab L M | Method and apparatus for avoiding quality deterioration of transmitted media content |
CN101901588B (en) * | 2009-05-31 | 2012-07-04 | 比亚迪股份有限公司 | Method for smoothly displaying image of embedded system |
CN102164284A (en) * | 2010-02-24 | 2011-08-24 | 富士通株式会社 | Video decoding method and system |
US8295619B2 (en) * | 2010-04-05 | 2012-10-23 | Mediatek Inc. | Image processing apparatus employed in overdrive application for compressing image data of second frame according to first frame preceding second frame and related image processing method thereof |
TWI395490B (en) * | 2010-05-10 | 2013-05-01 | Univ Nat Central | Electrical-device-implemented video coding method |
KR101567467B1 (en) * | 2011-05-10 | 2015-11-09 | 미디어텍 인크. | Method and apparatus for reduction of in-loop filter buffer |
RU2619706C2 (en) | 2011-06-28 | 2017-05-17 | Самсунг Электроникс Ко., Лтд. | Method and device for encoding video, and method and device for decoding video which is accompanied with internal prediction |
TWI612802B (en) * | 2012-03-30 | 2018-01-21 | Jvc Kenwood Corp | Image decoding device, image decoding method |
US9953455B2 (en) | 2013-03-13 | 2018-04-24 | Nvidia Corporation | Handling post-Z coverage data in raster operations |
US10154265B2 (en) | 2013-06-21 | 2018-12-11 | Nvidia Corporation | Graphics server and method for streaming rendered content via a remote graphics processing service |
CN105872553B (en) * | 2016-04-28 | 2018-08-28 | 中山大学 | A kind of adaptive loop filter method based on parallel computation |
US20180174359A1 (en) * | 2016-12-15 | 2018-06-21 | Mediatek Inc. | Frame difference generation hardware in a graphics system |
CN111028133B (en) * | 2019-11-21 | 2023-06-13 | 中国航空工业集团公司西安航空计算技术研究所 | Graphic command pre-decoding device based on SystemVerilog |
Family Cites Families (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3578498B2 (en) * | 1994-12-02 | 2004-10-20 | 株式会社ソニー・コンピュータエンタテインメント | Image information processing device |
US5627657A (en) * | 1995-02-28 | 1997-05-06 | Daewoo Electronics Co., Ltd. | Method for sequentially displaying information recorded on interactive information recording medium |
US6064450A (en) * | 1995-12-06 | 2000-05-16 | Thomson Licensing S.A. | Digital video preprocessor horizontal and vertical filters |
JP3876392B2 (en) * | 1996-04-26 | 2007-01-31 | 富士通株式会社 | Motion vector search method |
JPH10145753A (en) * | 1996-11-15 | 1998-05-29 | Sony Corp | Receiver and its method |
US6496537B1 (en) * | 1996-12-18 | 2002-12-17 | Thomson Licensing S.A. | Video decoder with interleaved data processing |
US6177922B1 (en) * | 1997-04-15 | 2001-01-23 | Genesis Microship, Inc. | Multi-scan video timing generator for format conversion |
JP3870491B2 (en) * | 1997-07-02 | 2007-01-17 | 松下電器産業株式会社 | Inter-image correspondence detection method and apparatus |
US6487249B2 (en) * | 1998-10-09 | 2002-11-26 | Matsushita Electric Industrial Co., Ltd. | Efficient down conversion system for 2:1 decimation |
US6573905B1 (en) * | 1999-11-09 | 2003-06-03 | Broadcom Corporation | Video and graphics system with parallel processing of graphics windows |
JP3757116B2 (en) * | 1998-12-11 | 2006-03-22 | 松下電器産業株式会社 | Deblocking filter calculation device and deblocking filter calculation method |
CN1112714C (en) * | 1998-12-31 | 2003-06-25 | 上海永新彩色显象管有限公司 | Kinescope screen washing equipment and method |
CN1132432C (en) * | 1999-03-23 | 2003-12-24 | 三洋电机株式会社 | Video decoder |
KR100677082B1 (en) * | 2000-01-27 | 2007-02-01 | 삼성전자주식회사 | Motion estimator |
JP4461562B2 (en) * | 2000-04-04 | 2010-05-12 | ソニー株式会社 | Playback apparatus and method, and signal processing apparatus and method |
US6717988B2 (en) * | 2001-01-11 | 2004-04-06 | Koninklijke Philips Electronics N.V. | Scalable MPEG-2 decoder |
US7940844B2 (en) * | 2002-06-18 | 2011-05-10 | Qualcomm Incorporated | Video encoding and decoding techniques |
CN1332560C (en) * | 2002-07-22 | 2007-08-15 | 上海芯华微电子有限公司 | Method based on difference between block boundaries and quantizing factor for removing block effect without additional frame memory |
US6944224B2 (en) * | 2002-08-14 | 2005-09-13 | Intervideo, Inc. | Systems and methods for selecting a macroblock mode in a video encoder |
US7336720B2 (en) * | 2002-09-27 | 2008-02-26 | Vanguard Software Solutions, Inc. | Real-time video coding/decoding |
US7027515B2 (en) * | 2002-10-15 | 2006-04-11 | Red Rock Semiconductor Ltd. | Sum-of-absolute-difference checking of macroblock borders for error detection in a corrupted MPEG-4 bitstream |
FR2849331A1 (en) * | 2002-12-20 | 2004-06-25 | St Microelectronics Sa | Method and device for decoding and displaying MPEG images in accelerated forward mode, video driver circuit and decoder box incorporating such a device |
US6922492B2 (en) * | 2002-12-27 | 2005-07-26 | Motorola, Inc. | Video deblocking method and apparatus |
CN100424717C (en) * | 2003-03-17 | 2008-10-08 | 高通股份有限公司 | Method and apparatus for improving video quality of low bit-rate video |
US7660352B2 (en) * | 2003-04-04 | 2010-02-09 | Sony Corporation | Apparatus and method of parallel processing an MPEG-4 data stream |
US7274824B2 (en) * | 2003-04-10 | 2007-09-25 | Faraday Technology Corp. | Method and apparatus to reduce the system load of motion estimation for DSP |
NO319007B1 (en) * | 2003-05-22 | 2005-06-06 | Tandberg Telecom As | Video compression method and apparatus |
US20050013494A1 (en) * | 2003-07-18 | 2005-01-20 | Microsoft Corporation | In-loop deblocking filter |
US7650032B2 (en) * | 2003-08-19 | 2010-01-19 | Panasonic Corporation | Method for encoding moving image and method for decoding moving image |
US20050105621A1 (en) * | 2003-11-04 | 2005-05-19 | Ju Chi-Cheng | Apparatus capable of performing both block-matching motion compensation and global motion compensation and method thereof |
US7292283B2 (en) * | 2003-12-23 | 2007-11-06 | Genesis Microchip Inc. | Apparatus and method for performing sub-pixel vector estimations using quadratic approximations |
CN1233171C (en) * | 2004-01-16 | 2005-12-21 | 北京工业大学 | A simplified loop filtering method for video coding |
US20050262276A1 (en) * | 2004-05-13 | 2005-11-24 | Ittiam Systems (P) Ltd. | Design method for implementing high memory algorithm on low internal memory processor using a direct memory access (DMA) engine |
NO20042477A (en) * | 2004-06-14 | 2005-10-17 | Tandberg Telecom As | Chroma de-blocking procedure |
US20060002479A1 (en) * | 2004-06-22 | 2006-01-05 | Fernandes Felix C A | Decoder for H.264/AVC video |
US8116379B2 (en) * | 2004-10-08 | 2012-02-14 | Stmicroelectronics, Inc. | Method and apparatus for parallel processing of in-loop deblocking filter for H.264 video compression standard |
NO322722B1 (en) * | 2004-10-13 | 2006-12-04 | Tandberg Telecom As | Video encoding method by reducing block artifacts |
CN1750660A (en) * | 2005-09-29 | 2006-03-22 | 威盛电子股份有限公司 | Method for calculating moving vector |
- 2007
- 2007-06-05 TW TW096120098A patent/TWI444047B/en active
- 2007-06-13 CN CN2007101103594A patent/CN101072351B/en active Active
- 2007-06-15 TW TW096122002A patent/TWI383683B/en active
- 2007-06-15 TW TW096122009A patent/TWI348654B/en active
- 2007-06-15 TW TW096121865A patent/TWI395488B/en active
- 2007-06-15 TW TW096121890A patent/TWI482117B/en active
- 2007-06-15 TW TW096122000A patent/TWI350109B/en active
- 2007-06-18 CN CN2007101101936A patent/CN101068353B/en active Active
- 2007-06-18 CN CN200710111956.9A patent/CN101083764B/en active Active
- 2007-06-18 CN CN2007101101940A patent/CN101068365B/en active Active
- 2007-06-18 CN CN2007101101921A patent/CN101068364B/en active Active
- 2007-06-18 CN CN2007101119554A patent/CN101083763B/en active Active
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254297A (en) * | 2010-10-15 | 2011-11-23 | 威盛电子股份有限公司 | Multiple shader system and processing method thereof |
CN102254297B (en) * | 2010-10-15 | 2013-11-27 | 威盛电子股份有限公司 | Multiple shader system and processing method thereof |
CN106162186A (en) * | 2011-01-03 | 2016-11-23 | 联发科技股份有限公司 | Loop filter method based on filter unit |
US10567751B2 (en) | 2011-01-03 | 2020-02-18 | Hfi Innovation Inc. | Method of filter-unit based in-loop filtering |
Also Published As
Publication number | Publication date |
---|---|
CN101068353A (en) | 2007-11-07 |
TW200821986A (en) | 2008-05-16 |
TWI482117B (en) | 2015-04-21 |
CN101083763A (en) | 2007-12-05 |
TW200816082A (en) | 2008-04-01 |
TW200816820A (en) | 2008-04-01 |
CN101068353B (en) | 2010-08-25 |
TW200803525A (en) | 2008-01-01 |
CN101068365A (en) | 2007-11-07 |
TWI348654B (en) | 2011-09-11 |
TW200803527A (en) | 2008-01-01 |
CN101072351B (en) | 2012-11-21 |
CN101068364B (en) | 2010-12-01 |
TWI383683B (en) | 2013-01-21 |
TWI350109B (en) | 2011-10-01 |
TWI444047B (en) | 2014-07-01 |
CN101083764A (en) | 2007-12-05 |
CN101083764B (en) | 2014-04-02 |
CN101083763B (en) | 2012-02-08 |
TWI395488B (en) | 2013-05-01 |
CN101068364A (en) | 2007-11-07 |
TW200803528A (en) | 2008-01-01 |
CN101068365B (en) | 2010-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101072351B (en) | Systems and methods of video compression deblocking | |
US8369419B2 (en) | Systems and methods of video compression deblocking | |
US8243815B2 (en) | Systems and methods of video compression deblocking | |
US20120307004A1 (en) | Video decoding with 3d graphics shaders | |
CN103918273B (en) | Method of determining binary codewords for transform coefficients | |
EP0763305B1 (en) | Apparatus and method for decoding video images | |
KR20130140066A (en) | Video coding methods and apparatus | |
CN105556964A (en) | Content adaptive bi-directional or functionally predictive multi-pass pictures for high efficiency next generation video coding | |
CN102804165A (en) | Front end processor with extendable data path | |
CN109246430A (en) | Fast intra-mode prediction and early CU partition decision for virtual reality 360-degree video | |
Cheung et al. | Highly parallel rate-distortion optimized intra-mode decision on multicore graphics processors | |
Kopperundevi et al. | A high throughput hardware architecture for deblocking filter in HEVC | |
US20230063062A1 (en) | Hardware codec accelerators for high-performance video encoding | |
Baldev et al. | A directional and scalable streaming deblocking filter hardware architecture for HEVC decoder | |
Kim et al. | An efficient architecture of in-loop filters for multicore scalable HEVC hardware decoders | |
Doan et al. | Multi-asip based parallel and scalable implementation of motion estimation kernel for high definition videos | |
Sima et al. | Color space conversion for MPEG decoding on FPGA-augmented trimedia processor | |
Kopperundevi et al. | Methods to develop high throughput hardware architectures for HEVC Deblocking Filter using mixed pipelined-block processing techniques | |
Jiang et al. | FIPIP: A novel fine-grained parallel partition based intra-frame prediction on heterogeneous many-core systems | |
CN109963158A (en) | A high-definition video decoding method based on GPU parallel computation | |
Ren et al. | Parallel streaming intra prediction for full HD H. 264 encoding | |
Li et al. | Transform coding on programmable stream processors | |
Rosa et al. | FPGA prototyping strategy for a H. 264/AVC video decoder | |
Tseng et al. | Hardware–software co-design architecture for joint photo expert graphic XR encoder | |
US20090175345A1 (en) | Motion compensation method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |