CN101123723B - Digital video decoding method based on image processor - Google Patents

Digital video decoding method based on image processor

Info

Publication number
CN101123723B
CN101123723B · CN101123723A · CN2006100892521A · CN200610089252A
Authority
CN
China
Prior art keywords
coefficient
gpu
block
pel
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2006100892521A
Other languages
Chinese (zh)
Other versions
CN101123723A (en)
Inventor
周秉锋
韩博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN2006100892521A
Publication of CN101123723A
Application granted
Publication of CN101123723B
Expired - Fee Related

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a GPU-based decoding method for compressed video. The method represents video blocks with point primitives rather than rectangles and maps every decoding stage except variable-length decoding onto the GPU: the video data are organized into point sets, and the GPU completes the decoding process by drawing those point sets. The method combines the respective strengths of the CPU and the GPU so that the two work in parallel to accelerate video decoding; it offers the high performance of hardware decoding together with the flexibility of software decoding, can handle multiple video compression formats and standards, can replace the dedicated decoding hardware of personal computers, game consoles, handheld mobile devices, and the like with the GPU already present, and thereby improves the utilization of hardware resources and reduces cost.

Description

Digital video decoding method based on a graphics processing unit
Technical field
The invention belongs to the field of digital video compression in computing, and specifically relates to a method for performing video decoding with a graphics processing unit (Graphics Processing Unit, GPU).
Background technology
Digital video is widely used in daily life, spanning digital television, personal computers, handheld mobile devices, entertainment, education, and many other fields. For users, the most basic requirement is real-time, high-quality playback (decoding) of video content. To achieve a high compression ratio with good picture quality, however, video compression standards must adopt compression techniques of high computational complexity, which directly causes the decoding process to consume a large amount of computational resources.
Most common video compression standards use the 16 × 16 macroblock as the basic processing unit. Referring to Fig. 1, decoding each macroblock requires the following stages in sequence: variable-length decoding, inverse quantization, inverse discrete cosine transform (Inverse Discrete Cosine Transform, IDCT), motion compensation, and color space conversion. Variable-length decoding parses the video bitstream and recovers the entropy-coded information of the video, such as the parameters, coefficients, and motion vectors of each macroblock; this process is strictly serial bit manipulation. Inverse quantization and the IDCT then operate on each coefficient block that makes up a macroblock, processing the sparse DCT coefficients to recover the original pixel block; this transform is computationally expensive. Motion compensation, performed per macroblock, is an effective way to reduce temporal redundancy in a video sequence. At the encoder, the principle is to search a reference frame for the image block most similar to a macroblock of the current image, called the prediction block; the search result is expressed as a motion vector, the difference between the current macroblock and the prediction block is computed, and the difference and the motion vector are encoded. Motion compensation at the decoder is the process of recovering the coded image from that difference and motion vector. Because better prediction usually yields better coding efficiency, common video coding systems adopt techniques such as bi-directional prediction (B frames) and sub-pixel-precision motion vectors to improve the accuracy of motion estimation; this improves prediction accuracy and compression ratio but further increases the computational complexity. Finally, color space conversion multiplies the color vector of every pixel in the image by a conversion matrix to obtain RGB values, a typical computation-intensive process. The decoding of video is therefore a complex system composed of several time-consuming processing stages.
Faced with high-quality, high-resolution video and the complex compression techniques introduced by newer standards (for example H.264), a software decoder running only on the CPU of a present-day computer system may not even meet the demands of real-time video decoding. Other subsystems are therefore needed to take over part of the decoding work and relieve the CPU. Over the past decade, dedicated video decoding hardware has been introduced into computer systems, either as a separate board or integrated into the graphics hardware; the spread of Microsoft's DirectX Video Acceleration (DXVA) standard has made the latter the current mainstream. However, such dedicated decoding hardware usually targets one specific video compression standard (most often MPEG-2) and therefore offers very limited extensibility and programmability, lacking the flexibility to cope with the variety of video compression formats in use today. Graphics cards have recently begun to integrate programmable video-processing hardware, for example Nvidia's PureVideo and ATI's Avivo, but these require extra hardware and higher cost, and there is still no effective high-level language or application programming interface for conveniently controlling these low-level hardware resources.
On the other hand, with the development and spread of 3-D graphics applications, graphics hardware has evolved into a graphics processing unit that combines high performance with flexibility, namely the GPU, whose main programmable parts are currently the vertex processor (Vertex Processor) and the pixel processor (Fragment Processor). Together with the rasterizer and the compositing units, these two processing stages form the GPU's pipelined architecture. The high performance brought by the GPU's massive parallelism, the mature programmability offered by high-level shading languages, and the support for high-precision data types (32-bit floating point) make the GPU a very attractive co-processor alongside the CPU, usable for many general-purpose computing problems outside the graphics field (GPGPU), such as numerical computation, signal processing, and fluid simulation. From an architectural point of view, the GPU is a highly parallel stream processor based on vector operations, a structure that closely resembles several successful dedicated multimedia and video processors. All of this provides strong support for efficient video decoding on the GPU.
However, the GPU was designed and has evolved to accelerate graphics computation, and the data it processes are relatively regular vertices and pixels, so it cannot be applied directly to the comparatively complex and branch-heavy video decoding process. Apart from the final color space conversion stage, the texture-based methods commonly used in the GPGPU field are also ill-suited to this decoding process. The main reason is that most current video compression standards are organized around macroblocks and coefficient blocks; each macroblock or coefficient block has its own parameters and attributes, differing from one another, which makes them hard to represent with a single regular texture. Earlier texture-based work, for example GPU implementations of the DCT/IDCT, showed no performance advantage over the CPU and also incurred considerable data transfer overhead. The document "Accelerate video decoding with generic GPU" (Shen G. et al., IEEE Transactions on Circuits and Systems for Video Technology, May 2005) uses small rectangles to represent macroblocks and thereby performs the motion compensation part of decoding; this is effective, but problems such as data redundancy remain. These methods do not make full use of the GPU's computational resources, so their performance is low and they are not suitable for a practical video decoding system.
Summary of the invention
The objective of the invention is to overcome the deficiencies of current software and hardware decoding schemes in performance or flexibility by proposing a GPU-based method for decoding compressed video. The method combines the high performance of hardware with the flexibility of software, is applicable to various video compression standards, and can replace the dedicated decoding hardware of GPU-equipped personal computers, game consoles, handheld mobile devices, and the like, improving the utilization of hardware resources and reducing cost.
The above objective of the present invention is achieved by the following technical solution:
A digital video decoding method based on a graphics processing unit, comprising the steps of:
1) the CPU performs variable-length decoding to obtain macroblocks and coefficient blocks, represents them with the "point" primitive used in graphics rendering, and generates the macroblock point set corresponding to the macroblocks and the DCT coefficient point set corresponding to the coefficient blocks;
2) the CPU sends the macroblock point set and the DCT coefficient point set to the GPU in batches;
3) the macroblock point set and the DCT coefficient point set are drawn, and the GPU runs the corresponding vertex and fragment (pixel) programs to complete the video decoding process.
The present invention represents the basic units of video, macroblocks and coefficient blocks, with the "point" primitive of graphics rendering, so that the traditional video decoding process is mapped onto the process of drawing point sets; this fully exploits the GPU's pipelined and massively parallel processing and yields higher decoding performance. While the point sets are drawn, vertex and fragment programs direct the GPU's programmable vertex and pixel processors to perform the key stages of decoding: inverse quantization, IDCT, motion compensation, and color space conversion; in addition, the GPU's blending unit and texture filtering unit are used to take over part of the computation. The technical solution specifically comprises the following aspects:
1) Point primitives rather than rectangles are used to represent video block information. The principle is to use a point's attributes (four-dimensional vectors), such as position, normal, and texture coordinates, to store the type, position, parameters, coefficients, and other information of the macroblocks and coefficient blocks in the video. Macroblocks and coefficient blocks correspond to two different classes of point set: the macroblock point set, used for motion compensation, and the DCT coefficient point set, used for the IDCT. The generation of the DCT coefficient point set uses the zigzag scan to reduce the number of points in a set. Because the GPU handles branching inefficiently and different types of macroblock or coefficient block require different operations, the CPU further subdivides the two classes of point set while generating them, gathering blocks with the same operations into the same subset; for example, all non-predicted (intra) macroblocks form one subset and all forward-predicted macroblocks form another.
2) Inverse quantization and the IDCT are performed in a single pass that draws the DCT coefficient point set created in 1). Inverse quantization is carried out entirely in the GPU's vertex processor, while the IDCT is mainly carried out in the pixel processor; the two stages form a pipeline, which improves execution efficiency. The quantization parameter and the DCT coefficients enter the vertex processor as point attributes, and the quantization matrix is loaded in advance into the vertex processor's constant registers as a uniform parameter. The IDCT is computed in the pixel processing unit as a linear combination of the DCT coefficients and their corresponding basis images; the basis images are pre-processed and stored in GPU video memory as a texture. For DCT coefficients of the same coefficient block that are distributed over several points, the GPU's blending unit accumulates the results of those point primitives in the IDCT output buffer (the residual image texture).
3) Motion compensation is performed by drawing the macroblock point set created in 1): the pixel processing unit samples the reference picture texture and the IDCT output texture produced in step 2), accumulates the sampled values, and applies saturation, completing motion compensation. For sub-pixel-precision motion compensation, the bilinear filtering hardware of the GPU's texture unit performs the sub-pixel interpolation.
The advantages of the present invention can be summarized as follows:
1) The method combines the respective strengths of the CPU and the GPU so that both work in parallel to accelerate video decoding; it offers the high performance of hardware decoding and the flexibility of software decoding at the same time, and can handle a variety of video compression formats and standards.
2) Compared with dedicated video hardware, this solution is built on high-level graphics APIs (such as OpenGL) and high-level shading languages (such as Cg and GLSL), so it is platform- and system-independent; it does not depend on any particular underlying hardware and is applicable to any system equipped with a GPU, such as personal computers, game consoles, mobile phones, and PDAs. GPUs also evolve quickly, with performance gains far exceeding Moore's Law, constantly adding new functions and features and offering ever more flexible programmability, so in the long term this approach has more potential than either CPU software decoding or dedicated hardware.
3) Representing macroblocks and coefficient blocks with points is simple to implement and flexible to control. Compared with a texture representation, the point-based method transmits only non-zero coefficients; compared with a rectangle representation, it eliminates the large amount of redundant data carried by the four vertices of each rectangle, reducing transfer overhead and bandwidth demand. Point-based control also makes it easy to discard non-coded blocks, and the generation of DCT coefficient points automatically discards zero coefficients, avoiding unnecessary computation. The point representation further makes it convenient to use the vertex processor and rasterization hardware in the GPU pipeline, fully exploiting the GPU's computational resources. In addition, using the CPU to split the points into different point sets eliminates the bottleneck of branching on the GPU and improves performance.
Description of drawings
The accompanying drawings are briefly described below:
Fig. 1 is a schematic diagram of the key stages of a typical video decoding process.
Fig. 2 is a diagram of the hardware system architecture corresponding to the present invention.
Fig. 3 is a schematic diagram of the macroblock/block structure of digital video.
Fig. 4 is the overall flowchart of the present invention, in which the GPU performs video decoding by drawing point sets.
Fig. 5 is a schematic diagram of generating DCT coefficient point primitives from the coefficient blocks of the video.
Fig. 6 is a schematic diagram of assembling the DCT basis images into a texture.
Fig. 7a is a schematic diagram of the structure of the IDCT output buffer.
Fig. 7b is a schematic diagram of the structure of the frame buffers used for motion compensation.
Fig. 8a is a schematic diagram of the interpolation used for sub-pixel-precision motion compensation.
Fig. 8b is a schematic diagram of bilinear interpolation in the texture filtering unit.
Fig. 9 is a schematic diagram of the pass that draws the DCT coefficient point set to perform inverse quantization and the IDCT.
Fig. 10 is a schematic diagram of the pass that draws the macroblock point set to perform motion compensation.
Embodiment
The preferred embodiment of the present invention is described in more detail below with reference to the accompanying drawings.
Fig. 2 shows the structure of the hardware system corresponding to the present invention. The invention requires the CPU and the GPU to cooperate to complete the whole decoding process, and the two can execute in parallel, further improving efficiency. The CPU and the GPU are connected by a system bus, for example PCIe or AGP. Bus bandwidth is a limited resource, and data transfer overhead is a key factor affecting overall performance. A significant advantage of the present invention over existing methods is that it avoids useless or redundant data and clearly reduces the volume of data transferred. The CPU packs the information needed to decode the macroblocks and coefficient blocks of the video into points used for drawing, stores them temporarily in system memory as vertex arrays or in another form, and then transfers them to the GPU over the system bus. The GPU is the main execution unit of the decoding task in the present invention: it performs the bulk of the decoding and requires programmable vertex and pixel processors and a certain amount of video memory for holding computation data and intermediate results.
The present invention proposes representing the macroblocks and coefficient blocks of a video with point primitives and performing video decoding by having the graphics hardware (GPU) draw the point sets corresponding to those macroblocks and coefficient blocks. The overall flow of the invention is shown in Fig. 4. The concrete steps by which the invention performs video decoding are explained in detail below with reference to the accompanying drawings:
1) The CPU performs variable-length decoding and generates the point sets corresponding to the macroblocks and coefficient blocks in the video.
First the CPU completes variable-length decoding to obtain the information of the macroblocks and coefficient blocks in the video; this information is then packed into the attributes of point primitives, and the points are classified into different point sets according to the type and processing needs of each macroblock or coefficient block. After all video blocks have been processed, the point sets are sent to the GPU in batches (for example as vertex arrays), which improves the efficiency of the GPU's parallel and pipelined execution.
The point sets fall into two major classes: DCT coefficient point sets and macroblock point sets. The main basis for this division is the block-based structure of current compressed video, shown in Fig. 3: the macroblock is the basic unit of motion compensation, and the coefficient blocks that make up a macroblock are the basic units of inverse quantization and the IDCT. The two major classes can be further divided into subsets according to the type and characteristics of the blocks. For example, according to the DCT coding mode, the DCT coefficient point set can be further divided into a field-DCT-coded point set and a frame-DCT-coded point set, and the macroblock point set can be subdivided into a non-predicted (intra) macroblock set, a uni-directionally predicted macroblock set, a bi-directionally predicted macroblock set, and so on according to macroblock type. Different types of video block usually require different decoding operations, so having the CPU classify them into subsets in advance and submit each subset to the GPU separately avoids costly branching on the GPU and improves overall decoding efficiency. (A sketch of this pre-classification follows below.)
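As a concrete illustration of this CPU-side pre-classification, the following C++ sketch groups macroblock points by type into separate containers so that each subset can later be drawn as one batch with one shader pair. The type names and attribute layout are assumptions for illustration, not the patent's exact data structures.

```cpp
#include <vector>

// Illustrative macroblock point and pre-classification into subsets
// (field names and types are assumptions).
enum class MbType { Intra, Forward, Backward, Bidirectional };

struct MacroblockPoint {
    MbType type;
    float position[4];     // macroblock position in the frame
    float motion_vec[4];   // forward/backward motion vector components
    float params[4];       // quantizer and other per-macroblock parameters
};

struct MacroblockPointSets {
    std::vector<MacroblockPoint> intra, forward, backward, bidir;
};

MacroblockPointSets classify(const std::vector<MacroblockPoint>& points) {
    MacroblockPointSets sets;
    for (const auto& p : points) {
        switch (p.type) {
            case MbType::Intra:         sets.intra.push_back(p);    break;
            case MbType::Forward:       sets.forward.push_back(p);  break;
            case MbType::Backward:      sets.backward.push_back(p); break;
            case MbType::Bidirectional: sets.bidir.push_back(p);    break;
        }
    }
    return sets;   // each vector is later uploaded and drawn as one batch
}
```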
The way information is packed into a point primitive differs between macroblocks and coefficient blocks, but the basic idea is the same: the point's several vector attributes, such as position, normal, color, and texture coordinates, are used to store the type, parameters, coefficients, and other useful information of the video block.
The main information contained in a macroblock is its position, its type (intra or inter), and its motion vectors, which can be placed directly into the vector attributes of a point primitive, converting the macroblock into a point.
The main information of a coefficient block is its DCT coefficients. Thanks to the energy-compaction property of the DCT and to quantization, the 64 DCT coefficients of an 8 × 8 coefficient block contain only a small number of non-zero values, concentrated in the low-frequency part. Although this small number of coefficients could be placed directly into point attributes, the irregular distribution of coefficients across different coefficient blocks would not yield the regular point sets that suit GPU processing, so the coefficients of each block are reorganized into a regular structure. The zigzag storage order of the DCT coefficients is used to generate the corresponding coefficient point primitives, as shown in Fig. 5: the zigzag scan converts the two-dimensional block into a one-dimensional array and gathers the non-zero coefficients together as much as possible. Based on this one-dimensional zigzag array, every four consecutive coefficients form one four-dimensional attribute of a point primitive. To keep the points regular, each point holds one (or a fixed number of) such four-dimensional attributes, together with the index of its coefficient group within the one-dimensional array, and with the position, type, and quantization parameters of the coefficient block, forming one DCT coefficient point primitive. A direct consequence of this method is that a single video block may produce several point primitives; the results distributed over these points are accumulated later by the IDCT process. (A sketch of this packing follows below.)
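The packing described above can be sketched as follows; the struct layout and field names are illustrative assumptions, and the zigzag table is the standard 8 × 8 scan order. Groups of four zigzag-ordered coefficients that are entirely zero produce no point at all, which is how zero coefficients are discarded automatically.

```cpp
#include <array>
#include <vector>

// Illustrative DCT coefficient point: four consecutive zigzag coefficients
// plus the information the shaders need to locate and dequantize them.
struct CoeffPoint {
    float block_x, block_y;   // position of the parent coefficient block
    float coeffs[4];          // four consecutive zigzag coefficients
    int   coeff_index;        // index of coeffs[0] in the zigzag array
    int   quantizer;          // block quantization parameter
};

// Standard 8x8 zigzag scan order (maps scan position to row-major index).
static const std::array<int, 64> kZigzag = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63 };

std::vector<CoeffPoint> block_to_points(const std::array<int, 64>& block,  // row-major
                                        float bx, float by, int quantizer) {
    std::vector<CoeffPoint> points;
    for (int i = 0; i < 64; i += 4) {
        CoeffPoint p{bx, by, {0, 0, 0, 0}, i, quantizer};
        bool nonzero = false;
        for (int k = 0; k < 4; ++k) {
            p.coeffs[k] = static_cast<float>(block[kZigzag[i + k]]);
            nonzero = nonzero || (p.coeffs[k] != 0.0f);
        }
        if (nonzero) points.push_back(p);   // all-zero groups are dropped entirely
    }
    return points;
}
```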
The point-generation method above is applied to all macroblocks and coefficient blocks in each frame; the resulting point sets are stored in system memory as vertex arrays (Vertex Array), and the graphics API is then used to draw the point sets, sending the data to the GPU in batches to carry out the subsequent decoding steps. (A sketch of submitting one such batch follows below.)
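A minimal sketch of submitting one point subset as a single batch through OpenGL is given below, assuming a compatibility-profile context (client-side vertex arrays) and generic vertex attributes bound at locations 0–2; the interleaved layout and attribute indices are assumptions rather than the patent's exact bindings.

```cpp
#include <GL/glew.h>   // assumes a current compatibility-profile OpenGL context
#include <cstddef>
#include <vector>

// Draw one point subset as a single batch from a client-side vertex array.
struct PointVertex { float position[4], coeffs[4], info[4]; };

void draw_point_set(GLuint program, const std::vector<PointVertex>& pts)
{
    glUseProgram(program);
    const GLsizei stride = sizeof(PointVertex);
    const char* base = reinterpret_cast<const char*>(pts.data());

    glEnableVertexAttribArray(0);   // block position
    glEnableVertexAttribArray(1);   // packed DCT coefficients or motion vector
    glEnableVertexAttribArray(2);   // coefficient index, type, quantizer
    glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, stride, base + offsetof(PointVertex, position));
    glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, stride, base + offsetof(PointVertex, coeffs));
    glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, stride, base + offsetof(PointVertex, info));

    glDrawArrays(GL_POINTS, 0, static_cast<GLsizei>(pts.size()));   // one batch, one draw call

    glDisableVertexAttribArray(0);
    glDisableVertexAttribArray(1);
    glDisableVertexAttribArray(2);
}
```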
2) Initialize the graphics API drawing environment.
A) Call the API function that sets the rasterized size of a point primitive (for example glPointSize in OpenGL). When drawing the DCT coefficient point set, the size is set to 8 and point-sprite texture-coordinate generation (the Point Sprite ARB extension) is enabled; when drawing the macroblock point set, the size is set to 16. For block structures of variable size, the block size can be stored in a point attribute and written to the PSIZE register of the GPU's vertex processor to obtain different rasterized sizes.
B) Allocate off-screen buffer space on the GPU for intermediate results. One IDCT output buffer and three frame buffers are allocated. To preserve the precision of the IDCT computation, the IDCT output buffer uses a single-channel 16-bit floating-point format (fp16) for the luminance and chrominance components, as shown in Fig. 7a. Because motion compensation must keep reference frames, the three frame buffers hold the forward reference frame, the backward reference frame, and the current frame; each frame buffer is an 8-bit three-channel RGB unsigned-byte surface, structured as shown in Fig. 7b, with the luminance component kept in the R channel and the two chrominance components, after interpolation, kept in the G and B channels. Using the GPU's render-to-texture capability, such as the render_to_texture extension or the FBO of OpenGL, these buffers can be sampled and accessed directly as textures once rendering finishes. The texture filtering mode of the IDCT output texture is set to "Nearest"; the filtering mode of the frame buffers used for motion prediction is set to "Bilinear" so that the texture filtering hardware is activated automatically during sampling and serves sub-pixel-precision motion compensation; and the texture addressing mode is set to "Clamp" to provide the edge padding required by "unrestricted motion vectors". (A sketch of this state setup follows below.)
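The fixed-function state described in A) and B) might be set up roughly as follows; the sketch assumes an OpenGL 2.x context (with GLEW providing the entry points) and omits the off-screen buffer (render-to-texture / FBO) creation itself.

```cpp
#include <GL/glew.h>   // assumes a current OpenGL 2.x context; GLEW supplies the entry points

// Fixed-function state for the two drawing passes; the two texture handles
// are assumed to have been created elsewhere.
void setup_point_drawing_state(GLuint idct_output_tex, GLuint reference_frame_tex)
{
    // A) point size and point-sprite texture-coordinate generation
    glPointSize(8.0f);                                   // 8x8 blocks for DCT coefficient points
    glEnable(GL_POINT_SPRITE);                           // (use 16.0f before the macroblock pass)
    glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE);

    // B) "Nearest" filtering for the IDCT/residual texture
    glBindTexture(GL_TEXTURE_2D, idct_output_tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    // B) "Bilinear" filtering and clamped addressing for reference frames, so
    // sub-pixel interpolation and edge padding come from the texture unit
    glBindTexture(GL_TEXTURE_2D, reference_frame_tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    // Additive blending, enabled for the coefficient-point pass so that
    // points of the same block accumulate in the IDCT output buffer
    glEnable(GL_BLEND);
    glBlendEquation(GL_FUNC_ADD);
    glBlendFunc(GL_ONE, GL_ONE);
}
```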
C) Pre-process the DCT basis images and assemble them into a basis-image texture for the GPU to sample. The IDCT can be regarded as a linear combination of the DCT coefficients and their corresponding basis images, as given by the following formula:
x = Σ_{u=0}^{N} Σ_{v=0}^{N} X(u, v) · [T(u)^T T(v)]
where x is the pixel block after the IDCT, X(u, v) is the coefficient at position (u, v) of the DCT coefficient block, T is the DCT transform matrix, T(u) is the u-th row of that matrix, and the basis image corresponding to coefficient (u, v) is generated by the outer product of the column vector T(u)^T and the row vector T(v). The formula involves only scalar-matrix multiplications and a linear combination of matrices. Its main advantage is that the contribution of each coefficient is independent of the others, so zero-valued coefficients can be discarded outright to reduce the amount of computation.
The generation of the basis-image texture is shown in Fig. 6. Following the zigzag scan order, the basis images of every four consecutive coefficients are stored in the RGBA channels of one 8 × 8 texture block, and to preserve IDCT precision each color channel is 16 bits. The result is a 32 × 32 texture in 16-bit floating-point RGBA format. (A sketch of computing the basis images follows below.)
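A small sketch of the basis-image values that such a texture holds, assuming the orthonormal 8 × 8 DCT-II matrix; the packing of four zigzag-consecutive basis images into RGBA channels is described in the comments only.

```cpp
#include <cmath>

// The basis image for coefficient (u, v) is the outer product of two rows of
// the orthonormal 8x8 DCT-II matrix: B_uv(x, y) = T(u, x) * T(v, y). The
// patent packs the basis images of four zigzag-consecutive coefficients into
// the RGBA channels of one 8x8 tile of a 32x32 fp16 texture; only the basis
// values themselves are computed here.

static const float kPi = 3.14159265358979f;

float dctT(int u, int x) {   // T(u, x): entry of the 8x8 DCT-II matrix
    const float c = (u == 0) ? std::sqrt(1.0f / 8.0f) : std::sqrt(2.0f / 8.0f);
    return c * std::cos((2 * x + 1) * u * kPi / 16.0f);
}

// Value of basis image (u, v) at pixel (x, y); sampling the basis texture at
// the coordinates computed by the vertex program returns exactly these values.
float dct_basis(int u, int v, int x, int y) {
    return dctT(u, x) * dctT(v, y);
}
```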
D) Load the vertex program (Vertex Program) and the fragment program (Fragment Program) used for drawing the DCT coefficient point set, and load the quantization matrix into the vertex program as a uniform parameter for use in inverse quantization. (A sketch of loading the programs and the uniform follows below.)
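Loading the shader pair and passing the quantization matrix as a uniform could look roughly like the following OpenGL 2.0 sketch; the uniform name "quant_matrix" and the flat 64-float layout are assumptions, and compile/link error checking is omitted for brevity.

```cpp
#include <GL/glew.h>   // assumes a current OpenGL 2.0+ context

// Build the vertex/fragment program pair for the coefficient pass and set
// the quantization matrix uniform (kept constant for the whole pass).
GLuint load_coeff_pass_program(const char* vertex_src, const char* fragment_src,
                               const float quant_matrix[64])
{
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &vertex_src, nullptr);
    glCompileShader(vs);

    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &fragment_src, nullptr);
    glCompileShader(fs);

    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);

    glUseProgram(prog);
    GLint loc = glGetUniformLocation(prog, "quant_matrix");
    glUniform1fv(loc, 64, quant_matrix);
    return prog;
}
```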
3) After the preparations of step 2) are complete, the DCT coefficient point set generated in step 1) is drawn, and the GPU performs inverse quantization and the IDCT during this drawing pass, as shown in Fig. 9.
A) The vertex processor performs inverse quantization. Inverse quantization is essentially the multiplication of the quantization step and the coefficient, and proceeds as follows:
X_iq(u, v) = qp × QM(u, v) × X_q(u, v)
where X_q(u, v) and X_iq(u, v) are the DCT coefficients before and after inverse quantization, qp is the quantization parameter placed into a point attribute when the coefficient point primitives were generated in step 1), and QM(u, v) is the corresponding entry of the quantization matrix. The whole quantization matrix was loaded into the constant registers in step 2) D), and the appropriate entry is obtained through the coefficient index introduced in step 1). Because the coefficients are stored as a vector, a single vector multiplication in the vertex program completes the inverse quantization of four coefficients.
The vertex program also computes, from the coefficient index, the texture coordinates of the basis images corresponding to the coefficients, and passes them on to the subsequent rasterization stage.
B) The rasterization stage converts each point primitive into a pixel block of the size set in step 2) A), at the position output by the vertex processor. Every pixel covered by the block inherits the attributes output by the point primitive in the vertex processing stage. For the coefficient point set, with the point-sprite texture generation of step 2) A) enabled, each pixel also receives its texture coordinate within the block, in the range (0,0)–(1,1).
C) The pixel processor combines the basis-image texture coordinates output in A) with the within-block texture coordinates formed in B) to sample the basis-image texture values corresponding to each pixel exactly. In the IDCT formula of step 2) C), the scalar-matrix multiplications thus become direct per-pixel operations. Because both the coefficients and the basis-image texture values are stored as four-dimensional RGBA vectors, a single vector dot product in the pixel processor performs the multiplication and accumulation of four coefficients at once, and the result is written to the output buffer.
D) The blending function (Blending) of the GPU hardware is enabled and set to additive mode. Because each coefficient block may have generated several coefficient point primitives in step 1), the results output by the individual point primitives in this step are accumulated in the output buffer, completing the linear accumulation over all coefficients in the IDCT formula of step 2) C). (A CPU-side sketch of steps A) through D) follows at the end of this step.)
At this point the drawing of the DCT coefficient point set is complete; the result of inverse quantization and the IDCT for the coefficient blocks of the video is held in the IDCT output buffer and serves as the residual image texture for the subsequent motion compensation.
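To make steps A) through D) concrete, the following CPU-side C++ sketch models the work attributed to one DCT coefficient point: dequantization in the "vertex stage", the per-pixel dot product with the basis values in the "fragment stage", and additive blending into the residual buffer. It is an illustrative model of the arithmetic, not the shader source.

```cpp
#include <array>
#include <cmath>

// CPU-side model of one DCT coefficient point during this pass.
static const float kPi = 3.14159265358979f;

using Block8x8 = std::array<std::array<float, 8>, 8>;

static float dctT(int u, int x) {   // orthonormal 8x8 DCT-II matrix entry T(u, x)
    const float c = (u == 0) ? std::sqrt(1.0f / 8.0f) : std::sqrt(2.0f / 8.0f);
    return c * std::cos((2 * x + 1) * u * kPi / 16.0f);
}

void draw_coeff_point(const float coeff[4],   // quantized coefficients X_q
                      const int uv[4][2],     // (u, v) position of each coefficient
                      float qp,               // quantization parameter
                      const Block8x8& QM,     // quantization matrix
                      Block8x8& residual)     // IDCT output buffer (blend target)
{
    // "Vertex stage": one vector multiply dequantizes all four coefficients.
    float xiq[4];
    for (int i = 0; i < 4; ++i)
        xiq[i] = qp * QM[uv[i][0]][uv[i][1]] * coeff[i];

    // "Fragment stage" + blending: dot(coefficients, basis values) per pixel,
    // added into the buffer exactly as additive blending would accumulate it.
    for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x) {
            float dot = 0.0f;
            for (int i = 0; i < 4; ++i)
                dot += xiq[i] * dctT(uv[i][0], x) * dctT(uv[i][1], y);
            residual[y][x] += dot;
        }
}
```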
4) Load the vertex and fragment programs used for motion compensation, set the macroblock point size (16), and draw the macroblock point set to perform motion compensation, as shown in Fig. 10.
A) The vertex program mainly pre-processes the motion vector. According to the pixel precision of the motion vector, it produces the corresponding fractional part so that the texture unit's bilinear filtering hardware performs the pixel interpolation automatically during sampling; for half-pixel precision, for example, the fractional part is 0.5. Fig. 8a and Fig. 8b illustrate the pixel interpolation and the texture bilinear filtering process.
B) Rasterization produces a pixel block of macroblock size, and every pixel inherits the motion vector output in A).
C) In the pixel processor, the WPOS register is first used to obtain the position of each pixel, and the motion vector is then applied as an offset to that position to obtain the texture coordinate of the corresponding reference block. The pixel processor samples the reference frame texture and the residual image texture output by the IDCT, accumulates the sampled values, applies saturation, and writes the result to the frame buffer. (A CPU-side sketch of this per-pixel work follows below.)
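The per-pixel motion-compensation work of steps A) through C) can be modeled on the CPU as follows; the bilinear sampling reproduces what the texture filtering hardware returns for a fractional coordinate, and the clamp-to-edge addressing stands in for the "Clamp" padding mentioned in step 2) B). Types and names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// CPU model of per-pixel motion compensation: prediction from the reference
// frame at sub-pixel precision, plus the IDCT residual, saturated to 8 bits.
struct Plane {
    int width, height;
    std::vector<float> pix;                  // row-major samples
    float at(int x, int y) const {           // clamp addressing = edge padding
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return pix[y * width + x];
    }
};

// Bilinear sample at fractional position (fx, fy), as the texture unit would return.
float sample_bilinear(const Plane& p, float fx, float fy) {
    int x0 = static_cast<int>(std::floor(fx)), y0 = static_cast<int>(std::floor(fy));
    float ax = fx - x0, ay = fy - y0;
    return (1 - ax) * (1 - ay) * p.at(x0,     y0)
         +      ax  * (1 - ay) * p.at(x0 + 1, y0)
         + (1 - ax) *      ay  * p.at(x0,     y0 + 1)
         +      ax  *      ay  * p.at(x0 + 1, y0 + 1);
}

// Reconstruct one pixel: prediction at the motion-vector offset (possibly with
// a .5 fractional part) plus residual, saturated to [0, 255].
uint8_t reconstruct_pixel(const Plane& ref, const Plane& residual,
                          int x, int y, float mv_x, float mv_y) {
    float pred = sample_bilinear(ref, x + mv_x, y + mv_y);
    return static_cast<uint8_t>(std::clamp(pred + residual.at(x, y), 0.0f, 255.0f));
}
```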
5) If the image in the frame buffer is to be output to a display device, a color space conversion is required. This is implemented by drawing a rectangle the size of the image: the pixel processor samples the frame buffer output in step 4) C), performs the color conversion on each pixel, and outputs the result for display, completing the whole decoding process. (A sketch of the per-pixel conversion follows below.)
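The per-pixel conversion drawn in this full-screen pass amounts to a matrix multiply per pixel; the following sketch uses one common full-range BT.601 YCbCr-to-RGB variant, which is an assumption since the patent does not fix the conversion matrix.

```cpp
#include <algorithm>
#include <cstdint>

// Per-pixel color-space conversion as performed in the final full-screen pass
// (constants are a common full-range BT.601 variant, assumed for illustration).
struct RGB { uint8_t r, g, b; };

static uint8_t clamp8(float v) {
    return static_cast<uint8_t>(std::clamp(v, 0.0f, 255.0f));
}

RGB ycbcr_to_rgb(float y, float cb, float cr) {
    cb -= 128.0f;
    cr -= 128.0f;
    return {
        clamp8(y + 1.402f * cr),                      // R
        clamp8(y - 0.344136f * cb - 0.714136f * cr),  // G
        clamp8(y + 1.772f * cb),                      // B
    };
}
```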
The steps above describe the complete process of performing video decoding with the GPU. In the present invention the CPU is used only to generate and organize the point sets for drawing, and all other decoding stages are carried out on the GPU, reducing the CPU's computational burden as far as possible. By expressing the macroblocks and coefficient blocks of the video as point primitives, the whole decoding process is efficiently mapped onto the drawing of point primitives, fully exploiting the computational resources of the GPU; with the parallel computation and pipelined processing of the GPU hardware, the present invention significantly improves the efficiency of video decoding.
Although specific embodiments and drawings of the invention have been disclosed for the purpose of illustration, to aid in understanding the invention and implementing it accordingly, those skilled in the art will appreciate that various substitutions, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to the preferred embodiment and the content disclosed in the drawings.

Claims (9)

1. A digital video decoding method based on a graphics processing unit, comprising the steps of:
1) the CPU performs variable-length decoding to obtain macroblocks and coefficient blocks and represents them with the "point" primitive used in graphics rendering, generating the point sets corresponding to the video blocks; the main information contained in a macroblock, namely its position, type, and motion vectors, is placed directly into the vector attributes of a point primitive; the main information of a coefficient block is its DCT coefficients, and the zigzag storage order of the DCT coefficients is used to generate the corresponding coefficient point primitives: based on the one-dimensional zigzag coefficient array, every four coefficients form one four-dimensional attribute of a point primitive, each point holds one or a specified number of such four-dimensional attributes, and the index of the coefficient group within the one-dimensional array is stored as well, which, together with the position, type, and quantization-parameter information of the coefficient block, forms one DCT coefficient point primitive;
2) the CPU sends the point sets corresponding to the video blocks to the GPU in batches;
3) the macroblock point set and the DCT coefficient point set are drawn, and the GPU runs the corresponding vertex and fragment (pixel) programs to complete the decoding process; the IDCT in the decoding process is performed in the pixel processing unit as a linear combination of the DCT coefficients and their corresponding basis images, the basis images being pre-processed and stored in the GPU's video memory as a texture; for DCT coefficients of the same coefficient block that are distributed over several points, the blending unit of the GPU accumulates the results of the several point primitives in the IDCT output buffer; motion compensation is performed by drawing the created macroblock point set, sampling the reference picture texture and the IDCT output texture in the pixel processing unit, accumulating the sampled values, and applying saturation, thereby completing motion compensation; the inverse quantization in the decoding process is performed in the GPU's vertex processor, the quantization matrix being loaded into the constant registers as a uniform parameter and combined with the coefficient index and quantization parameter stored in the point attributes, so that a vector multiplication completes the inverse quantization.
2. The graphics-processing-unit-based digital video decoding method of claim 1, wherein the method of step 1) for representing a macroblock as a point primitive is to store the information needed to decode the macroblock in the attributes of the point primitive, said information being the position, type, and motion vectors of the macroblock, and said attributes being vector attributes such as position, normal, and texture coordinates.
3. The graphics-processing-unit-based digital video decoding method of claim 1 or 2, wherein the method of step 1) for representing a coefficient block as a point primitive is to store the DCT coefficients of the coefficient block in the attributes of the point primitive, the DCT coefficients being pre-processed by the CPU into a regular coefficient storage structure.
4. The graphics-processing-unit-based digital video decoding method of claim 1, wherein in step 1), when generating the point primitives corresponding to the macroblocks and coefficient blocks, the CPU divides the point primitives in advance into different point sets according to the different types and processing operations of the macroblocks and coefficient blocks.
5. The graphics-processing-unit-based digital video decoding method of claim 1, wherein the inverse DCT in the decoding process of step 3) is realized as a linear combination of the DCT coefficients and their corresponding basis images, the basis images being generated as a texture and stored on the GPU in texture form.
6. The graphics-processing-unit-based digital video decoding method of claim 5, wherein the process of generating the texture from the basis images corresponds to the process of representing a coefficient block as a point primitive in claim 3, the basis images corresponding to each group of four coefficients being stored in the respective RGBA channels of the same texture block.
7. The graphics-processing-unit-based digital video decoding method of claim 1, wherein the inverse DCT operation in the decoding process of step 3) comprises, in order, the following steps:
1) the vertex program computes the texture coordinates of the basis images;
2) the point primitive is rasterized into a pixel block according to the set point size;
3) the pixel processor samples the basis images and performs a dot product with the inherited coefficient attributes;
4) the GPU blending function is enabled and set to additive mode, and the results computed in the different point primitives are accumulated and output to the prediction residual image texture.
8. The graphics-processing-unit-based digital video decoding method of claim 1, wherein the motion compensation in the decoding process of step 3) comprises, in order, the following steps:
1) the vertex program processes the motion vector according to the prediction precision and sets the corresponding fractional part;
2) the macroblock point primitive is rasterized into a pixel block;
3) the pixel processor computes, from the motion vector, the texture coordinate of the reference block in the reference frame, samples the reference frame and the prediction residual image texture output in step 4) of claim 7, accumulates the results, and applies saturation.
9. The graphics-processing-unit-based digital video decoding method of claim 8, wherein the process of sampling the reference frame uses the bilinear filtering function of the GPU's texture unit to perform the interpolation required for sub-pixel-precision motion compensation.
CN2006100892521A 2006-08-11 2006-08-11 Digital video decoding method based on image processor Expired - Fee Related CN101123723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2006100892521A CN101123723B (en) 2006-08-11 2006-08-11 Digital video decoding method based on image processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2006100892521A CN101123723B (en) 2006-08-11 2006-08-11 Digital video decoding method based on image processor

Publications (2)

Publication Number Publication Date
CN101123723A CN101123723A (en) 2008-02-13
CN101123723B true CN101123723B (en) 2011-01-12

Family

ID=39085869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006100892521A Expired - Fee Related CN101123723B (en) 2006-08-11 2006-08-11 Digital video decoding method based on image processor

Country Status (1)

Country Link
CN (1) CN101123723B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101577110B (en) * 2009-05-31 2012-04-25 腾讯科技(深圳)有限公司 Method for playing videos and video player
CN102164284A (en) * 2010-02-24 2011-08-24 富士通株式会社 Video decoding method and system
CN102158694B (en) * 2010-12-01 2012-12-26 航天恒星科技有限公司 Remote-sensing image decompression method based on GPU (Graphics Processing Unit)
CN102904857A (en) * 2011-07-25 2013-01-30 风网科技(北京)有限公司 Client video playing system and method thereof
CN102404576A (en) * 2011-11-30 2012-04-04 国云科技股份有限公司 Cloud terminal decoder and load equalization algorithm thereof and decoding algorithm of GPU (Graphics Processing Unit)
CN102497550A (en) * 2011-12-05 2012-06-13 南京大学 Parallel acceleration method and device for motion compensation interpolation in H.264 encoding
MY194502A (en) * 2011-12-19 2022-11-30 Sony Corp Image processing device and method
CN102547289B (en) * 2012-01-17 2015-01-28 西安电子科技大学 Fast motion estimation method realized based on GPU (Graphics Processing Unit) parallel
CN102932003B (en) * 2012-09-07 2016-05-04 上海交通大学 The acceleration interpretation method of the QC-LDPC code based on GPU framework
CN103077688B (en) * 2013-01-11 2015-02-18 北京京东方光电科技有限公司 Source electrode driving device and source electrode driving method of liquid crystal display screen
CN103108186A (en) * 2013-02-21 2013-05-15 中国对外翻译出版有限公司 Method of achieving high-definition transmission of videos
CN104104958B (en) * 2013-04-08 2017-08-25 联发科技(新加坡)私人有限公司 Picture decoding method and its picture decoding apparatus
CN104143334B (en) * 2013-05-10 2017-06-16 中国电信股份有限公司 Programmable graphics processor and its method that audio mixing is carried out to MCVF multichannel voice frequency
CN103327413A (en) * 2013-06-26 2013-09-25 四川长虹电器股份有限公司 Method for achieving alpha animation in smart television
CN103427844B (en) * 2013-07-26 2016-03-02 华中科技大学 A kind of high-speed lossless data compression method based on GPU and CPU mixing platform
CN104519353B (en) * 2013-09-29 2019-02-05 联想(北京)有限公司 Image processing method and electronic equipment
CN104702954B (en) * 2013-12-05 2017-11-17 华为技术有限公司 Method for video coding and device
US9693077B2 (en) * 2013-12-13 2017-06-27 Qualcomm Incorporated Controlling sub prediction unit (sub-PU) motion parameter inheritance (MPI) in three dimensional (3D) HEVC or other 3D coding
CN103841389B (en) * 2014-04-02 2015-10-21 北京奇艺世纪科技有限公司 A kind of video broadcasting method and player
CN107113426B (en) * 2014-11-14 2020-03-03 Lg 电子株式会社 Method and apparatus for performing graph-based transformations using generalized graph parameters
CN104836970B (en) * 2015-03-27 2018-06-15 北京联合大学 More projection fusion methods and system based on GPU real time video processings
CN105120293B (en) * 2015-08-26 2018-07-06 中国航空工业集团公司洛阳电光设备研究所 Image collaboration coding/decoding method and device based on CPU and GPU
CN105678681A (en) * 2015-12-30 2016-06-15 广东威创视讯科技股份有限公司 GPU data processing method, GPU, PC architecture processor and GPU data processing system
CN105787987B (en) * 2016-03-15 2019-07-30 广州爱九游信息技术有限公司 A kind of Texture Processing Methods and electronic equipment
CN107239268A (en) * 2016-03-29 2017-10-10 阿里巴巴集团控股有限公司 A kind of method for processing business, device and intelligent terminal
US10237566B2 (en) 2016-04-01 2019-03-19 Microsoft Technology Licensing, Llc Video decoding using point sprites
US10575007B2 (en) 2016-04-12 2020-02-25 Microsoft Technology Licensing, Llc Efficient decoding and rendering of blocks in a graphics pipeline
US10157480B2 (en) 2016-06-24 2018-12-18 Microsoft Technology Licensing, Llc Efficient decoding and rendering of inter-coded blocks in a graphics pipeline
CN106210726A (en) * 2016-08-08 2016-12-07 成都佳发安泰科技股份有限公司 The method that utilization rate according to CPU Yu GPU carries out adaptive decoding to video data
CN106331852A (en) * 2016-09-13 2017-01-11 武汉斗鱼网络科技有限公司 Method and system of using WP cell phone for H264 hardware decoding
US11197010B2 (en) 2016-10-07 2021-12-07 Microsoft Technology Licensing, Llc Browser-based video decoder using multiple CPU threads
CN106504185B (en) * 2016-10-26 2020-04-07 腾讯科技(深圳)有限公司 Rendering optimization method and device
CN106792066A (en) * 2016-12-20 2017-05-31 暴风集团股份有限公司 The method and system that the video decoding of optimization is played
CN107172432A (en) * 2017-03-23 2017-09-15 杰发科技(合肥)有限公司 A kind of method for processing video frequency, device and terminal
CN109005160A (en) * 2018-07-10 2018-12-14 广州虎牙信息科技有限公司 Video encoding/decoding method, device and computer readable storage medium, terminal
CN109408028B (en) * 2018-09-21 2021-03-05 东软集团股份有限公司 Floating point number operation method and device and storage medium
CN111464773A (en) * 2020-04-08 2020-07-28 湖南泽天智航电子技术有限公司 Multi-channel video display method and system

Also Published As

Publication number Publication date
CN101123723A (en) 2008-02-13

Similar Documents

Publication Publication Date Title
CN101123723B (en) Digital video decoding method based on image processor
CN101754013B (en) Method for video decoding supported by graphics processing unit
CN109643443A (en) Cache and compression interoperability in graphics processor assembly line
US7558428B2 (en) Accelerated video encoding using a graphics processing unit
US6518974B2 (en) Pixel engine
US7565021B2 (en) Efficient implementation of block-based transform on graphics processing unit
CN101427285B (en) Image processing
US20120307004A1 (en) Video decoding with 3d graphics shaders
CN102223525A (en) Video decoding method and system
US10397542B2 (en) Facilitating quantization and compression of three-dimensional graphics data using screen space metrics at computing devices
JP4044069B2 (en) Texture processing apparatus, texture processing method, and image processing apparatus
CN107004280A (en) Method and apparatus for efficient texture compression
US9621901B1 (en) Encoding of computer-generated video content
CN102158694A (en) Remote-sensing image decompression method based on GPU (Graphics Processing Unit)
US11715253B2 (en) Pixelation optimized delta color compression
CN1703094B (en) Image interpolation apparatus and methods that apply quarter pel interpolation to selected half pel interpolation results
CN104025155B (en) Variable depth compresses
CN1214648C (en) Method and apparatus for performing motion compensation in a texture mapping engine
CN102164284A (en) Video decoding method and system
CN109844802A (en) For improving the mechanism of thread parallelism in graphics processor
EP4111419A1 (en) Super-resolution of block-compressed texture for texture mapping applications
CN102572436A (en) Intra-frame compression method based on CUDA (Compute Unified Device Architecture)
US10198850B2 (en) Method and apparatus for filtering compressed textures
Datla et al. Parallelizing motion JPEG 2000 with CUDA
CN107004252A (en) Apparatus and method for realizing power saving technique when handling floating point values

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110112

Termination date: 20130811