Video decoding method and circuit with efficient fetching of matching blocks
Technical field
The invention belongs to the technical field of video decoding, and specifically relates to a video decoding method and circuit with efficient fetching of matching blocks.
Background technology
In video decoding, a macroblock coded in inter-prediction (inter) mode requires, during motion compensation, that a matching block be fetched from an already decoded picture and used as the prediction. The residual data parsed from the bitstream is then added to this prediction to obtain the reconstructed macroblock.
Video picture sizes have grown rapidly in recent years, from QCIF at 176x144 to 1080p at 1920x1080 within a few years. With ever larger pictures, the data a decoder reads from external memory generates enormous access bandwidth, and the high access latency of the DDR memory adopted to address the bandwidth problem has become the performance bottleneck of many decoders.
Video standards have developed at the same time. To improve the compression ratio, H.264 introduced a technique in which a single macroblock is predicted from matching blocks of different shapes taken from different positions in different reference frames. Combined with the varying number of border pixels required by fractional-pixel interpolation at different positions, this makes the fetch process very complex and readily produces highly irregular external memory accesses. In addition, on-chip bus widths keep increasing, from 16 bits to 64 or even 128 bits, so a large number of scattered, unaligned, small accesses at 8-bit granularity wastes a great deal of bus bandwidth and makes fetching extremely inefficient. A high-performance decoder therefore needs a quite different fetch architecture to solve these problems.
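The cost of small unaligned accesses on a wide bus can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the source: each access is served in whole bus words, with no burst rounding, and the helper names `beats` and `efficiency` are hypothetical.

```python
def beats(addr, nbytes, bus_bits):
    """Number of bus words touched by an access of nbytes starting at byte addr."""
    width = bus_bits // 8                 # bus word size in bytes
    first = addr // width                 # index of the first bus word touched
    last = (addr + nbytes - 1) // width   # index of the last bus word touched
    return last - first + 1


def efficiency(addr, nbytes, bus_bits):
    """Useful bytes divided by bytes actually moved across the bus."""
    return nbytes / (beats(addr, nbytes, bus_bits) * (bus_bits // 8))


# The same 9-byte access wastes more as the bus gets wider or the
# start address crosses a bus-word boundary.
print(efficiency(5, 9, 16))     # narrow bus
print(efficiency(5, 9, 128))    # wide bus, access fits in one word
print(efficiency(12, 9, 128))   # wide bus, access straddles two words
```

Under this model the wider the bus, the larger the fraction of each transfer that carries bytes the decoder did not ask for, which is the waste described above.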
The basic video decoding flow comprises the following steps: parsing the syntax elements related to the macroblock type; parsing the syntax elements related to motion compensation to obtain the fetch information; parsing the syntax elements related to the residual coefficients and obtaining the residual pixel data through inverse scan, inverse transform and inverse quantization; the fetch module fetching from external memory according to the fetch information; the motion compensation module performing interpolation and weighted prediction on the fetched reference block data to obtain the predicted pixels; and adding the predicted pixels and the residual pixels to obtain the reconstructed pixels, which are deblock-filtered and output.
In the usual hardware macroblock decoding flow, shown in Fig. 1, the type module decodes the macroblock type and sub-block type. Once the macroblock is determined to be of inter type, the Get_MV module obtains ref_idx and mv and derives the fetch address and size, which are passed to the fetch module. The remaining coefficient bitstream is sent to the Get_Residual module, which parses the residual coefficients and performs inverse scan, inverse transform and inverse quantization to obtain the residual pixels. Meanwhile, the fetch module fetches from external memory and sends the fetched data to the motion compensation (MC) module for interpolation and weighting, yielding the predicted pixels. The residual pixels and predicted pixels are added to obtain the reconstructed pixels, which are sent to the final deblocking (deblock) module for filtering and output.
In the usual decoding flow, the MC path that begins at the fetch module suffers from inefficient, irregular external memory accesses and is therefore much slower than the high-speed residual path inside the hardware. The speed of the whole hardware is thus limited by the external memory access bandwidth.
Chinese invention patent No. 20041009125.4 discloses a video image motion compensator. This scheme adds interpolation computation between the external memory access and the storage of the fetched data, but does not improve the efficiency of the fetch process.
Chinese invention patent No. 20051000487.2 discloses an image storage method for compressed video signal decoding. This scheme stores the reference pictures in external memory in a special format; the access procedure is complex and in some special cases may even reduce efficiency.
Chinese invention patent No. 20051009873.7 discloses a motion compensation data loading device and method. This scheme reduces two-dimensional fetches to one-dimensional fetches, which partly improves fetch efficiency, but it applies only to a single macroblock.
Chinese invention patent No. 20071004671 discloses a method and device for controlling parallel data reads and writes of on-chip memory in a decoding device. This scheme parallelizes fetching at the level of the partitions within a macroblock and cannot achieve the desired effect with high-latency memory such as DDR.
Chinese invention patent No. 20071004692.9 discloses a data prefetching system for video processing. This scheme is suited to the non-cache data prefetch instructions of processors such as CPUs or DSPs.
Chinese invention patent No. 20081030211.6 discloses a fast data reading method for H.264 standard motion compensation. This scheme reads data in fine-grained 9x9 blocks, which in practice greatly increases the extra row-switch and burst overhead on the bus.
Summary of the invention
The technical problem to be solved by the invention is to provide a video decoding method and circuit with efficient fetching of matching blocks. The method optimizes fetch efficiency on the bus and speeds up external memory access, thereby improving overall video decoding efficiency.
The invention solves the above technical problem by the following technical solution:
A video decoding method with efficient fetching of matching blocks is provided, comprising the following steps:
Step 10: merging several macroblocks into one macroblock group;
Step 20: parsing the syntax elements related to the macroblock type of the first macroblock in the macroblock group;
Step 30: parsing the syntax elements related to motion compensation of the first macroblock to obtain the fetch information of the first macroblock;
Step 40: parsing the syntax elements related to the residual coefficients of the first macroblock and obtaining the residual pixel data of the first macroblock through inverse scan, inverse transform and inverse quantization;
Step 50: parsing the syntax elements related to the macroblock type of the second macroblock in the macroblock group;
Step 60: parsing the syntax elements related to motion compensation of the second macroblock to obtain the fetch information of the second macroblock; thereafter, step 70 is performed in parallel with steps 80 to 90, step 90 obtaining the reference block data from step 80; the method then proceeds to step 100;
Step 70: parsing the syntax elements related to the residual coefficients of the second macroblock and obtaining the residual pixel data of the second macroblock through inverse scan, inverse transform and inverse quantization;
Step 80: the fetch module merging and optimizing the fetch information of the first macroblock and the second macroblock and fetching from external memory using the optimized fetch information, wherein steps 20, 30, 40, 50 and 60 may be performed in any order;
Step 90: the motion compensation module performing interpolation and weighted prediction on the fetched reference block data to obtain the predicted pixels of the first macroblock and of the second macroblock respectively;
Step 100: adding the predicted pixels and the residual pixels of the first macroblock and of the second macroblock respectively to obtain the reconstructed pixels, which are deblock-filtered and output.
A video decoding circuit is provided, comprising:
a macroblock merging unit, configured to merge several macroblocks into one macroblock group;
a first parsing unit, configured to parse the syntax elements related to the macroblock type of the first macroblock in the macroblock group; to parse the syntax elements related to motion compensation of the first macroblock to obtain the fetch information of the first macroblock; and to parse the syntax elements related to the residual coefficients of the first macroblock and obtain the residual pixel data of the first macroblock through inverse scan, inverse transform and inverse quantization;
a second parsing unit, configured to parse the syntax elements related to the macroblock type of the second macroblock in the macroblock group; to parse the syntax elements related to motion compensation of the second macroblock to obtain the fetch information of the second macroblock; and to parse the syntax elements related to the residual coefficients of the second macroblock and obtain the residual pixel data of the second macroblock through inverse scan, inverse transform and inverse quantization;
a fetch unit, configured to merge and optimize the fetch information of the first macroblock and the second macroblock and to fetch from external memory using the optimized fetch information;
a motion compensation unit, configured to perform interpolation and weighted prediction on the fetched reference block data to obtain the predicted pixels of the first macroblock and of the second macroblock respectively;
a reconstruction unit, configured to add the predicted pixels and the residual pixels of the first macroblock and of the second macroblock respectively to obtain the reconstructed pixels, which are deblock-filtered and output.
The advantages of the invention are as follows: several macroblocks are merged into one macroblock group and fetching is performed per macroblock group, which makes maximum use of the spatial locality of the reference data, optimizes and merges the accesses within the group, exploits long burst transfers on a wide bus, and improves bus utilization.
Description of drawings
The invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flow chart of the basic decoding of an inter macroblock in the prior art.
Fig. 2 is a flow chart of the basic decoding according to the invention, in which a macroblock group is merged to optimize the fetch process.
Fig. 3 is a schematic diagram of the macroblock group parallel process of the invention.
Embodiment
In a high-performance decoder, the memory access latency is often much longer than the time needed to parse the bitstream of one macroblock. In the design of such a decoder, performance is the primary concern and resource usage is secondary. Moreover, external memory access can be decoupled from bitstream decoding; that is, the memory locations to be accessed can be determined by parsing well in advance.
The invention therefore proposes merging several macroblocks into one macroblock group and fetching per macroblock group, which makes maximum use of the spatial locality of the reference data, optimizes and merges the accesses within the group, exploits long burst transfers on a wide bus, and improves bus utilization.
The specific steps of the invention are as follows:
Step 10: merging several macroblocks into one macroblock group;
Step 20: parsing the syntax elements related to the macroblock type of the first macroblock in the macroblock group;
Step 30: parsing the syntax elements related to motion compensation of the first macroblock to obtain the fetch information of the first macroblock;
Step 40: parsing the syntax elements related to the residual coefficients of the first macroblock and obtaining the residual pixel data of the first macroblock through inverse scan, inverse transform and inverse quantization;
Step 50: parsing the syntax elements related to the macroblock type of the second macroblock in the macroblock group;
Step 60: parsing the syntax elements related to motion compensation of the second macroblock to obtain the fetch information of the second macroblock; thereafter, steps 70, 80 and 90 are performed concurrently; the method then proceeds to step 100;
Step 70: parsing the syntax elements related to the residual coefficients of the second macroblock and obtaining the residual pixel data of the second macroblock through inverse scan, inverse transform and inverse quantization;
Step 80: the fetch module merging and optimizing the fetch information of the first macroblock and the second macroblock and fetching from external memory using the optimized fetch information;
Step 90: the motion compensation module performing interpolation and weighted prediction on the fetched reference block data to obtain the predicted pixels of the two macroblocks respectively;
Step 100: adding the predicted pixels and the residual pixels of the two macroblocks respectively to obtain the reconstructed pixels, which are deblock-filtered and output.
Step 80 is also performed in parallel with steps 20 to 70 of the next iteration.
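The benefit of overlapping step 80 with the parsing of the next macroblock group can be illustrated with a minimal timing sketch. The model is an assumption made for illustration only: each group takes a fixed `parse` time and a fixed `fetch` time in arbitrary units, fetch information can be buffered between the parser and the fetch module, and the function names are hypothetical.

```python
def serial_time(groups, parse, fetch):
    """Total time when each group is parsed and then fetched, with no overlap."""
    return groups * (parse + fetch)


def pipelined_time(groups, parse, fetch):
    """Total time when the fetch of group n overlaps the parsing of group n+1."""
    parse_done = 0   # time at which the parser finishes the current group
    fetch_done = 0   # time at which the fetch module finishes the current group
    for _ in range(groups):
        parse_done += parse                              # parser runs back to back
        fetch_done = max(fetch_done, parse_done) + fetch  # fetch waits for both
    return fetch_done


# Hypothetical figures: fetching dominates parsing, as with high-latency memory.
print(serial_time(4, 10, 30), pipelined_time(4, 10, 30))
```

When fetching dominates, the pipelined total approaches the fetch time alone, which is why reducing the number of bus transfers per group matters most.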
Referring to Fig. 2 and Fig. 3, the core idea of the invention is to obtain the fetch information of several macroblocks and then merge and optimize it, so as to optimize fetch efficiency on the bus and speed up external memory access. In addition, because two-stage macroblock decoding is used, when the later macroblock starts its merged fetch, the first-stage macroblock decoding hardware is free to decode the next macroblock, which achieves a greater degree of macroblock-level parallelism.
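One possible way to merge the fetch information of two macroblocks is sketched below. The source does not specify the merging rule, so everything here is an assumption: fetch information is modelled as a rectangle in a reference frame, one byte per pixel, a burst moves 16 bytes (32-bit bus, burst4), and two requests are merged into their bounding rectangle only when that costs no more bursts than fetching them separately. The names `Fetch`, `bursts`, `bounding` and `merge` are hypothetical.

```python
from dataclasses import dataclass

BURST_BYTES = 16  # assumed: 32-bit bus, burst length 4


@dataclass(frozen=True)
class Fetch:
    ref_idx: int  # reference frame index
    x: int        # left edge in bytes (1 byte per pixel assumed)
    y: int        # top row
    w: int        # width in pixels
    h: int        # height in rows


def bursts(f):
    """Bursts needed to read the rectangle row by row with burst-aligned access."""
    first = f.x // BURST_BYTES
    last = (f.x + f.w - 1) // BURST_BYTES
    return (last - first + 1) * f.h


def bounding(a, b):
    """Smallest rectangle covering both requests (same reference frame assumed)."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Fetch(a.ref_idx, x0, y0, x1 - x0, y1 - y0)


def merge(a, b):
    """Return one merged request if it is no more expensive, otherwise None."""
    if a.ref_idx != b.ref_idx:
        return None
    m = bounding(a, b)
    return m if bursts(m) <= bursts(a) + bursts(b) else None


# Two horizontally adjacent 23x23 reference blocks merge into one 46x23 fetch.
print(merge(Fetch(0, 0, 0, 23, 23), Fetch(0, 23, 0, 23, 23)))
```

Requests that are far apart or that refer to different reference frames are left unmerged under this rule, since the bounding rectangle would move more data than the two separate fetches.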
Take as an example the P_skip macroblocks that often occur consecutively in static scenes. With quarter-pixel accuracy, an ordinary 16x16 macroblock needs to fetch 23x23 pixels: 23 pixels per row, for 23 rows. On a 32-bit bus using burst4 transfers, fetching two macroblocks separately requires two burst4 transfers per row, 92 transfers in total, so the payload efficiency is (2*23*23)/((4*4)*2*23*2) = 71.9%. With the multi-macroblock merged fetch of the invention, again using 32-bit burst4 transfers, each merged row needs three burst4 transfers and the efficiency is (2*23*23)/((4*4*3)*23) = 95.8%, a large improvement in bus bandwidth utilization.
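The two percentages above can be reproduced with a short calculation. This is a sketch under the same figures as the example, with two added assumptions: one byte per pixel and rows that start on a burst boundary. The function names are hypothetical.

```python
import math


def burst_bytes(bus_bits=32, burst_len=4):
    """Bytes moved by one burst: bus width in bytes times beats per burst."""
    return (bus_bits // 8) * burst_len


def utilization(row_bytes, rows, blocks, merged, bus_bits=32, burst_len=4):
    """Payload bytes divided by bytes actually moved on the bus."""
    per_burst = burst_bytes(bus_bits, burst_len)
    payload = blocks * row_bytes * rows
    if merged:
        # One row of the whole group is fetched as a single contiguous run.
        n_bursts = math.ceil(blocks * row_bytes / per_burst) * rows
    else:
        # Each block's row is fetched on its own and rounded up separately.
        n_bursts = blocks * math.ceil(row_bytes / per_burst) * rows
    return payload / (n_bursts * per_burst)


print(f"separate: {utilization(23, 23, 2, merged=False):.1%}")
print(f"merged:   {utilization(23, 23, 2, merged=True):.1%}")
```

The gain comes entirely from rounding: 23 bytes round up to 32 per block when fetched separately, while 46 bytes round up to only 48 when the two rows are fetched together.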
The invention also provides an embodiment of a video decoding circuit, comprising:
a macroblock merging unit, configured to merge several macroblocks into one macroblock group;
a first parsing unit, configured to parse the syntax elements related to the macroblock type of the first macroblock in the macroblock group; to parse the syntax elements related to motion compensation of the first macroblock to obtain the fetch information of the first macroblock; and to parse the syntax elements related to the residual coefficients of the first macroblock and obtain the residual pixel data of the first macroblock through inverse scan, inverse transform and inverse quantization;
a second parsing unit, configured to parse the syntax elements related to the macroblock type of the second macroblock in the macroblock group; to parse the syntax elements related to motion compensation of the second macroblock to obtain the fetch information of the second macroblock; and to parse the syntax elements related to the residual coefficients of the second macroblock and obtain the residual pixel data of the second macroblock through inverse scan, inverse transform and inverse quantization;
a fetch unit, configured to merge and optimize the fetch information of the first macroblock and the second macroblock and to fetch from external memory using the optimized fetch information;
a motion compensation unit, configured to perform interpolation and weighted prediction on the fetched reference block data to obtain the predicted pixels of the first macroblock and of the second macroblock respectively;
a reconstruction unit, configured to add the predicted pixels and the residual pixels of the first macroblock and of the second macroblock respectively to obtain the reconstructed pixels, which are deblock-filtered and output.
With a group of two macroblocks as the minimum macroblock group and decoding performed with the above architecture, external memory access bandwidth is reduced by 20% to 40% compared with the original decoder across different source material, and overall decoding speed improves by more than 10% on average.