CN101902643B - Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Info

Publication number
CN101902643B
CN101902643B CN201010223353A CN 201010223353
Authority
CN
China
Prior art keywords
sub
array
prediction
block
parallel
Prior art date
Legal status
Expired - Fee Related
Application number
CN 201010223353
Other languages
Chinese (zh)
Other versions
CN101902643A (en)
Inventor
兰旭光
杨志远
韩骞逸
李兴玉
郑南宁
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN 201010223353
Publication of CN101902643A
Application granted
Publication of CN101902643B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a design method for a parallel array-type intraframe prediction decoder. An intraframe parallel-array technique decodes the sub-macroblocks in parallel and improves the decoding efficiency of intra macroblocks. A multi-prediction-mode multiplexing technique realizes the intraframe prediction computation unit PE; the four PE units assigned to each sub-macroblock predict four pixels in parallel. An adaptive pipeline technique achieves adaptively pipelined prediction decoding of the sub-macroblocks within the PE array. A parallel prediction order technique derives the prediction order for parallel decoding from the dependency relationships of the reference pixels and resolves data conflicts. A double sliding-window mechanism meets the requirement of the two PE arrays for parallel decoding of sub-macroblocks and coordinates the parallel synchronization of the PE arrays. The method satisfies the decoding requirements of high-definition and ultra-high-definition video and improves decoding efficiency and performance.

Description

Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder
Technical field
The invention belongs to the field of video coding and decoding, and in particular relates to a VLSI structural design method for a parallel array-type intraframe prediction decoder.
Background technology
With the rapid development of the Internet and the continual upgrading of terminal display devices, users' requirements for high-definition video quality of service keep rising. The direct way to solve the transmission and storage problems posed by the massive amount of video data is to encode and decode the video files. H.264 is a new-generation coding standard with excellent compression performance, but on the decoder side it suffers from high algorithmic complexity and a large data throughput for high-definition video files. Intra prediction accounts for about 30% of the computation when decoding key frames, and one 1080p high-definition frame contains 8100 macroblocks, so the decoding workload is huge. Designing an efficient real-time decoder architecture is therefore a current research focus.
Summary of the invention
In view of the deficiencies of the background art described above, the object of the invention is to provide, for the real-time decoding demand of high-definition video, a VLSI structural design method for a parallel array-type intraframe prediction decoder, which decodes the sub-macroblocks within a frame in a parallel array fashion so as to improve intraframe decoding efficiency and speed.
In order to accomplish the above task, the technical solution adopted by the present invention is as follows:
A VLSI structural design method for a parallel array-type intraframe prediction decoder oriented to real-time decoding of high-definition and ultra-high-definition video specifically comprises the following steps:
The first step: a video file in .264 format is parsed from the bitstream, and after entropy decoding and inverse transform the prediction modes of the macroblock and its sub-macroblocks are obtained; these modes, together with the reference pixel values fetched from the reference pixel access module, are given to two PE arrays as the input of the prediction computation module. PE array A and PE array B compute in parallel, so the two arrays can predict two 4×4 sub-blocks simultaneously. If the luma Y component and the chroma U and V components are numbered according to the traditional "Z" (zig-zag) prediction order, array A predicts in order the twelve 4×4 sub-blocks with sequence numbers 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 18 and 19, and array B predicts in order the twelve 4×4 sub-blocks with sequence numbers 2, 3, 6, 7, 10, 11, 14, 15, 20, 21, 22 and 23. After PE array A has predicted two sub-blocks in advance in its order and obtained their final pixel values, the two PE arrays A and B then carry out prediction computation simultaneously, each following its own sub-block prediction order;
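As an illustration only (not part of the claimed structure), the following C sketch models the work split just described; the sub-block index lists are those given above, while the two-slot lead of array A, the loop and all identifiers are simplifying assumptions that ignore the luma/chroma transition detailed later.

    #include <stdio.h>

    /* Sub-block indices (zig-zag order, 0..23) handled by each PE array,
     * as listed in the first step: array A takes 0,1,4,5,8,9,12,13,16,17,18,19
     * and array B takes 2,3,6,7,10,11,14,15,20,21,22,23. */
    static const int PE_ARRAY_A[12] = { 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 18, 19 };
    static const int PE_ARRAY_B[12] = { 2, 3, 6, 7, 10, 11, 14, 15, 20, 21, 22, 23 };

    int main(void)
    {
        /* Array A runs two sub-blocks ahead; afterwards both arrays advance
         * through their own lists at the same pace, one sub-block per slot. */
        for (int slot = 0; slot < 14; ++slot) {
            if (slot < 12)
                printf("slot %2d: A predicts sub-block %2d", slot, PE_ARRAY_A[slot]);
            else
                printf("slot %2d: A idle                 ", slot);
            if (slot >= 2)
                printf(" | B predicts sub-block %2d", PE_ARRAY_B[slot - 2]);
            printf("\n");
        }
        return 0;
    }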
The second step: the output of the prediction computation module, i.e. the predicted values, is fed together with the decoded residual data as input into the add-residual computation module. To keep up with the high-speed prediction of the two PE arrays, the add-residual module uses two groups of adders, SUM A and SUM B, which respectively sum the output values of PE array A and PE array B with their corresponding residual values; the data computed in this module are the pixel values before deblocking filtering;
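The add-residual step can be modelled behaviourally as below, assuming 8-bit pixels and a 4×4 block layout; the function names are illustrative, and the clip to [0, 255] is the usual reconstruction clip rather than a detail taken from this text.

    #include <stdint.h>

    /* Clip a reconstructed sample to the 8-bit pixel range [0, 255]. */
    static inline uint8_t clip_pixel(int v)
    {
        if (v < 0)   return 0;
        if (v > 255) return 255;
        return (uint8_t)v;
    }

    /* One SUM group: add a 4x4 block of residuals to the 4x4 block of predicted
     * samples produced by one PE array.  Two such groups (SUM A and SUM B) run
     * side by side, one per PE array, so the add-residual stage keeps up with
     * the two-array prediction rate. */
    static void sum_add_residual(const uint8_t pred[4][4],
                                 const int16_t resid[4][4],
                                 uint8_t recon[4][4])
    {
        for (int row = 0; row < 4; ++row)      /* one row per cycle in hardware */
            for (int col = 0; col < 4; ++col)
                recon[row][col] = clip_pixel((int)pred[row][col] + resid[row][col]);
    }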
The third step: the pixel values computed in the second step are written back into the reference pixel module; depending on the position of the particular sub-macroblock, they are stored into the corresponding memory so that subsequent predictions can use them as reference pixels. The concrete steps are as follows: if, during prediction, the predicted values of a certain column in the prediction array need to reuse the values predicted for the preceding column, the predicted values of the preceding column are stored in the third-level register group for reuse by the following column; after each sub-block has been predicted and the corresponding residual added, the pixel values of its rightmost column and bottom row are stored in the second-level macroblock-level register group for use by other sub-macroblocks that have a data dependency on this sub-macroblock, and a double sliding-window mechanism is further provided at this storage level to handle the data access and update of the parallel prediction of sub-macroblocks; if the sub-block finished in the second step lies in the bottom row or the rightmost column of sub-macroblocks of the macroblock, the pixel values of the bottom row or rightmost column of this sub-block are stored in the first-level picture-level RAM for use by other macroblocks that have a data dependency on this macroblock.
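The three-level reference-pixel storage just described can be summarised in a small illustrative sketch; the enum, the helper and the coordinate convention are assumptions made for clarity, not the patent's memory map.

    /* Where a just-reconstructed pixel group is kept for later reference,
     * following the three storage levels of the third step. */
    enum ref_store {
        REF_COL_REGS,   /* 3rd level: previous-column registers, reused by the next column  */
        REF_MB_REGS,    /* 2nd level: macroblock-level registers behind the sliding windows */
        REF_PIC_RAM     /* 1st level: picture-level RAM, shared with neighbouring macroblocks */
    };

    /* Decide the outermost level a finished 4x4 sub-block must be written to.
     * sub_col/sub_row are the sub-block coordinates inside the macroblock;
     * the macroblock's bottom row or rightmost column must also reach the
     * picture-level RAM so that neighbouring macroblocks can reference it. */
    enum ref_store store_level_for_subblock(int sub_col, int sub_row,
                                            int last_col, int last_row)
    {
        if (sub_col == last_col || sub_row == last_row)
            return REF_PIC_RAM;   /* bottom row / rightmost column of the macroblock */
        return REF_MB_REGS;       /* only sub-blocks inside the same macroblock need it */
    }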
The present invention successfully realizes the above parallel array-type intraframe prediction decoder architecture method; by decoding the sub-macroblocks in parallel, it significantly improves the efficiency of H.264 intraframe prediction decoding and can be used for real-time decoding of high-definition video.
The invention thus provides a VLSI structural design method for a parallel array-type intraframe prediction decoder oriented to real-time decoding of high-definition and ultra-high-definition video: the sub-macroblocks within a frame are decoded in a parallel array fashion, thereby improving decoding efficiency and speed and satisfying the demand for real-time decoding of high-definition and ultra-high-definition video.
Description of drawings
Fig. 1 is a schematic diagram of the VLSI structure of the intraframe parallel array-type decoder of the present invention.
Fig. 2 is a schematic diagram of the intraframe prediction computation of the PE array of the present invention.
Fig. 2(a) shows the structure of the basic PE computation unit.
Fig. 2(b) shows the prediction computation timing of the PE array.
Fig. 2(c) shows the adaptive prediction timing diagram of a sub-macroblock.
Fig. 3 is a schematic diagram of the sub-macroblock order for parallel decoding of the present invention.
Fig. 3(a) shows the sub-macroblock order of the conventional pipeline solution.
Fig. 3(b) shows the sub-macroblock order of parallel decoding.
Fig. 3(c) shows the theoretical timing diagram of sub-macroblock parallel decoding.
Fig. 4 is a schematic diagram of the concurrent operation of the two PE arrays of the present invention.
Fig. 4(a) shows the double sliding-window mechanism for parallel luma prediction.
Fig. 4(b) shows the double sliding-window mechanism for parallel chroma prediction.
Fig. 4(c) shows the synchronization principle of the parallel sub-macroblocks.
Fig. 4(d) shows writing to the RAM.
Fig. 4(e) shows the parallel timing of the SUM module.
The content of the present invention is described in further detail below in conjunction with the accompanying drawings.
Embodiment
As shown in Fig. 1, the parallel array-type intraframe prediction decoding system mainly comprises three main modules. First, according to the prediction modes of the intra macroblock and its sub-blocks obtained from bitstream parsing, the reference data required by a sub-block are fetched in the reference pixel access module and sent to the prediction computation module, which performs the intraframe prediction computation. The resulting prediction data, after parallel synchronization, enter the add-residual module, where they are added to the IDCT data of the corresponding sub-block to recover the image data; the dependency pixels needed by subsequent sub-blocks are stored back into the reference pixel module, and the required pixels are read out through the sliding windows.
As shown in Fig. 2(a), (b) and (c), a prediction computation unit PE that can be multiplexed among multiple prediction modes is designed first; a four-input adder structure is designed according to the common part of the prediction computation formulas, and the control inputs serve requirements such as data multiplexing and data rounding. Then, four such PE units form one PE array that performs prediction computation on a sub-macroblock: each cycle computes the 4 pixels of one row of a sub-block, and 4 cycles complete the prediction computation of one sub-block. Finally, because the prediction process of a sub-block differs between modes, the time needed to finish a sub-block is irregular; an adaptive pipeline design is therefore adopted. A complete process comprises preloading the reference pixels, pre-computation, prediction and the add-residual computation, and the PE array adaptively adjusts its prediction flow according to the mode, avoiding the clock waste that the critical path of a fixed pipeline would bring.
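As a rough behavioural model (an illustration, not the patent's RTL), the sketch below realises the common core of the H.264 intra prediction formulas as a four-input adder with a rounding term and a right shift, and shows how several modes can be mapped onto it; DC and Plane, which need pre-computation, are not covered, and all function names are assumptions.

    #include <stdint.h>

    /* Core of the multiplexed PE: four operands, a rounding constant and a
     * right shift, selected by control inputs. */
    static uint8_t pe_core(int in0, int in1, int in2, int in3, int round, int shift)
    {
        return (uint8_t)((in0 + in1 + in2 + in3 + round) >> shift);
    }

    /* Example mappings of H.264 intra prediction formulas onto the same adder. */

    /* Vertical / horizontal modes: copy one reference sample unchanged. */
    static uint8_t pe_copy(uint8_t ref)
    {
        return pe_core(ref, 0, 0, 0, 0, 0);
    }

    /* Directional modes: 3-tap filter (a + 2*b + c + 2) >> 2, with the centre
     * sample b routed to two of the four adder inputs. */
    static uint8_t pe_filter3(uint8_t a, uint8_t b, uint8_t c)
    {
        return pe_core(a, b, b, c, 2, 2);
    }

    /* 2-tap average (a + b + 1) >> 1, used at some diagonal positions. */
    static uint8_t pe_avg2(uint8_t a, uint8_t b)
    {
        return pe_core(a, b, 0, 0, 1, 1);
    }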
As shown in Fig. 3(a), (b) and (c), the "Z" (zig-zag) order specified in the standard is given first and the data dependencies of its sub-macroblocks are analysed; the parallel prediction order of the present invention is then given. Because a sub-block in the next row depends on the data of the sub-blocks in the row above, PE array A predicts ahead, and PE array B starts parallel prediction of the second row of sub-blocks while array A is predicting its third sub-block; in the prediction order shown for the PE arrays, the sub-block numbers in brackets denote the sub-blocks handled by PE array A during parallel prediction. After PE array A finishes the luma prediction it continues with the prediction of the chroma Cb sub-blocks; when PE array A predicts the 10th sub-block, PE array B begins the prediction of chroma Cr, at which point the chroma sub-blocks are processed in parallel. After PE array A finishes chroma Cb, PE array B predicts the last two sub-blocks of chroma Cr. The theoretical intraframe prediction order is shown in (c): the first two and the last two sub-blocks of a macroblock are computed by a single array, while all the other sub-blocks are predicted in parallel. Such an order greatly improves prediction efficiency and shortens the prediction time.
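A small C sketch, under stated assumptions, of the dependency analysis behind this order: using the standard H.264 mapping from the zig-zag 4×4 luma block index to its (column, row) position inside the macroblock, the sequence numbers assigned to PE array A in the first step correspond to the even block rows (0 and 2) and those of PE array B to the odd block rows (1 and 3), which is exactly the "next row depends on the row above" relationship the text exploits; the helper name is illustrative.

    #include <stdio.h>

    /* Standard H.264 zig-zag index of the 4x4 luma block at column x, row y
     * (x, y in units of 4x4 blocks, 0..3 each). */
    static int luma4x4_blk_idx(int x, int y)
    {
        return 8 * (y / 2) + 4 * (x / 2) + 2 * (y % 2) + (x % 2);
    }

    int main(void)
    {
        /* Rows 0 and 2 give exactly the indices assigned to PE array A
         * (0,1,4,5 and 8,9,12,13); rows 1 and 3 give PE array B's indices
         * (2,3,6,7 and 10,11,14,15).  Intra 4x4 prediction needs the upper
         * neighbours, so array A (upper rows) must run ahead of array B
         * (lower rows), which is the two-sub-block lead described above. */
        for (int y = 0; y < 4; ++y) {
            printf("block row %d (%s):", y, (y % 2 == 0) ? "PE array A" : "PE array B");
            for (int x = 0; x < 4; ++x)
                printf(" %2d", luma4x4_blk_idx(x, y));
            printf("\n");
        }
        return 0;
    }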
As shown in Fig. 4(a), (b), (c), (d) and (e), the parallel prediction mechanism of the two PE arrays specifically comprises the two sliding windows of the arrays fetching reference pixels, the parallel synchronization method, the writing of reference pixels into RAM, and the parallel-array add-residual operation. The two sliding windows mainly have two operating states. When the luma sub-macroblocks are processed in parallel, PE array A runs two sub-blocks ahead of PE array B according to the prediction order; arrays A and B always keep this two-sub-block time difference, which resolves the reference-pixel update conflicts that would otherwise arise during sub-block prediction, and the sliding-window register data are updated immediately after a prediction finishes in preparation for the subsequent sub-blocks. When chroma Cb and Cr are processed in parallel, PE arrays A and B are responsible for the Cb and Cr sub-blocks respectively; the front 8 window registers are used by PE array A for Cb prediction and the rear 8 by PE array B for Cr prediction, so no conflict occurs for the chroma reference pixels either. Because each PE array internally uses the adaptive pipeline design, the number of clock cycles needed by each array differs between modes and the parallel operation becomes asynchronous. Since the add-residual computation after prediction has a very regular clock, parallel synchronization is needed beforehand: a "take the later" principle is adopted, i.e. the moment at which the last PE array finishes its prediction computation is taken as the parallel finishing moment, which then triggers the add-residual module. During the intraframe prediction computation, the last row of pixels of a macroblock must be kept in RAM. According to the prediction order specified by the present invention, the pixels that the luma macroblock needs to preserve are all predicted by PE array B, the pixels that the chroma Cb macroblock needs to preserve are all predicted by PE array A, and the pixels that the chroma Cr macroblock needs to preserve are all predicted by PE array B; and because PE arrays A and B keep a clock difference of two sub-macroblocks, their RAM write timings are mutually independent, so the write-address confusion that would arise if the two arrays wrote to the RAM simultaneously cannot occur. As for the add-residual module, the two SUM arrays add the parallel prediction data to the IDCT data, strictly within 4 cycles, to recover the image data.
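The "take the later" synchronization can be stated in a few lines of illustrative code; the cycle bookkeeping and names are assumptions, and only the rule itself (trigger the SUM stage at the later of the two finish times, then run its fixed 4 cycles) comes from the description above.

    /* Finish times (in cycles) of the sub-blocks currently being predicted by
     * each PE array differ because of the adaptive pipeline, so the two
     * arrays generally do not finish together. */
    static int max_int(int a, int b) { return a > b ? a : b; }

    /* "Take the later" synchronization: the add-residual (SUM) stage for the
     * pair of parallel sub-blocks starts only when the slower array is done. */
    static int sum_stage_start(int finish_cycle_array_a, int finish_cycle_array_b)
    {
        return max_int(finish_cycle_array_a, finish_cycle_array_b);
    }

    /* With the fixed 4-cycle SUM stage, the reconstructed pixels of both
     * sub-blocks are available to the sliding windows at start + 4. */
    static int recon_ready_cycle(int finish_cycle_array_a, int finish_cycle_array_b)
    {
        return sum_stage_start(finish_cycle_array_a, finish_cycle_array_b) + 4;
    }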
The present invention is described in more detail below in conjunction with the accompanying drawings and the embodiments provided by the inventors.
1) An "intraframe parallel array technique" realizes the parallel decoding of sub-macroblocks and improves the decoding efficiency of intra macroblocks.
2) A "multi-prediction-mode multiplexing technique" realizes the intraframe prediction computation unit PE; in each sub-macroblock, 4 PE units predict 4 pixels in parallel.
3) An "adaptive pipeline technique" realizes the adaptively pipelined prediction decoding of the sub-macroblock pixels in the PE array.
4) A "parallel prediction order technique" uses the dependency relationships of the reference pixels to realize the prediction order for parallel decoding and resolves data conflicts.
5) A "double sliding-window mechanism" realizes the parallel decoding of sub-macroblocks by the two PE arrays and coordinates the parallel synchronization of the PE arrays.
The "intraframe parallel array technique" uses prediction arrays to decode the sub-macroblocks within a frame in parallel. The designed intraframe prediction decoder comprises three main sub-modules: the prediction computation module, the reference pixel access module and the add-residual module. The intraframe prediction computation is first completed in the prediction computation module according to the prediction mode and reference pixels of a sub-macroblock; then the residual data of the sub-block are added to the prediction data in the add-residual module to generate the image; finally, the image pixels required by subsequent predictions are saved in the reference pixel access module. The computation units in all modules are designed as parallel arrays to improve decoding efficiency.
The "multi-prediction-mode multiplexing technique" designs a reusable prediction unit PE around the common features of the multiple prediction modes, so that all the intra prediction modes (except the Plane mode, which needs pre-computation) can complete the pixel prediction computation with this computation unit. Four PE units form a PE array, which performs prediction computation on 4 pixels of a sub-macroblock; two PE arrays decode two sub-macroblocks in parallel.
The "adaptive pipeline technique" addresses the fact that, when an intra macroblock is decoded, the number of clock cycles needed is not uniform because the prediction modes are diverse; the adaptive pipeline automatically aligns the decoding clock for each mode so as to achieve pipelined decoding.
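For illustration, a toy model of the cycle saving that the adaptive pipeline aims at, using made-up per-mode cycle counts (the real stage lengths are not given here): an adaptive pipeline lets each sub-block occupy the prediction stage only as long as its own mode requires, whereas a fixed pipeline pads every sub-block to the worst-case length.

    /* Hypothetical prediction-stage lengths (cycles) for a few intra modes;
     * the real numbers depend on the RTL and are not taken from this text. */
    enum { CYCLES_DC = 5, CYCLES_VERT = 4, CYCLES_DIAG = 6, CYCLES_PLANE = 8 };

    /* Total prediction cycles for a sequence of sub-block modes under an
     * adaptive pipeline (each block takes exactly what its mode needs)
     * versus a fixed pipeline padded to the slowest mode. */
    static void pipeline_cost(const int mode_cycles[], int n_blocks,
                              int *adaptive_total, int *fixed_total)
    {
        int worst = 0, sum = 0;
        for (int i = 0; i < n_blocks; ++i) {
            sum += mode_cycles[i];
            if (mode_cycles[i] > worst)
                worst = mode_cycles[i];
        }
        *adaptive_total = sum;             /* adaptive: mode-dependent stage lengths      */
        *fixed_total   = worst * n_blocks; /* fixed: every block padded to the worst case */
    }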
The "parallel prediction order technique" defines the working order of the two PE arrays during intraframe prediction decoding. The designed parallel order first guarantees the reference-pixel dependencies of the sub-macroblocks; at the same time, considering that the last row of pixels of a macroblock needs to be stored in the on-chip RAM, the parallel order guarantees that, for both the luma and the chroma sub-blocks whose pixels need to be preserved, only one PE array performs the operation, thereby avoiding RAM write-address conflicts.
The "double sliding-window mechanism" uses two sliding windows to obtain the reference pixel data of the sub-macroblocks so that the two PE arrays work together in parallel synchronization. According to the prediction order, the operating states of the two sliding windows include parallelism between luma sub-blocks, parallelism between luma and chroma sub-blocks, and parallelism between chroma sub-blocks. Since the two PE arrays predict from reference pixels, the parallel synchronization problem must be considered: the prediction times of the sub-blocks differ and the predictions may finish asynchronously, in which case the arrays must wait for each other before the add-residual computation can finally be performed synchronously.
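A sketch of the chroma-parallel state of the double sliding window, assuming a bank of 16 window registers as suggested by the front-8/rear-8 split mentioned earlier; the structure and names are illustrative. PE array A reads and updates only the front half (Cb references) and PE array B only the rear half (Cr references), so the two arrays never access the same registers.

    #include <stdint.h>

    #define WIN_REGS 16   /* assumed total size of the sliding-window register bank */

    /* Sliding-window register bank shared by the two PE arrays. */
    typedef struct {
        uint8_t reg[WIN_REGS];
    } sliding_window_t;

    /* In the chroma-parallel state, array A (Cb) owns registers 0..7 and
     * array B (Cr) owns registers 8..15, so concurrent accesses cannot clash. */
    static uint8_t *window_half_for_array(sliding_window_t *w, int is_array_b)
    {
        return is_array_b ? &w->reg[WIN_REGS / 2] : &w->reg[0];
    }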

Claims (1)

1. A very large-scale integration (VLSI) structural design method of a parallel array-type intraframe prediction decoder, specifically comprising the following steps:
The first step: a video file in .264 format is parsed from the bitstream, and after entropy decoding and inverse transform the prediction modes of the macroblock and its sub-macroblocks are obtained; these modes, together with the reference pixel values fetched from the reference pixel access module, are given to two PE arrays as the input of the prediction computation module; PE array A and PE array B compute in parallel, so the two arrays can predict two 4×4 sub-blocks simultaneously; if the luma Y component and the chroma U and V components are numbered according to the traditional "Z" (zig-zag) prediction order, array A predicts in order the twelve 4×4 sub-blocks with sequence numbers 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 18 and 19, and array B predicts in order the twelve 4×4 sub-blocks with sequence numbers 2, 3, 6, 7, 10, 11, 14, 15, 20, 21, 22 and 23; after PE array A has predicted two sub-blocks in advance in its order and obtained their final pixel values, the two PE arrays A and B then carry out prediction computation simultaneously, each following its own sub-block prediction order;
The PE array is designed from a prediction computation unit PE that can be multiplexed among multiple prediction modes, with a four-input adder structure designed according to the common part of the prediction computation formulas; the control inputs serve requirements such as data multiplexing and data rounding; subsequently, four such PE units form one PE array that performs prediction computation on a sub-macroblock, each cycle computing the 4 pixels of one row of a sub-block and 4 cycles completing the prediction computation of one sub-block;
The second step: the output of the prediction computation module, i.e. the predicted values, is fed together with the decoded residual data as input into the add-residual computation module; to keep up with the high-speed prediction of the two PE arrays, the add-residual module uses two groups of adders, SUM_A and SUM_B, which respectively sum the output values of PE array A and PE array B with their corresponding residual values; the data computed in this module are the pixel values before deblocking filtering;
The third step: the pixel values computed in the second step are written back into the reference pixel module; depending on the position of the particular sub-macroblock, they are stored into the corresponding memory so that subsequent predictions can use them as reference pixels; the concrete steps are as follows: if, during prediction, the predicted values of a certain column in the prediction array need to reuse the values predicted for the preceding column, the predicted values of the preceding column are stored in the third-level register group for reuse by the following column; after each sub-block has been predicted and the corresponding residual added, the pixel values of its rightmost column and bottom row are stored in the second-level macroblock-level register group for use by other sub-macroblocks that have a data dependency on this sub-macroblock, and a double sliding-window mechanism is further provided at this storage level to handle the data access and update of the parallel prediction of sub-macroblocks; if the sub-block finished in the second step lies in the bottom row or the rightmost column of sub-macroblocks of the macroblock, the pixel values of the bottom row or rightmost column of this sub-block are stored in the first-level picture-level RAM for use by other macroblocks that have a data dependency on this macroblock.
CN 201010223353 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder Expired - Fee Related CN101902643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010223353 CN101902643B (en) 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010223353 CN101902643B (en) 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Publications (2)

Publication Number Publication Date
CN101902643A CN101902643A (en) 2010-12-01
CN101902643B (en) 2013-02-06

Family

ID=43227783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010223353 Expired - Fee Related CN101902643B (en) 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Country Status (1)

Country Link
CN (1) CN101902643B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105516728B * 2015-12-15 2019-06-28 华中科技大学 Parallel intra prediction method for 8x8 sub-macroblocks in H.265/HEVC
CN107392838B (en) * 2017-07-27 2020-11-27 苏州浪潮智能科技有限公司 WebP compression parallel acceleration method and device based on OpenCL
CN110324631A * 2019-05-09 2019-10-11 湖南国科微电子股份有限公司 Image parallel processing method, device and electronic equipment
CN110381321B (en) * 2019-08-23 2021-08-31 西安邮电大学 Interpolation calculation parallel implementation method for motion compensation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4779735B2 (en) * 2006-03-16 2011-09-28 パナソニック株式会社 Decoding device, decoding method, program, and recording medium
EP2290985B1 (en) * 2008-06-10 2017-05-03 Panasonic Intellectual Property Management Co., Ltd. Image decoding apparatus and image coding apparatus

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
孙月萍 et al., "Hardware implementation of the intra prediction algorithm in an AVS decoder", 《电视技术》 (Video Engineering), Vol. 32, No. 12, 2008-12-31, full text *
姜伟 et al., "Hardware implementation of adaptive intra prediction in an AVS decoder", 《计算机工程与应用》 (Computer Engineering and Applications), No. 36, 2008-12-31, full text *
张刚 et al., "Parallel hardware architecture design for H.264 intra prediction and mode decision", 《电视技术》 (Video Engineering), Vol. 33, No. 1, 2009-01-31, full text *
JP 2007-251605 A, 2007-09-27

Also Published As

Publication number Publication date
CN101902643A (en) 2010-12-01

Similar Documents

Publication Publication Date Title
CN102547296B (en) Motion estimation accelerating circuit and motion estimation method as well as loop filtering accelerating circuit
CN101710986B H.264 parallel decoding method and system based on a homogeneous multi-core processor
CN101330617B Hardware implementation method and apparatus for a multi-standard intra-frame predictor based on mode mapping
CN102055981B (en) Deblocking filter for video coder and implementation method thereof
CN102088603B (en) Entropy coder for video coder and implementation method thereof
CN101902643B (en) Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder
CN102625108B (en) Multi-core-processor-based H.264 decoding method
CN101115207B Method and device for implementing inter-frame prediction based on the correlation between future positions
CN1589028B Frame prediction device and method based on pixel pipelining
CN102857756A (en) Transfer coder adaptive to high efficiency video coding (HEVC) standard
CN1652605B (en) Video codecs, data processing systems and methods for the same
CN102143361B (en) Video coding method and video coding device
CN102148990A (en) Device and method for predicting motion vector
CN102572430A (en) Method for implementing H.264 deblocking filter algorithm based on reconfigurable technique
CN101909212A (en) Multi-standard macroblock prediction system of reconfigurable multimedia SoC
CN102932643A Extended variable-block motion estimation circuit suitable for the HEVC (high efficiency video coding) standard
CN100568920C Method and apparatus for luma interpolation of video images with serial input and row-wise output
CN100574460C AVS inter-frame prediction reference sample extraction method
Han et al. Optimization of motion compensation based on GPU and CPU for VVC decoding
CN102420989B (en) Intra-frame prediction method and device
CN100469146C (en) Video image motion compensator
CN103780914B (en) Loop filter accelerating circuit and loop filter method
CN102595137A (en) Fast mode judging device and method based on image pixel block row/column pipelining
CN104602026B Reconstruction loop structure for a fully multiplexed encoder under the HEVC standard
CN102055980A (en) Intra-frame predicting circuit for video coder and realizing method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130206

Termination date: 20180709