CN101902643B - Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Info

Publication number
CN101902643B
CN101902643B CN201010223353A CN 201010223353
Authority
CN
China
Prior art keywords
sub
array
prediction
block
parallel
Prior art date
Legal status
Expired - Fee Related
Application number
CN 201010223353
Other languages
Chinese (zh)
Other versions
CN101902643A (en)
Inventor
兰旭光
杨志远
韩骞逸
李兴玉
郑南宁
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN 201010223353
Publication of CN101902643A
Application granted
Publication of CN101902643B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a design method for a parallel array-type intraframe prediction decoder. An intraframe parallel-array technique decodes the sub-macroblocks in parallel and improves the decoding efficiency of intra macroblocks. A multi-prediction-mode multiplexing technique realizes the intraframe prediction computation unit PE; the four PE units assigned to each sub-macroblock predict four pixels in parallel. An adaptive pipeline technique achieves adaptively pipelined prediction decoding of the sub-macroblocks within the PE array. A parallel prediction order technique derives the prediction order for parallel decoding from the dependency relationships of the reference pixels and resolves data conflicts. A double sliding-window mechanism meets the requirement of the two PE arrays for parallel decoding of sub-macroblocks and coordinates the parallel synchronization of the PE arrays. The method satisfies the decoding requirements of high-definition and ultra-high-definition video and improves decoding efficiency and performance.

Description

Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder
Technical field
The invention belongs to the field of video coding and decoding, and in particular relates to a VLSI structural design method for a parallel array-type intraframe prediction decoder.
Background technology
With the rapid development of the Internet and the continual upgrading of terminal display devices, users' requirements for high-definition video quality of service keep rising. The direct way to solve the transmission and storage problems posed by the massive amount of video data is to encode and decode the video files. H.264 is a new-generation coding standard with excellent compression performance, but on the decoder side it suffers from high algorithmic complexity and a large data throughput for high-definition video files. Intra prediction accounts for about 30% of the computation when decoding key frames, and one 1080p high-definition frame contains 8100 macroblocks, so the decoding workload is huge. Designing an efficient real-time decoder architecture is therefore a current research focus.
Summary of the invention
In view of the deficiencies of the background art described above, the object of the invention is to provide, for the real-time decoding demand of high-definition video, a VLSI structural design method for a parallel array-type intraframe prediction decoder, which decodes the sub-macroblocks within a frame in a parallel array fashion so as to improve intraframe decoding efficiency and speed.
In order to accomplish the above task, the technical solution adopted by the present invention is as follows:
A VLSI structural design method for a parallel array-type intraframe prediction decoder oriented to real-time decoding of high-definition and ultra-high-definition video specifically comprises the following steps:
The first step: a video file in .264 format is parsed from the bitstream, and after entropy decoding and inverse transform the prediction modes of the macroblock and its sub-macroblocks are obtained; these modes, together with the reference pixel values fetched from the reference pixel access module, are given to two PE arrays as the input of the prediction computation module. PE array A and PE array B compute in parallel, so the two arrays can predict two 4×4 sub-blocks simultaneously. If the luma Y component and the chroma U and V components are numbered according to the traditional "Z" (zig-zag) prediction order, array A predicts in order the twelve 4×4 sub-blocks with sequence numbers 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 18 and 19, and array B predicts in order the twelve 4×4 sub-blocks with sequence numbers 2, 3, 6, 7, 10, 11, 14, 15, 20, 21, 22 and 23. After PE array A has predicted two sub-blocks in advance in its order and obtained their final pixel values, the two PE arrays A and B then carry out prediction computation simultaneously, each following its own sub-block prediction order;
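As an illustration only (not part of the claimed structure), the following C sketch models the work split just described; the sub-block index lists are those given above, while the two-slot lead of array A, the loop and all identifiers are simplifying assumptions that ignore the luma/chroma transition detailed later.

    #include <stdio.h>

    /* Sub-block indices (zig-zag order, 0..23) handled by each PE array,
     * as listed in the first step: array A takes 0,1,4,5,8,9,12,13,16,17,18,19
     * and array B takes 2,3,6,7,10,11,14,15,20,21,22,23. */
    static const int PE_ARRAY_A[12] = { 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 18, 19 };
    static const int PE_ARRAY_B[12] = { 2, 3, 6, 7, 10, 11, 14, 15, 20, 21, 22, 23 };

    int main(void)
    {
        /* Array A runs two sub-blocks ahead; afterwards both arrays advance
         * through their own lists at the same pace, one sub-block per slot. */
        for (int slot = 0; slot < 14; ++slot) {
            if (slot < 12)
                printf("slot %2d: A predicts sub-block %2d", slot, PE_ARRAY_A[slot]);
            else
                printf("slot %2d: A idle                 ", slot);
            if (slot >= 2)
                printf(" | B predicts sub-block %2d", PE_ARRAY_B[slot - 2]);
            printf("\n");
        }
        return 0;
    }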
The second step: the output of the prediction computation module, i.e. the predicted values, is fed together with the decoded residual data as input into the add-residual computation module. To keep up with the high-speed prediction of the two PE arrays, the add-residual module uses two groups of adders, SUM A and SUM B, which respectively sum the output values of PE array A and PE array B with their corresponding residual values; the data computed in this module are the pixel values before deblocking filtering;
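The add-residual step can be modelled behaviourally as below, assuming 8-bit pixels and a 4×4 block layout; the function names are illustrative, and the clip to [0, 255] is the usual reconstruction clip rather than a detail taken from this text.

    #include <stdint.h>

    /* Clip a reconstructed sample to the 8-bit pixel range [0, 255]. */
    static inline uint8_t clip_pixel(int v)
    {
        if (v < 0)   return 0;
        if (v > 255) return 255;
        return (uint8_t)v;
    }

    /* One SUM group: add a 4x4 block of residuals to the 4x4 block of predicted
     * samples produced by one PE array.  Two such groups (SUM A and SUM B) run
     * side by side, one per PE array, so the add-residual stage keeps up with
     * the two-array prediction rate. */
    static void sum_add_residual(const uint8_t pred[4][4],
                                 const int16_t resid[4][4],
                                 uint8_t recon[4][4])
    {
        for (int row = 0; row < 4; ++row)      /* one row per cycle in hardware */
            for (int col = 0; col < 4; ++col)
                recon[row][col] = clip_pixel((int)pred[row][col] + resid[row][col]);
    }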
The third step: the pixel values computed in the second step are written back into the reference pixel module; depending on the position of the particular sub-macroblock, they are stored into the corresponding memory so that subsequent predictions can use them as reference pixels. The concrete steps are as follows: if, during prediction, the predicted values of a certain column in the prediction array need to reuse the values predicted for the preceding column, the predicted values of the preceding column are stored in the third-level register group for reuse by the following column; after each sub-block has been predicted and the corresponding residual added, the pixel values of its rightmost column and bottom row are stored in the second-level macroblock-level register group for use by other sub-macroblocks that have a data dependency on this sub-macroblock, and a double sliding-window mechanism is further provided at this storage level to handle the data access and update of the parallel prediction of sub-macroblocks; if the sub-block finished in the second step lies in the bottom row or the rightmost column of sub-macroblocks of the macroblock, the pixel values of the bottom row or rightmost column of this sub-block are stored in the first-level picture-level RAM for use by other macroblocks that have a data dependency on this macroblock.
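The three-level reference-pixel storage just described can be summarised in a small illustrative sketch; the enum, the helper and the coordinate convention are assumptions made for clarity, not the patent's memory map.

    /* Where a just-reconstructed pixel group is kept for later reference,
     * following the three storage levels of the third step. */
    enum ref_store {
        REF_COL_REGS,   /* 3rd level: previous-column registers, reused by the next column  */
        REF_MB_REGS,    /* 2nd level: macroblock-level registers behind the sliding windows */
        REF_PIC_RAM     /* 1st level: picture-level RAM, shared with neighbouring macroblocks */
    };

    /* Decide the outermost level a finished 4x4 sub-block must be written to.
     * sub_col/sub_row are the sub-block coordinates inside the macroblock;
     * the macroblock's bottom row or rightmost column must also reach the
     * picture-level RAM so that neighbouring macroblocks can reference it. */
    enum ref_store store_level_for_subblock(int sub_col, int sub_row,
                                            int last_col, int last_row)
    {
        if (sub_col == last_col || sub_row == last_row)
            return REF_PIC_RAM;   /* bottom row / rightmost column of the macroblock */
        return REF_MB_REGS;       /* only sub-blocks inside the same macroblock need it */
    }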
The present invention successfully realizes the above parallel array-type intraframe prediction decoder architecture method; by decoding the sub-macroblocks in parallel, it significantly improves the efficiency of H.264 intraframe prediction decoding and can be used for real-time decoding of high-definition video.
The invention thus provides a VLSI structural design method for a parallel array-type intraframe prediction decoder oriented to real-time decoding of high-definition and ultra-high-definition video: the sub-macroblocks within a frame are decoded in a parallel array fashion, thereby improving decoding efficiency and speed and satisfying the demand for real-time decoding of high-definition and ultra-high-definition video.
Description of drawings
Fig. 1 is a schematic diagram of the VLSI structure of the intraframe parallel array-type decoder of the present invention.
Fig. 2 is a schematic diagram of the intraframe prediction computation of the PE array of the present invention.
Fig. 2(a) shows the structure of the basic PE computation unit.
Fig. 2(b) shows the prediction computation timing of the PE array.
Fig. 2(c) shows the adaptive prediction timing diagram of a sub-macroblock.
Fig. 3 is a schematic diagram of the sub-macroblock order for parallel decoding of the present invention.
Fig. 3(a) shows the sub-macroblock order of the conventional pipeline solution.
Fig. 3(b) shows the sub-macroblock order of parallel decoding.
Fig. 3(c) shows the theoretical timing diagram of sub-macroblock parallel decoding.
Fig. 4 is a schematic diagram of the concurrent operation of the two PE arrays of the present invention.
Fig. 4(a) shows the double sliding-window mechanism for parallel luma prediction.
Fig. 4(b) shows the double sliding-window mechanism for parallel chroma prediction.
Fig. 4(c) shows the synchronization principle of the parallel sub-macroblocks.
Fig. 4(d) shows writing to the RAM.
Fig. 4(e) shows the parallel timing of the SUM module.
The content of the present invention is described in further detail below in conjunction with the accompanying drawings.
Embodiment
As shown in Fig. 1, the parallel array-type intraframe prediction decoding system mainly comprises three main modules. First, according to the prediction modes of the intra macroblock and its sub-blocks obtained from bitstream parsing, the reference data required by a sub-block are fetched in the reference pixel access module and sent to the prediction computation module, which performs the intraframe prediction computation. The resulting prediction data, after parallel synchronization, enter the add-residual module, where they are added to the IDCT data of the corresponding sub-block to recover the image data; the dependency pixels needed by subsequent sub-blocks are stored back into the reference pixel module, and the required pixels are read out through the sliding windows.
As shown in Fig. 2(a), (b) and (c), a prediction computation unit PE that can be multiplexed among multiple prediction modes is designed first; a four-input adder structure is designed according to the common part of the prediction computation formulas, and the control inputs serve requirements such as data multiplexing and data rounding. Then, four such PE units form one PE array that performs prediction computation on a sub-macroblock: each cycle computes the 4 pixels of one row of a sub-block, and 4 cycles complete the prediction computation of one sub-block. Finally, because the prediction process of a sub-block differs between modes, the time needed to finish a sub-block is irregular; an adaptive pipeline design is therefore adopted. A complete process comprises preloading the reference pixels, pre-computation, prediction and the add-residual computation, and the PE array adaptively adjusts its prediction flow according to the mode, avoiding the clock waste that the critical path of a fixed pipeline would bring.
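As a rough behavioural model (an illustration, not the patent's RTL), the sketch below realises the common core of the H.264 intra prediction formulas as a four-input adder with a rounding term and a right shift, and shows how several modes can be mapped onto it; DC and Plane, which need pre-computation, are not covered, and all function names are assumptions.

    #include <stdint.h>

    /* Core of the multiplexed PE: four operands, a rounding constant and a
     * right shift, selected by control inputs. */
    static uint8_t pe_core(int in0, int in1, int in2, int in3, int round, int shift)
    {
        return (uint8_t)((in0 + in1 + in2 + in3 + round) >> shift);
    }

    /* Example mappings of H.264 intra prediction formulas onto the same adder. */

    /* Vertical / horizontal modes: copy one reference sample unchanged. */
    static uint8_t pe_copy(uint8_t ref)
    {
        return pe_core(ref, 0, 0, 0, 0, 0);
    }

    /* Directional modes: 3-tap filter (a + 2*b + c + 2) >> 2, with the centre
     * sample b routed to two of the four adder inputs. */
    static uint8_t pe_filter3(uint8_t a, uint8_t b, uint8_t c)
    {
        return pe_core(a, b, b, c, 2, 2);
    }

    /* 2-tap average (a + b + 1) >> 1, used at some diagonal positions. */
    static uint8_t pe_avg2(uint8_t a, uint8_t b)
    {
        return pe_core(a, b, 0, 0, 1, 1);
    }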
As shown in Fig. 3(a), (b) and (c), the "Z" (zig-zag) order specified in the standard is given first and the data dependencies of its sub-macroblocks are analysed; the parallel prediction order of the present invention is then given. Because a sub-block in the next row depends on the data of the sub-blocks in the row above, PE array A predicts ahead, and PE array B starts parallel prediction of the second row of sub-blocks while array A is predicting its third sub-block; in the prediction order shown for the PE arrays, the sub-block numbers in brackets denote the sub-blocks handled by PE array A during parallel prediction. After PE array A finishes the luma prediction it continues with the prediction of the chroma Cb sub-blocks; when PE array A predicts the 10th sub-block, PE array B begins the prediction of chroma Cr, at which point the chroma sub-blocks are processed in parallel. After PE array A finishes chroma Cb, PE array B predicts the last two sub-blocks of chroma Cr. The theoretical intraframe prediction order is shown in (c): the first two and the last two sub-blocks of a macroblock are computed by a single array, while all the other sub-blocks are predicted in parallel. Such an order greatly improves prediction efficiency and shortens the prediction time.
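A small C sketch, under stated assumptions, of the dependency analysis behind this order: using the standard H.264 mapping from the zig-zag 4×4 luma block index to its (column, row) position inside the macroblock, the sequence numbers assigned to PE array A in the first step correspond to the even block rows (0 and 2) and those of PE array B to the odd block rows (1 and 3), which is exactly the "next row depends on the row above" relationship the text exploits; the helper name is illustrative.

    #include <stdio.h>

    /* Standard H.264 zig-zag index of the 4x4 luma block at column x, row y
     * (x, y in units of 4x4 blocks, 0..3 each). */
    static int luma4x4_blk_idx(int x, int y)
    {
        return 8 * (y / 2) + 4 * (x / 2) + 2 * (y % 2) + (x % 2);
    }

    int main(void)
    {
        /* Rows 0 and 2 give exactly the indices assigned to PE array A
         * (0,1,4,5 and 8,9,12,13); rows 1 and 3 give PE array B's indices
         * (2,3,6,7 and 10,11,14,15).  Intra 4x4 prediction needs the upper
         * neighbours, so array A (upper rows) must run ahead of array B
         * (lower rows), which is the two-sub-block lead described above. */
        for (int y = 0; y < 4; ++y) {
            printf("block row %d (%s):", y, (y % 2 == 0) ? "PE array A" : "PE array B");
            for (int x = 0; x < 4; ++x)
                printf(" %2d", luma4x4_blk_idx(x, y));
            printf("\n");
        }
        return 0;
    }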
As shown in Fig. 4(a), (b), (c), (d) and (e), the parallel prediction mechanism of the two PE arrays specifically comprises the two sliding windows of the arrays fetching reference pixels, the parallel synchronization method, the writing of reference pixels into RAM, and the parallel-array add-residual operation. The two sliding windows mainly have two operating states. When the luma sub-macroblocks are processed in parallel, PE array A runs two sub-blocks ahead of PE array B according to the prediction order; arrays A and B always keep this two-sub-block time difference, which resolves the reference-pixel update conflicts that would otherwise arise during sub-block prediction, and the sliding-window register data are updated immediately after a prediction finishes in preparation for the subsequent sub-blocks. When chroma Cb and Cr are processed in parallel, PE arrays A and B are responsible for the Cb and Cr sub-blocks respectively; the front 8 window registers are used by PE array A for Cb prediction and the rear 8 by PE array B for Cr prediction, so no conflict occurs for the chroma reference pixels either. Because each PE array internally uses the adaptive pipeline design, the number of clock cycles needed by each array differs between modes and the parallel operation becomes asynchronous. Since the add-residual computation after prediction has a very regular clock, parallel synchronization is needed beforehand: a "take the later" principle is adopted, i.e. the moment at which the last PE array finishes its prediction computation is taken as the parallel finishing moment, which then triggers the add-residual module. During the intraframe prediction computation, the last row of pixels of a macroblock must be kept in RAM. According to the prediction order specified by the present invention, the pixels that the luma macroblock needs to preserve are all predicted by PE array B, the pixels that the chroma Cb macroblock needs to preserve are all predicted by PE array A, and the pixels that the chroma Cr macroblock needs to preserve are all predicted by PE array B; and because PE arrays A and B keep a clock difference of two sub-macroblocks, their RAM write timings are mutually independent, so the write-address confusion that would arise if the two arrays wrote to the RAM simultaneously cannot occur. As for the add-residual module, the two SUM arrays add the parallel prediction data to the IDCT data, strictly within 4 cycles, to recover the image data.
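The "take the later" synchronization can be stated in a few lines of illustrative code; the cycle bookkeeping and names are assumptions, and only the rule itself (trigger the SUM stage at the later of the two finish times, then run its fixed 4 cycles) comes from the description above.

    /* Finish times (in cycles) of the sub-blocks currently being predicted by
     * each PE array differ because of the adaptive pipeline, so the two
     * arrays generally do not finish together. */
    static int max_int(int a, int b) { return a > b ? a : b; }

    /* "Take the later" synchronization: the add-residual (SUM) stage for the
     * pair of parallel sub-blocks starts only when the slower array is done. */
    static int sum_stage_start(int finish_cycle_array_a, int finish_cycle_array_b)
    {
        return max_int(finish_cycle_array_a, finish_cycle_array_b);
    }

    /* With the fixed 4-cycle SUM stage, the reconstructed pixels of both
     * sub-blocks are available to the sliding windows at start + 4. */
    static int recon_ready_cycle(int finish_cycle_array_a, int finish_cycle_array_b)
    {
        return sum_stage_start(finish_cycle_array_a, finish_cycle_array_b) + 4;
    }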
The present invention is described in more detail below in conjunction with the accompanying drawings and the embodiments provided by the inventors.
1) An "intraframe parallel array technique" realizes the parallel decoding of sub-macroblocks and improves the decoding efficiency of intra macroblocks.
2) A "multi-prediction-mode multiplexing technique" realizes the intraframe prediction computation unit PE; in each sub-macroblock, 4 PE units predict 4 pixels in parallel.
3) An "adaptive pipeline technique" realizes the adaptively pipelined prediction decoding of the sub-macroblock pixels in the PE array.
4) A "parallel prediction order technique" uses the dependency relationships of the reference pixels to realize the prediction order for parallel decoding and resolves data conflicts.
5) A "double sliding-window mechanism" realizes the parallel decoding of sub-macroblocks by the two PE arrays and coordinates the parallel synchronization of the PE arrays.
The "intraframe parallel array technique" uses prediction arrays to decode the sub-macroblocks within a frame in parallel. The designed intraframe prediction decoder comprises three main sub-modules: the prediction computation module, the reference pixel access module and the add-residual module. The intraframe prediction computation is first completed in the prediction computation module according to the prediction mode and reference pixels of a sub-macroblock; then the residual data of the sub-block are added to the prediction data in the add-residual module to generate the image; finally, the image pixels required by subsequent predictions are saved in the reference pixel access module. The computation units in all modules are designed as parallel arrays to improve decoding efficiency.
The "multi-prediction-mode multiplexing technique" designs a reusable prediction unit PE around the common features of the multiple prediction modes, so that all the intra prediction modes (except the Plane mode, which needs pre-computation) can complete the pixel prediction computation with this computation unit. Four PE units form a PE array, which performs prediction computation on 4 pixels of a sub-macroblock; two PE arrays decode two sub-macroblocks in parallel.
The "adaptive pipeline technique" addresses the fact that, when an intra macroblock is decoded, the number of clock cycles needed is not uniform because the prediction modes are diverse; the adaptive pipeline automatically aligns the decoding clock for each mode so as to achieve pipelined decoding.
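For illustration, a toy model of the cycle saving that the adaptive pipeline aims at, using made-up per-mode cycle counts (the real stage lengths are not given here): an adaptive pipeline lets each sub-block occupy the prediction stage only as long as its own mode requires, whereas a fixed pipeline pads every sub-block to the worst-case length.

    /* Hypothetical prediction-stage lengths (cycles) for a few intra modes;
     * the real numbers depend on the RTL and are not taken from this text. */
    enum { CYCLES_DC = 5, CYCLES_VERT = 4, CYCLES_DIAG = 6, CYCLES_PLANE = 8 };

    /* Total prediction cycles for a sequence of sub-block modes under an
     * adaptive pipeline (each block takes exactly what its mode needs)
     * versus a fixed pipeline padded to the slowest mode. */
    static void pipeline_cost(const int mode_cycles[], int n_blocks,
                              int *adaptive_total, int *fixed_total)
    {
        int worst = 0, sum = 0;
        for (int i = 0; i < n_blocks; ++i) {
            sum += mode_cycles[i];
            if (mode_cycles[i] > worst)
                worst = mode_cycles[i];
        }
        *adaptive_total = sum;             /* adaptive: mode-dependent stage lengths      */
        *fixed_total   = worst * n_blocks; /* fixed: every block padded to the worst case */
    }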
The "parallel prediction order technique" defines the working order of the two PE arrays during intraframe prediction decoding. The designed parallel order first guarantees the reference-pixel dependencies of the sub-macroblocks; at the same time, considering that the last row of pixels of a macroblock needs to be stored in the on-chip RAM, the parallel order guarantees that, for both the luma and the chroma sub-blocks whose pixels need to be preserved, only one PE array performs the operation, thereby avoiding RAM write-address conflicts.
The "double sliding-window mechanism" uses two sliding windows to obtain the reference pixel data of the sub-macroblocks so that the two PE arrays work together in parallel synchronization. According to the prediction order, the operating states of the two sliding windows include parallelism between luma sub-blocks, parallelism between luma and chroma sub-blocks, and parallelism between chroma sub-blocks. Since the two PE arrays predict from reference pixels, the parallel synchronization problem must be considered: the prediction times of the sub-blocks differ and the predictions may finish asynchronously, in which case the arrays must wait for each other before the add-residual computation can finally be performed synchronously.
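A sketch of the chroma-parallel state of the double sliding window, assuming a bank of 16 window registers as suggested by the front-8/rear-8 split mentioned earlier; the structure and names are illustrative. PE array A reads and updates only the front half (Cb references) and PE array B only the rear half (Cr references), so the two arrays never access the same registers.

    #include <stdint.h>

    #define WIN_REGS 16   /* assumed total size of the sliding-window register bank */

    /* Sliding-window register bank shared by the two PE arrays. */
    typedef struct {
        uint8_t reg[WIN_REGS];
    } sliding_window_t;

    /* In the chroma-parallel state, array A (Cb) owns registers 0..7 and
     * array B (Cr) owns registers 8..15, so concurrent accesses cannot clash. */
    static uint8_t *window_half_for_array(sliding_window_t *w, int is_array_b)
    {
        return is_array_b ? &w->reg[WIN_REGS / 2] : &w->reg[0];
    }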

Claims (1)

1. A very large-scale integration (VLSI) structural design method of a parallel array-type intraframe prediction decoder, specifically comprising the following steps:
The first step: a video file in .264 format is parsed from the bitstream, and after entropy decoding and inverse transform the prediction modes of the macroblock and its sub-macroblocks are obtained; these modes, together with the reference pixel values fetched from the reference pixel access module, are given to two PE arrays as the input of the prediction computation module; PE array A and PE array B compute in parallel, so the two arrays can predict two 4×4 sub-blocks simultaneously; if the luma Y component and the chroma U and V components are numbered according to the traditional "Z" (zig-zag) prediction order, array A predicts in order the twelve 4×4 sub-blocks with sequence numbers 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 18 and 19, and array B predicts in order the twelve 4×4 sub-blocks with sequence numbers 2, 3, 6, 7, 10, 11, 14, 15, 20, 21, 22 and 23; after PE array A has predicted two sub-blocks in advance in its order and obtained their final pixel values, the two PE arrays A and B then carry out prediction computation simultaneously, each following its own sub-block prediction order;
The PE array is designed from a prediction computation unit PE that can be multiplexed among multiple prediction modes, with a four-input adder structure designed according to the common part of the prediction computation formulas; the control inputs serve requirements such as data multiplexing and data rounding; subsequently, four such PE units form one PE array that performs prediction computation on a sub-macroblock, each cycle computing the 4 pixels of one row of a sub-block and 4 cycles completing the prediction computation of one sub-block;
The second step: the output of the prediction computation module, i.e. the predicted values, is fed together with the decoded residual data as input into the add-residual computation module; to keep up with the high-speed prediction of the two PE arrays, the add-residual module uses two groups of adders, SUM_A and SUM_B, which respectively sum the output values of PE array A and PE array B with their corresponding residual values; the data computed in this module are the pixel values before deblocking filtering;
The third step: the pixel values computed in the second step are written back into the reference pixel module; depending on the position of the particular sub-macroblock, they are stored into the corresponding memory so that subsequent predictions can use them as reference pixels; the concrete steps are as follows: if, during prediction, the predicted values of a certain column in the prediction array need to reuse the values predicted for the preceding column, the predicted values of the preceding column are stored in the third-level register group for reuse by the following column; after each sub-block has been predicted and the corresponding residual added, the pixel values of its rightmost column and bottom row are stored in the second-level macroblock-level register group for use by other sub-macroblocks that have a data dependency on this sub-macroblock, and a double sliding-window mechanism is further provided at this storage level to handle the data access and update of the parallel prediction of sub-macroblocks; if the sub-block finished in the second step lies in the bottom row or the rightmost column of sub-macroblocks of the macroblock, the pixel values of the bottom row or rightmost column of this sub-block are stored in the first-level picture-level RAM for use by other macroblocks that have a data dependency on this macroblock.
CN 201010223353 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder Expired - Fee Related CN101902643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010223353 CN101902643B (en) 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010223353 CN101902643B (en) 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Publications (2)

Publication Number Publication Date
CN101902643A CN101902643A (en) 2010-12-01
CN101902643B (en) 2013-02-06

Family

ID=43227783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010223353 Expired - Fee Related CN101902643B (en) 2010-07-09 2010-07-09 Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder

Country Status (1)

Country Link
CN (1) CN101902643B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105516728B * 2015-12-15 2019-06-28 华中科技大学 Parallel intra prediction method for 8x8 sub-macroblocks in H.265/HEVC
CN107392838B (en) * 2017-07-27 2020-11-27 苏州浪潮智能科技有限公司 WebP compression parallel acceleration method and device based on OpenCL
CN110324631A * 2019-05-09 2019-10-11 湖南国科微电子股份有限公司 Image parallel processing method, device and electronic equipment
CN110381321B (en) * 2019-08-23 2021-08-31 西安邮电大学 Interpolation calculation parallel implementation method for motion compensation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4779735B2 (en) * 2006-03-16 2011-09-28 パナソニック株式会社 Decoding device, decoding method, program, and recording medium
EP2290985B1 (en) * 2008-06-10 2017-05-03 Panasonic Intellectual Property Management Co., Ltd. Image decoding apparatus and image coding apparatus

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
孙月萍 et al., "Hardware implementation of the intra prediction algorithm in an AVS decoder", 《电视技术》 (Video Engineering), Vol. 32, No. 12, 2008-12-31, full text *
姜伟 et al., "Hardware implementation of adaptive intra prediction in an AVS decoder", 《计算机工程与应用》 (Computer Engineering and Applications), No. 36, 2008-12-31, full text *
张刚 et al., "Parallel hardware architecture design for H.264 intra prediction and mode decision", 《电视技术》 (Video Engineering), Vol. 33, No. 1, 2009-01-31, full text *
JP 2007-251605 A, 2007-09-27

Also Published As

Publication number Publication date
CN101902643A (en) 2010-12-01

Similar Documents

Publication Publication Date Title
CN102547296B (en) Motion estimation accelerating circuit and motion estimation method as well as loop filtering accelerating circuit
CN101710986B H.264 parallel decoding method and system based on a homogeneous multi-core processor
CN101330617B Hardware implementation method and apparatus for a multi-standard intra-frame predictor based on mode mapping
CN102055981B (en) Deblocking filter for video coder and implementation method thereof
CN102088603B (en) Entropy coder for video coder and implementation method thereof
CN101902643B (en) Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder
CN102625108B (en) Multi-core-processor-based H.264 decoding method
CN101115207B Method and device for implementing inter-frame prediction based on the correlation between future positions
CN1589028B Frame prediction device and method based on pixel pipelining
CN102857756A (en) Transfer coder adaptive to high efficiency video coding (HEVC) standard
CN1652605B (en) Video codecs, data processing systems and methods for the same
CN102143361B (en) Video coding method and video coding device
CN102148990A (en) Device and method for predicting motion vector
CN102572430A (en) Method for implementing H.264 deblocking filter algorithm based on reconfigurable technique
CN101909212A (en) Multi-standard macroblock prediction system of reconfigurable multimedia SoC
CN102932643A Extended variable-block motion estimation circuit suitable for the HEVC (high efficiency video coding) standard
CN100568920C Method and apparatus for luma interpolation of video images with serial input and row-wise output
CN100574460C AVS inter-frame prediction reference sample extraction method
Han et al. Optimization of motion compensation based on GPU and CPU for VVC decoding
CN102420989B (en) Intra-frame prediction method and device
CN100469146C (en) Video image motion compensator
CN103780914B (en) Loop filter accelerating circuit and loop filter method
CN102595137A (en) Fast mode judging device and method based on image pixel block row/column pipelining
CN104602026B Reconstruction loop structure for a fully multiplexed encoder under the HEVC standard
CN102055980A (en) Intra-frame predicting circuit for video coder and realizing method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130206

Termination date: 20180709