Summary of the invention
The object of the present invention is to provide a hardware-implemented, partially parallel motion estimation method that meets the real-time requirements of video communication applications, reduces the number of memory accesses, and raises the operating frequency of the H.264/AVC encoder.
The idea of the method of the invention is to implement integer-pixel motion estimation in the H.264 motion estimator with a hierarchical search strategy, so that integer-pixel and fractional-pixel motion estimation can be executed partly in parallel, which raises the processing speed of motion estimation. Partial parallelism of integer and fractional motion estimation means the following: integer motion estimation is divided into two layers, coarse-layer and fine-layer integer-pixel motion estimation, executed in sequence; fractional motion estimation is divided into two layers, half-pixel and quarter-pixel motion estimation, also executed in sequence. The H.264/AVC motion estimator first performs coarse-layer integer-pixel motion estimation, then performs fine-layer integer-pixel motion estimation in parallel with the half-pixel interpolation, which is moved forward out of half-pixel motion estimation, and finally performs quarter-pixel motion estimation.
The idea of the system of the invention is to adopt different memory strategies according to the size and type of the data processed by the motion estimation method, reducing the amount of external memory access, and thereby to realize a hardware-based H.264/AVC system that completes motion estimation in parallel. The memory strategy of the invention is described in detail in the embodiment below.
A motion estimation method for an H.264/AVC encoder, characterized in that the method is implemented in a digital integrated circuit chip according to the following steps:
Step (1): the moving images are input frame by frame into the current frame memory, and the current frame memory passes the current macroblock, macroblock by macroblock, into the current macroblock memory; video images conforming to the H.264/AVC video coding standard are input frame by frame into the reference frame memory;
Step (2): the coarse-layer data preprocessor preprocesses the coarse-layer data as follows:
Step (2.1): the first data input buffer reads 32 bits of current macroblock data from the current macroblock memory; the second data input buffer reads 32 bits of search-area data from the reference frame memory;
Step (2.2): each of the two data input buffers of step (2.1) merges four consecutively input 32-bit words into one 128-bit word; the 128-bit output of the first data input buffer is sent to the first mean filter array, and the 128-bit output of the second data input buffer is sent to the second mean filter array;
Step (2.3): the two mean filter arrays of step (2.2) simultaneously produce the coarse-layer current macroblock data and the coarse-layer integer motion estimation search-area data, each output 64 bits at a time;
Each pixel of the coarse-layer integer motion estimation search area is obtained with the following formula:

$$\mathrm{Pel}_c = \frac{\mathrm{Pel}_{00} + \mathrm{Pel}_{01} + \mathrm{Pel}_{10} + \mathrm{Pel}_{11}}{4}$$

where Pel_c denotes a coarse-layer pixel; Pel_00 and Pel_01 denote the pixels in columns 2*m and 2*m+1 of row 2*n of the original reference frame search area, and Pel_10 and Pel_11 denote the pixels in columns 2*m and 2*m+1 of row 2*n+1, with n, m in [0, 23]. This yields the coarse-layer motion estimation search area for a 48*48 original search area (24*24 coarse-layer pixels); the search area of the macroblock is centered on the top-left pixel of the current macroblock;
Step (2.4): the 64-bit outputs of the first and second mean filter arrays are written in turn into the post-filter current macroblock data storage array and the search-area shift register array respectively; each 64-bit write into these two arrays carries 8 pixels;
Step (3): the coarse-layer motion estimator performs coarse-layer motion estimation as follows and outputs the prediction mode and the corresponding motion vector:
Step (3.1): the coarse-layer data preprocessor inputs the current macroblock data and the coarse-layer motion estimation search-area data into the eight 2*2-block residual calculators of the coarse-layer motion estimator; the residual is expressed as the SAD;
Step (3.2): the eight 2*2-block residual calculators use an adder-tree structure to compute, in one clock cycle, the residuals of all 2*2, 2*4, 4*2, 4*4, 4*8, 8*4 and 8*8 blocks, which correspond to the seven prediction modes 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16 (the coarse layer halves each dimension);
Step (3.3): in each clock, the best-match block searcher compares the residual of every sub-block under each prediction mode, obtained from the eight 2*2-block residual calculators, with the temporary minimum residual obtained before the current clock, and keeps the smaller; if the residual obtained in the current clock is smaller, the motion vector corresponding to the minimum residual is updated to the current candidate position. Steps (3.1) to (3.3) are repeated until the coarse-layer search area has been fully searched, yielding the minimum residuals and their corresponding motion vectors;
Step (3.4): after step (3.3) ends, the minimum residuals of the sub-blocks belonging to each prediction mode are first added to obtain the residual sum of every mode; the seven residual sums are then compared, and the prediction mode with the smallest residual sum is selected as the optimal mode;
Step (3.5): for the optimal prediction mode obtained in step (3.4), the motion vector averager adds the motion vectors of all sub-blocks of that mode and takes their average, giving the motion vector of coarse-layer motion estimation;
Step (4): the parallel processor executes fine-layer motion estimation and half-pixel interpolation in parallel as follows:
Step (4.1): the coarse-layer motion estimator and the reference frame memory send the coarse-layer motion vector and the reference frame data, respectively, into the integer-pixel input buffer;
Step (4.2): the integer-pixel input buffer inputs the 24*24 fine-layer search-area data into the fine-layer search-area shift register array, which in turn sends these data into the fine-layer motion estimator; the fine-layer minimum residual and corresponding motion vector are then computed as follows:
Step (4.2.1): the fine-layer search-area data and the corresponding current macroblock data are sent into a fine-layer residual calculator, which computes the residuals of the 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16 blocks:
The residual calculator consists of 4 sub-residual calculators, each handling 8*8 residual data. In each sub-residual calculator, every 4 residual values in a row are summed by an adder tree to give the residual of a 4*1 block, and the residuals of 4 such 4*1 blocks are summed to give the residual of a 4*4 block; the residuals of all 4*4 blocks are computed in one clock cycle, and the adder tree then computes the residuals of all sub-blocks of the current macroblock. This repeats until the whole fine-layer search area has been processed;
Step (4.2.2): in each clock, the best-match comparator compares the residuals obtained in that clock cycle in step (4.2.1) with the minimum residuals obtained before it; if the residual obtained in this clock cycle is smaller, the motion vector corresponding to the minimum residual is updated to the current candidate position. This repeats until the fine-layer search area has been fully searched, yielding the minimum residuals and their corresponding motion vectors;
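The 4*1, then 4*4, then sub-block summation of step (4.2.1) can be modelled in software for one candidate position. The Python sketch below assumes the current macroblock and the candidate reference block are 16*16 lists of 8-bit integers; the function names are hypothetical, and plain sums stand in for the hardware adder tree (they give the same values, not the same timing).

```python
def sad_4x4_grid(cur, ref):
    """16*16 block -> 4*4 grid of 4*4-block SADs, following step (4.2.1):
    four absolute differences in a row give a 4*1 SAD, and four 4*1 SADs
    stacked vertically give a 4*4 SAD."""
    # row_sads[y][i]: SAD of the 4*1 block in row y, columns 4*i .. 4*i+3
    row_sads = [[sum(abs(cur[y][x] - ref[y][x]) for x in range(bx, bx + 4))
                 for bx in range(0, 16, 4)] for y in range(16)]
    # grid[by][bx]: sum of four vertically adjacent 4*1 SADs
    return [[sum(row_sads[y][bx] for y in range(4 * by, 4 * by + 4))
             for bx in range(4)] for by in range(4)]


def all_partition_sads(grid):
    """SADs of every sub-block of the seven H.264 partition sizes
    (16+8+8+4+2+2+1 = 41 values), summed from the 4*4 grid.
    Sizes are (width, height) measured in 4*4 units."""
    sizes = {"16x16": (4, 4), "16x8": (4, 2), "8x16": (2, 4), "8x8": (2, 2),
             "8x4": (2, 1), "4x8": (1, 2), "4x4": (1, 1)}
    return {mode: [sum(grid[by + dy][bx + dx]
                       for dy in range(h) for dx in range(w))
                   for by in range(0, 4, h) for bx in range(0, 4, w)]
            for mode, (w, h) in sizes.items()}
```

Reusing the 4*4 SADs for every larger partition is what lets one pass over the candidate produce all 41 residuals at once.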
Step (4.3): at the same time, the integer-pixel input buffer sends the 30*30 half-pixel search-area data into the half-pixel interpolation filter array. The half-pixel interpolation filters use the 6-tap FIR filter specified by the H.264/AVC standard; the array uses 4 horizontal half-pixel interpolation filters and 8 vertical half-pixel interpolation filters. The 4 horizontal filters interpolate from 6 pixels in one row of integer pixels, and the 8 vertical filters interpolate from six pixels in one column of integer pixels or half pixels. The interpolation formula is:
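$$\mathrm{Pel}_h = \mathrm{round}\left(\frac{\mathrm{Pel}_0 - 5\,\mathrm{Pel}_1 + 20\,\mathrm{Pel}_2 + 20\,\mathrm{Pel}_3 - 5\,\mathrm{Pel}_4 + \mathrm{Pel}_5}{32}\right)$$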
where round() denotes rounding, the subscript h denotes a half pixel, and the subscripts 0 to 5 denote the six integer or half pixels from which the half pixel is produced;
The half-pixel interpolation filter array outputs the integer pixels through half-pixel interpolation memory I and, at the same time, outputs through the three half-pixel memories A, B and C, respectively, the horizontal half pixels, the vertical half pixels, and the half pixels lying between the horizontal and vertical half pixels;
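The filtering of step (4.3) can be sketched in Python as follows. The sketch assumes 8-bit pixels, implements round(x/32) as (x + 16) >> 5 with clipping to [0, 255] as H.264 does for luma half samples, and produces the C plane by filtering the A plane vertically, as the text's vertical filters accept a column of half pixels. Note that the bit-exact H.264 derivation of the centre sample uses unrounded intermediate values divided by 1024, so this text-following version can differ from it by rounding. The names `six_tap` and `half_pel_planes` are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 6-tap half-sample filter


def clip255(v):
    """Clip to the 8-bit pixel range."""
    return max(0, min(255, v))


def six_tap(p):
    """Half pixel from six neighbours p[0..5]: round(sum / 32), done as
    (sum + 16) >> 5, then clipped to 8 bits."""
    acc = sum(t * v for t, v in zip(TAPS, p))
    return clip255((acc + 16) >> 5)


def half_pel_planes(img):
    """Return (A, B, C) for a 2-D list of integer pixels.

    A[y][x]: horizontal half pixel between img[y][x+2] and img[y][x+3]
    B[y][x]: vertical half pixel between img[y+2][x] and img[y+3][x]
    C[y][x]: centre half pixel, filtered vertically from the A plane
    """
    h, w = len(img), len(img[0])
    A = [[six_tap(img[y][x:x + 6]) for x in range(w - 5)] for y in range(h)]
    B = [[six_tap([img[y + k][x] for k in range(6)]) for x in range(w)]
         for y in range(h - 5)]
    C = [[six_tap([A[y + k][x] for k in range(6)]) for x in range(w - 5)]
         for y in range(h - 5)]
    return A, B, C
```

On a flat area the filter returns the input value, because the taps sum to 32; across a sharp edge it can overshoot, which is why the clip is needed.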
Step (4.4): a parallel-computation completion detector determines whether fine-layer motion estimation and half-pixel interpolation filtering have both finished; once they have, it issues an end signal;
Step (5): the half-pixel motion estimator and the quarter-pixel motion estimator perform half-pixel and quarter-pixel motion estimation in sequence as follows:
Step (5.1): the half-pixel motion estimator performs half-pixel motion estimation and outputs the half-pixel minimum residual and the corresponding half-pixel motion vector, in the following order:
Step (5.1.1): the fine-layer motion estimator inputs the minimum residual and the corresponding motion vector into the on-chip memory interface of the half-pixel motion estimator; the three half-pixel memories A, B and C input half pixels A, half pixels B and half pixels C into that on-chip memory interface;
Step (5.1.2): the on-chip memory interface sends the three classes of half pixels A, B and C, through 3 half-pixel input buffers, into three half-pixel search-area memories respectively; 3 half-pixel motion estimators then perform motion estimation on the A, B and C classes respectively, as follows:
First, each half-pixel motion estimator uses a 4*4-block residual calculator to compute the residual between a 4*4 block of the current macroblock and a 4*4 block of the half-pixel search area;
Then, a best-match selector compares the residuals of the candidate sub-blocks obtained from the different search areas, selects the smallest, and compares it with the minimum residual obtained by the fine-layer search; the block with the smaller residual is taken as the best match of half-pixel motion estimation, and its half-pixel residual and corresponding motion vector are output;
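A software model of step (5.1) for a single 4*4 block is sketched below. It assumes the integer and half pixels are held together on a 2x-upsampled grid `hp` (integer pixels at even row and column indices, A/B/C half pixels at the odd ones) and that the search area is padded so every candidate lies inside it; `block_at`, `sad` and `half_pel_refine` are hypothetical names. The eight neighbours cover the three classes: horizontal offsets are class A, vertical offsets class B, diagonal offsets class C.

```python
def block_at(hp, y2, x2, size=4):
    """size*size block read from the 2x-upsampled grid hp, whose top-left
    sample is at (y2, x2) in half-pel units (step of 2 = one full pixel)."""
    return [[hp[y2 + 2 * r][x2 + 2 * c] for c in range(size)]
            for r in range(size)]


def sad(a, b):
    """Sum of absolute differences of two equally sized 2-D blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def half_pel_refine(cur, hp, int_mv, int_sad):
    """Evaluate the 8 half-pel positions around the integer best match
    int_mv = (y, x) and keep the integer result unless a half-pel candidate
    has a strictly smaller SAD. Returns (mv in half-pel units, SAD)."""
    best_mv, best_sad = (2 * int_mv[0], 2 * int_mv[1]), int_sad
    for dy in (-1, 0, 1):          # dy != 0, dx == 0: class B (vertical)
        for dx in (-1, 0, 1):      # dy == 0, dx != 0: class A (horizontal)
            if dy == 0 and dx == 0:
                continue           # integer position already known
            y2, x2 = 2 * int_mv[0] + dy, 2 * int_mv[1] + dx
            s = sad(cur, block_at(hp, y2, x2))
            if s < best_sad:       # both non-zero: class C (centre)
                best_mv, best_sad = (y2, x2), s
    return best_mv, best_sad
```

Passing in the fine-layer SAD as `int_sad` reproduces the final comparison of step (5.1.2) against the integer result.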
Step (5.2): the quarter-pixel motion estimator performs quarter-pixel motion estimation and obtains the quarter-pixel minimum residual and the corresponding quarter-pixel motion vector as follows:
Step (5.2.1): the half-pixel minimum residual and corresponding motion vector output by the half-pixel motion estimator, the integer pixels I from half-pixel interpolation memory I, the half pixels A, B and C from half-pixel interpolation memories A, B and C, and the current macroblock from the current macroblock memory are all input into the quarter-pixel interpolation memory interface;
Step (5.2.2): for each quarter pixel to be computed, the quarter-pixel interpolation calculator array reads from the quarter-pixel interpolation memory interface the two adjacent half pixels, or the adjacent half pixel and integer pixel, and averages them to obtain the quarter pixel. Twelve kinds of quarter pixels can be obtained in this way, but only the 8 kinds surrounding the best-match half-pixel block found by the half-pixel motion estimator are computed;
Step (5.2.3): the quarter-pixel interpolation calculator array stores the resulting quarter pixels in 8 quarter-pixel search-area memories and sends them to 8 quarter-pixel motion estimators respectively;
Step (5.2.4): each of the 8 quarter-pixel motion estimators uses a 4*4-block residual calculator to compute the residual between the quarter-pixel block from its quarter-pixel search-area memory and the 4*4 block of the current macroblock stored in advance;
Step (5.2.5): the best-match selector selects, among the 8 search areas of step (5.2.4), the block with the smallest residual as the best quarter-pixel match and computes the corresponding motion vector; this residual is then compared with the minimum residual of the half-pixel search, and the block with the smaller residual is selected as the final best match produced by the H.264/AVC motion estimator;
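The averaging and candidate selection of step (5.2) can be sketched as follows. The rounded average (p + q + 1) >> 1 is the rule H.264 uses to form luma quarter samples from two neighbouring integer or half samples; which two samples feed each of the twelve positions follows the standard's position table and is not modelled here. The names `quarter_pel`, `QUARTER_OFFSETS` and `quarter_candidates` are hypothetical.

```python
def quarter_pel(p, q):
    """Quarter pixel as the rounded average of its two nearest integer
    or half pixels."""
    return (p + q + 1) >> 1


# Offsets, in quarter-pel units, of the 8 quarter-pel positions around
# the best half-pel match; only these are evaluated, not all 12 kinds.
QUARTER_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]


def quarter_candidates(best_half_mv):
    """best_half_mv is (y, x) in half-pel units; return the 8 candidate
    motion vectors in quarter-pel units."""
    cy, cx = 2 * best_half_mv[0], 2 * best_half_mv[1]
    return [(cy + dy, cx + dx) for dy, dx in QUARTER_OFFSETS]
```

Each candidate is then scored with the same 4*4 SAD used in half-pixel estimation, which is the reuse of SAD hardware the embodiment mentions.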
Step (6): a final motion vector calculator computes the final motion vector MV_f produced by the H.264/AVC motion estimator and stores it in the motion estimation result memory:

$$MV_f = 8\,MV_c + 4\,MV_p + 2\,MV_h + MV_q$$

where MV_c, MV_p, MV_h and MV_q are the motion vectors obtained by the coarse-layer, fine-layer, half-pixel and quarter-pixel motion estimators respectively.
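The weights in step (6) can be read as unit conversions into quarter pixels: one coarse-layer pixel spans 2 full pixels (8 quarter pixels), a fine-layer step is 1 full pixel (4), a half-pixel step is 2 and a quarter-pixel step is 1. This reading is an interpretation, not stated in the text. A minimal Python sketch applying the formula per component, with a hypothetical function name:

```python
def final_mv(mv_c, mv_p, mv_h, mv_q):
    """MV_f = 8*MV_c + 4*MV_p + 2*MV_h + MV_q, component by component.
    Inputs are per-stage vectors; the result is in quarter-pel units."""
    return tuple(8 * c + 4 * p + 2 * h + q
                 for c, p, h, q in zip(mv_c, mv_p, mv_h, mv_q))
```

For example, stage vectors (1, -2), (1, 0), (-1, 1) and (1, -1) combine to (11, -15) quarter pixels.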
The H.264/AVC motion estimation method proposed by the invention greatly increases the computation speed of motion estimation. Compared with the full-search method using the same search area, it saves 63.3% of the computation time. The following table compares the computation time of each computing unit of the proposed H.264/AVC motion estimation system with that of the full-search method.

| | Integer-pixel ME | Half-pixel ME | Quarter-pixel ME | Total |
|---|---|---|---|---|
| Full search | 2480 | 484 | 168 | 3132 |
| Proposed method | 744 | 236 | 168 | 1148 |
| Ratio | 30.0% | 48.8% | 100% | 36.7% |

The integers in the table are the numbers of clock cycles each method needs to complete the same computation; the percentages give the clock count of the proposed method as a percentage of that of the full-search method.
Embodiment
A distinguishing feature of the invention is that it proposes a partially parallel motion estimation method based on the operating principle of H.264/AVC motion estimation, the data dependence between integer and fractional motion estimation in the motion estimation process, and the compressed-image-quality requirements of video communication applications. The proposed method lets integer and fractional motion estimation execute partly in parallel, thereby increasing the speed of motion estimation.
Another feature of the invention is that it adopts a motion vector prediction method and a suitable search range, so that in the motion estimator system of the invention adjacent macroblocks can share reference-frame search-area data during integer motion estimation, and the sub-blocks of the current macroblock can share reference-frame search-area data during half-pixel motion estimation, thereby reducing the number of memory accesses.
A third feature of the invention is a reusable hardware structure that computes the SAD during both half-pixel and quarter-pixel motion estimation, which reduces the hardware resource overhead of half-pixel and quarter-pixel motion estimation.
A fourth feature of the invention is a digital circuit system that implements the proposed parallel motion estimation method. The system reads data from the current frame memory and the reference frame memory and completes motion estimation quickly.
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows the flow chart of a classical full-search motion estimator. Motion estimation in the H.264/AVC standard can be divided into two stages: integer-pixel motion estimation and fractional-pixel motion estimation, the latter comprising half-pixel and quarter-pixel motion estimation. Motion estimation at each precision level in turn comprises two stages: data preprocessing and best-match block search. The data preprocessing stage generates the search-area image data according to the position of the macroblock or sub-block being processed and the search precision of the motion estimation; these data may be read directly from the external memory holding the reference frame, or may have to be computed by interpolation after being read from the reference frame memory, but in either case they must be available before the best-match block search begins. The best-match block search stage searches the search area produced by data preprocessing for the block most similar to the current macroblock or sub-block. Computing the search area at the current precision requires the result of motion estimation at the coarser precision. Because the search range of integer-pixel motion estimation is large, the final motion vectors of the sub-blocks of the current macroblock are fairly scattered, so the position of the fractional-pixel search area of the current macroblock cannot be predicted in advance. Under this data dependence, full-search motion estimation must be executed sequentially, and the flow chart of the full-search motion estimation method (Fig. 1) is therefore also its state transition diagram.
Those familiar with the H.264 video compression standard will recognize that if the search area of integer motion estimation is small, the fractional motion estimation built on it is also confined to a limited range. The half-pixel data within that range can then be computed in advance to form the half-pixel search area, so that, at the cost of slightly more computation and storage, the parallelism and computation speed of the whole motion estimation improve. Moreover, in applications such as video communication, the video images move slowly and vary little, and much of the image (the background) is often completely static, so the motion estimation search range need not be large. On the basis of these considerations, the invention proposes a partially parallel H.264/AVC motion estimation method. Fig. 2a shows the operating principle of the proposed method. The method uses a hierarchical search strategy for integer-pixel motion estimation, dividing it into two layers: coarse-layer and fine-layer motion estimation. After coarse-layer motion estimation, the fine-layer search area can be much smaller. Directly computing the fine-layer search area together with the half pixels around it at this point then saves a great deal of time compared with computing the half-pixel search area of each sub-block separately from its motion vector after fine-layer motion estimation has finished, which makes real-time operation easier to achieve. In addition, both the fine-layer search and the half-pixel interpolation must read the raw reference frame image data, and most of the image data they need is the same. For these reasons, the proposed method performs fine-layer motion estimation and half-pixel interpolation simultaneously after the coarse-layer search. That is, once coarse-layer integer motion estimation has finished, fine-layer motion estimation and half-pixel interpolation execute in parallel (the parallel structure shown in Fig. 2a), and half-pixel motion estimation begins only after both have finished. After half-pixel motion estimation, quarter-pixel interpolation and motion estimation are performed in turn. Finally, the final motion vector and the corresponding residual data are computed from the motion vectors obtained at each precision.
Fig. 2 b shows the flow chart of the method for estimating realization of the present invention's proposition, and the branch following steps are carried out successively:
1. Initialize the parameters of the motion estimation system according to the characteristics of the images to be processed, including the frame width and height and the state of each module of the motion estimation system;
2. Input the current macroblock data and the corresponding coarse-layer motion estimation search-area data, and compute the coarse-layer data, i.e. the coarse-layer preprocessed image data, by mean filtering;
3. Perform coarse-layer motion estimation to obtain the prediction mode of the current macroblock and a motion vector of 4-pixel accuracy; this vector is the average of the motion vectors of the sub-blocks of the current macroblock;
4. Execute fine-layer motion estimation and half-pixel interpolation in parallel, obtaining a motion vector of integer-pixel accuracy and the half-pixel search-area data;
5. Perform half-pixel motion estimation to obtain a motion vector of half-pixel accuracy;
6. Perform quarter-pixel interpolation and motion estimation to obtain a motion vector of quarter-pixel accuracy;
7. From the motion estimation results at each precision, compute the final motion vector of quarter-pixel accuracy and the corresponding residual data;
8. Store the results obtained in step 7 in memory.
The parallel execution module operates as follows:
A. While the reference frame integer-pixel data are being input, half-pixel interpolation is performed.
B. At the same time, each integer pixel is checked to see whether it belongs to the fine-layer motion estimation search area; if not, nothing is done; if so, it is stored in the shift register, and fine-layer motion estimation starts once all search-area data have been input.
C. Once fine-layer motion estimation and half-pixel interpolation have both finished, the parallel structure completes its operation.
Shown in Fig. 2 a, H.264/AVC encoder system sees it is that order is carried out on the whole, " but the estimation of detailed level integer picture element movement " unit and " half pixel interpolation " unit are parallel on the basis of " input of integer pixel " unit, so, these three modules are merged to a state, can obtain the state transition diagram of the exercise estimator system hardware realization of the present invention's proposition, shown in Fig. 2 c.Among the figure, whole exercise estimator zero clearing was not worked when ME_reset was high; When ME_reset was low, if Int_Pred_en is high, the presentation code device had been selected inter-frame forecast mode.As CMB_start when being high, show and to handle current macro that exercise estimator changes operating state over to from idle condition.Data_Pretreat_fin can begin to search for the most similar of rough layer for the data that high expression rough layer estimation needs have been ready to.When Int_4_Pred_fin was high, expression rough layer estimation finished, and begins to carry out parallel computation; After having finished the input of integer pixel, the calculating of half pixel interpolation and detailed level estimation, Mixed_Operation_fin is high, and exercise estimator begins to carry out the mark picture element movement and estimates.HME_fin represents finishing of half picture element movement estimation when being high; Quat_ME_fin represents finishing of estimation of 1/4 picture element movement and final motion vector calculation when being high.When 1/4 picture element movement is estimated to finish,, then continue next macro block is carried out estimation if Int_Pred_en and CMB_start are high; Otherwise whole exercise estimator changes idle condition over to.
Fig. 3 shows the block diagram of the proposed H.264/AVC motion estimator system. The coarse-layer data preprocessor, coarse-layer motion estimator, integer-pixel data input structure of the parallel structure, fine-layer motion estimator, half-pixel interpolation array, half-pixel motion estimator, quarter-pixel motion estimator and final motion vector calculator are the data processing units of the H.264/AVC motion estimator system; the current frame/reference frame memory, current macroblock memory, coarse-layer preprocessing result memory, fine-layer motion estimation search-area memory, half-pixel interpolation memory, half-pixel motion estimation search-area memory and motion estimation result memory are its data storage units. The hardware implementation of these data processing and storage units is detailed below:
1. Coarse-layer data preprocessor
In video compression, the motion vector of the current macroblock is usually predicted from the motion vectors of its neighboring blocks; the predicted motion vector is then used as the center of the search area, and the reference-frame image data of that search area are input for the best-match search. This approach performs motion vector prediction for every macroblock of the current frame, which adds computation; moreover, because the search-area centers of the macroblocks of the current frame all differ and are distributed irregularly and unpredictably, the search-area data of each macroblock must be input separately, which increases memory traffic. For coarse-layer integer motion estimation, the invention uses (0, 0) as the predicted motion vector of every sub-block of the current macroblock, i.e. it takes the coordinates of the current macroblock (the position of its top-left pixel in the current frame) as the center of the macroblock's search area, which overcomes these drawbacks of conventional motion vector prediction. With this method, the search-area data of all sub-blocks of a macroblock can be read in together, overlapping parts need not be input again, and most of the search-area data of adjacent macroblocks can be shared, reducing memory accesses. Fig. 4a shows the motion vector prediction method adopted by the invention, taking a 48*48 search area as an example. As shown in Fig. 4a, the reference-frame search area used for macroblock 1 (MB1) consists of 9 macroblocks, 6 of which can be reused after a simple shift when MB2 is processed; this saves 2/3 of the memory accesses.
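The 2/3 saving can be checked by modelling, as in Fig. 4a, the 48*48 search area of a macroblock as the 3*3 block of macroblocks around it (exact pixel alignment of the window is ignored). The Python sketch below uses hypothetical function names and macroblock (row, column) coordinates.

```python
def window(mb_row, mb_col):
    """Macroblocks forming the 3*3-macroblock (48*48-pixel) search area of
    macroblock (mb_row, mb_col) under the (0, 0) motion vector predictor."""
    return {(mb_row + dr, mb_col + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}


def new_macroblocks(prev, cur):
    """Macroblocks that must be fetched for `cur` when the search area of
    `prev` is still held in the shift registers."""
    return window(*cur) - window(*prev)
```

Moving from MB1 to its right neighbour MB2 keeps 6 of the 9 macroblocks and fetches only 3 new ones, so 6/9 = 2/3 of the accesses are saved. Because the window depends only on the macroblock position, this holds for every horizontal step, which is not the case when each macroblock has its own predicted search center.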
Fig. 4 b shows the simple block diagram of rough layer data pre-processor.This preprocessor comprises: two data input buffers, two mean filter arrays, 2 region of search shift register array A, B, data storage behind the current macro mean filter.Wherein, 32bit current macro data and 32bit region of search data read from current frame memory and reference frame storing device respectively, and two data input buffers are merged into 1 128bit data with 4 32bit data of importing continuously and outputed to the mean filter array; The mean filter array is used to obtain rough layer integer motion estimation search area data and rough layer current macro data, is output as 64bit; Current macro after the compression leaves in the data storage behind the current macro mean filter, and rough layer region of search data are divided into 2 and leave among region of search shift register array A and the B.Region of search shift register array A and B are respectively the shift register arrays of 16 row, 24 row and 8 row, 24 row, each register can be deposited a 8bit pixel data, data storage behind the current macro mean filter is the register array of a 8*8, each register can be deposited a 8bit pixel data, and can be by direct read, above-mentioned memory has constituted the rough layer preliminary treatment result memory shown in Fig. 3.
Fig. 4c shows how the invention obtains the coarse-layer integer motion estimation search area: the 4 original pixels of each 2*2 block of the reference frame are mean-filtered to give one coarse-layer search-area pixel. In the invention, for the 48*48 search area of coarse-layer motion estimation, the block mean filter is computed as

$$\mathrm{Pel}_c = \frac{\mathrm{Pel}_{00} + \mathrm{Pel}_{01} + \mathrm{Pel}_{10} + \mathrm{Pel}_{11}}{4}$$

where the subscript c marks the generated coarse-layer pixel, the subscripts 00 and 01 mark the pixels in columns 2*m and 2*m+1 of row 2*n of the original reference frame search area, and the subscripts 10 and 11 mark the pixels in columns 2*m and 2*m+1 of row 2*n+1 (n, m in [0, 23]).
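The 2*2 mean filter can be sketched in Python as follows, assuming the search area is a list of lists of 8-bit pixels with even dimensions. The text writes the division as /4 without saying whether the hardware truncates or rounds; this sketch assumes truncation (integer floor, equivalent to dropping the two low bits of the sum). The function name is hypothetical.

```python
def coarse_layer(search_area):
    """2:1 mean-filter subsampling: each 2*2 block of original pixels
    becomes one coarse-layer pixel, Pel_c = (Pel00+Pel01+Pel10+Pel11)/4.
    Assumption: the division truncates."""
    rows, cols = len(search_area), len(search_area[0])
    if rows % 2 or cols % 2:
        raise ValueError("dimensions must be even")
    out = []
    for n in range(rows // 2):
        out_row = []
        for m in range(cols // 2):
            s = (search_area[2 * n][2 * m] + search_area[2 * n][2 * m + 1]
                 + search_area[2 * n + 1][2 * m] + search_area[2 * n + 1][2 * m + 1])
            out_row.append(s // 4)
        out.append(out_row)
    return out
```

Applied to a 48*48 search area it yields the 24*24 coarse-layer search area (n, m in [0, 23]), which matches the 24 columns of shift register arrays A and B.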
2. Coarse-layer motion estimator
As noted above, motion estimation in the H.264 video compression standard has 7 prediction modes, and the one with the lowest coding cost is called the optimal prediction mode. The optimal prediction mode is usually determined by the coding cost once motion estimation has finished, the coding cost function being

$$J_{\mathrm{mode}}(\mathrm{Mode} \mid \lambda_{\mathrm{mode}}) = \mathrm{SAD}(\mathrm{Mode}) + \lambda_{\mathrm{mode}} \cdot R(\mathrm{Mode})$$

where SAD(Mode) and R(Mode) are the coding costs of the image residual data and of the other information (including motion vectors) respectively, and λ_mode is the Lagrange multiplier. Those familiar with video compression know that the coding cost of the residual data dominates the total coding cost, so the coding cost of the other information is usually ignored. However, selecting the optimal prediction mode by SAD(Mode) only after quarter-pixel motion estimation has finished would require a great deal of computation; the invention therefore determines the optimal prediction mode as soon as coarse-layer integer motion estimation has finished, which greatly reduces the computation of H.264/AVC motion estimation. Coarse-layer motion estimation also produces the motion vector of each sub-block of the current macroblock. If these motion vectors were used as the search-area centers in fine-layer motion estimation, the search-area data of each sub-block would have to be input separately, and much of the sub-block search-area data input would in fact be duplicated, increasing accesses to the reference frame memory. The invention therefore averages the motion vectors of the sub-blocks of the current macroblock and uses the average as the center of the fine-layer integer motion estimation search area for the whole macroblock, so that the search areas of all sub-blocks can be read in together during fine-layer motion estimation, saving reference frame memory access time.
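The mode decision and motion vector averaging of steps (3.4) and (3.5) can be sketched in Python as below, ignoring R(Mode) as described above. The input format (a dict from mode name to the per-sub-block minimum SADs) and the function names are assumptions. The averaging rule is also an assumption: since every H.264 mode has 1, 2, 4, 8 or 16 sub-blocks, a hardware averager could use an arithmetic right shift, which floor division models.

```python
def select_mode(min_sads):
    """min_sads maps a mode name (e.g. "16x16") to the minimum SADs of its
    sub-blocks; return the mode whose summed SAD is smallest."""
    return min(min_sads, key=lambda mode: sum(min_sads[mode]))


def average_mv(mvs):
    """Average the (x, y) motion vectors of the chosen mode's sub-blocks.
    Assumption: floor division, i.e. an arithmetic right shift."""
    n = len(mvs)
    return (sum(x for x, _ in mvs) // n, sum(y for _, y in mvs) // n)
```

The averaged vector then serves as the single fine-layer search center for the whole macroblock.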
Fig. 5 a shows the simple block diagram of rough layer exercise estimator among the present invention.This exercise estimator input is output as the average and the predictive mode of each sub-piece motion vector of current macro through pretreated current macro and region of search data.This exercise estimator comprises: a SAD calculator, the most similar block search device, a predictive mode selector and a motion vector equalizer that calculates each sub-piece motion vector average of current macro.
Fig. 5 b shows the hardware configuration of rough layer estimation SAD calculator among the present invention.SAD_n_m_l represents the SAD that calculates among the figure, and wherein n and m represent the columns and the line number of the pixel block of SAD correspondence respectively, the sequence number of SAD under the same line number columns of 1 expression.Before the rough layer estimation, each circle (being called PE) has all been deposited a rough layer current macro pixel among the figure; After the rough layer estimation began, all there were the region of search pixel input that 8 palpuses of delegation handle and 8 PE that are passed to respective column each clock cycle, calculated absolute value poor of current macro pixel and region of search pixel; Encirclement is an add tree with the dotted rectangle of 2 PE of delegation, obtains the SAD of 2*1 piece; Result of calculation is sent in the little rectangle on the right, and this rectangle is used for the 2*1SAD of one's own profession 2*1SAD and its lastrow adduction preservation result of calculation mutually, also can play time-delay, synchronous effect simultaneously; Per 2 2*1SAD additions obtain the SAD of a 2*2 piece.Fig. 5 c shows the hardware configuration that calculates a 2*2 piece SAD in the SAD calculator of rough layer exercise estimator among the present invention, wherein: Pel
cAnd Pel
rThe pixel of representing current macro and region of search respectively, they subtract each other in a PE and ask absolute value; The result of calculation addition of two PE has obtained the SAD of a 2*1, and is stored in the memory on right side; 2*1SAD addition in adder in two memories obtains the SAD of a 2*2.By this structure, in one-period, can calculate the SAD of all 2*2 pieces; Add the add tree of PE array below, can calculate the SAD of all 2*2,2*4,4*2,4*4,4*8,8*4 and the 8*8 piece correspondence of rough layer current macro; When next clock arrives, import new delegation's region of search data, continue to repeat said process ...As seen, this hardware configuration can calculate the SAD of all 2*2,2*4,4*2,4*4,4*8,8*4 and the 8*8 piece correspondence of 7 kinds of predictive mode correspondences such as 4*4,4*8,8*4,8*8,8*16,16*8 and 16*16 in a clock.
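The following Python sketch is a functional model of the Fig. 5b/5c datapath for a single candidate position. It is not cycle-accurate. It computes the PE absolute differences, then the 2×1 and 2×2 SADs, then every larger coarse-layer block SAD, by summation as the adder tree would. The block naming (width, height, column index, row index) and the function name are assumptions made for illustration.

```python
# (width, height) of each coarse-layer block; 2x2 here corresponds to a
# 4x4 block of the original 16x16 macroblock, 8x8 to the whole macroblock.
COARSE_BLOCKS = [(2, 2), (2, 4), (4, 2), (4, 4), (4, 8), (8, 4), (8, 8)]

def coarse_sads(cur, cand):
    """Functional model of the coarse-layer SAD calculator, one candidate.

    cur, cand: 8x8 coarse-layer blocks (lists of rows).
    Returns {(width, height, col_index, row_index): SAD}.
    """
    # PE stage: |Pel_c - Pel_r| for every pixel
    ad = [[abs(cur[y][x] - cand[y][x]) for x in range(8)] for y in range(8)]
    # Dashed-rectangle adder trees: two horizontally adjacent PEs -> 2x1 SAD
    sad2x1 = [[ad[y][2 * i] + ad[y][2 * i + 1] for i in range(4)] for y in range(8)]
    # Right-hand registers: this row's 2x1 SAD plus the row above -> 2x2 SAD
    sad2x2 = [[sad2x1[2 * j][i] + sad2x1[2 * j + 1][i] for i in range(4)]
              for j in range(4)]
    # Adder tree below the PE array: sum 2x2 SADs into every larger block
    result = {}
    for w, h in COARSE_BLOCKS:
        cw, ch = w // 2, h // 2  # block size counted in 2x2 cells
        for j in range(0, 4, ch):
            for i in range(0, 4, cw):
                result[(w, h, i // cw, j // ch)] = sum(
                    sad2x2[jj][ii]
                    for jj in range(j, j + ch)
                    for ii in range(i, i + cw))
    return result
```

The model produces 41 SADs per candidate: 16 + 8 + 8 + 4 + 2 + 2 + 1 across the seven block sizes. These are the values the most-similar-block searcher then compares.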
The most-similar-block searcher works as follows. In each clock it compares the SADs of all sub-blocks obtained in that clock with the minimum SADs held from the earlier clocks. It keeps the smaller values in the registers that store the minimum SADs. Whenever a SAD from the current clock is smaller, it also updates the motion vector associated with that minimum SAD to the current candidate position. This operation repeats until the search of the coarse-layer search area finishes. At that point the minimum SADs and their corresponding motion vectors are available.
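The following Python sketch models the searcher's register update as a running minimum per sub-block. It assumes each clock delivers one candidate motion vector together with the SADs of all sub-blocks at that position. The stream format and the names are hypothetical.

```python
def most_similar_search(sad_stream):
    """sad_stream yields (mv, sads) once per clock; sads maps sub-block -> SAD.

    Keeps, per sub-block, the minimum SAD seen so far and its motion vector.
    A strictly smaller SAD is required to replace the stored one, so on ties
    the earlier candidate is kept.
    """
    best = {}
    for mv, sads in sad_stream:
        for blk, sad in sads.items():
            if blk not in best or sad < best[blk][0]:
                best[blk] = (sad, mv)  # update minimum SAD and its motion vector
    return best
```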
The prediction mode selector starts after the most-similar-block searcher stops. For each of the 7 modes (4×4, 4×8, 8×4, 8×8, 8×16, 16×8 and 16×16), it adds up the minimum SADs of the sub-blocks of that mode, which gives 7 SAD sums. It then compares the sums and selects the mode with the smallest one as the optimal prediction mode. The motion vector averager takes the optimal prediction mode chosen by the selector and adds the motion vectors of that mode's sub-blocks. Their average is the motion vector produced by coarse-layer motion estimation.
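The following Python sketch combines the prediction mode selector and the motion vector averager. The input layout (a list of per-sub-block minimum SADs and motion vectors for each mode) is an assumption. So is rounding the average half up, since this section does not say how the hardware rounds. The motion vectors are in coarse-layer units.

```python
def select_mode_and_average_mv(per_mode):
    """per_mode: dict mode -> list of (min_sad, (mvx, mvy)), one per sub-block."""
    # Prediction mode selector: sum the sub-block minimum SADs, pick the least
    mode = min(per_mode, key=lambda m: sum(sad for sad, _ in per_mode[m]))
    blocks = per_mode[mode]
    n = len(blocks)
    # Motion vector averager: component-wise mean, rounded half up (assumption)
    avg = tuple((sum(mv[k] for _, mv in blocks) + n // 2) // n for k in range(2))
    return mode, avg
```

For example, if the 16×8 mode has sub-blocks with SADs 200 and 250 and motion vectors (1, 0) and (3, -2), it beats a 16×16 mode whose SAD is 500. The averaged search center is then (2, -1).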
3. Parallel processor for fine-layer motion estimation and half-pixel interpolation
Fig. 6 a shows the relation of detailed level motion estimation search zone and the required integer pixel of half pixel interpolation.Stain is represented the piece of 16*16 of detailed level motion estimation search regional center of current macro correspondence and the detailed level region of search that ash point constitutes 24*24 together, and the most similar of each sub-piece of current macro is inevitable in this region of search.Because the motion vector decision that the region of search that half picture element movement is estimated is obtained by detailed level integer estimation, and the 6 tap finite response digital filters (FIR filter) that calculate half pixel need each 3 integer pixel of half pixel both sides, so calculate the required integer pixel of half pel search area image data for being that the size at center is the pixel block (being made of black, ash, white three kinds of picture elements) of 30*30 with detailed level motion estimation search zone.Detailed level motion estimation search zone has a large portion to overlap with the required integer pixel of half pixel interpolation, and this is the important prerequisite that the present invention proposes the parallel organization method for estimating.
Fig. 6 b shows the parallel processing flow process of half pixel interpolation and detailed level estimation among the present invention.After the rough layer estimation finishes, carry out half pixel interpolation according to the reference frame initial data of the motion vector input 30*30 that obtains, the region of search data of detailed level estimation are preserved simultaneously.The region of search data and half pixel interpolation of input detailed level estimation calculate parallel carrying out; After detailed level motion estimation search area data was all imported, detailed level estimation and half pixel interpolation calculated parallel carrying out; After judging module determines that the both finishes, begin to carry out half picture element movement and estimate.And according to video compression standard H.264, half picture element movement estimation and 1/4 picture element movement estimate to use half pixel block and the integer pixel block of 26*26, therefore, 26,*26 half pixel and corresponding integer pixel that the calculating of half pixel interpolation is produced leave in the half pixel interpolation memory, so that use these data in half picture element movement estimation and the estimation of 1/4 picture element movement.
Fig. 6 c shows the simple block diagram of parallel processor among the present invention.This parallel processor comprises: an integer pixel input buffer, one and half pixel interpolation filter arrays, half pixel data memory I, A, B, C, a detailed level motion estimation search regional memory, a detailed level exercise estimator and a parallel computation finish decision device.Wherein, integer pixel input buffer is converted to the 240bit pixel data with the 32bit pixel data of importing and outputs to half pixel interpolation filter array and detailed level motion estimation search regional memory.
Fig. 6 d shows the interpolation filter array structure that the present invention adopts.This structure has been used 4 horizontal filters and 8 vertical filter, and horizontal filter uses 6 pixel interpolations in delegation's integer pixel, and vertical filter is used 6 pixel interpolations in the row integer pixel.Drawn circle, triangle, five-pointed star is half different pixels of three classes among the figure, deposits C, B, A three and half pixel data memories respectively in; The integer pixel of square representative then deposits half pixel data memory I in, and promptly the pixel that frame of broken lines is surrounded among the figure all deposits half pixel data memory in.The present invention has adopted 6 of H.264/AVC standard code to take out the FIR filter, and interpolation formula is
$$Pel_h = \mathrm{round}\left(\frac{Pel_0 - 5\,Pel_1 + 20\,Pel_2 + 20\,Pel_3 - 5\,Pel_4 + Pel_5}{32}\right)$$
where round() denotes the rounding operation, the subscript h denotes the half pixel, and the subscripts 0 to 5 denote the 6 integer pixels or half pixels from which the half pixel is produced.
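The following Python sketch applies the 6-tap formula to a window of six pixels and slides it along a row, as a horizontal filter would. It implements round(x/32) as (x + 16) >> 5 and clips the result to the 8-bit range, which is how H.264/AVC rounds and clips this filter. The function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)

def half_pel(p):
    """Pel_h from six integer (or half) pixels p[0..5].

    round(x / 32) is computed as (x + 16) >> 5, then clipped to [0, 255].
    """
    acc = sum(t * v for t, v in zip(TAPS, p))
    return min(255, max(0, (acc + 16) >> 5))

def horizontal_half_pels(row):
    """Half pixels between row[i + 2] and row[i + 3] for every 6-pixel window."""
    return [half_pel(row[i:i + 6]) for i in range(len(row) - 5)]
```

A vertical filter applies the same `half_pel` to six pixels taken from one column. H.264/AVC computes the center half-pixel position from unrounded intermediate filter outputs followed by a final (x + 512) >> 10. The sketch instead follows the formula exactly as written above, with rounded inputs.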
Fig. 6 e shows the storage policy of detailed level region of search data among the present invention.The region of search of detailed level estimation is the piece of 24*24, data are divided into 3 equal pieces (8*24) Top, Middle and Bottom, leave among 4 shift register array T, M1, M2 and the B, wherein the Middle piece is left among M1 and the M2 simultaneously, and T, M1, M2 and B have then constituted the detailed level motion estimation search regional memory shown in Fig. 3 and Fig. 6 c.
Fig. 6 f shows the simple block diagram of detailed level exercise estimator among the present invention, and the detailed level exercise estimator comprises among the present invention: a SAD calculator and a block search device the most similar.Fig. 6 g shows the hardware configuration of the SAD that calculates 8*8,8*4,4*8 and 4*4 piece in the SAD calculator, SAD_n_m_l represents the SAD that calculates among the figure, wherein n and m represent the columns and the line number of the pixel block of SAD correspondence respectively, the sequence number of SAD under the same line number columns of 1 expression.Before the detailed level estimation, each circle (being called PE) has all been deposited a current macro pixel among the figure; After the detailed level estimation began, all there was the input of delegation 8 regions of search pixel each clock cycle and is passed to 8 PE of same row, calculated absolute value poor of current macro pixel and region of search pixel; Encirclement is an add tree with the dotted rectangle of 4 PE of delegation, obtains the SAD of 4*1 piece; Result of calculation is sent in the little rectangle on the right, and this rectangle is used for the 4*1SAD of one's own profession 4*1SAD and its lastrow adduction preservation result of calculation mutually, also can play time-delay, synchronous effect simultaneously; Per 4 4*1SAD additions obtain the SAD of a 4*4 piece.By this structure, in one-period, can calculate the SAD of all 4*4 pieces; Add suitable add tree, can calculate the SAD of all sub-piece correspondences of current macro; When next clock arrives, import new delegation's region of search data, continue to repeat said process ...As seen, this hardware configuration can produce the SAD of each sub-piece of optimum prediction mode correspondence in conjunction with the above-mentioned shift register array strategy that the present invention adopts in a clock.And whole SAD calculator is made of structure shown in 4 Fig. 
6 g and more add tree.
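The following Python sketch models the row-serial operation of one Fig. 6g unit for a single candidate position. It is a simplified model that does not capture how candidates shift through the array. One search-area row enters per simulated clock. Two 4-PE adder trees form 4×1 SADs, and per-column-group accumulators play the role of the right-hand registers. A helper then forms the 8×4, 4×8 and 8×8 SADs. Reading the 8×8 unit as one quadrant of the 16×16 macroblock follows from the "4 copies" remark above. All names are hypothetical.

```python
def stream_4x4_sads(cur, cand_rows):
    """cur: 8x8 current-macroblock quadrant held in the PEs.
    cand_rows: the 8 search-area rows (8 pixels each), fed one per clock.
    Returns a 2x2 grid of 4x4 SADs for this candidate position."""
    acc = [0, 0]                      # right-hand accumulator registers
    sad4x4 = [[0, 0], [0, 0]]
    for y, row in enumerate(cand_rows):          # one row per clock cycle
        for g in range(2):                       # two 4-PE adder trees
            sad4x1 = sum(abs(cur[y][4 * g + k] - row[4 * g + k]) for k in range(4))
            acc[g] += sad4x1                     # add to the rows above
        if y % 4 == 3:                           # four 4x1 SADs -> one 4x4 SAD
            for g in range(2):
                sad4x4[y // 4][g] = acc[g]
                acc[g] = 0
    return sad4x4

def combine(s):
    """Extra adder trees: 4x4 SADs -> 8x4 (width 8), 4x8 (height 8), 8x8."""
    return {
        "8x8": s[0][0] + s[0][1] + s[1][0] + s[1][1],
        "8x4": [s[0][0] + s[0][1], s[1][0] + s[1][1]],
        "4x8": [s[0][0] + s[1][0], s[0][1] + s[1][1]],
    }
```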
The most-similar-block searcher of the fine layer works as follows. In each clock it compares the SAD obtained in that clock with the minimum SAD held from the earlier clocks, and keeps the smaller value in the register that stores the minimum SAD. Whenever the SAD from the current clock is smaller, it also updates the motion vector associated with the minimum SAD to the current candidate position. This operation repeats until the search of the fine-layer search area finishes. At that point the minimum SAD and its corresponding motion vector are available.
Fig. 6 h shows the sequential schematic diagram of parallel processor work among the present invention.Mixed_en, HDP_en, SA_data_in_en and IME_en represent the enable signal of parallel work-flow, half pixel interpolation, the input of detailed level motion estimation search zone and 4 modules of detailed level integer picture element movement estimation respectively, on these signals, also identified simultaneously and enabled the effective time, be i.e. CLK0 shown in the figure, CLK216, CLK270 and CLK345.Data_in_row_cnt is the counter output of input integer number of rows of picture elements, and the integer pixel of parallel processor input is the piece of 30*30, so have 30 row, counter is output as 0~29.SA_top_store_en, SA_middle_store_en and SA_bottom_store_en are the enable signal of writing of 4 shift register array groups of the region of search data of storage detailed level integer estimation.Owing to have two M1 and M2 to deposit same data in 4 shift register arrays, control with same control signal SA_middle_store_en.
4. Half-pixel and quarter-pixel motion estimation
Fig. 7 a shows the position relation of integer pixel (black color dots) and half pixel (ABC), 1/4 pixel (1~12).In the present invention, half pixel of diverse location constitutes different regions of search respectively with 1/4 pixel, half picture element movement estimator and 1/4 picture element movement estimator are searched for respectively in these regions of search, seek the most similar, the most similar SAD by more different regions of search determines best similar and corresponding motion vector then.In addition, because in the data input that mainly spends in half pixel and 1/4 pel search zone computing time that half picture element movement is estimated and 1/4 picture element movement is estimated, seek the most similar then simple relatively, so the present invention has adopted a kind of reusable hardware configuration of saving resource to realize that half picture element movement is estimated and the estimation of 1/4 picture element movement.In the H.264/AVC exercise estimator that the present invention proposes, no matter the predictive mode of selecting after the rough layer estimation is any, when half pixel and the estimation of 1/4 picture element movement, be unit all, calculate the similarity of each sub-piece in current sub-block and half pixel or the 1/4 pel search zone with 4*4 piece.Circulate and finish the most similar block search of all 4*4 pieces of current macro for 16 times.Again according to the predictive mode of selecting for use, calculate the most similar and corresponding motion vector of corresponding sub-piece at last.Shown in Fig. 7 a, need half pixel of search that three kinds of A, B, C are arranged, and 1/4 pixel that need search for have 8 kinds in 1~12, i.e. 8 1/4 pixels around half pixel or the integer pixel among Fig. 
7 a.Adopt hardware configuration of the present invention, can save nearly 90% hardware resource than all sub-pieces of while parallel computation current macro to similar method of region of search, then do not increase computing time.
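The following Python sketch shows how the 16 passes of a single 4×4 SAD unit could be combined into a sub-block result for the selected mode. This section does not say how intermediate results are kept. The sketch assumes that the SAD of every 4×4 block is retained for every candidate position. A sub-block's SAD at a candidate is then the sum of its member 4×4 SADs, and the best candidate is the one with the least sum. The names and the data layout are hypothetical.

```python
def subblock_best(sad4x4, members):
    """sad4x4[b][c]: SAD of 4x4 block b (0..15) at candidate c, produced by
    reusing one 4x4 SAD unit 16 times (ASSUMPTION: all values are retained).
    members: indices of the 4x4 blocks making up one sub-block of the mode.
    Returns (best SAD, best candidate index)."""
    n_cand = len(sad4x4[members[0]])
    totals = [sum(sad4x4[b][c] for b in members) for c in range(n_cand)]
    best_c = min(range(n_cand), key=totals.__getitem__)
    return totals[best_c], best_c
```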
Fig. 7 b shows the simple block diagram of half picture element movement estimator among the present invention.This exercise estimator is input as A, B, C three classes half pel search area data, integer picture element movement vector sum SAD, is output as half picture element movement vector sum SAD.This circuit that the present invention adopts comprises: one and half pixel interpolation memory interfaces, and three memories that are used for depositing A, B, C three classes half pel search zone, three are used for carrying out half picture element movement estimator and a most similar selector.Wherein, half pixel interpolation memory interface is used for producing address signal and the read data enable signal of half pixel interpolation memory A, B, C, comprises that simultaneously three and half pixel input buffers are used for keeping in A, B, C three classes half pixel.A half picture element movement estimator comprises a 4*4 piece SAD calculator, is used for calculating the SAD of current 4*4 piece and region of search 4*4 piece.The most similar selector finished following function: contrast the SAD of similar sub-piece that dissimilar regions of search obtain; The minimum SAD contrast of selecting minimum SAD and detailed level search to obtain, similar of littler SAD correspondence for the half picture element movement estimation the most similar.
Fig. 7 c shows the simple block diagram of 1/4 picture element movement estimator among the present invention.This exercise estimator is input as SAD and the motion vector that half picture element movement is estimated, and A, B, C three classes, half pixel and integer pixel I, is output as 1/4 picture element movement vector sum SAD.This circuit that the present invention adopts comprises: one and half pixel interpolation memory interfaces, one 1/4 pixel interpolation computing array, 8 1/4 pel search regional data store, 8 1/4 picture element movement estimators and a most similar selector that comprise a 4*4 piece SAD calculator respectively.Wherein, 1/4 pixel is averaged by two and half adjacent pixels or integer pixel and is obtained, and Fig. 7 d shows the corresponding relation that is generated 1/4 pixel among the present invention by integer pixel and half pixel.In 12 kind of 1/4 pixel shown in Fig. 7 a, 8 kind of 1/4 pixel around the pixel that half pixel that selection half picture element movement estimator obtains is the most similar deposits 8 region of search memories respectively in, and send into 8 1/4 picture element movement estimators, calculate the SAD of this 8 kind of 1/4 pixel block and current 4*4 piece respectively by the 4*4 piece SAD calculator in these 8 1/4 picture element movement estimators.The most similar selector finished following function: select similar of best 1/4 pixel of conduct of SAD minimum in 8 kinds of regions of search, and calculate corresponding motion vector; With the SAD contrast in this SAD and half pel search zone, similar conduct of the SAD correspondence that selection is minimum be final best similar of selecting of exercise estimator system H.264/AVC.
5. Final motion vector calculator
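The H.264/AVC motion estimator system computes the final motion vector as

$$MV_f = 8\,MV_c + 4\,MV_p + 2\,MV_h + MV_q$$

Here the subscript f denotes the final motion vector of the whole motion estimator. The subscripts c, p, h and q denote the motion vectors obtained by the coarse-layer, fine-layer, half-pixel and quarter-pixel motion estimators, respectively. The weights express every term in quarter-pixel units. One coarse-layer step spans two integer pixels of the original frame (eight quarter pixels), one fine-layer step spans one integer pixel (four quarter pixels), and one half-pixel step spans two quarter pixels.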
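The following Python sketch applies the weighted sum component-wise to (x, y) motion vectors. The function name and the tuple representation are assumptions made for illustration.

```python
def final_mv(mv_c, mv_p, mv_h, mv_q):
    """MV_f = 8*MV_c + 4*MV_p + 2*MV_h + MV_q, per component, in quarter pixels."""
    return tuple(8 * c + 4 * p + 2 * h + q
                 for c, p, h, q in zip(mv_c, mv_p, mv_h, mv_q))
```

For example, suppose the coarse layer gives (1, -2), the fine layer (-1, 0), the half-pixel stage (1, 1) and the quarter-pixel stage (0, -1). The final vector is then (6, -15) quarter pixels.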
6. Memories of the H.264/AVC motion estimator system
The current-frame/reference-frame memory is implemented in SRAM and must be able to hold a CIF-format picture (352×288 pixels). The picture data of the current frame or the reference frame are read from it using the address and read signals generated by the system. The current-macroblock memory is a 16×16 register array; each register cell holds 8-bit pixel data and can be read and written individually. The motion estimation result memory stores the optimal prediction mode produced by the H.264/AVC motion estimator system, together with the corresponding final motion vectors and residual data.
Although the present invention has been described with reference to certain preferred embodiments, the scope of the invention is not limited to these specific embodiments. All modifications and variations of the present invention that fall within its spirit and scope are covered, and the scope of the invention is defined by the claims that follow.