CN100471275C - Motion estimating method for H.264/AVC coder - Google Patents

Publication number
CN100471275C
Authority
CN
China
Prior art keywords
pixel
search
data
residual
picture element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200610113030
Other languages
Chinese (zh)
Other versions
CN1933600A (en)
Inventor
罗嵘
杨春雷
杨华中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING HUAXIA DENTSU TECHNOLOGY CO LTD
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN 200610113030
Publication of CN1933600A
Application granted
Publication of CN100471275C

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A motion estimation method for an H.264/AVC encoder divides integer motion estimation into a coarse layer and a fine layer, which are estimated in sequence. The reference-frame data and the integer-pixel motion vector of the coarse layer are input to half-pixel interpolation and fine-layer motion estimation, which run in parallel. Half-pixel motion estimation and quarter-pixel motion estimation are then carried out to calculate the final minimum residual and its corresponding final motion vector, using the motion vector that corresponds to the minimum residual under the optimal prediction mode obtained from coarse-layer motion estimation.

Description

Motion estimation method for an H.264/AVC encoder
Technical field
The present invention relates to a motion estimation method, and in particular to a hardware implementation of a fast motion estimation method for an H.264/AVC encoder.
Background art
The H.264/AVC standard is a video compression standard for low-bit-rate video communication, jointly developed by the Video Coding Experts Group (VCEG) of ITU-T, the Telecommunication Standardization Sector of the International Telecommunication Union, and the Moving Picture Experts Group (MPEG) of ISO, the International Organization for Standardization. Its excellent compression performance extends its usefulness well beyond that application: the standard is expected to play a significant role in digital television broadcasting, real-time video communication, network video streaming and multimedia messaging, and it has attracted wide attention and study in recent years.
The compression capability of the H.264/AVC standard for moving images is due to a large extent to the motion estimation algorithm it adopts. A 16*16 pixel block is defined as a macroblock (MB). During coding, each macroblock can be divided into sub-blocks of 7 different sizes and shapes: 16*16, 16*8, 8*16, 8*8, 8*4, 4*8 or 4*4. Each such way of dividing a macroblock into sub-blocks is called a prediction mode. During coding, some frames of the video frame sequence are selected as reference frames, and the pixel data of these reference frames is coded in full, macroblock by macroblock. The other, non-reference frames are also coded macroblock by macroblock, as follows. First, by searching and comparing, the most similar block in the reference frame is determined for each sub-block of the current macroblock under each prediction mode. This yields the displacement between the position of each sub-block in the current frame and the position of its most similar block in the reference frame, called the motion vector, and the difference between the pixels at the same positions in the current sub-block and its most similar block, called the residual. Finally, only the optimal prediction mode, selected according to the coding cost of each prediction mode (the size of the coded bitstream), and its corresponding motion vectors and residual data need to be coded. Combined with the integer discrete cosine transform and entropy coding, this motion estimation strategy greatly reduces the data volume of moving images. In H.264/AVC motion estimation, the coding cost of all prediction modes must be considered; that is, for each macroblock, the motion vectors and residual data of the 41 sub-blocks corresponding to its 7 prediction modes must be calculated, and the precision of the motion vectors must be 1/4 pixel.
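The figure of 41 sub-blocks follows from tiling one 16*16 macroblock with each of the 7 sub-block sizes. A minimal Python sketch of that arithmetic is given here; the names MODES and sub_blocks_per_mode are illustrative and do not come from the standard or the patent.

```python
# Sub-block sizes (width, height) of the 7 prediction modes listed in the text.
MODES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
MB_SIZE = 16  # a macroblock is 16*16 pixels


def sub_blocks_per_mode(width, height):
    """Number of width*height sub-blocks needed to tile one macroblock."""
    return (MB_SIZE // width) * (MB_SIZE // height)


counts = [sub_blocks_per_mode(w, h) for w, h in MODES]
total_sub_blocks = sum(counts)
print(counts, total_sub_blocks)
```

Each entry of counts is the number of motion vectors one prediction mode contributes, and their sum is the number of sub-blocks the estimator has to evaluate per macroblock.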
The process of searching the reference frame for the most similar block of each sub-block of the current macroblock is called motion estimation. Motion estimation requires that a range be determined in the reference frame within which the most similar block of the current sub-block is searched; this range is called the search region. The most accurate implementation is the full search method, which proceeds in three stages. In integer-pixel motion estimation, the similarity between the current sub-block and each candidate sub-block of the search region is calculated point by point with a step of one pixel (similarity is usually measured by the sum of absolute differences, SAD, of the pixels of the two blocks), and the block with the greatest similarity (minimum SAD) is taken as the most similar sub-block. In half-pixel motion estimation, the 1/2-pixel search region around that most similar sub-block is searched with a step of 1/2 pixel to determine the best 1/2-pixel similar block. In 1/4-pixel motion estimation, the 1/4-pixel search region around the best 1/2-pixel similar block is searched with a step of 1/4 pixel to determine the best 1/4-pixel similar block. Finally, the prediction mode and its corresponding motion vectors and residual data are determined. These stages are carried out sequentially.
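The integer-pixel stage of the full search can be sketched in a few lines of Python. This is a software illustration only, not the hardware described later; the function names sad and full_search and the test data are assumptions made for the example, and the returned position is the offset of the best block inside the search region rather than a displacement relative to the current block.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between block `cur` and the same-sized
    block of `ref` whose top-left corner is at column rx, row ry."""
    return sum(
        abs(cur[y][x] - ref[ry + y][rx + x])
        for y in range(len(cur))
        for x in range(len(cur[0]))
    )


def full_search(cur, ref):
    """Test every integer position of the search region `ref` and return
    (min_sad, (rx, ry)) for the most similar block."""
    h, w = len(cur), len(cur[0])
    best = None
    for ry in range(len(ref) - h + 1):
        for rx in range(len(ref[0]) - w + 1):
            cost = sad(cur, ref, rx, ry)
            # Keep the smaller residual, as the text describes.
            if best is None or cost < best[0]:
                best = (cost, (rx, ry))
    return best


# Illustrative data: an 8*8 search region and the 4*4 block cut from (3, 2).
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [row[3:7] for row in ref[2:6]]
result = full_search(cur, ref)
print(result)
```

The nested loops visit every candidate position, which is why the cost of the full search grows with the area of the search region.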
Although the full search method gives the best motion estimation result, its amount of computation is enormous, it needs frequent access to the memory that stores the reference frame, and it takes a long time to compute. The H.264/AVC standard is applied in fields such as video communication, digital broadcasting and Internet TV streaming, which place high demands on real-time performance; such a large amount of computation and computing time cannot meet this requirement. A practical H.264/AVC encoder therefore needs to adopt a fast motion estimation method.
Summary of the invention
The object of the present invention is to provide a partially parallel motion estimation method implemented in hardware, which can meet the real-time requirement of video communication applications, reduce the number of memory accesses, and raise the operating frequency of the H.264/AVC encoder.
The idea of the method of the present invention is to adopt a hierarchical search strategy for integer-pixel motion estimation in the H.264 motion estimator, so that integer-pixel motion estimation and fractional motion estimation can be executed partly in parallel, which raises the processing speed of motion estimation. Partial parallelism here means the following. Integer motion estimation is divided into two layers, coarse-layer integer-pixel motion estimation and fine-layer integer-pixel motion estimation, which are executed in sequence. Fractional motion estimation is divided into two layers, half-pixel motion estimation and 1/4-pixel motion estimation, which are also executed in sequence. The H.264/AVC motion estimator first performs coarse-layer integer-pixel motion estimation, then performs fine-layer integer-pixel motion estimation in parallel with the half-pixel interpolation calculation of half-pixel motion estimation, and finally performs 1/4-pixel motion estimation.
The idea of the system of the present invention is to adopt different storage strategies according to the size and type of the data processed by the motion estimation method, which reduces the amount of access to external memory and realizes a hardware-based parallel system that performs H.264/AVC motion estimation. The storage strategies are described in detail in the embodiments below.
The invention is a motion estimation method for an H.264/AVC encoder, characterized in that the method is implemented in a digital integrated circuit chip according to the following steps:
Step (1): moving images are input to the current-frame memory frame by frame, and the current-frame memory inputs the current macroblock to the current-macroblock memory macroblock by macroblock; video images conforming to the H.264/AVC video coding standard are input to the reference-frame memory frame by frame;
Step (2): the coarse-layer data preprocessor preprocesses the coarse-layer data according to the following steps:
Step (2.1): the 1st data input buffer reads 32 bits of current-macroblock data from the current-macroblock memory; the 2nd data input buffer reads 32 bits of search-region data from the reference-frame memory;
Step (2.2): each of the two data input buffers of step (2.1) merges 4 consecutively input 32-bit data words into one 128-bit output word: the 128-bit output of the 1st data input buffer is sent to the 1st mean-filter array, and the 128-bit output of the 2nd data input buffer is sent to the 2nd mean-filter array;
Step (2.3): the two mean-filter arrays of step (2.2) simultaneously produce the coarse-layer current-macroblock data and the coarse-layer integer motion estimation search-region data, each output being 64 bits wide;
where each pixel of the coarse-layer integer motion estimation search region is obtained from the following formula:
Pel_c = (Pel_{00} + Pel_{01} + Pel_{10} + Pel_{11}) / 4,
where Pel_c denotes one pixel of the coarse layer; Pel_{00} and Pel_{01} denote the pixels in row 2*n, columns 2*m and 2*m+1, of the original reference-frame search region; Pel_{10} and Pel_{11} denote the pixels in row 2*n+1, columns 2*m and 2*m+1, of the original reference-frame search region; and n, m lie in [0, 23]. This yields the coarse-layer motion estimation search region of 48*48 positions, and the search region of the macroblock is centered on the top-left pixel of the current macroblock;
Step (2.4): the 1st and 2nd mean-filter arrays output their 64-bit data to the mean-filtered current-macroblock data storage array and the search-region shift register array, respectively; each memory in these two arrays stores 8-bit pixel data;
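The mean filtering of step (2.3) amounts to a 2:1 downsampling in each direction. A minimal Python sketch under stated assumptions is given here: the function name mean_filter_2x2 is illustrative, and truncating integer division is assumed because the text does not say how the quotient is rounded.

```python
def mean_filter_2x2(region):
    """Return the coarse-layer image: each output pixel is the mean of the
    2*2 group at rows 2n, 2n+1 and columns 2m, 2m+1 of `region`.
    Truncating division is an assumption of this sketch."""
    rows, cols = len(region) // 2, len(region[0]) // 2
    return [
        [
            (region[2 * n][2 * m] + region[2 * n][2 * m + 1]
             + region[2 * n + 1][2 * m] + region[2 * n + 1][2 * m + 1]) // 4
            for m in range(cols)
        ]
        for n in range(rows)
    ]


# Illustrative 48*48 input, giving n, m in [0, 23] as in the formula.
region = [[(x + y) % 256 for x in range(48)] for y in range(48)]
coarse = mean_filter_2x2(region)
print(len(coarse), len(coarse[0]))
```

The same function can be applied to the 16*16 current macroblock, which is how both inputs of the coarse-layer search end up at the same scale.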
Step (3): the coarse-layer motion estimator performs coarse-layer motion estimation according to the following steps and outputs the prediction mode and the corresponding motion vector:
Step (3.1): the coarse-layer data preprocessor inputs the current-macroblock data and the coarse-layer motion estimation search-region data to the 8 2*2-block residual calculators in the coarse-layer motion estimator; the residual is expressed as the SAD;
Step (3.2): the 8 2*2-block residual calculators use an adder-tree structure to calculate, in one clock cycle, the residuals of all 2*2, 2*4, 4*2, 4*4, 4*8, 8*4 and 8*8 coarse-layer blocks, which correspond to the 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16 blocks of the 7 prediction modes;
Step (3.3): in every clock cycle, the most-similar-block searcher compares the residuals of all sub-blocks under each prediction mode, obtained in the current clock cycle from the 8 2*2-block residual calculators, with the provisional minimum residuals obtained before the current clock cycle, and keeps the smaller residual; if the residual obtained in the current clock cycle is smaller, the motion vector corresponding to the minimum residual is updated according to the position of the current similar block. Steps (3.1) to (3.3) are repeated until the whole coarse-layer search region has been searched, giving the minimum residuals and their corresponding motion vectors;
Step (3.4): after step (3.3) has finished, the minimum residuals of the sub-blocks belonging to each prediction mode are first added to give the residual sum of that prediction mode; the 7 residual sums are then compared, and the prediction mode with the smallest residual sum is selected as the optimal mode;
Step (3.5): according to the optimal prediction mode obtained in step (3.4), the motion vector averager adds the motion vectors of the sub-blocks of that mode and takes their mean, which gives the coarse-layer motion vector;
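Steps (3.2), (3.4) and (3.5) can be illustrated in software as follows. This is a hedged sketch, not the circuit itself: the function names, the mode labels (written width x height at full resolution), the tie-breaking by dictionary order and the use of round() for the mean are all assumptions, since the text does not specify them.

```python
def combine_sads(s):
    """s is a 4*4 grid (list of rows) holding the SADs of the 16 smallest
    blocks for one candidate position. Larger-block SADs are built purely
    by additions, mimicking an adder tree."""
    b8x4 = [s[r][2 * c] + s[r][2 * c + 1] for r in range(4) for c in range(2)]
    b4x8 = [s[2 * r][c] + s[2 * r + 1][c] for r in range(2) for c in range(4)]
    # b8x4 is indexed as 2*row + column; stack two of them to form an 8x8.
    b8x8 = [b8x4[4 * R + C] + b8x4[4 * R + 2 + C]
            for R in range(2) for C in range(2)]
    b16x8 = [b8x8[0] + b8x8[1], b8x8[2] + b8x8[3]]   # top half, bottom half
    b8x16 = [b8x8[0] + b8x8[2], b8x8[1] + b8x8[3]]   # left half, right half
    b16x16 = [b16x8[0] + b16x8[1]]
    b4x4 = [v for row in s for v in row]
    return {"16x16": b16x16, "16x8": b16x8, "8x16": b8x16, "8x8": b8x8,
            "8x4": b8x4, "4x8": b4x8, "4x4": b4x4}


def select_mode(min_sads):
    """Step (3.4): sum the per-sub-block minimum SADs of each mode and pick
    the mode with the smallest sum (ties go to the first mode listed)."""
    sums = {mode: sum(values) for mode, values in min_sads.items()}
    best = min(sums, key=sums.get)
    return best, sums[best]


def average_mv(mvs):
    """Step (3.5): component-wise mean of the sub-block motion vectors."""
    n = len(mvs)
    return (round(sum(mv[0] for mv in mvs) / n),
            round(sum(mv[1] for mv in mvs) / n))


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
sads = combine_sads(grid)
mode, cost = select_mode({"16x16": [100], "16x8": [40, 45], "8x16": [60, 50]})
mv = average_mv([(2, -1), (4, -3)])
print(sads["8x8"], mode, cost, mv)
```

Building every larger SAD from the smallest ones is what allows all 41 residuals of a candidate position to be produced from a single pass over the pixel differences.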
Step (4): a parallel processor performs fine-layer motion estimation and half-pixel interpolation in parallel according to the following steps:
Step (4.1): the coarse-layer motion estimator and the reference-frame memory send the coarse-layer motion vector and the reference-frame data, respectively, to the integer-pixel input buffer;
Step (4.2): the integer-pixel input buffer inputs the 24*24 fine-layer search-region data to the fine-layer search-region shift register array, which in turn sends these data to the fine-layer motion estimator; the fine-layer minimum residual and corresponding motion vector are calculated according to the following steps:
Step (4.2.1): the fine-layer search-region data and the corresponding current-macroblock data are sent to a fine-layer residual calculator, which calculates the residuals of the 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16 blocks:
The residual calculator consists of 4 sub-residual calculators, each of which handles 8*8 residual data elements. In each sub-residual calculator, every 4 residual data of the same row are summed by an adder tree to give the residual of a 4*1 block, and the residuals of 4 4*1 blocks are added to give the residual of a 4*4 block. The residuals of all 4*4 blocks are calculated in one clock cycle, and the residuals of all sub-blocks of the current macroblock are then calculated from them by the adder-tree structure. This is repeated until the whole fine-layer search region has been processed;
Step (4.2.2): in every clock cycle, a similarity comparator compares the residuals obtained in that clock cycle in step (4.2.1) with the provisional minimum residuals obtained before that clock cycle; if the residual obtained in the current clock cycle is smaller, the motion vector corresponding to the minimum residual is updated according to the position of the current similar block. This is repeated until the whole fine-layer search region has been searched, giving the minimum residuals and their corresponding motion vectors;
Step (4.3): at the same time, the integer-pixel input buffer sends the 30*30 half-pixel search-region data to the half-pixel interpolation filter array. The half-pixel interpolation filters are the 6-tap FIR filters specified by the H.264/AVC standard. The array uses 4 horizontal half-pixel interpolation filters and 8 vertical half-pixel interpolation filters: the 4 horizontal filters interpolate from 6 pixels in one row of integer pixels, and the 8 vertical filters interpolate from 6 pixels in one column of integer pixels or one column of half pixels. The interpolation formula is:
Pel_h = round((Pel_0 - 5*Pel_1 + 20*Pel_2 + 20*Pel_3 - 5*Pel_4 + Pel_5) / 32),
where round() denotes rounding, the subscript h denotes the half pixel, and the subscripts 0 to 5 denote the 6 integer pixels or half pixels from which the half pixel is generated.
The half-pixel interpolation filter array outputs the integer pixels through half-pixel interpolation memory I, and at the same time outputs, through the three half-pixel memories A, B and C respectively, the horizontal half pixels, the vertical half pixels, and the half pixels that lie between horizontal and vertical half pixels and are interpolated from half-pixel values;
Step (4.4): a parallel-computation end detector determines whether both the fine-layer motion estimation and the half-pixel interpolation filtering have finished and, if so, issues an end signal;
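The 6-tap interpolation of step (4.3) can be sketched for one row of pixels as follows. The function names are illustrative. Rounding is implemented by adding 16 before the shift by 5, and the result is clipped to the 8-bit range; the clipping is how the H.264/AVC standard treats the filter output and is an assumption here, since the formula in the text shows only the rounding.

```python
def half_pixel(p0, p1, p2, p3, p4, p5):
    """6-tap FIR filter (1, -5, 20, 20, -5, 1) / 32 with rounding.
    The result is the half pixel located between p2 and p3."""
    value = (p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5 + 16) >> 5
    return max(0, min(255, value))  # clip to 8 bits (assumed, see lead-in)


def interpolate_row(row):
    """Half pixels for every position of `row` that has the 6 neighbours
    the filter needs (2 to the left and 3 to the right of sample i + 2)."""
    return [half_pixel(*row[i:i + 6]) for i in range(len(row) - 5)]


ramp = [0, 10, 20, 30, 40, 50, 60]
half_row = interpolate_row(ramp)
print(half_row)
```

Applying the same function down a column of integer pixels gives the vertical half pixels, and applying it to a column of already interpolated half pixels gives the positions between them.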
Step (5): the half-pixel motion estimator and the 1/4-pixel motion estimator perform half-pixel motion estimation and 1/4-pixel motion estimation in sequence according to the following steps:
Step (5.1): the half-pixel motion estimator performs half-pixel motion estimation and outputs the half-pixel minimum residual and the corresponding half-pixel motion vector, in the following order:
Step (5.1.1): the fine-layer motion estimator inputs its minimum residual and corresponding motion vector to the on-chip memory interface in the half-pixel motion estimator; the three half-pixel memories A, B and C input half pixels A, B and C to the same on-chip memory interface;
Step (5.1.2): the on-chip memory interface sends the three classes of half pixels A, B and C through 3 half-pixel input buffers to three half-pixel search-region memories, after which 3 half-pixel motion estimators perform motion estimation on the three classes of half pixels as follows:
First, each half-pixel motion estimator uses a 4*4-block residual calculator to calculate the residual between a 4*4 block of the current macroblock and a 4*4 block of the half-pixel search region;
Then, a most-similar-block selector compares the residuals of the similar sub-blocks obtained from the different search regions, selects the smallest of them and compares it with the minimum residual obtained by the fine-layer search; the similar block with the smaller residual is taken as the most similar block of half-pixel motion estimation, which gives its half-pixel residual and corresponding motion vector;
Step (5.2): the 1/4-pixel motion estimator performs 1/4-pixel motion estimation and obtains the 1/4-pixel minimum residual and corresponding 1/4-pixel motion vector according to the following steps:
Step (5.2.1): the half-pixel minimum residual and corresponding motion vector output by the half-pixel motion estimator, the integer pixels I from half-pixel interpolation memory I, the half pixels A, B and C from half-pixel interpolation memories A, B and C, and the current macroblock from the current-macroblock memory are all input to the 1/4-pixel interpolation memory interface;
Step (5.2.2): the 1/4-pixel interpolation calculator array takes from the 1/4-pixel interpolation memory interface the two half pixels, or the one half pixel and one integer pixel, adjacent to the 1/4 pixel to be calculated, and averages them to obtain that 1/4 pixel. 12 kinds of 1/4 pixel can be obtained in this way, but only the 8 kinds of 1/4 pixel surrounding the pixels of the most similar half-pixel block found by the half-pixel motion estimator are calculated;
Step (5.2.3): the 1/4-pixel interpolation calculator array stores the resulting 1/4 pixels in 8 1/4-pixel search-region memories, one per kind, and sends them to 8 1/4-pixel motion estimators;
Step (5.2.4): each of the 8 1/4-pixel motion estimators uses a 4*4-block residual calculator to calculate the residual between the 1/4-pixel block from its 1/4-pixel search-region memory and the previously stored 4*4 block of the current macroblock;
Step (5.2.5): the most-similar-block selector selects, among the 8 search regions of step (5.2.4), the block with the smallest residual as the best 1/4-pixel similar block and calculates the corresponding motion vector; this residual is then compared with the minimum residual obtained by the half-pixel search, and the similar block with the smaller residual is taken as the best similar block finally selected by the H.264/AVC motion estimator;
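The 1/4-pixel refinement of step (5.2) can be sketched as follows. Two points are assumptions rather than statements of the patent: the rounded average (a + b + 1) >> 1 is the convention of the H.264/AVC standard, whereas the text says only that the two neighbours are averaged, and the 8 kinds of 1/4 pixel are read here as the 8 quarter-pixel offsets surrounding the best half-pixel position. All names are illustrative.

```python
def quarter_pixel(a, b):
    """Rounded average of two neighbouring samples (two half pixels, or one
    half pixel and one integer pixel)."""
    return (a + b + 1) >> 1


# The 8 quarter-pixel offsets (dx, dy) around the best half-pixel position,
# in units of 1/4 pixel; (0, 0) is the half-pixel result itself.
NEIGHBOURS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dx, dy) != (0, 0)]


def refine_quarter(half_sad, quarter_sads):
    """Pick the smallest of the 8 quarter-pixel SADs and keep it only if it
    beats the half-pixel SAD. Returns (sad, offset)."""
    best_offset = min(quarter_sads, key=quarter_sads.get)
    if quarter_sads[best_offset] < half_sad:
        return quarter_sads[best_offset], best_offset
    return half_sad, (0, 0)


# Illustrative SADs: every offset costs 50 except (1, -1), which costs 30.
example_sads = {offset: 50 for offset in NEIGHBOURS}
example_sads[(1, -1)] = 30
improved = refine_quarter(40, example_sads)
unchanged = refine_quarter(20, example_sads)
print(improved, unchanged)
```

Keeping the half-pixel result when no neighbour improves on it mirrors the final comparison of step (5.2.5).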
Step (6): a final motion vector calculator calculates the final motion vector MV_f generated by the H.264/AVC motion estimator and stores it in the motion estimation result memory:
MV_f = 8*MV_c + 4*MV_p + 2*MV_h + MV_q,
where MV_c, MV_p, MV_h and MV_q denote the motion vectors obtained by the coarse-layer, fine-layer, half-pixel and 1/4-pixel motion estimators, respectively.
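The weights in the formula of step (6) are consistent with expressing every stage in units of 1/4 pixel: one coarse-layer step spans 2 original pixels (8 quarter pixels), one fine-layer step 1 pixel (4), and one half-pixel step 2 quarter pixels. That reading is an interpretation, not a statement of the text. A minimal sketch with an illustrative function name and made-up vectors:

```python
def final_motion_vector(mv_c, mv_p, mv_h, mv_q):
    """Combine the (x, y) vectors of the coarse-layer, fine-layer, half-pixel
    and 1/4-pixel stages as MV_f = 8*MV_c + 4*MV_p + 2*MV_h + MV_q."""
    return tuple(
        8 * c + 4 * p + 2 * h + q
        for c, p, h, q in zip(mv_c, mv_p, mv_h, mv_q)
    )


# Hypothetical stage results, chosen only to exercise the formula.
mv_f = final_motion_vector((1, -2), (1, 0), (-1, 1), (0, 1))
print(mv_f)
```

Under the quarter-pixel reading, the x component of this example corresponds to a displacement of 2.5 pixels.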
The H.264/AVC motion estimation method proposed by the present invention greatly increases the speed of motion estimation. Compared with the full search method over the same search region, it saves 64.5% of the computing time. The table below compares the computing time of each computing unit of the proposed H.264/AVC motion estimation system with that of the full search method.
Method          Integer-pixel ME    Half-pixel ME    1/4-pixel ME    Total
Full search     2480                484              168             3132
New method      744                 236              168             1148
Ratio           30.0%               48.8%            100%            35.5%
The integer entries in the table are the numbers of clock cycles the two methods need to complete the same calculation; the percentages are the clock cycles of the proposed motion estimation method as a fraction of those of the full search method.
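The ratios can be recomputed from the clock counts listed in the table. The sketch below uses only those counts; the dictionary keys and the function name are illustrative. The per-stage ratios reproduce the table, while the total ratio computed from the listed counts is about 36.7%, which differs slightly from the 35.5% shown.

```python
# Clock counts copied from the table above.
full_search = {"integer": 2480, "half": 484, "quarter": 168}
new_method = {"integer": 744, "half": 236, "quarter": 168}


def ratio_percent(new, old):
    """Clock count of the new method as a percentage of the full search."""
    return round(100.0 * new / old, 1)


stage_ratios = {
    stage: ratio_percent(new_method[stage], full_search[stage])
    for stage in full_search
}
totals = (sum(full_search.values()), sum(new_method.values()))
total_ratio = ratio_percent(totals[1], totals[0])
print(stage_ratios, totals, total_ratio)
```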
Description of drawings
The accompanying drawings in this specification are provided for illustration only and do not limit the content of the present invention in any way, wherein:
Fig. 1 shows the flow chart of the classical full-search motion estimator;
Fig. 2a shows the working principle of the motion estimator of the method of the invention;
Fig. 2b shows the workflow of the motion estimation method proposed by the invention;
Fig. 2c shows the state transition diagram of the motion estimator system of the invention;
Fig. 3 shows the block diagram of the H.264/AVC motion estimator system proposed by the invention;
Fig. 4a shows the motion vector prediction method adopted by the invention;
Fig. 4b shows a simplified block diagram of the coarse-layer data preprocessor of the invention;
Fig. 4c shows the method of obtaining the coarse-layer integer-pixel motion estimation search region in the invention;
Fig. 5a shows a simplified block diagram of the coarse-layer integer-pixel motion estimator of the invention;
Fig. 5b shows the hardware structure of the SAD calculator of the coarse-layer motion estimator of the invention;
Fig. 5c shows the hardware structure that calculates the SAD of one 2*2 block in the SAD calculator of the coarse-layer motion estimator of the invention;
Fig. 6a shows the relation between the fine-layer integer-pixel motion estimation search region and the integer pixels needed for half-pixel interpolation in the invention;
Fig. 6b shows the flow of the parallel processing of half-pixel interpolation and fine-layer integer-pixel motion estimation in the invention;
Fig. 6c shows a simplified block diagram of the parallel processor of the invention;
Fig. 6d shows the interpolation filter array structure adopted by the invention;
Fig. 6e shows the storage strategy for the fine-layer search-region data in the invention;
Fig. 6f shows a simplified block diagram of the fine-layer integer-pixel motion estimator of the invention;
Fig. 6g shows the hardware structure that calculates the SADs of the 8*8, 8*4, 4*8 and 4*4 blocks in the SAD calculator;
Fig. 6h shows the timing diagram of the parallel computation process of the invention;
Fig. 7a shows the positional relation of integer pixels, half pixels and 1/4 pixels;
Fig. 7b shows a simplified block diagram of the half-pixel motion estimator of the invention;
Fig. 7c shows a simplified block diagram of the 1/4-pixel motion estimator of the invention;
Fig. 7d shows the correspondence by which 1/4 pixels are produced from integer pixels and half pixels in the invention.
Embodiment
A distinguishing feature of the present invention is that it proposes a partially parallel motion estimation method, based on the working principle of H.264/AVC motion estimation, on the data dependence between integer and fractional motion estimation in the motion estimation process, and on the compressed-image quality required by video communication applications. The proposed method allows integer motion estimation and fractional motion estimation to be executed partly in parallel, which increases the speed of motion estimation.
Another feature of the present invention is that it adopts a method of generating motion vector predictors together with a reasonable search range, so that in the motion estimator system adjacent macroblocks can share reference-frame search-region data during integer motion estimation, and the sub-blocks of the current macroblock can share reference-frame search-region data during half-pixel motion estimation, which reduces the number of memory accesses.
A third feature of the present invention is that it adopts a reusable hardware structure that calculates the SAD in both half-pixel and 1/4-pixel motion estimation, which reduces the hardware resources these two stages require.
A fourth feature of the present invention is that it proposes a digital circuit system that implements the parallel motion estimation method proposed by the invention. The system reads data from the current-frame and reference-frame memories and completes motion estimation quickly.
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows the flow chart of the classical full-search motion estimator. Motion estimation in the H.264/AVC standard can be divided into two stages: integer-pixel motion estimation and fractional-pixel motion estimation. Fractional-pixel motion estimation in turn comprises half-pixel motion estimation and 1/4-pixel motion estimation. Motion estimation at each precision level comprises two phases, data preprocessing and most-similar-block search. The data preprocessing phase produces the image data of the search region according to the position of the macroblock or sub-block being processed and the search precision; these data may be read directly from the external memory that stores the reference frame, or may have to be obtained by interpolation after data have been read from the reference-frame memory, but in either case they must be available before the most-similar-block search phase. The most-similar-block search phase searches the search region produced by the data preprocessing phase for the most similar block of the current macroblock or sub-block. Calculating the search region for motion estimation at the current precision requires the result of motion estimation at the coarser precision. Because the search range of integer-pixel motion estimation is large, the resulting motion vectors of the sub-blocks of the current macroblock are also scattered, so the position of the fractional-pixel search region of the current macroblock cannot be estimated in advance. Under this data dependence, full-search motion estimation must be carried out sequentially. The flow chart of the full-search motion estimation method (Fig. 1) is therefore also the state transition diagram of its execution.
Those familiar with the H.264 video compression coding standard will appreciate that if the search region of integer motion estimation is small, the fractional motion estimation carried out on top of it is also confined to a limited range. The half-pixel data within that range can then be calculated in advance to produce the half-pixel search region, which improves the parallelism and speed of the whole motion estimation at the cost of a slight increase in computation and storage. Moreover, in applications such as video communication, the video images being processed do not move fast and do not vary much, and a large part of the image (the background) may not change at all, so the search range of motion estimation need not be large. On this basis, the present invention proposes a partially parallel H.264/AVC motion estimation method. Fig. 2a shows the working principle of the proposed motion estimation method. The method adopts a hierarchical search strategy for integer-pixel motion estimation, dividing it into two layers: coarse-layer motion estimation and fine-layer motion estimation. After coarse-layer motion estimation has finished, the search region of fine-layer motion estimation can be much smaller. Directly calculating the half pixels of the fine-layer search region and its surroundings at this point saves a great deal of time compared with calculating the half-pixel search region of each sub-block separately, from the motion vector of each sub-block, after fine-layer motion estimation has finished, and therefore makes it easier to meet the real-time requirement. In addition, both the fine-layer search and the half-pixel interpolation need to read the original image data of the reference frame, and most of the image data they need is the same. For these reasons, the proposed method performs fine-layer motion estimation and half-pixel interpolation at the same time after the coarse-layer search. That is, after coarse-layer integer motion estimation has finished, fine-layer motion estimation and half-pixel interpolation are executed in parallel (the parallel structure shown in Fig. 2a), and half-pixel motion estimation begins only after both have finished. After half-pixel motion estimation has finished, 1/4-pixel interpolation and motion estimation are carried out in turn. Finally, the final motion vector and the corresponding residual data are calculated from the motion vectors obtained at each precision.
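The two-layer integer search described above can be illustrated with a small software sketch. It is a simplification under stated assumptions: it handles a single block rather than the 41 sub-blocks, it skips the mode decision and motion vector averaging, and the refinement window of plus or minus 1 pixel (parameter radius) is chosen for the example, whereas the method itself uses a 24*24 fine-layer search region. All function names are illustrative.

```python
def downsample(img):
    """2*2 mean filter, as used to build the coarse layer."""
    return [
        [(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
          + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
         for x in range(len(img[0]) // 2)]
        for y in range(len(img) // 2)
    ]


def sad(cur, ref, rx, ry):
    """SAD between `cur` and the block of `ref` with top-left at (rx, ry)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(len(cur)) for x in range(len(cur[0])))


def search(cur, ref, candidates):
    """Best (min_sad, (rx, ry)) among candidates lying fully inside ref."""
    h, w = len(cur), len(cur[0])
    best = None
    for rx, ry in candidates:
        if 0 <= rx <= len(ref[0]) - w and 0 <= ry <= len(ref) - h:
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[0]:
                best = (cost, (rx, ry))
    return best


def hierarchical_search(cur, ref, radius=1):
    """Coarse layer: full search on the downsampled data.
    Fine layer: search only around twice the coarse-layer position."""
    cur_c, ref_c = downsample(cur), downsample(ref)
    coarse_candidates = [(x, y) for y in range(len(ref_c))
                         for x in range(len(ref_c[0]))]
    _, (cx, cy) = search(cur_c, ref_c, coarse_candidates)
    fine_candidates = [(2 * cx + dx, 2 * cy + dy)
                       for dy in range(-radius, radius + 1)
                       for dx in range(-radius, radius + 1)]
    return search(cur, ref, fine_candidates)


# Illustrative data: a 16*16 search region and the 4*4 block cut from (5, 6).
ref = [[x + 10 * y for x in range(16)] for y in range(16)]
cur = [row[5:9] for row in ref[6:10]]
result = hierarchical_search(cur, ref)
print(result)
```

In this example the coarse layer evaluates 49 positions on a quarter of the pixels and the fine layer 9 positions, instead of the 169 full-resolution positions a full search of the same region would visit.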
Fig. 2b shows the flow chart of the proposed motion estimation method, which carries out the following steps in order:
1. Initialize the parameters of the motion estimation system according to the characteristics of the images to be processed, including the frame width and height and the state of each module of the motion estimation system;
2. Input the current macroblock data and the corresponding rough-layer motion estimation search region data, and compute the rough-layer data (the rough-layer preprocessed image data) by mean filtering;
3. Perform rough-layer motion estimation to obtain the prediction mode of the current macroblock and a motion vector of 4-pixel accuracy; this vector is the average of the motion vectors of the sub-blocks of the current macroblock;
4. Execute fine-layer motion estimation and half-pixel interpolation in parallel to obtain the integer-pixel-accuracy motion vector and the half-pixel search region data;
5. Perform half-pixel motion estimation to obtain the half-pixel-accuracy motion vector;
6. Perform 1/4-pixel interpolation and motion estimation to obtain the 1/4-pixel-accuracy motion vector;
7. Compute the final 1/4-pixel-accuracy motion vector and the corresponding residual data from the motion estimation results at each of the above precisions;
8. Store the result of step 7 in memory.
The parallel execution module works as follows:
A. While the reference-frame integer pixel data is being input, half-pixel interpolation is computed.
B. At the same time, each integer pixel is checked to see whether it belongs to the search region of the fine-layer motion estimation. If not, nothing is done; if so, the data is stored in a shift register. Once all search region data has been input, fine-layer motion estimation begins.
C. When both fine-layer motion estimation and half-pixel interpolation have finished, the operation of the whole parallel structure is complete.
As shown in Fig. 2a, the H.264/AVC encoder system as a whole executes sequentially, but the "fine-layer integer-pixel motion estimation" unit and the "half-pixel interpolation" unit run in parallel on the basis of the "integer pixel input" unit. Merging these three modules into a single state yields the state transition diagram of the hardware implementation of the proposed motion estimator system, shown in Fig. 2c. In the figure, when ME_reset is high the whole motion estimator is cleared and does not work. When ME_reset is low, Int_Pred_en being high indicates that the encoder has selected inter-frame prediction mode. When CMB_start is high, the current macroblock is to be processed and the motion estimator moves from the idle state to the working state. Data_Pretreat_fin being high indicates that the data needed for rough-layer motion estimation is ready and the search for the best-matching block in the rough layer can begin. Int_4_Pred_fin being high indicates that rough-layer motion estimation has finished and the parallel computation begins. Once integer pixel input, half-pixel interpolation and fine-layer motion estimation have all completed, Mixed_Operation_fin goes high and the motion estimator begins fractional-pixel motion estimation. HME_fin being high indicates that half-pixel motion estimation is complete; Quat_ME_fin being high indicates that 1/4-pixel motion estimation and the final motion vector calculation are complete. When 1/4-pixel motion estimation ends, if Int_Pred_en and CMB_start are both high, motion estimation continues with the next macroblock; otherwise the whole motion estimator returns to the idle state.
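The state transitions described above can be sketched in software as follows. This is only an illustrative model, not the hardware of Fig. 2c: the signal names are taken from the description, while the state names (IDLE, PRETREAT, and so on) and the function next_state are assumptions introduced for the example.

```python
from enum import Enum, auto

class State(Enum):          # state names are hypothetical labels
    IDLE = auto()
    PRETREAT = auto()       # rough-layer data preprocessing
    COARSE_ME = auto()      # rough-layer motion estimation
    MIXED = auto()          # pixel input + half-pixel interpolation + fine-layer ME
    HALF_ME = auto()
    QUARTER_ME = auto()

def next_state(state, sig):
    """Return the next state; sig maps signal names to booleans (missing = low)."""
    high = lambda name: bool(sig.get(name, False))
    if high("ME_reset"):                      # reset clears the whole estimator
        return State.IDLE
    start = high("Int_Pred_en") and high("CMB_start")
    if state is State.IDLE:
        return State.PRETREAT if start else State.IDLE
    if state is State.PRETREAT:
        return State.COARSE_ME if high("Data_Pretreat_fin") else state
    if state is State.COARSE_ME:
        return State.MIXED if high("Int_4_Pred_fin") else state
    if state is State.MIXED:
        return State.HALF_ME if high("Mixed_Operation_fin") else state
    if state is State.HALF_ME:
        return State.QUARTER_ME if high("HME_fin") else state
    if state is State.QUARTER_ME:
        if not high("Quat_ME_fin"):
            return state
        # continue with the next macroblock, or fall back to idle
        return State.PRETREAT if start else State.IDLE
    raise ValueError(state)
```

Merging the three parallel units into the single MIXED state mirrors the description: the estimator leaves that state only when Mixed_Operation_fin reports that all three have completed.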
Fig. 3 shows the block diagram of the proposed H.264/AVC motion estimator system. The rough-layer data preprocessor, rough-layer motion estimator, parallel-structure integer pixel data input structure, fine-layer motion estimator, half-pixel interpolation array, half-pixel motion estimator, 1/4-pixel motion estimator and final motion vector calculator are the data processing units of the system. The current-frame/reference-frame memory, current macroblock memory, rough-layer preprocessing result memory, fine-layer motion estimation search region memory, half-pixel interpolation memory, half-pixel motion estimation search region memory and motion estimation result memory are its data storage units. The hardware implementation of these data processing and storage units is described in detail below:
1. Rough-layer data preprocessor
In video compression, the motion vector of the current macroblock is usually predicted from the motion vectors of its neighbouring blocks; the predicted motion vector is then taken as the centre of the search region, and the image data of that search region in the reference frame is input to search for the best-matching block. This approach performs motion vector prediction for every macroblock of the current frame, which increases the amount of computation. Moreover, because the search region centres of the macroblocks all differ and are distributed irregularly and unpredictably, the search region data of each macroblock must be input separately, which increases the number of memory accesses. For rough-layer integer motion estimation, the present invention instead uses (0, 0) as the predicted motion vector of every sub-block of the current macroblock; that is, the coordinates of the current macroblock (the position of its top-left pixel in the current frame) serve as the centre of its search region, which overcomes the drawbacks of the conventional motion vector prediction method described above. With this approach, the search region data of all sub-blocks within a macroblock can be read in together, and the overlapping parts need not be input repeatedly; in addition, most of the search region data is shared between adjacent macroblocks, which reduces memory accesses. Fig. 4a shows the motion vector prediction method adopted by the invention, taking a 48*48 search region as an example. As shown in Fig. 4a, when macroblock 1 (MB1) is processed, 9 macroblocks of the reference frame make up the search region; after a simple shift, 6 of them can be reused when MB2 is processed, which saves 2/3 of the memory accesses.
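The 2/3 saving can be checked with a small sketch. It assumes, as in the 48*48 example, that the search region covers the 3*3 neighbourhood of macroblocks around the current one; the function names and the (column, row) macroblock coordinates are illustrative only.

```python
def window_macroblocks(mb_x, mb_y):
    """Reference-frame macroblocks covered by a 48*48 region centred on (mb_x, mb_y)."""
    return {(mb_x + dx, mb_y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def reuse_stats(prev_mb, cur_mb):
    """Count macroblocks kept from the previous window and macroblocks newly loaded."""
    prev, cur = window_macroblocks(*prev_mb), window_macroblocks(*cur_mb)
    return len(prev & cur), len(cur - prev)

# Moving from MB1 to its right-hand neighbour MB2:
reused, loaded = reuse_stats((1, 1), (2, 1))   # 6 reused, 3 newly loaded
saving = reused / (reused + loaded)            # fraction of accesses avoided: 2/3
```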
Fig. 4b shows a simplified block diagram of the rough-layer data preprocessor. The preprocessor comprises two data input buffers, two mean filter arrays, two search region shift register arrays A and B, and a data memory for the mean-filtered current macroblock. The 32-bit current macroblock data and the 32-bit search region data are read from the current frame memory and the reference frame memory respectively; each data input buffer merges four consecutively input 32-bit words into one 128-bit word and outputs it to a mean filter array. The mean filter arrays produce the rough-layer integer motion estimation search region data and the rough-layer current macroblock data, each as a 64-bit output. The compressed current macroblock is stored in the data memory for the mean-filtered current macroblock, and the rough-layer search region data is split into two parts stored in search region shift register arrays A and B. Arrays A and B are shift register arrays of 16 rows by 24 columns and 8 rows by 24 columns respectively, in which each register holds one 8-bit pixel. The data memory for the mean-filtered current macroblock is an 8*8 register array in which each register holds one 8-bit pixel and can be read directly. Together these memories make up the rough-layer preprocessing result memory shown in Fig. 3.
Fig. 4c shows how the rough-layer integer motion estimation search region is obtained in the present invention: the 4 original pixels of each 2*2 block in the reference frame are mean-filtered to produce one rough-layer search region pixel. In the present invention the search region for rough-layer motion estimation is 48*48, and the block mean filter is computed as
Pel_c = (Pel_{00} + Pel_{01} + Pel_{10} + Pel_{11}) / 4
where the subscript c denotes the generated rough-layer pixel; 00 and 01 denote pixels 2*m and 2*m+1 in row 2*n of the original reference frame search region; and 10 and 11 denote pixels 2*m and 2*m+1 in row 2*n+1 of the original reference frame search region (n and m lie in [0, 23]).
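The mean filter above can be sketched as a 2*2 block average. The function name coarse_layer is hypothetical, and truncating integer division is an assumption, since the description does not say how the division by 4 is rounded.

```python
def coarse_layer(region):
    """Average each 2*2 block of `region` (a list of equal-length rows of ints).

    A 48*48 search region yields a 24*24 rough-layer region,
    with n and m running over 0..23 as in the formula.
    """
    rows, cols = len(region), len(region[0])
    return [
        [
            (region[2 * n][2 * m] + region[2 * n][2 * m + 1]
             + region[2 * n + 1][2 * m] + region[2 * n + 1][2 * m + 1]) // 4  # truncation assumed
            for m in range(cols // 2)
        ]
        for n in range(rows // 2)
    ]
```

Applied to a 16*16 current macroblock, the same function gives the 8*8 rough-layer macroblock held in the 8*8 register array.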
2. Rough-layer motion estimator
As noted above, motion estimation in the H.264 video compression standard has 7 prediction modes; the one with the lowest coding cost is called the optimal prediction mode. The optimal prediction mode is usually determined from the coding cost when motion estimation ends, using the cost function
J_{mode}(Mode | λ_{mode}) = SAD(Mode) + λ_{mode} * R(Mode)
where SAD(Mode) and R(Mode) are the coding costs of the image residual data and of the other information (including the motion vectors) respectively, and λ_{mode} is the Lagrange multiplier. Those familiar with video compression know that the coding cost of the residual data dominates the overall coding cost, so the coding cost of the other information is usually ignored. However, waiting until 1/4-pixel motion estimation has finished before selecting the optimal prediction mode from SAD(Mode) would require a very large amount of computation. In the present invention the optimal prediction mode is therefore determined as soon as rough-layer integer motion estimation finishes, which greatly reduces the computation of H.264/AVC motion estimation. Rough-layer motion estimation also yields the motion vector of each sub-block of the current macroblock. If these motion vectors were used as search region centres in fine-layer motion estimation, the search region data of each sub-block would have to be input separately, and much of that data would in fact be repeated across sub-blocks, increasing the number of accesses to the reference frame memory. The present invention therefore averages the motion vectors of the sub-blocks of the current macroblock and uses the result as the search region centre for fine-layer integer motion estimation of the whole macroblock. The search regions of all sub-blocks can then be read in together during fine-layer motion estimation, which saves reference frame memory access time.
Fig. 5a shows a simplified block diagram of the rough-layer motion estimator. Its inputs are the preprocessed current macroblock and search region data, and its outputs are the average of the sub-block motion vectors of the current macroblock and the prediction mode. The motion estimator comprises a SAD calculator, a best-match block searcher, a prediction mode selector and a motion vector averager that computes the average of the sub-block motion vectors of the current macroblock.
Fig. 5b shows the hardware structure of the SAD calculator for rough-layer motion estimation. In the figure, SAD_n_m_l denotes a computed SAD, where n and m are the numbers of columns and rows of the pixel block to which the SAD corresponds, and l is the index of the SAD among those with the same numbers of rows and columns. Before rough-layer motion estimation, each circle in the figure (called a PE) holds one rough-layer current macroblock pixel. Once rough-layer motion estimation begins, one row of 8 search region pixels is input every clock cycle and passed to the 8 PEs of the corresponding columns, each of which computes the absolute difference between its current macroblock pixel and the search region pixel. Each dashed rectangle enclosing 2 PEs of the same row is an adder tree that produces the SAD of a 2*1 block. The result is sent to the small rectangle on its right, which adds the 2*1 SAD of its own row to the 2*1 SAD of the row above and stores the result, and also provides delay and synchronization. Every two 2*1 SADs added together give the SAD of one 2*2 block.
Fig. 5c shows the hardware structure that computes one 2*2 block SAD in the SAD calculator of the rough-layer motion estimator. Pel_c and Pel_r denote a pixel of the current macroblock and of the search region respectively; they are subtracted in a PE and the absolute value is taken. Adding the results of two PEs gives a 2*1 SAD, which is stored in the memory on the right; adding the 2*1 SADs held in two memories gives a 2*2 SAD. With this structure, the SADs of all 2*2 blocks can be computed in one cycle. With the adder tree below the PE array, the SADs of all 2*2, 2*4, 4*2, 4*4, 4*8, 8*4 and 8*8 blocks of the rough-layer current macroblock can be computed. When the next clock arrives, a new row of search region data is input and the process repeats. This hardware structure can thus compute, in one clock, the SADs of all 2*2, 2*4, 4*2, 4*4, 4*8, 8*4 and 8*8 blocks, which correspond to the 7 prediction modes 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16.
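The adder-tree idea can be illustrated in software: compute the SADs of the 2*2 blocks once, then obtain every larger block size by summing neighbouring entries. This is a behavioural sketch only, not a model of the PE array timing. The function names are hypothetical, and keying the result by (rows, columns) is an assumption, since the description does not say which dimension comes first in sizes such as 2*4.

```python
def sad_2x2(cur, ref):
    """4*4 grid of 2*2-block SADs for two 8*8 rough-layer blocks."""
    return [
        [
            sum(abs(cur[r][c] - ref[r][c])
                for r in (2 * i, 2 * i + 1) for c in (2 * j, 2 * j + 1))
            for j in range(4)
        ]
        for i in range(4)
    ]

def merge(grid, dh, dw):
    """One adder-tree level: sum each dh*dw group of neighbouring entries."""
    return [
        [
            sum(grid[i * dh + a][j * dw + b] for a in range(dh) for b in range(dw))
            for j in range(len(grid[0]) // dw)
        ]
        for i in range(len(grid) // dh)
    ]

def all_sads(cur, ref):
    """SADs for every rough-layer block size, keyed by (rows, columns)."""
    s22 = sad_2x2(cur, ref)
    s44 = merge(s22, 2, 2)
    return {
        (2, 2): s22,
        (2, 4): merge(s22, 1, 2),
        (4, 2): merge(s22, 2, 1),
        (4, 4): s44,
        (4, 8): merge(s44, 1, 2),
        (8, 4): merge(s44, 2, 1),
        (8, 8): merge(s44, 2, 2),
    }
```

Because every larger SAD is a sum of already computed 2*2 SADs, no absolute difference is evaluated twice, which is the point of sharing one PE array among the 7 prediction modes.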
The best-match block searcher works as follows. At each clock it compares the SADs of all sub-blocks obtained at the current clock with the minimum SADs obtained so far, and keeps the smaller SAD in the minimum-SAD register. If the SAD obtained at the current clock is smaller, it updates the motion vector corresponding to the minimum SAD from the position of the current candidate block. This is repeated until the rough-layer search region has been fully searched, at which point the minimum SADs and their motion vectors have been obtained.
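For a single sub-block, the searcher's behaviour amounts to a running minimum over the candidate positions. The sketch below assumes the candidates arrive as (motion vector, SAD) pairs in search order; the function name and this data layout are illustrative.

```python
def best_match(candidates):
    """Track the minimum SAD and its motion vector over candidates in search order."""
    best_mv, best_sad = None, None
    for mv, sad in candidates:
        # update only when the new SAD is strictly smaller, as in the description
        if best_sad is None or sad < best_sad:
            best_mv, best_sad = mv, sad
    return best_mv, best_sad
```

With the strict comparison, the earliest candidate wins when two positions have equal SADs.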
The prediction mode selector starts once the best-match block searcher has finished. It first sums the minimum SADs of the sub-blocks for each of the 7 modes 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16, giving one SAD sum per mode, and then compares the 7 sums and selects the mode with the smallest sum as the optimal prediction mode. The motion vector averager takes the optimal prediction mode chosen by the selector, adds up the motion vectors of that mode's sub-blocks and takes their average, giving the motion vector of rough-layer motion estimation.
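The selector and averager can be sketched as below. The function names and the dictionary layout are assumptions; rounding the averaged components with Python's round() is also an assumption, since the description only says that the average is taken.

```python
def select_mode(min_sads):
    """min_sads maps a mode name to the list of minimum SADs of its sub-blocks."""
    totals = {mode: sum(sads) for mode, sads in min_sads.items()}
    return min(totals, key=totals.get)      # mode with the smallest SAD sum

def average_mv(mvs):
    """Average the sub-block motion vectors (x, y) of the selected mode."""
    n = len(mvs)
    return (round(sum(x for x, _ in mvs) / n),   # rounding rule assumed
            round(sum(y for _, y in mvs) / n))

sads = {"16*16": [100], "8*8": [20, 20, 20, 20]}
mode = select_mode(sads)                             # "8*8": sum 80 beats 100
centre = average_mv([(2, 0), (4, 2), (2, 0), (4, 2)])  # (3, 1)
```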
3. Parallel processor for fine-layer motion estimation and half-pixel interpolation
Fig. 6a shows the relationship between the fine-layer motion estimation search region and the integer pixels required for half-pixel interpolation. The black dots represent the 16*16 block at the centre of the fine-layer search region for the current macroblock; together with the grey dots they form the 24*24 fine-layer search region, within which the best-matching block of every sub-block of the current macroblock must lie. The search region for half-pixel motion estimation is determined by the motion vector obtained from fine-layer integer motion estimation, and the 6-tap finite impulse response (FIR) digital filter used to compute a half pixel needs 3 integer pixels on each side of it. The integer pixels needed to compute the half-pixel search region image data are therefore the 30*30 pixel block centred on the fine-layer search region (made up of the black, grey and white pixels). The fine-layer search region overlaps to a large extent with the integer pixels required for half-pixel interpolation, which is the key premise of the parallel-structure motion estimation method proposed by the invention.
Fig. 6b shows the parallel processing flow of half-pixel interpolation and fine-layer motion estimation. After rough-layer motion estimation finishes, the 30*30 block of raw reference frame data selected by the resulting motion vector is input for half-pixel interpolation, and the search region data for fine-layer motion estimation is saved at the same time. Input of the fine-layer search region data thus proceeds in parallel with half-pixel interpolation; once all the fine-layer search region data has been input, fine-layer motion estimation proceeds in parallel with half-pixel interpolation. When the decision module determines that both have finished, half-pixel motion estimation begins. According to the H.264 video compression standard, half-pixel and 1/4-pixel motion estimation use 26*26 half-pixel blocks and integer pixel blocks, so the 26*26 half pixels produced by the interpolation and the corresponding integer pixels are stored in the half-pixel interpolation memory for use in half-pixel and 1/4-pixel motion estimation.
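The shared input path can be sketched as follows: every row of the 30*30 block feeds the interpolation, while the inner 24*24 part is also kept for the fine-layer search. The sizes come from the description; the function name, the constants and the row-by-row list representation are assumptions, and the sketch ignores the hardware timing.

```python
MARGIN = 3                  # integer pixels needed on each side by the 6-tap filter
FINE = 24                   # fine-layer search region is 24*24
FULL = FINE + 2 * MARGIN    # 30*30 block read from the reference frame

def route_rows(block):
    """Send all rows to interpolation; keep the inner 24*24 for the search region."""
    interp_rows, search_rows = [], []
    for r, row in enumerate(block):
        interp_rows.append(row)                     # interpolation sees every row
        if MARGIN <= r < MARGIN + FINE:             # row lies inside the search region
            search_rows.append(row[MARGIN:MARGIN + FINE])
    return interp_rows, search_rows
```

Each reference pixel is read once and used by both consumers, which is where the saving in memory accesses comes from.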
Fig. 6c shows a simplified block diagram of the parallel processor. The parallel processor comprises an integer pixel input buffer, a half-pixel interpolation filter array, half-pixel data memories I, A, B and C, a fine-layer motion estimation search region memory, a fine-layer motion estimator and a parallel-computation completion decision unit. The integer pixel input buffer converts the input 32-bit pixel data into 240-bit pixel data and outputs it to the half-pixel interpolation filter array and the fine-layer motion estimation search region memory.
Fig. 6d shows the interpolation filter array structure adopted by the invention. The structure uses 4 horizontal filters and 8 vertical filters; a horizontal filter interpolates from 6 pixels in one row of integer pixels, and a vertical filter interpolates from 6 pixels in one column of integer pixels. The circles, triangles and five-pointed stars in the figure are three different classes of half pixel, stored in half-pixel data memories C, B and A respectively; the integer pixels, represented by squares, are stored in half-pixel data memory I. That is, all pixels enclosed by the dashed box in the figure are stored in the half-pixel data memories. The invention uses the 6-tap FIR filter specified by the H.264/AVC standard, with the interpolation formula
Pel_h = round((Pel_0 - 5*Pel_1 + 20*Pel_2 + 20*Pel_3 - 5*Pel_4 + Pel_5) / 32)
where round() denotes rounding, the subscript h denotes the half pixel, and the subscripts 0 to 5 denote the 6 integer pixels or half pixels from which the half pixel is produced.
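A minimal sketch of the interpolation formula for one half pixel follows. Rounding is implemented by adding 16 before the shift by 5, and the result is clipped to the 8-bit range as in H.264 practice; the clipping is not stated in the description and is an assumption here, as is the function name.

```python
def half_pel(p):
    """Half pixel from 6 neighbouring pixels p[0..5] using taps (1, -5, 20, 20, -5, 1)."""
    if len(p) != 6:
        raise ValueError("exactly 6 input pixels are required")
    acc = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]
    value = (acc + 16) >> 5            # divide by 32 with rounding
    return min(255, max(0, value))     # clip to 8 bits (assumed, see lead-in)
```

The taps sum to 32, so a flat area is reproduced unchanged: six pixels of value 10 give 320, and (320 + 16) >> 5 is 10.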
Fig. 6e shows the storage strategy for the fine-layer search region data. The fine-layer search region is a 24*24 block whose data is divided into 3 equal pieces (8*24), Top, Middle and Bottom, and stored in 4 shift register arrays T, M1, M2 and B; the Middle piece is stored in both M1 and M2. T, M1, M2 and B together make up the fine-layer motion estimation search region memory shown in Fig. 3 and Fig. 6c.
Fig. 6f shows a simplified block diagram of the fine-layer motion estimator, which comprises a SAD calculator and a best-match block searcher. Fig. 6g shows the hardware structure that computes the SADs of the 8*8, 8*4, 4*8 and 4*4 blocks in the SAD calculator. In the figure, SAD_n_m_l denotes a computed SAD, where n and m are the numbers of columns and rows of the pixel block to which the SAD corresponds, and l is the index of the SAD among those with the same numbers of rows and columns. Before fine-layer motion estimation, each circle in the figure (called a PE) holds one current macroblock pixel. Once fine-layer motion estimation begins, one row of 8 search region pixels is input every clock cycle and passed to the 8 PEs of the corresponding columns, each of which computes the absolute difference between its current macroblock pixel and the search region pixel. Each dashed rectangle enclosing 4 PEs of the same row is an adder tree that produces the SAD of a 4*1 block. The result is sent to the small rectangle on its right, which adds the 4*1 SAD of its own row to the 4*1 SAD of the row above and stores the result, and also provides delay and synchronization. Every four 4*1 SADs added together give the SAD of one 4*4 block. With this structure, the SADs of all 4*4 blocks can be computed in one cycle; with suitable adder trees added, the SADs of all sub-blocks of the current macroblock can be computed. When the next clock arrives, a new row of search region data is input and the process repeats. Combined with the shift register array strategy described above, this hardware structure can produce the SAD of every sub-block of the optimal prediction mode in one clock. The complete SAD calculator consists of 4 of the structures shown in Fig. 6g plus additional adder trees.
The best-match block searcher works as follows. At each clock it compares the SAD obtained at the current clock with the minimum SAD obtained so far, and keeps the smaller SAD in the minimum-SAD register. If the SAD obtained at the current clock is smaller, it updates the motion vector corresponding to the minimum SAD from the position of the current candidate block. This is repeated until the search region has been fully searched, at which point the minimum SAD and its motion vector have been obtained.
Fig. 6h shows the timing diagram of the parallel processor. Mixed_en, HDP_en, SA_data_in_en and IME_en are the enable signals of four modules: the parallel operation, half-pixel interpolation, fine-layer search region input and fine-layer integer-pixel motion estimation, respectively. The figure also marks the times at which these enables are active: CLK0, CLK216, CLK270 and CLK345. Data_in_row_cnt is the output of the counter of input integer pixel rows; the integer pixels input to the parallel processor form a 30*30 block, so there are 30 rows and the counter runs from 0 to 29. SA_top_store_en, SA_middle_store_en and SA_bottom_store_en are the write enable signals of the 4 shift register arrays that store the search region data for fine-layer integer motion estimation. Because two of the 4 arrays, M1 and M2, store the same data, they are controlled by the same signal, SA_middle_store_en.
4. Half-pixel motion estimation and 1/4-pixel motion estimation
Fig. 7a shows the positional relationship between integer pixels (black dots), half pixels (A, B, C) and 1/4 pixels (1 to 12). In the present invention, the half pixels and 1/4 pixels at different positions form different search regions. The half-pixel and 1/4-pixel motion estimators search each of these regions for the best-matching block, and then determine the overall best-matching block and its motion vector by comparing the best SADs of the different regions. Furthermore, the computation time of half-pixel and 1/4-pixel motion estimation is spent mainly on inputting the half-pixel and 1/4-pixel search region data, whereas finding the best-matching block is comparatively simple; the invention therefore uses a resource-saving, reusable hardware structure for half-pixel and 1/4-pixel motion estimation. In the proposed H.264/AVC motion estimator, whichever prediction mode was selected after rough-layer motion estimation, half-pixel and 1/4-pixel motion estimation work in units of 4*4 blocks, computing the similarity between the current sub-block and each sub-block in the half-pixel or 1/4-pixel search region. The best-match search for all 4*4 blocks of the current macroblock is completed in 16 iterations. Finally, the best-matching block and motion vector of each sub-block are computed according to the selected prediction mode. As shown in Fig. 7a, three kinds of half pixel (A, B, C) need to be searched, and 8 of the 1/4 pixels 1 to 12 need to be searched, namely the 8 1/4 pixels surrounding a half pixel or integer pixel in Fig. 7a. Compared with computing the matches of all sub-blocks of the current macroblock against the search region in parallel, the hardware structure of the invention saves nearly 90% of the hardware resources without increasing the computation time.
Fig. 7b shows a simplified block diagram of the half-pixel motion estimator. Its inputs are the three classes of half-pixel search region data A, B and C together with the integer-pixel motion vector and SAD; its outputs are the half-pixel motion vector and SAD. The circuit comprises a half-pixel interpolation memory interface, three memories holding the A, B and C half-pixel search regions, three half-pixel motion estimators and a best-match selector. The half-pixel interpolation memory interface generates the address signals and read enable signals of half-pixel interpolation memories A, B and C, and also contains three half-pixel input buffers that temporarily hold the A, B and C half pixels. Each half-pixel motion estimator contains a 4*4 block SAD calculator that computes the SAD between the current 4*4 block and a 4*4 block of the search region. The best-match selector compares the SADs of the best-matching sub-blocks from the different classes of search region, selects the minimum, and compares it with the minimum SAD from the fine-layer search; the block with the smaller SAD is the best-matching block of half-pixel motion estimation.
Fig. 7c shows a simplified block diagram of the 1/4-pixel motion estimator. Its inputs are the SAD and motion vector from half-pixel motion estimation, the three classes of half pixel A, B and C, and the integer pixels I; its outputs are the 1/4-pixel motion vector and SAD. The circuit comprises a half-pixel interpolation memory interface, a 1/4-pixel interpolation array, 8 1/4-pixel search region data memories, 8 1/4-pixel motion estimators each containing a 4*4 block SAD calculator, and a best-match selector. A 1/4 pixel is obtained by averaging two adjacent half pixels or integer pixels; Fig. 7d shows how 1/4 pixels are generated from integer pixels and half pixels. Of the 12 kinds of 1/4 pixel shown in Fig. 7a, the 8 kinds surrounding the pixels of the best-matching block found by the half-pixel motion estimator are stored in the 8 search region memories and fed to the 8 1/4-pixel motion estimators, whose 4*4 block SAD calculators compute the SAD between each of the 8 kinds of 1/4-pixel block and the current 4*4 block. The best-match selector selects the block with the smallest SAD among the 8 search regions as the best 1/4-pixel match and computes its motion vector; it then compares this SAD with the SAD from the half-pixel search, and the block with the smaller SAD becomes the best-matching block finally selected by the H.264/AVC motion estimator system.
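The two operations of this stage can be sketched briefly. Averaging with upward rounding, (a + b + 1) >> 1, follows H.264 practice and is an assumption here, since the description only says the two neighbours are averaged. The function names and the (motion vector, SAD) tuples are illustrative, and keeping the half-pixel result on a tie is also an assumption.

```python
def quarter_pel(a, b):
    """1/4 pixel as the rounded average of two adjacent half or integer pixels."""
    return (a + b + 1) >> 1

def pick_best(half_result, quarter_candidates):
    """Choose between the half-pixel result and the best of the 8 quarter-pixel candidates.

    half_result and each candidate are (motion_vector, sad) tuples.
    """
    best_quarter = min(quarter_candidates, key=lambda cand: cand[1])
    # the half-pixel result is kept unless a quarter-pixel block is strictly better
    return best_quarter if best_quarter[1] < half_result[1] else half_result
```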
5. Final motion vector calculator
The H.264/AVC motion estimator system computes the final motion vector as follows:
MV_f = 8*MV_c + 4*MV_p + 2*MV_h + MV_q
where the subscripts f, c, p, h and q denote, respectively, the final motion vector of the whole motion estimator and the motion vectors obtained by the rough-layer, fine-layer, half-pixel and 1/4-pixel motion estimators.
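As a worked example of the formula, the sketch below combines the four per-stage vectors component by component. The weights 8, 4, 2 and 1 come from the formula; the function name and the (x, y) tuple representation are assumptions.

```python
def final_mv(mv_c, mv_p, mv_h, mv_q):
    """Combine rough-layer, fine-layer, half-pixel and 1/4-pixel vectors."""
    return tuple(
        8 * c + 4 * p + 2 * h + q
        for c, p, h, q in zip(mv_c, mv_p, mv_h, mv_q)
    )

# x: 8*1 + 4*2 + 2*1 - 1 = 17, y: 8*(-1) + 4*0 + 2*1 + 0 = -6
mv = final_mv((1, -1), (2, 0), (1, 1), (-1, 0))
```

The weights are consistent with expressing the result in 1/4-pixel units: one rough-layer step spans 2 original pixels (8 quarter pixels), one fine-layer step 4, and one half-pixel step 2.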
6. Memories of the H.264/AVC motion estimator system
The current-frame/reference-frame memory is implemented in SRAM and must be able to hold a CIF-format image (352*288 pixels); the image data of the current frame or reference frame is read from it using address and read signals generated by the system. The current macroblock memory is a 16*16 register array in which each register cell holds one 8-bit pixel and can be read and written individually. The motion estimation result memory stores the optimal prediction mode produced by the H.264/AVC motion estimator system together with the corresponding final motion vector and residual data.
Although the present invention has been described with reference to certain preferred embodiments, it should be borne in mind that the scope of the invention is not limited to these specific embodiments. All modifications and variations of the invention fall within its design concept and scope, which is defined by the following claims.

Claims (1)

1. A motion estimation method for an H.264/AVC encoder, characterized in that the method is implemented in a digital integrated circuit chip according to the following steps:
Step (1): the moving image is input into the current frame memory frame by frame, and the current frame memory inputs the current macroblock into the current macroblock memory macroblock by macroblock; video images conforming to the H.264/AVC video coding standard are input into the reference frame memory frame by frame;
Step (2): the rough-layer data preprocessor preprocesses the rough-layer data according to the following steps:
Step (2.1): the 1st data input buffer reads 32-bit current macroblock data from the current macroblock memory; the 2nd data input buffer reads 32-bit search region data from the reference frame memory;
Step (2.2): each of the two data input buffers of step (2.1) merges four consecutively input 32-bit words into one 128-bit word and outputs it: the output of the 1st data input buffer is sent to the 1st mean filter array, and the output of the 2nd data input buffer is sent to the 2nd mean filter array;
Step (2.3): the two mean filter arrays of step (2.2) simultaneously produce the rough-layer current macroblock data and the rough-layer integer motion estimation search region data, each output 64 bits at a time;
where each pixel of the rough-layer integer motion estimation search region is obtained by the formula:
Pel_c = (Pel_{00} + Pel_{01} + Pel_{10} + Pel_{11}) / 4,
Pel_c denotes one pixel of the rough layer;
Pel_{00} and Pel_{01} denote pixels 2*m and 2*m+1 in row 2*n of the original reference frame search region, and Pel_{10} and Pel_{11} denote pixels 2*m and 2*m+1 in row 2*n+1 of the original reference frame search region, with n and m in [0, 23]; the 48*48 search region for rough-layer motion estimation is thereby obtained, and the centre of the search region of the macroblock is located at the top-left pixel of the current macroblock;
Step (2.4): the 1st and 2nd mean-filter arrays output their 64-bit data, in order, to the mean-filtered current-macroblock data storage array and to the search-area shift-register array respectively; each storage element in these two arrays holds 8 pixels;
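The averaging of step (2.3) can be sketched in software as follows. This is only an illustrative model, not the claimed hardware: the function name `rough_layer` is hypothetical, and truncating integer division is an assumption, since the claim writes "/4" without stating how the quotient is rounded.

```python
def rough_layer(pixels):
    """Average each non-overlapping 2x2 block of a 2-D list of pixel values.

    Assumption: the quotient is truncated (integer division); the claim
    only states "/4".
    """
    rows = len(pixels) // 2
    cols = len(pixels[0]) // 2
    return [
        [
            (pixels[2 * n][2 * m] + pixels[2 * n][2 * m + 1]
             + pixels[2 * n + 1][2 * m] + pixels[2 * n + 1][2 * m + 1]) // 4
            for m in range(cols)
        ]
        for n in range(rows)
    ]


# A 48x48 input with n, m in [0, 23] yields 24x24 averaged pixels.
area = [[(r + c) % 256 for c in range(48)] for r in range(48)]
coarse = rough_layer(area)
print(len(coarse), len(coarse[0]))
```

Each output pixel depends on four input pixels only, which is why the claim can produce the rough-layer macroblock and search-area data with two independent filter arrays working at the same time.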
Step (3): a rough-layer motion estimator performs rough-layer motion estimation according to the following steps and outputs the prediction mode and the corresponding motion vector:
Step (3.1): the rough-layer data preprocessor inputs the current-macroblock data and the rough-layer motion-estimation search-area data to eight 2*2-block residual calculators in the rough-layer motion estimator; the residual is expressed as the SAD;
Step (3.2): the eight 2*2-block residual calculators use an adder-tree structure to compute, within one clock cycle, the residuals of all 2*2, 2*4, 4*2, 4*4, 4*8, 8*4 and 8*8 blocks, which correspond to the seven prediction modes 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16;
Step (3.3): at each clock, a most-similar-block searcher compares the residuals of all sub-blocks under each prediction mode, as obtained from the eight 2*2-block residual calculators, with the minimum residuals obtained before the current clock, and keeps the smaller values; if a residual obtained at the current clock is smaller, the motion vector corresponding to that minimum residual is updated according to the current similar-block position; steps (3.1) to (3.3) are repeated until the whole rough-layer search area has been searched, yielding the minimum residuals and their corresponding motion vectors;
Step (3.4): after step (3.3) has finished, the minimum residuals of the sub-blocks belonging to each prediction mode are first added together, giving the residual sum for each prediction mode; the seven residual sums are then compared, and the prediction mode with the smallest residual sum is selected as the optimal mode;
Step (3.5): according to the optimal prediction mode obtained in step (3.4), a motion-vector averager adds the motion vectors of the sub-blocks of that mode and takes their mean, giving the motion vector of the rough-layer motion estimation;
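Steps (3.4) and (3.5) amount to a minimum-sum selection followed by an average. The sketch below illustrates that logic under stated assumptions: the names `sad` and `select_mode`, the dictionary layout, and the use of floor division for the mean are all hypothetical, because the claim does not say how the mean is rounded.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def select_mode(best):
    """Pick the prediction mode with the smallest residual sum.

    best maps a mode name to a list of (min_sad, (mv_x, mv_y)) tuples,
    one per sub-block of that mode (hypothetical layout).
    Returns the chosen mode and the mean motion vector of its sub-blocks.
    Assumption: the mean uses floor division.
    """
    sums = {mode: sum(s for s, _ in blocks) for mode, blocks in best.items()}
    mode = min(sums, key=sums.get)
    vectors = [mv for _, mv in best[mode]]
    count = len(vectors)
    mean = (sum(x for x, _ in vectors) // count,
            sum(y for _, y in vectors) // count)
    return mode, mean


best = {
    "16*16": [(100, (2, 1))],
    "8*8": [(20, (1, 1)), (20, (3, 1)), (20, (1, 3)), (20, (3, 3))],
}
print(select_mode(best))
```

Averaging the sub-block vectors gives the later stages a single starting point for the whole macroblock, so only one fine-layer search area has to be fetched.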
Step (4): a parallel processor performs fine-layer motion estimation and half-pixel interpolation in parallel according to the following steps:
Step (4.1): the rough-layer motion estimator and the reference-frame memory send the rough-layer motion vector and the reference-frame data, respectively, to an integer-pixel input buffer;
Step (4.2): the integer-pixel input buffer inputs the 24*24 fine-layer search-area data to a fine-layer search-area shift-register array, which in turn sends these data to a fine-layer motion estimator; the fine-layer minimum residual and the corresponding motion vector are computed in order according to the following steps:
Step (4.2.1): the fine-layer search-area data and the corresponding current-macroblock data are sent to a fine-layer residual calculator, which computes the residuals of the 4*4, 4*8, 8*4, 8*8, 8*16, 16*8 and 16*16 blocks:
the residual calculator consists of four sub-residual calculators, each handling 8*8 residual data elements; in each sub-residual calculator, every four residual values in the same row are summed by an adder tree to give the residual of a 4*1 block, and the residuals of four 4*1 blocks are summed to give the residual of a 4*4 block; the residuals of all 4*4 blocks are computed within one clock cycle, and the residuals of all sub-blocks of the current macroblock are then computed through the adder-tree structure; this is repeated until the whole fine-layer search area has been processed;
Step (4.2.2): at each clock, a similarity comparator compares the residual obtained in that clock cycle in step (4.2.1) with the minimum residual obtained before that clock cycle; if the residual obtained in that clock cycle is smaller, the motion vector corresponding to the minimum residual is updated according to the current similar-block position; this is repeated until the whole fine-layer search area has been searched, yielding the minimum residual and its corresponding motion vector;
Step (4.3): the integer-pixel input buffer simultaneously sends the 30*30 half-pixel search-area data to a half-pixel interpolation filter array; the half-pixel interpolation filters are the 6-tap FIR filters specified by the H.264/AVC standard; the half-pixel interpolation filter array uses 4 horizontal half-pixel interpolation filters and 8 vertical half-pixel interpolation filters; the 4 horizontal filters interpolate from six pixels in a row of integer pixels, and the 8 vertical filters interpolate from six pixels in a column of integer pixels or a column of half pixels; the interpolation formula is:
$\mathrm{Pel}_h = \mathrm{round}\big((\mathrm{Pel}_0 - 5\,\mathrm{Pel}_1 + 20\,\mathrm{Pel}_2 + 20\,\mathrm{Pel}_3 - 5\,\mathrm{Pel}_4 + \mathrm{Pel}_5) / 32\big)$,
where round() denotes rounding, the subscript h denotes a half pixel, and the subscripts 0 to 5 denote the six integer pixels or half pixels from which the half pixel is generated;
the half-pixel interpolation filter array outputs the integer pixels through half-pixel interpolation memory I and, at the same time, outputs through three half-pixel memories A, B and C, respectively, the horizontal half pixels, the vertical half pixels, and the half pixels that lie between the horizontally and vertically interpolated half pixels;
Step (4.4): a parallel-computation-end decision unit determines whether the fine-layer motion estimation and the half-pixel interpolation filtering have both finished and, if so, issues an end signal;
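The interpolation formula of step (4.3) can be illustrated with a short software sketch. The function names are hypothetical. Two details are assumptions taken from the usual H.264/AVC integer formulation rather than from the claim text: rounding is done by adding 16 before a right shift by 5, and the result is clipped to the 8-bit range 0 to 255.

```python
def half_pel(p0, p1, p2, p3, p4, p5):
    """6-tap half-pixel filter with taps (1, -5, 20, 20, -5, 1) / 32.

    Assumptions: round half up via (+16) >> 5, then clip to [0, 255].
    """
    acc = p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5
    value = (acc + 16) >> 5
    return max(0, min(255, value))


def horizontal_half_pels(row):
    """Half pixels for one row; each output lies between row[i+2] and row[i+3]."""
    return [half_pel(*row[i:i + 6]) for i in range(len(row) - 5)]


print(horizontal_half_pels([10, 10, 10, 10, 10, 10, 10, 10]))
```

The same `half_pel` function models a vertical filter when it is fed six pixels from a column instead of a row, which matches the claim's use of one formula for both filter orientations.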
Step (5): a half-pixel motion estimator and a 1/4-pixel motion estimator perform half-pixel motion estimation and 1/4-pixel motion estimation, in that order, according to the following steps:
Step (5.1): the half-pixel motion estimator performs half-pixel motion estimation and outputs the half-pixel minimum residual and the corresponding half-pixel motion vector, in the following order:
Step (5.1.1): the fine-layer motion estimator inputs its minimum residual and the corresponding motion vector to an on-chip memory interface in the half-pixel motion estimator; the three half-pixel memories A, B and C input half pixels A, B and C to the on-chip memory interface;
Step (5.1.2): the on-chip memory interface sends the three classes of half pixels A, B and C through three half-pixel input buffers to three half-pixel search-area memories respectively, after which three half-pixel motion estimators perform motion estimation on the three classes of half pixels respectively, according to the following steps:
first, each half-pixel motion estimator uses a 4*4-block residual calculator to compute the residual between the 4*4 blocks of the current macroblock and the 4*4 blocks of the half-pixel search area;
then, a most-similar-block selector compares the residuals of the similar sub-blocks obtained from the different classes of search area, selects the smallest of them, and compares it with the minimum residual obtained by the fine-layer search; the similar block corresponding to the smaller residual is taken as the most similar block of the half-pixel motion estimation, giving its half-pixel residual and corresponding motion vector;
Step (5.2): the 1/4-pixel motion estimator performs 1/4-pixel motion estimation and obtains the 1/4-pixel minimum residual and the corresponding 1/4-pixel motion vector according to the following steps:
Step (5.2.1): the half-pixel minimum residual and corresponding motion vector output by the half-pixel motion estimator, the integer pixels I obtained from half-pixel interpolation memory I, the half pixels A, B and C obtained from half-pixel interpolation memories A, B and C, and the current macroblock from the current-macroblock memory are all input to a 1/4-pixel interpolation memory interface;
Step (5.2.2): a 1/4-pixel interpolation calculator array takes from the 1/4-pixel interpolation memory interface the two half pixels, or the one half pixel and one integer pixel, adjacent to the 1/4 pixel to be computed, and averages them to obtain that 1/4 pixel; 12 kinds of 1/4 pixel can be obtained in this way, but only the 8 kinds of 1/4 pixel surrounding the pixels of the most similar half-pixel block obtained by the half-pixel motion estimator are computed;
Step (5.2.3): the 1/4-pixel interpolation calculator array stores all the resulting 1/4 pixels in 8 1/4-pixel search-area memories respectively and sends them to 8 1/4-pixel motion estimators respectively;
Step (5.2.4): each of the 8 1/4-pixel motion estimators uses a 4*4-block residual calculator to compute the residual between the 1/4-pixel blocks from its 1/4-pixel search-area memory and the 4*4 blocks of the current macroblock stored internally in advance;
Step (5.2.5): the most-similar-block selector selects, among the 8 kinds of search area of step (5.2.4), the block with the smallest residual as the best 1/4-pixel similar block and computes the corresponding motion vector; this residual is then compared with the minimum residual obtained by the half-pixel search, and the similar block corresponding to the smaller residual is selected as the best similar block finally produced by the H.264/AVC motion estimator;
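Steps (5.2.2) and (5.2.5) can be illustrated as follows. This is a minimal sketch with hypothetical names. The rounding `(a + b + 1) >> 1` follows the usual H.264/AVC quarter-sample averaging and is an assumption here, since the claim only says the two neighbours are averaged. Representing the 1/4-pixel vector as an offset that stays (0, 0) when the half-pixel result wins is also an assumption.

```python
def quarter_pel(a, b):
    """Average of two neighbouring samples (integer or half pixels).

    Assumption: round half up, i.e. (a + b + 1) >> 1.
    """
    return (a + b + 1) >> 1


def refine_quarter(half_sad, candidates):
    """Choose between the half-pixel result and 8 quarter-pixel candidates.

    candidates is a list of (sad, (dx, dy)) tuples, one per 1/4-pixel
    search area (hypothetical layout). Returns (sad, offset), where the
    offset is (0, 0) if no candidate beats the half-pixel residual.
    """
    best_sad, best_offset = min(candidates, key=lambda c: c[0])
    if best_sad < half_sad:
        return best_sad, best_offset
    return half_sad, (0, 0)


print(quarter_pel(10, 13))
print(refine_quarter(50, [(60, (1, 0)), (40, (0, 1))]))
```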
Step (6): a final motion vector calculator computes the final motion vector $MV_f$ produced by the H.264/AVC motion estimator and stores it in a motion-estimation result memory:
$MV_f = 8\,MV_c + 4\,MV_p + 2\,MV_h + MV_q$
where the subscripts c, p, h and q denote the motion vectors obtained by the rough-layer, fine-layer, half-pixel and 1/4-pixel motion estimators, respectively.
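The combination in step (6) is a weighted sum applied to each vector component. The sketch below uses a hypothetical function name and (x, y) tuples. One plausible reading of the weights, stated here as an interpretation rather than as something the claim says, is that they express every stage in 1/4-pixel units: a rough-layer step spans two integer pixels (8 quarter pixels), a fine-layer step one integer pixel (4), and a half-pixel step 2.

```python
def final_mv(mv_c, mv_p, mv_h, mv_q):
    """MV_f = 8*MV_c + 4*MV_p + 2*MV_h + MV_q, applied per component.

    Each argument is an (x, y) tuple from the rough-layer, fine-layer,
    half-pixel and 1/4-pixel stages respectively (hypothetical layout).
    """
    return tuple(
        8 * c + 4 * p + 2 * h + q
        for c, p, h, q in zip(mv_c, mv_p, mv_h, mv_q)
    )


print(final_mv((1, -2), (3, 0), (-1, 1), (1, -1)))
```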
CN 200610113030 2006-09-08 2006-09-08 Motion estimating method for H.264/AVC coder Active CN100471275C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610113030 CN100471275C (en) 2006-09-08 2006-09-08 Motion estimating method for H.264/AVC coder

Publications (2)

Publication Number Publication Date
CN1933600A CN1933600A (en) 2007-03-21
CN100471275C true CN100471275C (en) 2009-03-18

Family

ID=37879181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610113030 Active CN100471275C (en) 2006-09-08 2006-09-08 Motion estimating method for H.264/AVC coder

Country Status (1)

Country Link
CN (1) CN100471275C (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453646B (en) * 2007-12-04 2012-02-22 华为技术有限公司 Image interpolation method, apparatus and interpolation coefficient obtaining method
JP4618355B2 (en) * 2008-09-25 2011-01-26 ソニー株式会社 Image processing apparatus and image processing method
WO2011120221A1 (en) * 2010-03-31 2011-10-06 Intel Corporation Power efficient motion estimation techniques for video encoding
CN101860747B (en) * 2010-03-31 2012-05-23 北京大学 Sub-pixel movement estimation system and method
TWI678916B (en) 2010-04-13 2019-12-01 美商Ge影像壓縮有限公司 Sample region merging
CN106067984B (en) 2010-04-13 2020-03-03 Ge视频压缩有限责任公司 Cross-plane prediction
CN106454371B (en) 2010-04-13 2020-03-20 Ge视频压缩有限责任公司 Decoder, array reconstruction method, encoder, encoding method, and storage medium
ES2904650T3 (en) 2010-04-13 2022-04-05 Ge Video Compression Llc Video encoding using multitree image subdivisions
CN102281434B (en) * 2010-06-10 2013-11-06 中国移动通信集团公司 Video compression method and equipment
CN107257476B (en) 2011-09-09 2020-11-06 Lg 电子株式会社 Video decoding method, video encoding method, and computer-readable storage medium
CN102630016A (en) * 2012-04-09 2012-08-08 复旦大学 Depth assembly line hardware framework suitable for integer motion estimation
CN102932643B (en) * 2012-11-14 2016-02-10 上海交通大学 A kind of expansion variable-block motion estimation circuit being applicable to HEVC standard
US10785498B2 (en) * 2012-11-27 2020-09-22 Squid Design Systems Pvt Ltd System and method of mapping multiple reference frame motion estimation on multi-core DSP architecture
CN103716639B (en) * 2013-12-25 2017-04-19 同观科技(深圳)有限公司 Search algorithm of frame image motion estimation
CN104980737B (en) * 2014-04-01 2018-04-13 扬智科技股份有限公司 Inter-frame mode selecting method
CN104333674B (en) * 2014-11-24 2019-01-22 广东中星电子有限公司 A kind of video image stabilization method and device
CN110740323B (en) * 2019-10-29 2023-05-12 腾讯科技(深圳)有限公司 Method, device, server and storage medium for determining LCU division mode

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1163542A (en) * 1996-03-22 1997-10-29 大宇电子株式会社 Estimating device for half picture element movement
WO2003107679A2 (en) * 2002-06-18 2003-12-24 Qualcomm, Incorporated Techniques for video encoding and decoding
CN1753501A (en) * 2005-10-31 2006-03-29 连展科技(天津)有限公司 Method of selecting H.264/AVC frame to frame motion estimation mode

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallel processing of H.264 motion estimation based on SSE technology. Li Xiaohong. Journal of Hefei University (Natural Sciences), Vol. 15, No. 3. 2005 *

Also Published As

Publication number Publication date
CN1933600A (en) 2007-03-21

Similar Documents

Publication Publication Date Title
CN100471275C (en) Motion estimating method for H.264/AVC coder
US11856220B2 (en) Reducing computational complexity when video encoding uses bi-predictively encoded frames
Chen et al. Analysis and architecture design of variable block-size motion estimation for H. 264/AVC
US7499491B2 (en) Apparatus for adaptive multiple-dimentional signal sequences encoding/decoding
CN102165777B (en) Adaptive interpolation filter for video coding
CN101505427A (en) Movement estimation apparatus in video compression encoding algorithm
US20110261886A1 (en) Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
CN101326550A (en) Motion estimation using prediction guided decimated search
CN105191309A (en) Content adaptive prediction distance analyzer and hierarchical motion estimation system for next generation video coding
CN101816183A (en) Method and apparatus for inter prediction encoding/decoding an image using sub-pixel motion estimation
CN102291581B (en) Realizing method of self-adaptive motion estimation supporting frame field
CN101765011B (en) Method and device for scaling motion estimation
CN101299818B (en) N level sub-pixel search method based on whole pixel searching result
CN102148990B (en) Device and method for predicting motion vector
CN101860747B (en) Sub-pixel movement estimation system and method
CN1703094B (en) Image interpolation apparatus and methods that apply quarter pel interpolation to selected half pel interpolation results
CN103430546A (en) Video encoding device, video encoding method and video encoding program
CN102801982B (en) Estimation method applied on video compression and based on quick movement of block integration
Kao et al. A memory-efficient and highly parallel architecture for variable block size integer motion estimation in H. 264/AVC
CN101600111A (en) A kind of searching method of realizing secondary coding of self-adaptive interpolation filter
Van et al. Fast motion estimation for closed-loop HEVC transrating
CN102630014B (en) Bidirectional motion estimation device for production of interpolated frame based on front reference frame and rear reference frame
CN101227616B (en) H.263/AVC integer pixel vectors search method
CN100469146C (en) Video image motion compensator
CN102420989B (en) Intra-frame prediction method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING HUAXIA DIANTONG TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: TSINGHUA UNIVERSITY

Effective date: 20120823

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100084 HAIDIAN, BEIJING TO: 100085 HAIDIAN, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20120823

Address after: 100085 A, block 9, 3rd Street, Beijing, Haidian District, A301

Patentee after: Beijing Powercom Technologies Co., Ltd.

Address before: 100084 Beijing 100084-82 mailbox

Patentee before: Tsinghua University

C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: 100094, No. 6, building, No. 3, Feng Xiu Middle Road, Beijing, Haidian District

Patentee after: Beijing Huaxia Diantong Technology Co., Ltd.

Address before: 100085 A, block 9, 3rd Street, Beijing, Haidian District, A301

Patentee before: Beijing Powercom Technologies Co., Ltd.

CP01 Change in the name or title of a patent holder

Address after: 100094, No. 6, building, No. 3, Feng Xiu Middle Road, Beijing, Haidian District

Patentee after: BEIJING HUAXIA DENTSU TECHNOLOGY Co.,Ltd.

Address before: 100094, No. 6, building, No. 3, Feng Xiu Middle Road, Beijing, Haidian District

Patentee before: BEIJING CHINASYS TECHNOLOGIES Co.,Ltd.

CP01 Change in the name or title of a patent holder