CN1599461A

CN1599461A - Motion estimating method and motion estimating circuit using the method

Info

Publication number: CN1599461A
Application number: CN 200410043874
Authority: CN
Inventors: 何卫锋; 毛志刚
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2004-09-15
Filing date: 2004-09-15
Publication date: 2005-03-23
Anticipated expiration: 2024-09-15
Also published as: CN1256686C

Abstract

The invention discloses a motion estimation method based on the FSBM algorithm and a circuit using it. The method comprises: (1) an initialization stage, in which pixel data are input; (2) an intermediate data organization stage; (3) a motion vector calculation stage; and (4) a motion vector output stage. Computing the MAD from the input data stream described by the algorithm facilitates comparison, discrimination and motion vector output in the circuit and reduces circuit complexity. The circuit maps the two-level algorithm onto a systolic array structure by projecting l onto the PE array. The previous-frame data have two sets of input buses, Y1 and Y2, which deliver the data to the PE units through multiplexers. The data of the current reference block of the current frame are stored in the RA register group inside each PE.

Description

A motion estimation method and a motion estimation circuit applying the method

Technical field:

The present invention relates to a motion estimation method based on the FSBM algorithm and to a motion estimation circuit with a frame-level pipelined array structure that uses this method. The method and circuit are mainly intended for video encoders based on compression coding standards such as MPEG (-1, -2, -4) and H.26x.

Background technology:

In many video/audio compression coding standards, the temporal redundancy between images is mostly removed by applying motion estimation and motion compensation to the video object, thereby achieving data compression. In motion estimation, the full search block matching algorithm (FSBM) finds the best matching block by exhaustively comparing all search blocks in the search area. The common FSBM algorithm uses the MAD criterion for matching, expressed as follows:

$$\mathrm{MAD}(m, n) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| x(i, j) - y(i+m, j+n) \right| \tag{1}$$

Here x(i, j) is the luminance value in the current frame, y(i+m, j+n) is the luminance value in the previous frame, and (i, j) is the relative coordinate within the reference macroblock. The offsets and the search range p satisfy -p ≤ m, n ≤ p. Under the constraint of search range p, the motion vector MV is the offset (m*, n*) for which MAD(m, n) is smallest, that is,

$$\mathrm{MV} = \arg\min_{(m, n)} \mathrm{MAD}(m, n), \quad -p \le m, n \le p \tag{2}$$

Fig. 4 shows a schematic diagram of this matching algorithm. Equation (1) is the conventional block-level pipelined algorithm; written as a program, it is a four-level nested loop. Motion estimation circuits derived from this four-level nested loop by the standard mapping method suffer from low efficiency, need a large number of internal buffer units, and are difficult to implement in VLSI.
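The four-level loop nest implied by equations (1) and (2) can be sketched in software as follows. This is only an illustrative sketch, not the circuit: the function names `mad` and `full_search`, the list-of-lists frame layout and the rule of skipping candidates that leave the frame are assumptions made for the example.

```python
def mad(cur, prev, top, left, m, n, N):
    """Equation (1): sum of absolute differences for offset (m, n)."""
    total = 0
    for i in range(N):          # loop 3: first coordinate inside the block
        for j in range(N):      # loop 4: second coordinate inside the block
            total += abs(cur[top + i][left + j] - prev[top + i + m][left + j + n])
    return total


def full_search(cur, prev, top, left, N, p):
    """Equation (2): offset in [-p, p] x [-p, p] with the smallest MAD."""
    rows, cols = len(prev), len(prev[0])
    best_mv, best_mad = (0, 0), float("inf")
    for m in range(-p, p + 1):          # loop 1
        for n in range(-p, p + 1):      # loop 2
            # Assumption: candidates that leave the previous frame are skipped.
            if not (0 <= top + m <= rows - N and 0 <= left + n <= cols - N):
                continue
            d = mad(cur, prev, top, left, m, n, N)
            if d < best_mad:
                best_mad, best_mv = d, (m, n)
    return best_mv, best_mad


# Synthetic frames: cur[r][c] equals prev[r + 1][c - 1] wherever that is defined.
prev = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[prev[r + 1][c - 1] if r + 1 < 8 and c >= 1 else 0 for c in range(8)]
       for r in range(8)]
result = full_search(cur, prev, top=2, left=2, N=2, p=2)
```

For the 2 × 2 block at (2, 2) the search returns the offset (1, -1) with a MAD of 0, since that candidate reproduces the block exactly.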

To overcome these shortcomings of block-level pipelined motion estimation circuits, motion estimation algorithms based on frame-level pipelining have been proposed. Their core is a six-level nested loop, shown in Table 1, in which the basic block of each motion estimation is N × N pixels and each image frame contains Nh × Nv basic blocks.

do v = 1 to Nv
  do h = 1 to Nh
    MV(h, v) = (0, 0);
    Dmin(h, v) = ∞;
    do m = -p to p
      do n = -p to p
        MAD(m, n) = 0;
        do i = 1 to N
          do j = 1 to N
            MAD(m, n) = MAD(m, n) + |x((h-1)N+i, (v-1)N+j) - y((h-1)N+i+m, (v-1)N+j+n)|;
        enddo j, i
        if Dmin(h, v) > MAD(m, n)
          Dmin(h, v) = MAD(m, n);
          MV(h, v) = (m, n);
        endif
enddo n, m, h, v

Table 1
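A runnable rendering of the six-level loop of Table 1 might look like the sketch below. The name `frame_level_search`, the 0-based list-of-lists frames indexed as `[first coordinate][second coordinate]`, and the skipping of out-of-frame candidates are assumptions of this example; Table 1 itself does not address frame borders.

```python
def frame_level_search(x, y, N, Nh, Nv, p):
    """Six nested loops of Table 1; returns {(h, v): (m, n)} for every block."""
    size1, size2 = N * Nh, N * Nv
    mv = {}
    for v in range(1, Nv + 1):
        for h in range(1, Nh + 1):
            best, d_min = (0, 0), float("inf")
            a0, b0 = (h - 1) * N, (v - 1) * N      # block origin, 0-based
            for m in range(-p, p + 1):
                for n in range(-p, p + 1):
                    # Assumption: candidates outside the frame are skipped.
                    if not (0 <= a0 + m <= size1 - N and 0 <= b0 + n <= size2 - N):
                        continue
                    d = 0
                    for i in range(N):
                        for j in range(N):
                            d += abs(x[a0 + i][b0 + j] - y[a0 + i + m][b0 + j + n])
                    if d_min > d:                  # same strict test as Table 1
                        d_min, best = d, (m, n)
            mv[(h, v)] = best
    return mv


# Synthetic 4 x 4 frames with N = 2, Nh = Nv = 2, p = 1.
y = [[4 * a + b for b in range(4)] for a in range(4)]
x = [[4 * (a + 1) + b for b in range(4)] for a in range(4)]   # y moved by one step
same = frame_level_search(y, y, N=2, Nh=2, Nv=2, p=1)
moved = frame_level_search(x, y, N=2, Nh=2, Nv=2, p=1)
```

With identical frames every block reports (0, 0); with the shifted frame, the block at (h, v) = (1, 1) reports (1, 0).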

A circuit derived from the above algorithm by the standard mapping method can eliminate the stall between the motion estimation of adjacent reference blocks, so that motion estimation of adjacent reference blocks is pipelined. However, the mapping process is so complex that this approach has not been widely adopted.

To address the excessive complexity of the mapping process, a new idea was proposed as early as 1995: first transform the six-dimensional algorithm of Table 1 into an equivalent three-dimensional algorithm, then project it to obtain a two-dimensional systolic array architecture. The structure obtained in this way, however, has two shortcomings that are hard to overcome:

(1) Pixel data must be broadcast to the array, which makes VLSI implementation difficult.

(2) The search range p and the block size N must satisfy N = 2p, which limits the range of application of the structure.

For this reason, another algorithm was proposed in 2001. It first transforms the six-dimensional algorithm of Table 1 into an equivalent two-dimensional algorithm and then projects it to obtain a one-dimensional array structure. Although this structure no longer requires pixel data to be broadcast, the relation N = 2p must still hold. It also has the following shortcomings:

(1) The PE unit has a complex structure and requires too many hardware resources.

(2) The number of PEs is PEs = (2p+1)², so the scale of the circuit differs for different p.

(3) The number of PEs is too large, and the hardware implementation cost is too high.

To remove the N = 2p restriction, a frame-level pipelined motion estimation circuit structure for p = kN, k ≥ 1/2, was obtained in 2003 by directly mapping the algorithm of Table 1. In addition to the three shortcomings above, however, this structure uses a large number of delay registers, which makes the scale of the circuit unacceptable.

Summary of the invention:

The object of the invention is to provide a motion estimation method and a motion estimation circuit applying the method. Computing motion vectors from the input data stream of this method greatly facilitates comparison, discrimination, search and output of the motion vector in the circuit and helps reduce circuit complexity. The technical solution of the invention is as follows: a motion estimation method carried out in the following steps:

(1) Initialization stage 101: initialize two images that contain no pixel data, the temporary current frame and the temporary previous frame.

(2) Intermediate data organization stage 102:

Step 1: store the pixel data of the current frame and of the previous frame in the temporary current frame and the temporary previous frame, respectively.

Step 2: in each pixel group of the temporary current frame, only the first row holds data and the other positions are empty. Copy the datum in the first row of each column into the other rows of that column, so that all rows within a pixel group hold identical pixel data. Complete the data copying for the whole temporary current frame in this way.

Step 3: take pixels from the search area of the previous frame in the following order: (1) starting at the first row of the first column of the search area, take out (2p+1) data in column order; (2) starting at the first row of the second column of the search area, take out (2p+1) data in column order; (3) repeat the process in the same way; (4) store the (2p+1)² data taken from the previous frame in steps (1) to (3), in order, in the first column of the first pixel group of the temporary previous frame; (5) starting at the second row of the first column of the search area, take out (2p+1) data in column order; (6) starting at the second row of the second column of the search area, take out (2p+1) data in column order; (7) continue until (2p+1) data have been taken in column order starting at the second row of column (2p+1) of the search area; (8) store the (2p+1)² data taken from the previous frame, in order, in the second column of the first pixel group of the temporary previous frame; (9) continue until the first N columns of the first pixel group of the temporary previous frame are filled with data.

Step 4: fill columns N+1 to 2N of the first pixel group of the temporary previous frame with data.

Step 5: repeat step 4 until all columns of the first pixel group of the temporary previous frame are filled with data.

Step 6: repeat step 5 until all pixel groups of the temporary previous frame are filled with data.

(3) Motion vector calculation stage 103: Step 1: take out, in row order, the pixel data of the first row of the first pixel group of the temporary current frame and of the temporary previous frame, compute their absolute differences, accumulate the N² resulting values (MAD calculation), and label the result with the row coordinate k (k = 1). Step 2: take out, in row order, the pixel data of the second row of the first pixel group of the temporary current frame and of the temporary previous frame, compute their absolute differences and accumulate the N² resulting values. Step 3: repeat the process of steps 1 and 2 until the MAD has been calculated for every row of the first pixel group of the temporary current frame and temporary previous frame. Step 4: repeat the process of step 3 until all pixel groups of the temporary current frame and temporary previous frame have been taken out and their MAD values calculated.

(4) Motion vector output stage 104:

Among the (2p+1)² MAD values produced from the data of the first pixel group of the temporary current frame and temporary previous frame, find the minimum and output its label k as the motion vector of the first reference block of the current frame. Among the (2p+1)² MAD values produced from the data of the second pixel group of the temporary current frame and temporary previous frame, find the minimum and output its label k as the motion vector of the second reference block of the current frame. Repeat this process until the motion vectors of all reference blocks of the current frame have been output, at which point the method ends.

A traditional FSBM algorithm can generally be divided into three stages: initialization and input, MAD calculation, and MV generation and output. The data used for MAD calculation are the pixel data input from the previous frame image and the current frame image. To make the organization of the input data stream easier to describe, the invention defines two image frames, the temporary current frame and the temporary previous frame. Pixel data of the current frame and previous frame are first stored in the temporary current frame and temporary previous frame, and are then taken out of them for MAD calculation. In an efficient motion estimation circuit, the current-frame and previous-frame pixel data are organized into a data stream in a definite format and fed continuously into the circuit for MAD calculation. The organization of this data stream, and whether a suitable hardware structure can be found for it, therefore become the main factors that determine the efficiency of the circuit. The new FSBM algorithm with two nested Do loops proposed by the invention mainly describes a new way of organizing the input data stream and how, under this data stream format, the pixels undergo MAD calculation and the motion vector is output. Its main feature is that computing the MAD from the input data stream described by the algorithm greatly facilitates comparison, discrimination, search and output of the motion vector in the circuit and helps reduce circuit complexity.
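The data organization can be illustrated for a single reference block with the sketch below. It builds one pixel group of each temporary frame, with one row per candidate offset and N² pixels per row, and then runs the two-level loop over rows and pixels. The names `build_pixel_group` and `group_motion_vector` are hypothetical, the pixel order within a row is assumed to be row-major, and the `search_area` argument is assumed to be the (2p+N) × (2p+N) window around the block.

```python
def build_pixel_group(ref_block, search_area, N, p):
    """Return one pixel group of the temporary current and previous frames."""
    side = 2 * p + 1
    # Temporary current frame: the reference block flattened to N*N pixels,
    # with the same row repeated (2p+1)^2 times.
    ref_row = [ref_block[i][j] for i in range(N) for j in range(N)]
    tmp_cur = [list(ref_row) for _ in range(side * side)]
    # Temporary previous frame: row number (m+p)*(2p+1) + (n+p) holds the
    # candidate block at offset (m, n), flattened in the same pixel order.
    tmp_prev = []
    for m in range(-p, p + 1):
        for n in range(-p, p + 1):
            tmp_prev.append([search_area[i + m + p][j + n + p]
                             for i in range(N) for j in range(N)])
    return tmp_cur, tmp_prev


def group_motion_vector(tmp_cur, tmp_prev, p):
    """Two-level loop: outer loop over rows k, inner loop over pixels l."""
    side = 2 * p + 1
    best_k, best_mad = 0, float("inf")
    for k, (cur_row, prev_row) in enumerate(zip(tmp_cur, tmp_prev)):
        d = sum(abs(a - b) for a, b in zip(cur_row, prev_row))
        if d < best_mad:
            best_k, best_mad = k, d
    return (best_k // side - p, best_k % side - p), best_mad


N, p = 2, 1
search_area = [[4 * r + c for c in range(2 * p + N)] for r in range(2 * p + N)]
ref_block = [row[0:2] for row in search_area[2:4]]     # candidate at offset (1, -1)
tmp_cur, tmp_prev = build_pixel_group(ref_block, search_area, N, p)
mv, best = group_motion_vector(tmp_cur, tmp_prev, p)
```

Each row of the two temporary frames pairs the reference block with exactly one candidate, so the MAD of a candidate is a plain element-wise accumulation along one row.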

The invention also provides a motion estimation circuit applying this motion estimation method, in order to overcome the low hardware utilization and large circuit scale of prior-art motion estimation circuits. The technical solution is as follows: a motion estimation circuit applying the motion estimation method, composed of N² block matching units (PE_1 to PE_{N²}), bus Y1, bus Y2, bus C1, bus C2, N multiplexers (M_1 to M_N), two first-in first-out modules (FIFO1, FIFO2), N-2 multiplexers (ME_2 to ME_{N-1}), N-1 delay registers (Delay_2 to Delay_N), N-1 delay register groups (Delay_-1 to Delay_-N-1), N-1 adders (a_2 to a_N) and a motion vector generation unit MV. Each block matching unit consists of four registers (Reg1-Reg4), three multiplexers (Mx1-Mx3), a flip-flop DEF, two registers (RA1, RA2), two multiplexers (M, MUX), an absolute-difference unit |X-Y| and an adder MAD.

The present invention relates to following mathematical variable:

v, h: block coordinates of the current reference block in the current frame (origin at the upper-left corner of the frame).

m, n: offset between the current search block and the search block at coordinates (v, h) in the previous frame.

i, j: coordinates of a pixel within the reference block (origin at the upper-left corner of the block).

is, js: absolute coordinates of a reference block pixel in the current frame (origin at the upper-left corner of the frame).

iu, ju: absolute coordinates of the search block in the previous frame (origin at the upper-left corner of the frame).

k: sequence number of the search block being matched during motion estimation.

l: matching order number of a pixel during block matching.

Here k and l may both be numbered in row order or both in column order. Column order is used as the example in what follows.

Accordingly, define

$$l = (i-1)N + j, \quad 1 \le i, j \le N \tag{3}$$

$$k = (v-1) N_h (2p+1)^2 + (h-1)(2p+1)^2 + (m+p)(2p+1) + n + p + 1 \tag{4}$$
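A short sketch of the two index definitions, together with an inverse for k, shows that each (v, h, m, n) receives a distinct label. The function names are hypothetical, and the inverse `decode_match_index` is not given in the text; it is derived here from equation (4) for illustration.

```python
def pixel_index(i, j, N):
    """Equation (3): l for pixel (i, j), with 1 <= i, j <= N."""
    return (i - 1) * N + j


def match_index(v, h, m, n, Nh, p):
    """Equation (4): k for block (v, h) and offset (m, n)."""
    per_block = (2 * p + 1) ** 2
    return ((v - 1) * Nh * per_block + (h - 1) * per_block
            + (m + p) * (2 * p + 1) + n + p + 1)


def decode_match_index(k, Nh, p):
    """Inverse of equation (4): recover (v, h, m, n) from k."""
    side = 2 * p + 1
    block, local = divmod(k - 1, side * side)
    v0, h0 = divmod(block, Nh)
    m0, n0 = divmod(local, side)
    return v0 + 1, h0 + 1, m0 - p, n0 - p


Nh, Nv, p = 3, 2, 2
labels = [match_index(v, h, m, n, Nh, p)
          for v in range(1, Nv + 1) for h in range(1, Nh + 1)
          for m in range(-p, p + 1) for n in range(-p, p + 1)]
```

Enumerating v, h, m and n in that nesting order yields k = 1, 2, ..., NhNv(2p+1)² without gaps, which is what allows the label k of a minimum MAD to be output as the motion vector.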

Description of the block matching unit PE

MAD_in, MAD_out: input and output ports of the accumulated MAD value, with a latch in between.

Y_in1, Y_out1: input and output ports of previous-frame data, with a latch in between.

Y_in2, Y_out2: input and output ports of previous-frame data, with a latch in between.

Sel_in: select signal that chooses register RA1 or RA2 for the MAD calculation.

Sel_out: latched output of Sel_in.

Sel1n: input/output select port for current-frame data.

Logic function of the block matching unit PE

Pixel data of the current reference block and of the next reference block that share the same index are stored alternately in registers RA1 and RA2 of the register group; during this process the Sel1 signal selects the register into which a pixel is written. Sel_in selects the output of reference block pixel register RA1 or RA2 when the PE performs the MAD operation. Sel_in also serves as the clock trigger of the D flip-flop, which in turn produces the signal that selects between search block pixels Y_in1 and Y_in2 for the MAD operation. The absolute value of the difference between the search block pixel and the reference block pixel is added to MAD_in and, latched by the clock, output as MAD_out; this carries out the operation of equation (1). Fig. 7 shows the timing of the main signals in the PE.
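The behaviour just described can be approximated by a small software model. This is a rough behavioural sketch under several assumptions that the text does not confirm: the D flip-flop is assumed to toggle its output on every rising edge of Sel_in, Sel_in = 0 is assumed to select RA1, and all latches other than the MAD output register are ignored. The class name `PE` and its methods are hypothetical.

```python
class PE:
    """Behavioural model of one block matching unit (assumptions in the lead-in)."""

    def __init__(self):
        self.ra = [0, 0]        # reference pixel registers RA1 and RA2
        self.q = 0              # flip-flop output: 0 selects Y_in1, 1 selects Y_in2
        self.prev_sel = 0       # previous Sel_in value, used for edge detection
        self.mad_out = 0        # latched MAD output

    def load(self, sel1, pixel):
        """Store a reference pixel in RA1 (sel1 = 0) or RA2 (sel1 = 1)."""
        self.ra[sel1] = pixel

    def clock(self, mad_in, y_in1, y_in2, sel_in):
        """One beat: choose operands, accumulate, latch the result."""
        if sel_in == 1 and self.prev_sel == 0:
            self.q ^= 1         # assumed: toggle on the rising edge of Sel_in
        self.prev_sel = sel_in
        x = self.ra[sel_in]                     # reference block pixel
        y = y_in2 if self.q else y_in1          # search block pixel
        self.mad_out = mad_in + abs(x - y)      # one term of equation (1)
        return self.mad_out


pe = PE()
pe.load(0, 10)                  # pixel of the current reference block
pe.load(1, 20)                  # pixel of the next reference block
first = pe.clock(mad_in=5, y_in1=7, y_in2=100, sel_in=0)
second = pe.clock(mad_in=0, y_in1=7, y_in2=100, sel_in=1)
```

Chaining the `mad_out` of one such unit into the `mad_in` of the next reproduces the accumulation of equation (1) across the PEs of one block.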

Fig. 6 shows the hardware structure of the motion estimation circuit for p = kN (k ≥ 1/2). The previous-frame data have two sets of input buses, Y1 and Y2, which deliver the data to the PE units through multiplexers. In the figure, FIFO1 and FIFO2 accept data from Y1 and Y2, respectively, and deliver them to buses C1 and C2; their sizes are (N-1)(2p-1) and (N-1)², respectively. Delay_{2p+1} and Delay_{2p+2-N} denote delay shift register groups of depth (2p+1) and (2p+2-N), respectively.

The operation principle of this circuit is:

(1) Before the circuit starts working, the data of the current reference block of the current frame are preloaded into one register of the RA register group in each PE. Before the circuit finishes the motion estimation of the current reference block, the data of the next reference block are likewise preloaded into the other register of the RA register group in each PE.

(2) Take the moment at which the circuit starts working as beat zero. During beats 1 to N(2p+1), FIFO1 and FIFO2 are not used; previous-frame data from Y1 and Y2 are fed to the array through port A, and buses C1 and C2 are idle. The data on Y1 are input in the order of the first column of the temporary previous frame. The data on Y2 are somewhat more involved: for the first N columns of the temporary previous frame, insert at the head of each column a number of zeros determined by its column number, then compare the columns row by row. If the N columns hold the same value in a given row, Y2 carries the same value as Y1; if one of the N values differs, Y2 carries the value that differs.

(3) During beats N(2p+1)+1 to (2N-1)(2p+1), FIFO1 accepts and stores (N-1)(2p+1) data from Y1. During beats (N+r)(2p+1)+1 to (N+r)(2p+1)+N-1, where r is an integer with 0 < r < N, FIFO2 accepts and buffers a total of (N-1)² data from Y2. During this process the data from Y1 and Y2 are fed to the array through port A, the FIFOs do not feed the array, and buses C1 and C2 are idle.

(4) During beats (2p+1)²+1 to (2p+1)²+(N-1)(2p+1), FIFO1 feeds bus C1 while Y1 and Y2 continue to feed port A of the PEs. Data from FIFO1 go to port B, and data from Y1 and Y2 go to port A. Y1 and Y2 carry the last (N-1)(2p+1) data of the search area corresponding to the current reference block, and FIFO1 delivers the (N-1)(2p+1) data it has stored. During beats (2p+r+1)(2p+1)+1 to (2p+r+1)(2p+1)+N-1, where r is an integer with 0 < r < N, FIFO2 feeds bus C2. At these times FIFO1 and FIFO2 may feed port B simultaneously through buses C1 and C2 while Y1 and Y2 feed port A.

(5) After beat (2p+N)(2p+1)+1, FIFO1 stops feeding data while Y1 and Y2 continue to feed port A, now starting from datum (N-1)(2p+1)+1 of the search area corresponding to the next reference block. During beats (2p+N)(2p+1)+1 to (2p+N)(2p+1)+N-1, FIFO2 delivers its last (N-1) data to port B.

Once the FIFOs begin feeding the array, the PE array is in an overlapped phase in which motion estimation of the current reference block and of the next reference block proceed together. When process (5) ends, the overlapped execution ends as well, and the PEs perform motion estimation for the next reference block only. For the next reference block, beat zero of its motion estimation is the start of process (4); it only needs to repeat the process described above.
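The beat intervals quoted in steps (2) to (5) can be tabulated and cross-checked with a few lines of code. The sketch below only restates those intervals; the function names `fifo_schedule` and `beats` and the dictionary keys are hypothetical, and N = 16, p = 8 is an arbitrary example.

```python
def fifo_schedule(N, p):
    """Beat intervals (first, last), inclusive, as stated in steps (2) to (5)."""
    w = 2 * p + 1
    return {
        # Step (2): only Y1/Y2 feed the array.
        "direct_only": (1, N * w),
        # Step (3): FIFO1 and FIFO2 are written.
        "fifo1_write": (N * w + 1, (2 * N - 1) * w),
        "fifo2_write": [((N + r) * w + 1, (N + r) * w + N - 1)
                        for r in range(1, N)],
        # Steps (4) and (5): FIFO1 and FIFO2 are read out to buses C1 and C2.
        "fifo1_read": (w * w + 1, w * w + (N - 1) * w),
        "fifo2_read": [((2 * p + r + 1) * w + 1, (2 * p + r + 1) * w + N - 1)
                       for r in range(1, N)],
    }


def beats(interval):
    """Number of beats in an inclusive interval."""
    first, last = interval
    return last - first + 1


N, p = 16, 8
s = fifo_schedule(N, p)
fifo1_words = beats(s["fifo1_write"])
fifo2_words = sum(beats(iv) for iv in s["fifo2_write"])
```

The counts agree with the text: FIFO1 is written for (N-1)(2p+1) beats, the FIFO2 write windows add up to (N-1)² beats, FIFO1 read-out ends at beat (2p+N)(2p+1), and the last FIFO2 read window starts one beat later.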

Fig. 8 shows the timing relationship of some key signals in the structure of Fig. 6. FIFO1_w, FIFO1_r, FIFO2_w and FIFO2_r are the write and read enable signals of the two FIFOs. L=18_A_e denotes the A-port enable signal of the PE whose order number is 18. All of these signals are active low.

Obtaining the motion vector:

At the bottom of Fig. 6 there is a data path made up of shift registers and adders; it accumulates the MAD values of each row of PE units and sends the result to the MV module, which is responsible for generating the motion vector. Because an image frame has borders and the MAD calculation in the PE array does not deal with the border problem, all of this work is left to the MV module. The MV module contains a synchronous counter. From the counter, the position in the current frame of the reference block being estimated is known, and from that the valid MAD computation interval and the valid k values can be determined.
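One way to picture the border handling left to the MV module is the sketch below. The text only says that the valid MAD interval and valid k values follow from the block position, so the validity rule used here (a candidate is valid when its search block lies entirely inside the previous frame) is an assumption, as are the function names and the dictionary of MAD values.

```python
def valid_offsets(h, v, N, Nh, Nv, p):
    """Offsets (m, n) whose search block lies inside the previous frame."""
    offsets = []
    for m in range(-p, p + 1):
        for n in range(-p, p + 1):
            first = (h - 1) * N + m      # 0-based origin of the search block
            second = (v - 1) * N + n
            if 0 <= first <= (Nh - 1) * N and 0 <= second <= (Nv - 1) * N:
                offsets.append((m, n))
    return offsets


def select_motion_vector(mads, h, v, N, Nh, Nv, p):
    """Pick the smallest MAD among valid offsets; mads maps (m, n) to a MAD."""
    allowed = valid_offsets(h, v, N, Nh, Nv, p)
    return min(allowed, key=lambda mn: mads[mn])


N, Nh, Nv, p = 2, 2, 2, 1
mads = {(m, n): 10 for m in range(-p, p + 1) for n in range(-p, p + 1)}
mads[(-1, -1)] = 0      # smallest value, but outside the frame for block (1, 1)
mads[(1, 0)] = 3        # smallest value among the valid candidates
chosen = select_motion_vector(mads, h=1, v=1, N=N, Nh=Nh, Nv=Nv, p=p)
```

For the corner block (1, 1) only the four offsets with m, n ≥ 0 are valid, so the spurious minimum at (-1, -1) is ignored and (1, 0) is selected.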

Beneficial effect

The circuit structure obtained by mapping the above method has the following advantages:

(1) The utilization of the PEs is close to 100%.

(2) The PE unit has a simple structure and saves 40% to 60% of the hardware resources.

(3) The number of PEs is PEs = N². In common coding standards N is a fixed value, whereas p is preferably variable. With this structure the circuit changes little when p changes; the corresponding work can be accomplished by changing the way the input data of the circuit are organized, so the circuit adapts and scales well.

(4) The number of PEs is moderate and does not change with p, so the scale and the implementation cost of the circuit remain essentially unchanged and the cost-performance ratio is high.

The relation N = 2p need not be satisfied, which gives this structure a wide range of applications and considerable practical value.

Description of drawings:

Fig. 1 is a schematic diagram of the steps of the method of the invention; Fig. 2 is a schematic diagram of step 102 of Fig. 1; Fig. 3 is a structural diagram of the temporary current frame and the temporary previous frame in the method of the invention; Fig. 4 is a schematic diagram of block matching during motion estimation; Fig. 5 is a structural diagram of the block matching unit PE of the invention; Fig. 6 is a structural diagram of the motion estimation circuit of the invention; Fig. 7 is a timing comparison of the signals applied to the pins of the block matching unit PE and its internal signals; and Fig. 8 is a timing diagram of the signals in the motion estimation circuit.

Embodiment:

Embodiment one: this embodiment is described below with reference to Figs. 1 to 3. A motion estimation method carried out in the following steps:

(1) Initialization stage 101: initialize two images that contain no pixel data, the temporary current frame and the temporary previous frame. In each of these two images, every row contains N² pixels and every column contains NhNv(2p+1)² pixels. Along the column direction, each temporary frame is divided into NhNv pixel groups; every group comprises (2p+1)² rows and N²(2p+1)² pixels in total, and the groups are named in order group 1, group 2, ..., group NhNv. With the upper-left corner of the frame as origin, (l, k) denotes the position of a pixel in the temporary previous frame or temporary current frame, where 1 ≤ l ≤ N² and 1 ≤ k ≤ NhNv(2p+1)². In the current frame image and previous frame image used as input, every row contains N × Nh pixels and every column contains N × Nv pixels.

(2) Intermediate data organization stage 102:

Step 1: store the pixel data of the current frame and of the previous frame in the temporary current frame and the temporary previous frame, respectively, determining the storage positions and the storage order as the data are stored. For the current frame image, the reference blocks are taken from the frame in row-scan order, and the pixels within each reference block are taken serially in column-scan order; (l, k) denotes the position at which a pixel is stored in the temporary current frame or the temporary previous frame. The pixels of the first reference block of the current frame are taken from the block in column-scan order and stored, in row order, in the first row of the first pixel group of the temporary current frame. Likewise, the pixels of the second reference block of the current frame are taken from the block in column-scan order and stored, in row order, in the first row of the second pixel group of the temporary current frame. Repeat this process until the pixel data of all reference blocks of the current frame are stored in the first row of the corresponding pixel group of the temporary current frame.

Step 2: in each pixel group of the temporary current frame, only the first row holds data and the other positions are empty. Copy the datum in the first row of each column into the other rows of that column, so that all rows within a pixel group hold identical pixel data. Complete the data copying for the whole temporary current frame in this way.

Step 3: the first search area of the previous frame (the search area corresponding to the first reference block of the current frame) contains (2p+N)² pixels, and the first pixel group of the temporary previous frame has N²(2p+1)² empty positions in which pixels can be stored. Pixel data are taken from the search area of the previous frame in the following order: (1) starting at the first row of the first column of the search area, take out (2p+1) data in column order; (2) starting at the first row of the second column of the search area, take out (2p+1) data in column order; (3) repeat the process in the same way, taking (2p+1) data in column order starting at the first row of the third column, then at the first row of the fourth column, and so on, until (2p+1) data have been taken starting at the first row of column (2p+1) of the search area; (4) store the (2p+1)² data taken from the previous frame in steps (1) to (3), in order, in the first column of the first pixel group of the temporary previous frame; (5) starting at the second row of the first column of the search area, take out (2p+1) data in column order; (6) starting at the second row of the second column of the search area, take out (2p+1) data in column order; (7) repeat the process in the same way, taking (2p+1) data in column order starting at the second row of the third column, then at the second row of the fourth column, and so on, until (2p+1) data have been taken starting at the second row of column (2p+1) of the search area; (8) store the (2p+1)² data taken from the previous frame in steps (5) to (7), in order, in the second column of the first pixel group of the temporary previous frame; (9) repeat the process of (8) in the same way until the first N columns of the first pixel group of the temporary previous frame are filled with data.

Step 4: fill columns N+1 to 2N of the first pixel group of the temporary previous frame with data, using the same method as step 3. In brief: (10) starting at the first row of the second column of the search area, take out (2p+1) data in column order; (11) starting at the first row of the third column of the search area, take out (2p+1) data in column order; (12) repeat the process in the same way until (2p+1) data have been taken starting at the first row of column (2p+2) of the search area; (13) store the (2p+1)² data taken from the previous frame in steps (10) to (12), in order, in column N+1 of the first pixel group of the temporary previous frame; (14) starting at the second row of the second column of the search area, take out (2p+1) data in column order; (15) starting at the second row of the third column of the search area, take out (2p+1) data in column order; (16) repeat the process in the same way until (2p+1) data have been taken starting at the second row of column (2p+2) of the search area; (17) store the (2p+1)² data taken from the previous frame in steps (14) to (16), in order, in column N+2 of the first pixel group of the temporary previous frame; (18) repeat the process of (17) in the same way until the first 2N columns of the first pixel group of the temporary previous frame are filled with data.

Step 5: repeat step 4 until all columns of the first pixel group of the temporary previous frame are filled with data.

Step 6: repeat step 5 until all pixel groups of the temporary previous frame are filled with data.

(3) Motion vector calculation stage 103: Step 1: take out, in row order, the pixel data of the first row of the first pixel group of the temporary current frame and of the temporary previous frame, compute their absolute differences, accumulate the N² resulting values (MAD calculation), and label the result with the row coordinate k (k = 1). Step 2: take out, in row order, the pixel data of the second row of the first pixel group of the temporary current frame and of the temporary previous frame, compute their absolute differences, accumulate the N² resulting values (MAD calculation), and label the result with the row coordinate k (k = 2). Step 3: repeat the process of steps 1 and 2 until the MAD has been calculated for every row of the first pixel group of the temporary current frame and temporary previous frame, labelling these MAD values with the row coordinate k (k = 1, 2, 3, ..., (2p+1)²). Step 4: repeat the process of step 3 until all pixel groups of the temporary current frame and temporary previous frame have been taken out and their MAD values calculated, labelling each calculated value with its row coordinate k (k = 1, 2, 3, ..., (2p+1)², (2p+1)²+1, ..., 2(2p+1)², ..., NvNh(2p+1)²).

(4) Motion vector output stage 104:

The MAD calculation stage produces NvNh(2p+1)² MAD values in total; the data of each pair of pixel groups in the temporary current frame and temporary previous frame produce (2p+1)² MAD values. Among the (2p+1)² MAD values produced from the data of the first pixel group of the temporary current frame and temporary previous frame, find the minimum and output its label k as the motion vector of the first reference block of the current frame. Among the (2p+1)² MAD values produced from the data of the second pixel group of the temporary current frame and temporary previous frame, find the minimum and output its label k as the motion vector of the second reference block of the current frame. Repeat this process until the motion vectors of all reference blocks of the current frame found in this way have been output, at which point the method ends.

Embodiment 2: This embodiment is described below with reference to Fig. 5 and Fig. 6. A motion estimation circuit applying the motion estimation method consists of N² block matching units (PE_1 to PE_{N²}), bus Y1, bus Y2, bus C1, bus C2, N multiplexers (M_1 to M_N), two first-in first-out modules (FIFO1, FIFO2), N-2 multiplexers (ME_2 to ME_{N-1}), N-1 delay registers (Delay_2 to Delay_N), N-1 delay register groups (Delay_{-1} to Delay_{-(N-1)}), N-1 adders (a_2 to a_N), and a motion vector generation unit MV. The block matching units (PE_1 to PE_{N²}) are arranged in an array of N rows and N columns. Bus Y1 is connected to one input of each of the multiplexers (M_1 to M_N), and the other input of each of the multiplexers (M_1 to M_N) is connected to bus Y2. The output of multiplexer M_1 is connected to pin Y_in1 of block matching unit PE_1 and to the input of first-in first-out module FIFO1, and the output of FIFO1 is connected to bus C1. The outputs of multiplexers M_2 to M_{N-1} are connected to pin Y_in1 of block matching units PE_2 to PE_{N-1}, respectively. The output of multiplexer M_N is connected to pin Y_in1 of block matching unit PE_N and to the input of first-in first-out module FIFO2, and the output of FIFO2 is connected to bus C2. Bus C2 is also connected to one input of each of multiplexers ME_2 to ME_{N-1} and to pin Y_in2 of block matching unit PE_N. Bus C1 is also connected to pin Y_in2 of block matching unit PE_1 and to the other input of each of multiplexers ME_2 to ME_{N-1}. The outputs of multiplexers ME_2 to ME_{N-1} are connected to pin Y_in2 of block matching units PE_2 to PE_{N-1}, respectively. Within each column of block matching units, pins MAD_out, Sel_out, X_out and Sellt of the upper block matching unit are connected to pins MAD_in, Sel_in, X_in and Selln, respectively, of the block matching unit below it. Within each row of block matching units, pins Y_out1 and Y_out2 of the block matching unit on the left are connected to pins Y_in1 and Y_in2, respectively, of the block matching unit on its right. Pin MAD_out of block matching unit PE_N is connected to the input of delay register group Delay_{-1}; pin Sel_out of PE_N is connected to the input of delay register Delay_2, whose output is connected to pin Sel_in of block matching unit PE_{N+1}; and pin X_out of PE_N is connected to pin X_in of PE_{N+1}. Pin MAD_out of block matching unit PE_{2N} is connected to one input of adder a_2, the other input of adder a_2 is connected to the output of delay register group Delay_{-1}, and the output of adder a_2 is connected to the input of delay register group Delay_{-2}. Pin X_out of PE_{2N} is connected to pin X_in of block matching unit PE_{KN+1}; pin Sel_out of PE_{2N} is connected to the input of delay register Delay_3, whose output is connected to pin Sel_in of PE_{KN+1}. Pin MAD_out of block matching unit PE_{(K+1)N} is connected to one input of adder a_{N-1}, and the output of adder a_{N-1} is connected to the input of delay register group Delay_{-(N-1)}. Pin Sel_out of PE_{(K+1)N} is connected to the input of delay register Delay_N, whose output is connected to pin Sel_in of block matching unit PE_{N(N-1)+1}; pin X_out of PE_{(K+1)N} is connected to pin X_in of PE_{N(N-1)+1}. Pin MAD_out of block matching unit PE_{N²} is connected to one input of adder a_N, and the output of adder a_N is connected to the input of the motion vector generation unit MV.

Each block matching unit consists of four registers (Reg1 to Reg4), three multiplexers (Mx1 to Mx3), a D flip-flop DEF, two registers (RA1, RA2), two multiplexers (M, MUX), an absolute-difference unit |X-Y|, and an adder MAD. Pin Selln of the block matching unit is connected to pin Sellt, the control input of multiplexer Mx1, the control input of multiplexer Mx2, and the control input of multiplexer MUX. Pin X_in of the block matching unit is connected to one input of multiplexer Mx1 and to one input of multiplexer Mx2. The other input of multiplexer Mx1 is connected to the output of register RA1, one input of multiplexer Mx3, and one input of multiplexer MUX; the output of multiplexer Mx1 is connected to the input of register RA1. The other input of multiplexer Mx2 is connected to the output of register RA2, the other input of multiplexer Mx3, and the other input of multiplexer MUX. The output of multiplexer MUX is connected to pin X_out of the block matching unit. Pin Sel_in of the block matching unit is connected to pin clk of D flip-flop DEF, the control input of multiplexer Mx3, and the input of register Reg3; the output of register Reg3 is connected to pin Sel_out of the block matching unit. Pin Y_in1 of the block matching unit is connected to the input of register Reg1 and to one input of multiplexer M; pin Y_in2 is connected to the input of register Reg2 and to the other input of multiplexer M. The outputs of register Reg1 and register Reg2 are connected to pins Y_out1 and Y_out2 of the block matching unit, respectively. The control input of multiplexer M is connected to pin Q of D flip-flop DEF. The output of multiplexer M is connected to one input of the absolute-difference unit |X-Y|, and the other input of the absolute-difference unit |X-Y| is connected to the output of multiplexer Mx3. The output of the absolute-difference unit |X-Y| is connected to one input of adder MAD, the other input of adder MAD is connected to pin MAD_in of the block matching unit, the output of adder MAD is connected to the input of register Reg4, and the output of register Reg4 is connected to pin MAD_out of the block matching unit.
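The MAD datapath of a single block matching unit described above can be illustrated with a minimal behavioral sketch. It is not the patented circuit: it ignores clocking and the pipeline registers (Reg1 to Reg4), and the select polarities (which value of Q picks Y_in1, which value of Sel_in picks RA1) are assumptions, since the text does not state them. The names `PEState`, `pe_mad_step` and `column_mad` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PEState:
    """Values held inside one block matching unit (hypothetical model)."""
    ra1: int = 0  # reference pixel held in register RA1
    ra2: int = 0  # reference pixel held in register RA2
    q: int = 0    # output Q of flip-flop DEF, drives multiplexer M


def pe_mad_step(state, y_in1, y_in2, sel_in, mad_in):
    """One pass through the |X-Y| unit and adder MAD of a single unit.

    Assumed polarities (not given in the text):
    q == 0 selects Y_in1, sel_in == 0 selects RA1.
    """
    y = y_in1 if state.q == 0 else y_in2         # multiplexer M
    x = state.ra1 if sel_in == 0 else state.ra2  # multiplexer Mx3
    abs_diff = abs(x - y)                        # absolute-difference unit |X-Y|
    return mad_in + abs_diff                     # adder MAD (would be latched in Reg4)


def column_mad(states, ys1, ys2, sel_in):
    """Chain MAD_out to MAD_in down one column of units, timing ignored."""
    mad = 0
    for state, y1, y2 in zip(states, ys1, ys2):
        mad = pe_mad_step(state, y1, y2, sel_in, mad)
    return mad
```

Chaining `pe_mad_step` down a column mirrors the MAD_out to MAD_in connection between vertically adjacent units; in the real array the partial sums of the N columns are then combined by adders a_2 to a_N after the delay register groups.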

Claims

1. A motion estimation method, characterized in that it is carried out by the following steps:

(1) Initialization stage (101): initialize two images containing no pixel data, a temporary current frame and a temporary previous frame. Each row of these two frames contains N² pixels, and each column contains N_h·N_v·(2p+1)² pixels. Along the column direction, the two temporary frames are divided into N_h·N_v pixel groups; each group comprises (2p+1)² rows and N²(2p+1)² pixels in total, and the groups are named, in order down the columns, group 1, group 2, ..., group N_h·N_v. Taking the upper-left corner of the frame as the origin, (l, k) denotes the position of a pixel in the temporary previous frame or temporary current frame, where 1 ≤ l ≤ N² and 1 ≤ k ≤ N_h·N_v·(2p+1)². For the current frame image and the previous frame image supplied as input, each row contains N × N_h pixels and each column contains N × N_v pixels;

(2) Intermediate data organization stage (102): Step 1: store the pixel data of the current frame and of the previous frame in the temporary current frame and the temporary previous frame, respectively, determining the storage positions and the storage order as the data are stored. For the current frame image, the reference blocks are taken from the image serially, in row-scan order across the frame and in column-scan order within each reference block; (l, k) denotes the position at which a pixel is stored in the temporary current frame and the temporary previous frame. The pixels of the first reference block of the current frame are taken from the reference block in column-scan order and stored, in row order, in the first row of the first pixel group of the temporary current frame; likewise, the pixels of the second reference block of the current frame are taken from the reference block in column-scan order and stored, in row order, in the first row of the second pixel group of the temporary current frame; this process is repeated until the pixel data of all reference blocks of the current frame have been stored in the first rows of the respective pixel groups of the temporary current frame.

Step 2: for each pixel group of the temporary current frame, only the first row holds data and the other positions are empty; the data in the first row of each column are copied to the other rows of that column, so that within each pixel group every row holds the same pixel data; data replication for the temporary current frame is completed in this way.

Step 3: the first search area of the previous frame (that is, the search area corresponding to the first reference block of the current frame) contains (2p+N)² pixels in total, and the first pixel group of the temporary previous frame has N²(2p+1)² empty positions in which pixels can be stored. Pixel data are taken from the search area of the previous frame in the following order: (1) starting at the first column, first row of the search area, take (2p+1) data in sequence along the column; (2) starting at the second column, first row of the search area, take (2p+1) data in sequence along the column; (3) and so on, repeating the process starting at the third column, first row, then at the fourth column, first row, ..., until (2p+1) data have been taken in sequence along the column starting at the (2p+1)-th column, first row of the search area; (4) store the (2p+1)² data taken from the previous frame in steps (1), (2) and (3), in row order, in the first column of the first pixel group of the temporary previous frame; (5) starting at the first column, second row of the search area, take (2p+1) data in sequence along the column; (6) starting at the second column, second row of the search area, take (2p+1) data in sequence along the column; (7) and so on, repeating the process starting at the third column, second row, then at the fourth column, second row, ..., until (2p+1) data have been taken in sequence along the column starting at the (2p+1)-th column, second row of the search area; (8) store the (2p+1)² data taken from the previous frame in steps (5), (6) and (7), in row order, in the second column of the first pixel group of the temporary previous frame; (9) and so on, repeating the process of step (8) until the first N columns of the first pixel group of the temporary previous frame are all filled with data.

Step 4: this step fills columns N+1 to 2N of the first pixel group of the temporary previous frame with data; the method is the same as in Step 3 and, briefly, is as follows: (10) starting at the second column, first row of the search area, take (2p+1) data in sequence along the column; (11) starting at the third column, first row of the search area, take (2p+1) data in sequence along the column; (12) and so on, repeating the process until (2p+1) data have been taken in sequence along the column starting at the (2p+2)-th column, first row of the search area; (13) store the (2p+1)² data taken from the previous frame in steps (10), (11) and (12), in row order, in column N+1 of the first pixel group of the temporary previous frame; (14) starting at the second column, second row of the search area, take (2p+1) data in sequence along the column; (15) starting at the third column, second row of the search area, take (2p+1) data in sequence along the column; (16) and so on, repeating the process until (2p+1) data have been taken in sequence along the column starting at the (2p+2)-th column, second row of the search area; (17) store the (2p+1)² data taken from the previous frame in steps (14), (15) and (16), in row order, in column N+2 of the first pixel group of the temporary previous frame; (18) and so on, repeating the process of step (17) until the first 2N columns of the first pixel group of the temporary previous frame are all filled with data.

Step 5: repeat the process of Step 4 until all columns of the first pixel group of the temporary previous frame are filled with data. Step 6: repeat the process of Step 5 until all pixel groups of the temporary previous frame are filled with data;

(3) Motion vector calculation stage (103): Step 1: take out, in column order, the pixel data of the first row of the first pixel group in the temporary current frame and in the temporary previous frame, compute the absolute values of their differences, accumulate the N² resulting values, and label the sum with the row coordinate k=1. Step 2: take out, in column order, the pixel data of the second row of the first pixel group in the temporary current frame and in the temporary previous frame, compute the absolute values of their differences, accumulate the N² resulting values, and label the sum with the row coordinate k=2. Step 3: repeat the process of Steps 1 and 2 until the MAD calculation has been completed for all rows of the first pixel group in the temporary current frame and the temporary previous frame, labeling these MAD values with the row coordinates k = 1, 2, 3, ..., (2p+1)². Step 4: repeat the process of Step 3 until all pixel groups in the temporary current frame and the temporary previous frame have been taken out and their MAD calculations completed, labeling these MAD values with their row coordinates k = 1, 2, 3, ..., (2p+1)², (2p+1)²+1, ..., 2(2p+1)², ..., N_v·N_h·(2p+1)²;

(4) Motion vector value output stage (104):

The MAD calculation stage produces N_v·N_h·(2p+1)² MAD values in total, with each pair of corresponding pixel groups in the temporary current frame and the temporary previous frame producing (2p+1)² MAD values. Among the (2p+1)² MAD values produced from the data of the first pixel groups of the temporary current frame and the temporary previous frame, find the minimum value and output its label k as the motion vector value of the first reference block of the current frame; among the (2p+1)² MAD values produced from the data of the second pixel groups of the temporary current frame and the temporary previous frame, find the minimum value and output its label k as the motion vector value of the second reference block of the current frame; repeat this process until the motion vectors of all reference blocks of the current frame have been found and output in this way, at which point the method ends.

2. A motion estimation circuit applying a motion estimation method, characterized in that it consists of N² block matching units (PE_1 to PE_{N²}), a bus (Y1), a bus (Y2), a bus (C1), a bus (C2), N multiplexers (M_1 to M_N), two first-in first-out modules (FIFO1, FIFO2), N-2 multiplexers (ME_2 to ME_{N-1}), N-1 delay registers (Delay_2 to Delay_N), N-1 delay register groups (Delay_{-1} to Delay_{-(N-1)}), N-1 adders (a_2 to a_N), and a motion vector generation unit (MV); the block matching units (PE_1 to PE_{N²}) are arranged in an array of N rows and N columns; the bus (Y1) is connected to one input of each of the multiplexers (M_1 to M_N), and the other input of each of the multiplexers (M_1 to M_N) is connected to the bus (Y2); the output of the multiplexer (M_1) is connected to pin Y_in1 of the block matching unit (PE_1) and to the input of the first-in first-out module (FIFO1), and the output of the first-in first-out module (FIFO1) is connected to the bus (C1); the outputs of the multiplexers (M_2) to (M_{N-1}) are connected to pin Y_in1 of the block matching units (PE_2) to (PE_{N-1}), respectively; the output of the multiplexer (M_N) is connected to pin Y_in1 of the block matching unit (PE_N) and to the input of the first-in first-out module (FIFO2), and the output of the first-in first-out module (FIFO2) is connected to the bus (C2); the bus (C2) is also connected to one input of each of the multiplexers (ME_2) to (ME_{N-1}) and to pin Y_in2 of the block matching unit (PE_N); the bus (C1) is also connected to pin Y_in2 of the block matching unit (PE_1) and to the other input of each of the multiplexers (ME_2) to (ME_{N-1}); the outputs of the multiplexers (ME_2) to (ME_{N-1}) are connected to pin Y_in2 of the block matching units (PE_2) to (PE_{N-1}), respectively; within each column of block matching units, pins MAD_out, Sel_out, X_out and Sellt of the upper block matching unit are connected to pins MAD_in, Sel_in, X_in and Selln, respectively, of the block matching unit below it; within each row of block matching units, pins Y_out1 and Y_out2 of the block matching unit on the left are connected to pins Y_in1 and Y_in2, respectively, of the block matching unit on its right; pin MAD_out of the block matching unit (PE_N) is connected to the input of the delay register group (Delay_{-1}); pin Sel_out of the block matching unit (PE_N) is connected to the input of the delay register (Delay_2), and the output of the delay register (Delay_2) is connected to pin Sel_in of the block matching unit (PE_{N+1}); pin X_out of the block matching unit (PE_N) is connected to pin X_in of the block matching unit (PE_{N+1}); pin MAD_out of the block matching unit (PE_{2N}) is connected to one input of the adder (a_2), the other input of the adder (a_2) is connected to the output of the delay register group (Delay_{-1}), and the output of the adder (a_2) is connected to the input of the delay register group (Delay_{-2}); pin X_out of the block matching unit (PE_{2N}) is connected to pin X_in of the block matching unit (PE_{KN+1}); pin Sel_out of the block matching unit (PE_{2N}) is connected to the input of the delay register (Delay_3), and the output of the delay register (Delay_3) is connected to pin Sel_in of the block matching unit (PE_{KN+1}); pin MAD_out of the block matching unit (PE_{(K+1)N}) is connected to one input of the adder (a_{N-1}), and the output of the adder (a_{N-1}) is connected to the input of the delay register group (Delay_{-(N-1)}); pin Sel_out of the block matching unit (PE_{(K+1)N}) is connected to the input of the delay register (Delay_N), and the output of the delay register (Delay_N) is connected to pin Sel_in of the block matching unit (PE_{N(N-1)+1}); pin X_out of the block matching unit (PE_{(K+1)N}) is connected to pin X_in of the block matching unit (PE_{N(N-1)+1}); pin MAD_out of the block matching unit (PE_{N²}) is connected to one input of the adder (a_N), and the output of the adder (a_N) is connected to the input of the motion vector generation unit (MV).

3. The motion estimation circuit applying a motion estimation method according to claim 2, characterized in that each block matching unit (PE) consists of four registers (Reg1 to Reg4), three multiplexers (Mx1 to Mx3), a D flip-flop (DEF), two registers (RA1, RA2), two multiplexers (M, MUX), an absolute-difference unit |X-Y|, and an adder (MAD); pin Selln of the block matching unit is connected to pin Sellt, the control input of the multiplexer (Mx1), the control input of the multiplexer (Mx2), and the control input of the multiplexer (MUX); pin X_in of the block matching unit (PE) is connected to one input of the multiplexer (Mx1) and to one input of the multiplexer (Mx2); the other input of the multiplexer (Mx1) is connected to the output of the register (RA1), one input of the multiplexer (Mx3), and one input of the multiplexer (MUX); the output of the multiplexer (Mx1) is connected to the input of the register (RA1); the other input of the multiplexer (Mx2) is connected to the output of the register (RA2), the other input of the multiplexer (Mx3), and the other input of the multiplexer (MUX); the output of the multiplexer (MUX) is connected to pin X_out of the block matching unit (PE); pin Sel_in of the block matching unit (PE) is connected to pin clk of the D flip-flop (DEF), the control input of the multiplexer (Mx3), and the input of the register (Reg3); the output of the register (Reg3) is connected to pin Sel_out of the block matching unit; pin Y_in1 of the block matching unit is connected to the input of the register (Reg1) and to one input of the multiplexer (M); pin Y_in2 of the block matching unit is connected to the input of the register (Reg2) and to the other input of the multiplexer (M); the outputs of the register (Reg1) and the register (Reg2) are connected to pins Y_out1 and Y_out2 of the block matching unit, respectively; the control input of the multiplexer (M) is connected to output pin Q of the D flip-flop (DEF); the output of the multiplexer (M) is connected to one input of the absolute-difference unit |X-Y|, and the other input of the absolute-difference unit |X-Y| is connected to the output of the multiplexer (Mx3); the output of the absolute-difference unit |X-Y| is connected to one input of the adder (MAD), the other input of the adder (MAD) is connected to pin MAD_in of the block matching unit, the output of the adder (MAD) is connected to the input of the register (Reg4), and the output of the register (Reg4) is connected to pin MAD_out of the block matching unit.
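
As an illustration only, the data organization and search of claim 1 can be sketched in software for a single reference block and its search area. The function names `temp_current_group`, `temp_previous_group` and `best_label` are hypothetical, images are plain nested lists indexed `[row][column]` from zero, and the rule that ties go to the smallest label k is an assumption that the claim does not specify. As in the claim, the accumulated sum of absolute differences is used directly as the MAD value.

```python
def temp_current_group(block, p):
    """Stage (2), Steps 1 and 2: one pixel group of the temporary current frame.

    The N x N reference block is read in column-scan order into one row of
    N*N pixels, and that row is repeated (2p+1)**2 times.
    """
    n = len(block)
    row = [block[r][c] for c in range(n) for r in range(n)]
    return [list(row) for _ in range((2 * p + 1) ** 2)]


def temp_previous_group(search, n, p):
    """Stage (2), Steps 3 to 5: one pixel group of the temporary previous frame.

    search is the (2p+N) x (2p+N) search area. Row k (0-based) holds the
    candidate block at column offset dx and row offset dy, with
    k = dx * (2p+1) + dy, read in the same column-scan order.
    """
    w = 2 * p + 1
    group = []
    for dx in range(w):          # move to the next column of the search area
        for dy in range(w):      # run down the column first
            group.append([search[r + dy][c + dx]
                          for c in range(n) for r in range(n)])
    return group


def best_label(cur_group, prev_group):
    """Stages (3) and (4): row-wise MAD values and the 1-based label k of the minimum."""
    mads = [sum(abs(x - y) for x, y in zip(cur_row, prev_row))
            for cur_row, prev_row in zip(cur_group, prev_group)]
    # Assumption: on ties, the smallest k wins.
    k = min(range(len(mads)), key=mads.__getitem__) + 1
    return k, mads
```

Because every row of both groups has the same length N², each MAD value is a plain element-wise comparison of two rows, which is the property the claimed data organization relies on to simplify comparison and motion vector output.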