WO2001008402A2 - Method of block-matching motion estimation with full search in a video sequence and corresponding architecture - Google Patents

Publication number
WO2001008402A2
Authority
WO
WIPO (PCT)
Prior art keywords
block
blocks
sub
macro
data
Prior art date
Application number
PCT/EP2000/003546
Other languages
French (fr)
Other versions
WO2001008402A3 (en)
Inventor
Luca Fanucci
Lorenzo Bertini
Pierpaolo Moio
Sergio Saponara
Original Assignee
Cnr Consiglio Nazionale Delle Ricerche
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cnr Consiglio Nazionale Delle Ricerche
Priority to AU15136/01A (AU1513601A)
Publication of WO2001008402A2
Publication of WO2001008402A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/43 Hardware specially adapted for motion estimation or compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation

Definitions

  • The present invention relates to the field of video communication and, more precisely, to a method for motion estimation in a video sequence by means of a block-matching algorithm with full search. The invention further relates to a low-complexity, high-throughput programmable architecture that carries out this method.
  • Video communication has many applications, among them videotelephony and videoconferencing over ISDN, high-definition digital TV (HDTV), video systems for remote surveillance and for telemedicine, distance learning and teleworking.
  • ISDN Integrated Services Digital Network
  • HDTV high-definition digital TV
  • ISO has developed JPEG for still-image applications and MPEG, in releases 1, 2 and recently 4, for interactive video playback, entertainment-quality video distribution and HDTV.
  • ITU-T has proposed H.261 and its successors H.263, H.263+ and H.26L for videotelephony and videoconferencing applications.
  • These codecs require highly complex hardware, which conflicts with the need to develop low-cost systems, so that resorting to dedicated VLSI architectures becomes indispensable.
  • The essential technique used in the ISO and ITU-T codecs for compressing video signals is motion estimation (ME), which reduces the temporal redundancy present in a video sequence, i.e. between two of its frames.
  • ME Motion Estimation
  • The basic idea of ME by block matching (BMA) is to divide the current frame of the video sequence into blocks and, for each block, to search a suitable search window of the previous frame for the most similar block according to a suitable cost function.
  • BMA block-matching algorithm
  • In figure 1B the N x N candidate block is shown in the central position, where its upper-left point has coordinates (p, p).
  • The matching algorithm consists in computing the SAD (Sum of Absolute Differences) between blocks a and b, defined as follows: if
  • - b(i + n, j + m) is a pixel of candidate block b, where the indexes m and n indicate the differential position of candidate block b in search window 3, i.e. the coordinates of a motion vector MV,
  • The computation is repeated for all the $4p^2$ possible positions of candidate block b in search window 3.
  • The coordinates of the block b that yields the minimum value of the cost function are used for the prediction:
  • The architecture suitable for a VLSI implementation is systolic, with a pipelined data-flow organisation.
  • The data of search window 3 and of the reference block are loaded into a modular computational structure and pass through delay lines made of registers whose main purpose is to ensure the correct timing of those data.
  • Three data loading lines are used, one for reference block a and two for the data of search window 3, for a total of $4\cdot[(2p-1)(N-1)+N]+4N^2$ registers, where N is the characteristic block dimension and p is the maximum displacement in the search window.
  • The use of the search window registers in turn requires $2\cdot[(2p-1)(N-1)+N]+N^2$ multiplexers (MUX) and the related control logic.
  • MUX Multiplexer
  • FS-BMA Block-Matching motion estimation in a video sequence with Full Search
  • video coding standards such as H.263 and MPEG-4, wherein the data flow is new and effective, with a substantial reduction in the complexity of the architecture that implements this method and a high throughput/area efficiency.
  • The reference block has dimension N/2xN/2 and the Motion Vector processor comprises two Minimum Distortion Detection modules with a storage resource, such that one computes the Motion Vector of the N/2xN/2 blocks and the other, for every 4 N/2xN/2 blocks, also computes the MV of the NxN block that they form.
  • The reference block has dimension N/4xN/4 and the Motion Vector processor comprises two Minimum Distortion Detection modules with a storage resource, such that one computes, for every 4 N/4xN/4 blocks, the Motion Vector of the N/2xN/2 block that they form and the other, for every 4 N/2xN/2 blocks, i.e. for every 16 N/4xN/4 blocks, also computes the MV of the NxN block that they form.
  • The architecture according to the invention, having only two data loading lines, together with the organisation of the loaded data, reduces the number of registers required to
  • Figures 1A and 1B show the general principle of computing the MV of an N x N reference block in a search window of dimensions $(2p+N-1)^2$;
  • - figure 1C shows the division, according to the invention, of an NxN macro-block (MB) into four N/2xN/2 sub-blocks (SB);
  • - figure 2 shows the block diagram of the operation of the method according to the invention;
  • - figure 4 shows a functional diagrammatic view of the architecture of figure 3;
  • - figure 6 shows the internal structure of a Processing Element (PE) of figures 4 and 5;
  • - figure 9 details the structure of the double adder module in the Adder Tree of figure 8;
  • - figure 10 shows the general structure of the Motion Vector processor (MVP) of figure 4 ;
  • MVP Motion Vector processor
  • figure 11 details the structure of the mdd_spo module in the MVP of figure 10;
  • figure 12 details the structure of the mdd module in the mdd_spo module of figure 11;
  • figure 13 details the structure of the modmin module in the mdd module of figure 12;
  • figure 14 shows the circuit solution which, applied to the buffering resource, the matrix of Shift Registers SR of figures 3, 4 and 5, allows the dynamic programming of parameter p;
  • figure 15 shows a diagrammatic view of an H.263/MPEG source coder or codec that uses the architecture of figures 3 and 4 as its Motion Estimation (ME) module.
  • The four SBs of every MB are processed in turn by the control structure described hereinafter, and all the related $4p^2$ Sums of Absolute Differences (SAD) are stored in a SAD memory.
  • SAD Sum of Absolute Differences
  • The SAD memory is indicated with the numeral 4 and has a size of $4p^2$ words, p being the dimension of the search window.
  • k sub-block SB l.. ⁇
  • This process implements the following formula, which defines the SAD(i, j) of the NxN block in terms of the SAD_k(i, j) of the individual N/2xN/2 SBs:
  • The SADs of the N/2xN/2 SBs are provided through line 7 for evaluating SADmin and the related MV, according to formula (2) above.
  • After the first three SBs have been processed, and as the SADs of the fourth SB are output, line 8 provides the SADs of the NxN MB, which allow its MV to be computed, again according to formula (2).
  • ME module 100 which comprises:
  • control unit 40 and counters 50.
  • The systolic architecture of the system of figure 3 is shown, whose core, in snake 10, is a two-dimensional array 10a of Processing Elements (PE) 11 linked to one another by the first input line 9x.
  • A second input line, indicated with the numeral 9y, crosses the PEs 11 of each column 11a, 11b, 11c, 11d and, between columns, crosses respectively columns 14a, 14b and 14c of an array 10b of Shift Registers (SR) 14, which forms the buffering resource of the system.
  • SR Shift Register
  • A clock line 12 provides the clock signal to elements 11 and 14 of matrices 10a and 10b.
  • Arrays 10a and 10b, being pipeline-connected by lines 9x and 9y, form a snake structure indicated with 10 in figures 3, 4 and 5.
  • Each PE 11 of figures 4 and 5 has the general structure shown in figure 6 and consists essentially of a unit 110 for computing an absolute difference with carry and of registers 111, 112 and 113, needed to propagate, respectively, the search window data, the SB data and the partial SADs.
  • A maximum threshold value of the SAD is set, beyond which there is no need to make further increments during the partial computation. This is obtained by limiting to a reasonable value the number M of bits of the AD processor 110 that carries the SAD.
  • This procedure indicates to the codec that comprises ME module 100 that the corresponding MB should be intraframe coded (rather than coded by means of the MV).
  • This parameter M, the maximum number of bits of the SAD, is one of the hardware configuration parameters of the architecture.
  • Unit 110 of PE 11 of figure 6 is an Absolute Difference (AD) processor and is shown in figure 7.
  • AD processor module 110 signals, through its preset_out output 115, whether the maximum M-bit value has been reached, propagating that value through the preset of the downstream register.
  • Adder Tree 20 stores the partial sums coming from rows 11' of matrix 10a of PE 11.
  • Figure 9 shows one of the double adder modules 21 of figure 8, which comprises two adders 201 and 202.
  • The SAD(n, m) value of formula (1) above is output by Adder Tree 20 on line 5. In this way parallel processing is obtained by means of matrix 10a of PEs, but with a serial data flow.
  • The Motion Vector processor (MVP) 30 checks whether the SAD(n, m) 5 provided by Adder Tree 20 is less than the minimum of the previous values, which is stored in a corresponding register in the Minimum Distortion Detection (MDD) module 60. If so, MVP 30 updates this register with the new value. At the end of the checking step the registers in MVP 30 contain the minimum SAD and the coordinates (m, n) of the respective MV. Figure 10 shows the structure of the MVP 30 of figure 4.
  • Counter cnt_in 301, suitably synchronised, scans, column after column, all the possible comparison positions contained in the SB search window ($0 \le \mathrm{cnt\_in} \le 4p^2-1$).
  • Counters cnt_in_r 302 and cnt_in_c 303 indicate, respectively, the row number and the column number of the candidate position, under the condition $0 \le \mathrm{cnt\_in\_r}, \mathrm{cnt\_in\_c} \le 2p-1$.
  • A first module mdd_spo 304 receives, from the sad_in 5 input, the ordered succession of the SADs of the SBs along with the above position values, and provides the minimum SAD value and the related MV.
  • The static position, i.e. the one whose MV has null coordinates, is favoured, in accordance with the standard, by the possibility of decreasing its SAD by a fixed value (an input parameter of the architecture) that can be assigned through input sad_sb_in 305.
  • This functionality is obtained (see figures 12 and 13) through the modmin 61 module that is present in the MDD module 60 of Minimum Distortion Detection.
  • The generic SAD of the MB is obtained as the sum of the four related SB SADs.
  • In MVP 30 (figure 10) a dual-port RAM 4 has been provided, capable of storing the partial SAD computation for each of the $4p^2$ search window positions.
  • Memory 4 is scanned sequentially through port b 307 by counter cnt_in 301 to pick up the partial SADs of the MB (sad_stored) and to add them, by means of adder 6, to the SAD of the current SB, subject to a maximum computation threshold.
  • The result of the sum is then written back to memory 4 at the same location, driving the write addresses, port b 307, with the suitably delayed value of cnt_in 301.
  • A mask 308 (and_m) has been provided, capable of zeroing, at adder 6, the value of the partial SAD.
  • Fig. 17 shows the status of SR and PE internal registers during that operation.
  • The shaded PEs and SRs are those in which search area and reference block pixels are correctly aligned, thus providing useful results to the AD, while the others are not.
  • The array operation is divided into a preload phase
  • The PE array is loaded via the x line of Fig. 4 with the $N^2/4$ pixels of the reference block, while the PE and SR matrices are loaded via the y line of Fig. 4 with the first $N^2/4 + (N/2-1)(2p-2)$ pixels of the relevant search area. Both the reference block and the search area are scanned in the typical row-column way.
  • The duration of this preload phase is $N^2/4 + (N/2-1)(2p-2)$ clock cycles, after which the array is ready for BM operation (with reference to the given example, the architecture status relevant to candidate blocks is shown in Fig. 17).
  • The generic PE(i, 0) elements (first column of PEs) compute the ADs
  • (with i = 0, 1, ..., N/2-1) related to the evaluation of SAD(-p, -p), while all the other columns are idle (see Fig. 17).
  • All the aforesaid processing steps have to be performed 2p times to cover the whole search area, before starting the BM computation for the following reference block.
  • the first pixel of the i-th SB search area (i.e. the start of the preload phase for the i-th SB)
  • the PE matrix just one clock cycle after the last pixel of the (i-1)-th SB one.
  • N x N MBs from their corresponding, aforesaid, N/2xN/2 SBs.
  • The proposed architecture is characterized by a continuous input data flow with an overall throughput of $1/T_a$, where $T_a$ is the time required to process the candidate and reference pixels relevant to one N x N MB. $T_a$ amounts to $4(2p + N/2 - 1)^2\,T_{clock}$, $(2p + N/2 - 1)^2$ being the number of pixels relevant to a search area for a [NOT FURNISHED UPON FILING]

Abstract

Method of motion estimation in a video sequence by means of block matching with full search. First, the current video frame (1) of the sequence is divided into a plurality of reference macro-blocks (MB), and each macro-block (MB) is divided into a plurality of sub-blocks (SB). Then a search window (3) is chosen in a video frame (2) preceding the current frame (1), and a SAD (Sum of Absolute Differences) is calculated between the pixels of a first reference sub-block (SB) of the current frame (a) and those of each of the sub-blocks (SB) of equal size (b) in the search window (3). The SADmin is then determined among all the calculated SADs. By repeating the calculation for each further sub-block (SB), the MV of the macro-block (MB) is computed. An architecture for carrying out this search has: two data loading lines (9x, 9y), for the reference block (a) and the candidate block (b); a matrix (10a) of Processing Elements (11) for loading the data of the reference block (a) and comparing them with the data of the candidate block (b); a buffering resource (10b) for adapting the serial input (9y) of the data to their parallel processing (9x, 9y) carried out by the matrix (10a) of PEs (11); an accumulator (20) of the partial sums computed by the matrix (10a) of PEs (11); and a Motion Vector processor (30) for computing the MV of the reference block (a) with respect to the candidate blocks (b).

Description

METHOD OF BLOCK-MATCHING MOTION ESTIMATION WITH FULL SEARCH IN A VIDEO SEQUENCE AND LOW COMPLEXITY / HIGH THROUGHPUT ARCHITECTURE
Field of the invention
The present invention relates to the field of video communication and, more precisely, to a method for motion estimation in a video sequence by means of a block-matching algorithm with full search. The invention further relates to a low-complexity, high-throughput programmable architecture that carries out this method.
Description of the prior art
Video communication has many applications, among them videotelephony and videoconferencing over ISDN, high-definition digital TV (HDTV), video systems for remote surveillance and for telemedicine, distance learning and teleworking.
Multimedia systems that use video communication find a basic limit in the high number of bits needed to represent video signals, which translates into an excessive load on transmission and storage resources. To overcome this limit, which currently prevents a satisfactory development of these systems in the consumer market, it is necessary to resort to video signal compression techniques (see Table 1 for the characteristics of the main image formats).
In this context, international committees of ISO and ITU-T have developed different video coding/decoding standards, or codecs:
ISO has developed JPEG for still-image applications and MPEG, in releases 1, 2 and recently 4, for interactive video playback, entertainment-quality video distribution and HDTV.
ITU-T has proposed H.261 and its successors H.263, H.263+ and H.26L for videotelephony and videoconferencing applications. These codecs require highly complex hardware, which conflicts with the need to develop low-cost systems, so that resorting to dedicated VLSI architectures becomes indispensable.
The essential technique used in the ISO and ITU-T codecs for compressing video signals is motion estimation (ME), which reduces the temporal redundancy present in a video sequence, i.e. between two of its frames.
The basic idea of ME by block matching (BMA) is to divide the current frame of the video sequence into blocks and, for each block, to search a suitable search window of the previous frame for the most similar block according to a suitable cost function. Different block-matching algorithms (BMA) have so far been developed, based on different search strategies: Full Search, Three-Step Search, 2D Logarithmic Search, Conjugate Direction Search, Cross Search, Hierarchical Search and, recently, algorithms of the predictive type.
Among these, the Full Search (FS) algorithm, which performs an exhaustive search in the search window, is the best for obtaining a high quality of the coded image. With reference to figures 1A and 1B, a square block of pixels of dimensions N x N of the current frame 1, called reference block and indicated as block a, is compared with all the blocks of equal dimension of the previous frame 2, called candidate blocks and indicated as block b, in a search window 3. The search window 3 has dimensions $p_h \times p_v$, where $p_h$ is the number of pixels of its horizontal edge and $p_v$ is the number of pixels of its vertical edge; if $p_h = p_v = p$ and N is the number of pixels of the edge of the square N x N block, the possible positions of block b in search window 3 are $4p^2$.
In figure 1B the N x N candidate block is shown in the central position, where its upper-left point has coordinates (p, p).
According to the full-search (FS) technique, the matching algorithm consists in computing the SAD (Sum of Absolute Differences) between blocks a and b, defined as follows: if
- a(i, j) is a pixel of reference block a,
- b(i + n, j + m) is a pixel of candidate block b, where the indexes m and n indicate the differential position of candidate block b in search window 3, i.e. the coordinates of a motion vector MV,
then
\[ SAD(n,m) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| a(i,j) - b(i+n,\, j+m) \right| \tag{1} \]
wherein $-p_h \le m \le p_h - 1$ and $-p_v \le n \le p_v - 1$. Usually $p_h = p_v = p$.
The computation is repeated for all the $4p^2$ possible positions of candidate block b in search window 3. The coordinates of the block b that yields the minimum value of the cost function are used for the prediction:
\[ MV = (m,n)\big|_{SAD_{\min}}, \qquad \text{where } SAD_{\min} = \min_{(m,n)} \left[ SAD(n,m) \right] \tag{2} \]
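Formulas (1) and (2) can be illustrated with a minimal software sketch. This is not the systolic architecture of the invention, only a direct reading of the two formulas; the function names `sad` and `full_search`, the list-of-rows data layout and the toy data are assumptions made for illustration.

```python
def sad(ref, window, top, left):
    """Sum of absolute differences between `ref` (N x N) and the N x N
    candidate block of `window` whose upper-left pixel is (top, left)."""
    n = len(ref)
    return sum(abs(ref[i][j] - window[top + i][left + j])
               for i in range(n) for j in range(n))


def full_search(ref, window, p):
    """Evaluate formula (1) for all 4p^2 displacements, then apply formula (2).

    Assumption: `window` has (2p + N - 1) rows and columns, and the
    zero-displacement candidate has its upper-left pixel at (p, p).
    Returns (SADmin, (m, n)); ties keep the first position scanned.
    """
    best_sad, best_mv = None, None
    for n in range(-p, p):          # row displacement, -p <= n <= p - 1
        for m in range(-p, p):      # column displacement, -p <= m <= p - 1
            cost = sad(ref, window, p + n, p + m)
            if best_sad is None or cost < best_sad:
                best_sad, best_mv = cost, (m, n)
    return best_sad, best_mv


# Toy data: N = 2 and p = 2, so the window is 5 x 5.
ref = [[10, 20], [30, 40]]
window = [[0] * 5 for _ in range(5)]
for i in range(2):
    for j in range(2):
        window[1 + i][3 + j] = ref[i][j]   # block placed at n = -1, m = 1

print(full_search(ref, window, 2))  # (0, (1, -1))
```

The two nested displacement loops make the $4p^2$ candidate positions explicit, each costing $N^2$ absolute differences.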
This exhaustive approach is characterised by a high computational complexity. For example, for the 4CIF video format (at 30 frame/s with N = 16 and p = 16, cases of practical interest in accordance with the ITU-T standard) a computational power of more than $12 \times 10^9$ absolute-difference operations per second is necessary.
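The order of magnitude quoted above can be checked with a few lines of arithmetic. The sketch assumes the 4CIF luminance resolution of 704 x 576 pixels, N = 16, p = 16 and 30 frame/s; the function name `ad_ops_per_second` is hypothetical.

```python
def ad_ops_per_second(width, height, fps, n, p):
    """Absolute-difference operations per second for full search:
    each N x N block is compared at 4p^2 positions, N^2 pixels each."""
    blocks_per_frame = (width // n) * (height // n)
    positions = (2 * p) ** 2            # 4p^2 candidate positions
    return blocks_per_frame * positions * n * n * fps


# Assumed 4CIF luminance size: 704 x 576 pixels.
print(ad_ops_per_second(704, 576, 30, 16, 16))  # 12457082880
```

Under these assumptions the result is about $12.5 \times 10^9$ operations per second, consistent with the figure of more than $12 \times 10^9$.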
In addition to full search (FS), the other algorithms cited have been studied to reduce this computational complexity, at the price, however, of a lower quality of the coded image with respect to FS.
By virtue of the regularity of the FS algorithm and the high data flow required, the architecture suitable for a VLSI implementation is systolic, with a pipelined data-flow organisation. According to this architecture, the data of search window 3 and of the reference block are loaded into a modular computational structure and pass through delay lines made of registers whose main purpose is to ensure the correct timing of those data.
For a plausible application of FS to formats such as QCIF, CIF and 4CIF, architectures are known that are, however, still very complex with respect to the dimension of the search window, such as:
a) Hya Nam and Moon Key Lee, "High-Throughput B-M VLSI Architecture with Low Memory Bandwidth", IEEE Trans. on Circuits and Systems, vol. 45, n. 4, pp. 508-512, Apr. 1998.
b) Luc De Vos and Michael Stegherr, "Parametrizable VLSI Architectures for Full-Search Block-Matching Algorithm", IEEE Trans. on Circuits and Systems, vol. 36, n. 10, pp. 1309-1316, Oct. 1989.
c) Chaur-Heh Hsieh and Ting-Pang Lin, "VLSI Architecture for Block-Matching Motion Estimation Algorithm", IEEE Trans. on Circuits and Systems for Video Technology, vol. 2, n. 2, pp. 169-175, June 1992.
In a) and b) the following aspects are present:
- the heavy use of generic registers for data propagation, which contributes more than anything else to the complexity of the structure;
- the high number of lines for handling the data, which further increases the complexity of the architecture;
- the complex organisation of the data flow, which affects the cost of the hardware resources necessary for handling it.
In the architecture according to a), in particular, three data loading lines are used, one for reference block a and two for the data of search window 3, for a total of $4\cdot[(2p-1)(N-1)+N]+4N^2$ registers, where N is the characteristic block dimension and p is the maximum displacement in the search window. The use of the search window registers in turn requires $2\cdot[(2p-1)(N-1)+N]+N^2$ multiplexers (MUX) and the related control logic.
In the architecture described in b), according to a quadratic array solution, $2N\cdot(2p-1)+7N^2$ registers, $2N\cdot(2p-1)+N^2$ 3-way MUXes and $N^2$ computing elements are used, combined in a network of very complex connections. With a linear array solution, instead, the complexity of the structure is reduced through hardware multiplexing, which however can handle the typical video streams (30 frame/s in CIF standard with p = 16 and N = 16) only at very high working frequencies, with all the related power consumption drawbacks. In particular, the architecture according to b) is strongly limited in handling different standards and frame formats. Some of the problems of a) and b) are overcome in the architecture described in c), in which however, since the parallel processing of NxN macro-blocks is carried out by a matrix of NxN elementary processors, the circuit complexity in the cases of practical interest (N=16, p=16) is still very high.
For the above reasons, the architectures according to the state of the art are not very effective for a consumer market.
Summary of the invention
It is an object of the present invention to provide, in a video communication system, a method of Block-Matching motion estimation in a video sequence with Full Search (hereinafter FS-BMA), according to video coding standards such as H.263 and MPEG-4, wherein the data flow is new and effective, with a substantial reduction in the complexity of the architecture that implements this method and a high throughput/area efficiency.
It is another object of the present invention to provide an architecture for carrying out this method that allows a simple data flow and memory layout in a source coder or codec made according to the international standards.
It is a particular object of the present invention to provide such an architecture that allows the implementation of additional features such as:
- the Advanced Prediction mode (AP) provided in the international standards (H.263, MPEG-4),
- the choice of the MV (Motion Vector) of minimum norm,
- the preference for the block in the central position, to which an MV of null coordinates corresponds,
- the dynamic programming of the search window, and hence the possibility of also implementing half-pixel search,
- the dynamic programming of the search window as a technique for reducing the power consumed,
- hardware parametrisation with respect to the parameters N and p introduced above.
The above objects are achieved by the method according to the present invention, whose characteristic is that the FS-BMA on a macro-block is carried out starting from the FS-BMA of its sub-blocks. Preferably, the following steps are provided:
- in a video sequence, dividing the current video frame into a plurality of macro-blocks;
- partitioning every NxN macro-block into HxH sub-blocks of dimensions N/HxN/H, H being a parameter;
- for each macro-block, choosing a search window in the video frame preceding the current frame;
- computing a SAD between the pixels of a first reference sub-block of the current frame and all the sub-blocks of equal dimension present in the search window;
- determining SADmin among all the computed SADs and computing the MV of the first sub-block on the basis of SADmin;
- repeating the computation of SADmin and of the MV for each sub-block into which said macro-block is divided;
- computing the MV of the macro-block starting from the computations carried out on its sub-blocks;
- repeating the computation of the MV for the other macro-blocks.
Advantageously, every macro-block has square dimension NxN and its sub-blocks are four, also of square dimension N/2xN/2 (H=2). Also advantageously, each macro-block has square dimension NxN and its sub-blocks are 16, of square dimension N/4xN/4 (H=4).
The above objects are also achieved by an architecture for carrying out a block-matching with full search, in which the motion vector of a reference block in the current frame of a video sequence must be determined with respect to a block in a search window of the frame preceding the current frame, characterised in that it comprises:
- two data loading lines, for the reference block and the candidate block respectively;
- a matrix of Processing Elements (PE) for loading the data of the reference block and comparing them with the data of the candidate block;
- a buffering resource for adapting the serial input of the data to their parallel processing carried out by the matrix of Processing Elements;
- an accumulator of the partial sums computed by the matrix of PEs;
- a Motion Vector processor for computing the Motion Vector of said reference blocks with respect to said candidate block.
Advantageously, in the case of H=2, the reference block has dimension N/2xN/2 and the Motion Vector processor comprises two Minimum Distortion Detection modules with a storage resource, such that one computes the Motion Vector of the N/2xN/2 blocks and the other, for every 4 N/2xN/2 blocks, also computes the MV of the NxN block that they form.
Also advantageously, in the case of H=4, the reference block has dimension N/4xN/4 and the Motion Vector processor comprises two Minimum Distortion Detection modules with a storage resource, such that one computes, for every 4 N/4xN/4 blocks, the Motion Vector of the N/2xN/2 block that they form and the other, for every 4 N/2xN/2 blocks, i.e. for every 16 N/4xN/4 blocks, also computes the MV of the NxN block that they form.
The architecture according to the invention, having only two data loading lines, together with the organisation of the loaded data, reduces the number of registers required to
\[ (N/H + 2p - 2)(N/H - 1) + N/H + (N/H)^2 . \]
Since NxN blocks and N/HxN/H blocks can be handled at the same time, the need for computing elements, which are fewer than the registers but individually more complex, is reduced by a factor $H^2$ with respect to the state of the art, and there is no need for the circuitry that the prior art according to a) and b) requires for handling the data flow, i.e. multiplexers (MUX) and related control logic.
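As a numerical illustration, the register-count expression above can be evaluated for the parameter sets mentioned in the text. The expression is read here as (N/H + 2p - 2)(N/H - 1) + N/H + (N/H)^2, which is a reconstruction of a garbled formula, and the number of computing elements is assumed to be (N/H)^2, following the stated reduction by a factor H^2 from N^2. The function names are hypothetical.

```python
def register_count(n, p, h):
    """Registers required, per the (reconstructed) expression
    (N/H + 2p - 2)(N/H - 1) + N/H + (N/H)^2."""
    s = n // h                      # sub-block edge N/H
    return (s + 2 * p - 2) * (s - 1) + s + s * s


def pe_count(n, h):
    """Computing elements, assumed to be one per sub-block pixel: (N/H)^2."""
    return (n // h) ** 2


for n, p, h in [(8, 4, 2), (16, 16, 2)]:
    print(f"N={n} p={p} H={h}: "
          f"{register_count(n, p, h)} registers, {pe_count(n, h)} PEs")
# N=8 p=4 H=2: 50 registers, 16 PEs
# N=16 p=16 H=2: 338 registers, 64 PEs
```

For N = 16 and H = 2, the 64 computing elements compare with the 256 of a full NxN array, i.e. the factor $H^2 = 4$ stated above.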
Brief description of the drawings
Further characteristics and advantages of the method and of the architecture according to the present invention will be made clearer by the following description of a particular embodiment thereof (H=2), given as a non-limiting example, with reference to the attached drawings, wherein:
- figures 1A and 1B show the general principle of computation of the MV of an N x N reference block in a search window of dimensions $(2p + N - 1)^2$;
- figure 1C shows the division according to the invention of an NxN Macro-block (MB) into four N/2xN/2 Sub-Blocks (SB);
- figure 2 shows the block diagram of the operation of the method according to the invention;
- figure 3 details the overall structure of the architecture according to the invention and the connections between the different modules for the case of N=8 (then N/H=4), p = 4;
- figure 4 shows a functional diagrammatical view of the architecture of figure 3;
- figure 5 shows the organisation of the snake of figure 4, consisting of the matrix of PE and SR, for the case of N=8 (then N/H=4), p = 4 and M = 9;
- figure 6 shows the inner structure of a Processing Element (PE) of figures 4 and 5;
- figure 7 details the structure of the AD processor module included in the PE of figure 6;
- figure 8 shows the diagrammatical view circuit of the module of Adder Tree of figure 4 in the case of N=16 (then N/H=8) , = 9;
- figure 9 details the structure of the double adder module in the Adder Tree of figure 8;
- figure 10 shows the general structure of the Motion Vector Processor (MVP) of figure 4;
- figure 11 details the structure of the mdd_spo module in the MVP of figure 10;
- figure 12 details the structure of the mdd module in the mdd_spo module of figure 11;
- figure 13 details the structure of the modmin module in the mdd module of figure 12;
- figure 14 shows the circuit solution which, applied to the buffering resource, i.e. the matrix of Shift Registers SR of figures 3, 4 and 5, allows the dynamical programming of parameter p;
- figure 15 shows a diagrammatical view of a source coder or codec H.263/MPEG that uses the architecture of figures 3 and 4 as a module of Motion Estimation or ME;
- figures 16 and 17 show the search area pixel mapping for the example case N/2=3, p=3;
- figures 18 and 19 show a flow graph for the architecture according to the invention for N=4.
Description of the preferred embodiments
Method of motion estimation
As indicated in figure 1C, in the method according to the invention, in order to carry out a FS-BMA on an NxN macro-block MB, its four N/2xN/2 sub-blocks SBs are considered (as a non-limiting example, with H=2). Starting from this partition, the processing of a MB, i.e. the computation of the minimum SAD and of the corresponding MV, is carried out starting from the results of the FS-BMA obtained for the four SBs.
In this way, if both the MVs relative to the N/2xN/2 SBs and the MV relative to the NxN MB are computed at the same time, the computing resources required by the video standards are reduced by a factor of four.
More precisely, the four SBs of every MB are processed in turn by the processing structure described hereinafter, and all the relative $4p^2$ Sums of Absolute Differences (SAD) are suitably stored in a SAD memory.
With reference to figure 2, which represents a diagrammatical view of the operation of the method, the SAD memory is indicated with the numeral 4 and has a dimension of $4p^2$ words, p being the dimension of the search window (figure 1B).
Memory 4, which is a Dual Port RAM, progressively loads the $SAD_k(i,j)$ ($i = 0 \ldots 2p-1$, $j = 0 \ldots 2p-1$) relative to the k-th sub-block SB ($k = 1 \ldots 4$) coming from line 5. The overall structure of the flow graph of the architecture is given in Figures 18 and 19 for N=4.
More precisely, as indicated in Figure 2, the $SAD_k(i,j)$ on line 5 are summed by adder 6 to the values $\sum_{r=1}^{k-1} SAD_r(i,j)$ coming as output from memory 4 on line 6a, and the result of the sum is loaded in memory 4 through line 6b.
This process implements the following formula, which defines the SAD(i,j) relative to the NxN block as a function of the $SAD_k(i,j)$ relative to the single N/2xN/2 SBs:
$$SAD(i,j) = \sum_{k=1}^{4} SAD_k(i,j)$$
The SADs relative to the N/2xN/2 SBs are provided through line 7 for evaluating the SADmin and the relative MV, according to formula (2) indicated above.
After the first three SBs have been processed, and as the SADs relative to the fourth SB are output, line 8 provides the SADs relative to the NxN MB, which allow its MV to be computed, again according to formula (2).
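The decomposition described above can be sketched in software. The following is a minimal NumPy model, not the hardware data flow: the function names, the frame layout (2-D arrays indexed by row and column) and the convention that displacements run from -p to p-1 in both directions are assumptions made for illustration. The sketch runs a full search on each of the four N/2 x N/2 SBs, accumulates the four SAD maps, and derives the MV of the MB from the accumulated map.

```python
import numpy as np


def sad_map(ref_block, prev_frame, top, left, p):
    """Full-search SAD map for one block located at (top, left).

    Entry [i, j] corresponds to the displacement (i - p, j - p).
    The caller must keep the whole search window inside prev_frame.
    """
    n = ref_block.shape[0]
    ref = ref_block.astype(np.int64)
    sads = np.empty((2 * p, 2 * p), dtype=np.int64)
    for i in range(2 * p):
        for j in range(2 * p):
            r, c = top + i - p, left + j - p
            cand = prev_frame[r:r + n, c:c + n]
            sads[i, j] = np.abs(ref - cand).sum()
    return sads


def best_mv(sads, p):
    """Return (row displacement, column displacement, minimum SAD)."""
    i, j = np.unravel_index(np.argmin(sads), sads.shape)
    return int(i) - p, int(j) - p, int(sads[i, j])


def mb_from_sbs(cur_frame, prev_frame, top, left, n, p):
    """Process the four SBs of an N x N MB and derive the MB result from them."""
    half = n // 2
    acc = np.zeros((2 * p, 2 * p), dtype=np.int64)  # plays the role of the SAD memory
    sb_results = []
    for dr in (0, half):
        for dc in (0, half):
            sb = cur_frame[top + dr:top + dr + half, left + dc:left + dc + half]
            s = sad_map(sb, prev_frame, top + dr, left + dc, p)
            sb_results.append(best_mv(s, p))
            acc += s  # SAD(i,j) = sum over k of SAD_k(i,j)
    return sb_results, best_mv(acc, p), acc
```

Because each SB is searched over the same set of displacements, the accumulated map is identical to the map a direct N x N full search would produce, so the MB motion vector comes at no extra matching cost.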
Structure of the architecture
The novel organisation of the data flow described above is mapped onto the architecture of figure 3, indicated as ME module 100, which comprises:
- an array 10, called snake, shown in detail in figures 4-7, in which the partial sums are computed from the input data arriving through lines 9x and 9y, respectively of reference block a and of candidate block b;
- an Adder Tree 20, detailed in figures 8-9, which receives the partial sums from snake 10 through lines 13;
- a MV processor 30, detailed in figures 10-13, which receives through line 5 the SAD calculated by Adder Tree 20;
- a control unit 40 and counters 50.
With reference to figures 4 and 5, the systolic architecture of the system of figure 3 is shown, whose core, in snake 10, is a two-dimensional array 10a of Processing Elements (PE) 11 linked to one another by means of the first input line 9x.
The PEs 11 are arranged in four columns 11a, 11b, 11c, 11d and in four rows 11' (for N/H=4). A second input line, indicated with the numeral 9y, crosses the PEs 11 of each column 11a, 11b, 11c, 11d and, between each column and the next, crosses respectively columns 14a, 14b and 14c of an array 10b of Shift Registers (SR) 14, which represents the buffering resource of the system. A clock line 12 provides the clock signal to elements 11 and 14 of matrices 10a and 10b.
Arrays 10a and 10b, being pipeline-connected by lines 9x and 9y, form a snake structure indicated with 10 in figure 3 and in figures 4 and 5.
The single PE 11 of figures 4 and 5 has the general structure shown in figure 6 and substantially consists of a unit 110 for computing an absolute difference with carry and of registers 111, 112 and 113, necessary respectively for the propagation of the data of the search window, of the SBs and of the partial SADs. In PE 11 a maximum threshold value of the SAD is set, beyond which, during the partial computation, there is no need to make further increments. This is obtained by limiting to a reasonable value the number M of bits of the AD processor 110 that carries the SAD. This procedure signals to the codec that comprises ME module 100 the opportunity to carry out an intraframe coding of the corresponding MB (rather than coding it by means of the MV). This parameter M, the maximum number of bits of the SAD, is one of the hardware configuration parameters of the architecture.
Unit 110 of PE 11 of figure 6 is an Absolute Difference (AD) processor and is shown in figure 7. AD processor module 110 signals, through the preset_out 115 output, whether the maximum value on M bits has been reached, and propagates it through the preset of the downstream register.
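The saturating behaviour of a PE can be modelled with a few lines of Python. This is a behavioural sketch only: the function names are hypothetical, and it assumes that the saturation value is the all-ones M-bit word and that an incoming preset flag forces the downstream partial SAD to that value.

```python
def pe_step(psum_in: int, a: int, b: int, m_bits: int, preset_in: bool = False):
    """One PE update: add |a - b| to the incoming partial SAD, saturating on M bits.

    Returns (psum_out, preset_out); preset_out is True once the ceiling is reached.
    """
    ceiling = (1 << m_bits) - 1  # assumed saturation value: all M bits set
    if preset_in:
        # An upstream PE already saturated: keep propagating the maximum.
        return ceiling, True
    total = psum_in + abs(a - b)
    if total >= ceiling:
        return ceiling, True
    return total, False


def row_partial_sum(a_row, b_row, m_bits: int):
    """Chain the PEs of one row, as the partial SAD moves from column to column."""
    psum, flag = 0, False
    for a, b in zip(a_row, b_row):
        psum, flag = pe_step(psum, a, b, m_bits, flag)
    return psum, flag
```

With M = 9, for example, a row whose absolute differences add up to more than 511 yields the value 511 with the flag raised, which a codec could interpret as a hint to use intraframe coding.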
The same procedure is used in the Adder Tree module 20 of figures 3 and 4, shown in more detail in figure 8. Adder Tree 20 accumulates the partial sums coming from rows 11' of matrix 10a of PEs 11. Figure 9 shows one of the double adder modules 21 of figure 8, which comprises two adders 201 and 202.
The output value of Adder Tree 20, i.e. the SAD(n,m) under formula (1) described above, is on line 5. In this way a parallel processing is obtained by means of matrix 10a of PEs, but with a serial sequence of the data flow.
In order for the loading of the candidate blocks in PE matrix 10a, upstream of Adder Tree 20, to be carried out correctly, a buffering resource is necessary, which in the architecture according to the invention is embodied by matrix 10b of Shift Registers (SR) 14 of figures 4 and 5. Such an SR matrix requires only the elementary functionality of a flip-flop (D-FF).
Again with reference to figure 4, for every possible pair of coordinates (m,n) in the search window, the Motion Vector Processor (MVP) 30 checks whether the SAD(n,m) provided on line 5 by Adder Tree 20 is less than the previous minimum value, which is stored in a corresponding register in module 60 of Minimum Distortion Detection (MDD). If so, MVP 30 updates this register with the new value. At the end of the processing step the registers in MVP 30 contain the minimum SAD and the coordinates (m,n) of the respective MV. Figure 10 shows the structure of the MVP 30 of figure 4. Counter cnt_in 301, suitably synchronised, scans, column after column, all the possible comparison positions contained in the search window of the SBs ($0 \le cnt\_in \le 4p^2 - 1$). Similarly, counters cnt_in_r 302 and cnt_in_c 303 indicate, respectively, the row number and the column number of the candidate position, under the condition $0 \le cnt\_in\_r,\ cnt\_in\_c \le 2p - 1$.
A first module mdd_spo 304 (detailed in figure 11) receives, from the sad_in 5 input, the ordered succession of the SADs of the SBs along with the above position values, and provides the minimum SAD value and the relative MV.
Preferably, the static position, i.e. the one with an MV of null coordinates, is favoured, in accordance with what is provided for by the standard, through the possibility of decreasing the relative SAD by a fixed value (an input parameter of the architecture) that can be assigned through input sad_sb_in 305. The values provided by counters cnt_in_r 302 and cnt_in_c 303 are used as MV coordinates, and are useful to discriminate, among all the positions for which the minimum SAD is obtained, the one that is nearest to the static position, given by cnt_in_r = cnt_in_c = p. This functionality is obtained (see figures 12 and 13) through the modmin 61 module that is present in the Minimum Distortion Detection (MDD) module 60.
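The minimum search with zero-vector bias and nearest-to-static tie-break can be sketched as follows. This is an illustrative model under stated assumptions: the function name and the `zero_bias` parameter are hypothetical, the biased SAD is clamped at zero, and "nearest" is measured with the L1 norm, since the text does not say which norm the modmin module uses.

```python
def minimum_distortion(sads, p, zero_bias=0):
    """Scan a 2p x 2p SAD grid and return (min_sad, (mv_row, mv_col)).

    sads[r][c] holds the SAD at row counter r and column counter c;
    the static (zero-MV) position is r = c = p.
    """
    best = None  # (sad, distance to static position, r, c)
    for cnt_in in range(4 * p * p):
        # Column-after-column scan: the row counter runs fastest.
        c, r = divmod(cnt_in, 2 * p)
        sad = sads[r][c]
        if r == p and c == p:
            sad = max(0, sad - zero_bias)  # favour the static position
        dist = abs(r - p) + abs(c - p)  # assumed L1 norm of the MV
        if best is None or sad < best[0] or (sad == best[0] and dist < best[1]):
            best = (sad, dist, r, c)
    sad, _, r, c = best
    return sad, (r - p, c - p)
```

The register update happens only on a strictly smaller SAD, or on an equal SAD with a shorter vector, so ties never move the result away from the static position.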
The generic SAD for the MB is obtained as the sum of the relative four SADs of the SBs. To this end, a Dual Port RAM memory 4 has been provided in MVP 30 (figure 10), capable of storing the partial SAD computation for each of the $4p^2$ search window positions.
As also indicated in figure 10, memory 4 is scanned sequentially through port b 307 by counter cnt_in 301 in order to pick up the partial SADs of the MB (sad_stored) and to sum them, by means of adder 6, to the SADs of the current SB, subject to a maximum computation threshold. In the following cycle the result of the sum is then loaded in memory 4 at the same location, driving the write addresses, port b 307, through the value of cnt_in 301 suitably delayed.
Since, for the first of the four SBs, memory 4 does not contain significant data, a mask 308 (and_m) has been provided, capable of zeroing, at adder 6, the value relative to the partial SAD.
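The read-modify-write loop on memory 4, together with the mask for the first SB, can be modelled as a streaming accumulation. The sketch below is behavioural and makes several assumptions: the function name is hypothetical, memory 4 is a plain Python list, the threshold is the all-ones value on `m_bits` bits (16 by default, chosen arbitrarily), and the sums produced while the last SB is streamed are emitted directly as the MB SADs instead of being written back.

```python
def accumulate_mb_sads(sb_sad_streams, p, m_bits=16):
    """Accumulate per-position SADs of successive SBs into the MB SADs.

    sb_sad_streams: one sequence per SB, each holding 4*p*p SADs in scan order.
    Returns the list of MB SADs in the same scan order.
    """
    ceiling = (1 << m_bits) - 1  # assumed maximum computation threshold
    memory = [None] * (4 * p * p)  # contents are not significant before the first SB
    mb_sads = []
    last_k = len(sb_sad_streams) - 1
    for k, stream in enumerate(sb_sad_streams):
        for cnt_in, sad in enumerate(stream):
            # Mask: for the first SB the stored value is forced to zero.
            stored = 0 if k == 0 else memory[cnt_in]
            total = min(stored + sad, ceiling)
            if k == last_k:
                mb_sads.append(total)  # adder output is already the MB SAD
            else:
                memory[cnt_in] = total  # write back at the same location
    return mb_sads
```

Masking the first read avoids a separate clearing pass over the memory between consecutive macro-blocks.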
At the end of the processing step of the third SB, the data written in memory 4 are no longer significant, since the output of adder 6 directly provides the SAD of the MB. This output is therefore sent to a second module mdd_spo 309 that, like the former 304, supplies the minimum SAD and the relative MV of the MB.
Organisation of the data flow
In this section a more detailed description is provided of the ALPHA-B input data flow and of the relevant SAD processing for an N/2 x N/2 reference SB and its corresponding search area (as a non-limiting example, with H=2). We refer to Fig. 16, where the search area and candidate block loading (via the y line) are shown for the example case N/2 = 3 and p = 3. In particular, Fig. 17 shows the status of the SR and PE internal registers during that operation. The shaded PEs and SRs are those in which search area and reference block pixels are correctly aligned, thus providing useful results to the AD, while the others are not.
The array operation is divided into a preload phase (which is necessary to properly align the reference block data with the relevant search area data) and a continuous processing phase. During the preload phase the PE array is loaded via the x line of Fig. 4 with the $N^2/4$ pixels of the reference block, while the PE and SR matrices are loaded via the y line of Fig. 4 with the first $N^2/4 + (N/2 - 1)(2p - 2)$ pixels of the relevant search area. Both the reference block and the search area are scanned in the typical row-column way. The duration of this preload phase is $N^2/4 + (N/2 - 1)(2p - 2)$ clock cycles, after which the array is ready for BM operation (with reference to the given example, the architecture status relevant to the candidate blocks is shown in Fig. 17).
At the end of the preload phase the generic PE(i,0) elements (1st column of PEs) compute the AD $|a(i,0) - b(i-p, 0-p)|$ (with $i = 0, 1, \ldots, N/2-1$) related to the evaluation of SAD(-p,-p), while all the other columns are idle (see Fig. 17). At the next clock cycle the PE(i,1) elements (2nd column) compute the value $psum(i,1) = psum(i,0) + |a(i,1) - b(i-p, 1-p)|$, where psum(i,0) is the AD of the previous column related to SAD(-p,-p), while the following columns ($j = 2, \ldots, N/2-1$) are idle. Note that the presence of the shift registers in the y line has allowed the proper values of the b pixels to be present at that clock cycle in the 2nd column of the array. It is also important to underline that during this clock cycle the PE(i,0) elements (1st column) are not idle, but are calculating the AD $|a(i,0) - b(i-p+1, 0-p)|$ related to the evaluation of SAD(-p+1,-p) (see Fig. 18).
So, after N/2 clock cycles from the end of the preload phase, the PE(i, N/2-1) elements (last column) provide to the Adder Tree the N/2 partial sums
$$\sum_{j=0}^{N/2-1} |a(i,j) - b(i-p, j-p)|$$
(with $i = 0, 1, \ldots, N/2-1$) related to SAD(-p,-p) (see Fig. 17-d). The Adder Tree performs the addition of these partial sums, yielding
$$SAD(-p,-p) = \sum_{i=0}^{N/2-1} \sum_{j=0}^{N/2-1} |a(i,j) - b(i-p, j-p)|.$$
Then, after 2p cycles, all the SAD(n,-p) (with $-p \le n \le p-1$) are ready (see Fig. 17-i). It has to be considered that, before starting the processing of the next column of SAD(n,-p+1) (see Fig. 17-l), N/2-1 idle clock cycles are necessary to skip the partial sums relative to non-valid candidate blocks (see Figs. 17-h, 17-i). However, it is worth noting that the array is continuously filled with new data of the search area, independently of the inner array operation, thus simplifying the ALPHA-B interface with the coder frame memory, as will be detailed later. All the aforesaid processing steps have to be performed 2p times to cover the whole search area, before starting the BM computation for the following reference block. In particular, the first pixel of the i-th SB search area (i.e. the start of the preload phase for the i-th SB) is input to the PE matrix just one clock cycle after the last pixel of the (i-1)-th SB search area.
Obviously, according to (3) and (4), the hardware structure sketched in Fig. 2 and the relevant flow graph (FG) of Figs. 18 and 19 allow for the concurrent elaboration of the N x N MB from its corresponding, aforesaid, N/2 x N/2 SBs.
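The cycle and storage figures quoted in this section can be gathered in one helper for quick what-if evaluation. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, H=2 is assumed throughout, and the shift-register count uses the expression (N/H-1)(2p-2) given in the claims for the buffering resource.

```python
def snake_timing(n: int, p: int) -> dict:
    """Timing and storage figures for an N x N MB split into N/2 x N/2 SBs."""
    half = n // 2
    preload = half * half + (half - 1) * (2 * p - 2)  # N^2/4 + (N/2-1)(2p-2)
    area_pixels = (2 * p + half - 1) ** 2  # pixels of one SB search area
    return {
        "preload_cycles": preload,
        "shift_registers": (half - 1) * (2 * p - 2),
        "sad_positions": 4 * p * p,  # words of the SAD memory
        "search_area_pixels": area_pixels,
        "mb_cycles": 4 * area_pixels,  # T_a expressed in clock cycles
    }


if __name__ == "__main__":
    # Example case used for Figs. 16 and 17: N/2 = 3, p = 3.
    for name, value in snake_timing(6, 3).items():
        print(name, value)
```

For the example case N/2 = 3, p = 3 this gives a preload of 17 cycles and 256 cycles per macro-block, since one search-area pixel enters the array at each clock.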
Summarizing, the proposed architecture is characterized by a continuous input data flow with an overall throughput of $1/T_a$, where $T_a$ is the time required to process the candidate and reference pixels relevant to one N x N MB. $T_a$ amounts to $4(2p + N/2 - 1)^2\,T_{clock}$, $(2p + N/2 - 1)^2$ being the number of pixels relevant to a search area for a NOT FURNISHED UPON FILING

Claims

1. Method of block-matching motion estimation with full search in a video sequence, characterised in that said block-matching with full search on a macro-block (MB) is carried out starting from the block-matching with full search relative to a plurality of sub-blocks (SB) of the former.
2. Method of motion estimation according to claim 1, wherein the following steps are provided:
- in a video sequence, division of the current video frame (1), which forms part of said sequence, into a plurality of reference macro-blocks (MB);
- partition of each macro-block (MB) into a plurality of sub-blocks (SB);
- for each macro-block, choosing a search window (3) in a video frame (2) processed previously with respect to the current frame (1);
- computation of a Sum of Absolute Differences (SAD) between the pixels of a first reference sub-block (SB) of the current frame (a) and all the sub-blocks (SB) of equal dimension (b) present in the search window (3);
- computing the SADmin among all the calculated SADs and computing the motion vector (MV) of the first sub-block (SB) on the basis of said SADmin;
- repeating the computation of the SADmin and of the motion vector (MV) for each further sub-block (SB) into which said macro-block (MB) is divided;
- computing the MV of the macro-block (MB) starting from the computation carried out for the respective sub-blocks (SB);
- repeating the computation of the MV for other macro-blocks and sub-blocks.
3. Method of motion estimation according to claims 1 or 2, wherein said macro-block has square dimension NxN and its sub-blocks (SB) are HxH in number and have square dimension N/HxN/H.
4. Method according to the previous claims, wherein the central position of said search window (3) corresponds to the MV of null coordinates.
5. Architecture for carrying out a block-matching with full search, wherein the motion vector (MV) of a reference block (a) present in the current frame (1) of a video sequence is determined with respect to a block (b) present in a search window (3) of the frame (2) processed previously with respect to the current frame (1), characterised in that it comprises:
- two respective loading lines (9x, 9y) for the data of the reference block (a) and of the candidate block (b);
- a matrix (10a) of Processing Elements (11) for loading the data of said reference block (a) and comparing them with the data of said candidate block (b);
- a buffering resource (10b) for adapting the serial input (9y) of the data to their parallel processing (9x, 9y) performed by the matrix (10a) of Processing Elements (11);
- an accumulator (20) of the partial sums computed by the matrix (10a) of PEs (11);
- a Motion Vector processor (30) for computing the Motion Vector (MV) of said reference block (a) with respect to said candidate blocks (b).
6. Architecture according to claim 5, wherein said reference block (a) has dimension N/HxN/H (with H>1) and said Motion Vector processor (30) comprises two modules (60) of Minimum Distortion Detection with their storage resource, one of which allows calculating the Motion Vector (MV) of the N/2xN/2 blocks and the other of which, for every 4 blocks N/2xN/2, also calculates the MV of the NxN block they form.
7. Architecture according to claim 5 or 6, wherein said search window has dimension p and said buffering resource (10b) comprises means for dynamically programming the value of said dimension p within a range [1, pmax].
8. Architecture according to claim 7, wherein said buffering resource (10b) is implemented by means of a chain of Shift Registers (14).
9. Architecture according to claim 8, wherein said means for dynamical programming of parameter p comprise (N/H-1) Multiplexers (18) and control means for said Multiplexers suitable for modifying the useful length of the (N/H-1) chains of SR (14) of said buffering resource (10b) as seen from said matrix (10a) of PEs (11).
10. Architecture according to claim 5 or 6, wherein said buffering resource (10b) has a structure based on suitably controlled RAM memories.
11. Architecture according to claim 7, wherein the dimension of said buffering resource (10b) is (N/H-1)(2p-2).
12. Architecture according to claim 5 or 6, wherein a pipelined organisation of the data flow coming from said two loading lines (9x, 9y) is provided.
13. Architecture according to claim 5, wherein said matrix (10a) of PEs (11) implements a cost function chosen among SAD, MAD and MSE for Block Matching algorithms.
14. Architecture according to claim 6, wherein said modules of Minimum Distortion Detection (60) calculate, for equal minimum cost function, the MV of minimum norm.
15. Architecture according to claim 6, wherein the internal data flow consists, for every N/HxN/H block, of the alternation of a Preload step lasting $(N/H)^2 + (N/H-1)(2p-2)$ clock cycles and of a step NOT FURNISHED UPON FILING
PCT/EP2000/003546 1999-04-19 2000-04-19 Method of block-matching motion estimation with full search in a video sequence and corresponding architecture WO2001008402A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU15136/01A AU1513601A (en) 1999-04-19 2000-04-19 Method of block-matching motion estimation with full search in a video sequence and low complexity/high throughput architecture

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITPI990025 IT1309846B1 (en) 1999-04-19 1999-04-19 MOTION ESTIMATION METHOD IN A VIDEO SEQUENCE BY BLOCK-MATCHING TECHNIQUE WITH FULL SEARCH AND LOW PROGRAMMABLE ARCHITECTURE
ITPI99A000025 1999-04-19

Publications (2)

Publication Number Publication Date
WO2001008402A2 true WO2001008402A2 (en) 2001-02-01
WO2001008402A3 WO2001008402A3 (en) 2003-04-17

Family

ID=11394388

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2000/003546 WO2001008402A2 (en) 1999-04-19 2000-04-19 Method of block-matching motion estimation with full search in a video sequence and corresponding architecture

Country Status (3)

Country Link
AU (1) AU1513601A (en)
IT (1) IT1309846B1 (en)
WO (1) WO2001008402A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074533B (en) * 2023-04-06 2023-08-22 湖南国科微电子股份有限公司 Motion vector prediction method, system, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0639032A2 (en) * 1993-08-09 1995-02-15 C-Cube Microsystems, Inc. Structure and method for a multistandard video encoder/decoder
US5477278A (en) * 1991-12-24 1995-12-19 Sharp Kabushiki Kaisha Apparatus for detecting motion of moving picture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FANUCCI L ET AL: "High-throughput, low complexity, parametrizable VLSI architecture for full search block matching algorithm for advanced multimedia applications" ICECS'99. PROCEEDINGS OF ICECS '99. 6TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (CAT. NO.99EX357), ICECS'99. PROCEEDINGS OF ICECS'99. 6TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, PAFOS, CYPRUS, 5-, pages 1479-1482 vol.3, XP002164569 1999, Piscataway, NJ, USA, IEEE, USA ISBN: 0-7803-5682-9 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1315381A1 (en) * 2001-09-06 2003-05-28 Nokia Corporation A method for performing motion estimation in video encoding, a video encoding system and a video encoding device
US7031389B2 (en) 2001-09-06 2006-04-18 Nokia Corporation Method for performing motion estimation in video encoding, a video encoding system and a video encoding device
US7486733B2 (en) 2001-09-06 2009-02-03 Nokia Corporation Method for performing motion estimation in video encoding, a video encoding system and a video encoding device
GB2400260B (en) * 2003-03-31 2006-08-23 Duma Video Inc Video compression method and apparatus
US7519115B2 (en) 2003-03-31 2009-04-14 Duma Video, Inc. Video compression method and apparatus

Also Published As

Publication number Publication date
AU1513601A (en) 2001-02-13
IT1309846B1 (en) 2002-02-05
ITPI990025A1 (en) 2000-10-19
WO2001008402A3 (en) 2003-04-17

Similar Documents

Publication Publication Date Title
De Vos et al. Parameterizable VLSI architectures for the full-search block-matching algorithm
US5719642A (en) Full-search block matching motion estimation processor
EP1120747A2 (en) Motion estimator
US6687303B1 (en) Motion vector detecting device
KR101578052B1 (en) Motion estimation device and Moving image encoding device having the same
US20040151392A1 (en) Image encoding of moving pictures
KR20030007087A (en) Motion estimation apparatus and method for scanning a reference macroblock window in a search area
JPH04294469A (en) Correlative device
US20090016634A1 (en) Half pixel interpolator for video motion estimation accelerator
KR100270799B1 (en) Dct/idct processor
KR100416444B1 (en) Motion vector selection method and image processing device performing this method
US5636152A (en) Two-dimensional inverse discrete cosine transform processor
US20020080880A1 (en) Effective motion estimation for hierarchical search
Baglietto et al. Parallel implementation of the full search block matching algorithm for motion estimation
US5870500A (en) Method for processing data in matrix arrays in a motion estimation system
CN101778280B (en) Circuit and method based on AVS motion compensation interpolation
WO1996004733A1 (en) System and method for inverse discrete cosine transform implementation
WO2001008402A2 (en) Method of block-matching motion estimation with full search in a video sequence and corresponding architecture
Fanucci et al. A parametric VLSI architecture for video motion estimation
Baek et al. A fast array architecture for block matching algorithm
KR100437177B1 (en) Moving amount estimating device
US20050089099A1 (en) Fast motion estimating apparatus
Goel et al. An efficient data reuse motion estimation engine
US6668087B1 (en) Filter arithmetic device
Hsia et al. Very large scale integration (VLSI) implementation of low-complexity variable block size motion estimation for H. 264/AVC coding

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP