CN1396762A

CN1396762A - Motion estimation apparatus and method for scanning a reference macroblock window in a search area

Info

Publication number: CN1396762A
Application number: CN02122743A
Authority: CN
Inventors: 赵真显; 卢亨来; 李润泰; 全炳宇
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2001-07-09
Filing date: 2002-06-10
Publication date: 2003-02-12
Anticipated expiration: 2022-06-10
Also published as: CN1297134C; KR20030007087A; JP2003125415A; GB2378345A; GB0213247D0; KR100486249B1; GB2378345B

Abstract

A motion estimation technique compares a current macroblock with different reference macroblocks in a reference frame search area. A motion vector for the current macroblock is derived from the reference macroblock most closely matching the current macroblock. To reduce the number of instructions required to load new reference macroblocks, overlapping portions between reference macroblocks are reused and only nonoverlapping portions are loaded into a memory storage device.

Description

Motion estimation apparatus and method for scanning a reference macroblock window in a search area

Technical field

The present invention relates to a motion estimation apparatus and method. More particularly, the present invention relates to a motion estimation apparatus and method for scanning a reference macroblock window in a search area.

Background technology

This application claims priority from Korean Patent Application No. 2001-40904, filed on July 9, 2001, the contents of which are incorporated herein by reference in their entirety.

Video encoders generate bit streams that comply with international video compression standards such as H.261, H.263, MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21. These standards are widely used in fields such as storage, Internet-based image services, entertainment, digital broadcasting and portable video terminals.

Video compression standards use motion estimation techniques that divide the current frame into a plurality of macroblocks (MBs). The difference between a current MB and each of the reference MBs present in a search area of a reference frame is calculated. The reference MB most similar to the current MB is regarded as the "matching block" and is selected from the search area. A motion vector for the current MB, which indicates the phase difference between the current MB and the matching block, is encoded. The phase difference is the difference in position between the current MB and the matching block. Because only the motion vector of the current MB is transmitted, a smaller amount of data needs to be transmitted or stored.

Figure 1 shows the relationship between a current MB and the search area. In the Quarter Common Intermediate Format (QCIF), a frame consists of 176 × 144 pixels, so the current frame 2 contains 99 current MBs, each current MB 10 consisting of 16 × 16 pixels. The motion vector of the current MB 10 is calculated in the reference frame 4. The search area 12 in the reference frame 4 consists of 48 × 48 pixels.
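The sizes quoted above can be checked with simple arithmetic. The following sketch is illustrative only: the function names are hypothetical, and the number of candidate positions is derived here from the stated dimensions, assuming the reference MB window moves in one-pixel steps. It is not a figure quoted in the text.

```python
# Illustrative arithmetic only; both helper names are hypothetical.

def macroblocks_per_frame(width, height, mb_size):
    """Number of non-overlapping mb_size x mb_size macroblocks in a frame."""
    return (width // mb_size) * (height // mb_size)


def candidate_positions(search_size, mb_size):
    """Number of reference-MB window positions in a square search area,
    assuming the window is shifted one pixel at a time."""
    per_axis = search_size - mb_size + 1
    return per_axis * per_axis


qcif_mbs = macroblocks_per_frame(176, 144, 16)  # 11 columns * 9 rows = 99
full_search = candidate_positions(48, 16)       # 33 * 33 = 1089
print(qcif_mbs, full_search)
```

Under the same assumption, an 8 × 8 MB in a 24 × 24 search area has 17 positions per axis.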

Within the search area 12, the 16 × 16 reference MB most similar to the current MB 10 is identified as the matching block. The difference between the current MB and a reference MB can be calculated in various ways, for example using the mean absolute difference (MAD), the mean absolute error (MAE) or the sum of absolute differences (SAD). SAD is the most widely used because it requires only subtraction and addition operations.
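As a minimal sketch of these matching measures, the code below computes the SAD of two equally sized blocks, and a MAD on the assumption that MAD is simply the SAD divided by the number of pixels. The function names and the list-of-rows block representation are illustrative choices, not part of the original text.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two blocks given as lists of rows."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def mad(block_a, block_b):
    """Mean absolute difference, assumed here to be SAD / number of pixels."""
    pixel_count = sum(len(row) for row in block_a)
    return sad(block_a, block_b) / pixel_count


current = [[1, 2], [3, 4]]
reference = [[2, 2], [1, 7]]
print(sad(current, reference), mad(current, reference))
```

Because the division in MAD is by a constant for a fixed block size, both measures rank candidate blocks identically, which is consistent with the preference for the cheaper SAD.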

Fig. 2 illustrates a basic full search in which pixels 10_1 and 14_1 are loaded into 32-bit registers 15 and 17, respectively. An arithmetic logic unit (ALU) 30 then calculates the SAD. Before the ALU 30 compares the current MB 10 with the reference MB 14a, both are stored in memory and loaded into the 32-bit registers 15 and 17 one pixel at a time. Each of the reference MBs 14a, 14b, 14c, etc. present in the search area 12 is compared with the current MB 10 pixel by pixel.

This straightforward estimation method provides high accuracy. However, the large number of calculations limits the transfer rate. The method is also unsuitable for real-time decoding on general-purpose central processing units (CPUs) with limited processing power, such as the CPUs used in handheld personal computers (PCs).

A fast search algorithm (not shown) calculates the SAD by comparing the current MB with only a limited number of reference MBs in the search area. Compared with the full search method described above, such a fast search algorithm significantly reduces the number of calculations. However, it also degrades image quality.

A method for calculating the SAD quickly has been developed for use with the full search method. Using a single instruction multiple data (SIMD) method, the SAD of a plurality of pixels can be calculated simultaneously. The reduced number of operations improves the transfer rate.

Fig. 3 illustrates SAD calculation using a SIMD unit. Eight pixels 10_8 of the current MB 10 and eight pixels 14_8 of the reference MB 14a are loaded into 64-bit registers 16 and 18, respectively. A SIMD machine 20 simultaneously calculates the SAD of the eight pixels loaded into the 64-bit registers 16 and 18. Unlike a typical full search algorithm, in which the SAD is calculated separately for each pixel, the SIMD technique calculates the SAD of a plurality of pixels in parallel.
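The eight-pixels-per-register idea can be modelled in software. The sketch below is only an emulation: it packs eight 8-bit pixels into one 64-bit integer as a stand-in for registers 16 and 18, and the function names `pack8` and `sad8` are hypothetical. A real SIMD unit would produce the result with a single instruction rather than a Python loop.

```python
def pack8(pixels):
    """Pack eight 8-bit pixel values into one 64-bit integer
    (a software stand-in for a 64-bit register)."""
    if len(pixels) != 8:
        raise ValueError("expected exactly eight pixels")
    return int.from_bytes(bytes(pixels), "little")


def sad8(reg_a, reg_b):
    """SAD of the eight pixels held in two packed 64-bit values.
    Emulates what a SIMD unit would compute in one operation."""
    lanes_a = reg_a.to_bytes(8, "little")
    lanes_b = reg_b.to_bytes(8, "little")
    return sum(abs(a - b) for a, b in zip(lanes_a, lanes_b))


current_reg = pack8([10, 20, 30, 40, 50, 60, 70, 80])
reference_reg = pack8([12, 18, 30, 45, 50, 61, 70, 70])
print(sad8(current_reg, reference_reg))
```

With one such operation per eight pixels, a 16 × 16 MB needs 32 register-pair comparisons instead of 256 per-pixel ones.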

The amount of calculation varies with the direction in which the next MB is shifted in the search area 12. As shown in Fig. 3, each time the next MB is selected by a horizontal shift, eight pixels of the current MB 10 and of the reference MB 14 must be accessed from memory and loaded into registers 16 and 18. This large number of memory accesses increases both the time required to derive the motion vector and the power consumption.

Because of the large number of memory accesses and the high power consumption that accompanies them, these conventional motion estimation methods are unsuitable for mobile environments. The present invention addresses this and other problems associated with the prior art.

Summary of the invention

A motion estimation technique compares a current macroblock with different reference macroblocks in a search area of a reference frame. The motion vector of the current macroblock is derived from the reference macroblock that most closely matches the current macroblock. To reduce the number of instructions required to load a new reference macroblock, the overlapping portions between reference macroblocks are reused, and only the non-overlapping portions are loaded into a memory storage device.

Description of drawings

The foregoing and other objects, features and advantages of the invention will become more apparent from the following detailed description of preferred embodiments of the invention, given with reference to the accompanying drawings.

Fig. 1 is a prior-art schematic diagram illustrating how a motion vector is derived.

Fig. 2 is a prior-art schematic diagram illustrating a conventional method of performing a motion vector search using the full search method and the sum of absolute differences (SAD).

Fig. 3 is a prior-art schematic diagram illustrating a conventional method of performing a motion vector search using the single instruction multiple data (SIMD) method.

Fig. 4 is a block diagram of a system that performs motion estimation according to the present invention.

Fig. 5 is a schematic diagram of the decimation filters.

Fig. 6 is a schematic diagram illustrating a current macroblock after decimation and the corresponding search area.

Fig. 7 is a schematic diagram illustrating how two register groups are used according to the present invention.

Fig. 8 illustrates how the reference macroblock is shifted in the search area according to the present invention.

Fig. 9 is a flow chart illustrating how a motion vector is identified according to the present invention.

Figures 10A-10D are charts comparing the instruction counts of different motion estimation techniques.

Figures 11A-11D illustrate further differences between conventional motion estimation methods and the motion estimation method according to the present invention.

Figure 12 compares the vertical scanning technique according to the present invention with other scanning techniques and illustrates the differences in memory access.

Figure 13 conceptually illustrates part of the difference calculation unit 110 of Fig. 4.

Embodiment

The present invention provides efficient motion estimation that reduces the number of memory accesses by reusing common registers when scanning reference MBs in the search area.

Fig. 4 is a block diagram of a motion estimation system according to a preferred embodiment of the present invention. The motion estimation system comprises a current frame (C/F) 100, a first register group 102, a difference calculation unit 110, a search area (S/A) 104, a second register group 106 and a controller 108. The first and second register groups 102 and 106 store the pixels of one macroblock (MB) of the current frame 100 and of one macroblock of the search area 104, respectively. In one example, the size of an MB is 16 × 16 pixels, and each of the first and second register groups 102 and 106 can store a 16 × 16 array of pixels. The controller 108 may be implemented in software or hardware.

Fig. 5 illustrates a preprocessing step performed with 4:1 decimation filters. An n:1 decimation filter is applied to the current frame 100 (Fig. 4) to reduce the hardware resources required. In Fig. 5 the current frame is represented by the input frame 130. The frame 130 is divided into four decimated frames a, b, c and d by four 4:1 decimation filters 126a, 126b, 126c and 126d, and the decimated frames are stored in a frame memory 128. A video signal output from a charge-coupled device (CCD) image capture unit 120 is converted into a digital signal by an analog-to-digital converter (ADC) 122. The signal output from the ADC 122 is an RGB signal. A preprocessor 124 converts the RGB signal into a YCbCr signal. In one embodiment, the decimation filters 126 decimate only the Y signal.

Decimation filter 126a is applied to the type-a pixels of the input frame 130, decimation filter 126b to the type-b pixels, decimation filter 126c to the type-c pixels, and decimation filter 126d to the type-d pixels. After decimation, the decimated frames a, b, c and d are stored in the frame memory 128.

As a result of the 4:1 decimation of the input frame 130, the size of an MB is reduced to 8 × 8 pixels. The search area 104 is decimated at the same ratio as the current frame 130. For example, 4:1 decimation reduces a 48 × 48 pixel search area to 24 × 24 pixels. Fig. 6 illustrates a current MB 140 after 4:1 decimation and its corresponding search area 150.
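The 4:1 decimation step can be sketched as follows. The text does not spell out which pixels are of type a, b, c and d, so this sketch assumes they are the top-left, top-right, bottom-left and bottom-right pixels of each 2 × 2 cell. That mapping, the function name and the list-of-rows frame representation are all assumptions made for illustration.

```python
def decimate_4_to_1(frame):
    """Split a frame (list of pixel rows) into four subsampled frames.

    ASSUMPTION: types a, b, c, d are the four positions of each 2 x 2 cell
    (top-left, top-right, bottom-left, bottom-right). The source does not
    state this mapping explicitly.
    """
    phases = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
    return {
        name: [row[dx::2] for row in frame[dy::2]]
        for name, (dy, dx) in phases.items()
    }


# A 16 x 16 macroblock whose pixel value encodes its position.
macroblock = [[16 * y + x for x in range(16)] for y in range(16)]
decimated = decimate_4_to_1(macroblock)
print(len(decimated["a"]), len(decimated["a"][0]))
```

Each of the four outputs holds one quarter of the pixels, so a 16 × 16 MB becomes 8 × 8 and a 48 × 48 search area becomes 24 × 24, as stated above.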

For convenience of explanation, the current frame is described as one of the four decimated frames a, b, c and d produced by the 4:1 decimation filters of Fig. 5. Each MB in the current frame 100 is 8 × 8 pixels in size, and the search area 104 after 4:1 decimation is 24 × 24 pixels in size.

The first register group 102 (Fig. 4) stores one current MB of the current frame 100, and the second register group 106 stores one reference MB of the search area 104. The first and second register groups 102 and 106 store pixels in the predetermined order indicated by the circled numbers in Fig. 7. In each of the first and second register groups 102 and 106, the order of calculation is determined for each group of eight pixels.

Fig. 7 illustrates the structure and loading order of the first and second register groups 102 and 106 of Fig. 4. The first register group 102 stores the current MB and comprises registers that each store eight pixels. These registers are designated in a predetermined order from 0 to 7. The second register group 106 comprises registers that each store eight pixels and are designated in a predetermined order from 8 to 15. To calculate the difference between the current MB stored in the first register group 102 and the reference MB stored in the second register group 106, the SAD and the motion vector MV of the current block are calculated with the following formulas.

SAD(dx, dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_{k}(m, n) - I_{k-1}(m + dx, n + dy) \right|

(MV_x, MV_y) = \arg\min_{(dx, dy) \in R^{2}} SAD(dx, dy)

Here, I_{k}(m, n) is the pixel value of the k-th frame at position (m, n). The motion vector (MV_x, MV_y) represents the displacement of the current block relative to the matching block in the reference frame.
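The two formulas can be sketched as an exhaustive search. In this sketch frames are lists of rows indexed as `frame[n][m]`, the search range is taken to be displacements from -r to r on each axis, and candidates that would fall outside the reference frame are skipped. These conventions and the function names are assumptions made for illustration and are not specified in the text.

```python
def block_sad(cur, ref, x, y, dx, dy, n):
    """SAD(dx, dy) for the n x n block whose top-left pixel is (x, y)."""
    return sum(
        abs(cur[row][col] - ref[row + dy][col + dx])
        for row in range(y, y + n)
        for col in range(x, x + n)
    )


def full_search(cur, ref, x, y, n, r):
    """Return (minimum SAD, (MVx, MVy)) over displacements in [-r, r] x [-r, r]."""
    height, width = len(ref), len(ref[0])
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            inside = (0 <= x + dx and x + dx + n <= width
                      and 0 <= y + dy and y + dy + n <= height)
            if not inside:
                continue  # candidate block would leave the reference frame
            e = block_sad(cur, ref, x, y, dx, dy, n)
            if e < best_sad:
                best_sad, best_mv = e, (dx, dy)
    return best_sad, best_mv


# Toy data: the current frame equals the reference frame shifted by (2, 1).
ref = [[7 * col + 13 * row for col in range(12)] for row in range(12)]
cur = [[ref[min(row + 1, 11)][min(col + 2, 11)] for col in range(12)]
       for row in range(12)]
result = full_search(cur, ref, x=4, y=4, n=4, r=2)
print(result)
```

On this toy data the search recovers the displacement (2, 1) with a SAD of zero.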

The difference calculation unit 110 (Fig. 4) calculates the differences of eight pixels simultaneously using the single instruction multiple data (SIMD) method of Fig. 3.

Figure 13 conceptually illustrates the difference calculation unit 110 of Fig. 4. The absolute difference between each pixel of each register 142 of the first register group 102 and the corresponding pixel of each register 144 of the second register group 106 is stored in a register 132. For example, the absolute difference between 142a and 144a is stored in 132a, and the absolute difference between 142b and 144b is stored in 132b. To calculate the sum of the absolute differences between 142 and 144, a single internal sum instruction, shown in the dash-dot box of Figure 13, adds together the differences stored in the register 132.

As shown in the dash-dot box of Figure 13, executing an internal sum instruction uses only a plurality of adders. In the conventional method, the values are summed with add and shift instructions, which requires extra cycles compared with this method. Thus, eight internal sum instructions are needed to completely calculate the match between a decimated current MB and a decimated reference MB.
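The two stages of the difference calculation unit can be modelled as below. The text says only that the internal sum uses a plurality of adders; arranging them as a pairwise adder tree is an assumption of this sketch, and all function names are hypothetical.

```python
def abs_diff_register(reg_cur, reg_ref):
    """Per-pixel absolute differences of two eight-pixel registers
    (models the contents of register 132)."""
    return [abs(c - r) for c, r in zip(reg_cur, reg_ref)]


def internal_sum(diff_register):
    """Model of one internal sum instruction.

    ASSUMPTION: the adders form a pairwise tree; the source only states
    that the instruction uses adders and nothing else.
    """
    values = list(diff_register)
    if not values:
        return 0
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)  # pad so every adder has two inputs
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def mb_difference(cur_group, ref_group):
    """E_MB for an 8 x 8 MB: one internal sum per register pair (eight in total)."""
    return sum(
        internal_sum(abs_diff_register(c, r))
        for c, r in zip(cur_group, ref_group)
    )


cur_group = [[1] * 8 for _ in range(8)]
ref_group = [[0] * 8 for _ in range(8)]
print(mb_difference(cur_group, ref_group))
```

For an 8 × 8 MB held in eight registers, `mb_difference` invokes `internal_sum` eight times, mirroring the eight internal sum instructions mentioned above.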

Once the SADs of all pixels of the current MB 10 and the reference MB 14 have been calculated, the internal sum of the reference MB 14a is obtained by adding the SADs of the pixels together. After the internal sums of all reference MBs in the search area 12 have been calculated, the reference MB with the smallest internal sum is identified as the matching block, and the result is output as the MB difference (E_MB) of Fig. 4. The controller 108 of Fig. 4 controls how the reference MB window is shifted in the search area 104 using a SIMD scanning method that reduces the number of memory accesses.

Figure 12 illustrates in more detail some of the differences between conventional scanning methods and the scanning method according to the present invention. In a full search using a conventional scanning method, the next reference block is obtained by shifting the current reference block by one pixel in the horizontal or vertical direction, as shown in Figures 12_1 and 12_2, respectively. In these cases, most of the pixels in the reference block currently being compared overlap with the pixels of the next reference block to be compared.

For the horizontal scan shown in Figure 12_1, only the rightmost region of the next register group 106'_2 contains pixels that are new relative to the register group 106'_1. Likewise, for the vertical scan shown in Figure 12_2, only the bottom region of the next register group 106''_2 contains pixels that are new relative to the current register group 106''_1. Even though only the edge region contains new pixels, memory accesses are performed for the entire reference macroblock 106.

The vertical scan for the SIMD scheme according to the present invention is illustrated in Figure 12_3. Only the new pixels 106_2 are loaded from main memory into the second register group 106 of Fig. 4. As shown in Fig. 7, the second register group 160b reuses the overlapping pixels stored in register areas 9 to 15 of the first register group 160a. The pixel values of the new row are loaded only into the first register area 8 of the second register group 160b. The first register area 8 then moves down to the last position in the second register group 160b. The other register areas 9 to 15, which store the pixel rows that overlap with the next reference block, move up by one position in sequence. For example, register area 9 moves to the first position, register 10 to the second position, register 11 to the third position, and so on.

Shifting the reference MB in this way requires only one memory access for each vertical shift of one row, namely to read the new row of non-overlapping pixels from the search area 104 (Fig. 4). Because the entire 8 × 8 pixel array of the next reference MB does not need to be read from memory, the number of memory accesses needed to scan the search area 104 is reduced.
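The register reuse described above can be modelled with a rotating queue. In this sketch a `collections.deque` stands in for register areas 8 to 15, and `row_loads` counts simulated memory accesses, one per eight-pixel row. The class name, its attributes and the counting convention are illustrative assumptions.

```python
from collections import deque


class ReferenceWindow:
    """Software model of the second register group for one column of the scan."""

    def __init__(self, search_area, column, mb_size=8):
        self.area = search_area
        self.column = column
        self.mb_size = mb_size
        self.row_loads = 0  # simulated memory accesses (one per row)
        self.top = 0        # search-area row held in the first position
        self.registers = deque(self._load_row(r) for r in range(mb_size))

    def _load_row(self, row):
        self.row_loads += 1
        return self.area[row][self.column:self.column + self.mb_size]

    def shift_down(self):
        """Shift the window one row down, loading only the new bottom row."""
        new_row_index = self.top + self.mb_size
        if new_row_index >= len(self.area):
            raise IndexError("window is already at the bottom of the search area")
        # The new row overwrites the first register area ...
        self.registers[0] = self._load_row(new_row_index)
        # ... which then becomes the last position; the others move up by one.
        self.registers.rotate(-1)
        self.top += 1


area = [[24 * r + c for c in range(24)] for r in range(24)]
window = ReferenceWindow(area, column=0)
for _ in range(16):
    window.shift_down()
print(window.row_loads)
```

Under this model, scanning one column of a 24 × 24 search area costs 8 + 16 = 24 row loads, whereas reloading the whole 8 × 8 window at each of the 17 positions would cost 17 × 8 = 136.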

Fig. 8 illustrates the shifting of the reference MB in the search area 104. The reference MB window is scanned vertically under the control of the controller 108 of Fig. 4, shifting vertically by one pixel row at a time. Although a vertical window shift is described here, the same technique can be applied to a horizontal window shift. A horizontal shift can be used when the pixels of the vertical columns of the current and reference frames are stored at consecutive locations in memory.

As described above, when registers capable of storing MB data are used and the reference MB window is shifted vertically in the search area, the overlapping pixels between the current reference MB and the next reference MB are reused. This reduces the number of memory accesses the controller 108 needs to scan the search area. The current MB is stored in the first register group, and the current reference MB is stored in the second register group.

Fig. 9 is a flow chart illustrating the SIMD scanning scheme of the present invention in more detail. In step 170, the current frame and the reference frame are decimated at a ratio of n:1. For convenience of explanation, n = 4 in this embodiment. The parameter HS indicates the position of the last column of the first reference MB in the search area, the parameter VS indicates the position of the last row of the first reference MB in the search area, and the parameter DCM identifies one of the four decimated frames.

Here, the first reference MB is the upper-left MB in the search area, and the first parameter HS and the second parameter VS of the first reference MB are zero. In step 172, the parameters HS, VS and DCM are initialized to zero, and the minimum difference E_MIN is initialized to as large a value as possible, for example infinity.

Identification numbers 0, 1, 2 and 3 are assigned to the four decimated frames, respectively. In step 174, the parameter DCM is compared with the value 4 to determine whether motion estimation for the last decimated frame has been completed. If motion estimation for the last decimated frame has not been completed, the current MB is loaded into the first register group 140 (see Fig. 7) in step 176.

In step 178, it is determined whether the parameter HS is less than 17. When HS is not less than 17, motion estimation for the last column (HS16) of the search area has been completed. HS is reset to zero in step 192, and DCM is incremented by 1 in block 198 to move on to the next decimated frame. The process then returns to step 174.

If motion estimation for HS16 has not been completed, it is determined in step 180 whether the parameter VS is less than 17. If VS is less than 17, a pipelined sequence of operations is performed in steps 182 and 184. Only the last row VS1 of the reference MB is loaded in step 182 (see Fig. 8). If motion estimation up to the last row has not been completed, that is, if the reference MB window has not been shifted to the last row VS16, the reference MB is loaded into the second register group 160a in step 182. The difference between the current MB and the reference MB is calculated in step 184.

In this case, the new row VS1 in the vertical direction is stored in the register area at the first register position. For example, the new row of non-overlapping pixels of the next reference MB is loaded into register 8 ($register 8) of the second register group 160b. The other register areas, namely register 9 ($register 9) to register 15 ($register 15), move up by one position in sequence. That is, the second register group 160b of Fig. 7 reuses the pixels stored in register areas register 9 ($register 9) to register 15 ($register 15). Therefore, only the pixels of the new row VS1 (Fig. 8) are accessed from memory and stored in register area register 8 ($register 8) of the second register group 160b.

In step 184, the difference between the MBs loaded into the first and second register groups 140 and 160 of Fig. 7 is calculated. In step 186, the MB difference E_MB is compared with the minimum difference E_MIN. If E_MB is less than E_MIN, E_MIN is set to E_MB in step 188. If E_MB is not less than E_MIN, the current E_MIN is kept, and the parameter VS is incremented by 1 in step 190. Steps 180 to 190 are then repeated until the vertical scan of the reference MB reaches the last row VS16 (Fig. 8).

If it is determined in step 180 that the second parameter VS is not less than 17, as a result of scanning the last row VS16, VS is reset to zero in step 200. In step 202, the parameter HS is incremented by 1, and the process returns to step 178. In other words, the reference MB window is shifted one pixel position to the right. Steps 180 to 190 are then repeated.

After the reference MB window has been shifted to the last column HS16 in the horizontal direction, that is, if it is determined in step 178 that HS is not less than 17, the first parameter HS is reinitialized to zero in step 192. In step 198, the DCM parameter is incremented by 1 and the process returns to step 174. Incrementing DCM means that motion estimation is performed for another decimated frame.

When motion estimation has been completed for all decimated frames, that is, if it is determined in step 174 that DCM is not less than 4, the reference MB with the minimum difference is identified as the matching block in step 204. Motion estimation for the current frame is completed by repeating the above process for all MBs of the current frame.
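The flow of Fig. 9 can be sketched as three nested loops. This is one reading of the flow chart, not a definitive implementation: it assumes that each decimated frame supplies its own 8 × 8 current MB and 24 × 24 search area, that VS is incremented after every comparison, and that the result is reported as the minimum difference together with the (DCM, HS, VS) position where it occurred. The function name and return format are hypothetical.

```python
def estimate(current_mbs, search_areas, mb_size=8, positions=17):
    """Scan every decimated frame column by column, reusing overlapping rows."""
    e_min, best = float("inf"), None                     # step 172
    for dcm in range(len(current_mbs)):                  # step 174: DCM < 4
        cur = current_mbs[dcm]                           # step 176
        area = search_areas[dcm]
        for hs in range(positions):                      # step 178: HS < 17
            # First window of the column: all rows must be loaded once.
            window = [row[hs:hs + mb_size] for row in area[:mb_size]]
            for vs in range(positions):                  # step 180: VS < 17
                if vs > 0:                               # step 182: new row only
                    new_row = area[vs + mb_size - 1][hs:hs + mb_size]
                    window[0] = new_row                  # overwrite first register
                    window.append(window.pop(0))         # move it to the last place
                e_mb = sum(                              # step 184: SAD
                    abs(c - r)
                    for cur_row, ref_row in zip(cur, window)
                    for c, r in zip(cur_row, ref_row)
                )
                if e_mb < e_min:                         # steps 186 and 188
                    e_min, best = e_mb, (dcm, hs, vs)
                # step 190: VS += 1 (handled by the loop)
            # steps 200 and 202: VS reset, HS += 1 (handled by the loops)
        # steps 192 and 198: HS reset, DCM += 1 (handled by the loops)
    return e_min, best                                   # step 204
```

Calling `estimate` with four 8 × 8 current MBs and four 24 × 24 search areas visits 4 × 17 × 17 candidate positions while loading a full window only once per column.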

As described above, the first and second register groups store the current MB and the reference MB, respectively. The reference MB window is shifted vertically in the search area to perform motion estimation, and the overlapping pixels between the current reference MB and the next reference MB are reused. As a result, fewer (load/store) instructions are needed to load the next reference MB into the second register group. Fast motion estimation can thus be achieved with lower power consumption.

Figures 10a to 10d illustrate the advantages of the present invention over conventional motion estimation methods. Figure 10a shows the instruction count of a conventional motion estimation method without decimation (that is, the full search algorithm). In the conventional method of Figure 10a, memory access instructions account for 26.2% of the total instruction count, and the remaining 73.8% is used for non-memory-access instructions. Figure 10a corresponds to Fig. 2, in which the reference MB is shifted horizontally in the search area and motion estimation is performed using the SAD of each pixel. Figure 10b shows the total instruction count of a conventional motion estimation method that uses decimation. Figure 10c shows the total instruction count of a conventional motion estimation method that uses decimation and SIMD.

Figure 10d shows the total instruction count of the motion estimation method of the present invention. For the three cases illustrated in Figures 10b to 10d, the percentages 27.0%, 1.6% and 0.9% are the respective memory access instruction counts relative to the conventional motion estimation method of Figure 10a. Accessing only the non-overlapping portions with vertical scanning is clearly an efficient technique for reducing the memory access count.

Figure 11 illustrates the total number of clock cycles required to derive the 99 minimum SADs of two frames in Quarter Common Intermediate Format (QCIF). In Figure 11, 11a corresponds to Figure 10a, 11b to Figure 10b, 11c to Figure 10c, and 11d to Figure 10d. Accessing the non-overlapping portions with the vertical scanning scheme doubles performance relative to the conventional motion estimation method that uses ordinary SIMD.

The scanning technique described above can be implemented with a single instruction multiple data (SIMD) unit or a very long instruction word (VLIW) unit for comparing the current macroblock with reference macroblocks. The macroblock matching scheme may be the mean absolute difference (MAD), mean absolute error (MAE) or sum of absolute differences (SAD) scheme. The method for selecting the next reference macroblock may be a fast algorithm or a full search algorithm. Of course, other single-instruction/multiple-data units, matching schemes and search algorithms can also be used.

The invention can be implemented in a general-purpose digital computer by running a program from a computer-usable medium, including but not limited to storage media such as magnetic storage media (for example, ROM, floppy disks, hard disks, etc.), optically readable media (for example, CD-ROM, DVD, etc.) and carrier waves (for example, transmission over the Internet). The computer-usable medium can be stored and executed in distributed computer systems connected by a network.

The system described above can use dedicated processor systems, microcontrollers, programmable logic devices or microprocessors that perform some or all of the operations. Some of the operations described above may be implemented in software and others in hardware.

For convenience, the operations are described as various interconnected functional blocks or distinct software modules. This is not necessary, however, and there may be cases in which these functional blocks or modules are equivalently aggregated into a single logic device, program or operation with unclear boundaries. In any event, the functional blocks and software modules, or the described features, can be implemented by themselves or in combination with other operations, in either hardware or software.

Having described and illustrated the principles of the invention in a preferred embodiment thereof, it should be apparent that the invention can be modified in arrangement and detail without departing from such principles. All modifications and variations coming within the spirit and scope of the following claims are claimed.

Claims

1. An image processing apparatus, comprising:

a first storage unit for storing a current macroblock;

a second storage unit for storing a first reference macroblock;

a calculation unit for calculating the difference between the contents stored in the first storage unit and the second storage unit; and

a controller for loading a second reference macroblock into the second storage unit by replacing the non-overlapping portion of the first reference macroblock with the non-overlapping portion of the second reference macroblock.

2. The image processing apparatus as claimed in claim 1, wherein the result of the calculation unit is used to determine a motion vector.

3. The image processing circuit as claimed in claim 1, wherein the calculation unit comprises a single instruction multiple data (SIMD) unit.

4. The image processing apparatus as claimed in claim 1, wherein the overlapping portion of the first reference macroblock and the second reference macroblock in the second storage unit is reused when the calculation unit calculates the difference between the first storage unit and the second storage unit.

5. The image processing apparatus as claimed in claim 1, wherein the first storage unit comprises a plurality of registers each storing a group of pixel values of the current macroblock, and the second storage unit comprises a plurality of registers each storing a group of pixel values of the first reference macroblock.

6. The image processing apparatus as claimed in claim 5, wherein the calculation unit simultaneously compares the group of pixel values stored in each register of the first storage unit with the group of pixel values stored in each register of the second storage unit.

7. The image processing apparatus as claimed in claim 5, wherein each of the plurality of registers in the first storage unit stores a row or a column of the current macroblock, and each of the plurality of registers in the second storage unit stores a row or a column of the first reference macroblock.

8. The image processing apparatus as claimed in claim 1, wherein the non-overlapping portion of the second reference macroblock is loaded from a memory device into the second storage unit.

9. The image processing apparatus as claimed in claim 1, wherein the controller loads the second reference macroblock into the second storage unit by moving the first register position in the second storage unit, which stores the non-overlapping portion, to the last register position, and moving the other registers in the second storage unit, which store the overlapping portion of the first reference macroblock, in sequence.

10. The image processing apparatus as claimed in claim 1, comprising a preprocessor that decimates a current frame into a plurality of decimated current frames and decimates a reference frame into a plurality of decimated reference frames.

11. The image processing apparatus as claimed in claim 1, wherein the controller and the calculation unit are implemented in software or hardware.

12. The image processing apparatus as claimed in claim 5, wherein the calculation unit comprises:

a third storage unit for storing the absolute differences between each pixel of each register of the first storage unit and each pixel of each register of the second storage unit; and

a summing circuit for deriving the sum of the absolute differences stored in the third storage unit.

13. The image processing apparatus as claimed in claim 12, wherein the summing circuit comprises only a plurality of adders.

14. The image processing apparatus as claimed in claim 12, wherein a single internal sum instruction causes the summing circuit to generate the sum of all the absolute differences stored in the third storage unit.

15. A motion estimation method, comprising:

loading a current macroblock;

loading a current reference macroblock;

comparing the current macroblock with the current reference macroblock; and

loading a next reference macroblock by replacing the non-overlapping portion of the loaded current reference macroblock with the non-overlapping portion of the next reference macroblock.

16. The method as claimed in claim 15, comprising reusing the overlapping portion of the current reference macroblock to compare the next reference macroblock with the current macroblock.

17. The method as claimed in claim 15, comprising:

loading, with one instruction, a group of non-overlapping pixels from the next reference macroblock into an identified register that currently contains the pixels of the non-overlapping portion of the current reference macroblock; and

reusing the pixels in the other registers that overlap with the next reference macroblock.

18. The method as claimed in claim 17, comprising loading the identified register from a memory that stores the reference frame.

19. The method as claimed in claim 17, comprising moving the order of the identified register, which stores the non-overlapping portion of the next reference macroblock, to the last register position, and moving the order of the other registers up.

20. The method as claimed in claim 15, comprising simultaneously comparing each group of pixel values of the loaded current macroblock with each group of pixel values of the loaded current reference macroblock.

21. The method as claimed in claim 20, wherein each group of pixel values comprises a row or a column of the current macroblock, or a row or a column of the current reference macroblock.

22. The method as claimed in claim 15, comprising using a single instruction multiple data (SIMD) unit or a very long instruction word (VLIW) unit to compare the current macroblock with the current reference macroblock.

23. The method as claimed in claim 15, comprising using a macroblock matching scheme to compare the current macroblock with the current reference macroblock.

24. The method as claimed in claim 23, wherein the macroblock matching scheme is the mean absolute difference (MAD), mean absolute error (MAE) or sum of absolute differences (SAD).

25. The method as claimed in claim 15, comprising using a fast algorithm or a full search algorithm to select the next reference macroblock.

26. The method as claimed in claim 15, comprising:

decimating a current frame into a plurality of decimated current frames;

decimating a reference frame into a plurality of decimated reference frames;

selecting the current macroblock from a decimated current frame;

shifting the selected current macroblock within a search area of a decimated reference frame to identify the reference macroblock most similar to the current macroblock; and

deriving a motion vector for the identified reference macroblock.

27. The method as claimed in claim 20, comprising:

storing the absolute differences between each group of pixel values of the loaded current macroblock and each group of pixel values of the loaded current reference macroblock; and

deriving the sum of the absolute differences.

28. The method as claimed in claim 27, comprising deriving the sum of the absolute differences using only adders.

29. The method as claimed in claim 28, comprising generating the sum of all the absolute differences using a single internal sum instruction.