CN100502511C

CN100502511C - Method for organizing interpolation image memory for fractional pixel precision prediction

Info

Publication number: CN100502511C
Application number: CN 200410076759
Authority: CN
Inventors: 罗忠; 王静; 宋彬; 常义林
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2004-09-14
Filing date: 2004-09-14
Publication date: 2009-06-17
Anticipated expiration: 2024-09-14
Also published as: CN1750659A

Abstract

This invention provides a memory organization method for efficient image storage according to pixel attributes. In fractional-pixel-precision motion prediction, the pixels of the interpolated image are divided into an integer-position subset, a 1/2-position subset, a 1/4-position subset, and so on up to a 1/2ⁿ-position subset; these subsets are finally divided into 2²ⁿ sub-images, each the same size as the original image, and the sub-images are stitched into one contiguous memory area and stored in memory. The invention also provides a method that uses the single-instruction multiple-data (SIMD) acceleration of a digital signal processor (DSP) for the interpolation filtering that generates the interpolated image, and a method for quickly calculating the motion-estimation cost function SAD (sum of absolute differences).

Description

Interpolation image memory organization method for fractional-pixel-precision motion prediction

Technical field

The present invention relates to fractional-pixel-precision motion prediction algorithms in video compression coding, and more particularly to an interpolation image memory organization method for fractional-pixel-precision motion prediction in video compression coding, a method based on this memory organization for generating the fractional pixels of a 4x interpolated image, and a method based on these two methods for quickly calculating the prediction error metric SAD.

Background technology

The video compression coding standards that currently dominate the industry, whether the international standards H.263, H.263+, H.264 and MPEG-4 or the Chinese standard AVS (Advanced Audio-Video System), are all based on a common framework: image blocking + motion prediction + DCT of the residual image (also called integer transform or Hadamard transform) + quantization + entropy coding. Motion prediction exploits the correlation between successive frames of a moving image: a region in an earlier frame is used to predict the corresponding region, after motion, in a later frame, and the resulting residual image is quantized and entropy coded. This makes full use of the statistical correlation between frames to eliminate redundancy and so achieves data compression. Motion prediction is therefore the core of the video compression coding standards built on this common framework and the most important factor affecting overall compression efficiency.

The general process of motion prediction is as follows: for a given region (such as a macroblock, MB) in the current video frame, the reference frame (the previous frame, or one of the k previous frames when multiple reference frames are used) is searched for the matching region that is optimal under some error criterion. The change in geometric position of the predicted region relative to the reference region can be represented by a two-dimensional vector, called the motion vector or displacement vector. The process of searching for the optimal motion vector under a particular error criterion is called motion estimation and is one part of motion prediction as a whole. The efficiency of motion prediction depends on the residual image between the predicted region and the predicting region: the smaller the residual, the higher the efficiency. Furthermore, the efficiency of motion prediction is in fact determined by the precision of motion estimation, and prediction precision depends directly on the motion vector.

Images in digital video are the result of discretely sampling and digitizing analog video in time and space: sampling in time forms the discrete frames, and sampling in space forms the pixels within a frame. Within a frame, pixels are obtained by sampling the spatially continuous analog image at a certain sampling interval, so the distance between two adjacent pixels is the sampling interval. To represent motion vectors more accurately, the concept of fractional sample position motion prediction is introduced. With fractional sample positions, fractional-position pixels can be imagined between two adjacent integer pixels, such as 1/2 pixels (1/2 sampling interval from an integer pixel), 1/4 pixels (1/4 sampling interval from an integer pixel) and 1/8 pixels (1/8 sampling interval from an integer pixel). Practice has shown that fractional sample position motion prediction considerably improves video compression coding efficiency; with 1/4-pixel-precision motion prediction, the PSNR (peak signal-to-noise ratio) of compressed video can generally improve by 2 dB. At present, H.263 and H.263+ use 1/2-pixel-precision motion prediction, while H.264 and the Chinese AVS standard use 1/4-pixel-precision motion prediction.

The general process of fractional sample position motion prediction is: first use an integer-pixel motion estimation algorithm, such as full search, three-step search, new three-step search or four-step search, to obtain the optimal integer-pixel motion vector; then perform 1/2-pixel motion estimation around that integer-pixel motion vector position to find the optimal 1/2-pixel motion vector position; if 1/4-pixel motion estimation is required, take that optimal 1/2-pixel position as the center and perform 1/4-pixel motion estimation around it; likewise, once the optimal 1/4-pixel motion vector position has been obtained, 1/8-pixel motion estimation can be performed.

Take the standard implementation of 1/4-pixel-precision motion prediction specified in H.264/AVC as an example. As shown in Fig. 1, the shaded circle is the current macroblock position, the unshaded circles are integer pixel positions, the triangles are 1/2 pixel positions, the dots are 1/4 pixel positions, and the arrows indicate the path followed during the motion search. The motion vector is the vector formed by the differences in the x and y directions between the position of the current macroblock and that of its reference macroblock, for example [2, -3]. In the H.264/AVC standard, the search for 1/4-pixel-precision motion prediction can be divided into three steps:

1) use a motion estimation method to find the best integer-pixel match position;

2) among the best integer-pixel match position and the eight 1/2 pixel positions around it, find the best 1/2-pixel match position;

3) among the best 1/2-pixel match position and the eight 1/4 pixel positions around it, find the best 1/4-pixel match position.
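The three-step refinement above can be sketched as follows. This is a minimal illustration, not the standard's reference implementation: the function names `refine` and `quarter_pel_search`, the convention of expressing motion vectors in quarter-sample units, and the toy cost function are all assumptions made for the example. In a real encoder the cost would be the SAD of the candidate block.

```python
def refine(center, step, cost):
    """Return the cheapest of `center` and its 8 neighbours at distance `step`.

    Positions are (x, y) tuples in quarter-sample units (an assumed convention).
    """
    cx, cy = center
    candidates = [(cx + dx * step, cy + dy * step)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return min(candidates, key=cost)


def quarter_pel_search(integer_best, cost):
    """Steps 2) and 3): refine an integer-pixel match to 1/2, then 1/4 precision."""
    half_best = refine(integer_best, 2, cost)   # 8 surrounding 1/2-pixel positions
    return refine(half_best, 1, cost)           # 8 surrounding 1/4-pixel positions


# Toy cost: distance to a hypothetical true displacement of (2.25, -2.75) pixels.
target = (9, -11)
toy_cost = lambda mv: abs(mv[0] - target[0]) + abs(mv[1] - target[1])

# Integer-pixel result [2, -3] from step 1), expressed in quarter-sample units.
best = quarter_pel_search((8, -12), toy_cost)
```

Each refinement evaluates only nine candidates, so the cost of the fractional search is dominated by how quickly each candidate block can be fetched from the interpolated reference image.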

In the above process, the match search over integer pixel positions uses the locally decoded reconstructed image of an earlier frame as the reference image. The match search over 1/2 and 1/4 pixel positions uses the interpolated version of that locally decoded reconstructed image as the reference image; the width and height of this reference image are both 4 times those of the original image.

The structure of the 4x reference image is shown in Fig. 2. Its pixels fall into the following classes:

1) Integer pixels: pixels whose row and column coordinates are both integer multiples of the sampling interval, such as the shaded circle pixels A, B, C, D, E, F, G, H, I, J, K, L, etc. in Fig. 2.

2) 1/2 pixels: pixels for which at least one of the row and column coordinates has the form (k+1/2)d or (k-1/2)d, and neither coordinate has the form (k+1/4)d or (k-1/4)d, where k is an integer and d is the sampling interval. As shown in Fig. 2, the 1/2 pixels are further divided into two subclasses:

A. Full 1/2 pixels: both the row and the column coordinate have the form (k+1/2)d or (k-1/2)d, such as pixels j, gg, hh, etc.

B. Semi 1/2 pixels: only one of the row and column coordinates has the form (k+1/2)d or (k-1/2)d, for example pixels b, h, m, s, aa, bb, cc, dd, ee, ff, etc.

3) 1/4 pixels: pixels for which at least one of the row and column coordinates has the form (k+1/4)d or (k-1/4)d, where k is an integer and d is the sampling interval, such as the unshaded circle pixels a, c, d, e, f, g, i, k, etc. in Fig. 2.

The 4x reference image is generated by a multi-stage interpolation process with the following steps:

1) Generate the 1/2 pixels from the integer pixels by interpolation. The interpolation filter is a 6-tap FIR (finite impulse response) filter with weight vector w = [1, -5, 20, 20, -5, 1]^T. The process is as follows:

A. Generate the semi 1/2 pixels from the integer pixels by interpolation, taking pixels b and h as examples:

b₁ = (E - 5*F + 20*G + 20*H - 5*I + J), which generates the intermediate value b₁;

b = Clip((b₁ + 16) >> 5), which applies offset, normalization and clipping.

Here, offset means adding a number (the offset may be positive or negative); normalization means dividing a variable by a positive number so that the absolute value of the quotient never exceeds 1 over the variable's range; and clipping means forcing a variable that exceeds a given range back into that range. For example, if the range of variable x is [0, 18] and x = 20, then x is out of range and is clipped to x = 18.

Because the filter weights sum to 32, normalization is a division by 32, implemented as a right shift by 5 bits. The clipping function Clip forces any value outside the range [0, 255] into that range. In the same way, h can be obtained:

h₁ = (A - 5*C + 20*G + 20*M - 5*R + T)

h = Clip((h₁ + 16) >> 5)
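The formulas for b and h can be sketched in a few lines. This is a scalar illustration only (a SIMD implementation would process many samples per instruction); the names `six_tap`, `clip255` and `half_pel` are hypothetical. Adding 16 before the 5-bit right shift rounds the division by 32 to the nearest integer.

```python
WEIGHTS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR weights from the text; they sum to 32


def clip255(x):
    """Clip: force x into the range [0, 255]."""
    return max(0, min(255, x))


def six_tap(samples):
    """Unnormalised intermediate value (b1 or h1) from six neighbouring samples."""
    return sum(w * s for w, s in zip(WEIGHTS, samples))


def half_pel(samples):
    """Offset (+16), normalise (>> 5, i.e. divide by 32) and clip."""
    return clip255((six_tap(samples) + 16) >> 5)


flat = half_pel([100, 100, 100, 100, 100, 100])   # flat area stays at 100
bright = half_pel([0, 0, 255, 255, 0, 0])         # overshoot is clipped to 255
dark = half_pel([255, 255, 0, 0, 255, 255])       # undershoot is clipped to 0
```

The clipping step matters because the negative taps let the filter overshoot the 8-bit range near sharp edges.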

B. Generate the full 1/2 pixels from the semi 1/2 pixels by interpolation. The interpolation filter is the same 6-tap FIR filter as above. Taking pixel j as an example:

j₁ = (bb - 5*gg + 20*h₁ + 20*m₁ - 5*kk + cc), which generates the intermediate value j₁;

j = Clip((j₁ + 512) >> 10)

After these two sub-steps, all 1/2 pixels have been generated.
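The constants 512 and 10 in the formula for j follow from applying the filter twice: each pass scales by 32, so the second pass over unnormalised intermediate values scales by 32 × 32 = 1024 = 2¹⁰, and 512 is the rounding offset for that divisor. A minimal sketch, assuming all six inputs are unnormalised intermediate values (as h₁ and m₁ are); the name `full_half_pel` is hypothetical:

```python
WEIGHTS = (1, -5, 20, 20, -5, 1)


def clip255(x):
    """Clip: force x into the range [0, 255]."""
    return max(0, min(255, x))


def full_half_pel(intermediates):
    """Second filter pass over six unnormalised first-pass values.

    Assumes every input carries the factor 32 from the first pass, so the
    combined gain is 1024 and the result is normalised with (+512) >> 10.
    """
    j1 = sum(w * v for w, v in zip(WEIGHTS, intermediates))
    return clip255((j1 + 512) >> 10)


# First-pass intermediates of a flat area of value 100 are all 32 * 100 = 3200.
j = full_half_pel([3200] * 6)
```

Deferring normalisation to the end of the second pass avoids rounding twice.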

2) Generate the 1/4 pixels from the integer pixels and 1/2 pixels by interpolation. Every 1/4 pixel lies between two integer pixels or 1/2 pixels (horizontally, vertically or diagonally), so it can be obtained as the arithmetic mean of the two neighboring integer pixels or 1/2 pixels. The formulas are:

a = (D + b + 1) >> 1

c = (E + b + 1) >> 1

d = (D + h + 1) >> 1

n = (H + h + 1) >> 1

The above are averages in the horizontal direction. For averaging in the diagonal direction, the calculation is:

e = (b + h + 1) >> 1

g = (b + m + 1) >> 1
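All of the 1/4-pixel formulas share one form, a rounded average of two already-computed samples. A minimal sketch (the name `quarter_pel` is hypothetical):

```python
def quarter_pel(p, q):
    """Rounded arithmetic mean of two neighbouring integer or 1/2 pixels."""
    return (p + q + 1) >> 1


a = quarter_pel(100, 101)   # rounds 100.5 up to 101
e = quarter_pel(0, 255)     # 128
```

Because both inputs are already clipped to [0, 255], the result needs no further clipping.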

In the prior art, the 4x interpolated image is stored contiguously in memory in natural order, that is, in the pattern shown in Fig. 2. Natural order, however, is not the most suitable storage pattern. When the 4x image is generated by interpolation, the integer pixels come first (they already exist), then the 1/2 pixels, and finally the 1/4 pixels. If SIMD (single instruction, multiple data) acceleration on a DSP (digital signal processor) is used to generate the 1/2 pixels, the SIMD instructions need to read the integer pixels as whole blocks; generating the 1/4 pixels likewise requires reading the 1/2 pixels and integer pixels as whole blocks. With the image memory organized in natural order, no class of pixels can be read as a whole block, because the pixels are not grouped by class. For example, storing the 4x interpolated image row by row from its top-left pixel gives the order: integer, 1/4, 1/2, 1/4, integer, 1/4, 1/2, 1/4, ...; when the first row ends, the second row follows in the same manner, and so on to the last row. Consequently, in any region of memory the pixels of one class are not contiguous: no two integer pixels are adjacent and no two 1/2 pixels are adjacent, let alone a whole block of same-class pixels. To read all the integer pixels, memory must be accessed at a fixed stride (skipping 3 values each time), which is very inefficient.
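The strided access pattern described above can be made concrete with a small sketch. It labels the samples of the first row of a natural-order 4x image and lists where the integer pixels land; the function name and the string labels are illustrative assumptions.

```python
def natural_row_classes(width):
    """Class labels of the first row of a 4x image stored in natural order."""
    pattern = ("int", "1/4", "1/2", "1/4")
    return [pattern[x % 4] for x in range(4 * width)]


row = natural_row_classes(3)   # original image 3 pixels wide
integer_offsets = [i for i, c in enumerate(row) if c == "int"]
half_offsets = [i for i, c in enumerate(row) if c == "1/2"]
```

Here `integer_offsets` is `[0, 4, 8]` and `half_offsets` is `[2, 6, 10]`: every class is scattered at stride 4, so a SIMD load of consecutive bytes never returns samples of a single class.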

In addition, once the 4x interpolated image has been generated, motion estimation with integer-pixel precision needs to compare the predicted macroblock against the integer-position subset of the 4x image and compute the SAD (sum of absolute differences). Likewise, 1/2-pixel-precision motion prediction only needs to compare against part of the 1/2-position subset of the 4x interpolated image, and 1/4-pixel-precision motion prediction only against part of the 1/4-position subset. With SIMD-type DSP acceleration, each SAD computation therefore only needs to read, as a whole block, pixels of one class that share a common attribute. In the natural-order memory organization, however, pixels with a common attribute are not stored contiguously, which makes whole-block reads inconvenient.

Summary of the invention

In view of the above defect of the prior art, the technical problem addressed by the present invention is that, in existing video compression coding, pixels of each class cannot be read as whole blocks because the interpolated image is stored in memory in natural order. The invention provides a new image memory organization method that lets 1/2-pixel and 1/4-pixel-precision motion prediction make full use of SIMD-type DSP acceleration and thereby improves computational efficiency.

To solve the above technical problem, the invention provides an interpolation image memory organization method for fractional-pixel-precision motion prediction. When performing 1/2ⁿ-pixel-precision motion prediction (where n is a natural number), the memory of the interpolated image is organized according to the following steps:

(1) for the 2ⁿx interpolated image to be generated, divide its pixels into an integer-position subset, a 1/2¹-position subset, a 1/2²-position subset, ..., and a 1/2ⁿ-position subset, which contain respectively all integer pixels, all 1/2 pixels, all 1/4 pixels, ..., and all 1/2ⁿ pixels;

(2) form all integer pixels of the integer-position subset into one integer-pixel sub-image of the same size as the original image;

classify the 1/2 pixels by their vertical, horizontal or diagonal positional relationship to the adjacent integer pixels, so that all 1/2 pixels of the 1/2-position subset are further divided into 3 smaller subsets, which form 3 corresponding 1/2-pixel sub-images of the same size as the original image;

classify the 1/4 pixels by their vertical, horizontal or diagonal positional relationship and distance to the adjacent integer pixels and 1/2 pixels, so that all 1/4 pixels of the 1/4-position subset are further divided into 12 smaller subsets, which form 12 corresponding 1/4-pixel sub-images of the same size as the original image;

and so on: when n is greater than or equal to 3, classify the 1/2ⁿ pixels by their vertical, horizontal or diagonal positional relationship and distance to the adjacent integer pixels, 1/2 pixels, 1/4 pixels, ..., and 1/2ⁿ⁻¹ pixels, so that all 1/2ⁿ pixels of the 1/2ⁿ-position subset are further divided into (2²ⁿ - 2²⁽ⁿ⁻¹⁾) smaller subsets, which form (2²ⁿ - 2²⁽ⁿ⁻¹⁾) corresponding 1/2ⁿ-pixel sub-images of the same size as the original image;

(3) stitch the sub-images into one contiguous memory area and store it in memory;

(4) zero-initialize all stored sub-images except the integer-pixel sub-image.
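The sub-image counts in step (2) can be checked with a short calculation. The sketch below evaluates 2²ᵏ - 2²⁽ᵏ⁻¹⁾ for each precision level k up to n and confirms that, together with the single integer-pixel sub-image, the total is 2²ⁿ. The function name `subimage_counts` is hypothetical.

```python
def subimage_counts(n):
    """Number of sub-images per precision level for 1/2**n-pixel prediction.

    Key 0 is the integer-pixel level; key k is the 1/2**k-pixel level.
    """
    counts = {0: 1}
    for k in range(1, n + 1):
        counts[k] = 2 ** (2 * k) - 2 ** (2 * (k - 1))
    return counts


quarter = subimage_counts(2)   # {0: 1, 1: 3, 2: 12}
eighth = subimage_counts(3)    # adds 48 sub-images at the 1/8-pixel level
```

For n = 2 this gives the 1 + 3 + 12 = 16 sub-images used for 1/4-pixel-precision prediction.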

In step (3) of the interpolation image memory organization method, the sub-images can be stitched into one contiguous memory area and stored in memory using any of the following three stitching methods:

A. 2²ⁿ × 1 stitching, i.e. vertical-strip stitching;

B. 2ⁿ × 2ⁿ stitching, i.e. square stitching;

C. 1 × 2²ⁿ stitching, i.e. horizontal-strip stitching.
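As an illustration of the three layouts, the sketch below computes where the top-left corner of each tile falls inside the stitched image. The text does not say which sub-image occupies which tile, so the code works with a generic slot number in row-major order; that ordering, the function name `tile_origin` and the layout labels are assumptions.

```python
def tile_origin(slot, width, height, layout, n=2):
    """(row, column) of the top-left sample of tile `slot` in the stitched image.

    `width` and `height` are the dimensions of one sub-image (the original
    image size). Slots are numbered row-major, which is an assumption.
    """
    count = 2 ** (2 * n)
    if not 0 <= slot < count:
        raise ValueError("slot out of range")
    if layout == "vertical":        # 2**(2n) x 1: tiles stacked top to bottom
        tile_row, tile_col = slot, 0
    elif layout == "square":        # 2**n x 2**n grid
        tile_row, tile_col = divmod(slot, 2 ** n)
    elif layout == "horizontal":    # 1 x 2**(2n): tiles placed side by side
        tile_row, tile_col = 0, slot
    else:
        raise ValueError("unknown layout")
    return tile_row * height, tile_col * width


# QCIF-sized sub-images (176 x 144), 1/4-pixel precision (n = 2).
origin_square = tile_origin(5, 176, 144, "square")        # (144, 176)
origin_vertical = tile_origin(5, 176, 144, "vertical")    # (720, 0)
```

In the vertical-strip layout every tile starts at column 0, so each sub-image is itself one contiguous run of bytes; the other two layouts keep rows of different sub-images interleaved in memory.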

For 1/4-pixel-precision motion prediction, the invention also provides a method that, based on the above interpolation image memory organization method, uses the single-instruction multiple-data acceleration provided by a digital signal processor to generate the fractional pixels of the 4x interpolated image, comprising the following steps:

(1) use the integer-pixel sub-image SP₀ to generate the 1/2-pixel sub-images SP₄ and SP₈ by interpolation filtering;

(2) use the 1/2-pixel sub-image SP₄ to generate the 1/2-pixel sub-image SP₁₂ by interpolation filtering;

(3) use the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈, SP₁₂ to generate the 1/4-pixel sub-images SP₁, SP₅, SP₉, SP₁₃ by horizontal interpolation filtering;

(4) use the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈, SP₁₂ to generate the 1/4-pixel sub-images SP₂, SP₆, SP₁₀, SP₁₄ by vertical interpolation filtering;

(5) use the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈, SP₁₂ to generate the 1/4-pixel sub-images SP₃, SP₇, SP₁₁, SP₁₅ by interpolation filtering along the +45° and -45° diagonals.

For 1/4-pixel-precision motion prediction, the invention also provides a method that, based on the above method of generating the fractional pixels of the 4x interpolated image, uses the single-instruction multiple-data acceleration provided by a digital signal processor to quickly calculate the prediction error metric SAD. The SAD between the current macroblock MB₀ and the reference macroblock MB_r at a given position in the reference frame during motion estimation is calculated as follows:

(1) for integer-pixel-precision motion prediction, read the data of MB_r as a whole block from sub-image SP₀ according to the position of MB_r, then compute the sum of absolute differences;

(2) for 1/2-pixel-precision motion prediction, read the data of MB_r as a whole block from one of the sub-images SP₄, SP₈, SP₁₂ according to the position of MB_r, then compute the sum of absolute differences;

(3) for 1/4-pixel-precision motion prediction, read the data of MB_r as a whole block from one of the sub-images SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄, SP₁₅ according to the position of MB_r, then compute the sum of absolute differences.

The method of the invention overcomes the inconvenient memory reads that arise when the 4x interpolated image is stored in natural order, so that 1/2-pixel and 1/4-pixel-precision motion prediction can make full use of SIMD-type DSP acceleration. Each pixel of the 4x interpolated image is assigned to a subset according to its attributes, and each subset is divided into sub-images, so that every sub-image can be read as a whole block of data in SIMD instruction operations. With the method of the invention, the interpolation that generates the 4x interpolated image and the motion estimation in 1/2-pixel and 1/4-pixel-precision motion prediction can be effectively accelerated in international standards such as H.263/H.263+, H.264 and MPEG-4 and in the AVS 1.0 national standard, especially with SIMD-type DSP acceleration. Therefore, with other conditions unchanged, the frame rate of video encoding and decoding can be increased, improving the performance of video communication equipment such as video conferencing systems or video telephones; alternatively, the same performance can be reached with a less powerful DSP, reducing product cost.

Description of drawings

The invention is further described below with reference to the drawings and embodiments, in which:

Fig. 1 shows the search process of 1/4-pixel-precision motion prediction in the existing H.264/AVC standard;

Fig. 2 shows the relative geometric positions of the integer pixels, 1/2 pixels and 1/4 pixels in the 4x interpolated image;

Fig. 3 shows the classification by attribute, and the numbering of each class, of the pixels in the 4x interpolated image according to the invention;

Fig. 4a, Fig. 4b and Fig. 4c show, respectively, the original image, the 4x interpolated image under the conventional storage method, and the 4x interpolated image under the storage method of the invention;

Fig. 5a, Fig. 5b and Fig. 5c show, respectively, the stitched storage methods for the 16 sub-images of the 4x interpolated image P_4x4 obtained by the invention.

Embodiment

The invention is described below taking 1/4-pixel-precision motion prediction as an example.

For 1/4-pixel-precision motion prediction, the key to the method is a reorganization and stitching of the content of the 4x interpolated reference image (an interpolated image whose width and height are both 4 times those of the original image) that guarantees contiguous storage in memory. With this method, forming the 4x interpolated reference image by interpolation and performing 1/2-pixel and 1/4-pixel motion estimation can exploit the proximity in memory of successively accessed pixel data, which markedly improves computational efficiency. The performance gain is especially pronounced when SIMD acceleration is used, whether provided by a DSP or, for example, by MMX or SSE on Intel CPUs, because this spatial proximity of data suits SIMD very well. The method is aimed mainly at efficient implementation of the 1/4-pixel-precision motion prediction required by H.264/AVC, but its principle applies equally to the 1/2-pixel-precision motion prediction of H.263/H.263+, the 1/4-pixel-precision motion prediction of MPEG-4, and the 1/4-pixel-precision motion prediction of the AVS standard.

First, the pixels of the 4x interpolated image to be generated are divided into three subsets:

Integer-position subset S_IP = {all integer pixels}; the pixels of this subset are shown as solid large circles in Fig. 3;

1/2-position subset S_HP = {all 1/2 pixels}; the pixels of this subset are shown as solid squares in Fig. 3;

1/4-position subset S_QP = {all 1/4 pixels}; the pixels of this subset are shown as hollow squares in Fig. 3.

As can be seen from Fig. 3, within the dashed box the integer pixel A (numbered 0) in the upper right corner has 3 corresponding 1/2 pixels (numbered 4, 8, 12) and 12 corresponding 1/4 pixels (numbered 1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15). In the invention, therefore, all pixels of the integer-position subset S_IP form one integer-pixel sub-image SP₀, which has the same size as the original image.

Next, the 1/2 pixels are classified by their vertical, horizontal or diagonal positional relationship to the adjacent integer pixels, so that all 1/2 pixels of the 1/2-position subset S_HP are further divided into 3 subsets. Each subset forms a 1/2-pixel sub-image of the same size as the original image, giving 3 1/2-pixel sub-images in total: SP₄, SP₈, SP₁₂.

Then the 1/4 pixels are classified by their vertical, horizontal or diagonal positional relationship and distance to the adjacent integer pixels and 1/2 pixels (distance must be taken into account when classifying 1/4 pixels), so that all 1/4 pixels of the 1/4-position subset S_QP are further divided into 12 subsets. Each subset forms a 1/4-pixel sub-image of the same size as the original image, giving 12 1/4-pixel sub-images in total: SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄, SP₁₅.

Therefore, 4 times of interpolation images finally can be expressed as:

P_{4 \times 4} = \begin{bmatrix} SP_{0} & SP_{1} & SP_{4} & SP_{5} \\ SP_{2} & SP_{3} & SP_{6} & SP_{7} \\ SP_{8} & SP_{9} & SP_{12} & SP_{13} \\ SP_{10} & SP_{11} & SP_{14} & SP_{15} \end{bmatrix}
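The numbering in this matrix can be reproduced by interleaving the bits of the horizontal and vertical quarter-sample offsets, and the same offsets can be used to pull the 16 sub-images out of a natural-order 4x image. The sketch below is an inference from the matrix, not a formula stated in the text: it assumes that the matrix row corresponds to the vertical offset dy and the column to the horizontal offset dx within each 4 × 4 cell, and the function names are hypothetical.

```python
def subimage_index(dx, dy):
    """Sub-image number for quarter-sample offsets dx, dy in 0..3.

    Bit 0: dx is odd, bit 1: dy is odd, bit 2: dx >= 2, bit 3: dy >= 2
    (inferred from the P_4x4 matrix; an assumption, not stated in the text).
    """
    return (dx & 1) | ((dy & 1) << 1) | ((dx >> 1) << 2) | ((dy >> 1) << 3)


def split_into_subimages(interp):
    """Regroup a natural-order 4x image (list of rows) into 16 sub-images."""
    subs = {}
    for dy in range(4):
        for dx in range(4):
            subs[subimage_index(dx, dy)] = [row[dx::4] for row in interp[dy::4]]
    return subs


# Rebuild the numbering matrix from the formula.
numbering = [[subimage_index(dx, dy) for dx in range(4)] for dy in range(4)]

# Tiny 8 x 8 "interpolated" image whose sample value encodes its (dy, dx) phase.
interp = [[(y % 4) * 4 + (x % 4) for x in range(8)] for y in range(8)]
subs = split_into_subimages(interp)
```

Under this reading, SP₄ holds the horizontal 1/2 pixels, SP₈ the vertical 1/2 pixels and SP₁₂ the diagonal 1/2 pixels, and each extracted sub-image has the size of the original image (2 × 2 in the toy example).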

After P_4x4 has been reorganized by pixel class as above, 16 sub-images of the same size as the original image are obtained. There are many ways to store these sub-images in memory. The simplest is to store the 16 images separately, but this causes jumping reads when the SAD values of the individual positions are calculated. For example, in the 1/2-pixel search, calculating the SAD for the 1/2 pixel position directly above the best integer pixel position requires reading from sub-image SP₈, whereas calculating the SAD for the upper-left and upper-right 1/2 pixel positions requires reading from sub-image SP₁₂. If the 16 images are stored independently, SP₈ and SP₁₂ may lie far apart in memory, and alternating between the two locations inevitably slows access. A better approach is stitched storage: the 16 images are stitched into one large image that is stored in memory. To this end, the invention proposes three stitching methods.

Memory stitching is chiefly a way of improving read/write efficiency on DSPs that require memory reads and writes to be aligned to byte boundaries that are an integer power of 2. If a data block being read or written is not aligned to a boundary of the required byte count (such as 32 or 64 bytes), extra data (usually zero padding) must be read or written to reach the boundary, which naturally reduces efficiency.
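The padding cost mentioned above is easy to quantify. A minimal sketch, assuming one byte per sample and a power-of-two alignment; the function name `aligned_size` is hypothetical:

```python
def aligned_size(size_bytes, alignment=32):
    """Round size_bytes up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size_bytes + alignment - 1) & ~(alignment - 1)


row = aligned_size(176, 32)              # a 176-byte row is padded to 192 bytes
padding = row - 176                      # 16 bytes of padding per row
already_aligned = aligned_size(64, 32)   # 64 is already a multiple of 32
```

When the sub-image width is itself a multiple of the alignment, each tile in a stitched image starts on an aligned boundary without any padding.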

For 1/2-pixel and 1/4-pixel-precision motion prediction, the invention provides three stitching methods that satisfy boundary alignment. Fig. 5a, Fig. 5b and Fig. 5c show the three stitching methods for the 1/4-pixel-precision case; each block in the figures represents one sub-image. They are:

A. 16 × 1 stitching: the 16 sub-images are stitched into a vertical strip;

B. 4 × 4 stitching: the 16 sub-images are stitched into a square;

C. 1 × 16 stitching: the 16 sub-images are stitched into a horizontal strip.

For the 1/2-pixel-precision case there are three corresponding stitching strategies:

A. 4 × 1 stitching (vertical strip);

B. 2 × 2 stitching (square);

C. 1 × 4 stitching (horizontal strip).

For 1/8 pixels and larger n, these three stitching strategies all remain applicable, although further strategies may also exist. For general n, the three stitching strategies are:

A. 2²ⁿ × 1 stitching, i.e. vertical-strip stitching;

B. 2ⁿ × 2ⁿ stitching, i.e. square stitching;

C. 1 × 2²ⁿ stitching, i.e. horizontal-strip stitching.

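Finally, all stored sub-images except the integer-pixel sub-image are zero-initialized.

Because of the structure of P_4x4, generating the 4x interpolated image P_4x4 from the original image can be carried out with SIMD-type DSP acceleration instructions: the convolution of the filter and the shift operations can be performed on whole blocks, one sub-image at a time. (The numbering shown in Fig. 3 is used for consistency with the existing H.264 and other standards.)

In motion estimation, the SAD computation can be accelerated with SIMD instructions by taking from a sub-image a sub-matrix of the same size as the predicted macroblock.

According to the above interpolation image memory organization method, the SIMD-type acceleration provided by a DSP can be used to generate the fractional pixels of the 4x interpolated image in the following steps:

(1) use the integer-pixel sub-image SP₀ to generate the 1/2-pixel sub-images SP₄ and SP₈ by interpolation filtering;

(2) use the 1/2-pixel sub-image SP₄ to generate the 1/2-pixel sub-image SP₁₂ by interpolation filtering;

(3) use the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈, SP₁₂ to generate the 1/4-pixel sub-images SP₁, SP₅, SP₉, SP₁₃ by horizontal interpolation filtering;

(4) use the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈, SP₁₂ to generate the 1/4-pixel sub-images SP₂, SP₆, SP₁₀, SP₁₄ by vertical interpolation filtering;

(5) use the integer-pixel sub-image SP₀ and the 1/2-pixel sub-images SP₄, SP₈, SP₁₂ to generate the 1/4-pixel sub-images SP₃, SP₇, SP₁₁, SP₁₅ by interpolation filtering along the +45° and -45° diagonals.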

According to the above interpolation image memory organization method and the method of generating the fractional pixels of the 4x interpolated image, the SIMD-type acceleration provided by a DSP can be used to calculate the prediction error metric SAD between the current macroblock MB₀ and the reference macroblock MB_r at a given position in the reference frame during motion estimation, in the following steps:

(1) for integer-pixel-precision motion prediction, read the data of MB_r as a whole block from sub-image SP₀ according to the position of MB_r, then calculate the SAD;

(2) for 1/2-pixel-precision motion prediction, read the data of MB_r as a whole block from one of the sub-images SP₄, SP₈, SP₁₂ according to the position of MB_r, then calculate the SAD;

(3) for 1/4-pixel-precision motion prediction, read the data of MB_r as a whole block from one of the sub-images SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄, SP₁₅ according to the position of MB_r, then calculate the SAD.
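The three cases differ only in which sub-image is read; the SAD itself is computed the same way each time. The sketch below shows one way to select the sub-image from a motion vector and compute the SAD over contiguous row slices. It is a scalar illustration of what a SIMD SAD instruction would do per row; the names `locate` and `sad`, the convention that motion vectors are in quarter-sample units, and the bit-interleaved sub-image numbering (inferred from the arrangement of P_4x4) are assumptions.

```python
def locate(mv_x, mv_y):
    """Split a quarter-sample motion vector into (sub-image number, int_x, int_y).

    The sub-image number interleaves the fractional bits of mv_x and mv_y,
    an assumption inferred from the P_4x4 numbering.
    """
    dx, dy = mv_x & 3, mv_y & 3
    k = (dx & 1) | ((dy & 1) << 1) | ((dx >> 1) << 2) | ((dy >> 1) << 3)
    return k, mv_x >> 2, mv_y >> 2


def sad(current, sub, top, left):
    """Sum of absolute differences between `current` and a block of `sub`.

    `current` and `sub` are lists of rows; each reference row is read as one
    contiguous slice, which is the access pattern the sub-image layout enables.
    """
    total = 0
    for r, cur_row in enumerate(current):
        ref_row = sub[top + r][left:left + len(cur_row)]
        total += sum(abs(c - p) for c, p in zip(cur_row, ref_row))
    return total


sub = [[r * 4 + c for c in range(4)] for r in range(4)]   # toy 4 x 4 sub-image
current = [[5, 6], [9, 10]]                               # toy 2 x 2 "macroblock"
exact = sad(current, sub, 1, 1)
shifted = sad(current, sub, 0, 0)
```

Here `exact` is 0 because the block matches the sub-image at row 1, column 1, and `shifted` is 20. With this numbering, `locate(2, 0)` selects SP₄, `locate(0, 2)` selects SP₈ and `locate(2, 2)` selects SP₁₂.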

With the invention, the interpolation that generates the 4x interpolated image and the motion estimation in 1/2-pixel and 1/4-pixel-precision motion prediction can be effectively accelerated in international standards such as H.263/H.263+, H.264 and MPEG-4 and in the AVS 1.0 national standard, especially with SIMD-type DSP acceleration. Therefore, with other conditions unchanged, the frame rate of video encoding and decoding can be increased, improving the performance of video communication equipment such as video conferencing systems or video telephones; alternatively, the same performance can be reached with a less powerful DSP, reducing product cost. Both approaches improve the competitiveness of the product in the market. The effect of the invention is significant, as the following experimental data illustrate:

Experiment 1: the method of the invention was combined with MMX and SSE2 to optimize and accelerate the 1/4-pixel interpolation process for the classic test sequences Claire, News and Foreman; the results are shown in Table 1:

Table 1

Experiment 2: the method of the invention was used to accelerate the whole 1/4-pixel-precision motion prediction process for the classic test sequences Claire, News and Foreman; the results are shown in Table 2, where the data are the number of frames encoded per second:

Table 2

Method of the present invention is directly applied for and follows H.263/H.263+, H.264, the video encoder and the decoder of MPEG-4 international standard and AVS domestic standard, 1/2 pixel and 1/4 pixel precision motion prediction in can realizing.Also be suitable for video encoder and decoder that other adopts 1/2 pixel and 1/4 pixel precision motion prediction, to realize 1/2 pixel and 1/4 pixel precision motion prediction.

The method of the present invention is also applicable to 1/8-pixel precision motion prediction. Any 1/8-pixel precision motion prediction must first perform 1/2-pixel and 1/4-pixel precision motion prediction, so these are a necessary component and prerequisite of 1/8-pixel precision motion prediction. The method is likewise applicable to implementing 1/8-pixel precision motion prediction in any other video encoder or decoder (not necessarily following a particular standard) that adopts it.
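The nesting of precisions can be made concrete by counting sub-images. Going by the claims (one integer-pixel sub-image, 3 sub-images for 1/2 pixels, 12 for 1/4 pixels and 48 for 1/8 pixels), 1/2ⁿ-pixel precision uses 2²ⁿ sub-images in total, and level k adds 3·4^(k-1) sub-images to those already needed by the coarser levels. The short Python sketch below reproduces these counts. The function name `subimage_counts` is an assumption for illustration only.

```python
def subimage_counts(n):
    """Return (per_level, total) for 1/2**n pixel precision.

    per_level[0] is the single integer-pixel sub-image; per_level[k] for
    k >= 1 is the number of sub-images holding the 1/2**k pixels.
    """
    per_level = [1] + [3 * 4 ** (k - 1) for k in range(1, n + 1)]
    return per_level, sum(per_level)


for n in (1, 2, 3):
    per_level, total = subimage_counts(n)
    print(f"1/{2 ** n} pixel: per level {per_level}, total {total}")
```

The counts for n = 3 contain those for n = 1 and n = 2 as a prefix, which mirrors the statement that 1/8-pixel prediction builds on the 1/2-pixel and 1/4-pixel sub-images.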

Appendix: abbreviations and key terms used in this patent

Abbreviation: English term; Chinese term (translated)

AVC: Audio-Video Coding; audio-video coding

AVS: Advanced Audio-Video System; advanced audio-video coding system (Chinese national standard)

dB: deciBel; decibel

DSP: Digital Signal Processor; digital signal processing chip

DV: Displacement Vector; displacement vector

MB: Macroblock; macroblock

MV: Motion Vector; motion vector

MPEG: Moving Picture Experts Group; Moving Picture Experts Group (international standards organization)

PSNR: Peak Signal-to-Noise Ratio; peak signal-to-noise ratio

SIMD: Single Instruction Multiple Data; single instruction, multiple data

MMX: MultiMedia eXtension; multimedia extension

SAD: Sum of Absolute Differences; sum of absolute differences

SSE: Streaming SIMD Extensions; SIMD extension instruction set

Claims

1. An interpolation image memory organization method for fractional-pixel-precision motion prediction, characterized in that, when performing 1/2ⁿ-pixel-precision motion prediction, where n is a natural number, the memory of the interpolation image is organized according to the following steps:

2. The interpolation image memory organization method according to claim 1, characterized in that n equals 1 and, when performing 1/2-pixel-precision motion prediction, the memory of the interpolation image is organized according to the following steps:

(1) the pixels of the 2x interpolation image to be generated are divided into an integer-position subset and a 1/2-position subset, the integer-position subset containing all integer pixels and the 1/2-position subset containing all 1/2 pixels;

(4) except for the integer-pixel sub-image, the 3 stored 1/2-pixel sub-images are zero-initialized.

3. The interpolation image memory organization method according to claim 1, characterized in that n equals 2 and, when performing 1/4-pixel-precision motion prediction, the memory of the interpolation image is organized according to the following steps:

(1) the pixels of the 4x interpolation image to be generated are divided into an integer-position subset S_IP, a 1/2-position subset S_HP and a 1/4-position subset S_QP, the integer-position subset S_IP containing all integer pixels, the 1/2-position subset S_HP containing all 1/2 pixels, and the 1/4-position subset S_QP containing all 1/4 pixels;

(2) the integer pixels in the integer-position subset S_IP form one integer-pixel sub-image SP₀ of the same size as the original image;

the 1/2 pixels in the 1/2-position subset S_HP are classified by their vertical, horizontal and diagonal positional relationship to the adjacent integer pixels and thereby further divided into 3 smaller subsets, which correspondingly form 3 sub-images SP₄, SP₈, SP₁₂ of the same size as the original image;

the 1/4 pixels in the 1/4-position subset S_QP are classified by their vertical, horizontal and diagonal positional relationship and distance relationship to the adjacent integer pixels and 1/2 pixels and thereby further divided into 12 smaller subsets, which correspondingly form 12 sub-images SP₁, SP₂, SP₃, SP₅, SP₆, SP₇, SP₉, SP₁₀, SP₁₁, SP₁₃, SP₁₄, SP₁₅ of the same size as the original image;

(4) except for the integer-pixel sub-image, the 3 stored 1/2-pixel sub-images and 12 stored 1/4-pixel sub-images are zero-initialized.

4. The interpolation image memory organization method according to claim 1, characterized in that n equals 3 and, when performing 1/8-pixel-precision motion prediction, the memory of the interpolation image is organized according to the following steps:

(1) the pixels of the 8x interpolation image to be generated are divided into an integer-position subset, a 1/2-position subset, a 1/4-position subset and a 1/8-position subset, which contain all integer pixels, all 1/2 pixels, all 1/4 pixels and all 1/8 pixels respectively;

(2) the integer pixels in the integer-position subset form one integer-pixel sub-image of the same size as the original image;

the 1/8 pixels in the 1/8-position subset are classified by their vertical, horizontal and diagonal positional relationship and distance relationship to the adjacent integer pixels, 1/2 pixels and 1/4 pixels and thereby further divided into 48 subsets, which correspondingly form 48 1/8-pixel sub-images of the same size as the original image;

(4) except for the integer-pixel sub-image, the 3 stored 1/2-pixel sub-images, 12 stored 1/4-pixel sub-images and 48 stored 1/8-pixel sub-images are zero-initialized.

5. The interpolation image memory organization method according to any one of claims 1-4, characterized in that, in step (3), the sub-images may be formed into one contiguous memory area and stored in memory by any of the following three splicing methods:

A. 2²ⁿ × 1 splicing, i.e. vertical-strip splicing;

B. 2ⁿ × 2ⁿ splicing, i.e. square splicing;

C. 1 × 2²ⁿ splicing, i.e. horizontal-strip splicing.

6. The interpolation image memory organization method according to claim 5, characterized in that n = 2 and, in step (3), the sub-images may be formed into one contiguous memory area and stored in memory by any of the following three splicing methods:

A. 16 × 1 splicing, i.e. vertical-strip splicing;

B. 4 × 4 splicing, i.e. square splicing;

C. 1 × 16 splicing, i.e. horizontal-strip splicing.

7. A method for generating each fractional pixel of a 4x interpolation image according to the interpolation image memory organization method of claim 6 and using the single-instruction-multiple-data (SIMD) type acceleration technology provided by a digital signal processing chip, characterized by comprising the following steps:

8. A method for quickly calculating the prediction error index SAD according to the method for generating each fractional pixel of a 4x interpolation image of claim 7 and using the SIMD type acceleration technology provided by a digital signal processing chip, characterized in that the prediction error index SAD between the current macroblock MB₀ and a reference macroblock MB_r at a certain position in the reference frame is calculated during motion estimation according to the following steps: