CN1481169A

CN1481169A - A real-time 1/4 interpolation method based on a multi-stage pipeline structure

Info

Publication number: CN1481169A
Application number: CNA031525024A
Authority: CN
Inventors: 晁黄; 黄晁; 王荣刚; 李锦涛
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2003-08-01
Filing date: 2003-08-01
Publication date: 2004-03-10
Anticipated expiration: 2023-08-01
Also published as: CN1186939C

Abstract

A real-time 1/4 interpolation method based on a multi-stage pipeline structure. The 1/4 interpolation process specified in the MPEG-4 protocol is combined with the motion estimation process. Only the blocks actually needed in motion estimation are 1/4-interpolated, and the interpolation process is organized into a 5-stage pipeline of data read-in, row 1/2 interpolation, column 1/2 interpolation, bilinear interpolation and data output. This reduces memory usage during interpolation and greatly increases interpolation speed. The method comprises the following steps: interpolating, in real time, the blocks needed during codec motion estimation; combining the interpolation and motion estimation processes in pipeline order; eliminating repeated data reads by setting up an intermediate buffer; organizing the interpolation process into a 5-stage pipeline; and designing the 1/2 interpolation process as a 4-stage pipeline. The method can be applied to the design of video encoders based on the MPEG-4 protocol.

Description

A real-time 1/4 interpolation method based on a multi-stage pipeline structure

Technical field

The present invention relates to the field of video coding and decoding, and in particular to a real-time 1/4 interpolation method, based on a multi-stage pipeline structure, for codecs that use temporal (inter-frame) prediction between video pictures.

Background technology

In video coding, pictures are usually divided into two kinds: intra (I) pictures and inter pictures. Inter pictures are coded with motion estimation, mainly because the pixels of adjacent image blocks are strongly correlated in time. The main idea is to find, in a reference picture, the block that best matches the block being coded and to use it as the predictor (the prediction block) for coding. The better the coded block and the prediction block match, the higher the coding efficiency, and improving the match requires more precise motion estimation. MPEG-1 uses integer-pixel precision, MPEG-2 and H.263 use half-pixel precision, and MPEG-4 uses quarter-pixel precision, which further improves coding efficiency. However, quarter-pixel motion estimation requires 1/4 interpolation of the reference picture, which is computationally expensive: interpolating around one integer pixel requires 6-tap linear interpolation and bilinear interpolation over the 6 × 6 integer pixels surrounding it. As shown in Fig. 1, an image block becomes 16 times its original size after 1/4 interpolation, because each integer pixel of the original block becomes 16 points. As shown in Fig. 2, interpolating point A into 16 points takes the 6 × 6 integer pixels around A as input. The detailed process is:

(1) Apply 6-tap linear interpolation to each row of the 6 × 6 data, obtaining points 1 and 5.

(2) Apply 6-tap linear interpolation to the columns of points A, 1 and 4, respectively, obtaining points 2, 3 and 4.

(3) Apply linear interpolation along the rows of points A, 2 and C, respectively, obtaining points a, b, h, i, o and p.

(4) Apply linear interpolation along the columns of points A, a, 1 and b, respectively, obtaining points c, d, e, f, j, k, m and l.

(5) Low-pass filter point m using points A, B, C and D.
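The five steps above use the patent's own point labels, which depend on Fig. 2. As a rough illustration of the same kind of computation, the sketch below derives the 16 samples around one integer pixel from its 6 × 6 neighbourhood using the H.264 luma formulas: 6-tap half-pixel filtering, then rounded averaging for the quarter-pixel positions. The names G, b, h, j, m, s and the function names are assumptions borrowed from the H.264 standard's notation, not the patent's labels, and the low-pass filtering of step (5) is not reproduced.

```python
def clip255(x: int) -> int:
    """Clip to the 8-bit sample range."""
    return min(max(x, 0), 255)


def tap6(p) -> int:
    """Unscaled 6-tap sum with taps (1, -5, 20, 20, -5, 1)."""
    return p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]


def avg(x: int, y: int) -> int:
    """Rounded average used for quarter-pixel samples."""
    return (x + y + 1) >> 1


def quarter_pel_16(win):
    """Return the 4x4 grid of samples generated around one integer pixel.

    win is a 6x6 list of 8-bit integer pixels; the integer pixel G is
    win[2][2], so the window spans offsets -2..+3 around G in both axes.
    Entry [k][l] of the result is the sample at offset (k/4 down, l/4 right).
    """
    G, H, M = win[2][2], win[2][3], win[3][2]
    # Horizontal half-pel between columns 2 and 3, unscaled, for all six rows.
    b1 = [tap6(row) for row in win]
    b = clip255((b1[2] + 16) >> 5)  # half-pel right of G
    s = clip255((b1[3] + 16) >> 5)  # half-pel right of M (row below G)
    # Vertical half-pels between rows 2 and 3, in columns 2 and 3.
    h = clip255((tap6([row[2] for row in win]) + 16) >> 5)  # below G
    m = clip255((tap6([row[3] for row in win]) + 16) >> 5)  # below H
    # Centre half-pel: 6-tap filter applied to the unscaled horizontal sums.
    j = clip255((tap6(b1) + 512) >> 10)
    return [
        [G,         avg(G, b), b,         avg(H, b)],
        [avg(G, h), avg(b, h), avg(b, j), avg(b, m)],
        [h,         avg(h, j), j,         avg(j, m)],
        [avg(M, h), avg(h, s), avg(j, s), avg(m, s)],
    ]


if __name__ == "__main__":
    ramp = [[10 * c for c in range(6)] for _ in range(6)]
    for row in quarter_pel_16(ramp):
        print(row)
```

On a horizontal ramp whose rows are 0, 10, 20, 30, 40, 50, every output row is [20, 23, 25, 28]: the samples step from G = 20 toward H = 30 in rounded quarter-pixel increments.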

These five steps yield the 16 points around A after 1/4 interpolation: A, a, 1, b, c, d, e, f, 2, h, 3, i, j, k, l and m. A detailed analysis of the interpolation process revealed the following problems:

(1) If the whole frame is interpolated before motion estimation, the data volume of each frame grows to that of 16 frames, so a large amount of storage is needed for the interpolation results.

(2) Interpolation and motion estimation run fully serially, which lengthens inter-frame coding time.

(3) The interpolation inputs of adjacent integer pixels such as A, B, C and D overlap heavily.

Summary of the invention

The object of the present invention is to provide a real-time 1/4 interpolation method based on a multi-stage pipeline structure. By combining the interpolation process with the motion estimation process and organizing interpolation into a multi-stage pipeline, the method reduces the time and storage consumed by 1/4 interpolation during video encoding and decoding, thereby improving codec efficiency. Reducing storage usage and speeding up 1/4 interpolation contribute significantly to faster video encoding and decoding.

The technical scheme is as follows:

A real-time 1/4 interpolation method based on a multi-stage pipeline structure: the 1/4 interpolation process specified in the MPEG-4 protocol is combined with the motion estimation process and executed in pipeline order, and the interpolation process is organized into a fully pipelined 5-stage structure of data read-in, row 1/2 interpolation, column 1/2 interpolation, row/column 1/4 interpolation and data output. This not only saves memory during interpolation but also greatly accelerates it. The method comprises the following steps:

(1) Interpolate, in real time, the current block used in motion estimation;

(2) Organize motion estimation and 1/4 interpolation into a 2-stage pipeline;

(3) Eliminate repeated data reads by setting up an intermediate buffer;

(4) Organize 1/4 interpolation into a 5-stage pipeline of data read-in, row 1/2 interpolation, column 1/2 interpolation, bilinear interpolation and data output;

(5) Design the 1/2 interpolation process as a 4-stage pipeline;

(6) Scalable design: depending on the application, interpolation speed can easily be raised, or resource usage lowered, by increasing or decreasing the number of interpolation units.

The method combines motion estimation with 1/4 interpolation and interpolates the current block used in motion estimation in real time, which saves memory.

The method organizes motion estimation and 1/4 interpolation into a 2-stage pipeline, reducing the share of total coding time spent on interpolation.

The method buffers part of the input data in an intermediate buffer, eliminating repeated reads of the input data shared by adjacent integer pixels during interpolation, so that each integer pixel relevant to the interpolated block is read only once.

The method organizes 1/4 interpolation into a 5-stage pipeline of data read-in, row 1/2 interpolation, column 1/2 interpolation, bilinear interpolation and data output, accelerating 1/4 interpolation.

The method implements the 6-tap linear interpolation used for 1/2 interpolation as a 4-stage pipeline; once the pipeline is full, it produces one result per clock cycle, greatly increasing interpolation speed.

In the method, the number of interpolation units is configurable: depending on the application, interpolation speed can be raised or resource usage lowered simply by adding or removing interpolation units.

Description of drawings

Fig. 1 is a schematic diagram of 1/4 interpolation of a data block;

Fig. 2 is a schematic diagram of the interpolation process for point A;

Fig. 3 is a schematic diagram of the overlapping input data of points A and B (the white area is the overlapping region);

Fig. 4 is the overall structure diagram of 1/4 interpolation, where clip denotes min(max(0, s7), 255).

The method has the following features:

(1) Real-time interpolation of the current block used in motion estimation.

The H.264 protocol defines 7 block shapes: 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4. Video encoding and decoding proceed macroblock by macroblock, and for each block shape, motion estimation of a macroblock uses only 48 × 48 interpolation results (decoding needs only 16 × 16). These results are used once and never reused. With real-time interpolation, it therefore suffices to interpolate the required data just before motion estimation of a macroblock and place the results in a 48 × 48 buffer (16 × 16 for decoding); when motion estimation moves on to the next macroblock, the new interpolation results overwrite those of the previous macroblock. Thus, whatever the frame size, the entire motion estimation process needs only 48 × 48 bytes (16 × 16 bytes for decoding) to store interpolation results.
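A back-of-the-envelope comparison of the two storage strategies, assuming 8-bit luma samples only (chroma ignored) and the buffer sizes stated above. The function names are illustrative.

```python
def whole_frame_interp_bytes(width: int, height: int) -> int:
    """Storage for a fully 1/4-interpolated 8-bit luma frame (16x the samples)."""
    return width * height * 16


def realtime_buffer_bytes(decoding: bool = False) -> int:
    """Reusable per-macroblock result buffer: 48x48 for encoding, 16x16 for decoding."""
    side = 16 if decoding else 48
    return side * side


if __name__ == "__main__":
    for w, h in [(352, 288), (720, 576)]:
        print(f"{w}x{h}: whole frame {whole_frame_interp_bytes(w, h)} bytes, "
              f"real-time buffer {realtime_buffer_bytes()} bytes")
```

For a 720 × 576 frame, whole-frame interpolation needs 6,635,520 bytes of luma results, while the reused buffer stays at 2,304 bytes regardless of frame size.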

(2) Organizing motion estimation and interpolation in pipeline order

To increase the concurrency of motion estimation and 1/4 interpolation, the invention organizes the two in pipeline order, as shown in Fig. 3: as soon as the interpolator has computed one row of interpolation results, motion estimation is performed on that row while the interpolator computes the next row. The time spent on 1/4 interpolation is thus largely hidden within motion estimation.
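A minimal timing model of this row-level 2-stage pipeline, assuming fixed per-row costs `t_interp` and `t_me` (hypothetical parameters, in clock cycles) and enough result-buffer space that the interpolator never has to stall:

```python
def serial_finish(rows: int, t_interp: int, t_me: int) -> int:
    """Interpolate every row first, then run motion estimation on every row."""
    return rows * t_interp + rows * t_me


def pipelined_finish(rows: int, t_interp: int, t_me: int) -> int:
    """Motion estimation on row i overlaps interpolation of row i + 1.

    Simulated row by row: the interpolator runs back to back, and motion
    estimation of a row starts once that row is interpolated and the
    previous row's motion estimation has finished.
    """
    interp_done = 0
    me_done = 0
    for _ in range(rows):
        interp_done += t_interp
        me_done = max(interp_done, me_done) + t_me
    return me_done


if __name__ == "__main__":
    print("serial:   ", serial_finish(48, 10, 40))
    print("pipelined:", pipelined_finish(48, 10, 40))
```

The simulation reduces to t_interp + (rows − 1)·max(t_interp, t_me) + t_me. When motion estimation is the slower stage, only the first row's interpolation time remains exposed, which is the sense in which interpolation is "hidden".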

(3) Eliminating repeated data reads with an intermediate buffer

As shown in Fig. 2, interpolating point A requires the 6 × 6 block around it as input; likewise, interpolating point B requires the 6 × 6 block around B. Because A and B are adjacent, their inputs overlap heavily: as shown in Fig. 3, the last 5 columns of A's input are the same as the first 5 columns of B's input. For the same reason, the interpolation input of point C in Fig. 2 shares 5 rows of 6 pixels with A's input. To reduce repeated input, the invention uses an intermediate buffer. The first 6 rows of the interpolation input for the whole data block are buffered first, and then each point of the block is interpolated in turn, so that when the integer pixels of the first row are interpolated, data come directly from the internal buffer without repeated accesses to external memory. An additional row buffer holds the next row (row 7) as it is read in, so that interpolation of the first row and reading of row 7 proceed simultaneously. When interpolation of the first row of integer pixels finishes, row 7 should already be in the buffer, and rows 2-6 together with the newly read row 7 form the interpolation input for the second row of integer pixels. Continuing in this way completes 1/4 interpolation of the whole block, and the results are stored in a 48 × 48 intermediate buffer for use by motion estimation. The advantage of this structure is that external memory is accessed infrequently: each datum is read only once.
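A software sketch of the line-buffer idea, assuming the block lies far enough inside the frame that a 2-pixel apron above/left and a 3-pixel apron below/right exist. `ExternalMemory` is a hypothetical stand-in for off-chip storage that counts pixel reads, and the deque models the six buffered rows; the hardware's concurrent read of the next row (the extra row buffer) is modelled sequentially here.

```python
from collections import deque


class ExternalMemory:
    """Hypothetical stand-in for off-chip frame storage; counts pixel reads."""

    def __init__(self, frame):
        self.frame = frame
        self.pixel_reads = 0

    def read_row(self, r, c0, c1):
        self.pixel_reads += c1 - c0
        return self.frame[r][c0:c1]


def windows_with_line_buffer(mem, top, left, rows, cols):
    """Yield (y, x, window) for every integer pixel of a rows x cols block.

    The window of pixel (y, x) is the 6x6 input covering rows y-2..y+3 and
    columns x-2..x+3. Each source row is fetched from `mem` exactly once.
    """
    c0, c1 = left - 2, left + cols + 3
    lines = deque(maxlen=6)  # appending a 7th row evicts the oldest one
    for r in range(top - 2, top + rows + 3):
        lines.append(mem.read_row(r, c0, c1))
        if len(lines) == 6:
            y = r - 3  # integer-pixel row whose 6-row input is now complete
            for x in range(cols):
                yield y, left + x, [line[x:x + 6] for line in lines]


if __name__ == "__main__":
    frame = [[(7 * r + 3 * c) % 256 for c in range(21)] for r in range(21)]
    mem = ExternalMemory(frame)
    n = sum(1 for _ in windows_with_line_buffer(mem, 2, 2, 16, 16))
    print(n, "windows,", mem.pixel_reads, "pixel reads")
```

For a 16 × 16 block this reads 21 × 21 = 441 pixels, against 36 × 256 = 9,216 if every integer pixel fetched its own 6 × 6 window.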

(4) Designing the interpolator as a 5-stage pipeline

The interpolator takes a 6 × 6 data block as its input unit and is divided into a five-stage pipeline: data read-in, row 1/2 interpolation, column 1/2 interpolation, bilinear interpolation and output. As shown in Fig. 3, one 6 × 6 data block is fed to the interpolator every clock cycle to interpolate one point A; after passing through the five stages, the 16 interpolation results for that point are obtained. Once the pipeline is full, 16 interpolation results are output every clock cycle.
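A cycle-level sketch of a linear 5-stage pipeline, assuming one register per stage and one new 6 × 6 block accepted per clock. The stage bodies are placeholders that only record their names; in the interpolator each would transform the 6 × 6 window toward the 16 output samples.

```python
from typing import Callable, List, Optional, Tuple


def simulate_pipeline(stages: List[Callable[[object], object]],
                      inputs: List[object]) -> List[Tuple[int, object]]:
    """Clock a linear pipeline (one register per stage) until it drains.

    Each cycle a new input, if any, enters stage 0 and every occupied register
    moves forward one stage. Returns (cycle, value) for each value leaving the
    last stage. None marks an empty register, so stages must not return None.
    """
    regs: List[Optional[object]] = [None] * len(stages)
    pending = list(inputs)
    finished: List[Tuple[int, object]] = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        new_regs: List[Optional[object]] = [None] * len(stages)
        if pending:
            new_regs[0] = stages[0](pending.pop(0))
        for i in range(1, len(stages)):
            if regs[i - 1] is not None:
                new_regs[i] = stages[i](regs[i - 1])
        regs = new_regs
        if regs[-1] is not None:
            finished.append((cycle, regs[-1]))
    return finished


STAGE_NAMES = ["read-in", "row 1/2", "column 1/2", "bilinear", "output"]
# Placeholder stage bodies: each just appends its own name to a trace list.
STAGES = [lambda trace, name=name: trace + [name] for name in STAGE_NAMES]

if __name__ == "__main__":
    for cycle, trace in simulate_pipeline(STAGES, [["A0"], ["A1"], ["A2"]]):
        print(cycle, trace)
```

The first result leaves the pipeline on cycle 5 and one more follows on every subsequent cycle, which is the latency and throughput behaviour described above.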

(5) Designing the 1/2 interpolation process as a 4-stage pipeline.

The 6-tap linear interpolation used for 1/2 interpolation is computationally expensive. For example, given six consecutive samples of a row, (in0, in1, in2, in3, in4, in5), the half-pixel sample between in2 and in3 is computed as

$$\mathrm{out} = \min\left(\max\left(\left\lfloor \frac{in_0 - 5\,in_1 + 20\,in_2 + 20\,in_3 - 5\,in_4 + in_5 + 16}{32} \right\rfloor,\ 0\right),\ 255\right)$$

Completing this computation in a single clock cycle would lower the clock frequency of the whole system. To remove this critical path, the invention splits the computation into a 4-stage pipeline, as shown in Fig. 4: six samples are fed in every clock cycle, and after a 4-cycle latency one interpolation result is output per cycle.
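One possible way to split this formula into four register stages, sketched as a cycle-accurate Python model. The particular split (pairwise sums, shift-and-add multiplication, accumulation, then shift and clip) is an assumption for illustration, not the partition drawn in Fig. 4. The model uses an arithmetic right shift by 5 in place of the division by 32; the two can only differ for negative sums, which clip to 0 either way.

```python
def half_pel_direct(p):
    """Single-expression 6-tap half-pel filter with rounding and clipping."""
    in0, in1, in2, in3, in4, in5 = p
    acc = in0 - 5 * in1 + 20 * in2 + 20 * in3 - 5 * in4 + in5 + 16
    return min(max(acc >> 5, 0), 255)


class HalfPelPipeline:
    """Cycle-accurate model of an assumed 4-stage split of the filter.

    Stage 1: pairwise sums     a = in0+in5, b = in1+in4, c = in2+in3
    Stage 2: shift-add         5b = (b<<2)+b, 20c = (c<<4)+(c<<2)
    Stage 3: accumulate        a - 5b + 20c + 16
    Stage 4: scale and clip    min(max(acc>>5, 0), 255)
    None marks an empty pipeline register (a bubble).
    """

    def __init__(self):
        self.r1 = self.r2 = self.r3 = self.r4 = None

    def tick(self, p=None):
        """Advance one clock with input p (6 samples, or None); return stage-4 output."""
        # Compute every stage from the old register contents, then latch all at once.
        r4 = None if self.r3 is None else min(max(self.r3 >> 5, 0), 255)
        r3 = None if self.r2 is None else self.r2[0] - self.r2[1] + self.r2[2] + 16
        if self.r1 is None:
            r2 = None
        else:
            a, b, c = self.r1
            r2 = (a, (b << 2) + b, (c << 4) + (c << 2))
        r1 = None if p is None else (p[0] + p[5], p[1] + p[4], p[2] + p[3])
        self.r1, self.r2, self.r3, self.r4 = r1, r2, r3, r4
        return r4


if __name__ == "__main__":
    pipe = HalfPelPipeline()
    rows = [(0, 10, 20, 30, 40, 50), (255, 0, 255, 255, 0, 255)]
    for cycle, p in enumerate(rows + [None, None, None], start=1):
        print(cycle, pipe.tick(p))
```

The demo prints None for cycles 1 to 3, then 25 on cycle 4 and 255 (a clipped overshoot) on cycle 5: a 4-cycle latency followed by one result per cycle.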

One interpolation unit, as shown in Fig. 4, can independently interpolate a series of points; interpolation speed can be raised, or resource usage reduced, by changing the number of interpolation units.
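A simple cycle-count model of this speed/resource trade-off, assuming identical pipelined units that each accept one point per clock, points dealt out round-robin, and unlimited input bandwidth (in hardware, each extra unit also needs its own 6 × 6 input every clock). The function name and the `latency` parameter are illustrative.

```python
import math


def block_cycles(points: int, units: int, latency: int) -> int:
    """Clock cycles to interpolate `points` integer pixels on `units` units.

    Points are dealt out round-robin, so the busiest unit handles
    ceil(points / units) of them; a pipelined unit accepting one point per
    cycle finishes k points after latency + k - 1 cycles.
    """
    if points <= 0:
        return 0
    if units <= 0:
        raise ValueError("need at least one interpolation unit")
    return latency + math.ceil(points / units) - 1


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "unit(s):", block_cycles(256, n, latency=5), "cycles")
```

With a 5-cycle latency, 256 points take 260, 132, 68 and 36 cycles on 1, 2, 4 and 8 units: doubling the units roughly halves the time while doubling the arithmetic resources.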

Practical application

We have successfully applied this method in an H.264 decoding prototype chip developed by our institute, with significant results: the chip can currently decode standard-definition video (720 × 576) at 35 frames per second.

Scalable design: interpolation speed can easily be raised, or resource usage reduced, by increasing or decreasing the number of interpolation units. The method can be applied to the design of video encoders based on the MPEG-4 protocol.

Claims

1. A real-time 1/4 interpolation method based on a multi-stage pipeline structure, which combines the 1/4 interpolation process specified in the MPEG-4 protocol with the motion estimation process in pipeline order and organizes the interpolation process into a fully pipelined 5-stage structure of data read-in, row 1/2 interpolation, column 1/2 interpolation, row/column 1/4 interpolation and data output, comprising the following steps:

(1) Perform real-time interpolation on the current block used in the motion estimation process;

(2) Organize the motion estimation and 1/4 interpolation process into a 2-stage pipeline structure;

(3) Eliminate repeated reading of data by setting up an intermediate buffer;

(4) Organize the 1/4 interpolation process into a 5-stage pipeline structure of data input, row 1/2 interpolation, column 1/2 interpolation, bilinear interpolation and data output;

(5) Design the 1/2 interpolation process as a 4-stage pipeline structure;

(6) Scalable design: depending on the application, the number of interpolation units can easily be increased or decreased to raise interpolation speed or reduce resource usage.

2. The method according to claim 1, characterized in that: the motion estimation process and the 1/4 interpolation process are combined, and the current block used in the motion estimation process is interpolated in real time, thereby reducing storage usage.

3. The method according to claim 1, characterized in that: the motion estimation and 1/4 interpolation processes are organized into a two-stage pipeline structure to reduce the share of the entire encoding time taken by interpolation.

4. The method according to claim 1, characterized in that: part of the input data is buffered in an intermediate buffer, eliminating repeated reads of the input data of adjacent integer pixels during interpolation, so that each integer pixel relevant to the interpolated data block needs to be read only once.

5. The method according to claim 1, characterized in that: the 1/4 interpolation process is organized into a 5-stage pipeline structure of data read-in, row 1/2 interpolation, column 1/2 interpolation, bilinear interpolation and data output, to speed up 1/4 interpolation.

6. The method according to claim 1, characterized in that: the 6-tap linear interpolation used in the 1/2 interpolation process is designed as a 4-stage pipeline structure, and one result is computed per clock cycle once the pipeline is full, thereby greatly increasing interpolation speed.

7. The method according to claim 1, characterized in that: the number of interpolation units is configurable, and depending on the application, it can easily be increased or decreased to raise interpolation speed or reduce resource usage.