CN102413329B

CN102413329B - Motion estimation realizing method of configurable speed in video compression

Info

Publication number: CN102413329B
Application number: CN201110371098.8A
Authority: CN
Inventors: 余宁梅; 贾文华; 顾梅花
Original assignee: Xian University of Technology
Current assignee: Guangzhou Qing Ji Polytron Technologies Inc
Priority date: 2011-11-21
Filing date: 2011-11-21
Publication date: 2014-06-04
Anticipated expiration: 2031-11-21
Also published as: CN102413329A

Abstract

The invention discloses a motion estimation implementation method with configurable speed for video compression. The method comprises the following steps: configuring the number of parallel PE units according to user requirements; calculating the costs of basic blocks inside the PE units; deriving the costs of the different blocks under each partition mode from the cost correlation between blocks of different sizes; and, after all reference data have been read line by line, comparing the final costs obtained by all PEs and taking the minimum cost to determine the optimal motion vector (MV). The method effectively reduces the number of memory accesses, and its coding speed fully meets the requirements of real-time high-definition video coding.

Description

A motion estimation implementation method with configurable speed for video compression

Technical field

The invention belongs to the technical field of video compression and transmission, and specifically relates to a motion estimation implementation method with configurable speed for video compression.

Background technology

The most commonly used coding formats for high-definition video are MPEG-2-TS, MPEG-4, VC-1 and H.264/AVC. These standards share good network compatibility, high coding efficiency and ease of hardware implementation, and are therefore widely used in video compression. In the hardware architecture of a video encoder, the inter-frame motion estimation module accounts for 50% to 90% of the computational complexity and memory bandwidth consumption, so its performance directly determines the performance of the encoder.

The main process of inter-frame coding is as follows. The original image is first divided into blocks, and motion estimation is carried out block by block. To improve precision, these blocks are usually subdivided further, and the matching search is performed with blocks of different sizes. Current mainstream coding standards divide the original image into 16 × 16 macroblocks (MB) and then partition each macroblock in 7 modes, namely 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4, giving 41 current blocks in total. Under this partitioning, the motion information, i.e. the motion vectors (MV), of the already-coded blocks adjacent to the current block is used to locate a prediction block in the reference frame. Extending outward by m pixels around this prediction block yields the search window for motion estimation, which contains k pixels, where k = (m*2+16) * (m*2+16). The 41 sub-blocks of the 7 partition modes are then matched within this search region, and the motion vector MV is determined by comparing their costs.
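The two counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and variable names are illustrative and do not come from the patent.

```python
# Partition shapes (width, height) of a 16x16 macroblock, as listed in the text.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
MB_SIZE = 16


def blocks_per_partition(width, height, mb_size=MB_SIZE):
    """Number of width x height blocks that tile one mb_size x mb_size macroblock."""
    return (mb_size // width) * (mb_size // height)


def search_window_pixels(m, mb_size=MB_SIZE):
    """Pixel count k of a search window extended by m pixels on every side."""
    side = 2 * m + mb_size
    return side * side


counts = [blocks_per_partition(w, h) for w, h in PARTITIONS]
total_blocks = sum(counts)            # 1 + 2 + 2 + 4 + 8 + 8 + 16
window_pixels = search_window_pixels(32)
print(counts, total_blocks, window_pixels)
```

With m = 32 the window side is 80 pixels, which matches the 80*80 on-chip memory used in the embodiment.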

The key technologies for hardware implementation of motion estimation are high data utilization, low distortion, and the cost relationship between different block sizes.

Data reuse in hardware motion estimation effectively reduces the number of memory accesses, and thereby hardware resource consumption and system power. Hardware implementations with a high degree of data reuse have become a research focus. The existing literature classifies the degree of data reuse into four levels. Level A reuses the overlapping reference pixels between adjacent reference blocks of one current block. Level B reuses the overlapping reference pixels between adjacent reference strips of one current block. Level C reuses the overlapping region of the search windows of adjacent current blocks. Level D reuses the pixels of the whole search window across consecutive current blocks. Level A needs the least storage area but the most memory accesses, while level D needs the fewest memory accesses but the most on-chip memory. Depending on the requirements, a different data reuse level is chosen to balance storage space against memory accesses. Level C data reuse is currently the most efficient under present memory bandwidth limits, so most designs adopt it.

The search algorithm is another key element of motion estimation. There are two main kinds: full search and fast search. Full search traverses every position in the reference window in turn; it has the highest fidelity but also the highest hardware cost. Many fast search algorithms exist, but all of them pay for their speed with added distortion, so the search algorithm with the least distortion should be chosen whenever the system allows.
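For readers unfamiliar with full search, the sketch below shows the idea in software on a toy example. It is only an illustration of exhaustive block matching with a sum-of-absolute-differences cost; the names `sad` and `full_search` and the tiny 2x2 test block are assumptions made for the example, not part of the patent.

```python
def sad(cur, ref, top, left):
    """Sum of absolute differences between cur and the same-sized ref region at (top, left)."""
    return sum(
        abs(cur[r][c] - ref[top + r][left + c])
        for r in range(len(cur))
        for c in range(len(cur[0]))
    )


def full_search(cur, window, m):
    """Visit all (2m+1)^2 candidate positions and keep the lowest cost.

    window must be (2m + N) pixels on each side for an N x N block cur.
    Returns (best_cost, (dx, dy)), the displacement being relative to the window centre.
    """
    best = None
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            cost = sad(cur, window, dy + m, dx + m)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Toy case: a 2x2 block hidden in a 6x6 window (m = 2).
window = [[0] * 6 for _ in range(6)]
block = [[9, 8], [7, 6]]
for r in range(2):
    for c in range(2):
        window[3 + r][1 + c] = block[r][c]   # top-left corner at row 3, column 1
best_cost, best_mv = full_search(block, window, m=2)
print(best_cost, best_mv)
```

Every candidate is evaluated, which is why full search loses no matching accuracy and why its cost grows with the square of the search range.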

Variable-block-size motion estimation improves precision but also brings great computational complexity. Smaller partition sizes minimize the coding distortion; the resulting computational complexity can be reduced effectively by deriving the costs of larger blocks from combinations of the costs of smaller blocks.

Summary of the invention

The object of the invention is to provide a motion estimation implementation method with configurable speed for video compression that effectively reduces the number of memory accesses and whose coding speed fully meets the requirements of real-time high-definition video coding.

The technical solution adopted by the invention is a motion estimation implementation method with configurable speed for video compression, characterized by the following steps:

Step 1: configure the number of parallel PE units according to user requirements;

Step 2: inside each PE unit, calculate the basic block costs;

Step 3: based on the cost correlation between blocks of different sizes, derive the costs of the different blocks under each partition mode;

Step 4: after all reference data have been read line by line, compare the final costs obtained by the PEs and take the minimum cost to determine the motion vector MV.

The specific method of step 2 is:

Step 2.1: read the reference data line by line from the on-chip memory and compute the absolute difference between the row data and every pixel of every row of the current macroblock MB, where the on-chip memory size is (m*2+a) * (m*2+a) pixels, the macroblock size is a*a pixels, and (-m, +m) is the search range;

Step 2.2: sum the a*a absolute differences, which belong to a²/b b*b blocks, to form the partial costs, where b is the size of the smallest partition block;

Step 2.3: determine the validity of each partial cost according to the traversal position, thereby obtaining the partial costs together with their valid signals.

The specific method of step 3 is:

Step 3.1: set up registers for storing the b*b block costs and allocate their storage space;

Step 3.2: place a counter in each storage space to generate the signal full, which indicates whether the partial costs have been completely accumulated;

Step 3.3: check the valid signals of the partial costs obtained in step 2.3 and accumulate each valid partial cost into the allocated storage space corresponding to its position;

Step 3.4: check whether full is asserted; if so, output the complete cost together with the position information derived from the register label;

Step 3.5: return to step 3.2 until the complete costs of the b*b sub-blocks of an a*a block have been obtained, then use the cost correlation between different blocks to combine them into the costs of all partition modes;

Meanwhile, each complete cost that is output is compared with the complete cost of the previous position, the cost information of the position with less distortion is stored, and the current cost is also stored for use by the other matching modes.

The method of the invention makes full use of the spatial correlation of the reference data and reduces the number of memory accesses without reducing coding precision. When the number of configured PE units equals the number of horizontal positions, the utilization of the data read from the on-chip memory reaches 100% and the re-read rate drops to 0. The method allows several parameters to be configured, including search range, I/O count, coding speed and hardware consumption, so it meets the different needs of different users, and its coding speed fully meets the requirements of real-time high-definition video coding.

Brief description of the drawings

Fig. 1 is a schematic diagram of the macroblock partition modes in the invention;

Fig. 2 is a schematic diagram of block matching in motion estimation;

Fig. 3 is a schematic diagram of the spatial correlation of the reference data in the invention;

Fig. 4 is a schematic diagram of how the costs of blocks of different sizes combine in the invention;

Fig. 5 shows the relationship between the absolute-difference valid signals and the data in the invention;

Fig. 6 is a schematic diagram of the partial costs and their accumulation in the registers in the invention.

Embodiment

As shown in Fig. 2, motion estimation predicts the position of the current macroblock in the reference frame from the motion vectors of already-coded macroblocks, traverses the search range centred on the predicted position, and determines the motion vector by evaluating the residual cost.

This embodiment uses the following configuration: search range (-32, +32); 5 PEs; macroblock size 16*16; smallest partition 4*4; 7 partition modes (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4), 41 sub-blocks in total. The steps of the method of the invention are as follows:

Step 1: configure the number of parallel PE units according to user requirements; here it is 5.

The number of PEs is determined by the user's requirements for I/O resources, real-time processing speed, hardware consumption and so on; the horizontal positions are processed serially by the configured PEs.

Step 2: inside each PE unit, calculate the basic block costs.

Step 2.1: read the reference data line by line from the on-chip memory and compute the absolute difference between the row data and every pixel of every row of the current macroblock MB, where the on-chip memory size is 80*80 pixels, the macroblock size is 16*16 pixels, and (-32, +32) is the search range. There are 65*65 positions to traverse, namely P_x1y1, P_x1y2, P_x1y3, ..., P_x1y65, P_x2y1, P_x2y2, ..., P_x65y64, P_x65y65. With 5 PEs configured, as shown in Fig. 3, the first PE (PE1) processes P_x1yi (i = 1, 2, 3, ..., 65) and P_x6yi (i = 1, 2, 3, ..., 65), while in parallel the second PE (PE2) processes P_x2yi (i = 1, 2, 3, ..., 65) and P_x7yi (i = 1, 2, 3, ..., 65).
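The assignment of positions to PEs described above can be sketched as follows. The text gives only the first two index values per PE (x1 and x6 for PE1, x2 and x7 for PE2); continuing that pattern with a stride equal to the number of PEs is an assumption of this sketch, as is the function name.

```python
def columns_for_pe(pe_index, num_pe, num_columns=65):
    """1-based x indices handled by PE number pe_index (also 1-based).

    Assumes an interleaved schedule: with num_pe = 5, PE1 takes x1, x6, x11, ...
    """
    return list(range(pe_index, num_columns + 1, num_pe))


schedule = {pe: columns_for_pe(pe, 5) for pe in range(1, 6)}
print(schedule[1][:3], schedule[2][:3], len(schedule[1]))
```

Under this assumption each of the 5 PEs handles 13 of the 65 x indices, so the whole search range is covered in 13 passes.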

When the first row of reference data is read from the on-chip memory, PE1 takes the 16 pixels of row 1 of the first position and computes their absolute differences with each of the 16 rows of pixels of the current macroblock, obtaining 256 absolute differences.

Step 2.2: sum the 16*16 absolute differences, which belong to 64 4*4 blocks, to form the partial costs, where b is the size of the smallest partition block.

As Fig. 1 and Fig. 4 show, once the cost values of the 4*4 blocks are known, the cost values of the other partition modes can be obtained by merging them. The 256 absolute differences are therefore summed with the 4*4 block they belong to as the basic unit, giving the partial costs of 64 4*1 pieces.
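A software sketch of this step may help: one 16-pixel segment of a reference row is compared with all 16 rows of the current macroblock, and the 256 absolute differences are summed in groups of 4 pixels. The function name and the pixel values are made up for illustration; the hardware computes these values in parallel rather than in a loop.

```python
def partial_costs(ref_row16, cur_mb, b=4):
    """Partial costs of one 16-pixel reference row segment against every row of cur_mb.

    Returns one list per current-macroblock row; each list holds len(row) // b sums,
    one per group of b pixels (a 4*1 partial cost when b = 4).
    """
    result = []
    for cur_row in cur_mb:
        diffs = [abs(c - r) for c, r in zip(cur_row, ref_row16)]
        result.append([sum(diffs[i:i + b]) for i in range(0, len(diffs), b)])
    return result


cur_mb = [[(3 * r + c) % 256 for c in range(16)] for r in range(16)]  # made-up pixels
ref_row = [10] * 16                                                 # made-up reference row
costs = partial_costs(ref_row, cur_mb)
print(len(costs) * len(costs[0]))   # number of partial costs per reference row
```

Each of the 16 current rows yields 4 partial costs, which gives the 64 partial costs mentioned in the text.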

The cost is computed as:

J(s, c(m)) = \mathrm{SAD}(s, c(m)),

\mathrm{SAD}(s, c(m)) = \sum_{x=1}^{M} \sum_{y=1}^{N} \lvert s(x, y) - c(x - m_x, y - m_y) \rvert,

where J is the cost function, s is the original data currently being coded, and c is the coded and reconstructed reference-frame data used for motion compensation. M and N are the upper limits of the two summations, i.e. the number of rows and columns of the summed matrix; for the partial cost of a 4*1 piece, M = 4 and N = 1.
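As a worked illustration of how the formula applies to the partial costs, the cost of one 4*4 block can be written as the sum of four 4*1 partial costs. This is a restatement under the definitions above, using the same symbols, and is not an additional formula from the source:

```latex
\mathrm{SAD}_{4\times 4}(s, c(m))
  = \sum_{y=1}^{4}
    \underbrace{\sum_{x=1}^{4} \lvert s(x, y) - c(x - m_x, y - m_y) \rvert}_{\text{partial cost of one } 4{*}1 \text{ piece } (M = 4,\ N = 1)}
```

The inner sum is what step 2.2 produces for each reference row, and the outer sum is what the cost registers of step 3 accumulate over four consecutive rows.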

Step 2.3: determine the validity of each partial cost according to the traversal position, thereby obtaining the partial costs together with their valid signals. Specifically:

From the traversal order of the full search, the first row of data in the reference window is correlated only with P_x1yi, the second row with P_x1yi and P_x2yi, and so on, so that the 16th row is correlated with P_x1yi, P_x2yi, ..., P_x16yi. This determines whether each result of step 2.2 is valid, which gives the roughly parallelogram-shaped pattern of partial-cost validity shown in Fig. 5. From the 16th row of reference data onward, all 64 partial costs produced are valid, and they belong to different positions.
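The validity rule can be sketched in a few lines. The sketch assumes that a reference row can contribute only to those search positions whose 16-row span contains it, and it uses 1-based indices; the function name and the indexing convention are assumptions of the example, and the behaviour for the last rows of the window is inferred rather than stated in the text.

```python
def valid_pairs(ref_row, mb_size=16, num_positions=65):
    """(position, current-macroblock row) pairs that reference row ref_row contributes to.

    All indices are 1-based. Position p covers reference rows p .. p + mb_size - 1,
    so row ref_row meets current row (ref_row - p + 1) of that position.
    """
    first = max(1, ref_row - mb_size + 1)
    last = min(ref_row, num_positions)
    return [(p, ref_row - p + 1) for p in range(first, last + 1)]


for row in (1, 2, 16, 80):
    print(row, len(valid_pairs(row)))
```

The number of valid pairs grows by one per row until row 16, where all 16 current rows are in use; with 4 partial costs per current row this corresponds to the 64 valid partial costs mentioned above.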

Step 3: based on the cost correlation between blocks of different sizes, derive the costs of the different blocks under each partition mode. This step applies a "retain" principle to compute the optimal motion information MV.

Step 3.1: set up 16 registers for storing the 4*4 block costs and allocate their storage space;

Step 3.3: check the valid signals of the partial costs obtained in step 2.3 and accumulate each valid partial cost into the allocated storage space corresponding to its position. Specifically:

The accumulation record of the cost registers is shown in Fig. 6. The first column is the label of the cost register, and the row following each label lists the partial-cost data it receives. The capital letters A to P in the figure denote the 16 rows of the current macroblock, and the numbers denote the row number of the reference pixels read from the on-chip memory. Take register 1 as an example. In the first cycle it stores A1 (the partial cost of the first row of the current macroblock against the first row of reference data). In the second cycle B2 is accumulated into register 1, followed by C3 and D4. At this point the accumulated A1B2C3D4 in register 1 is the complete cost of the first row of 4*4 blocks of P_x1y1, so the full signal of the module (full indicates that the register is full) is set. In the fifth cycle E5 is stored into register 1, overwriting its content; after E5F6G7H8 has been accumulated, full is set again, and the value output this time is the complete cost of the second row of 4*4 blocks of P_x1y1. Following this principle, register 1 outputs a complete 4*4 cost every 4 cycles, finishes the traversal of one position in 16 cycles, and can then start to receive the data of the 17th position. The other 15 registers work in the same way.
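The behaviour of register 1 can be mimicked in software. This is a simplified sketch that tracks a single scalar value per cycle (one column group) and uses the numbers 1 to 16 as stand-ins for A1 to P16; the function name and the test values are assumptions made for the example.

```python
def simulate_register(partial_costs, b=4):
    """Accumulate one partial cost per cycle; emit a complete cost every b cycles.

    partial_costs: per-cycle partial costs for one search position (16 values for a
    16-row macroblock). Returns (cycle, complete_cost) pairs with 1-based cycles.
    """
    emitted = []
    acc = 0
    count = 0                      # counter that drives the "full" signal
    for cycle, cost in enumerate(partial_costs, start=1):
        if count == 0:
            acc = cost             # first value overwrites the old content (e.g. E5)
        else:
            acc += cost            # later values accumulate (e.g. B2, C3, D4)
        count += 1
        if count == b:             # full: a complete b*b cost is ready
            emitted.append((cycle, acc))
            count = 0
    return emitted


# The values 1..16 stand in for A1..P16, purely for illustration.
out = simulate_register(list(range(1, 17)))
print(out)
```

A complete cost leaves the register at cycles 4, 8, 12 and 16, after which the register is free for the next position.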

Step 3.4: check whether full is asserted; if so, output the complete cost together with the position information derived from the register label.

Step 3.5: return to step 3.2 until the complete costs of the 16 4*4 sub-blocks of the 16*16 block have been obtained, then use the cost correlation between different blocks to combine them into the costs of all 7 partition modes;

Meanwhile, each complete cost that is output is compared with the complete cost of the previous position, the cost information of the position with less distortion is stored, and the current cost is also stored for use by the other matching modes, namely 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4 and 4 × 8.
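The combination of 4*4 costs into the costs of the larger partitions can be sketched as plain additions. The sketch assumes that a label such as 16 × 8 means width × height, and the function name and the unit test costs are illustrative only.

```python
def merge_costs(c4x4):
    """Combine a 4x4 grid of 4*4-block costs into the costs of all 7 partition modes.

    c4x4[r][c] is the cost of the 4*4 block in block-row r, block-column c.
    Returns a dict mapping "WxH" (width x height, an assumed convention) to the
    list of block costs in raster order.
    """
    def tile(bw, bh):
        # bw, bh: block width and height measured in 4-pixel units.
        return [
            sum(c4x4[r + i][c + j] for i in range(bh) for j in range(bw))
            for r in range(0, 4, bh)
            for c in range(0, 4, bw)
        ]

    return {
        "4x4": tile(1, 1), "8x4": tile(2, 1), "4x8": tile(1, 2), "8x8": tile(2, 2),
        "16x8": tile(4, 2), "8x16": tile(2, 4), "16x16": tile(4, 4),
    }


grid = [[1] * 4 for _ in range(4)]          # made-up costs: every 4*4 block costs 1
costs = merge_costs(grid)
print(sum(len(v) for v in costs.values()), costs["16x16"])
```

No pixel is revisited: all 41 costs follow from the 16 stored 4*4 costs, which is the saving the cost correlation provides.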

Step 4: after all reference data have been read line by line, compare the final costs obtained by the PEs and take the minimum costs of the 41 sub-blocks of the 7 partition modes to determine the optimal motion information MV, thereby completing motion estimation.
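For a single sub-block, the final comparison amounts to picking the candidate with the lowest cost among those reported by the PEs. The sketch below uses made-up costs and motion vectors; how ties are resolved is not specified in the text, so keeping the first minimum is an assumption.

```python
def best_motion_vector(pe_results):
    """pe_results: one (cost, (mv_x, mv_y)) tuple per PE; returns the lowest-cost entry.

    On equal costs the earliest entry wins (an assumption, not taken from the patent).
    """
    return min(pe_results, key=lambda item: item[0])


results = [(120, (-3, 0)), (95, (1, 2)), (140, (0, 0)), (95, (4, -1)), (101, (2, 2))]
best = best_motion_vector(results)
print(best)
```

The same selection would be repeated for each of the 41 sub-blocks.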

With the following parameters: search range (-32, +32), 65 PEs, macroblock size 16*16, smallest partition 4*4, and 7 partition modes (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4) with 41 sub-blocks, the method needs 80 cycles to complete the matching of the current macroblock; with 5 PEs it needs 1040 cycles. With the SMIC 0.13 μm CMOS technology library, the achievable processing speeds are 1920 × 1080@462fps and 1920 × 1080@36fps respectively, which meets the demand of real-time high-definition video coding.
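The cycle counts and frame rates can be reproduced approximately with simple arithmetic. The sketch assumes that each pass streams the 80 rows of the search window once, that a 1920 × 1080 frame holds 1920 * 1080 / 256 macroblocks with no padding, and that the clock is the 300 MHz listed in the performance table; the function names are illustrative.

```python
import math


def cycles_per_mb(num_pe, positions=65, window_rows=80):
    """Each pass streams all window rows once and covers num_pe horizontal positions."""
    return math.ceil(positions / num_pe) * window_rows


def frames_per_second(freq_hz, cycles, width=1920, height=1080, mb=16):
    """Assumes the macroblock count is width * height / mb^2 with no padding."""
    macroblocks = (width * height) / (mb * mb)
    return freq_hz / (macroblocks * cycles)


for pe in (65, 5):
    cyc = cycles_per_mb(pe)
    print(pe, cyc, round(frames_per_second(300e6, cyc), 1))
```

Under these assumptions the sketch gives 80 and 1040 cycles per macroblock, and frame rates of roughly 463 and 35.6 frames per second, close to the figures quoted in the text.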

Taking 30 frames per second, the current speed of high-definition video coding, as the minimum target and configuring 5 PE units, the circuit performance parameters with the SMIC 0.13 μm CMOS technology library are shown in the table below:

Parameter	Value
Search range	65×65
Block size types	4×4, 4×8, 8×4, 8×8, 16×8, 8×16, 16×16
Technology	SMIC 0.13 μm CMOS
Gate count	150K
On-chip SRAM	80×80×8 bits
Frequency	300 MHz
Cycles/MB	1040 cycles
Processing speed	1920×1080@36fps

The following table compares the results obtained with the method of the invention and with the prior art:

It can be seen that the coding speed of the method of the invention fully meets the requirements of real-time high-definition video coding.

Claims

1. A motion estimation implementation method with configurable speed for video compression, characterized by the following steps:

Step 2: inside each PE unit, calculate the basic block costs, specifically:

Step 2.3: determine the validity of each partial cost according to the traversal position, thereby obtaining the partial costs together with their valid signals;

Step 3: based on the cost correlation between blocks of different sizes, derive the costs of the different blocks under each partition mode, specifically:

Step 3.4: check whether full is asserted; if so, output the complete cost together with the position information derived from the register label, the register label being the sequence number of the register set up in step 3.1;

Step 3.5: return to step 3.2 until the complete costs of the b*b sub-blocks of an a*a block have been obtained, then use the cost correlation between different blocks to combine them into the costs of all partition modes;

Meanwhile, each complete cost that is output is compared with the complete cost of the previous position, the cost information of the position with less distortion is stored, and the current cost is also stored for use by the other matching modes;