CN102413329B - Motion estimation realizing method of configurable speed in video compression - Google Patents
- Publication number
- CN102413329B (application CN201110371098.8A)
- Authority
- CN
- China
- Prior art keywords
- cost
- piece
- complete
- register
- motion estimation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a speed-configurable motion estimation implementation method for video compression. The method comprises the following steps: configuring the number of parallel PE units according to user requirements; calculating the costs of basic blocks inside the PE units; deriving, from the cost correlation between blocks of different sizes, the costs of the blocks under each partition mode; and, once all reference data have been read line by line, comparing the final costs obtained by all PEs and taking the minimum cost to determine the optimal motion information, i.e. the motion vector MV. The invention effectively reduces the number of memory accesses, and its coding speed fully meets the requirements of real-time high-definition video coding.
Description
Technical field
The invention belongs to the technical field of video compression and transmission, and specifically relates to a speed-configurable motion estimation implementation method for video compression.
Background art
The most commonly used coding formats for HD video include MPEG-2-TS, MPEG-4, VC-1 and H.264/AVC. These standards share good network compatibility, efficient coding quality and ease of hardware implementation, and are therefore widely used in video compression. In the hardware architecture of a video encoder, the inter-frame motion estimation module accounts for 50% to 90% of the computational complexity and memory bandwidth consumption, so the performance of this module directly determines the performance of the encoder.
The main process of inter-frame coding is as follows. The original image is first divided into blocks, and motion estimation is carried out block by block. To improve accuracy, these blocks are usually subdivided further, and matching search is performed with blocks of different sizes. Current mainstream coding standards divide the original image into 16 × 16 macroblocks (MB) and then partition each macroblock in 7 modes, 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4, giving 41 current blocks in total. Under these partition modes, a prediction block is located in the reference frame image using the motion information, i.e. the motion vectors (MV), of the already-coded blocks adjacent to the current block. The search window for motion estimation is then obtained by extending m pixels outward around this prediction block; it contains k pixels, where k = (m*2+16) * (m*2+16). The 41 sub-blocks of the 7 partition modes are then matched within this search region, and the motion vector MV is determined by comparing their costs.
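The two counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and constant names are illustrative and do not come from the patent.

```python
# Partition shapes (width, height) of a 16x16 macroblock, as listed in the text.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
MB_SIZE = 16


def blocks_per_partition(width, height, mb_size=MB_SIZE):
    """Number of width x height blocks that tile one mb_size x mb_size macroblock."""
    return (mb_size // width) * (mb_size // height)


def total_blocks(partitions=PARTITIONS):
    """Total number of current blocks over all partition modes."""
    return sum(blocks_per_partition(w, h) for w, h in partitions)


def search_window_pixels(m, mb_size=MB_SIZE):
    """Pixel count k of a search window extended by m pixels on every side."""
    side = 2 * m + mb_size
    return side * side


print(total_blocks())            # 41  (1 + 2 + 2 + 4 + 8 + 8 + 16)
print(search_window_pixels(32))  # 6400, i.e. an 80 x 80 window
```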
The key techniques for hardware implementation of motion estimation are high data utilization, low distortion, and the cost relationship between different block sizes.
Data reuse in hardware motion estimation can effectively reduce the number of memory accesses, and thereby reduce hardware resource consumption and system power. Hardware implementations with a high degree of data reuse have become a research focus. The existing literature classifies data reuse into levels. Level A reuses the overlapping reference pixels of adjacent reference blocks of one current block. Level B reuses the overlapping reference pixels of adjacent reference strips of one current block. Level C reuses the overlapping region of the search windows of adjacent current blocks. Level D reuses the pixels of the entire search windows of successive current blocks. Level A needs the smallest storage area but the largest number of memory accesses, whereas Level D needs the fewest memory accesses but consumes the most on-chip storage. Depending on the requirements, a different data reuse level is chosen to balance storage space against memory access. Under current memory bandwidth limits, Level C data reuse is the most efficient, so most designs adopt Level C data reuse.
The search algorithm is another key element of motion estimation. There are two main approaches: full search and fast search. Full search visits every position in the reference window in turn; it has the highest fidelity but also, comparatively, the largest hardware cost. Many fast search algorithms exist, but all of them trade distortion for speed, so where the system allows, the search algorithm with the least distortion should be chosen.
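As an illustration of what full search means, the following software sketch visits every offset in the search range and keeps the position with the smallest sum of absolute differences. It is a plain reference model under assumed data layouts (lists of pixel rows), not the hardware architecture of the invention; `sad` and `full_search` are hypothetical names.

```python
def sad(cur, ref, top, left):
    """Sum of absolute differences between block `cur` and the same-sized
    region of `ref` whose top-left corner is (top, left)."""
    return sum(
        abs(cur[r][c] - ref[top + r][left + c])
        for r in range(len(cur))
        for c in range(len(cur[0]))
    )


def full_search(cur, ref, m):
    """Visit every offset in [-m, +m] x [-m, +m]; return (best_cost, (dy, dx)).

    Assumption: `ref` is the search window, i.e. the block size plus 2*m in
    each dimension, so offset (dy, dx) maps to the corner (dy + m, dx + m).
    """
    best = None
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            cost = sad(cur, ref, dy + m, dx + m)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


# Toy example: a 2x2 block hidden in a 4x4 window (m = 1) at offset (+1, 0).
window = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [9, 1, 2, 9],
    [9, 3, 4, 9],
]
block = [[1, 2], [3, 4]]
print(full_search(block, window, 1))  # (0, (1, 0))
```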
Variable-block-size motion estimation improves accuracy but also brings considerable computational complexity. Smaller partition sizes can minimize coding distortion, and the resulting computational complexity can be reduced effectively by deriving the costs of larger blocks from combinations of the costs of smaller blocks.
Summary of the invention
The object of the invention is to provide a speed-configurable motion estimation implementation method for video compression that effectively reduces the number of memory accesses and whose coding speed fully meets the requirements of real-time HD video coding.
The technical solution adopted by the invention is a speed-configurable motion estimation implementation method for video compression, characterized in that the specific steps are as follows:
The specific method of step 2 is:
Step 2.1: read the reference data from the on-chip memory line by line, and compute the absolute difference between each pixel of the row just read and each pixel of every row of the current macroblock MB, where the on-chip memory size is (m*2+a) * (m*2+a) pixels, the macroblock size is a*a pixels, and (-m, +m) is the search range;
Step 2.2: sum the a*a absolute differences, which belong to $a^2/b$ separate b*b blocks, into partial costs, where b is the size of the smallest partition block;
Step 2.3: determine the validity of each partial cost according to the traversal position, and so obtain the partial costs together with their valid signals.
The specific method of step 3 is:
Step 3.1: set up registers for storing the b*b block costs and allocate their storage space;
Step 3.2: place a counter in each storage space to generate the signal full, which indicates whether accumulation of the partial costs is complete;
Step 3.3: check the valid signals of the partial costs obtained in step 2.3, and accumulate each valid partial cost into the allocated storage space corresponding to its position;
Step 3.4: check whether full is asserted; if so, send out the complete cost, together with the position information derived from the label of the register;
Step 3.5: return to step 3.2 until the complete costs of the b*b sub-blocks of one a*a block have been obtained, then use the cost correlation between different blocks to assemble the complete costs of all partition modes;
At the same time, compare the complete cost just sent out with the complete cost of the previous position, and store the cost information of the position with the smaller distortion; the current cost is also stored for use by the other matching modes.
The speed-configurable motion estimation implementation method of the invention makes full use of the spatial correlation of the reference data and reduces the number of memory accesses without reducing coding accuracy. When the number of PE units configured equals the number of horizontal positions, the utilization of the data read from the on-chip memory reaches 100% and the re-read rate drops to 0. The method makes several parameters configurable, including search range, number of I/Os, coding speed and hardware consumption, so it can meet the differing needs of different users, and its coding speed fully meets the requirements of real-time HD video coding.
Brief description of the drawings
Fig. 1 is a schematic diagram of the macroblock partition modes in the invention;
Fig. 2 is a schematic diagram of block matching in motion estimation;
Fig. 3 is a schematic diagram of the spatial correlation of the reference data in the invention;
Fig. 4 is a schematic diagram of how the costs of blocks of different sizes combine in the invention;
Fig. 5 shows the relationship between the absolute-difference valid signals and the data in the invention;
Fig. 6 is a schematic diagram of the accumulation of partial costs in the registers in the invention.
Embodiment
As shown in Fig. 2, motion estimation predicts the position of the current macroblock in the reference frame from the motion vectors of already-coded macroblocks, then traverses the search range centered on the predicted position and determines the motion vector by evaluating the residual cost.
The present embodiment uses the following configuration: search range (-32, +32), 5 PEs, macroblock size 16*16, smallest partition 4*4, and 7 partition modes (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4) totalling 41 sub-blocks. The specific steps of the speed-configurable motion estimation implementation method of the invention are as follows:
Step 1: determine the number of PEs according to the user's requirements on I/O resources, real-time processing speed, hardware consumption and so on; the horizontal positions are processed serially by the configured PEs.
Step 2.1: read the reference data from the on-chip memory line by line, and compute the absolute difference between each pixel of the row just read and each pixel of every row of the current macroblock MB, where the on-chip memory size is 80*80 pixels, the macroblock size is 16*16 pixels, and (-32, +32) is the search range. There are 65*65 positions to traverse, namely $P_{x_1y_1}, P_{x_1y_2}, P_{x_1y_3}, \ldots, P_{x_1y_{65}}, P_{x_2y_1}, P_{x_2y_2}, \ldots, P_{x_{65}y_{64}}, P_{x_{65}y_{65}}$. With 5 PEs configured, as shown in Fig. 3, the first PE (PE1) processes $P_{x_1y_i}$ (i = 1, 2, 3, ..., 65) and $P_{x_6y_i}$ (i = 1, 2, 3, ..., 65), while in parallel the second PE (PE2) processes $P_{x_2y_i}$ (i = 1, 2, 3, ..., 65) and $P_{x_7y_i}$ (i = 1, 2, 3, ..., 65).
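The assignment of positions to PEs described above can be sketched as follows. The text lists only the first two position indices per PE; continuing the pattern in round-robin fashion is an assumption, and `positions_for_pe` and `passes_needed` are hypothetical names.

```python
def positions_for_pe(pe, n_pe, n_positions=65):
    """1-based position indices handled by PE number `pe` (1-based) when
    `n_pe` PEs share `n_positions` positions in round-robin order.

    Assumption: the x1, x6 / x2, x7 pattern of the text continues with the
    same stride.
    """
    return list(range(pe, n_positions + 1, n_pe))


def passes_needed(n_pe, n_positions=65):
    """Number of serial passes over the search window (ceiling division)."""
    return -(-n_positions // n_pe)


print(positions_for_pe(1, 5)[:3])  # [1, 6, 11]
print(positions_for_pe(2, 5)[:3])  # [2, 7, 12]
print(passes_needed(5))            # 13
```

With 65 PEs, each PE would handle exactly one index and a single pass would suffice.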
When the first row of reference data is read from the on-chip memory, PE1 takes the 16 pixels of row 1 of the first position and computes absolute differences against each of the 16 rows of pixels of the current macroblock, obtaining 256 absolute-difference results.
Step 2.2: sum the 16*16 absolute differences, which belong to 64 separate 4*4 blocks, into partial costs, where b is the size of the smallest partition block (here b = 4).
As shown in Fig. 1 and Fig. 4, once the cost values of the 4*4 blocks are known, the cost values of the other partition modes can be obtained by merging. The 256 absolute differences are therefore summed with the 4*4 block they belong to as the base unit, giving the partial costs of 64 4*1 blocks.
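Steps 2.1 and 2.2 for a single reference row can be modelled in software as below. This is a sketch under assumed data layouts (lists of integers); `row_partial_costs` is a hypothetical name, and the hardware computes these values in parallel rather than in a loop.

```python
def row_partial_costs(ref_row16, cur_mb, b=4):
    """Compare one run of 16 reference pixels with every row of the current
    macroblock and sum the absolute differences in groups of b pixels.

    Returns a 16 x 4 list: 64 partial costs, one per (current row, group).
    """
    partial = []
    for cur_row in cur_mb:                                        # 16 current rows
        diffs = [abs(c - r) for c, r in zip(cur_row, ref_row16)]  # 16 differences
        partial.append([sum(diffs[i:i + b]) for i in range(0, len(diffs), b)])
    return partial


# Toy data: current row j holds the constant j, the reference row is all zeros.
cur_mb = [[j] * 16 for j in range(16)]
ref_row = [0] * 16
costs = row_partial_costs(ref_row, cur_mb)
print(len(costs), len(costs[0]))  # 16 4
print(costs[3])                   # [12, 12, 12, 12]
```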
The cost is calculated as: J(s, c(m)) = SAD(s, c(m)),
where J is the cost function, s is the original data currently being encoded, and c is the encoded and reconstructed reference-frame data used for motion compensation. M and N are the limits of the summation ∑ over the matrix, namely the number of rows and the number of columns of the summed matrix; for the partial cost of a 4*1 block, M = 4 and N = 1.
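The summation that M and N refer to does not survive in this text. Assuming the conventional definition of the sum of absolute differences, and taking m = (m_x, m_y) to be the candidate motion vector (an assumption; the text does not define m), it would read:

```latex
\mathrm{SAD}(s, c(m)) = \sum_{x=1}^{M} \sum_{y=1}^{N} \left| s[x, y] - c[x - m_x,\; y - m_y] \right|
```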
Step 2.3: determine the validity of each partial cost according to the traversal position, and so obtain the partial costs together with their valid signals. Specifically:
From the traversal order of the full search, the first row of data in the reference window is correlated only with $P_{x_1y_i}$; the second row is correlated with $P_{x_1y_i}$ and $P_{x_2y_i}$; and so on, until the 16th row is correlated with $P_{x_1y_i}, P_{x_2y_i}, \ldots, P_{x_{16}y_i}$. This determines whether each result of step 2.2 is valid, and gives the parallelogram-like pattern of partial-cost validity shown in Fig. 5. From the 16th row of reference data onward, all 64 partial costs produced are valid, and they belong to different positions.
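One way to read this validity rule is sketched below. It assumes that current-macroblock row j matched against reference row r contributes to position p = r - j + 1, and that a partial cost is valid only when p is one of the 65 real positions; this indexing is an interpretation, not taken verbatim from the patent, and `valid_current_rows` is a hypothetical name.

```python
def valid_current_rows(ref_row, mb_size=16, n_positions=65):
    """Current-macroblock rows j (1-based) whose partial cost against
    reference row `ref_row` (1-based) belongs to a real search position.

    Assumption: row j against reference row r contributes to position
    p = r - j + 1, which must lie in 1..n_positions.
    """
    return [j for j in range(1, mb_size + 1)
            if 1 <= ref_row - j + 1 <= n_positions]


for r in (1, 2, 16, 66, 80):
    rows = valid_current_rows(r)
    # reference row, valid current rows, valid partial costs (4 per row)
    print(r, len(rows), 4 * len(rows))
```

Under this reading the number of valid rows grows from 1 to 16 over the first 16 reference rows, stays at 16, and shrinks again over the last 15 rows of the 80-row window, which is one way to obtain a parallelogram-shaped validity pattern.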
Step 3.1: set up 16 registers for storing the 4*4 block costs and allocate their storage space;
Step 3.2: place a counter in each storage space to generate the signal full, which indicates whether accumulation of the partial costs is complete;
Step 3.3: check the valid signals of the partial costs obtained in step 2.3, and accumulate each valid partial cost into the allocated storage space corresponding to its position. Specifically:
The accumulation record of the cost registers is shown in Fig. 6. The first column is the label of the cost register, and the row following each label lists the partial-cost data received. The capital letters A to P in the figure denote the 16 rows of the current macroblock, and the numbers denote the row number of the reference pixels read from the on-chip memory. Taking register 1 as an example: in the first cycle it stores A1 (the partial cost of the first row of the current macroblock against the first row of reference data); in the second cycle B2 is accumulated into register 1, followed by C3 and D4. At this point the accumulated value A1B2C3D4 in register 1 is the complete cost of the first row of 4*4 blocks of $P_{x_1y_1}$, so the module's full signal (the register-full signal) is set. In the 5th cycle E5 is stored, overwriting register 1; once E5F6G7H8 has been accumulated, full is set again, and the value sent out is the complete cost of the second row of 4*4 blocks of $P_{x_1y_1}$. On this principle, register 1 sends out a complete 4*4 cost every 4 cycles, completes the traversal of one point in 16 cycles, and can then begin receiving the data of the 17th position. The remaining 15 registers work in the same way.
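The accumulation schedule described for register 1 can be reproduced with a small model. It assumes one reference row is read per cycle and that register n starts at cycle n; only the behaviour of register 1 is stated in the text, so the generalisation to other registers is an assumption, and `register_schedule` is a hypothetical name.

```python
def register_schedule(reg, n_cycles, mb_size=16, b=4):
    """List (cycle, label, full) entries for cost register `reg` (1-based).

    Assumptions: one reference row is read per cycle, and register `reg`
    starts accumulating at cycle `reg` (register 1 at cycle 1, as in the text).
    """
    events = []
    for cycle in range(reg, n_cycles + 1):
        step = cycle - reg                        # cycles since this register started
        cur_row = chr(ord("A") + step % mb_size)  # current-macroblock row A..P
        full = (step + 1) % b == 0                # a row of b x b block costs is complete
        events.append((cycle, f"{cur_row}{cycle}", full))
    return events


for cycle, label, full in register_schedule(1, 8):
    print(cycle, label, "full" if full else "")
```

For register 1 this yields A1, B2, C3, D4 (full), E5, F6, G7, H8 (full), and at cycle 17 the label wraps back to row A, matching the statement that the register can then start on the 17th position.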
Step 3.4: check whether full is asserted; if so, send out the complete cost, together with the position information derived from the label of the register.
Step 3.5: return to step 3.2 until the complete costs of the 16 4*4 sub-blocks of a 16*16 block have been obtained, then use the cost correlation between different blocks to assemble the complete costs of all 7 partition modes;
At the same time, compare the complete cost just sent out with the complete cost of the previous position, and store the cost information of the position with the smaller distortion; the current cost is also stored for use by the other matching modes, namely 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4 and 4 × 8.
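The assembly of larger-partition costs from the sixteen 4*4 costs, and the running comparison against earlier positions, can be sketched as follows. This is a software model under assumed data structures (a 4 x 4 grid of costs, a dictionary of best results); `merge_costs` and `update_best` are hypothetical names.

```python
def merge_costs(c4):
    """Combine a 4x4 grid of 4x4-block costs (c4[row][col]) into the costs of
    every partition mode. Keys are (width, height) in pixels; values are lists
    of block costs in raster order."""
    def grid(rows, cols):
        # Sum c4 over tiles that span `rows` x `cols` 4x4 blocks.
        return [
            sum(c4[r + i][c + j] for i in range(rows) for j in range(cols))
            for r in range(0, 4, rows)
            for c in range(0, 4, cols)
        ]
    return {
        (4, 4): grid(1, 1),
        (8, 4): grid(1, 2),
        (4, 8): grid(2, 1),
        (8, 8): grid(2, 2),
        (16, 8): grid(2, 4),
        (8, 16): grid(4, 2),
        (16, 16): grid(4, 4),
    }


def update_best(best, costs, position):
    """Keep, per block, the smallest cost seen so far and where it occurred."""
    for mode, values in costs.items():
        slots = best.setdefault(mode, [None] * len(values))
        for i, v in enumerate(values):
            if slots[i] is None or v < slots[i][0]:
                slots[i] = (v, position)
    return best


costs = merge_costs([[1] * 4 for _ in range(4)])
print(sum(len(v) for v in costs.values()))  # 41
print(costs[(16, 16)], costs[(8, 8)])       # [16] [4, 4, 4, 4]
```

Only additions of already-computed 4*4 costs are needed for the larger modes, which is the saving the cost correlation provides.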
When the method of the invention is configured with the following parameters: search range (-32, +32), 65 PEs, macroblock size 16*16, smallest partition 4*4, and 7 partition modes (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4) totalling 41 sub-blocks, 80 cycles are needed to complete the matching of the current macroblock; with 5 PEs configured, 1040 cycles are needed. With the SMIC 0.13 μm CMOS process library, the achievable processing speeds are 1920 × 1080@462fps and 1920 × 1080@36fps respectively, which meets the demand of real-time HD video coding.
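The cycle counts and frame rates quoted here are consistent with a simple model: one pass over the 80-row window per group of PEs, a 300 MHz clock, and 1920 × 1080 / 256 = 8100 macroblocks per frame. The model below is an assumption used to check the arithmetic, not a statement from the patent, and the function names are illustrative.

```python
def cycles_per_macroblock(n_pe, n_positions=65, window_rows=80):
    """Cycles to match one macroblock: one window pass per group of PEs."""
    passes = -(-n_positions // n_pe)   # ceiling division
    return passes * window_rows


def frames_per_second(n_pe, clock_hz=300e6, width=1920, height=1080, mb_size=16):
    """Frame rate, assuming width*height/256 macroblocks per frame (8100 for
    1920x1080) and the 300 MHz clock for every configuration."""
    mbs_per_frame = (width * height) / (mb_size * mb_size)
    return clock_hz / (cycles_per_macroblock(n_pe) * mbs_per_frame)


print(cycles_per_macroblock(65), cycles_per_macroblock(5))  # 80 1040
print(round(frames_per_second(65), 1))  # 463.0
print(round(frames_per_second(5), 1))   # 35.6
```

Under these assumptions the 65-PE configuration is the one that reaches roughly 462 fps, and the 5-PE configuration the one that reaches roughly 36 fps.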
Taking 30 frames per second, the current HD video coding rate, as the minimum target, 5 PE units are configured. With the SMIC 0.13 μm CMOS process library, the circuit performance parameters are as shown in the following table:
Parameter | Value |
---|---|
Search range | 65×65 |
Block sizes | 4×4, 4×8, 8×4, 8×8, 16×8, 8×16, 16×16 |
Process | SMIC 0.13 μm CMOS |
Gate count | 150K |
On-chip SRAM | 80×80×8 bits |
Frequency | 300 MHz |
Cycles/MB | 1040 cycles |
Processing speed | 1920×1080@36fps |
The following table compares the method of the invention with the prior art:
It can be seen that the coding speed of the method of the invention fully meets the requirements of real-time HD video coding.
Claims (1)
1. A speed-configurable motion estimation implementation method for video compression, characterized in that the specific steps are as follows:
Step 1: configure the number of parallel PE units according to user requirements;
Step 2: inside the PE units, calculate the basic block costs; the specific method is:
Step 2.1: read the reference data from the on-chip memory line by line, and compute the absolute difference between each pixel of the row just read and each pixel of every row of the current macroblock MB, where the on-chip memory size is (m*2+a) * (m*2+a) pixels, the macroblock size is a*a pixels, and (-m, +m) is the search range;
Step 2.2: sum the a*a absolute differences, which belong to $a^2/b$ separate b*b blocks, into partial costs, where b is the size of the smallest partition block;
Step 2.3: determine the validity of each partial cost according to the traversal position, and so obtain the partial costs together with their valid signals;
Step 3: based on the cost correlation between blocks of different sizes, derive the costs of the blocks under each partition mode; the specific method is:
Step 3.1: set up registers for storing the b*b block costs and allocate their storage space;
Step 3.2: place a counter in each storage space to generate the signal full, which indicates whether accumulation of the partial costs is complete;
Step 3.3: check the valid signals of the partial costs obtained in step 2.3, and accumulate each valid partial cost into the allocated storage space corresponding to its position;
Step 3.4: check whether full is asserted; if so, send out the complete cost, together with the position information derived from the register label; the register label is the sequence number of the register set up in step 3.1;
Step 3.5: return to step 3.2 until the complete costs of the b*b sub-blocks of one a*a block have been obtained, then use the cost correlation between different blocks to assemble the complete costs of all partition modes;
At the same time, compare the complete cost just sent out with the complete cost of the previous position, and store the cost information of the position with the smaller distortion; the current cost is also stored for use by the other matching modes;
Step 4: once all reference data have been read line by line, compare the final costs obtained by each PE and take the position of minimum cost as the motion vector MV.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110371098.8A CN102413329B (en) | 2011-11-21 | 2011-11-21 | Motion estimation realizing method of configurable speed in video compression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110371098.8A CN102413329B (en) | 2011-11-21 | 2011-11-21 | Motion estimation realizing method of configurable speed in video compression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102413329A CN102413329A (en) | 2012-04-11 |
CN102413329B (en) | 2014-06-04 |
Family
ID=45915138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110371098.8A Active CN102413329B (en) | 2011-11-21 | 2011-11-21 | Motion estimation realizing method of configurable speed in video compression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102413329B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104301732B (en) * | 2014-10-13 | 2017-05-17 | 哈尔滨工业大学深圳研究生院 | video coding motion estimation unit hardware circuit |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101778281A (en) * | 2010-01-13 | 2010-07-14 | 中国移动通信集团广东有限公司中山分公司 | Method for estimating H.264-based fast motion on basis of structural similarity |
CN102113326A (en) * | 2008-08-04 | 2011-06-29 | 杜比实验室特许公司 | Overlapped block disparity estimation and compensation architecture |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060126739A1 (en) * | 2004-12-15 | 2006-06-15 | Stoner Michael D | SIMD optimization for H.264 variable block size motion estimation algorithm |
US8358695B2 (en) * | 2006-04-26 | 2013-01-22 | Altera Corporation | Methods and apparatus for providing a scalable motion estimation/compensation assist function within an array processor |
-
2011
- 2011-11-21 CN CN201110371098.8A patent/CN102413329B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102113326A (en) * | 2008-08-04 | 2011-06-29 | 杜比实验室特许公司 | Overlapped block disparity estimation and compensation architecture |
CN101778281A (en) * | 2010-01-13 | 2010-07-14 | 中国移动通信集团广东有限公司中山分公司 | Method for estimating H.264-based fast motion on basis of structural similarity |
Non-Patent Citations (2)
Title |
---|
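A hardware implementation architecture for the motion estimation algorithm in H.264; Bai Xianghui et al.; 《电视技术》; 20041117 (No. 11); pp. 17-19 * |
Bai Xianghui et al. A hardware implementation architecture for the motion estimation algorithm in H.264. 《电视技术》. 2004, (No. 11), |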
Also Published As
Publication number | Publication date |
---|---|
CN102413329A (en) | 2012-04-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107534770B (en) | Image prediction method and relevant device | |
CN102165780B (en) | Video encoder and method, and video decoder and method thereof | |
CN100471275C (en) | Motion estimating method for H.264/AVC coder | |
CN103931184B (en) | Method and apparatus for being coded and decoded to video | |
CN107734335A (en) | Image prediction method and relevant apparatus | |
CN100415002C (en) | Multi-mode multi-viewpoint video signal code compression method | |
CN103891290A (en) | Motion vector processing | |
CN102934444A (en) | Method and apparatus for video encoding and method and apparatus for video decoding | |
CN102291581B (en) | Realizing method of self-adaptive motion estimation supporting frame field | |
CN104918053A (en) | Methods and apparatuses for encoding and decoding motion vector | |
CN106464908A (en) | Method and device for transmitting prediction mode of depth image for interlayer video encoding and decoding | |
CN103188496A (en) | Fast motion estimation video encoding method based on motion vector distribution forecast | |
CN101986716A (en) | Quick depth video coding method | |
CN103096090A (en) | Method of dividing code blocks in video compression | |
CN102811346A (en) | Encoding mode selection method and system | |
CN103079067A (en) | Motion vector predicted value list construction method and video encoding and decoding method and device | |
CN102148990B (en) | Device and method for predicting motion vector | |
CN102647598A (en) | H.264 inter-frame mode optimization method based on maximin MV (Music Video) difference value | |
CN104919799A (en) | Method and apparatus of depth to disparity vector conversion for three-dimensional video coding | |
CN1703094B (en) | Image interpolation apparatus and methods that apply quarter pel interpolation to selected half pel interpolation results | |
CN101860747B (en) | Sub-pixel movement estimation system and method | |
CN101959067B (en) | Decision method and system in rapid coding mode based on epipolar constraint | |
CN102413329B (en) | Motion estimation realizing method of configurable speed in video compression | |
CN103096064B (en) | The method and relevant device of coding and reconstructed pixel block | |
CN101227616B (en) | H.263/AVC integer pixel vectors search method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C41 | Transfer of patent application or patent right or utility model | ||
TR01 | Transfer of patent right |
Effective date of registration: 20160829; Address after: 7, building 15, 510665 software Road, Guangzhou, Guangdong, Tianhe District; Patentee after: Guangzhou Qing Ji Polytron Technologies Inc; Address before: 710048 Shaanxi city of Xi'an Province Jinhua Road No. 5; Patentee before: Xi'an University of Technology |