CN100413344C

CN100413344C - Method for implementing a highly parallel intra-frame predictor

Info

Publication number: CN100413344C
Application number: CNB2006101138688A
Authority: CN
Inventors: 李树国; 杨晨
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2006-10-20
Filing date: 2006-10-20
Publication date: 2008-08-20
Anticipated expiration: 2026-10-20
Also published as: CN1937774A

Abstract

The invention belongs to the field of video decoder IC design. It is characterized in that numerical strength reduction is applied to the operations shared among the prediction formulas of the 17 prediction modes for the 16 pixels of each 4×4 block, removing redundant computation, and in that it provides a highly parallel intra predictor architecture that computes the predicted values of 16 pixels in every clock cycle. According to the implementation results, compared with a design based on the reconfigurable method, the invention reduces circuit area at the same degree of parallelism and simplifies the control logic.

Description

Implementation method for a highly parallel intra predictor

Technical field

The present invention relates to the field of integrated circuit (IC) design for video encoding and decoding.

Background art

H.264 is a standard developed by the Joint Video Team (JVT) formed by the Video Coding Experts Group (VCEG) of the ITU-T, the telecommunication standardization sector of the International Telecommunication Union, and the Moving Picture Experts Group (MPEG) of ISO/IEC, and it is currently the latest video coding standard. Like earlier standards, it achieves compression by removing the spatial and temporal redundancy of the image.

Intra prediction in H.264:

Intra prediction in H.264 uses the pixels adjacent to the top and left of the current block as reference pixels to predict the pixels of the current block, which removes spatial redundancy effectively and greatly improves coding efficiency. For the luma component, H.264 performs intra prediction on blocks of two sizes: 4×4 blocks, which suit images with a large amount of detail and have 9 prediction modes, and 16×16 blocks, which suit flat image regions and have 4 prediction modes. For the chroma component there are only 8×8 blocks, with 4 prediction modes. For each block size and prediction mode the decoder uses a different prediction formula to obtain the predicted pixel values. The 3 block sizes and as many as 17 prediction modes improve prediction accuracy, but they also greatly increase the complexity and redundancy of a hardware implementation.

To simplify the discussion that follows, Fig. 1 defines the spatial coordinate system. Each square represents one pixel: gray squares are the predicted pixels of the current block and white squares are the reference pixels adjacent to the current block. The positive x direction is horizontal to the right and the positive y direction is vertically downward; both axes start at -1 with a step of 1. With this coordinate system, the position of every reference pixel and predicted pixel is uniquely determined by its coordinates [x, y].

p[x, -1], x = -1～15, denotes the reference pixels above the current block, also named h1～h17; p[-1, y], y = 0～15, denotes the reference pixels to the left of the current block, also named v2～v17. In addition, pred4×4_L[x, y], x = 0～3, y = 0～3, denotes the 16 predicted pixels of a 4×4 luma block; pred16×16_L[x, y], x = 0～15, y = 0～15, denotes the 256 predicted pixels of a 16×16 luma block; and pred8×8_C[x, y], x = 0～7, y = 0～7, denotes the 64 predicted pixels of an 8×8 chroma block.
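The naming convention above can be sketched as a small lookup. This is only an illustration: the function name `ref_name` is hypothetical and does not appear in the patent.

```python
def ref_name(x: int, y: int) -> str:
    """Map reference-pixel coordinates p[x, y] to the h/v names used in the text."""
    if y == -1 and -1 <= x <= 15:
        return f"h{x + 2}"  # p[-1, -1] -> h1, ..., p[15, -1] -> h17
    if x == -1 and 0 <= y <= 15:
        return f"v{y + 2}"  # p[-1, 0] -> v2, ..., p[-1, 15] -> v17
    raise ValueError("not a reference pixel position")
```

Note that the corner pixel p[-1, -1] is named h1, so there is no separate v1 in this scheme.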

Prediction modes of the 4×4 luma block:

A 4×4 luma block is predicted from its adjacent top and left pixels and has 9 prediction modes in total, as shown in Fig. 2, where the gray squares are the 16 predicted pixels and the white squares labeled h1～h9 and v2～v5 are the adjacent reference pixels. The names and descriptions of the 9 prediction modes are listed in Table 1, and the prediction formula of each mode is given after the table.

Table 1: Names and descriptions of the prediction modes of the 4×4 luma block

Mode number	Name	Description
0 (vertical)	Intra_4×4_Vertical	Predicted pixel values are extrapolated vertically from h2～h5
1 (horizontal)	Intra_4×4_Horizontal	Predicted pixel values are extrapolated horizontally from v2～v5
2 (DC)	Intra_4×4_DC	Predicted pixel values are derived from the mean of h2～h5 and v2～v5
3 (diagonal down-left)	Intra_4×4_Diagonal_Down_left	Predicted pixel values are extrapolated along the 45-degree down-left direction
4 (diagonal down-right)	Intra_4×4_Diagonal_Down_Right	Predicted pixel values are extrapolated along the 45-degree down-right direction
5 (vertical-right)	Intra_4×4_Vertical_Right	Predicted pixel values are extrapolated along a direction 26.6 degrees to the right of vertical
6 (horizontal-down)	Intra_4×4_Horizontal_Down	Predicted pixel values are extrapolated along a direction 26.6 degrees below horizontal
7 (vertical-left)	Intra_4×4_Vertical_left	Predicted pixel values are extrapolated along a direction 26.6 degrees to the left of vertical
8 (horizontal-up)	Intra_4×4_Horizontal_Up	Predicted pixel values are extrapolated along a direction 26.6 degrees above horizontal

Mode 0: pred4×4_L[x, y] = p[x, -1], x = 0, 1, 2, 3, y = 0, 1, 2, 3

Mode 1: pred4×4_L[x, y] = p[-1, y], x = 0, 1, 2, 3, y = 0, 1, 2, 3

Mode 2: when reference pixels h2～h5 and v2～v5 all exist

pred4×4_L[x, y] = (p[0, -1] + p[1, -1] + p[2, -1] + p[3, -1] + p[-1, 0] + p[-1, 1] + p[-1, 2] + p[-1, 3] + 4)/8

When only reference pixels h2～h5 exist

pred4×4_L[x, y] = (p[0, -1] + p[1, -1] + p[2, -1] + p[3, -1] + 2)/4

When only reference pixels v2～v5 exist

pred4×4_L[x, y] = (p[-1, 0] + p[-1, 1] + p[-1, 2] + p[-1, 3] + 2)/4

When neither h2～h5 nor v2～v5 exist

pred4×4_L[x, y] = 128

Mode 3: pred4×4_L[x, y] = (p[6, -1] + 3×p[7, -1] + 2)/4, for x = 3, y = 3

pred4×4_L[x, y] = (p[x+y, -1] + 2×p[x+y+1, -1] + p[x+y+2, -1] + 2)/4, for all other x, y

Mode 4: pred4×4_L[x, y] = (p[x-y-2, -1] + 2×p[x-y-1, -1] + p[x-y, -1] + 2)/4, for x > y

pred4×4_L[x, y] = (p[-1, y-x-2] + 2×p[-1, y-x-1] + p[-1, y-x] + 2)/4, for x < y

pred4×4_L[x, y] = (p[0, -1] + 2×p[-1, -1] + p[-1, 0] + 2)/4, for all other x, y

Mode 5: pred4×4_L[x, y] = (p[x-(y/2)-1, -1] + p[x-(y/2), -1] + 1)/2, for 2×x-y = 0, 2, 4, 6

pred4×4_L[x, y] = (p[x-(y/2)-2, -1] + 2×p[x-(y/2)-1, -1] + p[x-(y/2), -1] + 2)/4, for 2×x-y = 1, 3, 5

pred4×4_L[x, y] = (p[-1, 0] + 2×p[-1, -1] + p[0, -1] + 2)/4, for 2×x-y = -1

pred4×4_L[x, y] = (p[-1, y-1] + 2×p[-1, y-2] + p[-1, y-3] + 2)/4, for all other values of 2×x-y

Mode 6: pred4×4_L[x, y] = (p[-1, y-(x/2)-1] + p[-1, y-(x/2)] + 1)/2, for 2×y-x = 0, 2, 4, 6

pred4×4_L[x, y] = (p[-1, y-(x/2)-2] + 2×p[-1, y-(x/2)-1] + p[-1, y-(x/2)] + 2)/4, for 2×y-x = 1, 3, 5

pred4×4_L[x, y] = (p[-1, 0] + 2×p[-1, -1] + p[0, -1] + 2)/4, for 2×y-x = -1

pred4×4_L[x, y] = (p[x-1, -1] + 2×p[x-2, -1] + p[x-3, -1] + 2)/4, for all other values of 2×y-x

Mode 7: pred4×4_L[x, y] = (p[x+(y/2), -1] + p[x+(y/2)+1, -1] + 1)/2, for y = 0, 2

pred4×4_L[x, y] = (p[x+(y/2), -1] + 2×p[x+(y/2)+1, -1] + p[x+(y/2)+2, -1] + 2)/4, for y = 1, 3

Mode 8: pred4×4_L[x, y] = (p[-1, y+(x/2)] + p[-1, y+(x/2)+1] + 1)/2, for x+2×y = 0, 2, 4

pred4×4_L[x, y] = (p[-1, y+(x/2)] + 2×p[-1, y+(x/2)+1] + p[-1, y+(x/2)+2] + 2)/4, for x+2×y = 1, 3

pred4×4_L[x, y] = (p[-1, 2] + 3×p[-1, 3] + 2)/4, for x+2×y = 5

pred4×4_L[x, y] = p[-1, 3], for all other values of x+2×y
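As a minimal sketch of how these formulas are evaluated, the following Python functions implement modes 0, 1 and 3 for one 4×4 block. The function names and the list layout (`top[x]` holds p[x, -1], `left[y]` holds p[-1, y]) are assumptions made for illustration, and the divisions by 4 are taken as integer right shifts, as is usual for H.264.

```python
def pred_vertical(top):
    """Mode 0: every row repeats the reference pixels p[0..3, -1] (h2..h5)."""
    return [[top[x] for x in range(4)] for _ in range(4)]


def pred_horizontal(left):
    """Mode 1: every column repeats the reference pixels p[-1, 0..3] (v2..v5)."""
    return [[left[y]] * 4 for y in range(4)]


def pred_diag_down_left(top):
    """Mode 3: 3-tap filter along the down-left diagonal; needs p[0..7, -1] (h2..h9)."""
    out = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x == 3 and y == 3:
                out[y][x] = (top[6] + 3 * top[7] + 2) >> 2
            else:
                out[y][x] = (top[x + y] + 2 * top[x + y + 1] + top[x + y + 2] + 2) >> 2
    return out
```

The result is indexed as `out[y][x]`, so `out[0]` is the top row of the predicted block.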

Prediction modes of the 16×16 luma block:

A 16×16 luma block is predicted from its adjacent top and left pixels and has 4 prediction modes in total, as shown in Fig. 3, where the gray area is the 256 predicted pixels and the white area labeled h1～h17 and v2～v17 is the adjacent reference pixels. The names and descriptions of the 4 prediction modes are listed in Table 2, and the prediction formula of each mode is given after the table.

Table 2: Names and descriptions of the prediction modes of the 16×16 luma block

Mode number	Name	Description
9 (vertical)	Intra_16×16_Vertical	Predicted pixel values are extrapolated vertically from h2～h17
10 (horizontal)	Intra_16×16_Horizontal	Predicted pixel values are extrapolated horizontally from v2～v17
11 (DC)	Intra_16×16_DC	Predicted pixel values are derived from the mean of h2～h17 and v2～v17
15 (plane)	Intra_16×16_Plane	Predicted pixel values are derived from h1～h17 and v2～v17 using a linear plane function

Mode 9: pred16×16_L[x, y] = p[x, -1], x = 0～15, y = 0～15

Mode 10: pred16×16_L[x, y] = p[-1, y], x = 0～15, y = 0～15

Mode 11: when reference pixels h2～h17 and v2～v17 all exist

pred16×16_L[x, y] = (p[0, -1] + ... + p[15, -1] + p[-1, 0] + ... + p[-1, 15] + 16)/32

When only reference pixels h2～h17 exist

pred16×16_L[x, y] = (p[0, -1] + ... + p[15, -1] + 8)/16

When only reference pixels v2～v17 exist

pred16×16_L[x, y] = (p[-1, 0] + ... + p[-1, 15] + 8)/16

When neither h2～h17 nor v2～v17 exist

pred16×16_L[x, y] = 128

Mode 15: pred16×16_L[x, y] = Clip1((a + b×(x-7) + c×(y-7) + 16)/32)

a = 16×(p[-1, 15] + p[15, -1]), b = (5×H + 32)/64, c = (5×V + 32)/64

H = Σ_{x=0}^{7} (x+1) × (p[8+x, -1] - p[6-x, -1])

V = Σ_{y=0}^{7} (y+1) × (p[-1, 8+y] - p[-1, 6-y])

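where the Clip1 function limits its argument to the range of an 8-bit pixel: Clip1(z) = 0 for z < 0, Clip1(z) = 255 for z > 255, and Clip1(z) = z otherwise.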

Prediction modes of the 8×8 chroma block:

An 8×8 chroma block is predicted from its adjacent top and left pixels and has 4 prediction modes in total, as shown in Fig. 4, where the gray area is the 64 predicted pixels and the white area labeled h1～h9 and v2～v9 is the adjacent reference pixels. The names and descriptions of the 4 prediction modes are listed in Table 3, and the prediction formula of each mode is given after the table.

Table 3: Names and descriptions of the prediction modes of the 8×8 chroma block

Mode number	Name	Description
12 (DC)	Intra_Chroma_DC	Predicted pixel values are derived from the mean of h2～h9 and v2～v9
13 (horizontal)	Intra_Chroma_Horizontal	Predicted pixel values are extrapolated horizontally from v2～v9
14 (vertical)	Intra_Chroma_Vertical	Predicted pixel values are extrapolated vertically from h2～h9
16 (plane)	Intra_Chroma_Plane	Predicted pixel values are derived from h1～h9 and v2～v9 using a linear plane function

Mode 12: the block is divided into four regions for prediction

(1) Region x = 0～3, y = 0～3

When reference pixels h2～h5 and v2～v5 all exist

pred8×8_C[x, y] = (p[0, -1] + ... + p[3, -1] + p[-1, 0] + ... + p[-1, 3] + 4)/8

When only reference pixels h2～h5 exist

pred8×8_C[x, y] = (p[0, -1] + ... + p[3, -1] + 2)/4

When only reference pixels v2～v5 exist

pred8×8_C[x, y] = (p[-1, 0] + ... + p[-1, 3] + 2)/4

When neither h2～h5 nor v2～v5 exist

pred8×8_C[x, y] = 128

(2) Region x = 4～7, y = 0～3

When reference pixels h6～h9 and v2～v5 all exist

pred8×8_C[x, y] = (p[4, -1] + ... + p[7, -1] + p[-1, 0] + ... + p[-1, 3] + 4)/8

When only reference pixels h6～h9 exist

pred8×8_C[x, y] = (p[4, -1] + ... + p[7, -1] + 2)/4

When only reference pixels v2～v5 exist

pred8×8_C[x, y] = (p[-1, 0] + ... + p[-1, 3] + 2)/4

When neither h6～h9 nor v2～v5 exist

pred8×8_C[x, y] = 128

(3) Region x = 0～3, y = 4～7

When reference pixels h2～h5 and v6～v9 all exist

pred8×8_C[x, y] = (p[0, -1] + ... + p[3, -1] + p[-1, 4] + ... + p[-1, 7] + 4)/8

When only reference pixels h2～h5 exist

pred8×8_C[x, y] = (p[0, -1] + ... + p[3, -1] + 2)/4

When only reference pixels v6～v9 exist

pred8×8_C[x, y] = (p[-1, 4] + ... + p[-1, 7] + 2)/4

When neither h2～h5 nor v6～v9 exist

pred8×8_C[x, y] = 128

(4) Region x = 4～7, y = 4～7

When reference pixels h6～h9 and v6～v9 all exist

pred8×8_C[x, y] = (p[4, -1] + ... + p[7, -1] + p[-1, 4] + ... + p[-1, 7] + 4)/8

When only reference pixels h6～h9 exist

pred8×8_C[x, y] = (p[4, -1] + ... + p[7, -1] + 2)/4

When only reference pixels v6～v9 exist

pred8×8_C[x, y] = (p[-1, 4] + ... + p[-1, 7] + 2)/4

When neither h6～h9 nor v6～v9 exist

pred8×8_C[x, y] = 128

Mode 13: pred8×8_C[x, y] = p[-1, y], x = 0～7, y = 0～7

Mode 14: pred8×8_C[x, y] = p[x, -1], x = 0～7, y = 0～7

Mode 16: pred8×8_C[x, y] = Clip1((a + b×(x-3) + c×(y-3) + 16)/32)

a = 16×(p[-1, 7] + p[7, -1]), b = (17×H + 16)/32, c = (17×V + 16)/32

H = Σ_{x=0}^{3} (x+1) × (p[4+x, -1] - p[2-x, -1])

V = Σ_{y=0}^{3} (y+1) × (p[-1, 4+y] - p[-1, 2-y])
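The two plane modes (15 and 16) follow the same pattern and differ only in block size and constants. The sketch below evaluates both with one function. It is an illustration only: the names `plane_params` and `pred_plane` and the argument layout are assumptions, divisions are taken as arithmetic right shifts as in H.264, and Clip1 is assumed to clamp to the 8-bit range 0..255.

```python
def clip1(z):
    """Clamp to the 8-bit pixel range (assumed definition of Clip1)."""
    return max(0, min(255, z))


def plane_params(top, left, corner, size):
    """Return (a, b, c); top[x] = p[x, -1], left[y] = p[-1, y], corner = p[-1, -1]."""
    half = size // 2
    ref_top = lambda x: corner if x < 0 else top[x]
    ref_left = lambda y: corner if y < 0 else left[y]
    H = sum((i + 1) * (ref_top(half + i) - ref_top(half - 2 - i)) for i in range(half))
    V = sum((i + 1) * (ref_left(half + i) - ref_left(half - 2 - i)) for i in range(half))
    a = 16 * (left[size - 1] + top[size - 1])
    if size == 16:  # mode 15, 16x16 luma
        b, c = (5 * H + 32) >> 6, (5 * V + 32) >> 6
    else:           # mode 16, 8x8 chroma
        b, c = (17 * H + 16) >> 5, (17 * V + 16) >> 5
    return a, b, c


def pred_plane(top, left, corner, size):
    """Predicted block out[y][x] for mode 15 (size 16) or mode 16 (size 8)."""
    a, b, c = plane_params(top, left, corner, size)
    off = size // 2 - 1  # 7 for luma, 3 for chroma
    return [[clip1((a + b * (x - off) + c * (y - off) + 16) >> 5)
             for x in range(size)] for y in range(size)]
```

For a flat neighborhood (all reference pixels equal), H and V are zero, so b and c vanish and every predicted pixel equals the reference value.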

At present, some circuit structures for the intra predictor of an H.264 decoder are designed with a reconfigurable method. This method first designs a reconfigurable arithmetic unit that can compute the predicted value of 1 pixel per clock cycle, and then raises the degree of parallelism by operating several such units in parallel. The reconfigurable method saves hardware resources at the cost of a lower degree of parallelism, and it must schedule the input and output data of every arithmetic unit, which increases the complexity of the control logic.

The present invention analyzes the 17 prediction modes of intra prediction in the H.264 standard. For the operations shared among the prediction formulas of the 17 prediction modes for the 16 pixels of each 4×4 block, it applies numerical strength reduction to remove computational redundancy, and it proposes a highly parallel intra predictor that computes the values of 16 predicted pixels in every clock cycle.

Summary of the invention

The object of the present invention is to provide the architecture of a highly parallel intra predictor, obtained by applying numerical strength reduction to remove computational redundancy, which computes the values of 16 predicted pixels in every clock cycle.

For the operations shared among the prediction formulas of the 17 prediction modes for the 16 pixels of each 4×4 block, numerical strength reduction is applied to remove computational redundancy, and an architecture for a highly parallel intra predictor is proposed that computes the values of 16 predicted pixels in every clock cycle. According to the implementation results, compared with a design based on the reconfigurable method, the invention reduces circuit area at the same degree of parallelism and simplifies the control logic.

Numerical strength reduction algorithm:

Numerical strength reduction is a numerical transformation technique intended to reduce computational strength. The idea is to find, through suitable transformation and regrouping, the subexpressions that are identical across different computations, i.e. the redundant operations. Each redundant operation is computed only once and its result is shared, which reduces the computational strength and saves hardware resources.
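A small example may make the idea concrete. The sketch below takes three predicted values from the 4×4 formulas (mode 3 at pixel (0, 0), mode 7 at pixels (0, 0) and (1, 0)) and computes them first independently and then with the pair sums h2+h3 and h3+h4 shared. The function names are hypothetical; the intermediate names follow the `add_8_...` convention used for the 8-bit adders in this document, and divisions are taken as right shifts.

```python
def naive(h2, h3, h4):
    """Each formula evaluated on its own: 5 pixel additions plus one multiply by 2."""
    m3_00 = (h2 + 2 * h3 + h4 + 2) >> 2  # mode 3, pixel (0, 0)
    m7_00 = (h2 + h3 + 1) >> 1           # mode 7, pixel (0, 0)
    m7_10 = (h3 + h4 + 1) >> 1           # mode 7, pixel (1, 0)
    return m3_00, m7_00, m7_10


def shared(h2, h3, h4):
    """Same results with the common subexpressions computed once: 3 pixel additions."""
    add_8_h2_h3 = h2 + h3
    add_8_h3_h4 = h3 + h4
    m3_00 = (add_8_h2_h3 + add_8_h3_h4 + 2) >> 2  # h2 + 2*h3 + h4 regrouped
    m7_00 = (add_8_h2_h3 + 1) >> 1
    m7_10 = (add_8_h3_h4 + 1) >> 1
    return m3_00, m7_00, m7_10
```

The regrouping h2 + 2×h3 + h4 = (h2 + h3) + (h3 + h4) is what removes the constant multiplication and lets both pair sums be reused.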

Design idea:

Examining the prediction formulas of the 17 prediction modes shown in Figs. 2, 3 and 4 shows that, for the same pixel, the formulas of different modes contain many identical subexpressions, so numerical strength reduction is well suited to sharing these subexpressions and saving hardware resources. Further examination shows that redundant computation exists not only among the formulas of different modes for the same pixel, but also among the formulas of the same mode for the 16 pixels of a 4×4 block. The architecture of the intra predictor proposed by the invention is therefore based on the following design idea: apply numerical strength reduction to the prediction formulas of the 17 prediction modes for the 16 pixels of each 4×4 block and eliminate the redundant operations among them.

With this structure the values of 16 predicted pixels are computed in every clock cycle, so hardware resources are reduced effectively without lowering the degree of parallelism. In the implementation, for the 4×4 block shown in Fig. 2, the values of its 16 predicted pixels are obtained in each clock cycle; the 16×16 block shown in Fig. 3 and the 8×8 block shown in Fig. 4 are divided into 16 and 4 blocks of 4×4 size respectively, and in raster-scan order each clock cycle yields the values of the 16 predicted pixels of one of these 4×4 blocks.

Overall block diagram of the intra predictor:

In Figs. 2, 3 and 4 the 17 prediction modes are numbered from mode 0 to mode 16. Examining their prediction formulas shows that modes 0 to 14 involve only additions (multiplications by constants can be implemented with additions and shifts), whereas modes 15 and 16 both predict with a linear plane function, so their formulas involve additions, subtractions and multiplications. The intra predictor is therefore designed so that modes 0 to 14 are simplified together by numerical strength reduction, and modes 15 and 16 are simplified together in the same way. Fig. 5 shows the overall block diagram of the intra predictor, which comprises 4 main modules: the adders module computes the predicted values of modes 0 to 14 for the 16 pixels of each 4×4 block; the plane module, according to the block position signal, computes the predicted values of mode 15 or mode 16 for the 16 pixels of each 4×4 block at a different position within a 16×16 luma block or an 8×8 chroma block; the select module, according to the prediction mode and the signals indicating whether the reference pixels exist, selects the predicted values of one mode from modes 0 to 16 and outputs them as the values of the 16 predicted pixels; the control module is the control unit and generates the block position signal for the plane module.

The adders module:

The adders module uses numerical strength reduction to reduce the number of additions: identical additions that occur in the prediction formulas of modes 0 to 14 for the 16 pixels of each 4×4 block are computed once and shared, which removes the computational redundancy. The resulting structure of the adders module is shown in Fig. 6; it contains 22 8-bit adders, 21 9-bit adders, 8 10-bit adders, 2 11-bit adders and 1 12-bit adder, 55 adders altogether. The inputs of the adders module shown in Fig. 6 are the reference pixels h1～h17 and v2～v17, and its outputs are all predicted values of modes 0 to 14 for the 16 pixels of each 4×4 block. To shorten the critical path, a group of registers is inserted between the 10-bit adder group and the 11-bit adder group. With numerical strength reduction the adder bit width can grow stage by stage, whereas in the reconfigurable design every adder in an arithmetic unit must be a 16-bit adder, sized for the largest possible data width.
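The stage-by-stage growth of the adder width can be checked with a short calculation. The sketch below assumes 8-bit reference pixels and assumes that an "n-bit adder" is named after the width of its operands; both the helper name `operand_width` and that reading of the naming are assumptions, not statements from the patent.

```python
def operand_width(pixels_per_operand, pixel_bits=8):
    """Bits needed to hold the sum of `pixels_per_operand` unsigned pixels."""
    max_operand = pixels_per_operand * ((1 << pixel_bits) - 1)
    return max_operand.bit_length()


# Each stage adds two partial sums covering 1, 2, 4, 8 and 16 pixels respectively.
stages = [1, 2, 4, 8, 16]
widths = [operand_width(k) for k in stages]
```

Under these assumptions `widths` evaluates to 8, 9, 10, 11 and 12 bits, which matches the sequence of adder groups listed above and shows why no stage needs a full 16-bit adder.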

The plane module:

The plane module uses numerical strength reduction to remove redundancy in two respects. First, prediction mode 15 of the 16×16 luma block and prediction mode 16 of the 8×8 chroma block both predict with a linear plane function; the prediction process is the same and only the block size differs, so the computation of mode 16 can be merged entirely into that of mode 15. Second, modes 15 and 16 involve 2 multiplications per predicted pixel value; across the 16 pixels these multiplications can be decomposed into additions and the redundancy among them removed, which eliminates the multiplications. The resulting structure of the plane module is shown in Fig. 7: the constant calculation module computes the parameters a, b and c from the prediction formulas of modes 15 and 16, and the predicted-value calculation module computes the predicted values of mode 15 or mode 16 from a, b, c and the block position (xb, yb).

The select module:

The structure of the select module is shown in Fig. 8. According to the prediction mode and the signals indicating whether the reference pixels exist, multiplexers select, from the predicted values of modes 0 to 16 computed by the adders module and the plane module, the predicted values of one mode for the 16 pixels of each 4×4 block, and apply rounding and shifting to the selected values before output.
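As a rough illustration of selection followed by rounding and shifting, the sketch below handles only the DC case of a 4×4 luma block (mode 2). It assumes that the raw sums arrive from the adder stages and that the rounding offset and shift are applied at selection time; the function name and argument names are hypothetical and the real multiplexer structure is the one in Fig. 8.

```python
def select_dc_4x4(sum_h, sum_v, sum_hv, h_exists, v_exists):
    """Pick the DC value for mode 2 from precomputed sums.

    sum_h  = h2+h3+h4+h5, sum_v = v2+v3+v4+v5, sum_hv = sum_h + sum_v
    (assumed to be outputs of the 9-bit and 10-bit adder groups).
    """
    if h_exists and v_exists:
        return (sum_hv + 4) >> 3  # mean of 8 pixels, rounded
    if h_exists:
        return (sum_h + 2) >> 2   # mean of the 4 top pixels
    if v_exists:
        return (sum_v + 2) >> 2   # mean of the 4 left pixels
    return 128                    # no reference pixels available
```

The same value is then used for all 16 pixels of the block, since the DC prediction does not depend on [x, y].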

The control module:

The highly parallel structure proposed by the invention computes the predicted values of all 17 modes for the 16 pixels of a 4×4 block simultaneously, so the control required of the control module is very simple: it only has to generate a different block position (xb, yb) for each 4×4 block at a different position within the 16×16 luma block or the 8×8 chroma block, as specified in Table 4 and Table 5. In a reconfigurable design, by contrast, each arithmetic unit needs its own control module to schedule the data flow inside the unit, and a global control module is also needed to select the input data of each arithmetic unit and to schedule the data flow between units; as the degree of parallelism rises, the control logic becomes more complex.

Table 4: Block position generation for the 16×16 luma block

Region of the 16×16 luma block	(xb, yb)	Region of the 16×16 luma block	(xb, yb)
x = 0～3, y = 0～3	(0, 0)	x = 8～11, y = 0～3	(2, 0)
x = 0～3, y = 4～7	(0, 1)	x = 8～11, y = 4～7	(2, 1)
x = 0～3, y = 8～11	(0, 2)	x = 8～11, y = 8～11	(2, 2)
x = 0～3, y = 12～15	(0, 3)	x = 8～11, y = 12～15	(2, 3)
x = 4～7, y = 0～3	(1, 0)	x = 12～15, y = 0～3	(3, 0)
x = 4～7, y = 4～7	(1, 1)	x = 12～15, y = 4～7	(3, 1)
x = 4～7, y = 8～11	(1, 2)	x = 12～15, y = 8～11	(3, 2)
x = 4～7, y = 12～15	(1, 3)	x = 12～15, y = 12～15	(3, 3)

Table 5: Block position generation for the 8×8 chroma block

Region of the 8×8 chroma block	(xb, yb)	Region of the 8×8 chroma block	(xb, yb)
x = 0～3, y = 0～3	(0, 0)	x = 4～7, y = 0～3	(1, 0)
x = 0～3, y = 4～7	(0, 1)	x = 4～7, y = 4～7	(1, 1)
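The mapping in Tables 4 and 5 amounts to dividing the pixel coordinates by 4. A minimal sketch, with hypothetical function names, is given below; the raster-scan visiting order is taken from the design description, not from the tables themselves.

```python
def block_position(x, y):
    """(xb, yb) of the 4x4 block that contains pixel [x, y]."""
    return x // 4, y // 4


def block_positions(is_luma):
    """All 4x4 block positions in raster-scan order: 16 for luma, 4 for chroma."""
    n = 4 if is_luma else 2
    return [(xb, yb) for yb in range(n) for xb in range(n)]
```

A control unit built this way needs only a small counter whose bits are split into xb and yb.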

The invention is characterized in that the intra predictor is implemented on an FPGA and comprises an adder module (adders), a linear plane function module (plane), a selection module (select) and a control module (control), wherein:

The adder module adders consists of an 8-bit adder group, a 9-bit adder group, a 10-bit adder group, a register group, an 11-bit adder group and a 12-bit adder group connected in series in that order, and is used to compute the predicted values of H.264 intra prediction modes 0 to 14 for the 16 pixels of the current 4×4 block, wherein:

The 8-bit adder group consists of 22 8-bit adders, the inputs of each adder being two of the reference pixels h1～h17 and v2～v17. For the current 4×4 block the coordinate system is defined as follows: the positive x direction is horizontal to the right, the positive y direction is vertically downward, and both axes start at -1; h1～h17 denote the reference pixels above the current 4×4 block and v2～v17 denote the reference pixels to its left; the same applies below;

The 9-bit adder group consists of 21 9-bit adders, the inputs of each adder being two of the following: the outputs of the 22 8-bit adders and the reference pixels h9 and v5;

The 10-bit adder group consists of 8 10-bit adders, the inputs of each adder being two of the outputs of the 21 9-bit adders;

The register group consists of 8 10-bit registers, which store the outputs of the 8 10-bit adders respectively;

The 11-bit adder group consists of 2 11-bit adders, the inputs of each adder being two of the outputs of the 8 10-bit registers;

The 12-bit adder group consists of 1 12-bit adder, whose inputs are the outputs of the 2 11-bit adders;

The linear plane function module plane consists of a constant calculation module and a predicted-value calculation module connected in series. According to the block position signal, the plane module computes the predicted values of H.264 prediction mode 15 or prediction mode 16 for the 16 pixels of each 4×4 block at a different position within a 16×16 luma block or an 8×8 chroma block, wherein:

The constant calculation module has two inputs: one receives the reference pixels h1～h17 and v2～v17, and the other is a signal indicating whether the block currently being predicted is a luma block or a chroma block. Its outputs are the constants a, b and c of prediction mode 15 or prediction mode 16, wherein:

Prediction mode 15 is a prediction mode of the 16×16 luma block, expressed as:

pred16×16_L[x, y] = Clip1((a + b×(x-7) + c×(y-7) + 16)/32),

a = 16×(p[-1, 15] + p[15, -1]),

b = (5×H + 32)/64,

c = (5×V + 32)/64,

H = Σ_{x=0}^{7} (x+1) × (p[8+x, -1] - p[6-x, -1]),

V = Σ_{y=0}^{7} (y+1) × (p[-1, 8+y] - p[-1, 6-y]),

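where the Clip1 function limits its argument to the range of an 8-bit pixel: Clip1(z) = 0 for z < 0, Clip1(z) = 255 for z > 255, and Clip1(z) = z otherwise,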

and where (x, y) are the coordinates of the pixel; the same applies below;

Prediction mode 16 is a prediction mode of the 8×8 chroma block, expressed as:

pred8×8_C[x, y] = Clip1((a + b×(x-3) + c×(y-3) + 16)/32),

a = 16×(p[-1, 7] + p[7, -1]),

b = (17×H + 16)/32,

c = (17×V + 16)/32,

H = Σ_{x=0}^{3} (x+1) × (p[4+x, -1] - p[2-x, -1]),

V = Σ_{y=0}^{3} (y+1) × (p[-1, 4+y] - p[-1, 2-y]);

The predicted-value calculation module has two inputs: one receives the constants a, b and c of H.264 prediction mode 15 or prediction mode 16 output by the constant calculation module, and the other indicates whether the block currently being predicted is a luma block or a chroma block. It computes the predicted value of each pixel of the 16×16 luma block or the 8×8 chroma block by the following formula:

(a + b×dx + c×dy) - [b×(3-xp) + c×(3-yp)]

where (dx, dy) is a pair of coefficients determined by the position (xb, yb) of the 4×4 block within the 16×16 luma block or the 8×8 chroma block, and (xp, yp) are the coordinates of the pixel within that 4×4 block:

dx = 4×(xb-1), dy = 4×(yb-1), for the 16×16 luma block,

dx = 4×xb, dy = 4×yb, for the 8×8 chroma block,

The predicted-value calculation module uses 28 adders in total to compute the predicted values of the 16 pixels of each 4×4 block;

The selection module select has two data inputs: one receives the predicted values of prediction modes 0 to 14 output by the adders module, and the other receives the predicted values of prediction mode 15 or prediction mode 16 output by the plane module. It also has two control inputs: one receives the prediction mode selection signal and the other receives the signal indicating whether the reference pixels exist. Through a multiplexer, the select module chooses among four possible kinds of output for the selected prediction mode: a reference pixel value, in which case the corresponding reference pixel value is taken as the output; the constant 128, which is the predicted value when a 4×4 luma block is predicted in mode 2 and neither h2～h5 nor v2～v5 exist, or when a 16×16 luma block is predicted in mode 11 and neither h2～h17 nor v2～v17 exist, or when an 8×8 chroma block is predicted in mode 12 and the reference pixels of the region do not exist, namely h2～h5 and v2～v5, or h6～h9 and v2～v5, or h2～h5 and v6～v9, or h6～h9 and v6～v9; an adder number, in which case the output of the adder with that number is taken as the predicted value; and the plane module output, in which case the output of the plane module is taken as the predicted value. In total the predicted values of 16 pixels are output;

The control module control has an input signal indicating whether the block currently being predicted is a luma block or a chroma block, and outputs the block position (xb, yb) of each 4×4 block at a different position within the 16×16 luma block or the 8×8 chroma block.
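The decomposed plane formula (a + b×dx + c×dy) - [b×(3-xp) + c×(3-yp)] can be checked against the direct expressions a + b×(x-7) + c×(y-7) and a + b×(x-3) + c×(y-3). The sketch below does this for one 4×4 block; the function names are hypothetical, and the final "+16, divide by 32, Clip1" step is left out because it is identical in both forms.

```python
def plane_direct(a, b, c, x, y, is_luma):
    """Plane value before rounding, from the mode 15 / mode 16 formula."""
    off = 7 if is_luma else 3
    return a + b * (x - off) + c * (y - off)


def plane_block(a, b, c, xb, yb, is_luma):
    """Same values for the 4x4 block at (xb, yb), using the decomposed form."""
    dx = 4 * (xb - 1) if is_luma else 4 * xb
    dy = 4 * (yb - 1) if is_luma else 4 * yb
    base = a + b * dx + c * dy  # shared by all 16 pixels of the block
    return [[base - (b * (3 - xp) + c * (3 - yp)) for xp in range(4)]
            for yp in range(4)]
```

Because (3-xp) and (3-yp) only take the values 0 to 3, the per-pixel terms are small multiples of b and c that can be formed by additions and shared across the block, which is consistent with the statement that the multiplications are eliminated.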

In hardware, the design is implemented on an FPGA; behavioral-level modeling was done in Matlab, and RTL coding and functional simulation in Verilog. Synthesized with the SMIC 0.18 μm process under worst-case conditions, the circuit has a maximum critical-path delay of 10 ns, i.e. the clock frequency reaches 100 MHz, and the circuit scale does not exceed 140,000.

Description of the drawings

Fig. 1: Definition of the coordinate system

Fig. 2: The 9 prediction modes of the 4×4 luma block

Fig. 3: The 4 prediction modes of the 16×16 luma block

Fig. 4: The 4 prediction modes of the 8×8 chroma block

Fig. 5: Overall block diagram of the intra predictor

Fig. 6: Block diagram of the adders module

Fig. 7: Block diagram of the plane module

Fig. 8: Block diagram of the select module

Embodiments

This section describes in detail how the adders module and the plane module are obtained by applying the numerical strength reduction algorithm to remove computational redundancy, and how the select module is implemented.

The adders module shown in Fig. 6 is built from adder groups. By examining the prediction formulas of modes 0 to 14 for the 16 pixels of a 4×4 block, each addition that appears more than once is computed only once and its result is shared. To shorten the critical path, a group of registers is inserted between the 10-bit adder group and the 11-bit adder group of Fig. 6. The additions performed by each adder group in Fig. 6 are as follows:

8bit adder group:

Numbering	8bit adder	Numbering	8bit adder	Numbering	8bit adder
8_1	add_8_h1_h2 = h1+h2	8_9	add_8_h10_h11 = h10+h11	8_17	add_8_v6_v7 = v6+v7
8_2	add_8_h2_h3 = h2+h3	8_10	add_8_h12_h13 = h12+h13	8_18	add_8_v8_v9 = v8+v9
8_3	add_8_h3_h4 = h3+h4	8_11	add_8_h14_h15 = h14+h15	8_19	add_8_v10_v11 = v10+v11
8_4	add_8_h4_h5 = h4+h5	8_12	add_8_h16_h17 = h16+h17	8_20	add_8_v12_v13 = v12+v13
8_5	add_8_h5_h6 = h5+h6	8_13	add_8_v1_v2 = h1+v2	8_21	add_8_v14_v15 = v14+v15
8_6	add_8_h6_h7 = h6+h7	8_14	add_8_v2_v3 = v2+v3	8_22	add_8_v16_v17 = v16+v17
8_7	add_8_h7_h8 = h7+h8	8_15	add_8_v3_v4 = v3+v4		
8_8	add_8_h8_h9 = h8+h9	8_16	add_8_v4_v5 = v4+v5		

9bit adder group:

10bit adder group:

Numbering	10bit adder
10_1	add_10_h2_h3_h4_h5_v2_v3_v4_v5 = add_9_h2_h3_h4_h5+add_9_v2_v3_v4_v5
10_2	add_10_h2_h3_h4_h5_v6_v7_v8_v9 = add_9_h2_h3_h4_h5+add_9_v6_v7_v8_v9
10_3	add_10_h6_h7_h8_h9_v2_v3_v4_v5 = add_9_h6_h7_h8_h9+add_9_v2_v3_v4_v5
10_4	add_10_h6_h7_h8_h9_v6_v7_v8_v9 = add_9_h6_h7_h8_h9+add_9_v6_v7_v8_v9
10_5	add_10_h2_h3_h4_h5_h6_h7_h8_h9 = add_9_h2_h3_h4_h5+add_9_h6_h7_h8_h9
10_6	add_10_v2_v3_v4_v5_v6_v7_v8_v9 = add_9_v2_v3_v4_v5+add_9_v6_v7_v8_v9
10_7	add_10_h10_h11_h12_h13_h14_h15_h16_h17 = add_9_h10_h11_h12_h13+add_9_h14_h15_h16_h17
10_8	add_10_v10_v11_v12_v13_v14_v15_v16_v17 = add_9_v10_v11_v12_v13+add_9_v14_v15_v16_v17

11bit adder group:

Numbering	11bit adder
11_1	add_11_h_all = add_10_h2_h3_h4_h5_h6_h7_h8_h9+add_10_h10_h11_h12_h13_h14_h15_h16_h17
11_2	add_11_v_all = add_10_v2_v3_v4_v5_v6_v7_v8_v9+add_10_v10_v11_v12_v13_v14_v15_v16_v17

12bit adder group:

Numbering	12bit adder
12_1	add_12_h_v_all = add_11_h_all+add_11_v_all
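The staged sharing in the adder groups above can be sketched in Python. This is only an illustrative model, not the circuit itself: the function name `adder_tree` and the keys `quad_h` and `quad_v` are hypothetical, only the adders that feed the 10bit, 11bit and 12bit groups are modelled, and the 9bit stage is assumed to add two neighbouring 8bit results, as the operand names of the 10bit adders suggest.

```python
def adder_tree(h, v):
    """Sum 16 top (h) and 16 left (v) reference pixels in shared stages.

    h and v map indices 2..17 to 8-bit pixel values. Only the adders that
    feed the 10bit, 11bit and 12bit groups are modelled here.
    """
    out = {}
    for name, px in (("h", h), ("v", v)):
        # 8bit stage: px[i] + px[i+1] for i = 2, 4, ..., 16
        pair = {i: px[i] + px[i + 1] for i in range(2, 18, 2)}
        # 9bit stage (assumed): two pair sums give a sum of four pixels
        quad = {i: pair[i] + pair[i + 2] for i in range(2, 18, 4)}
        # 10bit stage: sums of eight pixels (adders 10_5 to 10_8)
        low8, high8 = quad[2] + quad[6], quad[10] + quad[14]
        # 11bit stage: all sixteen pixels of one side (adders 11_1, 11_2)
        out[f"add_11_{name}_all"] = low8 + high8
        out[f"quad_{name}"] = quad  # hypothetical key, kept for reuse below
    # 10bit adder 10_1 reuses the 9bit results of both sides
    out["add_10_h2_h3_h4_h5_v2_v3_v4_v5"] = out["quad_h"][2] + out["quad_v"][2]
    # 12bit adder 12_1: sum of all 32 reference pixels
    out["add_12_h_v_all"] = out["add_11_h_all"] + out["add_11_v_all"]
    return out
```

Each partial sum is produced once and then consumed by every later stage that needs it, which is why the adder width can grow by one bit per stage instead of being fixed at the maximum.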

The plane module shown in Fig. 7 consists of a constant calculation module and a predictor calculation module.

The constant calculation module computes the parameters a, b and c according to the prediction formulas of mode 15 and mode 16. The predictor calculation module computes the prediction value of mode 15 or mode 16 from a, b, c and the block position (xb, yb). The following describes how the predictor calculation module applies arithmetic strength reduction.
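For orientation, the constants consumed by the predictor calculation module can be sketched in Python from the plane-mode formulas stated in the claims. The function name `plane_constants` and its argument layout (`top[x]` for p[x, -1], `left[y]` for p[-1, y], `corner` for p[-1, -1]) are assumptions made for this sketch. The divisions by 64 and 32 are written as right shifts, which is also an assumption: the two differ for negative H or V.

```python
def plane_constants(top, left, corner, size=16):
    """Return (a, b, c) for a size x size block predicted in a plane mode."""
    half = size // 2

    def ref(line, k):
        # k == -1 addresses the shared corner pixel p[-1, -1]
        return corner if k < 0 else line[k]

    # Weighted differences of reference pixels mirrored around the centre
    H = sum((i + 1) * (ref(top, half + i) - ref(top, half - 2 - i))
            for i in range(half))
    V = sum((i + 1) * (ref(left, half + i) - ref(left, half - 2 - i))
            for i in range(half))
    a = 16 * (left[size - 1] + top[size - 1])
    if size == 16:   # 16 x 16 luminance block
        b, c = (5 * H + 32) >> 6, (5 * V + 32) >> 6
    else:            # 8 x 8 chrominance block
        b, c = (17 * H + 16) >> 5, (17 * V + 16) >> 5
    return a, b, c
```

With a top row that rises by one per pixel, b comes out as 32, i.e. a slope of one pixel value per column once the final division by 32 is applied.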

From the prediction formulas of mode 15 and mode 16, the prediction value of a pixel is obtained by evaluating formula (1) for each pixel of a 16 × 16 luminance block, or formula (2) for each pixel of an 8 × 8 chrominance block, where [x, y] is the position coordinate of the pixel to be predicted in the coordinate system.

a+b×(x-7)+c×(y-7) (1)

a+b×(x-3)+c×(y-3) (2)

By introducing a pair of coefficients (dx, dy) and a pair of coordinates (xp, yp), the calculations of formula (1) and formula (2) can both be carried out with the single formula (3). Here (dx, dy) is a pair of coefficients determined by the position (xb, yb) of the 4 × 4 block within the 16 × 16 luminance block or the 8 × 8 chrominance block, and (xp, yp) is the position coordinate of the pixel within the 4 × 4 block.

(a+b×dx+c×dy)-[b×(3-xp)+c×(3-yp)] (3)

Let C = (a+b×dx+c×dy). For the 16 pixels of each 4 × 4 block, C is a common subexpression, so it can be computed once and the result shared. For a 16 × 16 luminance block, (dx, dy) is obtained from formula (4); for an 8 × 8 chrominance block, (dx, dy) is obtained from formula (5). The multiplications in formulas (4) and (5) can be implemented with shift operations, and because the resulting dx and dy are multiples of 4, the multiplications in C can also be implemented with shift operations. Computing C therefore requires 2 adders in total.

dx＝4×(xb-1)，dy＝4×(yb-1) (4)

dx＝4×xb，dy＝4×yb (5)
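The equivalence of formula (3) with formulas (1) and (2) can be checked with a short Python sketch. It assumes that the pixel coordinate is x = 4×xb+xp and y = 4×yb+yp, with xb, yb running over 0..3 for a luminance block and 0..1 for a chrominance block. All function names are hypothetical. The helper `shifted` shows one way the products b×dx and c×dy reduce to shifts and a negation, so that C needs only two additions.

```python
def shifted(coef, k):
    """Return coef * 4 * k for k in {-1, 0, 1, 2} without a multiplier."""
    return {-1: -(coef << 2), 0: 0, 1: coef << 2, 2: coef << 3}[k]

def common_term(a, b, c, xb, yb, luma):
    """C = a + b*dx + c*dy, with (dx, dy) from formula (4) or (5)."""
    kx, ky = (xb - 1, yb - 1) if luma else (xb, yb)
    return a + shifted(b, kx) + shifted(c, ky)  # the two adders

def formula3(a, b, c, xb, yb, xp, yp, luma):
    """Formula (3): C minus the per-pixel term S."""
    s = b * (3 - xp) + c * (3 - yp)
    return common_term(a, b, c, xb, yb, luma) - s

def direct(a, b, c, x, y, luma):
    """Formula (1) for luminance, formula (2) for chrominance."""
    off = 7 if luma else 3
    return a + b * (x - off) + c * (y - off)

def formulas_agree(a, b, c):
    """Compare formula (3) with (1)/(2) at every pixel position."""
    for luma, blocks in ((True, 4), (False, 2)):
        for xb in range(blocks):
            for yb in range(blocks):
                for xp in range(4):
                    for yp in range(4):
                        x, y = 4 * xb + xp, 4 * yb + yp
                        if (formula3(a, b, c, xb, yb, xp, yp, luma)
                                != direct(a, b, c, x, y, luma)):
                            return False
    return True
```

The check works because x-7 = 4×(xb-1)-(3-xp) for the luminance block and x-3 = 4×xb-(3-xp) for the chrominance block, and likewise for y.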

Let S = [b×(3-xp)+c×(3-yp)]. The 16 pixels of each 4 × 4 block have different positions (xp, yp), so the S value must be computed separately for each pixel. Substituting the position coordinates (xp, yp) of the 16 pixels into the expression for S and replacing the multiplications with additions gives the results shown in Table 6, where b+c is a subexpression that can be shared. Counting the operations in Table 6 shows that 10 adders are needed in total. Thus, with arithmetic strength reduction, the S values of the 16 pixels can be computed with only 10 adders and without any multiplication.

Table 6: Calculation of the S values of the 16 pixels

(xp, yp)	S	(xp, yp)	S
(0, 0)	2(b+c)+(b+c)	(2, 0)	(b+c)+2c
(0, 1)	b+2(b+c)	(2, 1)	b+2c
(0, 2)	2b+(b+c)	(2, 2)	b+c
(0, 3)	b+2b	(2, 3)	b
(1, 0)	2(b+c)+c	(3, 0)	2c+c
(1, 1)	2(b+c)	(3, 1)	2c
(1, 2)	2b+c	(3, 2)	c
(1, 3)	2b	(3, 3)	0

In addition, completing the prediction value calculation of the 16 pixels with formula (3) requires a further 16 subtractors.

In summary, for the 16 pixels of each 4 × 4 block, computing formula (3) requires 2+10+16=28 adders (including subtractors) in total. To shorten the critical path, a group of registers can be inserted between the constant calculation module and the predictor calculation module in Fig. 7.
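The adder budget can be illustrated with a Python sketch that builds every S = b×(3-xp)+c×(3-yp) from shifts and additions only. The function names `s_values` and `predict_block` are hypothetical, and the decompositions are chosen so that the sketch uses exactly ten additions for S; the hardware may group the terms differently.

```python
def s_values(b, c):
    """Return {(xp, yp): S} using 10 additions and no multiplication."""
    bc = b + c                               # addition 1, shared
    b2, c2, bc2 = b << 1, c << 1, bc << 1    # doublings are shifts
    return {
        (0, 0): bc2 + bc,   # 3b+3c, addition 2
        (0, 1): b + bc2,    # 3b+2c, addition 3
        (0, 2): b2 + bc,    # 3b+c,  addition 4
        (0, 3): b + b2,     # 3b,    addition 5
        (1, 0): bc2 + c,    # 2b+3c, addition 6
        (1, 1): bc2,        # 2b+2c
        (1, 2): b2 + c,     # 2b+c,  addition 7
        (1, 3): b2,         # 2b
        (2, 0): bc + c2,    # b+3c,  addition 8
        (2, 1): b + c2,     # b+2c,  addition 9
        (2, 2): bc,         # b+c
        (2, 3): b,
        (3, 0): c2 + c,     # 3c,    addition 10
        (3, 1): c2,
        (3, 2): c,
        (3, 3): 0,
    }

def predict_block(common, b, c):
    """Formula (3) for one 4 x 4 block: 16 subtractions of S from C."""
    return {pos: common - s for pos, s in s_values(b, c).items()}
```

Together with the 2 additions for C and the 16 subtractions in `predict_block`, this gives the 28 adders counted in the text.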

The select module shown in Fig. 8 uses the prediction mode and the signal indicating whether the reference pixels exist to select, through a MUX, the prediction value output for the corresponding prediction mode; the selection rules are listed in Table 7. Four kinds of data can appear in the "output prediction value" column: first, a reference pixel value, meaning the corresponding reference pixel value is taken as the prediction value; second, the number 128, meaning the prediction value is 128; third, an adder number, meaning the output of the corresponding adder is taken as the prediction value; fourth, the output of the plane module, meaning the plane module output is taken as the prediction value.

Table 7: Selection of the output prediction value

Judging from the implementation results, a design based on arithmetic strength reduction has three advantages over the reconfigurable approach: 1. A design obtained with the reconfigurable approach needs complex control logic, whereas strength reduction makes the control logic very simple. 2. With strength reduction, the adder bit width can be increased stage by stage, whereas the reconfigurable approach must size every adder for the maximum possible bit width. 3. At the same degree of computational parallelism, the two approaches need a comparable number of hardware resources; but because the reconfigurable approach must size its adders for the maximum possible bit width and its control logic is also more complex, the circuit obtained with strength reduction has a smaller area.

Claims

1. A highly parallel intra predictor, characterized in that the intra predictor is implemented on an FPGA and comprises an adder module adders, a linear plane function module plane, a selection module select, and a control module control, wherein:

an 8-bit adder group, consisting of 22 8-bit adders, where the inputs of each adder are two of the reference pixels h1～h17 and v2～v17; for the current 4 × 4 block, the coordinate system is defined as follows: horizontally to the right is the positive direction of the x axis, vertically downward is the positive direction of the y axis, and the starting coordinate of both the x and y axes is -1; h1～h17 denote the reference pixels above the current 4 × 4 block, and v2～v17 denote the reference pixels to the left of the current 4 × 4 block, the same below;

pred16×16_L[x, y] = Clip1((a+b×(x-7)+c×(y-7)+16)/32),

a = 16×(p[-1, 15]+p[15, -1]),

b = (5×H+32)/64,

c = (5×V+32)/64,

H = Σ_{x=0}^{7} (x+1)×(p[8+x, -1]-p[6-x, -1]),

V = Σ_{y=0}^{7} (y+1)×(p[-1, 8+y]-p[-1, 6-y]),

Wherein the meaning of Clip1 function is:

where (x, y) is the coordinate of the pixel, the same below;

pred8×8_C[x, y] = Clip1((a+b×(x-3)+c×(y-3)+16)/32),

a = 16×(p[-1, 7]+p[7, -1]),

b = (17×H+16)/32,

c = (17×V+16)/32,

H = Σ_{x=0}^{3} (x+1)×(p[4+x, -1]-p[2-x, -1]),

V = Σ_{y=0}^{3} (y+1)×(p[-1, 4+y]-p[-1, 2-y]);

(a+b×dx+c×dy)-[b×(3-xp)+c×(3-yp)]

dx = 4×(xb-1), dy = 4×(yb-1), for a 16 × 16 luminance block,

dx = 4×xb, dy = 4×yb, for an 8 × 8 chrominance block,

the predictor calculation module uses 28 adders in total to compute the prediction values of the 16 pixels of each 4 × 4 block;

the selection module select has two data inputs: one receives the prediction values of prediction modes 0 to 14 output by the adders module, and the other receives the prediction values of prediction mode 15 or prediction mode 16 output by the plane module; it also has two control inputs: one carries the prediction mode selection signal, and the other carries the signal indicating whether the reference pixels exist; through a MUX, the select module chooses among four possible kinds of output prediction value for the corresponding prediction mode: a reference pixel value, meaning the corresponding reference pixel value is taken as the output; the number 128, which is the prediction value when a 4 × 4 luminance block is predicted in mode 2 and reference pixels h2～h5 and v2～v5 do not exist, or when a 16 × 16 luminance block is predicted in mode 11 and reference pixels h2～h17 and v2～v17 do not exist, or when an 8 × 8 chrominance block is predicted in mode 12 and reference pixels h2～h5 and v2～v5, or h6～h9 and v2～v5, or h2～h5 and v6～v9, or h6～h9 and v6～v9 do not exist; an adder number, meaning the output of the adder with that number is taken as the prediction value; and the output of the plane module, meaning the plane module output is taken as the prediction value; in total, the prediction values of 16 pixels are output;

the control module control: its input signal indicates whether the block currently being predicted is a luminance block or a chrominance block, and its output is the block position (xb, yb) of the 4 × 4 block within the 16 × 16 luminance block or the 8 × 8 chrominance block.