CN102769754B

CN102769754B - H264 encoder and image transformation, quantization and reconstruction method thereof

Info

Publication number: CN102769754B
Application number: CN201210273685.8A
Authority: CN
Inventors: 吴蕾; 郑宇驰
Original assignee: Omnivision Technologies Shanghai Co Ltd
Current assignee: Omnivision Technologies Shanghai Co Ltd
Priority date: 2012-08-02
Filing date: 2012-08-02
Publication date: 2015-04-01
Anticipated expiration: 2032-08-02
Also published as: CN102769754A

Abstract

The invention provides a hardware implementation of an H.264 encoder. Its main feature is a unified multiplexed logic unit shared by the forward DCT/Hadamard transform and quantization and by the inverse quantization and inverse DCT/Hadamard transform, which saves a large amount of hardware resources. Through the cooperation of a residual unit, a forward transform/quantization unit, an inverse transform/quantization unit and a reconstruction unit, the image transformation, quantization and reconstruction process of the H.264 standard is implemented. Pipelined operation during encoding increases the processing speed, makes maximum use of the multiplexed logic unit, and raises hardware resource utilization.

Description

H.264 encoder and image transformation, quantization and reconstruction method thereof

Technical field

The present invention relates to the technical field of image processing, and in particular to an H.264 encoder and a method of image transformation, quantization and reconstruction thereof.

Background technology

H.264 is a digital video coding standard proposed by the Joint Video Team (JVT); its greatest advantage is a very high data compression ratio. At equal picture quality, the compression ratio of H.264 can reach twice that of MPEG-4. This is possible because H.264 includes a series of new features, such as multiple reference frames, variable block-size motion compensation, high-precision sub-pixel motion compensation, and a deblocking filter.

The H.264 standard uses an integer discrete cosine transform (DCT) and quantization to remove correlation in the image signal and to reduce the dynamic range of the coded data, thereby achieving compression. After the DCT and quantization, the coefficients of the image residual data follow two paths. On the one hand they enter the entropy coding module, which produces the compressed bitstream. On the other hand they enter the reconstruction loop, which produces the reconstructed frame. Reconstructed frame data serves as neighbouring-block values for intra prediction of the current frame and, after loop filtering, is written to memory as reference data for subsequent frames. The reconstruction loop consists of three main steps: inverse quantization, inverse DCT (IDCT), and data compensation (adding the prediction). Because the H.264 standard supports both the 4 × 4 DCT and the 8 × 8 DCT, plus the Hadamard transform applied specifically to the DC coefficients, an H.264 encoder must support several transforms and inverse transforms, and the computation is fairly complex. Analysis shows, however, that these transforms and inverse transforms share certain similarities, which makes it possible to reuse some computation modules.

Fig. 1 shows the transform and quantization structure for 4 × 4 blocks in a prior-art H.264 encoder. The one-dimensional 4 × 4 forward transform can be written as:

$$\begin{pmatrix} Y_{0} \\ Y_{2} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} X_{0} + X_{3} \\ X_{1} + X_{2} \end{pmatrix},$$

$$\begin{pmatrix} Y_{1} \\ Y_{3} \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} X_{0} - X_{3} \\ X_{1} - X_{2} \end{pmatrix}$$

The same form also applies to the inverse transform and to the Hadamard transform; only the transform coefficients differ.
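As a minimal sketch of this butterfly form (in Python; the names `CF`, `forward_butterfly_1d` and `forward_matrix_1d` are illustrative and do not come from the patent), the two 2 × 2 products pair X0 with X3 and X1 with X2, and can be checked against the well-known 4 × 4 core matrix of the H.264 forward integer transform:

```python
# Core matrix of the 4-point forward integer transform (rows give Y0..Y3).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def forward_butterfly_1d(x):
    """1-D 4-point forward transform using only adds, subtracts and doublings."""
    x0, x1, x2, x3 = x
    s0, s1 = x0 + x3, x1 + x2  # sums feed Y0 and Y2
    d0, d1 = x0 - x3, x1 - x2  # differences feed Y1 and Y3
    return [s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1]


def forward_matrix_1d(x):
    """Reference implementation: plain matrix-vector product with CF."""
    return [sum(c * v for c, v in zip(row, x)) for row in CF]


sample = [5, 11, 8, 10]
print(forward_butterfly_1d(sample))  # [34, -7, -4, -11]
print(forward_matrix_1d(sample))     # same values
```

The butterfly needs eight additions or subtractions and two doublings per 4-point transform, which is why only adders are required in hardware.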

Accordingly, in this transform and quantization structure for 4 × 4 blocks, the transform structure comprises a 14 × 16-bit transform buffer, two adders (performing the additions and subtractions), four registers (R0, R1, R2, Rm), a multiplexer (MUX) and a control block. The control block selects, according to the transform mode, among the DCT, Hadamard, inverse DCT and inverse Hadamard operations. Each one-dimensional transform is carried out with one pixel input per clock cycle, and the results are stored in the 14 × 16-bit transform buffer. In the quantization structure, two buffers (dbuf0, dbuf1) hold the coefficients received from the transform structure, and one quantized coefficient is output per clock cycle.

Although this H.264 encoder reuses its two adders, its throughput is low and its computation is slow. It considers only the 4 × 4 block size and is not compatible with the 8 × 8 transform. It also does not implement the complete process of DCT, quantization and image reconstruction as a whole, so resources are wasted.

Summary of the invention

The object of the present invention is to provide an H.264 encoder and an encoding method thereof that support both the 8 × 8 DCT and the 4 × 4 DCT, that handle the image transformation, quantization and reconstruction process of the H.264 standard as a whole, and that increase the operating speed while saving hardware resources.

To solve the above problems, the present invention proposes an H.264 encoder, comprising:

a control unit, for receiving an externally input macroblock start signal and distributing a macroblock start signal to the other units to notify them to start the corresponding macroblock computation;

a residual unit, for obtaining the macroblock data of the current block and the predicted macroblock data from an external memory, and calculating the residual of the current block;

a forward transform/quantization unit, for performing the forward DCT/Hadamard transform, quantization and Zig-Zag reordering of the residual, storing the resulting Zig-Zag coefficients in the external memory and outputting the non-Zig-Zag coefficients;

an inverse transform/quantization unit, for performing inverse quantization and the inverse DCT/Hadamard transform on the non-Zig-Zag coefficients to recover the residual;

a reconstruction unit, for receiving the residual after inverse quantization and the inverse DCT/Hadamard transform, obtaining reconstructed image data from the residual and the predicted macroblock data, and storing the reconstructed image data in the external memory;

a multiplexed logic unit, for providing the forward transform/quantization unit and the inverse transform/quantization unit with the logic required for their computations, and for buffering the intermediate results of those computations.

Further, the forward transform/quantization unit comprises a DCT module, a quantization module and a Zig-Zag ordering module; the inverse transform/quantization unit comprises an inverse DCT module and an inverse quantization module.

Further, the multiplexed logic unit comprises a register array for buffering the intermediate results and a computation module for providing the logic required for the computations.

Further, the register array is an 8 × 8 register array.

Further, the 8 × 8 register array comprises four 4 × 4 subarrays.

Further, the computation module comprises two first sub-computation modules, two second sub-computation modules and one third sub-computation module.

Further, when 4 × 4 blocks are used, the current macroblock comprises 16 luma Y blocks, 4 chroma U blocks and 4 chroma V blocks.

Further, when 8 × 8 blocks are used, the current macroblock comprises 4 luma Y blocks, 4 chroma U blocks and 4 chroma V blocks.

The present invention also provides a method of image transformation, quantization and reconstruction for the above H.264 encoder, comprising the following steps:

the control unit receives an externally input macroblock start signal and distributes a macroblock start signal to the other units to notify them to start the corresponding macroblock computation;

the residual unit obtains the macroblock data of the current block and the predicted macroblock data from an external memory, and calculates the residual of the current block;

the forward transform/quantization unit invokes the multiplexed logic unit to perform the forward DCT/Hadamard transform, quantization and Zig-Zag reordering of the residual, buffers the intermediate results in the multiplexed logic unit, stores the resulting Zig-Zag coefficients in the external memory and outputs the non-Zig-Zag coefficients;

the inverse transform/quantization unit invokes the multiplexed logic unit to perform inverse quantization and the inverse DCT/Hadamard transform on the non-Zig-Zag coefficients, buffers the intermediate results in the multiplexed logic unit, and outputs the residual obtained after inverse quantization and the inverse DCT/Hadamard transform;

the reconstruction unit receives the residual output by the inverse transform/quantization unit, obtains reconstructed image data from that residual and the predicted macroblock data fetched from the external memory, and stores the reconstructed image data in the external memory.

Further, the residual unit calculates the residual of the current block by subtracting the predicted macroblock data from the macroblock data of the current block.
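The residual calculation and the matching reconstruction step can be sketched as follows. This is a minimal Python illustration, not the patent's hardware: the function names are hypothetical, and clipping the reconstructed samples to the 8-bit range is an assumption based on common practice rather than something stated in the text.

```python
def compute_residual(current, predicted):
    """Residual = current samples minus predicted samples, element by element."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]


def reconstruct(residual, predicted, bit_depth=8):
    """Reconstruction = residual plus prediction, clipped to the sample range.

    The clipping range (0 .. 2**bit_depth - 1) is an assumption of this sketch.
    """
    max_value = (1 << bit_depth) - 1
    return [[min(max(r + p, 0), max_value) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, predicted)]


current = [[120, 130], [90, 255]]
predicted = [[118, 135], [100, 250]]
residual = compute_residual(current, predicted)
print(residual)                          # [[2, -5], [-10, 5]]
print(reconstruct(residual, predicted))  # [[120, 130], [90, 255]]
```

In the encoder the residual passed to the reconstruction step has gone through quantization and inverse quantization, so the reconstructed samples generally differ slightly from the originals; the lossless round trip above only shows the arithmetic.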

Further, the steps by which the forward transform/quantization unit processes the residual in I8 × 8 mode comprise:

receiving the residual output by the residual unit, sending it to the DCT module, and invoking the computation module to perform the horizontal DCT;

writing the result of the horizontal DCT into the 8 × 8 register array from the top;

shifting the 8 × 8 register array left, sending its data to the DCT module, and invoking the computation module to perform the vertical DCT;

sending the result of the vertical DCT to the quantization module, and invoking the computation module to perform quantization;

writing the result of the quantization into the 8 × 8 register array from the right;

the Zig-Zag ordering module reading all data in the 8 × 8 register array, performing Zig-Zag ordering, and sending the ordered Zig-Zag coefficients to the external memory and the non-Zig-Zag coefficients to the inverse quantization module.

Further, the steps by which the inverse transform/quantization unit obtains the residual in I8 × 8 mode comprise:

the inverse quantization module invoking the computation module to inverse-quantize the non-Zig-Zag coefficients;

sending the result of the inverse quantization to the inverse DCT module, and invoking the computation module to perform the horizontal inverse DCT;

writing the result of the horizontal inverse DCT into the 8 × 8 register array from the top;

shifting the 8 × 8 register array left, sending its data to the inverse DCT module, and invoking the computation module to perform the vertical inverse DCT;

writing the result of the vertical inverse DCT into the 8 × 8 register array from the right;

shifting the 8 × 8 register array left to obtain the residual after inverse quantization and inverse DCT.

Further, in I4 × 4 mode, I16 × 16 mode and inter mode, the residual unit, the forward transform/quantization unit, the inverse transform/quantization unit and the reconstruction unit also run in parallel.

Further, the parallel operation comprises the following steps:

the residual unit detects the empty flag of the first subarray of the 8 × 8 register array and, once the predicted macroblock data of the current macroblock is ready, starts calculating the residual of the first sub-block of the current macroblock;

the residual is sent to the DCT module, the computation module is invoked to perform the horizontal DCT, and the result is written to the first subarray;

when the empty flag of the second subarray of the 8 × 8 register array is detected, the first subarray is shifted left, its data is sent to the DCT module, and the computation module is invoked to perform the vertical DCT;

the result of the vertical DCT is sent to the quantization module, and the computation module is invoked to perform quantization;

the quantization result is written to the second subarray, the empty flag of the first subarray is reset at the same time, and the residual of the next sub-block of the current macroblock is sent to the DCT module;

when the external memory and the fourth subarray of the 8 × 8 register array are both detected to be idle, the 16 elements in the second subarray are read out at once and sent to the Zig-Zag ordering module for ordering;

the Zig-Zag coefficients are written to the external memory, the non-Zig-Zag coefficients are sent to the inverse quantization module, and the computation module is invoked to perform inverse quantization;

the inverse quantization result is sent to the inverse DCT module, the computation module is invoked to perform the horizontal inverse DCT, the result is written to the fourth subarray of the 8 × 8 register array, and the empty flag of the second subarray is reset at the same time;

when the third subarray of the 8 × 8 register array is detected to be idle, the fourth subarray is shifted left, its data is sent to the inverse DCT module, the computation module is invoked to perform the vertical inverse DCT, the result is written to the third subarray, and the empty flag of the fourth subarray is reset at the same time;

when the reconstruction unit is found to be ready to receive data, the third subarray is shifted down, its data is shifted out, and the empty flag of the third subarray is reset at the same time.

Further, in I16 × 16 mode, after the DCT module has completed the vertical DCT of the 16 sub-blocks of the current block, the forward 4 × 4 Hadamard transform is applied to the DC coefficients of the 16 sub-blocks; after the inverse quantization module has completed the inverse quantization of the 16 sub-blocks of the current block and the subsequent DC coefficient sub-block has been received, the inverse 4 × 4 Hadamard transform is applied to the DC coefficients of the 16 sub-blocks.

Compared with the prior art, the H.264 encoder and encoding method provided by the present invention have the following beneficial effects:

1. A unified multiplexed logic unit is used. It supports the DCT4 × 4, DCT8 × 8 and Hadamard transforms and their inverse transforms at the same time, which saves a large amount of hardware resources.

2. Through the cooperation of the residual unit, the forward transform/quantization unit, the inverse transform/quantization unit and the reconstruction unit, the image transformation, quantization and reconstruction process of the H.264 standard is handled as a whole. Pipelined operation during encoding increases the operating speed, makes maximum use of the multiplexed logic unit, and improves hardware resource utilization.

Brief description of the drawings

Fig. 1 is a schematic diagram of the transform and quantization structure for 4 × 4 blocks in a prior-art H.264 encoder;

Fig. 2 is a structural diagram of the H.264 encoder of the present invention;

Fig. 3 is a structural diagram of the 8 × 8 register array of the multiplexed logic unit in a specific embodiment of the invention;

Fig. 4 is a structural diagram of a subarray of the 8 × 8 register array shown in Fig. 3;

Figs. 5A to 5C are structural diagrams of the sub-modules of the computation module of the multiplexed logic unit in a specific embodiment of the invention;

Fig. 6 is a logic diagram of the forward and inverse DCT4 × 4/Hadamard transforms in a specific embodiment of the invention;

Figs. 7A and 7B are logic diagrams of the forward and inverse DCT8 × 8 transforms, respectively, in a specific embodiment of the invention;

Fig. 8 is a schematic diagram of the macroblock structures supported by the H.264 encoder in a specific embodiment of the invention;

Fig. 9 is a structural diagram of the forward transform/quantization unit and the inverse transform/quantization unit in a specific embodiment of the invention;

Fig. 10 is a schematic diagram of the operating logic of each unit in I8 × 8 mode in a specific embodiment of the invention;

Fig. 11 is a schematic diagram of the operating logic of each unit in I4 × 4 and inter modes in a specific embodiment of the invention;

Fig. 12 is a schematic diagram of the operating logic of each unit in I16 × 16 mode in a specific embodiment of the invention.

Embodiment

The H.264 encoder and encoding method proposed by the present invention are described in further detail below with reference to the drawings and specific embodiments.

As shown in Fig. 2, the present invention provides an H.264 encoder 1, comprising:

a control unit 11, for receiving an externally input macroblock start signal and distributing a macroblock start signal to the other units to notify them to start the corresponding macroblock computation;

a residual unit 12, for obtaining the macroblock data of the current block and the predicted macroblock data from an external memory 2 and calculating the residual of the current block, preferably by subtracting the predicted macroblock data from the macroblock data of the current block;

a forward transform/quantization unit 13, for performing the forward DCT/Hadamard transform, quantization and Zig-Zag reordering of the residual, storing the resulting Zig-Zag coefficients in the external memory 2 and outputting the non-Zig-Zag coefficients;

an inverse transform/quantization unit 14, for performing inverse quantization and the inverse DCT/Hadamard transform on the non-Zig-Zag coefficients to recover the residual;

a reconstruction unit 15, for receiving the residual after inverse quantization and the inverse DCT/Hadamard transform, obtaining reconstructed image data from the residual and the predicted macroblock data, and storing the reconstructed image data in the external memory;

a multiplexed logic unit 16, for providing the forward transform/quantization unit and the inverse transform/quantization unit with the logic required for their computations, and for buffering the intermediate results of those computations.

Because the DCT, quantization and the reconstruction loop are very important parts of the H.264 standard and the whole process is computationally complex, an implementation that is not optimized often requires considerable hardware resources. In addition, intra prediction needs the data of the neighbouring block on the left, and that data is produced by the reconstruction loop. The speed of the DCT, quantization and reconstruction loop therefore affects the throughput of intra prediction, and the quality of its design directly affects the performance of the whole encoding system. The inverse transform/quantization unit 14 and the reconstruction unit 15 of the H.264 encoder of the present invention are the main components of the reconstruction loop. By sharing the multiplexed logic unit 16 between the forward transform/quantization unit 13 and the inverse transform/quantization unit 14, the DCT, quantization and reconstruction loop is optimized, which saves hardware resources and improves encoder performance.

In this embodiment, the H.264 encoder 1 further comprises an external memory interface unit 10 that provides the interface for writing data to and reading data from the external memory 2. The multiplexed logic unit 16 comprises an 8 × 8 register array 162 for buffering the intermediate results and a computation module 161 that provides the logic required for the DCT4 × 4, Hadamard, IDCT4 × 4, IHadamard, DCT8 × 8 and IDCT8 × 8 operations.

Please refer to Fig. 3. To accommodate both the 8 × 8 and 4 × 4 block sizes, the 8 × 8 register array uses a shared data path and comprises four 4 × 4 subarrays 0, 1, 2 and 3, which buffer the intermediate results of the forward DCT/quantization and of the inverse quantization/inverse DCT. Each register is N = 16 bits wide. Because a macroblock has only four 8 × 8 sub-blocks when the 8 × 8 DCT is used, performance is still sufficient when the forward transform/quantization unit 13 and the inverse transform/quantization unit 14 both reuse this single 8 × 8 register array. The 8 × 8 register array can also be used as four separate 4 × 4 subarrays, each with an independent bus access interface, which enables pipelined operation in the other modes and shortens the computation cycle. In addition, because intermediate results are accessed frequently, using a register array effectively reduces the number of accesses to the external memory 2 and improves performance. As shown in Fig. 4, each 4 × 4 subarray can shift data in and out downward, shift data in and out to the left, and have all of its elements read out at once.

Please refer to Figs. 5A to 5C. The computation module comprises two first sub-computation modules A1 and A2, two second sub-computation modules B1 and B2, and one third sub-computation module C, and can implement DCT4 × 4, IDCT4 × 4, Hadamard, DCT8 × 8 and IDCT8 × 8. The specific functions of the first, second and third sub-computation modules are shown in Figs. 5A to 5C, where x0, x1, x2, x3, x1′ and x2′ are the inputs.

In this embodiment, the forward transform/quantization unit and the inverse transform/quantization unit invoke the two first sub-computation modules and the two second sub-computation modules to perform the forward and inverse DCT/Hadamard transforms of 4 × 4 blocks, and invoke the two first sub-computation modules, the two second sub-computation modules and the third sub-computation module to perform the forward and inverse transforms of 8 × 8 blocks.

Please refer to Fig. 6. x0, x1, x2, x3 are the inputs of DCT4 × 4, and y0, y1, y2, y3 are the inputs of IDCT4 × 4 (the inverse DCT4 × 4). The forward DCT passes through the first sub-computation module A1 and then the second sub-computation module B1 to produce the DCT result; likewise, the inverse DCT passes through the second sub-computation module B2 and then the first sub-computation module A2 to produce the IDCT result. The two data paths do not affect each other, so they can operate in a pipelined fashion.
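The two-stage structure of the inverse path can be illustrated with the standard H.264 4-point inverse butterfly. The sketch below is in Python and rests on assumptions: the function names are hypothetical, and mapping the weighted first stage to a B-type module and the plain add/subtract second stage to an A-type module is only one plausible reading of Fig. 6, since the exact functions of Figs. 5A to 5C are not reproduced in the text.

```python
def inverse_stage1(y):
    """Weighted stage: combines the even inputs and the halved odd inputs."""
    y0, y1, y2, y3 = y
    # '>> 1' is an arithmetic right shift, i.e. the 0.5 weighting in integer form.
    return [y0 + y2, y0 - y2, (y1 >> 1) - y3, y1 + (y3 >> 1)]


def inverse_stage2(e):
    """Plain add/subtract stage producing the four reconstructed values."""
    e0, e1, e2, e3 = e
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_1d(y):
    """1-D 4-point inverse integer transform as two chained butterfly stages."""
    return inverse_stage2(inverse_stage1(y))


print(inverse_stage1([4, 6, -2, 8]))        # [2, 6, -5, 10]
print(inverse_transform_1d([4, 6, -2, 8]))  # [12, 1, 11, -8]
```

Because each stage only consumes the outputs of the previous one, a second set of inputs can enter the first stage while the first set is still in the second stage, which is the property the pipelined operation relies on.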

Please refer to Fig. 7A. In the forward DCT8 × 8 operation, x0, x1, x2, x3, x4, x5, x6, x7 are the inputs of DCT8 × 8, where x0, x1, x6, x7 are fed to the first sub-computation module A1 and x2, x3, x4, x5 are fed to the second sub-computation module B2.

Please refer to Fig. 7B. In the inverse DCT8 × 8 operation, y0, y1, y2, y3, y4, y5, y6, y7 are the inputs of IDCT8 × 8, where y0, y2, y4, y6 are fed to the second sub-computation module B1 and y1, y3, y5, y7 are fed to the third sub-computation module C.

With reference to the data flows shown in Fig. 6 and Figs. 7A and 7B, the detailed inputs of the first sub-computation modules A1 and A2, the second sub-computation modules B1 and B2, and the third sub-computation module C for DCT4 × 4, IDCT4 × 4, Hadamard, DCT8 × 8 and IDCT8 × 8 are listed in Tables 1 to 5. Here i0 to i7 denote the input signals of the forward transform; j0 to j7 denote the input signals of the inverse transform; a10 to a13 denote the outputs of the first sub-computation module A1; b10 to b13 denote the outputs of the second sub-computation module B1; a20 to a23 denote the outputs of the first sub-computation module A2; b20 to b23 denote the outputs of the second sub-computation module B2; and c0 to c3 denote the outputs of the third sub-computation module C. Table 6 lists the output signals in the different modes.

Table 1. Inputs of computing unit A1

| Mode | x0 | x1 | x2 | x3 |
|---|---|---|---|---|
| 4×4 | i0 | i1 | i2 | i3 |
| Hadamard4×4 | i0 | i1 | i2 | i3 |
| Hadamard2×2 | i0 | i1 | i2 | i3 |
| 8×8 | i0 | i1 | i6 | i7 |
| i8×8 | b10 | b13 | b12 | b11 |

Table 2. Inputs of computing unit A2

| Mode | x0 | x1 | x2 | x3 |
|---|---|---|---|---|
| i4×4 | j0 | j1 | j2 | j3 |
| iHadamard4×4 | b20 | b23 | b22 | b11 |
| iHadamard2×2 | b20 | b23 | b22 | b11 |
| 8×8 | a10 | a11 | b20 | b21 |
| i8×8 | a10 | a11 | c1 | c0 |

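Table 3. Inputs of computing unit B1

| Mode | x0 | x1 | x1′ | x2 | x2′ | x3 |
|---|---|---|---|---|---|---|
| 4×4 | a10 | 2*a13 | a13 | 2*a12 | a12 | a11 |
| Hadamard4×4 | a10 | a13 | | a12 | | a11 |
| Hadamard2×2 | a10 | a13 | | a12 | | a11 |
| 8×8 | a20 | a23 | 0.5*a23 | a22 | 0.5*a22 | a21 |
| i8×8 | j0 | j2 | 0.5*j2 | j6 | 0.5*j6 | j4 |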

Table 4. Inputs of computing unit B2

Table 5. Inputs of computing unit C

| Mode | x0 | x1 | x2 | x3 |
|---|---|---|---|---|
| 8×8 | a13 | a12 | b23 | b22 |
| i8×8 | j1 | j3 | j5 | j7 |

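Table 6. Output signals

| Mode | o0 | o1 | o2 | o3 | o4 | o5 | o6 | o7 |
|---|---|---|---|---|---|---|---|---|
| 4×4 | b10 | b11 | b13 | b12 | | | | |
| Hadamard4×4 | b10 | b11 | b13 | b12 | | | | |
| Hadamard2×2 | b10 | b12 | b11 | b13 | | | | |
| i4×4 | a20 | a21 | a22 | a23 | | | | |
| iHadamard4×4 | a20 | a21 | a22 | a23 | | | | |
| iHadamard2×2 | a20 | a21 | a22 | a23 | | | | |
| 8×8 | b10 | c0 | b11 | c1 | b13 | c2 | b12 | c3 |
| i8×8 | a20 | a21 | b20 | b21 | b22 | b23 | a22 | a23 |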

The present invention also provides an encoding method for the above H.264 encoder, described as follows.

To adapt to images of different sizes and content, H.264 supports sub-block sizes of 4 × 4 and 8 × 8; chroma blocks can only be 4 × 4. The composition of a macroblock for these two block sizes is shown in Fig. 8. With 4 × 4 blocks, a macroblock comprises 16 luma Y blocks (numbered 0 to 15), 4 chroma U blocks (numbered 0 to 3) and 4 chroma V blocks (numbered 0 to 3). With 8 × 8 blocks, a macroblock comprises 4 luma Y blocks (numbered 0 to 3), 4 chroma U blocks (numbered 0 to 3) and 4 chroma V blocks (numbered 0 to 3).
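The block counts above follow directly from the plane sizes, as the following Python sketch shows. It assumes a 16 × 16 luma plane and 8 × 8 chroma planes per macroblock (4:2:0 sampling, which the text implies but does not state), and it enumerates blocks in plain raster order; the numbering in Fig. 8 may differ. The helper name `split_blocks` is hypothetical.

```python
def split_blocks(plane, size):
    """Split a square sample plane (list of rows) into size x size blocks, raster order."""
    n = len(plane)
    blocks = []
    for top in range(0, n, size):
        for left in range(0, n, size):
            blocks.append([row[left:left + size] for row in plane[top:top + size]])
    return blocks


luma = [[16 * r + c for c in range(16)] for r in range(16)]  # 16x16 luma plane
chroma_u = [[128] * 8 for _ in range(8)]                      # 8x8 chroma plane (assumed 4:2:0)

print(len(split_blocks(luma, 4)))      # 16 luma blocks of 4x4
print(len(split_blocks(luma, 8)))      # 4 luma blocks of 8x8
print(len(split_blocks(chroma_u, 4)))  # 4 chroma blocks of 4x4
```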

Based on the macroblock structure of Fig. 8, the H.264 encoder of the present invention mainly supports four modes: intra I4 × 4, I16 × 16 and I8 × 8, and inter. The way the forward transform/quantization unit and the inverse transform/quantization unit process the residual differs somewhat among these four modes, as explained further below with reference to Fig. 9.

Please refer to Fig. 9. In this embodiment, the forward transform/quantization unit 13 of the H.264 encoder comprises a DCT module 131, a quantization module 132 and a Zig-Zag ordering module 133; the inverse transform/quantization unit 14 comprises an inverse DCT module 142 and an inverse quantization module 141.
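The two outputs of the Zig-Zag ordering module can be sketched as follows for a 4 × 4 block. This Python illustration uses the standard 4 × 4 frame zig-zag scan order of H.264; the function names are hypothetical, and returning the raster-order copy alongside the scanned list is only a software stand-in for the two hardware output paths.

```python
# Raster indices of a 4x4 block listed in zig-zag (frame) scan order.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Return the 16 coefficients of a 4x4 block in zig-zag order."""
    flat = [value for row in block for value in row]
    return [flat[index] for index in ZIGZAG_4X4]


def forward_outputs(quantized):
    """Return (Zig-Zag coefficients for memory, raster-order coefficients for inverse quantization)."""
    return zigzag_scan(quantized), [row[:] for row in quantized]


quantized = [[9, 3, 1, 0],
             [4, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
to_memory, to_inverse = forward_outputs(quantized)
print(to_memory)  # [9, 3, 4, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The scan groups the low-frequency coefficients at the front and the trailing zeros at the end, which suits entropy coding, while the inverse path keeps the original positions that inverse quantization and the inverse transform need.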

In intra I8 × 8 mode, the DCT/IDCT 8 × 8 transforms need the whole 8 × 8 register array, so serial operation is used: the inverse operation is carried out only after the forward transform has finished, and luma blocks 0, 1, 2 and 3 are processed one after another. The state machine in the residual unit waits for the signal that one block has been reconstructed before calculating the next residual, which yields serial operation. Because there are only four 8 × 8 luma blocks, the number of computation cycles is small, so this approach does not hurt performance, yet it saves 8 × 8 × 16 bits = 1024 bits of registers. The computation flow of the DCT/IDCT transform in intra I8 × 8 mode is shown in Fig. 10 and comprises the following steps:

1. The residual output by the residual unit is received and sent to the DCT module 131, and the computation module is invoked to perform the horizontal DCT;

2. The result of the horizontal DCT is written into the 8 × 8 register array 162 from the top;

3. The 8 × 8 register array 162 is shifted left, its data is sent to the DCT module 131, and the computation module is invoked to perform the vertical DCT;

4. The result of the vertical DCT is sent to the quantization module 132, and the computation module is invoked to perform quantization;

5. The result of the quantization is written into the 8 × 8 register array 162 from the right;

6. The Zig-Zag ordering module 133 reads all data in the 8 × 8 register array 162 and performs Zig-Zag ordering;

7. The ordered Zig-Zag coefficients are sent to the external memory;

8. The Zig-Zag ordering module 133 sends the non-Zig-Zag coefficients to the inverse quantization module 141, which invokes the computation module to inverse-quantize them;

9. The result of the inverse quantization is sent to the inverse DCT module 142, and the computation module is invoked to perform the horizontal inverse DCT;

10. The result of the horizontal inverse DCT is written into the 8 × 8 register array 162 from the top;

11. The 8 × 8 register array 162 is shifted left, its data is sent to the inverse DCT module 142, and the computation module is invoked to perform the vertical inverse DCT;

12. The result of the vertical inverse DCT is written into the 8 × 8 register array 162 from the right;

13. The 8 × 8 register array 162 is shifted left to obtain the residual after inverse quantization and inverse DCT.

Steps 1 to 7 are the main steps by which the forward transform/quantization unit processes the residual in I8 × 8 mode, and steps 8 to 13 are the main steps by which the inverse transform/quantization unit recovers the residual in I8 × 8 mode.
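The horizontal pass, the write into the register array, the column-wise read-out and the vertical pass together form a row-column decomposition of the 2-D transform. The Python sketch below illustrates that decomposition under stated assumptions: it uses the 4 × 4 core matrix as a stand-in because the 8 × 8 coefficients are not given in the text, it models the shifted read-out of the register array as a transpose, and all names are hypothetical.

```python
# 4x4 forward core matrix, used here only as a small stand-in for the 8x8 case.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def transform_1d(vec, core):
    """Apply a 1-D transform (matrix-vector product) to one row or column."""
    return [sum(c * v for c, v in zip(row, vec)) for row in core]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transform_2d(block, core=CF):
    # Horizontal pass: each row is transformed and written into the array.
    array = [transform_1d(row, core) for row in block]
    # Column-wise read-out of the array, modelled here as a transpose.
    columns = transpose(array)
    # Vertical pass on every column, then back to row-major layout.
    return transpose([transform_1d(col, core) for col in columns])


block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
assert transform_2d(block) == matmul(matmul(CF, block), transpose(CF))
```

The final assertion confirms that the two 1-D passes produce the same result as the direct matrix form C · X · Cᵀ, so one 1-D datapath plus an intermediate buffer is enough for the 2-D transform.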

For I16 × 16, I4 × 4 and inter pattern, then adopt Parallel Design.Each 4 × 4 subarray independent operation in 8 × 8 register arrays.Each 4 × 4 subarrays are provided with an empty flag bit, and the set of empty flag bit/reset is controlled by the state machine of DCT module and reverse DCT module.By asking residual error unit or control unit by judging whether empty flag bit is 1, can judge whether this 4 × 4 subarray can be used.So just pile line operation can be realized.Wherein I4 × 4, inter pattern and I16 × 16 are slightly a little different, are that I4 × 4, inter pattern are without the need to carrying out Hadamard conversion to DC coefficient.As shown in figure 11, main process is as follows for the calculation process of the DCT/IDCT conversion in frame under I4 × 4 and interframe inter pattern:

1. The residual unit detects the empty flag of array 0, takes the DSR input, and starts computing the residual of sub-block 0. The valid residual is sent to the DCT module, which performs the horizontal DCT and writes the result to array 0;

2. After the horizontal DCT completes, if the empty flag of array 1 is detected, array 0 is shifted left and its data are sent for the vertical DCT and quantization; the result is written into array 1 by left shift, and the empty flag of array 0 is reset to 1. The residual of the next sub-block, sub-block 1, is then sent to the DCT module;

3. When the external memory and array 3 are both detected to be idle, all 16 elements of array 1 are read out at once and sent to the Zig-Zag ordering module for reordering. The non-Zig-Zag coefficients are sent to the inverse quantization module for inverse quantization, and the result is sent to the inverse DCT module for the horizontal inverse DCT; the data are written into array 3 by downward shift, and the empty flag of array 1 is reset to 1;

4. When array 2 is detected to be idle, array 3 is shifted left and the vertical inverse DCT is performed; the result is written into array 2, and the empty flag of array 3 is reset to 1;

5. When the reconstruction unit is found ready to receive data, array 2 is shifted down to shift the data out, and the empty flag of array 2 is reset to 1.
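The empty-flag handshake in steps 1 to 5 can be illustrated with a small software model. The Python sketch below is a simplified assumption-laden model, not the patent's hardware: every stage is assumed to take exactly one cycle, the external memory and the reconstruction unit are assumed to be always ready, and the function name `run_pipeline` is hypothetical. It shows only how a block advances along arrays 0, 1, 3, 2 when the destination array is empty; it does not reproduce the timing of Table 7.

```python
# Hypothetical model of the empty-flag handshake between the four 4x4 subarrays.
# A slot holding None corresponds to an empty flag of 1.
def run_pipeline(num_blocks):
    """Push sub-blocks 0..num_blocks-1 through arrays 0 -> 1 -> 3 -> 2."""
    slots = {0: None, 1: None, 3: None, 2: None}
    next_block = 0
    done = []    # sub-blocks handed to the reconstruction unit, in order
    trace = []   # snapshot of the four arrays after every cycle
    while len(done) < num_blocks:
        # Step 5: shift array 2 out (reconstruction unit assumed always ready).
        if slots[2] is not None:
            done.append(slots[2])
            slots[2] = None
        # Steps 4, 3 and 2: advance one hop only if the destination is empty.
        for src, dst in ((3, 2), (1, 3), (0, 1)):
            if slots[src] is not None and slots[dst] is None:
                slots[dst] = slots[src]
                slots[src] = None  # reset the source array's empty flag
        # Step 1: start the next sub-block when array 0 is empty.
        if next_block < num_blocks and slots[0] is None:
            slots[0] = next_block
            next_block += 1
        trace.append(dict(slots))
    return done, trace
```

Under these assumptions, five sub-blocks fill all four arrays at once in the fifth cycle, which is the overlap the parallel design is meant to achieve.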

Table 7 below details the pipeline of DCT/quantization and inverse DCT/inverse quantization in intra I4 × 4 and inter modes, taking the processing of sub-blocks 0, 1, 2, 3 and 4 of the current input macroblock as an example.

Table 7: Pipeline process

In I16 × 16 mode, the forward transform/quantization unit must apply a Hadamard transform to the DC (direct current) coefficients produced by the forward DCT, so the DC coefficients must be stored. In this mode, after the DCT (horizontal + vertical) of each 4 × 4 block is processed, its DC coefficient is stored in DC coefficient array A, as shown in Figure 12. After all 16 blocks are complete, the coefficients are read from DC coefficient array A and a 4 × 4 Hadamard transform is applied. The inverse transform/quantization unit, for its part, receives the 4 × 4 AC coefficient data and performs inverse quantization, storing the inverse-quantized (AC) coefficients in external memory (SRAM) 2 without writing them to the register array. After the 16 AC coefficient blocks have been processed, it receives the one DC coefficient block and performs the 4 × 4 inverse Hadamard transform and inverse quantization, which requires access to array 3 and array 2; the resulting values are stored in DC coefficient array B. The inverse-quantized coefficients of the 16 blocks are then read back from the external SRAM, the DC coefficient of each block is taken from DC coefficient array B, and the two are merged before the inverse DCT is performed. The data are written into array 2, array 2 is shifted down, and the data are shifted out. The result is passed to the reconstruction unit; the rest of the pipeline is the same as in the I4 × 4 and inter modes.
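The DC handling described above (collect one DC coefficient per 4 × 4 block, transform the 4 × 4 DC matrix, and later merge each DC value back into its block) can be sketched in Python as follows. This is an illustrative sketch only: it uses the standard 4 × 4 Hadamard matrix, omits all scaling and quantization, and assumes that the 16 blocks are held in raster order. The names `collect_dc`, `hadamard4x4` and `merge_dc` are hypothetical and do not come from the patent.

```python
# Hypothetical sketch of DC collection, 4x4 Hadamard transform and DC merge.
# Scaling and quantization are deliberately left out.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def collect_dc(blocks):
    """Gather coefficient [0][0] of 16 blocks into a 4x4 matrix.

    Raster ordering of the blocks is an assumption of this sketch.
    """
    return [[blocks[r * 4 + c][0][0] for c in range(4)] for r in range(4)]


def hadamard4x4(dc):
    """Compute H4 * dc * H4 (H4 is symmetric, so this equals H4 * dc * H4^T)."""
    return matmul(matmul(H4, dc), H4)


def merge_dc(blocks, dc):
    """Return copies of the blocks with [0][0] replaced by the matching DC value."""
    merged = []
    for idx, block in enumerate(blocks):
        copy = [row[:] for row in block]
        copy[0][0] = dc[idx // 4][idx % 4]
        merged.append(copy)
    return merged
```

Because H4 multiplied by itself gives four times the identity, applying `hadamard4x4` twice returns the input scaled by 16, which is why the same butterfly logic can serve both the forward and the inverse direction once scaling is handled separately.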

All of the modes above apply to luma Y blocks. For chroma U and chroma V blocks, every mode is processed in the same way; the basic process is the same as for luma Y blocks in I16 × 16 mode, the differences being that there are only 4 chroma U blocks and 4 chroma V blocks and that the Hadamard transform is of size 2 × 2. The details are not repeated here.
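For the chroma case, the 2 × 2 Hadamard transform over the four DC coefficients of one chroma component reduces to four sums and differences. The Python sketch below writes them out explicitly; it omits scaling and quantization, and the function name `hadamard2x2` is a hypothetical label rather than anything named in the patent.

```python
# Hypothetical sketch: 2x2 Hadamard transform of the four chroma DC
# coefficients of one component (U or V), without scaling or quantization.
def hadamard2x2(dc):
    """Compute H2 * dc * H2 with H2 = [[1, 1], [1, -1]]."""
    a, b = dc[0]
    c, d = dc[1]
    return [
        [a + b + c + d, a - b + c - d],
        [a + b - c - d, a - b - c + d],
    ]
```

Applying the function twice returns the input scaled by 4, so the same add/subtract logic covers both directions.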

In summary, the H.264 encoder and coding method provided by the invention mainly provide a unified multiplexed logic unit for the forward DCT/Hadamard transform and quantization and for the inverse quantization and inverse DCT/Hadamard transform, saving hardware resources to a large extent. Through cooperation among the residual unit, the forward transform/quantization unit, the inverse transform/quantization unit and the reconstruction unit, the image transformation, quantization and reconstruction process of the H.264 standard is implemented as a whole. Pipelined operation during encoding increases the operating speed, makes maximal use of the multiplexed logic unit, and raises hardware resource utilization.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims

1. An H.264 encoder, characterized in that it comprises:

A control unit, configured to receive an externally input macroblock start signal and to provide a macroblock start signal that is distributed to the other units to notify them to start the corresponding macroblock computation, wherein the other units comprise: a residual unit, a forward transform/quantization unit, an inverse transform/quantization unit, and a reconstruction unit;

A forward transform/quantization unit, configured to perform the forward DCT/Hadamard transform, quantization and Zig-Zag reordering of said residual, to store the resulting Zig-Zag coefficients in said external memory, and to output the non-Zig-Zag coefficients, wherein said forward transform/quantization unit comprises a DCT module, a quantization module and a Zig-Zag ordering module;

An inverse transform/quantization unit, configured to use said non-Zig-Zag coefficients to perform the inverse quantization and inverse DCT/Hadamard transform of said residual, wherein said inverse transform/quantization unit comprises an inverse DCT module and an inverse quantization module;

A reconstruction unit, configured to receive the residual after inverse quantization and inverse DCT/Hadamard transform, to add said residual to said predicted macroblock data to obtain reconstructed image data, and to store said reconstructed image data in said external memory;

A multiplexed logic unit, comprising a computing module and an 8 × 8 register array, wherein said computing module provides the logic required for computation to said forward transform/quantization unit and said inverse transform/quantization unit, said 8 × 8 register array buffers the intermediate results of said computation, and said 8 × 8 register array comprises four 4 × 4 subarrays: a first subarray, a second subarray, a third subarray and a fourth subarray.

2. The H.264 encoder as claimed in claim 1, characterized in that said computing module comprises two first sub-computing modules, two second sub-computing modules and one third sub-computing module.

3. The H.264 encoder as claimed in claim 1, characterized in that said current block is in units of 4 × 4 blocks, comprising 16 luma Y blocks, 4 chroma U blocks and 4 chroma V blocks.

4. The H.264 encoder as claimed in claim 1, characterized in that said current block is in units of 8 × 8 blocks, comprising 4 luma Y blocks, 4 chroma U blocks and 4 chroma V blocks.

5. A method of image transformation, quantization and reconstruction using the H.264 encoder of claim 1, characterized in that it comprises:

Said control unit receives an externally input macroblock start signal and provides a macroblock start signal that is distributed to the other units to notify them to start the corresponding macroblock computation, wherein the other units comprise: a residual unit, a forward transform/quantization unit, an inverse transform/quantization unit, and a reconstruction unit;

Said reconstruction unit receives the residual output by said inverse transform/quantization unit, adds said residual to the predicted macroblock data obtained from said external memory to obtain reconstructed image data, and stores said reconstructed image data in said external memory.

6. the method for image conversion as claimed in claim 5, quantification and reconstruct, is characterized in that, described in ask residual error unit by subtracting each other the macro block data of described current block and predicted macroblock data to calculate the residual error of current block.

7. the method for image conversion as claimed in claim 5, quantification and reconstruct, it is characterized in that, the step that described direct transform/quantifying unit processes residual error under the pattern of I8 × 8 comprises:

8. the method for image conversion as claimed in claim 5, quantification and reconstruct, it is characterized in that, the step that described inverse transformation/quantifying unit obtains residual error under the pattern of I8 × 8 comprises:

The result of inverse quantization is sent into inverse-dct-module, calls computing module and carry out horizontal reverse to DCT computing;

Move to left 8 × 8 register arrays, and the data of 8 × 8 register arrays are sent into inverse-dct-module, calls computing module and carry out vertical oppositely DCT computing;

9. the method for image conversion as claimed in claim 5, quantification and reconstruct, it is characterized in that, under I4 × 4, I16 × 16 and inter-frame mode, described in ask residual error unit, direct transform/quantifying unit, inverse transformation/quantifying unit and reconfiguration unit also to comprise parallel running step.

10. the method for image conversion as claimed in claim 9, quantification and reconstruct, it is characterized in that, the step of described parallel running comprises:

Described residual error is sent into described DCT module, and calls described computing module and carry out horizontal DCT computing, the result of computing is written to the first subarray of described 8 × 8 register arrays;

Result inverse quantization operation obtained sends into inverse-dct-module, calls computing module and carries out horizontal reverse to DCT computing, by the 4th subarray of operation result write 8 × 8 register arrays, resets the sky mark of the second subarray simultaneously;

When detecting that the three sub-array row of 8 × 8 register arrays are idle, move to left the 4th subarray, and the data of the 4th subarray are sent into inverse-dct-module, calls computing module and carry out vertical oppositely DCT computing, operation result is write three sub-array row, reset the sky mark of the 4th subarray simultaneously;

11. The method of image transformation, quantization and reconstruction as claimed in claim 10, characterized in that, in I16 × 16 mode, after said DCT module completes the vertical operation on the 16 sub-blocks of said current macroblock, a forward 4 × 4 Hadamard transform is applied to the DC coefficients of said 16 sub-blocks; and after said inverse quantization module completes the inverse quantization of the 16 sub-blocks of said current macroblock and the subsequent DC coefficient sub-block has been received, an inverse 4 × 4 Hadamard transform is applied to the DC coefficients of said 16 sub-blocks.