Background technology
The deblocking loop filter is an important component of the H.264/AVC video coding standard and can significantly improve both the signal-to-noise ratio and the subjective quality of the image. Deblocking is performed macroblock by macroblock. The edges of every 4×4 block of the luma and chroma components are filtered one by one: first the vertical edges, from left to right, and then the horizontal edges, from top to bottom. Fig. 1 shows the data of a single H.264/AVC macroblock; each cell represents a 4×4 block, and the bold lines mark the edges to be filtered. Filtering the leftmost edge of the macroblock also requires the right 4 columns of the left neighboring macroblock, of which up to 3 columns may be modified. Filtering the top edge also requires the bottom 4 rows of the upper neighboring macroblock, of which up to 3 rows may be modified.
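As an illustration of this ordering, the following Python sketch enumerates the edges of one macroblock and marks those lying on the macroblock boundary, which need data from a neighboring macroblock. It assumes 4:2:0 sampling (a 16×16 luma block and two 8×8 chroma blocks) and, as an assumption not stated in the text, processes luma before Cb and Cr. The function name edge_order is hypothetical.

```python
def edge_order(luma_size=16, chroma_size=8, block=4):
    """List (component, direction, offset, needs_neighbor) in filtering order.

    Assumes 4:2:0 sampling and a luma -> Cb -> Cr component order.
    Offsets are in samples inside the component; offset 0 is the macroblock
    boundary, which needs the left neighbor's right 4 columns (vertical edge)
    or the upper neighbor's bottom 4 rows (horizontal edge).
    """
    edges = []
    for comp, size in (("Y", luma_size), ("Cb", chroma_size), ("Cr", chroma_size)):
        # vertical edges left to right, then horizontal edges top to bottom
        for direction in ("vertical", "horizontal"):
            for offset in range(0, size, block):
                edges.append((comp, direction, offset, offset == 0))
    return edges


for edge in edge_order():
    print(edge)
```

Under these assumptions a macroblock has 16 edges to visit, 6 of which lie on the macroblock boundary.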
A single filtering step across a vertical edge uses 4 samples of the current block (q0, q1, q2, q3) and 4 samples of the left neighboring block (p0, p1, p2, p3). A single filtering step across a horizontal edge uses 4 samples of the current block (q0, q1, q2, q3) and 4 samples of the upper neighboring block (p0, p1, p2, p3), as shown in Fig. 2. After filtering, the values of samples q0, q1, q2 and p0, p1, p2 can be updated.
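To make the single filtering step concrete, the sketch below applies the H.264/AVC luma filter for 0 < BS < 4 to one line of 8 samples. It assumes α, β and tc0 have already been derived (tc0 comes from the standard's table indexed by indexA and BS, which is not reproduced here). The strong filter used when BS = 4, which is the case that also modifies p2 and q2, is deliberately left out. The function names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x into [lo, hi], as Clip3 in the H.264 specification."""
    return max(lo, min(hi, x))


def filter_normal_luma(p, q, alpha, beta, tc0, bs, bit_depth=8):
    """Luma edge filter for 0 < bS < 4 on one line of samples.

    p = [p0, p1, p2, p3] with p0 nearest the edge; q = [q0, q1, q2, q3].
    Returns new lists (p', q'); p2, p3, q2, q3 are never changed here.
    """
    if bs >= 4:
        raise ValueError("the bS == 4 strong filter is not sketched")
    p, q = list(p), list(q)
    p0, p1, p2 = p[0], p[1], p[2]
    q0, q1, q2 = q[0], q[1], q[2]
    # filterSamplesFlag: leave real image edges untouched
    if bs == 0 or not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta
                       and abs(q1 - q0) < beta):
        return p, q
    max_val = (1 << bit_depth) - 1
    ap = abs(p2 - p0)
    aq = abs(q2 - q0)
    tc = tc0 + (1 if ap < beta else 0) + (1 if aq < beta else 0)
    # Python's >> on negative ints is an arithmetic shift, matching the spec
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    p[0] = clip3(0, max_val, p0 + delta)
    q[0] = clip3(0, max_val, q0 - delta)
    avg = (p0 + q0 + 1) >> 1
    if ap < beta:
        p[1] = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:
        q[1] = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1)
    return p, q


print(filter_normal_luma([100] * 4, [110] * 4, alpha=20, beta=5, tc0=2, bs=2))
```

For a flat step of 100 to 110 with the illustrative parameters above, the step is smoothed to 100, 100, 102, 104 | 106, 108, 110, 110.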
Existing hardware implementations of the deblocking filter fall mainly into the following categories:
The first uses separate data paths for vertical-edge filtering and horizontal-edge filtering. Its drawbacks are high hardware cost, very different control logic for the two kinds of filtering, and different processing times for each.
The second transposes each 4×4 block after vertical-edge filtering is complete and then performs horizontal-edge filtering on the same data path. In this conventional approach, the edges within a macroblock are filtered strictly one after another. Transposition lets vertical-edge and horizontal-edge filtering share one data path, which reduces control complexity. However, filtering the edges strictly in sequence breaks the data dependencies between blocks, so exploiting those dependencies to increase processing speed requires a large register array to hold intermediate results.
A third kind of circuit uses multiple transposition units and buffer memories to implement two-dimensional deblocking. Its advantage is that vertical-edge and horizontal-edge filtering are interleaved, which makes good use of the data dependencies within a macroblock and thus improves processing speed. Its drawbacks are large circuit overhead and complex control, and it does not exploit the data dependencies between horizontally adjacent macroblocks.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention proposes a new, efficient hardware implementation of the deblocking filter. Its overall structure is shown in Fig. 3, and the one-dimensional loop filter part is shown in Fig. 4.
In Fig. 3, the sample data of the current macroblock come from the preceding stage of the data processing pipeline. SRAM2 stores the bottom 4 rows of all macroblocks in the macroblock row above, supplying the upper-neighbor data required to filter the current macroblock. The coding information register provides the coding information the filter module needs to compute filtering parameters such as BS (boundary strength), α and β. The deblocking filter module itself consists of one dual-port SRAM (SRAM1), a one-dimensional loop filter unit and a control unit.
SRAM1 stores all sample data required to filter the current macroblock: the current macroblock data, the right 4 columns of the left neighboring macroblock, and the bottom 4 rows of the upper neighboring macroblock. The intermediate data obtained after one-dimensional vertical-edge filtering are also stored back in SRAM1.
Before each macroblock is filtered, data are first loaded into SRAM1. In the present invention only two parts need to be loaded: the current macroblock data and the upper neighboring macroblock data. The left neighboring macroblock data used during filtering are already retained in SRAM1 when filtering of the left neighboring macroblock finishes.
For convenient storage and processing, SRAM1 is 32 bits wide. Fig. 5 lists all data stored in SRAM1. Each cell in the figure represents one 4×4 block, corresponding to four 32-bit words, and the number in each cell is the index of that 4×4 block. Fig. 6 shows a possible storage organization of SRAM1.
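One way to picture this storage is the hypothetical address map below. It assumes the 40 blocks that appear in the input sequences (indices 0 to 39) are stored contiguously, four 32-bit words per block, each word holding one line of 4 samples with the first sample in the most significant byte. The region boundaries follow the indices used in the input sequences; which chroma pair is Cb and which is Cr, the byte order, and all names are assumptions, not taken from Fig. 6.

```python
WORDS_PER_BLOCK = 4  # four 32-bit words per 4x4 block of 8-bit samples
NUM_BLOCKS = 40      # block indices 0..39

# Assumed grouping of the block indices (Cb/Cr assignment is a guess).
REGIONS = {
    "current luma": range(0, 16),
    "current Cb (assumed)": range(16, 20),
    "current Cr (assumed)": range(20, 24),
    "left neighbor, right 4 columns": range(24, 32),
    "upper neighbor, bottom 4 rows": range(32, 40),
}


def word_address(block_index, row):
    """Hypothetical SRAM1 word address of one line of a 4x4 block."""
    if not (0 <= block_index < NUM_BLOCKS and 0 <= row < WORDS_PER_BLOCK):
        raise ValueError("block index or row out of range")
    return block_index * WORDS_PER_BLOCK + row


def pack_word(samples):
    """Pack four 8-bit samples into one 32-bit word (first sample in MSB)."""
    if len(samples) != 4:
        raise ValueError("a word holds exactly 4 samples")
    word = 0
    for s in samples:
        word = (word << 8) | (s & 0xFF)
    return word


def unpack_word(word):
    """Inverse of pack_word."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


print(hex(pack_word([1, 2, 3, 4])), word_address(24, 3))
```

Under these assumptions SRAM1 needs 160 words of 32 bits, and the left-neighbor region starts at word 96.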
The one-dimensional loop filter shown in Fig. 4 consists of three parts: (1) a parallel-in, parallel-out one-dimensional edge filter unit, (2) a 4×4 block buffer unit, and (3) a configurable 4×4 block transposition unit.
The input of the parallel-in, parallel-out one-dimensional edge filter unit is the 8 samples participating in filtering, carried as two 32-bit words P and Q; its outputs are the filtered values P' and Q'. The filtering parameters are supplied by the control unit. The edge filter unit can also operate in a bypass state, in which the outputs P' and Q' are equal to the inputs P and Q.
Input Q of the one-dimensional edge filter unit comes from the read port of SRAM1. Data arrive on port Q one 4×4 block at a time, one 32-bit word per clock cycle.
Output Q' of the one-dimensional edge filter unit is delayed by four clock cycles in the 4×4 block buffer unit and then fed back as input P of the edge filter unit. The 4×4 block buffer unit can be implemented with sixteen 8-bit registers or four 32-bit registers.
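The feedback loop from Q' through the 4×4 block buffer to P can be modeled at word level as follows. This is a sketch under stated assumptions: each block is a list of four 4-sample words in port-Q order, the first block of a sequence has no valid P, and edge_filter is a hypothetical function standing in for the real edge filter. The toy_filter used here only averages the two samples next to the edge and is NOT the H.264 filter.

```python
from collections import deque


def edge_pipeline(blocks, bypass, edge_filter):
    """Word-level model of the Q -> Q' -> 4-cycle buffer -> P loop.

    blocks: 4x4 blocks, each a list of four 4-sample words, in port-Q order.
    bypass: one flag per block; True puts the edge filter unit in bypass.
    edge_filter: hypothetical (p_word, q_word) -> (p_word', q_word').
    Returns the stream of P' words sent on to the transposition unit.
    """
    delay = deque()  # 4x4 block buffer: the Q' words of the previous block
    out = []
    for i, (block, byp) in enumerate(zip(blocks, bypass)):
        for q in block:
            if i == 0:
                # no previous block yet: nothing valid on P, Q just passes on
                delay.append(list(q))
                continue
            p = delay.popleft()  # the Q' word produced four cycles earlier
            if byp:
                p_new, q_new = p, list(q)
            else:
                p_new, q_new = edge_filter(p, list(q))
            out.append(p_new)
            delay.append(q_new)
    out.extend(delay)  # flush the last block out through P'
    return out


def toy_filter(p, q):
    """Placeholder filter: for a vertical edge a P word is [p3, p2, p1, p0]
    and a Q word is [q0, q1, q2, q3]; average the two samples at the edge."""
    avg = (p[3] + q[0] + 1) // 2
    return p[:3] + [avg], [avg] + q[1:]


A = [[0] * 4 for _ in range(4)]
B = [[8] * 4 for _ in range(4)]
C = [[16] * 4 for _ in range(4)]
for word in edge_pipeline([A, B, C], [True, False, False], toy_filter):
    print(word)
```

In this model block B is filtered twice without leaving the loop: first as Q (its left edge, against A) and four cycles later as P (its right edge, against C), which is how the buffer lets consecutive edges reuse intermediate results.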
Output P' of the one-dimensional edge filter unit is sent to the configurable 4×4 block transposition unit. This unit consists of sixteen 8-bit registers and can operate in two modes: pass-through mode and transpose mode. After the transposition unit, the data of each 4×4 block are output either directly or transposed. The destination may be SRAM1, SRAM2 or the off-chip frame memory. It is determined by the macroblock to which the current output samples belong (the current, left neighboring or upper neighboring macroblock), the position of their 4×4 block within that macroblock, and the direction of the edge currently being filtered.
The present invention is further described below with reference to a specific embodiment.
The deblocking filter circuit in this embodiment first performs vertical-edge filtering, transposing each 4×4 block as the filtered results are output; after vertical-edge filtering is complete, it performs horizontal-edge filtering on the same data path. By choosing a suitable filtering order, the data dependencies between blocks are exploited effectively. The data flow of the whole deblocking filter circuit is controlled as follows:
Using the 4×4 block indices given in Fig. 5, the input sequence on port Q is as follows for the two cases, filtering vertical edges and filtering horizontal edges:
1. When vertical edges are filtered, the input sequence on port Q depends on whether the leftmost vertical edge is filtered:
(1) If the leftmost vertical edge is filtered, the 4×4 blocks on input port Q are, in order:
24,0,1,2,3;
25,4,5,6,7;
26,8,9,10,11;
27,12,13,14,15;
28,16,17;
29,18,19;
30,20,21;
31,22,23
While blocks 24, 25, 26, 27, 28, 29, 30 and 31 are being input, the one-dimensional edge filter unit is in the bypass state; while any other block is being input, it is in the filtering state.
(2) If the leftmost vertical edge is not filtered, the 4×4 blocks on input port Q are, in order:
0,1,2,3;
4,5,6,7;
8,9,10,11;
12,13,14,15;
16,17;
18,19;
20,21;
22,23
While blocks 0, 4, 8, 12, 16, 18, 20 and 22 are being input, the one-dimensional edge filter unit is in the bypass state; while any other block is being input, it is in the filtering state.
2. When horizontal edges are filtered, the input sequence on port Q depends on whether the top horizontal edge is filtered:
(1) If the top horizontal edge is filtered, the 4×4 blocks on input port Q are, in order:
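The two vertical-edge sequences above follow one rule: one group per row of 4×4 blocks (luma rows first, then the chroma rows), optionally led by the matching left-neighbor block, with the first block of every group bypassed because it has no valid partner on P. The generator below is a sketch that reproduces the listed sequences; the function name is hypothetical.

```python
def vertical_sequence(filter_left_edge):
    """Port-Q block order and bypass flags for vertical-edge filtering.

    Block indices follow Fig. 5: 0-15 luma, 16-23 chroma,
    24-31 the right 4 columns of the left neighboring macroblock.
    """
    groups = []
    for r in range(4):  # one group per row of luma 4x4 blocks
        lead = [24 + r] if filter_left_edge else []
        groups.append(lead + [4 * r + c for c in range(4)])
    for k in range(4):  # one group per row of chroma 4x4 blocks
        lead = [28 + k] if filter_left_edge else []
        groups.append(lead + [16 + 2 * k, 17 + 2 * k])
    order, bypass = [], []
    for group in groups:
        for n, blk in enumerate(group):
            order.append(blk)
            # the first block of each group has no valid left partner on P
            bypass.append(n == 0)
    return order, bypass


order, bypass = vertical_sequence(True)
print(order)
print([b for b, f in zip(order, bypass) if f])
```

With the leftmost edge filtered, 32 blocks pass port Q, i.e. 128 clock cycles at one 32-bit word per cycle; without it, 24 blocks (96 cycles).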
32,0,4,8,12;
33,1,5,9,13;
34,2,6,10,14;
35,3,7,11,15;
36,16,18;
37,17,19;
38,20,22;
39,21,23
While blocks 32, 33, 34, 35, 36, 37, 38 and 39 are being input, the one-dimensional edge filter unit is in the bypass state; while any other block is being input, it is in the filtering state.
(2) If the top horizontal edge is not filtered, the 4×4 blocks on input port Q are, in order:
0,4,8,12;
1,5,9,13;
2,6,10,14;
3,7,11,15;
16,18;
17,19;
20,22;
21,23
While blocks 0, 1, 2, 3, 16, 17, 20 and 21 are being input, the one-dimensional edge filter unit is in the bypass state; while any other block is being input, it is in the filtering state.
The configurable 4×4 block transposition unit can operate in two modes: pass-through mode and transpose mode. In either mode there is a delay of 4 clock cycles between the input and the output of the unit, and the output-valid signal is generated as shown in Fig. 7.
Data in the register array of the transposition unit can be shifted in one of two directions: horizontally or vertically. Every four clock cycles, the shift direction is either kept or changed. If the direction is the same in two consecutive 4-cycle periods, the circuit operates in pass-through mode; if it differs, the circuit operates in transpose mode.
Fig. 8 shows the control circuit for the shift direction of the register array in the transposition unit. Input signal DIN_VALID is the input-valid signal of the configurable 4×4 block transposition unit, and TP_CTRL is the transposition enable signal for the current 4×4 block of input data. Output DIR_CTRL is the shift-direction control signal of the register array.
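The horizontal-edge sequences follow the same rule as the vertical ones, with columns of 4×4 blocks in place of rows: one group per column, optionally led by the matching upper-neighbor block, and the first block of each group bypassed. The generator below is a sketch that reproduces the listed sequences; the function name is hypothetical.

```python
def horizontal_sequence(filter_top_edge):
    """Port-Q block order and bypass flags for horizontal-edge filtering.

    Block indices follow Fig. 5: 0-15 luma, 16-19 and 20-23 the two chroma
    2x2 groups, 32-39 the bottom 4 rows of the upper neighboring macroblock.
    """
    groups = []
    for c in range(4):  # one group per column of luma 4x4 blocks
        lead = [32 + c] if filter_top_edge else []
        groups.append(lead + [c + 4 * r for r in range(4)])
    for k in range(4):  # two columns of each chroma 2x2 group
        base = 16 if k < 2 else 20
        col = k % 2
        lead = [36 + k] if filter_top_edge else []
        groups.append(lead + [base + col, base + col + 2])
    order, bypass = [], []
    for group in groups:
        for n, blk in enumerate(group):
            order.append(blk)
            # the first block of each group has no valid upper partner on P
            bypass.append(n == 0)
    return order, bypass


order, bypass = horizontal_sequence(False)
print(order)
print([b for b, f in zip(order, bypass) if f])
```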
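The direction-switching principle can be checked with a cycle-level model of a 4×4 register array. In this sketch, "row shift" moves rows up and lets the top row leave, and "column shift" moves columns left and lets the left column leave; a block written in one direction and read out in the other comes out transposed, while keeping the direction gives a plain 4-cycle delay. The reading of Fig. 8 used here (DIR_CTRL toggles at a block boundary when TP_CTRL was set for the block just written, DIN_VALID assumed always high) is an assumption, as are the 0/1 encoding and the function name.

```python
def transpose_unit(blocks, tp_flags):
    """Cycle-level model of a 4x4 register-array transposer.

    blocks: 4x4 blocks (lists of four 4-sample words), one word per cycle.
    tp_flags: assumed TP_CTRL per block; 1 = output that block transposed.
    Returns the output blocks, each delayed by one block (4 cycles).
    """
    arr = [[0] * 4 for _ in range(4)]
    direction = 0  # assumed DIR_CTRL encoding: 0 = row shift, 1 = column shift
    dummy = [[0] * 4 for _ in range(4)]  # pushes the last real block out
    outputs = []
    for idx, (block, tp) in enumerate(list(zip(blocks, tp_flags)) + [(dummy, 0)]):
        out_block = []
        for word in block:
            if direction == 0:
                out_block.append(arr[0][:])       # top row leaves
                arr = arr[1:] + [list(word)]      # rows move up, word enters at bottom
            else:
                out_block.append([row[0] for row in arr])   # left column leaves
                arr = [row[1:] + [word[r]] for r, row in enumerate(arr)]
        if idx > 0:  # the first 4 cycles only flush the initial register contents
            outputs.append(out_block)
        # block idx leaves during the next 4 cycles; flipping the direction
        # at this boundary makes it come out transposed
        direction ^= tp
    return outputs


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for block in transpose_unit([A, A], [1, 0]):
    print(block)
```

Because new input keeps entering while the previous block is shifted out, a single 16-register array handles both modes without double buffering, which is consistent with the unit's fixed 4-cycle latency.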
The operating mode of the circuit is determined by the macroblock to which the current output samples P' belong (the current, left neighboring or upper neighboring macroblock), their position within that macroblock, and the direction of the edge currently being filtered.
Normal deblocking involves three parts of data: the current macroblock data, the right four columns of the left neighboring macroblock, and the bottom four rows of the upper neighboring macroblock. If the leftmost vertical edge or the top horizontal edge is not filtered, the left neighboring or upper neighboring macroblock data, respectively, are not needed.
The right 4 columns of the left neighboring macroblock take part only in vertical-edge filtering and are saved to SRAM2 or to the off-chip frame memory when filtering is finished. For the portion of these data lying in the bottom 4 rows of the macroblock (blocks 27, 29 and 31 in Fig. 4): if the current macroblock has a lower neighboring macroblock, these data are output to SRAM2 with transposition enabled; otherwise they are output to the off-chip frame memory with transposition disabled. The remaining data of the left neighboring macroblock are output to the off-chip frame memory with transposition disabled.
The bottom 4 rows of the upper neighboring macroblock take part only in horizontal-edge filtering and are saved to the off-chip frame memory when filtering is finished. Transposition is enabled when these data are output.
For the current macroblock, the intermediate data produced by vertical-edge filtering are written back to their original locations in SRAM1, always with transposition enabled. The 4 adjacent samples packed into each 32-bit word are now a different combination, but each 4×4 block still occupies the same four 32-bit words in SRAM1.
The data obtained after horizontal-edge filtering fall into four parts. The first part is the blocks lying in both the bottom 4 rows and the right 4 columns of the current macroblock (blocks 15, 19 and 23 in Fig. 4). The second part is the blocks in the right 4 columns but not in the bottom 4 rows (blocks 3, 7, 11, 17 and 21 in Fig. 4). The third part is the blocks in the bottom 4 rows but not in the right 4 columns (blocks 12, 13, 14, 18 and 22 in Fig. 4). The fourth part is all other data of the current macroblock.
For the first part: if the current macroblock has a right neighboring macroblock, these data are output to SRAM1 with transposition enabled. If it has no right neighbor but has a lower neighbor, they are output to SRAM2 with transposition disabled. In all other cases they are output to the off-chip frame memory with transposition enabled.
For the second part: if the current macroblock has a right neighboring macroblock, these data are output to SRAM1; otherwise they are output to the off-chip frame memory. Transposition is always enabled when these data are output.
For the third part: if the current macroblock has a lower neighboring macroblock, these data are output to SRAM2 with transposition disabled; otherwise they are output to the off-chip frame memory with transposition enabled.
The fourth part is output to the off-chip frame memory with transposition enabled.
After horizontal-edge filtering is finished, the current-macroblock data returned to SRAM1 are not written back to their original locations but into the space that previously held the left neighboring macroblock data (the locations of blocks 24 to 31 in Fig. 5).
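The output rules above can be collected into a single decision function returning the destination and the transposition setting for each block leaving on P'. This is a sketch: the function name and the string encoding of destinations are hypothetical, and the mapping of right-column blocks into slots 24 to 31 (3→24, 7→25, 11→26, 15→27, 17→28, 19→29, 21→30, 23→31) is an assumption chosen to match the pairing of left-neighbor blocks with block rows in the vertical input sequence. Read this way, the rules keep one invariant: blocks sent to the off-chip frame memory always end up row-major, while SRAM2 holds column-ordered words ready for horizontal filtering.

```python
LEFT_BOTTOM = {27, 29, 31}                # left neighbor, bottom 4 rows
CUR_RIGHT_BOTTOM = {15, 19, 23}           # first part
CUR_RIGHT_ONLY = {3, 7, 11, 17, 21}       # second part
CUR_BOTTOM_ONLY = {12, 13, 14, 18, 22}    # third part
# Assumed relocation of right-column blocks into the left-neighbor area.
TO_LEFT_SLOT = {3: 24, 7: 25, 11: 26, 15: 27, 17: 28, 19: 29, 21: 30, 23: 31}


def route(block, phase, has_right, has_below):
    """Return (destination, transpose_enabled) for a block leaving on P'.

    phase: "vertical" or "horizontal", the edge direction just filtered.
    destination: "SRAM1[n]", "SRAM2" or "frame" (off-chip frame memory).
    """
    if 24 <= block <= 31:  # left neighbor: vertical filtering only
        if block in LEFT_BOTTOM and has_below:
            return "SRAM2", True
        return "frame", False
    if 32 <= block <= 39:  # upper neighbor: horizontal filtering only
        return "frame", True
    if phase == "vertical":  # current macroblock, intermediate result
        return f"SRAM1[{block}]", True
    if block in CUR_RIGHT_BOTTOM:
        if has_right:
            return f"SRAM1[{TO_LEFT_SLOT[block]}]", True
        if has_below:
            return "SRAM2", False
        return "frame", True
    if block in CUR_RIGHT_ONLY:
        dest = f"SRAM1[{TO_LEFT_SLOT[block]}]" if has_right else "frame"
        return dest, True
    if block in CUR_BOTTOM_ONLY and has_below:
        return "SRAM2", False
    return "frame", True  # third part without lower neighbor, or fourth part


print(route(15, "horizontal", has_right=True, has_below=True))
```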
Fig. 9 shows the detailed flow of deblocking filtering using the efficient deblocking circuit proposed by the present invention.
In summary, the invention discloses a hardware implementation of the deblocking filter of the H.264/AVC video coding standard. It consists of one dual-port SRAM, a one-dimensional loop filter unit and a control unit, where the one-dimensional loop filter unit consists of a parallel-in, parallel-out one-dimensional edge filter unit, a 4×4 block buffer unit and a configurable 4×4 block transposition unit. It makes effective use of the data dependencies both within a macroblock and between horizontally adjacent macroblocks, and can therefore significantly increase the speed of deblocking filtering. Because vertical-edge and horizontal-edge filtering share the same data path, and the circuit is simple and easy to control, the whole deblocking filter circuit requires a relatively small number of gates. The present invention is applicable to hardware implementations of video codecs in the computer and microelectronics fields.