Summary of the invention
The technical problem solved by the present invention is to time-division multiplex a general-purpose framework across motion estimation, video scaling and spatial/temporal filtering, reducing the complexity of the chip's computation and storage logic while also saving chip area and power consumption.
To solve this problem, the present invention provides a reusable pixel processing method in which the same image processor performs pixel processing of at least two processing types. The method comprises the following steps:
Determining the processing type to be applied to the current frame, wherein the image processor is configured with corresponding instructions for each processing type;
Obtaining the current frame by the image processor;
Using the instructions configured for the determined processing type of the current frame, performing the processing of each execution cycle in turn on each macroblock to be encoded of the current frame.
Optionally, the processing types comprise one or more of motion estimation, video scaling, spatial filtering and temporal filtering.
Optionally, the execution cycles are divided into a first execution cycle and a second execution cycle, and the corresponding instructions configured for each processing type comprise:
When the processing type is motion estimation, the instruction of the first execution cycle is to perform interpolation of order n or lower on the reference macroblock in the reference frame, n being a positive integer, and the instruction of the second execution cycle is to compute the difference between the macroblock to be encoded and the interpolated reference macroblock;
When the processing type is video scaling, the instruction of the first execution cycle is to perform interpolation of order n or lower on the macroblock to be encoded, n being a positive integer, and the instruction of the second execution cycle is idle;
When the processing type is spatial filtering, the instruction of the first execution cycle is to perform interpolation of order n or lower on the macroblock to be encoded, n being a positive integer, and the instruction of the second execution cycle is idle;
When the processing type is temporal filtering, the instruction of the first execution cycle is idle, and the instruction of the second execution cycle is to compute the difference between the macroblock to be encoded and the reference macroblock in the reference frame.
Optionally, the corresponding instructions configured for each processing type further comprise the coefficients of the interpolation formula of order n or lower used in the first-cycle instruction;
The interpolation formula is: $P = \mathrm{round}\left[(\alpha_1 A_1 + \alpha_2 A_2 + \dots + \alpha_{n-1} A_{n-1} + \alpha_n A_n)/2^n\right]$, where P is the pixel obtained after interpolation, the round() function returns the value rounded to the nearest integer, $A_1$ to $A_n$ are n adjacent known pixels in the same row or the same column, and $\alpha_1$ to $\alpha_n$ are the coefficients of the interpolation formula.
Optionally, n is less than or equal to 6.
Optionally, when the instruction of the first execution cycle is to perform sixth-order interpolation on the reference macroblock, or on the macroblock to be encoded in the current frame, the coefficients of the interpolation formula are [1, -5, 20, 20, -5, 1].
Optionally, when the instruction of the first execution cycle is idle, the coefficient of the interpolation formula is [1].
Optionally, the execution cycles are divided into a first execution cycle and a second execution cycle, and performing the processing of each execution cycle in turn on each macroblock to be encoded of the current frame comprises: processing the macroblocks to be encoded of the current frame in turn in a pipelined manner.
Optionally, processing each macroblock to be encoded in a pipelined manner comprises: while the second-cycle instruction corresponding to the processing type is applied to the current macroblock to be encoded, applying the first-cycle instruction corresponding to the processing type to the next macroblock to be encoded.
Optionally, the reference macroblock and the macroblock to be encoded are based on the H.264 standard.
Optionally, the reference macroblock and the macroblock to be encoded are at least 4 × 4 pixels in size.
Optionally, the execution cycles are divided into a first execution cycle and a second execution cycle, and performing the processing of each execution cycle on each macroblock to be encoded of the current frame comprises:
Performing interpolation of order n or lower on the reference macroblock in the reference frame in the first execution cycle, n being a positive integer, and computing the difference between the macroblock to be encoded and the interpolated reference macroblock in the second execution cycle;
Or performing interpolation of order n or lower on the macroblock to be encoded in the first execution cycle, n being a positive integer, and idling in the second execution cycle;
Or idling in the first execution cycle, and computing the difference between the macroblock to be encoded and the reference macroblock in the reference frame in the second execution cycle.
The present invention also provides a reusable video processing chip, comprising:
A determining unit, configured to determine the processing type to be applied to the current frame;
An acquiring unit, configured to obtain the current frame;
A register array, configured to temporarily store the reference macroblock in the reference frame and the macroblock to be encoded in the current frame that are required for the current processing;
A clock control unit, configured to provide the execution cycles;
An arithmetic unit array, configured to perform the corresponding operation on the reference macroblock or the macroblock to be encoded in the register array;
A controller, connected to the determining unit, the register array, the arithmetic unit array and the clock control unit, and configured to read the reference macroblock and the macroblock to be encoded required for the current processing and, according to the instructions corresponding to the processing type of the current frame, to control the arithmetic unit array to perform the corresponding operation.
Optionally, the chip further comprises an instruction configuration unit connected to the controller and adapted to pre-configure the corresponding instructions for each processing type.
Optionally, the processing types comprise one or more of motion estimation, video scaling, spatial filtering and temporal filtering.
Optionally, the instruction configuration unit further comprises a coefficient configuration unit, adapted to configure the coefficients of the interpolation formula of order n or lower used in the first-cycle instruction.
Optionally, the arithmetic unit array comprises a first execution cycle array and a second execution cycle array.
Optionally, the first execution cycle array comprises a filter array, and the second execution cycle array comprises a subtracter array.
Optionally, the controller further comprises a timing control unit for processing the macroblocks to be encoded of the current frame in turn in a pipelined manner according to the execution cycles, so that while the arithmetic unit array applies the corresponding second-cycle instruction to the current macroblock to be encoded, it applies the corresponding first-cycle instruction to the next macroblock to be encoded.
Optionally, the reference macroblock and the macroblock to be encoded in the register array are based on the H.264 standard.
Optionally, when the reference macroblock or the macroblock to be encoded is a × a pixels in size, a being a positive integer, the register array is at least (a+6) × (a+6) pixels in size.
Optionally, the reference macroblock or the macroblock to be encoded is at least 4 × 4 pixels in size, and the register array is at least 10 × 10 pixels in size.
Compared with the prior art, the technical solution of the present invention has the following advantages:
1. By analyzing and summarizing the similarity and compatibility of the motion estimation, video scaling and spatial/temporal filtering algorithms, the present invention adopts a reusable design: the registers and arithmetic units are configured in different ways so that operations that would otherwise require different hardware units can be performed by a single functional unit through time-division multiplexing. This effectively reduces the number of hardware units and the complexity of the chip, while saving chip area and power consumption.
2. In an optional solution, an implementation based on the H.264 standard is given, to meet the requirements of the international video coding standard that is currently most widely used.
3. In an optional solution, by configuring the coefficients of the interpolation formula, an idle first execution cycle is made equivalent to a first-order interpolation performed on the current pixel itself, which unifies the operations and simplifies the hardware logic.
4. In an optional solution, the macroblocks to be encoded of the current frame are processed in a pipelined manner, which improves processing efficiency.
Detailed Description of the Embodiments
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the invention. The present invention is therefore not limited by the specific embodiments disclosed below.
Furthermore, the present invention is described in detail with the aid of schematic diagrams. When the embodiments are described in detail, the schematic diagrams serve only as examples for ease of illustration and should not limit the scope of protection of the present invention.
To solve the technical problem described in the background, the inventors analyzed the video coding/decoding and image processing techniques involved.
The core algorithm of motion estimation has two parts: sub-pixel interpolation, and computing the residual between a macroblock of the current frame and the reference frame. The core algorithm of video scaling applies two filtering passes, one vertical and one horizontal, to each pixel. The core algorithm of spatial filtering likewise applies a vertical and a horizontal filtering pass to each pixel of the image, with the horizontal pass taking the result of the vertical pass as its input. The core algorithm of temporal filtering computes the difference between the current macroblock and the predicted macroblock.
From this analysis it follows that the residual computation of motion estimation and temporal filtering both involve taking differences between pixels, while the sub-pixel interpolation of motion estimation, video scaling and spatial filtering all involve interpolation. The core algorithms of these techniques can therefore be summarized uniformly as two steps: interpolation and difference.
Based on this analysis, the present invention provides a reusable pixel processing method. Fig. 1 is a flow chart of an embodiment of the reusable pixel processing method of the present invention, which comprises at least the following steps:
Step S10 is performed to determine the processing type to be applied to the current frame, the processing type corresponding to the instructions used to process the current frame. Specifically, the processing type is one of motion estimation, video scaling, spatial filtering and temporal filtering. Each processing type corresponds to different instructions for processing the current frame.
It should be noted that the instructions may be preset and fixed in the controller, or configured according to the processing types. When configuration according to processing type is allowed, step S00 may be performed before step S10 to configure the corresponding instructions for the different processing types.
Specifically, to fully support the four kinds of operations, each operation is completed in two execution cycles: interpolation is performed in the first cycle and difference in the second cycle. If an operation needs only interpolation or only difference, then either interpolation is performed in the first cycle and the second cycle is idle, or the first cycle is idle and difference is performed in the second cycle. Configuring the instructions in this way unifies the operations and effectively reduces the arithmetic logic.
This embodiment can handle four processing types (motion estimation, video scaling, spatial filtering and temporal filtering), so the correspondingly configured instructions comprise:
When the processing type is motion estimation, the instruction of the first execution cycle is to perform interpolation of order n or lower on the reference macroblock, and the instruction of the second execution cycle is to compute the difference between the macroblock to be encoded and the interpolated reference macroblock;
When the processing type is video scaling, the instruction of the first execution cycle is to perform interpolation of order n or lower on the macroblock to be encoded of the current frame, and the instruction of the second execution cycle is idle;
When the processing type is spatial filtering, the instruction of the first execution cycle is to perform interpolation of order n or lower on the macroblock to be encoded of the current frame, and the instruction of the second execution cycle is idle;
When the processing type is temporal filtering, the instruction of the first execution cycle is idle, and the instruction of the second execution cycle is to compute the difference between the macroblock to be encoded and the reference macroblock in the reference frame.
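The instruction configuration listed above can be pictured as a small lookup table. The following is a minimal sketch only: the names `Op`, `INSTRUCTIONS` and `instructions_for` are hypothetical and do not come from the specification, and the table records only which kind of operation each cycle performs, not its operands.

```python
from enum import Enum


class Op(Enum):
    """The three kinds of per-cycle instruction described in the text."""
    INTERPOLATE = "interpolate"
    DIFFERENCE = "difference"
    IDLE = "idle"


# Hypothetical table: processing type -> (first-cycle, second-cycle) instruction.
INSTRUCTIONS = {
    "motion_estimation": (Op.INTERPOLATE, Op.DIFFERENCE),
    "video_scaling": (Op.INTERPOLATE, Op.IDLE),
    "spatial_filtering": (Op.INTERPOLATE, Op.IDLE),
    "temporal_filtering": (Op.IDLE, Op.DIFFERENCE),
}


def instructions_for(process_type):
    """Return the instruction pair configured for a processing type."""
    try:
        return INSTRUCTIONS[process_type]
    except KeyError:
        raise ValueError(f"no instructions configured for {process_type!r}")
```

Because every processing type maps to the same two-slot shape, a controller built along these lines would not need a separate data path per type; only the table entry changes.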
The corresponding instructions configured for each processing type further comprise the coefficients of the interpolation formula of order n or lower used in the first-cycle interpolation instruction. The interpolation formula is: $P = \mathrm{round}\left[(\alpha_1 A_1 + \alpha_2 A_2 + \dots + \alpha_{n-1} A_{n-1} + \alpha_n A_n)/2^n\right]$, where P is the pixel obtained after interpolation, the round() function returns the value rounded to the nearest integer, $A_1$ to $A_n$ are n adjacent known pixels in the same row or the same column, and $\alpha_1$ to $\alpha_n$ are the coefficients of the interpolation formula.
For example, to perform sixth-order interpolation according to the H.264 standard, the coefficients of the sixth-order interpolation formula are configured as [1, -5, 20, 20, -5, 1]. As another example, fourth-order interpolation has only 4 coefficients, so the coefficients of the 1st and 6th pixels can be set to 0 and only the 4 remaining coefficients are configured. As a further example, when the first execution cycle is idle, it can be regarded as a first-order interpolation performed on the macroblock to be encoded itself: the coefficient of the known pixel itself is set to 1, and the coefficients of the remaining pixels are set to 0.
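The coefficient configurations described above can be tried out with a short sketch. Several points here are assumptions rather than statements from the specification: the function name `interpolate`, the position of the pass-through coefficient, and the choice of divisor. The divisor is taken as the sum of the coefficients, which gives 32 for [1, -5, 20, 20, -5, 1] and 1 for the pass-through case.

```python
SIX_TAP = [1, -5, 20, 20, -5, 1]      # sixth-order coefficients from the text
PASS_THROUGH = [0, 0, 1, 0, 0, 0]     # idle cycle; position of the 1 is an assumption


def interpolate(pixels, coeffs):
    """Weighted sum of adjacent pixels, normalised and rounded half up.

    Assumption: the divisor is the sum of the coefficients.
    """
    if len(pixels) != len(coeffs):
        raise ValueError("need exactly one coefficient per pixel")
    divisor = sum(coeffs)
    if divisor <= 0:
        raise ValueError("coefficients must sum to a positive value")
    acc = sum(c * p for c, p in zip(coeffs, pixels))
    # Integer form of floor(acc / divisor + 1/2), i.e. round half up.
    return (2 * acc + divisor) // (2 * divisor)


# Half-pixel value between the 3rd and 4th pixel of a linear ramp.
half_pel = interpolate([10, 20, 30, 40, 50, 60], SIX_TAP)   # 35
# With the pass-through coefficients the selected pixel is returned unchanged.
unchanged = interpolate([1, 2, 3, 4, 5, 6], PASS_THROUGH)   # 3
```

Keeping six coefficient slots for every case, with unused slots set to 0, mirrors the idea in the text that lower-order interpolation and the idle cycle reuse the same filter structure.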
With reference to Fig. 1, step S20 is then performed to obtain the current frame.
Specifically, for motion estimation the result of the operation is the residual between the reference macroblock in the reference frame and the macroblock to be encoded in the current frame. Likewise, temporal filtering computes the difference between two macroblocks, one in the reference frame and one in the current frame. For video scaling and spatial filtering, the result is obtained by interpolating a single macroblock of the current frame, and the difference cycle is idle. The two differ in that video scaling comprises two interpolation passes, horizontal and vertical, whereas each filtering pass first computes the differences between the pixel and several vertically and horizontally adjacent pixels, applies an operation and a table lookup to these differences, and adds the result to the original pixel to obtain the new pixel. The horizontal filtering pass takes the result of the vertical filtering pass as its input.
It should be noted that this embodiment is based on the H.264 standard, so the reference frame, the current frame and the macroblocks they contain are likewise based on the H.264 standard. Those skilled in the art will understand that, compared with other existing video coding standards, H.264 has a higher data compression ratio and can provide better image quality at the same bandwidth. Correspondingly, the computational complexity of H.264 coding is higher than that of other existing video coding standards. Therefore, since it satisfies the H.264 standard, this embodiment is naturally also compatible with other existing digital video coding standards, such as H.263, MPEG-4 and AVS.
With reference to Fig. 1, step S30 is then performed: using the instructions configured for the determined processing type of the current frame, the processing of each execution cycle is performed in turn on each macroblock to be encoded of the current frame.
Specifically, in the first execution cycle, the instruction corresponding to the processing type either interpolates the macroblock to be encoded in the current frame, or interpolates the reference macroblock in the reference frame, or idles. In the second execution cycle, the instruction corresponding to the processing type either computes the difference between the macroblock to be encoded and the interpolated reference macroblock, or computes the difference between the macroblock to be encoded and the reference macroblock in the reference frame, or idles. These two steps are repeated until every macroblock to be encoded in the current frame has been processed.
Further, for motion estimation, interpolation of order n or lower is performed on the reference macroblock in the reference frame in the first execution cycle, and the difference between the macroblock to be encoded and the interpolated reference macroblock is computed in the second execution cycle. For video scaling or spatial filtering, interpolation of order n or lower is performed on the macroblock to be encoded in the first execution cycle, and the second execution cycle is idle. For temporal filtering, the first execution cycle is idle, and the difference between the macroblock to be encoded and the reference macroblock in the reference frame is computed in the second execution cycle.
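The two-cycle processing of a single macroblock described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in the specification: macroblocks are modelled as flat lists of pixel values, the interpolation step is passed in as a caller-supplied function, and the names `difference` and `run_two_cycles` are hypothetical.

```python
def difference(block_a, block_b):
    """Second-cycle operation: pixel-wise difference of two equal-sized blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    return [a - b for a, b in zip(block_a, block_b)]


def run_two_cycles(process_type, current, reference, interpolate):
    """Apply the first-cycle and second-cycle instructions to one macroblock.

    `interpolate` is a caller-supplied function (an assumption of this sketch)
    that maps a block to its interpolated block.
    """
    if process_type == "motion_estimation":
        interpolated_ref = interpolate(reference)      # cycle 1: interpolate reference
        return difference(current, interpolated_ref)   # cycle 2: residual
    if process_type in ("video_scaling", "spatial_filtering"):
        return interpolate(current)                    # cycle 1; cycle 2 is idle
    if process_type == "temporal_filtering":
        return difference(current, reference)          # cycle 1 is idle
    raise ValueError(f"unsupported processing type: {process_type!r}")
```

For example, with a toy `interpolate` that halves every pixel, `run_two_cycles("motion_estimation", [10, 20], [4, 8], lambda b: [p // 2 for p in b])` yields `[8, 16]`.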
It should be noted that this embodiment uses pipelined operation: while the second-cycle instruction is executed on the current macroblock to be encoded, the first-cycle instruction is executed on the next macroblock to be encoded. This ensures that, from the second cycle onward, every cycle produces one result.
Since the H.264 standard defines 7 macroblock partition modes (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4), the reference macroblock or macroblock to be encoded in this embodiment is at least 4 × 4 pixels in size, to meet the minimum requirement of the H.264 standard.
Specifically, the interpolation comprises interpolation of order n or lower, or idling, n being a positive integer. The H.264 standard currently supports interpolation up to the sixth order: a half-pixel value is computed from 6 adjacent pixels in the same row or the same column. Thus, to interpolate one reference macroblock of 4 × 4 pixels, at least the 10 × 10 pixels centered on those 4 × 4 pixels are needed.
It should be noted that, as those skilled in the art will understand, although the H.264 standard currently supports interpolation only up to the sixth order, this should not be taken to mean that the present invention applies only to interpolation of the sixth order or lower. In fact, the present invention aims to reduce the number of hardware units and the computational complexity through time-division multiplexing, and places no specific restriction on how the interpolation itself is implemented.
The technical solution of the present invention is further described below with reference to the drawings and specific embodiments. Fig. 2 is a working timing diagram of an embodiment of the reusable pixel processing method of the present invention. With reference to Fig. 2, the present embodiment contains 7 processes in total: 3 consecutive motion estimations, followed by 1 video scaling, 1 spatial filtering and 2 consecutive temporal filterings.
It should be noted that the present embodiment is intended to illustrate the invention clearly, so the changes of processing type are rather extreme. In practical applications, the same processing is usually performed for a period of time. For example, during motion estimation, the processing type may change only after, at a minimum, the differences between every macroblock to be encoded in the current frame and every reference macroblock in the reference frame have been computed and the motion vectors and residual data have been obtained according to the optimal matching function. The corresponding instructions are therefore usually not as complicated as in the present embodiment.
In the present embodiment, the correspondingly configured instructions are listed in Table 1 below:
No. | Processing type | First execution cycle instruction | Second execution cycle instruction
1 | Motion estimation | Interpolation | Difference
2 | Motion estimation | Interpolation | Difference
3 | Motion estimation | Interpolation | Difference
4 | Video scaling | Interpolation | Idle
5 | Spatial filtering | Interpolation | Idle
6 | Temporal filtering | Idle | Difference
7 | Temporal filtering | Idle | Difference
The specific working process is described below with reference to Fig. 2.
The first process is motion estimation, so according to the corresponding instruction, interpolation is performed on the reference macroblock in the first cycle.
The second cycle is the second execution cycle of the first process and the first execution cycle of the second process. The second-cycle instruction of the first process is therefore executed (computing the difference between the macroblock to be encoded and the interpolated reference macroblock) and the difference result is output. At the same time, the first-cycle instruction of the second process (interpolation) is executed.
The third cycle is the second execution cycle of the second process and the first execution cycle of the third process. The second-cycle instruction of the second process (difference) is therefore executed and the result is output, while the first-cycle instruction of the third process (interpolation) is executed at the same time.
The fourth cycle is the second execution cycle of the third process and the first execution cycle of the fourth process. The second-cycle instruction of the third process (difference) is therefore executed and the result is output. At the same time, the first-cycle instruction of the fourth process is executed. Because the processing type of the fourth process changes to video scaling, whose first-cycle instruction is interpolation, interpolation is performed here.
The fifth cycle is the second execution cycle of the fourth process and the first execution cycle of the fifth process. The second-cycle instruction of the fourth process (idle) is therefore executed and the result (that is, the interpolated result) is output, while the first-cycle instruction of the fifth process is executed at the same time. Because the processing type of the fifth process changes to spatial filtering, whose first-cycle instruction is interpolation, interpolation is performed here.
The sixth cycle is the second execution cycle of the fifth process and the first execution cycle of the sixth process. The second-cycle instruction of the fifth process (idle) is therefore executed and the result (that is, the interpolated result) is output, while the first-cycle instruction of the sixth process is executed at the same time. Because the processing type of the sixth process changes to temporal filtering, whose first-cycle instruction is idle, the cycle idles here.
The seventh cycle is the second execution cycle of the sixth process and the first execution cycle of the seventh process. The second-cycle instruction of the sixth process (difference) is therefore executed and the result (that is, the difference result) is output, while the first-cycle instruction of the seventh process (idle) is executed at the same time.
The eighth cycle is the second execution cycle of the seventh process, so the second-cycle instruction of the seventh process (difference) is executed and the difference result is output.
This completes the whole process; from the second cycle onward, every cycle produces one result.
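The cycle-by-cycle walk-through above can be reproduced with a small scheduling sketch. The names `INSTRUCTIONS`, `PROCESSES` and `schedule` are hypothetical, and the code models only which instruction occupies each pipeline stage in each cycle, not the pixel arithmetic.

```python
# Hypothetical table: processing type -> (first-cycle, second-cycle) instruction.
INSTRUCTIONS = {
    "motion_estimation": ("interpolate", "difference"),
    "video_scaling": ("interpolate", "idle"),
    "spatial_filtering": ("interpolate", "idle"),
    "temporal_filtering": ("idle", "difference"),
}

# The seven processes of the embodiment, in order.
PROCESSES = (
    ["motion_estimation"] * 3
    + ["video_scaling", "spatial_filtering"]
    + ["temporal_filtering"] * 2
)


def schedule(processes):
    """Return (cycle, second-stage instruction, first-stage instruction) rows.

    In cycle k the second stage serves process k-1 and the first stage serves
    process k; a stage with no process assigned is reported as None.
    """
    rows = []
    for cycle in range(1, len(processes) + 2):
        stage2 = INSTRUCTIONS[processes[cycle - 2]][1] if cycle >= 2 else None
        stage1 = INSTRUCTIONS[processes[cycle - 1]][0] if cycle <= len(processes) else None
        rows.append((cycle, stage2, stage1))
    return rows


for cycle, stage2, stage1 in schedule(PROCESSES):
    print(f"cycle {cycle}: second stage = {stage2}, first stage = {stage1}")
```

Seven processes occupy eight cycles in this model, and every cycle from the second onward has a second-stage instruction that emits a result, which matches the walk-through.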
Time-division multiplexing is achieved through the instruction configuration described above, so that four different types of operation are performed by unified arithmetic logic, which simplifies the storage and arithmetic logic.
It should be noted that, as the above description of the embodiments shows, those skilled in the art will clearly understand that the present application can be implemented partly or wholly in software together with the necessary general-purpose hardware platform. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The present invention also provides a reusable video processing chip that unifies the interpolation and difference operations by sharing registers and arithmetic units. First, a unified register array is defined to hold the pixels processed in parallel. Next, a unified arithmetic unit array is defined that can perform both difference and n-th order interpolation. Finally, for the different processing types (motion estimation, video scaling, spatial/temporal filtering), the register array and the arithmetic unit array are configured in different ways to produce the corresponding results.
Fig. 3 is a structural diagram of an embodiment of the reusable video processing chip of the present invention. With reference to Fig. 3, this embodiment comprises:
A determining unit 10, an acquiring unit 20, a register array 30, a clock control unit 40, a controller 50 and an arithmetic unit array 60.
Specifically, the determining unit 10 determines the processing type to be applied to the current frame, the processing type corresponding to the instructions used to process the current frame; the acquiring unit 20 obtains the current frame; the register array 30 temporarily stores the reference macroblock in the reference frame and the macroblock to be encoded in the current frame that are required for the current processing; the clock control unit 40 provides the execution cycles; the controller 50 reads the reference macroblock and the macroblock to be encoded required for the current processing, obtains the instructions corresponding to the processing type of the current frame, and controls the arithmetic unit array 60 to perform the corresponding operation; the arithmetic unit array 60 performs the corresponding operation on the reference macroblock or the macroblock to be encoded in the register array 30, according to the instructions of the controller 50 and the execution cycles.
This embodiment may also comprise an instruction configuration unit 00, which configures the instructions corresponding to each processing type.
This embodiment supports one or more of the processing types motion estimation, video scaling, spatial filtering and temporal filtering.
Specifically, the instructions corresponding to the processing types comprise:
When the processing type is motion estimation, the instruction of the first execution cycle is to perform interpolation of order n or lower on the reference macroblock, n being a positive integer, and the instruction of the second execution cycle is to compute the difference between the macroblock to be encoded and the interpolated reference macroblock;
When the processing type is video scaling, the instruction of the first execution cycle is to perform interpolation of order n or lower on the macroblock to be encoded, n being a positive integer, and the instruction of the second execution cycle is idle;
When the processing type is spatial filtering, the instruction of the first execution cycle is to perform interpolation of order n or lower on the macroblock to be encoded, n being a positive integer, and the instruction of the second execution cycle is idle;
When the processing type is temporal filtering, the instruction of the first execution cycle is idle, and the instruction of the second execution cycle is to compute the difference between the macroblock to be encoded and the reference macroblock in the reference frame.
The above instructions may be preset and fixed in the controller 50, or configured according to the processing types. When configuration according to processing type is allowed, the instruction configuration unit 00 configures the instructions corresponding to each processing type. For example, if a given computing environment only requires motion estimation and video scaling, the instruction configuration unit 00 only needs to configure the instructions corresponding to motion estimation and video scaling.
The instruction configuration unit 00 may also comprise a coefficient configuration unit 01, which configures the coefficients of the interpolation formula of order n or lower used in the first-cycle interpolation instruction.
The H.264 standard currently supports interpolation up to the sixth order: a half-pixel value is computed from 6 adjacent pixels in the same row or the same column. The formula for H.264-based sixth-order interpolation can be P = round[(A - 5B + 20C + 20D - 5E + F)/32], and the coefficient configuration unit 01 configures the coefficients of the interpolation formula as [1, -5, 20, 20, -5, 1]. As another example, fourth-order interpolation has only 4 coefficients, so the coefficient configuration unit 01 sets the coefficients of the 1st and 6th pixels to 0 and configures only the 4 remaining coefficients. As a further example, when the first execution cycle is idle, it can be regarded as a first-order interpolation performed on the macroblock to be encoded itself: the coefficient configuration unit 01 only needs to set the coefficient of the known pixel itself to 1 and the coefficients of the remaining pixels to 0.
Specifically, the controller 50 may also comprise a timing control unit 51, which processes the macroblocks to be encoded of the current frame one after another in a pipelined manner according to the execution cycles. That is, while the arithmetic unit array 60 applies the corresponding second-execution-cycle instruction to the current macroblock to be encoded, it applies the corresponding first-execution-cycle instruction to the next macroblock to be encoded, so that from the second cycle onward every cycle outputs one result.
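The two-stage pipelining described above can be sketched as a schedule listing which macroblock each stage handles in each cycle. The function name `pipeline_schedule` and the zero-based macroblock indices are assumptions made for illustration only.

```python
def pipeline_schedule(num_macroblocks):
    """Return a list of (cycle, first_stage_mb, second_stage_mb) tuples.

    Cycles are numbered from 1; macroblocks are indexed from 0.
    None means the stage has nothing to process in that cycle.
    """
    schedule = []
    for cycle in range(1, num_macroblocks + 2):
        # First execution cycle works on macroblock (cycle - 1) while it exists.
        first = cycle - 1 if cycle <= num_macroblocks else None
        # Second execution cycle works on the macroblock the first stage
        # finished in the previous cycle.
        second = cycle - 2 if cycle >= 2 else None
        schedule.append((cycle, first, second))
    return schedule


for cycle, first, second in pipeline_schedule(3):
    print(f"cycle {cycle}: first stage -> {first}, second stage -> {second}")
```

In this model, N macroblocks finish in N + 1 cycles instead of 2N, and a result leaves the second stage in every cycle from the second cycle onward.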
The configuration of the register array 30 depends at least on whether the operation concerned is interpolation or difference calculation, on the highest interpolation order n, and on how many pixels are processed simultaneously. Specifically, if only 1 pixel is processed at a time and the operation is difference calculation, the register array 30 only needs to hold the reference macroblock and the macroblock to be encoded; when the operation is interpolation, the register array 30 needs to hold, according to the highest interpolation order n, a region of at least (a+n) × (a+n) pixels centered on the macroblock (for example, a macroblock of a × a pixels). If m pixels are processed simultaneously to speed up processing, the above size must additionally be multiplied by the number m of pixels processed in parallel.
In the H.264 standard, macroblocks are defined in 7 modes in total: 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4, so the reference macroblock or macroblock to be encoded in this embodiment is at least 4 × 4 pixels in size, to meet the minimum requirement of the H.264 standard. Meanwhile, the H.264 standard currently supports interpolation up to the sixth order, in which a half-pixel sample is calculated from the 6 adjacent pixels in the same row or the same column. Therefore, when the reference macroblock or macroblock to be encoded is a × a pixels in size, the register array 30 is at least (a+6) × (a+6) pixels in size. On this basis, if interpolation is performed on one reference macroblock or macroblock to be encoded of 4 × 4 pixels, the register array 30 is at least 10 × 10 pixels in size, centered on these 4 × 4 pixels.
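The sizing rule for the register array 30 can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function name `register_array_pixels` is hypothetical, the difference case is modelled as two a × a blocks (reference macroblock plus macroblock to be encoded), and parallel processing of m pixels is modelled as a plain multiplication by m, as the description states.

```python
def register_array_pixels(a, n, m=1, operation="interpolation"):
    """Minimum number of pixels the register array must hold.

    a: macroblock side length in pixels
    n: highest supported interpolation order
    m: number of pixels processed in parallel
    """
    if operation == "interpolation":
        base = (a + n) * (a + n)   # (a+n) x (a+n) window centered on the macroblock
    elif operation == "difference":
        base = 2 * a * a           # reference macroblock + macroblock to be encoded
    else:
        raise ValueError(f"unknown operation: {operation}")
    return base * m


# 4 x 4 macroblock with sixth-order interpolation: a 10 x 10 window.
pixels_needed = register_array_pixels(4, 6)
```

For a = 4 and n = 6 the window is 10 × 10 = 100 pixels, which is larger than the 32 pixels needed for the difference operation, so an array sized for interpolation can also serve the difference calculation.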
The configuration of the arithmetic unit array 60 likewise depends at least on whether the operation concerned is interpolation or difference calculation, on the highest interpolation order n, and on how many pixels are processed simultaneously. First, to meet the computing requirements, the arithmetic unit array 60 comprises at least a first execution cycle array 61 and a second execution cycle array 62. Specifically, the first execution cycle array 61 comprises at least a filter array, and the second execution cycle array 62 comprises at least a subtractor array. The first execution cycle array 61 depends on the highest interpolation order n and on how many pixels are processed simultaneously. The more pixels are processed in parallel, the faster the execution, and correspondingly the higher the hardware cost. The second execution cycle array 62 depends on how many pixels are processed simultaneously.
Taking as an example motion estimation of 4x4 pixels carried out in two execution cycles, with support for interpolation up to the sixth order, a register array 30 of at least 10x10 pixels must be configured; it can hold the data needed for sixth-order interpolation of the 4x4 pixels, and can also hold the data needed for the difference calculation of the 4x4 pixels.
In addition, a filter array of 16 sixth-order filters and a subtractor array of 16 subtractors must also be configured.
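The 4x4 motion-estimation example can be modelled in software as two steps: 16 six-tap filter evaluations followed by 16 subtractions. This is a minimal sketch, not the circuit of the embodiment. The function names are hypothetical, only horizontal half-pixel interpolation is shown, the 4x4 block is assumed to occupy rows and columns 3 to 6 of the 10x10 window, and no clipping is modelled.

```python
SIX_TAP = (1, -5, 20, 20, -5, 1)


def first_cycle_interpolate(window):
    """16 six-tap filters: one horizontal half-pixel sample per block pixel.

    `window` is a 10x10 list of lists holding the reference pixels; the
    4x4 reference block is assumed to sit at rows/columns 3..6.
    """
    predicted = []
    for r in range(3, 7):
        row = []
        for c in range(3, 7):
            taps = window[r][c - 2:c + 4]             # six neighbours in the same row
            acc = sum(k * p for k, p in zip(SIX_TAP, taps))
            row.append((acc + 16) // 32)              # divide by 32, round half up
        predicted.append(row)
    return predicted


def second_cycle_difference(current, predicted):
    """16 subtractors: current 4x4 block minus interpolated reference block."""
    return [[current[r][c] - predicted[r][c] for c in range(4)]
            for r in range(4)]


window = [[50] * 10 for _ in range(10)]    # flat reference area
current = [[52] * 4 for _ in range(4)]     # macroblock to be encoded
residual = second_cycle_difference(current, first_cycle_interpolate(window))
```

With a flat reference area the interpolated block equals the reference value, so every entry of `residual` is 2. The 10x10 window is exactly large enough for the six-tap filter to reach two pixels to the left and three to the right of every block pixel.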
The above register array 30 and arithmetic unit array 60 can be shared by different operations such as motion estimation, video scaling, and spatial/temporal filtering, and support the core algorithms of these operations.
The present invention is based on a reuse design technique for arithmetic units and structures: it analyzes and extracts the similarity and compatibility of the arithmetic units and structures used by the algorithms of the modules and the system, so that a single arithmetic unit can be time-division multiplexed among multiple algorithms. This reduces the complexity of the chip's computation and storage logic; and because the number of arithmetic units is reduced, the chip can be implemented in a smaller area, thereby saving chip area and power consumption.
Further, the present invention gives a solution based on the H.264 standard, to meet the requirements of an international video coding standard that is widely used at present.
Further, by configuring the coefficients of the interpolation formula, the present invention makes an idle first execution cycle equivalent to a first-order interpolation applied to the current pixel itself, which unifies the operations and simplifies the hardware logic.
Further, the present invention can also process the macroblocks to be encoded of the current frame in a pipelined manner, improving processing efficiency.
Although the present invention is disclosed above by way of preferred embodiments, these are not intended to limit the present invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent variation or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the protection scope of the technical solution of the present invention.