Summary of the invention
To solve the technical problems in the prior art described above, the primary object of the present invention is to provide a video decoding optimization method that applies pipeline control to each of the two most time-consuming operations, the two-dimensional IDCT and the YCbCr-to-RGB format conversion. This greatly improves the efficiency of video decoding and allows real-time video decoding at a lower operating frequency of the decoding chip.
Another object of the present invention is to provide a video decoding optimization device that implements the above video decoding optimization method, in which the two-dimensional IDCT and YCbCr2RGB are integrated.
The primary object of the present invention is achieved through the following technical solution: a video decoding optimization method, characterized by comprising the following steps: Step 101, reading a data processing instruction and analyzing the algorithm formula to be used for processing the data; Step 102, according to the analysis result of Step 101, selecting the corresponding algorithm to process the data, the corresponding algorithm being the two-dimensional IDCT algorithm or the YCbCr2RGB algorithm; Step 103, executing the corresponding algorithm under pipeline control.
Preferably, Step 103 comprises the following steps: Step 113, inputting a clear command to the command unit; Step 123, inputting to the command unit the start address for data access, the address step, and the operation instruction; Step 133, according to the operation instruction input in Step 123, selecting the first state machine or the second state machine so as to enter the two-dimensional IDCT operation stage or the YCbCr2RGB operation stage, respectively, and, under pipeline control, reading the data, processing the data read, and storing the result of the IDCT operation or the YCbCr2RGB operation.
Preferably, in Step 133 the IDCT operation is performed under pipeline control as follows: first, the first state machine operates the data select signals and latch signals in the algorithm module and the memory address module; in each state of the first state machine, different values are assigned to the data select signals and latch signals in the algorithm module; data are then fetched from different addresses of the memory, and one adder and one multiplier carry out the row and column operations in turn over repeated cycles, with intermediate results buffered in the memory; during the repeated row or column cycles, when the flag bit sign of the first state machine equals 1, one one-dimensional IDCT operation is complete and the result is stored in the memory.
Preferably, in Step 133 the YCbCr2RGB operation is performed under pipeline control as follows: first, the second state machine operates the data select signals and latch signals in the algorithm module; in each state of the second state machine, different values are assigned to the data select signals and latch signals in the algorithm module; the two groups of YCbCr values to be converted are then read in from outside through the input/output register, and two adders and one multiplier perform the format conversion from YCbCr to RGB; when the flag bit sign of the second state machine equals 1, the YCbCr2RGB operation is complete and the result is stored in the input/output register.
The other object of the present invention is achieved through the following technical solution: a video decoding optimization device, characterized by comprising: a command unit for reading the data processing instruction; a kernel control module for analyzing the algorithm formula to be used and selecting the IDCT algorithm or the YCbCr2RGB algorithm to process the data, the kernel control module comprising a first state machine and a second state machine that implement the pipeline control of the two-dimensional IDCT algorithm and of the YCbCr2RGB algorithm, respectively; an input/output register and a memory, from which the first and second state machines fetch operands and in which the kernel control module stores operation results; a basic algorithm module for executing the IDCT algorithm or the YCbCr2RGB algorithm; a result selection output module for outputting the result of the basic algorithm module to the input/output register or to the memory; and a data access control module for controlling access to the data in the memory; wherein the result selection output module is connected to the basic algorithm module, the input/output register, and the memory, respectively; the command unit, the input/output register, and the basic algorithm module are each connected to the kernel control module; and the data access control module is connected between the memory and the kernel control module.
Compared with the prior art, the present invention has the following advantages: the two-dimensional IDCT and the YCbCr2RGB format conversion, the most time-consuming operations in video decoding, are integrated in one system, which runs the two-dimensional IDCT algorithm or the YCbCr2RGB algorithm according to the instruction received. Because both algorithms are executed under pipeline control, the hardware circuit needs only one multiplier and two adders. Pipeline control during algorithm execution not only greatly increases computation speed but also reduces the complexity of the hardware circuit, so that real-time video decoding and transfer is achieved at a lower frequency.
Embodiment
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but embodiments of the present invention are not limited thereto.
As shown in Figure 1, the video decoding optimization method of the present invention comprises the following steps:
Step 101: read the data processing instruction and analyze the algorithm formula to be used for processing the data. In this embodiment, the algorithm formula is that of the two-dimensional IDCT or of YCbCr2RGB.
Step 102: according to the analysis result of Step 101, select the corresponding algorithm to process the data. If the analysis result is the two-dimensional IDCT formula, the two-dimensional IDCT algorithm is selected to perform the inverse discrete cosine transform of the video signal; if the analysis result is the YCbCr2RGB formula, the YCbCr2RGB algorithm is selected to convert the video signal from YCbCr format to RGB format. Both algorithms satisfy two conditions: a high design frequency and low hardware resource usage.
Step 103: execute the corresponding algorithm under pipeline control. Pipeline control here means that the main steps of executing the algorithm are run as a pipeline: reading data from the memory RAM or the input/output register, processing the data read, and storing the result to the memory RAM or the input/output register. In this embodiment, the operation result mainly refers to the processed video data.
As shown in Figure 2, executing the corresponding algorithm under pipeline control in Step 103 comprises the following steps:
Step 113: input a clear command to the command unit in the input/output register. This mainly clears the operation instruction of the previous operation.
Step 123: input to the command unit in the input/output register the start address for data access, the address step, and the operation instruction. A non-zero operation instruction serves as the start signal.
Step 133: according to the operation instruction input in Step 123, select the first state machine or the second state machine, entering the two-dimensional IDCT operation stage or the YCbCr2RGB operation stage, respectively. Under pipeline control, carry out the main steps of the IDCT operation or the YCbCr2RGB operation: reading the data, processing the data read, and storing the result.
(1) If the first state machine is selected, the two-dimensional IDCT operation stage is entered. First, the first state machine operates the data select signals and latch signals in the algorithm module and the memory RAM address module; in each state of the first state machine, different values are assigned to the data select signals and latch signals in the algorithm module. Data are then fetched from different addresses of the memory RAM, and one adder and one multiplier carry out the row and column operations in turn over repeated cycles, with intermediate results buffered in the memory RAM. During the repeated cycles, when the flag bit sign of the first state machine equals 1, one one-dimensional IDCT operation is complete and the processed video data are stored in the memory RAM, which realizes pipeline control of a complex algorithm. Because the algorithm module is reused cyclically, only one adder and one multiplier are needed, which reduces the chip area occupied by the module.
The IDCT algorithm performed with the decoding method of the present invention, and its concrete implementation, are described below:
The IDCT algorithmic formula is as follows:
This algorithm implements the inverse discrete cosine transform by computing rows and columns in turn to complete one two-dimensional IDCT. By default, the input transformation matrix is entered row by row from top to bottom, and the IDCT algorithm has to be run twice in the DSP_VFU. A particular feature of the present invention is that it supports matrix types from 4 to 255, with the input instruction selecting which type of matrix is computed.
When the IDCT algorithm is run, the required cosine values and other constants are first stored in the memory RAM. Taking the two-dimensional IDCT of an 8 × 8 matrix as an example, the required set of cosine values is stored in RAM at a fixed address. The default MPEG2 cosine values are scaled up by a factor of 65536 for higher precision, whereas H264 uses an integer "IDCT" and its default values are scaled up by a factor of two. Entered into the memory RAM 3 row by row from top to bottom, the MPEG2 integer cosine values A1 are as follows:
The hexadecimal cosine values A2 corresponding to the integer cosine values A1 are as follows:
The operation proceeds as follows: after the first IDCT operation instruction is input, the kernel control module automatically fetches data from the memory RAM and controls the first adder and the multiplier to complete the IDCT of the specified matrix type; the result selection output module then stores the computed data in RAM in matrix form, and once the first IDCT operation is complete, the flag bit sign of the first state machine is set. The flag bit sign of the first state machine is then re-initialized and the second IDCT operation is run: the kernel control module again automatically fetches data from the memory RAM and performs the computation, the result is stored at the specified RAM address, and the flag bit sign of the first state machine is set. One complete two-dimensional IDCT operation is then finished.
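The two-pass procedure described above can be illustrated with a minimal software sketch. It is not the circuit of the invention: the function names, the orthonormal form of the IDCT, the folding of the normalization factor into the cosine table, the rounding after each pass, and the transposed write-back between passes are all assumptions made for illustration. Only the scaling by 65536, the reuse of one multiply and one add per step, the two runs, and the sign flag come from the text.

```python
import math

COS_SCALE = 65536  # scaling factor the text gives for the MPEG2 cosine values

def make_cos_table(n):
    """Integer table[u][x] = round(c(u) * cos((2x+1)*u*pi/(2n)) * COS_SCALE).

    Folding the normalization c(u) into the table is an assumption.
    """
    table = []
    for u in range(n):
        c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        table.append([
            round(c * math.cos((2 * x + 1) * u * math.pi / (2 * n)) * COS_SCALE)
            for x in range(n)
        ])
    return table

def idct_pass(block, table):
    """One 1D IDCT pass over every row, with the result written back transposed.

    Returns (result, sign); sign = 1 marks the pass as finished.
    """
    n = len(block)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for x in range(n):
            acc = 0
            for u in range(n):
                acc += block[r][u] * table[u][x]  # one multiply and one add per step
            out[x][r] = (acc + COS_SCALE // 2) >> 16  # round, then remove the scaling
    return out, 1

def idct_2d(block):
    """Run the 1D pass twice; the transposed write-back turns rows into columns."""
    table = make_cos_table(len(block))
    first, sign = idct_pass(block, table)
    assert sign == 1  # first pass finished
    second, sign = idct_pass(first, table)
    assert sign == 1  # second pass finished, 2D IDCT complete
    return second
```

Because each pass writes its result transposed, the same row routine serves both the row and the column stage, which mirrors the idea of running one IDCT routine twice. A real implementation would likely keep more intermediate precision than this sketch does.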
(2) If the second state machine is selected, the YCbCr2RGB operation stage is entered. First, the second state machine operates the data select signals and latch signals in the algorithm module; in each state of the second state machine, different values are assigned to the data select signals and latch signals in the algorithm module. The two groups of YCbCr values to be converted are then read in from outside through the input/output register, and two adders and one multiplier perform the format conversion from YCbCr to RGB. When the flag bit sign of the second state machine equals 1, the YCbCr2RGB operation is complete and the result is stored in the input/output register.
The YCbCr-to-RGB algorithm performed with the decoding method of the present invention, and its concrete implementation, are described below:
The conversion formulas are:
R=1.164(Y-16)+1.596(Cr-128);
G=1.164(Y-16)-0.813(Cr-128)-0.392(Cb-128);
B=1.164(Y-16)+2.017(Cb-128);
With this algorithm, a single instruction completes the YCbCr-to-RGB conversion of two groups of values. The resulting RGB format is RGB565, which represents a color with 16 bits: 5 bits for red (R), 6 bits for green (G), and 5 bits for blue (B).
The operation proceeds as follows: the two groups of YCbCr values to be converted are placed in the input/output register 1, and the operation instruction is then input; the kernel control module 2 fetches the values from the input/output register 1 and controls the first adder 31, the second adder 32, and the multiplier 33 to perform the computation; when it is finished, the two groups of RGB565 values are placed in the input/output register 1 and the flag bit of the second state machine is set. This algorithm does not use the memory RAM.
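The conversion formulas and the RGB565 packing can be illustrated with a short sketch. The coefficients and the 5/6/5 bit layout are taken from the text; the function names, the rounding, the clamping to the range 0 to 255, and the reduction to 5 or 6 bits by discarding the low bits are assumptions for illustration only.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to 0..255 (assumed behaviour)."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb565(y, cb, cr):
    """Convert one YCbCr triple to a 16-bit RGB565 word using the formulas above."""
    r = clamp8(1.164 * (y - 16) + 1.596 * (cr - 128))
    g = clamp8(1.164 * (y - 16) - 0.813 * (cr - 128) - 0.392 * (cb - 128))
    b = clamp8(1.164 * (y - 16) + 2.017 * (cb - 128))
    # Pack as 5 bits of red, 6 bits of green, 5 bits of blue.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def convert_pair(first, second):
    """One 'instruction': convert two (Y, Cb, Cr) groups in a single call."""
    return ycbcr_to_rgb565(*first), ycbcr_to_rgb565(*second)

# Usage: black and white video levels.
black, white = convert_pair((16, 128, 128), (235, 128, 128))
print(hex(black), hex(white))
```

Two RGB565 results occupy 32 bits in total, which may be why two groups are handled per instruction, although the text does not state this.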
The concrete pipeline of Step 133 is shown in Figure 3. For the IDCT, in the first clock of the operation instruction, datum A is fetched from the memory RAM; in the second clock, datum B is fetched while the first operation is performed on A; in the third clock, datum C is fetched from the memory RAM while the first operation is performed on B and the second operation is performed on the result of A's first operation; in the fourth clock, datum D is fetched while the first operation is performed on C, the second operation is performed on B, and the result for A is stored in the memory RAM; and so on. For the YCbCr2RGB format conversion, the pipeline is similar to that of the IDCT; the only difference is that the data are read from the input/output register and the results are likewise stored in the input/output register. In this way the pipeline-controlled algorithm provides hardware acceleration: the two-dimensional IDCT processing and transfer of video and the YCbCr2RGB format conversion no longer require complex algorithm software. It is only necessary to input a control instruction telling the kernel control module which data to process, and the kernel control module and its peripheral hardware circuits automatically read, process, and write back the results as an efficient pipeline.
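The clock-by-clock overlap described above can be made concrete with a small scheduling sketch. It only models which datum occupies which stage in each clock; the four stage names and the function name are chosen here for illustration and do not come from the text.

```python
STAGES = ("fetch", "first operation", "second operation", "store")

def pipeline_schedule(items):
    """Return, for each clock, the (item, stage) pairs active in that clock."""
    total_clocks = len(items) + len(STAGES) - 1
    schedule = []
    for clock in range(total_clocks):
        active = []
        for index, item in enumerate(items):
            stage = clock - index  # item i enters the pipeline in clock i
            if 0 <= stage < len(STAGES):
                active.append((item, STAGES[stage]))
        schedule.append(active)
    return schedule

# Usage: print the schedule for the four data A, B, C and D.
for clock, active in enumerate(pipeline_schedule(["A", "B", "C", "D"]), start=1):
    print(clock, active)
```

With four data and four stages the sketch needs 7 clocks, whereas processing each datum to completion before fetching the next would need 16; this is the source of the speed-up attributed to pipeline control.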
The device of the present invention mainly uses a DSP_VFU (Digital Signal Processor - Video Functional Unit) core to implement the video decoding optimization method. In Step 101, the command unit of the input/output register 1 reads from outside the instruction specifying which kind of data is to be processed and passes it to the kernel control module 2, which analyzes the algorithm formula to be used. In Step 102, the kernel control module 2 selects the corresponding algorithm to process the data.
In Step 103, the command unit is located in the input/output register 1, and the master state machine 21 in the kernel control module 2 selects the first state machine 22 or the second state machine 23. The kernel control module 2 also coordinates the two-dimensional IDCT and YCbCr2RGB algorithms: it controls the state transitions of the first state machine 22 and the second state machine 23 according to the progress of the operation and outputs control signals to the first adder 31, the second adder 32, and the multiplier 33, thereby implementing the pipeline control of the two-dimensional IDCT and of YCbCr2RGB, respectively.
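The command sequence of Steps 113 to 133 and the selection between the two state machines can be sketched as follows. The class, the field names, and the opcode values are hypothetical; the text specifies only that a clear command comes first, that the start address, address step, and operation instruction follow, and that a non-zero operation instruction starts a run.

```python
OP_IDCT = 1        # hypothetical opcode values; the text does not give
OP_YCBCR2RGB = 2   # the actual encoding of the operation instruction

class CommandUnit:
    """Software stand-in for the command unit in the input/output register."""

    def __init__(self):
        self.clear()

    def clear(self):
        """Step 113: remove the instruction left over from the previous run."""
        self.start_address = 0
        self.address_step = 0
        self.opcode = 0

    def load(self, start_address, address_step, opcode):
        """Step 123: start address, address step, and operation instruction."""
        self.start_address = start_address
        self.address_step = address_step
        self.opcode = opcode

def select_state_machine(cmd):
    """Step 133: choose the state machine according to the operation instruction."""
    if cmd.opcode == 0:
        return None  # a zero instruction does not start anything
    if cmd.opcode == OP_IDCT:
        return "first state machine"
    if cmd.opcode == OP_YCBCR2RGB:
        return "second state machine"
    raise ValueError("unknown operation instruction: %d" % cmd.opcode)

# Usage: clear, load an IDCT instruction, and see which state machine runs.
cmd = CommandUnit()
cmd.clear()
cmd.load(start_address=0x40, address_step=1, opcode=OP_IDCT)
print(select_state_machine(cmd))
```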
(1) More specifically, the structure that implements the IDCT operation, shown in Figure 4, mainly comprises the kernel control module 2, the basic algorithm module, the memory RAM 3, the data access control module, the result selection output module 4, and the input/output register 1. The basic algorithm module is connected to the kernel control module 2, the data access control module, and the result selection output module 4, respectively, and consists mainly of the first adder 31 and the multiplier 33. The data access control module consists mainly of the RAM data control module 51 and the RAM address control module 52.
The first adder 31 and the multiplier 33 each have a data select signal and a data latch signal. The RAM data control module 51 mainly controls reading data from the RAM; the RAM address control module 52 mainly controls the storage addresses of RAM data and makes it possible to access data at separate addresses for the upper and lower 16 bits of the 32-bit-wide RAM.
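Access to the upper and lower 16-bit halves of a 32-bit-wide RAM, together with addresses generated from a start address and an address step, might look like the following sketch. The class name, method names, and addressing scheme are assumptions; the text states only that the halves can be accessed at different addresses.

```python
class Ram32:
    """A 32-bit-wide RAM whose upper and lower 16-bit halves are accessed separately."""

    def __init__(self, words):
        self.words = [0] * words

    def read16(self, word_address, high):
        """Read the upper (high=True) or lower (high=False) 16 bits of a word."""
        word = self.words[word_address]
        return (word >> 16) & 0xFFFF if high else word & 0xFFFF

    def write16(self, word_address, high, value):
        """Write 16 bits into one half of a word, leaving the other half unchanged."""
        value &= 0xFFFF
        word = self.words[word_address]
        if high:
            word = (word & 0x0000FFFF) | (value << 16)
        else:
            word = (word & 0xFFFF0000) | value
        self.words[word_address] = word

def access_sequence(start_address, address_step, count):
    """Addresses visited when stepping from a start address by a fixed step."""
    return [start_address + i * address_step for i in range(count)]

# Usage: store two 16-bit values in one 32-bit word.
ram = Ram32(16)
ram.write16(3, high=True, value=0x1234)
ram.write16(3, high=False, value=0xABCD)
print(hex(ram.words[3]))
```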
The result selection output module 4, mainly according to the instruction output by the command unit in the input/output register 1, selects whether the operation result is output to the input/output register 1 or to the external memory RAM 3. The input/output register 1 mainly holds the externally supplied RAM start address, address step, instruction, and so on.
The kernel control module mainly fetches data from the memory RAM, outputs the RAM data and control signals to the basic algorithm module according to the IDCT algorithm and the operation instruction, and, after the basic algorithm module has computed, writes the result back to the RAM, cycling through the operation repeatedly. A single instruction can perform at most 65535 consecutive operations (255x255 matrix), data up to 16 bits wide can be handled, and the module is implemented as a sequential circuit.
(2) More specifically, the structure that implements the YCbCr2RGB format conversion, shown in Figure 4, mainly comprises the kernel control module 2, the basic algorithm module, the result selection output module 4, and the input/output register 1. The basic algorithm module consists mainly of the first adder 31, the second adder 32, and the multiplier 33, each with a data select signal and a data latch signal. The kernel control module 2 reads the YCbCr data from the input/output register 1, outputs the YCbCr data and control signals to the basic algorithm module according to the YCbCr2RGB algorithm and the operation instruction, and, after the basic algorithm module has computed, writes the result to the input/output register 1.