CN102223543B - Reference pixel read and storage system - Google Patents
- Publication number
- CN102223543B CN201110157144A
- Authority
- CN
- China
- Prior art keywords
- reference pixel
- motion vector
- data
- macroblock
- buffer memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to a reference pixel read and storage system comprising a motion vector prediction unit, a macroblock address generation unit and a reference pixel reading unit that are connected to one another, wherein the motion vector prediction unit and the macroblock address generation unit are each connected to the reference pixel reading unit and to a control unit. The motion vector prediction unit obtains the MV (motion vector) of the current partition from the calculated MVP (motion vector prediction); the macroblock address generation unit generates the reference pixel read address from the received MV; the reference pixel reading unit reads pixel data from external memory into the internal cache of the system. The system can be combined with different video processing algorithms, automatically obtains the partition information of the current picture to perform motion vector prediction, and has the data ready before the codec needs it for motion compensation, so that the operating speed and read capability of the system are greatly improved while occupying few resources.
Description
Technical field
The present invention relates to the field of video decoding, and specifically to a system for reading the reference pixels of the motion compensation part in video decoding technology. It is particularly suitable for, but not limited to, video decoding systems of the H.264 standard.
Background technology
Reference pixel reading is one of the most important components of a video coding system; it transfers reference video data from external memory into the codec so that the codec can perform motion compensation. Because the resolution of video sequences keeps rising, the amount of data to be moved keeps growing. Designing a dedicated DMA (Direct Memory Access) for video processing that can carry massive amounts of reference picture data while consuming little DDR (Double Data Rate synchronous DRAM) bandwidth has therefore become key to the design of the whole codec system.
H.264 is a high-compression digital video codec standard proposed by the Joint Video Team (JVT), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). A brief introduction to the relevant algorithm follows:
The basic processing unit of the H.264 algorithm is the macroblock (MB); depending on computational needs, a macroblock can be further divided into the forms shown in Fig. 1. Both the motion vector prediction (MVP) and the data reads of the reference pixel read module follow the partitioning of Fig. 1, and macroblock prediction is shown in Fig. 2. The purpose of macroblock prediction is to find the MVP of the current block from the information of adjacent blocks, that is, from blocks A, B and C (D). There are four kinds of MVP computation:
1) UP_8x8 prediction:
When the macroblock partition is 16 × 8 or 8 × 16 as in Fig. 1, i.e. UP_8x8:
1. When the reference frame of the current macroblock is the same as that of the adjacent macroblock in the direction shown in Fig. 2, the MVP (motion vector prediction) of the current macroblock is the motion vector (MV) of that adjacent macroblock. For example, if the current macroblock uses the 8 × 16 partition of Fig. 1, then for the left partition, if its reference frame index (refidx) is the same as that of block A in Fig. 2, the MVP of this partition is MVA.
2. When macroblocks B and C in Fig. 2 are both unavailable, the MVP of the current macroblock is the MV of macroblock A.
3. If exactly one of refidxA, refidxB and refidxC of macroblocks A, B and C in Fig. 2 equals the refidx of the current macroblock, the MVP of the current macroblock is the MV of that macroblock.
2) P_Skip prediction:
When the reference frame index refidxL0 is 0, the derivation of the motion vector mvL0 is subject to the following restriction: if any one of the conditions below is true in Fig. 2, both components of mvL0 are 0:
a. the address mbAddrA of macroblock A is unavailable;
b. the address mbAddrB of macroblock B is unavailable;
c. the reference frame index refIdxL0A of macroblock A equals 0, and both components of mvL0A equal 0;
d. the reference frame index refIdxL0B of macroblock B equals 0, and both components of mvL0B equal 0;
If none of the above conditions is true, the computation follows the normal motion vector prediction of case 4), with macroblock partition index (mbPartIdx) = 0, sub-macroblock partition index (subMbPartIdx) = 0, the forward reference frame index (refIdxL0) and the current sub-macroblock type (currSubMbType) = "na" as inputs. The prediction output is assigned directly to mvL0, since the predicted value serves as the motion vector.
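The P_Skip rule above can be sketched in a few lines. This is only an illustrative sketch: the function name `p_skip_mv`, the use of `None` to mark an unavailable neighbour, and passing in the result of the normal prediction as `normal_mvp` are assumptions made for the example, not part of the described system.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]  # (x, y) motion vector components


def p_skip_mv(mv_a: Optional[MV], ref_a: int,
              mv_b: Optional[MV], ref_b: int,
              normal_mvp: MV) -> MV:
    """Return mvL0 for a P_Skip macroblock (hypothetical helper).

    mv_a / mv_b are None when neighbour A / B is unavailable.
    normal_mvp is the output of the normal prediction (case 4),
    assumed to be computed elsewhere.
    """
    zero = (0, 0)
    if mv_a is None or mv_b is None:   # conditions a and b
        return zero
    if ref_a == 0 and mv_a == zero:    # condition c
        return zero
    if ref_b == 0 and mv_b == zero:    # condition d
        return zero
    return normal_mvp                  # prediction is used directly as mvL0
```

Only when both neighbours are available and neither is a zero vector on reference index 0 does the normal prediction result pass through.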
3) B_Skip & B_Direct prediction:
B_Skip and B_Direct are handled identically in prediction; they differ in that the former has no residual while the latter does. Prediction here is divided into temporal direct and spatial direct modes. When the current macroblock type (mb_type) equals B_Skip or B_Direct_16x16, or the current sub-macroblock type (sub_mb_type[mbPartIdx]) equals B_Direct_8x8, this prediction is used for MVP processing. The mode is determined by the direct prediction mode flag (direct_spatial_mv_pred_flag) obtained from the bitstream: if direct_spatial_mv_pred_flag equals 1, the spatial direct prediction mode is used; otherwise the temporal direct prediction mode is used.
4) Prediction in the normal case:
When none of the above special cases applies, the MVP is predicted as follows. First, the availability of the macroblocks adjacent to the current sub-partition in Fig. 2 is checked: if macroblocks B and C are both unavailable and macroblock A is available, the MV of macroblock A is assigned directly to the corresponding mvpLX. Otherwise, the reference index of each adjacent macroblock is compared with the reference index of the current partition: if they match, the MV of that adjacent partition is assigned to the mvpLX of the current sub-partition; if they do not match, mvpLX is given by the median of the motion vectors of the partitions. Median prediction (an unavailable macroblock has MV = 0 and refidx = -1) is: MVP of the current macroblock = median(MVA, MVB, MVC); when macroblock C is unavailable, it is replaced by macroblock D. After the MVP computation is finished, the MVP is added to the MVD obtained from the bitstream to give the MV of the current macroblock. With this MV and the reference index refidx obtained from the bitstream, the reference pixels needed for interpolating the current macroblock can be located.
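The normal-case prediction can be illustrated with a short sketch. The names `Neighbour` and `predict_mv` are hypothetical, and the sketch assumes that the "reference index matches" branch applies when exactly one neighbour matches (as in rule 3 of case 1) and that the median is taken per component; the text above does not spell out either detail.

```python
from typing import NamedTuple, Tuple


class Neighbour(NamedTuple):
    available: bool
    refidx: int = -1               # unavailable neighbour: refidx = -1
    mv: Tuple[int, int] = (0, 0)   # unavailable neighbour: MV = 0


def median3(a: int, b: int, c: int) -> int:
    return sorted((a, b, c))[1]


def predict_mv(cur_refidx: int, a: Neighbour, b: Neighbour,
               c: Neighbour, d: Neighbour,
               mvd: Tuple[int, int]) -> Tuple[int, int]:
    """Return MV = MVP + MVD for the current partition."""
    if not c.available:
        c = d                      # C is replaced by D when unavailable
    a, b, c = (n if n.available else Neighbour(False) for n in (a, b, c))
    if a.available and not b.available and not c.available:
        mvp = a.mv                 # only A is usable
    else:
        matches = [n for n in (a, b, c) if n.refidx == cur_refidx]
        if len(matches) == 1:      # assumed: exactly one matching neighbour
            mvp = matches[0].mv
        else:                      # component-wise median of MVA, MVB, MVC
            mvp = (median3(a.mv[0], b.mv[0], c.mv[0]),
                   median3(a.mv[1], b.mv[1], c.mv[1]))
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

For example, with A = (4, 2), B = (8, -2) and C = (6, 10) all on the same reference index as the current partition, the median predictor is (6, 2), and the MVD from the bitstream is added to it.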
At present, the DMA responsible for moving video data in a video processing system is mostly a slave device or a component module of the codec and moves data on the codec's demand: the codec issues read and write requests, and the DMA moves the data. Although this is fairly efficient, as resolutions keep rising this way of transferring data will become increasingly inadequate.
Summary of the invention
The invention provides a reference pixel read and storage system that automatically obtains the partition information of the current picture to perform motion vector prediction and has the data ready before the codec needs it for motion compensation. It moves data efficiently while occupying few system resources, improving the performance and pipeline speed of the system.
The reference pixel read and storage system of the present invention comprises a motion vector prediction unit, a macroblock address generation unit and a reference pixel reading unit connected in sequence; the motion vector prediction unit, the macroblock address generation unit and the reference pixel reading unit are each connected to a data bus interface and to a control unit, wherein:
The motion vector prediction unit takes the macroblock vector data provided by the system, calculates the MVP of the current macroblock partition, and adds the calculated MVP to the corresponding MVD to obtain the MV of the current partition. The MVs of the last row of partitions of the current macroblock are written to external memory through the data bus interface to serve as the upper-adjacent motion information for the next row of macroblocks, and the MV of the current macroblock is provided to the macroblock address generation unit;
The macroblock address generation unit produces the reference pixel read address from the received MV. Based on the position of the current macroblock partition and sub-partition, the picture width and height, and the specified picture block organization, it calculates the offset address, burst transfer length and height of the reference pixels to be read; adding the base address of the reference frame yields the read address of the reference pixels, which is passed to the data bus interface to generate the reference pixel read operation;
The reference pixel reading unit reads pixel data from external memory into the internal cache of the system.
The data that the decoding system outside this system provides to the motion vector prediction unit comprise: the reference frame index (refidx) of the current macroblock, the motion vector difference (MVD), the picture order count (POC) value of the current macroblock's reference frame, the motion vectors of the upper-adjacent and co-located macroblocks, the position of the current macroblock in the picture, and the picture width and height. The picture order count (POC) is mainly used to identify the display order of pictures, and is also used to mark the initial picture number of the reference picture when decoding inter-predicted slices. The reference pixel read and storage system of the present invention can be applied to video decoders conforming to standards such as H.264, AVS or MPEG. The system can be combined with different video processing algorithms, automatically obtains the partition information of the current picture to perform motion vector prediction, and moves the required video data from DDR into the internal cache of the DMA according to the prediction result, so that the data is ready before the codec needs it for motion compensation. The operating speed and read capability of the system are thereby greatly improved while occupying few resources.
In one specific scheme, the motion vector prediction unit comprises a motion information cache module and a motion vector processing module connected to each other, and the motion information cache module is connected to the reference pixel reading unit, wherein:
The motion information cache module consists of an input cache and an output cache and provides the base address and burst transfer length to the reference pixel reading unit. The storage of the input cache and of the output cache each consists of one dual-port SRAM (mv_sram) with a width of 60 bits and a depth of 12, and one dual-port SRAM (poc_sram) with a width of 64 bits and a depth of 2.
The motion vector processing module starts the computation corresponding to the MVP type of the macroblock to obtain the MV, stores the MV, and passes it to the macroblock address generation unit. It processes data as follows: 1. determine the MVP type from the current macroblock type information; 2. fetch the motion information required for MVP from the cache, storing data read from external memory into the corresponding internal memory; 3. start the MV computation corresponding to the determined MVP type; 4. after the MV computation finishes, save the motion vector into the corresponding internal memory. The MVs produced by the motion vector processing module are used in three ways: 1. saved to DDR as the co-located MV; 2. saved to DDR as the upper-adjacent MV; 3. used as the left-adjacent MV within this reference pixel read module.
Further, the input cache of the motion information cache module buffers the upper-adjacent motion information, the motion information of the first backward reference frame, and the POC. After the system start signal is asserted, the current macroblock type is determined first: if the macroblock type is B_direct_16x16, the neighbour information is not read; if the macroblock contains no direct prediction, the motion information of the first backward reference frame is not read; otherwise all motion information is read from memory into the motion vector SRAM. The output cache of the motion information cache module buffers the upper-adjacent motion information, the motion information of the current macroblock, and the POC; after the MV computation of the last sub-partition of the current macroblock finishes, the buffered information is written out to external memory. When saving the motion information of the current macroblock, the forward motion information is saved whenever the current macroblock has forward prediction, and the backward motion information is saved only when the current macroblock has backward prediction alone;
The motion vector processing module determines the MVP type from the current macroblock type information, fetches the motion information required for MVP from the cache (storing data read from external memory into the corresponding cache), starts the MV computation corresponding to the MVP type, and after the computation saves the resulting MV in the corresponding cache and transfers it to the macroblock address generation unit. Specifically, the module decides the MVP type of the current partition from the current macroblock type (mb_type) and the 8 × 8 partition/sub-partition type of the current macroblock (b8x8_part_type), both provided by the macroblock operation control module (mb_ctrl), and then starts the computation corresponding to that MVP type.
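As a rough illustration of this decision step, the four MVP cases described in the background can be selected from the macroblock type and partition type. The function name `mvp_type`, the string labels it returns, and the string encoding of the type values are all hypothetical; the actual signal encoding of mb_type and b8x8_part_type is not given in the text.

```python
def mvp_type(mb_type: str, part_type: str = "",
             direct_spatial_mv_pred_flag: int = 0) -> str:
    """Pick which MVP computation to start (illustrative labels only)."""
    if mb_type == "P_Skip":
        return "p_skip"
    if mb_type in ("B_Skip", "B_Direct_16x16") or part_type == "B_Direct_8x8":
        # The flag from the bitstream selects spatial or temporal direct mode.
        if direct_spatial_mv_pred_flag == 1:
            return "spatial_direct"
        return "temporal_direct"
    if part_type in ("16x8", "8x16"):
        return "directional"       # case 1): 16 x 8 and 8 x 16 partitions
    return "normal"                # case 4): median-based prediction
```

A hardware implementation would make the same choice with a small decoder on the type fields rather than string comparisons.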
In another specific scheme, the macroblock address generation unit comprises an address generation module and a reference picture packing module connected to each other, each of which is connected to the reference pixel reading unit and to the control unit. The address generation module calculates, for the reference picture data of the current partition, the offset address of the top-left pixel within each pixel data packet, the burst transfer length for the current pixel data packet, and the height within the pixel data packet, and latches these data. The latched data are provided to the reference pixel reading unit.
As shown in Fig. 3, the system runs as a pipeline: as time passes, the number of sub-macroblocks processed in parallel by the functional units gradually increases. Compared with running the next functional unit only after the previous one has finished, pipelined operation significantly improves the efficiency of the system. Because the system is pipelined, the data are latched in ping-pong fashion, which keeps the pipeline running smoothly. The "reference pixel read" pipeline stage needs the offset address of the top-left pixel of each current data packet, the burst transfer length of the current pixel data packet, and the height within the pixel data packet. These data were generated by the "macroblock address generation" stage in the previous time slice and latched in "ping", so "macroblock address generation" can compute the offset address, burst transfer length and in-packet height of the next data packet in parallel with "reference pixel read". When that computation finishes, the results are latched in "pong"; once "reference pixel read" completes its current operation, it can fetch the "pong" data for its next operation. Ping-pong operation thus lets the whole pipeline run smoothly.
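The ping-pong latching described above can be modelled in software as two banks that swap roles each time slice. This is a behavioural sketch only; the class and field names (`PingPongLatch`, `ReadParams`, `run_pipeline`) are hypothetical, and real hardware would use two register banks and a select bit rather than Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadParams:
    offset: int      # offset address of the packet's top-left pixel
    burst_len: int   # burst transfer length
    height: int      # height within the pixel data packet


class PingPongLatch:
    def __init__(self) -> None:
        self.banks: List[Optional[ReadParams]] = [None, None]  # ping, pong
        self.write_sel = 0

    def write(self, params: ReadParams) -> None:
        self.banks[self.write_sel] = params      # producer fills one bank

    def read(self) -> Optional[ReadParams]:
        return self.banks[self.write_sel ^ 1]    # consumer uses the other

    def swap(self) -> None:
        self.write_sel ^= 1                      # end of time slice


def run_pipeline(packets: List[ReadParams]) -> List[ReadParams]:
    """Address generation runs one time slice ahead of the pixel read."""
    latch = PingPongLatch()
    consumed = []
    for step in range(len(packets) + 1):
        if step >= 1:
            consumed.append(latch.read())        # "reference pixel read"
        if step < len(packets):
            latch.write(packets[step])           # "macroblock address generation"
        latch.swap()
    return consumed
```

In each time slice the read stage consumes the parameters written in the previous slice while the address stage fills the other bank, so neither stage waits on the other.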
When the pixel data packets are latched, preferably the off-chip reference picture is stored in units of blocks, with the luminance and chrominance components stored separately. To ensure memory access efficiency, the off-chip reference picture is stored block by block rather than in raster scan order, and the luminance and chrominance components are kept apart. To improve the access efficiency of the off-chip DDR, the picture is preferably packed and stored in units of 1 KB, with the chrominance components interleaved.
In a further scheme, in the reference picture packing module the depth of the luminance component cache is 216; if the macroblock partition is 8 × 8, the sub-macroblock partition is 4 × 8 or 4 × 4, and pixel interpolation is performed in the horizontal direction, the width of the luminance component cache is 72 bits; otherwise it is 64 bits.
The smallest partition of the luminance component is 4 × 4, and predicting one 8 × 4 block needs at most 24 × 9 pixel values; predicting the luminance component of one macroblock (one direction) needs 1728 Bytes (216 × 8 pixels) of storage in total as reference pixel cache. The cache is 72 bits (9 pixels) wide and 216 deep, and every byte has its own write enable (wen) signal. The reference region base address used by each macroblock partition/sub-macroblock partition is fixed and is determined by the partition type, the macroblock partition number and the sub-macroblock partition number. The macroblock partition type and macroblock partition number determine the reference region base address of the macroblock partition, as shown in Table 1:
Table 1:
Only an 8 × 8 macroblock partition contains sub-macroblock partitions; in that case the offset address of the sub-macroblock partition's reference region is determined by the sub-macroblock partition type and the sub-macroblock partition number, as shown in Table 2:
Table 2:
The address of a partition's reference region is obtained by adding the base address in Table 1 and the offset address in Table 2.
When the macroblock partition is 8 × 8, the sub-macroblock partition is 4 × 8 or 4 × 4, and pixel interpolation is performed in the horizontal direction, the cache uses a width of 72 bits (9 pixels); otherwise the cache uses only 64 bits (8 pixels).
In yet another scheme, in the reference picture packing module the depth of the chrominance component cache is 48; when the macroblock partition is 8 × 8 or larger, or the macroblock partition and the sub-macroblock partition are both 8 × 8, the width of the chrominance component cache is 64 bits; otherwise it is 48 bits, and the unused 16 bits are set as invalid bits.
When only the YUV420 format is considered, the resolution of the chrominance components is half that of the luminance component in both the horizontal and vertical directions, so the smallest partition unit of a single chrominance component is 2 × 2, and each 2 × 2 interpolation needs to read at most 3 × 3 pixels. To save bandwidth, the blue-difference component Cb and the red-difference component Cr are stored alternately, so predicting the chrominance components of one macroblock (one direction) needs 384 Bytes (24 × 16 pixels) of internal storage in total as reference pixel cache. This cache is 64 bits (8 pixels) wide and 48 deep, and every byte is gated by a write enable (wen) signal. The reference region base address used by each macroblock partition/sub-macroblock partition is fixed and is determined by the partition type, the macroblock partition number and the sub-macroblock partition number. The macroblock partition type and macroblock partition number determine the reference region base address of the macroblock partition, as shown in Table 3:
Table 3:
Only an 8 × 8 macroblock partition contains sub-macroblock partitions; in that case the offset address of the sub-macroblock partition's reference region is determined by the sub-macroblock partition type and the sub-macroblock partition number, as shown in Table 4:
Table 4:
The address of a partition's reference region is obtained by adding the base address in Table 3 and the offset address in Table 4.
When the macroblock partition is 8 × 8 or larger, or the macroblock partition and sub-macroblock partition are both 8 × 8, for example when the current chroma operation is sub-pixel chroma interpolation for a 16 × 16 macroblock partition (16 × 16 refers to the luminance size of the macroblock; the actual size of the corresponding chroma partition is 8 × 8), the full 64 bits (8 pixels) of the cache width are used. When the current operation is sub-pixel chroma interpolation for a 4 × 4 partition (4 × 4 refers to the luminance size; the actual size of the corresponding chroma partition is 2 × 2), 48 bits (6 pixels) of the cache width are used and the remaining 16 bits are set as invalid bits.
In another specific scheme, the reference pixel reading unit comprises a reference picture cache module and a data bus interface; the reference picture cache module stores the forward and backward reference video data for motion compensation, and the data bus interface reads data and transfers it to the reference picture cache module.
Preferably, in the reference picture cache module, the luminance component cache is four dual-port SRAMs totalling 7776 Bytes (62.208 Kbits), each 72 bits (9 pixels) wide and 216 deep; the chrominance component cache is four dual-port SRAMs totalling 1536 Bytes (12.288 Kbits), each 64 bits (8 pixels) wide and 48 deep. The reference picture cache module stores the forward and backward reference video data for motion compensation, and these data are supplied to the motion compensation module (MC) outside the system of the present invention.
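The cache sizes quoted above follow directly from width × depth × number of SRAMs, which can be checked with a few lines of arithmetic. The helper name `sram_bits` is introduced only for this check.

```python
def sram_bits(width_bits: int, depth: int, count: int = 1) -> int:
    """Total capacity in bits of `count` SRAMs of the given width and depth."""
    return width_bits * depth * count


luma_bits = sram_bits(72, 216, 4)    # four 72-bit x 216 luminance SRAMs
chroma_bits = sram_bits(64, 48, 4)   # four 64-bit x 48 chrominance SRAMs

print(luma_bits, luma_bits // 8)      # bits, bytes
print(chroma_bits, chroma_bits // 8)  # bits, bytes
```

The luminance total is 62208 bits (7776 Bytes) and the chrominance total is 12288 bits (1536 Bytes), matching the figures in the text.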
In a further scheme, the data bus interface is a bus interface conforming to the AXI bus specification. AXI (Advanced eXtensible Interface) is a bus protocol and the most important part of the AMBA (Advanced Microcontroller Bus Architecture) 3.0 protocol proposed by ARM; it is an on-chip bus designed for high performance, high bandwidth and low latency.
The data bus interface in the system of the present invention has 1 independent read channel and 1 write channel; it uses variable-length burst transfers (burst), with a read burst range of 1~4 and a write burst range of 2~8; the read and write IDs are fixed at 1; the bus width is fixed at 64 bits; the read burst (AR) and write burst (AW) types are fixed to incrementing mode (b01); there are no locked read (ARLOCK) or locked write (AWLOCK) operations, and the read response (RRESP) and write response (BRESP) are not processed. Other data buses can also be used to implement the same functions, with the specific parameters set to suit the type of data bus.
From the first address of the reference frame storage space passed in by the system, and after calculating the offset address of the current reference frame from the current refidx (reference frame index), the data bus interface obtains the base address of the current reference frame's pixels in DDR. Adding to this address the offset address, generated by the macroblock address generation unit, of the data to be read for the current operation gives the actual address of the data to be read. The read/write operation is then initiated with the burst transfer length provided by the macroblock address generation unit, which also provides the height of the data to be read within the packet. Address accumulation is performed by the data bus interface (the address increment is fixed at 32); the relative offset address, burst transfer length and in-packet height are latched after being calculated by the macroblock address generation unit.
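The address computation in this paragraph can be sketched as follows. Several details are assumptions made for the example and are not stated in the text: that the per-frame offset is refidx times a fixed frame size (`frame_size_bytes`), that one read is issued per row of the packet, and that the fixed increment of 32 is applied between consecutive rows. The function name `read_addresses` is hypothetical.

```python
from typing import List

ADDR_STEP = 32  # fixed address increment applied by the data bus interface


def read_addresses(region_base: int, frame_size_bytes: int, refidx: int,
                   data_offset: int, height: int) -> List[int]:
    """Return the start address of each row read for one pixel data packet."""
    # Base address of the current reference frame in DDR (assumed layout:
    # reference frames stored back to back, frame_size_bytes apart).
    frame_base = region_base + refidx * frame_size_bytes
    # Actual address of the first data to read for the current operation.
    start = frame_base + data_offset
    # One read per row inside the packet, stepping by the fixed increment.
    return [start + row * ADDR_STEP for row in range(height)]
```

With a region base of 0x10000000, a frame size of 0x400000, refidx 2, a data offset of 0x120 and a height of 3, this yields 0x10800120, 0x10800140 and 0x10800160.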
Testing shows that the reference pixel read and storage system of the present invention can meet the reference data DMA transfer demand of decoding 1 channel of 1080p at 60 frames/s, and supports at most 8 channels of 576p at 30 frames/s. It can be combined with different video processing algorithms, automatically obtains the partition information of the current picture to perform motion vector prediction, and moves the required video data from DDR into the internal cache of the DMA according to the prediction result, so that the data is ready before the codec needs it for motion compensation. The data bus width of the system is 64 bits, the peak bandwidth is <1.6GB, and the total memory consumption is only about 78 kbits, so the operating speed and read capability of the system are greatly improved while occupying few resources.
The above content of the present invention is described in further detail below with reference to the embodiment illustrated in the accompanying drawings. This should not be understood as limiting the scope of the above subject matter of the present invention to the following example. Any replacement or change made according to ordinary technical knowledge and customary means in the art, without departing from the above technical idea of the present invention, shall fall within the scope of the present invention.
Description of drawings
Fig. 1 is a schematic diagram of macroblock partitioning in the H.264 algorithm.
Fig. 2 is a schematic diagram of macroblock prediction in the H.264 algorithm.
Fig. 3 is a schematic diagram of the pipelined operation of the reference pixel read and storage system of the present invention.
Fig. 4 is a structural block diagram of the reference pixel read and storage system of the present invention.
Fig. 5 is a schematic diagram of the input cache storage structure of the motion information cache module in Fig. 4.
Fig. 6 is a schematic diagram of the output cache storage structure of the motion information cache module in Fig. 4.
Fig. 7 is a schematic diagram of packet pixel reading by the address generation module in Fig. 4.
Embodiment
As shown in Fig. 4, the reference pixel read and storage system of the present invention comprises a motion vector prediction unit, a macroblock address generation unit and a reference pixel reading unit that are connected to one another; the motion vector prediction unit and the macroblock address generation unit are each connected to the reference pixel reading unit and to a control unit, wherein:
The motion vector prediction unit takes the macroblock vector data provided by the system, calculates the MVP of the current macroblock partition, and adds the calculated MVP to the corresponding MVD to obtain the MV of the current partition. The MVs of the last row of partitions of the current macroblock are written to external memory through the data bus interface to serve as the upper-adjacent motion information for the next row of macroblocks, and the MV of the current macroblock is provided to the macroblock address generation unit;
The macroblock address generation unit produces the reference pixel read address from the received MV. Based on the position of the current macroblock partition and sub-partition, the picture width and height, and the specified picture block organization, it calculates the offset address, burst transfer length and height of the reference pixels to be read; adding the base address of the reference frame yields the read address of the reference pixels, which is passed to the data bus interface to generate the reference pixel read operation;
The reference pixel reading unit reads pixel data from external memory into the internal cache of the system.
The data that the decoding system outside this system provides to the motion vector prediction unit comprise: the refidx (reference frame index) of the current macroblock, the MVD (motion vector difference), the POC (picture order count) value of the current macroblock's reference frame, the motion vectors of the upper-adjacent and co-located macroblocks, the position of the current macroblock in the picture, and the picture width and height.
The reference pixel reading unit comprises a reference picture cache module and a data bus interface; the reference picture cache module stores the forward and backward reference video data for motion compensation, and the data bus interface reads data and transfers it to the reference picture cache module. In the reference picture cache module the luminance and chrominance components are stored separately: the luminance component cache is four dual-port SRAMs totalling 7776 Bytes, each 72 bits wide and 216 deep; the chrominance component cache is four dual-port SRAMs totalling 1536 Bytes, each 64 bits wide and 48 deep. The data bus interface conforms to the AXI bus specification. It has 1 independent read channel and 1 write channel; it uses variable-length burst transfers, with a read burst range of 1~4 and a write burst range of 2~8; the read and write IDs are fixed at 1; the bus width is fixed at 64 bits; the read burst (AR) and write burst (AW) types are fixed to incrementing mode (b01); there are no locked read (ARLOCK) or locked write (AWLOCK) operations, and the read response (RRESP) and write response (BRESP) are not processed.
From the first address of the reference frame storage space passed in by the system, and after calculating the offset address of the current reference frame from the current refidx (reference frame index), the data bus interface obtains the base address of the current reference frame's pixels in DDR. Adding to this address the offset address, generated by the macroblock address generation unit, of the data to be read for the current operation gives the actual address of the data to be read. The read/write operation is then initiated with the burst transfer length provided by the macroblock address generation unit, which also provides the height of the data to be read within the packet. Address accumulation is performed by the data bus interface (the address increment is fixed at 32); the relative offset address, burst transfer length and in-packet height are latched after being calculated by the macroblock address generation unit.
The motion vector prediction unit includes a motion information cache module and a motion vector processing module that are connected to each other; the motion information cache module is connected to the data bus interface in the reference pixel reading unit, wherein:
The motion information cache module consists of an input buffer and an output buffer, and provides the base address and burst transfer (burst) length to the reference pixel reading unit. As shown in Fig. 5 and Fig. 6, the storage space of both the input buffer and the output buffer consists of one dual-port SRAM (mv_sram) that is 60 bits wide (bits 0 to 59) and 12 deep, and one dual-port SRAM (poc_sram) that is 64 bits wide (bits 0 to 63) and 2 deep. The difference is that, in the input buffer storage structure of Fig. 5, refl10_inf denotes the stored reference information of the macroblock at the corresponding position in the forward reference frame, whereas in the output buffer storage structure of Fig. 6, dec_frm denotes the stored reference information of the current macroblock obtained after the motion vector processing of the current macroblock is complete. This information is held temporarily in the output buffer and is written to the external memory (DDR) once all reference information of the current macroblock has been decoded.
In Fig. 5 and Fig. 6, up_infl0 denotes the stored reference information of the upper neighboring macroblock at the corresponding position in the forward reference frame, and up_infl1 denotes the stored reference information of the upper neighboring macroblock at the corresponding position in the backward reference frame. RefInf:8b denotes the reference data information, 8 bits in total: the low 5 bits (bits 0 to 4) are refidx (reference frame index), the 6th bit is the intra/inter prediction flag (0 for intra, 1 for inter), the 7th bit is the frame/field flag (0 for frame, 1 for field), and the 8th bit is the top/bottom field flag (0 for top field, 1 for bottom field). Mvx:14b denotes the motion vector in the x direction; this data occupies 14 bits. Mvy:14b denotes the motion vector in the y direction; this data occupies 12 bits. POC_8×8_1:32b denotes the picture order count (POC) data of the 1st 8×8 sub-partition of the current macroblock and occupies 32 bits. Likewise, POC_8×8_2:32b denotes the POC data of the 2nd 8×8 sub-partition of the current macroblock, and so on.
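The 8-bit RefInf layout can be illustrated with a small pack/unpack sketch. The names RefInf, pack_ref_inf and unpack_ref_inf are hypothetical, and the code assumes that the "6th", "7th" and "8th" bits of the text correspond to zero-based bit positions 5, 6 and 7, directly above the 5-bit refidx field.

```python
from typing import NamedTuple


class RefInf(NamedTuple):
    refidx: int     # reference frame index, bits 0 to 4
    is_inter: int   # 0 = intra, 1 = inter (assumed bit 5)
    is_field: int   # 0 = frame, 1 = field (assumed bit 6)
    is_bottom: int  # 0 = top field, 1 = bottom field (assumed bit 7)


def pack_ref_inf(info: RefInf) -> int:
    """Pack the four fields into one 8-bit value."""
    if not 0 <= info.refidx < 32:
        raise ValueError("refidx must fit in 5 bits")
    return ((info.refidx & 0x1F)
            | ((info.is_inter & 1) << 5)
            | ((info.is_field & 1) << 6)
            | ((info.is_bottom & 1) << 7))


def unpack_ref_inf(value: int) -> RefInf:
    """Split an 8-bit RefInf value back into its fields."""
    return RefInf(refidx=value & 0x1F,
                  is_inter=(value >> 5) & 1,
                  is_field=(value >> 6) & 1,
                  is_bottom=(value >> 7) & 1)
```

Under these assumptions, an inter-predicted bottom-field frame reference with refidx 3 packs to 0xA3, and unpacking 0xA3 returns the same four fields.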
The conventional approach has no mv_sram or poc_sram storage structure, so it simply fetches data on demand from the decoder and does not proactively obtain this intermediate decoder information. With the storage scheme of the present system, the reference data needed by the decoder is moved in advance into the reference picture cache module of the reference pixel reading unit, so that the decoder can obtain the data immediately when it needs it. This avoids the conventional sequence in which the decoder sends a request to the DMA (direct memory access) controller and the DMA begins fetching data only after receiving the request.
The input buffer of the motion information cache module buffers the upper-neighbor motion information, the motion information of the first backward reference frame, and the POC. After the system start signal is asserted, the current macroblock type is determined first: if the macroblock type is B_direct_16x16, the neighbor information is not read; if direct prediction exists in the macroblock, the motion information of the first backward reference frame is not read; otherwise, all motion information is read from memory into the motion vector static memory.
The output buffer buffers the upper-neighbor motion information, the motion information of the current macroblock, and the POC; after the MV computation of the last sub-partition of the current macroblock is complete, the buffered information is written out to the external memory. When the motion information of the current macroblock is saved, the forward motion information is saved whenever the current macroblock has forward prediction, and the backward motion information is saved only when the current macroblock has backward prediction alone.
The motion vector processing module starts the computation corresponding to the MVP type of each macroblock to obtain the MV, saves the resulting MV, and passes it to the macroblock address generation unit. The module processes data as follows: 1. determine the MVP type from the current macroblock type information; 2. obtain the motion information required for MVP from the cache, storing data read from the external memory into the corresponding internal memory; 3. start the MV computation corresponding to the determined MVP type; 4. after the MV computation is complete, save the motion vector into the corresponding internal memory. The MV produced by the motion vector processing module is used in three ways: 1. it is saved to DDR as the MV of the corresponding (co-located) position; 2. it is saved to DDR as the upper-neighbor MV; 3. it is used as the left-neighbor MV within this reference pixel read module.
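Claim 1 states that the motion vector of the current partition is the computed motion vector prediction plus the corresponding motion vector residual. That final step can be sketched as below; the function name and the tuple representation of a vector are hypothetical, and the MVP computation itself (which depends on the MVP type) is not shown.

```python
def reconstruct_mv(mvp, mvd):
    """Add the motion vector residual (mvd) to the prediction (mvp).

    Both arguments are (x, y) tuples; the result is the motion vector
    of the current partition.
    """
    mvp_x, mvp_y = mvp
    mvd_x, mvd_y = mvd
    return (mvp_x + mvd_x, mvp_y + mvd_y)
```

For instance, a prediction of (5, -2) with a residual of (-1, 3) yields the motion vector (4, 1).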
The macroblock address generation unit includes an address generation module, a reference picture packing module and a reference picture cache module that are connected to one another; the address generation module and the reference picture packing module are each connected to the control unit and to the data bus interface of the reference pixel reading unit. The address generation module computes the offset address of the top-left pixel of the current partition's reference image data within each pixel data packet, the burst transfer length for the current pixel data packet, and the height within the pixel data packet, and latches these values. The latched values are supplied to the data bus interface for reading from DDR. Because the system operates as a pipeline, the latching here is ping-pong latching.
To ensure memory access efficiency, when the address generation module latches pixel data packets, the off-chip reference image is not stored pixel by pixel in raster-scan order; instead it is stored packet by packet (packet) in raster-scan order, while inside each packet the pixels are still stored in raster-scan order. The luminance and chrominance components are stored separately. To improve the access efficiency of the off-chip DDR, the image is preferably packed and stored in units of 1 KB, with the chrominance components stored interleaved. When reference pixels are read, the required pixels in one packet are read completely before the pixels in other packets are read. When the required data is spread over 4 adjacent packets, the packets are read in order of packet number to ensure memory access efficiency. As shown in Fig. 7, when the read region spans 4 packets, they are read in packet-number order 0 to 3. Taking packet0 as an example: suppose the part of the read region that falls within this packet is a 13 × 12 pixel matrix region "A"; the height of "A" within the packet is then 12, and the burst transfer (burst) width is R(13 × 8/64) + 1 = 2, where R() denotes rounding down, for example R(1.25) = 1 and R(7.9) = 7. The term 13 × 8/64 arises as follows: the width of reference region "A" is 13 pixels and each pixel is represented with 8 bits, so each row of region "A" holds 13 × 8 bits of data. In this system one burst transfer (burst) is 64 bits, that is, one burst transfer carries 64 bits of data from DDR to the internal cache (it could also be 32 bits or 128 bits depending on the burst transfer configuration; this design fixes it at 64 bits), so dividing the above amount of data by 64 and adding 1 gives the number of burst transfers (burst) needed to transfer one row of video data. Thus, once the data bus interface in the reference pixel reading unit is told the number of burst transfers per row and the height within the packet, the total number of burst transfers needed to fetch region "A" from the current packet is known. In the example above this number is 2 × 12 = 24.
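The burst count arithmetic of the region "A" example can be checked with a short sketch. The function names are hypothetical; the formula R(width × 8/64) + 1 is taken from the text as-is, with R() implemented as integer floor division.

```python
def bursts_per_row(width_pixels, bits_per_pixel=8, burst_bits=64):
    """Number of burst transfers per row: R(width * bpp / burst) + 1."""
    # Floor division implements the rounding-down operation R().
    return (width_pixels * bits_per_pixel) // burst_bits + 1


def total_bursts(width_pixels, height_in_packet):
    """Total burst transfers needed to fetch a region from one packet."""
    return bursts_per_row(width_pixels) * height_in_packet
```

For the 13 × 12 region of the example, bursts_per_row(13) evaluates to 2 and total_bursts(13, 12) to 24, matching the figures in the text.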
When reading packets, the region to be read in each packet must first be determined. Taking the luminance component as an example, let the coordinates of the top-left corner of the read region in the whole image be (start_p_posx, start_p_posy). The height ref_p_height and width ref_p_width of the reference region can be determined from the partition type and the interpolation position. From these parameters the coordinates of the bottom-right corner of the read region in the image, (end_p_posx, end_p_posy), can be calculated as:
end_p_posx=start_p_posx+ref_p_width
end_p_posy=start_p_posy+ref_p_height
Because the width and height of each packet are both 32 pixels, whether the reference region crosses a packet boundary in the horizontal direction can be determined by comparing start_p_posx[10:5] with end_p_posx[10:5] (since each packet is 32 pixels and 32 is 2 to the power of 5, the low 5 bits [4:0] need not be compared; only the higher bits are compared). If the two are equal, the region does not cross a packet boundary horizontally; otherwise it does. Whether the reference region crosses a packet boundary in the vertical direction is determined in the same way by comparing start_p_posy[10:5] with end_p_posy[10:5]. The read addresses and burst transfer lengths are then generated according to whether a packet boundary is crossed and, if so, whether it is crossed horizontally or vertically.
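The boundary test above can be sketched in a few lines. The function names are hypothetical; the end coordinates use the formulas given in the text (start plus width or height), and the comparison uses bits [10:5] of each coordinate.

```python
def packet_index(pos):
    """Extract bits [10:5] of a coordinate, i.e. its 32-pixel packet index."""
    return (pos >> 5) & 0x3F


def region_crossing(start_x, start_y, ref_width, ref_height):
    """Return (crosses_horizontally, crosses_vertically) for a read region."""
    # End coordinates exactly as defined in the text.
    end_x = start_x + ref_width
    end_y = start_y + ref_height
    crosses_h = packet_index(start_x) != packet_index(end_x)
    crosses_v = packet_index(start_y) != packet_index(end_y)
    return crosses_h, crosses_v
```

As an illustration, a 13 × 12 region starting at (20, 4) ends at (33, 16): it crosses a packet boundary horizontally but not vertically, so region_crossing(20, 4, 13, 12) returns (True, False).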
In the reference picture packing module, the depth of the luminance component cache is 216. If the macroblock partition is 8 × 8 and the sub-macroblock partition is 4 × 8 or 4 × 4, and pixel interpolation is performed in the horizontal direction, the bit width of the luminance component cache is 72 bits; otherwise it is 64 bits.
The smallest luminance partition is 4 × 4, and predicting one 8 × 4 block requires at most 24 × 9 pixel values, so predicting the luminance component (unidirectional) of one macroblock requires a total of 1728 bytes (216 × 8 pixels) of storage as the reference pixel cache. The cache is 72 bits (9 pixels) wide and 216 deep, and every byte has its own write enable (wen) signal. The reference region base address used by each macroblock partition/sub-macroblock partition is fixed and is determined by the type of the partition/sub-partition, the macroblock partition number and the sub-macroblock partition number. The macroblock partition type and macroblock partition number determine the reference region base address of the macroblock partition, as shown in Table 1:
Table 1:
Only 8 × 8 macroblock partitions contain sub-macroblock partitions. In that case the offset address of the sub-macroblock partition's reference region is determined by the sub-macroblock partition type and the sub-macroblock partition number, as shown in Table 2:
Table 2:
The address of a partition's reference region is obtained by adding the base address in Table 1 to the offset address in Table 2.
When the macroblock partition is 8 × 8, the sub-macroblock partition is 4 × 8 or 4 × 4, and pixel interpolation is performed in the horizontal direction, the cache uses the full 72-bit (9-pixel) width; otherwise the cache uses only a 64-bit (8-pixel) width.
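The luminance bit-width rule can be expressed as a small selection function. This is only a sketch: the function name and the representation of partitions as (width, height) tuples are hypothetical.

```python
def luma_cache_width_bits(mb_partition, sub_partition, horizontal_interp):
    """Select the luminance cache bit width used for one partition.

    mb_partition and sub_partition are (width, height) tuples in luma
    pixels; sub_partition may be None when there is no sub-partition.
    """
    narrow_sub = sub_partition in {(4, 8), (4, 4)}
    if mb_partition == (8, 8) and narrow_sub and horizontal_interp:
        return 72  # 9 pixels per cache word
    return 64      # 8 pixels per cache word
```

With this sketch, an 8 × 8 partition split into 4 × 4 sub-partitions with horizontal interpolation selects 72 bits, while a 16 × 16 partition selects 64 bits.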
In the reference picture packing module, the depth of the chrominance component cache is 48. When the macroblock partition is larger than 8 × 8, or when the macroblock partition and the sub-macroblock partition are both 8 × 8, the bit width of the chrominance component cache is 64 bits; otherwise it is 48 bits, and the unused 16 bits are set as invalid bits.
Considering only the YUV420 format, the resolution of the chrominance components is half that of the luminance component in both the horizontal and vertical directions. The smallest partition unit of a single chrominance component is therefore 2 × 2, and interpolating each 2 × 2 block requires reading at most 3 × 3 pixels. To save bandwidth, the blue-difference component Cb and the red-difference component Cr are stored alternately, so predicting the chrominance components (unidirectional) of one macroblock requires a total of 384 bytes (24 × 16 pixels) of internal storage as the reference pixel cache. This cache is 64 bits (8 pixels) wide and 48 deep, and every byte is gated by its own wen signal. The reference region base address used by each macroblock partition/sub-macroblock partition is fixed and is determined by the type of the partition/sub-partition, the macroblock partition number and the sub-macroblock partition number. The macroblock partition type and macroblock partition number determine the reference region base address of the macroblock partition, as shown in Table 3:
Table 3:
Only 8 × 8 macroblock partitions contain sub-macroblock partitions. In that case the offset address of the sub-macroblock partition's reference region is determined by the sub-macroblock partition type and the sub-macroblock partition number, as shown in Table 4:
Table 4:
The address of a partition's reference region is obtained by adding the base address in Table 3 to the offset address in Table 4.
When the macroblock partition is larger than 8 × 8, or when the macroblock partition and the sub-macroblock partition are both 8 × 8, for example when the current operation is sub-pixel chroma interpolation of a 16 × 16 macroblock partition (16 × 16 refers to the luminance size of the macroblock; the corresponding chroma partition is actually 8 × 8), the cache uses the full 64-bit (8-pixel) width. When the current operation is sub-pixel chroma interpolation of a 4 × 4 partition (4 × 4 refers to the luminance size; the corresponding chroma partition is actually 2 × 2), the cache uses a 48-bit (6-pixel) width and the remaining 16 bits are set as invalid bits.
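The chrominance bit-width rule can be sketched in the same way. The function name and tuple representation are hypothetical, and "larger than 8 × 8" is interpreted here, as an assumption, as a luma partition area above 64 pixels (16 × 16, 16 × 8 or 8 × 16).

```python
def chroma_cache_width_bits(mb_partition, sub_partition=None):
    """Select the chrominance cache bit width for one partition.

    Partitions are (width, height) tuples given in luma pixels, as in
    the text; the chroma partition is half that size in each direction.
    """
    width, height = mb_partition
    # Assumption: "larger than 8 x 8" means a luma area above 64 pixels.
    larger_than_8x8 = width * height > 64
    both_8x8 = mb_partition == (8, 8) and sub_partition == (8, 8)
    if larger_than_8x8 or both_8x8:
        return 64  # full 8-pixel cache word
    return 48      # 6 pixels used, remaining 16 bits marked invalid
```

Under this reading, a 16 × 16 partition selects 64 bits and an 8 × 8 partition with 4 × 4 sub-partitions selects 48 bits.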
In the reference picture cache module, the luminance component cache consists of four dual-port SRAMs totaling 7776 bytes (62.208 Kbits), each 72 bits (9 pixels) wide and 216 deep; the chrominance component cache consists of four dual-port SRAMs totaling 1536 bytes (12.288 Kbits), each 64 bits (8 pixels) wide and 48 deep. The reference picture cache module stores the forward and backward reference video data for motion compensation; this data is used by the motion compensation module (MC) outside the system of the present invention.
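The stated cache capacities follow directly from the SRAM widths and depths, which the short check below reproduces. The helper name is hypothetical; the numbers are those given in the text.

```python
def sram_bytes(width_bits, depth, banks=4):
    """Total capacity in bytes of `banks` SRAMs of the given geometry."""
    return banks * width_bits * depth // 8


luma_bytes = sram_bytes(72, 216)   # four 72-bit x 216 SRAMs
chroma_bytes = sram_bytes(64, 48)  # four 64-bit x 48 SRAMs
luma_kbits = luma_bytes * 8 / 1000
chroma_kbits = chroma_bytes * 8 / 1000
```

This gives 7776 bytes (62.208 Kbits) for the luminance cache and 1536 bytes (12.288 Kbits) for the chrominance cache, so the byte figures describe the four SRAMs together rather than each SRAM individually.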
The reference pixel read and storage system of the present invention can meet the DMA transfer demand for decoding reference data of one 1080p stream at 60 frames/s, and supports at most eight 576p streams at 30 frames/s. The system data bus is 64 bits wide, the peak bandwidth is < 1.6 GB, and the total memory consumption is roughly 78 kbits.
Claims (10)
1. A reference pixel read and storage system, characterized in that it comprises a motion vector prediction unit, a macroblock address generation unit and a reference pixel reading unit that are connected, the motion vector prediction unit being connected to both the reference pixel reading unit and a control unit, and the macroblock address generation unit being connected to both the reference pixel reading unit and the control unit, wherein:
The motion vector prediction unit uses the macroblock vector data supplied by the system to compute the motion vector prediction of the current macroblock partition, and then adds the computed motion vector prediction to the corresponding motion vector residual to obtain the motion vector of the current partition; the motion vectors of the last row of partitions of the current macroblock are written to the external memory through the data bus interface as the upper-neighbor motion information of the next row of macroblocks, and the motion vector of the current macroblock is supplied to the macroblock address generation unit;
The macroblock address generation unit generates the reference pixel read address from the received motion vector: according to the positions of the current macroblock partition and sub-partition, the width and height of the image, and the prescribed image block organization, it computes the offset address, the burst transfer length and the height of the reference pixels to be read; the offset address is added to the base address of the reference frame to form the read address of the reference pixels, which is passed to the data bus interface to generate the reference pixel read operation;
The reference pixel reading unit reads pixel data from the external memory into the internal cache space of the system.
2. The reference pixel read and storage system as claimed in claim 1, characterized in that the motion vector prediction unit comprises a motion information cache module and a motion vector processing module that are connected to each other, the motion information cache module being connected to the reference pixel reading unit, wherein:
The motion information cache module consists of an input buffer and an output buffer, and provides the base address and burst transfer length to the reference pixel reading unit;
The motion vector processing module starts the computation corresponding to the motion vector prediction type of each macroblock to obtain the motion vector, saves the resulting motion vector, and passes it to the macroblock address generation unit.
3. The reference pixel read and storage system as claimed in claim 2, characterized in that the input buffer of the motion information cache module buffers the upper-neighbor motion information, the motion information of the first backward reference frame, and the picture order count; after the system start signal is asserted, the current macroblock type is determined first: if the macroblock type is B_direct_16x16, the neighbor information is not read; if direct prediction exists in the macroblock, the motion information of the first backward reference frame is not read; otherwise, all motion information is read from memory into the motion vector static memory; the output buffer of the motion information cache module buffers the upper-neighbor motion information, the motion information of the current macroblock, and the picture order count, and after the motion vector computation of the last sub-partition of the current macroblock is complete, the buffered information is written out to the external memory; when the motion information of the current macroblock is saved, the forward motion information is saved whenever the current macroblock has forward prediction, and the backward motion information is saved only when the current macroblock has backward prediction alone;
The motion vector processing module determines the motion vector prediction type from the current macroblock type information, then obtains the motion information required for motion vector prediction from the cache, storing data read from the external memory into the corresponding cache, then starts the motion vector computation corresponding to the motion vector prediction type, and, after the motion vector computation is complete, saves the resulting motion vector into the corresponding cache and transfers it to the macroblock address generation unit.
4. The reference pixel read and storage system as claimed in claim 1, characterized in that the macroblock address generation unit includes an address generation module and a reference picture packing module that are connected, the address generation module being connected to both the reference pixel reading unit and the control unit, and the reference picture packing module being connected to both the reference pixel reading unit and the control unit; the address generation module computes the offset address of the top-left pixel of the current partition's reference image data within each pixel data packet, the burst transfer length for the current pixel data packet, and the height within the pixel data packet, and latches these values.
5. The reference pixel read and storage system as claimed in claim 4, characterized in that, when the address generation module latches pixel data packets, the off-chip reference image is stored in units of blocks, and the luminance and chrominance components are stored separately.
6. The reference pixel read and storage system as claimed in claim 4, characterized in that, in the reference picture packing module, the depth of the luminance component cache is 216; if the macroblock partition is 8 × 8 and the sub-macroblock partition is 4 × 8 or 4 × 4, and pixel interpolation is performed in the horizontal direction, the bit width of the luminance component cache is 72 bits; otherwise it is 64 bits.
7. The reference pixel read and storage system as claimed in claim 4, characterized in that, in the reference picture packing module, the depth of the chrominance component cache is 48; when the macroblock partition is larger than 8 × 8, or when the macroblock partition and the sub-macroblock partition are both 8 × 8, the bit width of the chrominance component cache is 64 bits; otherwise it is 48 bits, and the unused 16 bits are set as invalid bits.
8. The reference pixel read and storage system as claimed in claim 1, characterized in that the reference pixel reading unit comprises a reference picture cache module and a data bus interface, wherein the reference picture cache module stores the forward and backward reference video data for motion compensation, and the data bus interface reads data and transfers it to the reference picture cache module.
9. The reference pixel read and storage system as claimed in claim 8, characterized in that, in the reference picture cache module, the luminance component cache consists of four dual-port SRAMs totaling 7776 bytes, each 72 bits wide and 216 deep, and the chrominance component cache consists of four dual-port SRAMs totaling 1536 bytes, each 64 bits wide and 48 deep.
10. The reference pixel read and storage system as claimed in claim 8 or 9, characterized in that the data bus interface is a bus interface conforming to the AXI bus specification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110157144 CN102223543B (en) | 2011-06-13 | 2011-06-13 | Reference pixel read and storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102223543A CN102223543A (en) | 2011-10-19 |
CN102223543B true CN102223543B (en) | 2013-09-04 |
Family
ID=44779950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110157144 Expired - Fee Related CN102223543B (en) | 2011-06-13 | 2011-06-13 | Reference pixel read and storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102223543B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9325990B2 (en) * | 2012-07-09 | 2016-04-26 | Qualcomm Incorporated | Temporal motion vector prediction in video coding extensions |
CN104811721B (en) * | 2015-05-26 | 2017-09-22 | 珠海全志科技股份有限公司 | The computational methods of decoded video data storage method and motion vector data |
CN107231559B (en) * | 2017-06-01 | 2019-11-22 | 珠海亿智电子科技有限公司 | A kind of storage method of decoded video data |
WO2020140244A1 (en) * | 2019-01-03 | 2020-07-09 | 北京大学 | Video image processing method and device, and storage medium |
CN114466196B (en) * | 2022-04-11 | 2022-07-08 | 苏州浪潮智能科技有限公司 | Video data processing method, system, device and computer readable storage medium |
CN114866855B (en) * | 2022-04-29 | 2024-01-19 | 陕西科技大学 | Abnormal display screen pixel data organization and transmission method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101127913A (en) * | 2006-08-18 | 2008-02-20 | 富士通株式会社 | Interframe prediction processor with mechanism for providing locations of reference motion vectors |
CN101257625A (en) * | 2008-04-01 | 2008-09-03 | 海信集团有限公司 | Method for indexing position in video decoder and video decoder |
CN102055981A (en) * | 2010-12-31 | 2011-05-11 | 北京大学深圳研究生院 | Deblocking filter for video coder and implementation method thereof |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4712643B2 (en) * | 2006-08-17 | 2011-06-29 | 富士通セミコンダクター株式会社 | Inter-frame prediction processing apparatus, inter-frame prediction method, image encoding apparatus, and image decoding apparatus |
EP2104356A1 (en) * | 2008-03-18 | 2009-09-23 | Deutsche Thomson OHG | Method and device for generating an image data stream, method and device for reconstructing a current image from an image data stream, image data stream and storage medium carrying an image data stream |
Also Published As
Publication number | Publication date |
---|---|
CN102223543A (en) | 2011-10-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20130904 |