Summary of the invention
In view of the above problems of the prior art, the object of the present invention is to provide a double data rate synchronous dynamic random access memory (DDR SDRAM) control device and method with asynchronous buffering, so as to achieve efficient data access, use the data bandwidth of the DDR SDRAM as fully as possible, improve data throughput, and meet the demand of high-definition video decoding for large-volume data access operations.
To achieve the above object, the technical solution adopted by the present invention is a DDR SDRAM control device with asynchronous buffering, comprising the following components:
An arbiter, which is connected to each client module and to the write buffer and the read buffer of the memory controller interface. It determines, according to the priority of the client modules' read and write requests, which module's request is served this time; it generates the interface signals of each client module, such as the read/write acknowledgements and the read/write end signals, together with the read and write control signals of the write buffer and the read buffer in the controller interface; and it performs input and output service for the data of the client modules' read and write requests;
A controller interface, which is connected to the arbiter and the memory controller and comprises two asynchronous first-in first-out (FIFO) queues, a write buffer and a read buffer. According to the output signals of the arbiter, it buffers the address, command and write data of the read and write requests, and it buffers the read data returned from the DDR SDRAM;
A controller, which is connected to the memory and the controller interface. It generates the interface control signals of the memory according to the logic function timing of the memory, determines the number of memory banks and rows to open according to the state of the read and write buffers and the nature of the request, opens the banks and rows to be accessed, and writes data to or reads data from the memory.
The arbiter orders request priority as: display output > read request > write request.
The arbiter contains an internal read-request tag FIFO queue, which temporarily stores the tag of each read request and its number of requested data words.
The memory consists of two 1M × 32 bit × 4 bank memories in an improved ball grid array (FBGA) package; its row address and column address are multiplexed, the row address having 13 bits and the column address 8 bits.
The memory is organized so that one row of one bank holds 2 × 4 macroblocks.
A control method for a DDR SDRAM with asynchronous buffering comprises the following steps:
Step 1: the arbiter checks whether its read/write request flag register is non-zero. If it is non-zero and the write buffer is not full, the arbiter sends a write control signal to the write buffer of the memory controller interface; otherwise it continues checking. At the same time it checks the read buffer of the memory controller interface and, if the read buffer contains data, sends a read signal;
Step 2: the write buffer of the memory controller interface buffers the address, command and write data of the read and write requests according to the output signals of the arbiter; the read buffer of the memory controller interface, driven by the data strobe signal returned from the DDR SDRAM, buffers the read data sent back by the DDR SDRAM;
Step 3: the controller checks whether there is a read or write command. If there is, it determines from the command the number of banks and rows to open, first opens the banks and rows to be accessed, then issues the read or write command and writes data to or reads data from the memory;
Step 4: the controller checks whether the current request has been fully processed. If it has, the controller checks for a refresh request; if there is one, it enters the refresh state, waits for the refresh to complete and returns to step 3. If there is no refresh request, it checks for a new request; if there is one, it closes all banks and returns to step 3, and if there is none it waits in step 4.
Step 1 above further comprises the following steps:
Step 1a: the arbiter checks whether a service is in progress. If so, it continues the current service. If the previous service has finished, it checks whether the read/write request flag register is 0 and whether the write buffer is full; if the register is 0 or the buffer is full, it proceeds to the next clock cycle;
Step 1b: otherwise, if a request is pending and the write buffer is not full, the arbiter sets the service flag register and serves the read or write request.
In step 1a, two acknowledge signals are used when serving a read request. One is the request acknowledge, which lasts one clock cycle; the other is the data acknowledge. When the number of data words in the read buffer queue of the controller interface is greater than zero, the arbiter pops a read request from its internal read-request tag queue, sends the data acknowledge corresponding to that request to the requesting client module, and sends a read-data signal to the controller interface to fetch data from the read buffer. When all data for that request have been sent, it pops the tag of the next read request and its data count from the read-request tag queue.
When responding to a read request in step 1, the arbiter not only sends the request address to the write buffer but also stores the read-request tag in its internal tag FIFO queue, which temporarily holds the tag of the read request and its number of requested data words.
In the first beat of serving a request in step 1, the arbiter writes the horizontal request count and the vertical request line count into the write buffer together with the request address; these two values are valid in the write buffer only at the start address of the request.
The refresh in step 4 is generated by a refresh counter in the controller, which issues a refresh request every 1000 clock cycles.
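The refresh counter can be pictured with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the class name `RefreshCounter`, the `pending` flag and the `acknowledge` method are hypothetical and do not come from the specification, which states only that a refresh request is issued once per 1000 clock cycles.

```python
class RefreshCounter:
    """Behavioral model: raise a refresh request once every `period` clocks."""

    def __init__(self, period=1000):
        self.period = period      # 1000 clock cycles, as stated in the text
        self.count = 0
        self.pending = False      # hypothetical flag seen by the controller

    def tick(self):
        """Advance one controller clock cycle."""
        self.count += 1
        if self.count == self.period:
            self.count = 0
            self.pending = True   # request stays up until it is served

    def acknowledge(self):
        """Called when the controller enters the refresh state (assumed)."""
        self.pending = False
```

In this model the request stays pending until the controller reaches step 4 and serves it, which matches the text's statement that the refresh is checked only after the current request has been processed.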
The present invention has significant advantages and positive effects. In a video decoding system, according to the requirements of the decoding pipeline architecture, the present invention arbitrates the read and write requests of the client modules sensibly, controls the read and write access operations, and completes the data access requests of each module of the decoding system. It thereby achieves efficient data access, uses the data bandwidth of the memory as fully as possible, and meets the demand of high-definition video decoding for large-volume data access operations.
1. The arbiter arbitrates in the priority order display output request > read request > write request and returns read data as early as possible, so that the time from issuing a read request to receiving the read data is as short as possible. This guarantees that the system can decode data in real time in a pipelined manner. The arbiter module can output read data to the requesting client module while it is sending a request address, so the two operations proceed simultaneously and control efficiency is improved.
2. The read and write buffers in the memory controller interface module are asynchronous, so the clock of the memory control core is not affected by the system clock. The memory controller can therefore be designed independently and can control memories of different frequencies, which gives this interface good extensibility and generality.
3. The present invention uses the four banks of the DDR SDRAM and makes full use of its storage space. The mapping from logical address to physical address is chosen according to the logical address access pattern, which meets the system's data throughput requirement while reducing the difficulty of designing the controller state machine. The present invention reduces the handshaking between the DDR SDRAM controller and the controller interface as far as possible, so that whenever read or write commands are present in the write buffer they are issued to the DDR SDRAM controller as continuously as possible and unnecessary wait states are minimized. This makes full use of the data bandwidth of the DDR SDRAM to meet the demand of real-time high-definition decoding for large amounts of data, while reducing design complexity and the resource consumption of the controller.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
A video decoding chip contains eight client modules that read and write the memory; they are the eight modules of the hardware decoding pipeline: the display output module (Display feeder), the motion compensation module (Motion compensation), the motion vector prediction module (Premv), the reference picture store module (Reference store), the audio decoding module (Audio), the variable length decoding module (Vld), the transport stream demodulation module (transport) and the host interface module (Host interface). Of these eight modules, the reference picture store module has only write requests; the display output, motion compensation, audio decoding and variable length decoding modules have only read requests; and Premv, transport and host interface have both write and read requests. The read/write relationships among the client modules are: the reference picture store module writes, and the display output and motion compensation modules read; the host interface module writes, and the host interface and transport stream demodulation modules read; the transport stream demodulation module writes, and the variable length decoding and audio decoding modules read; the motion vector prediction module writes, and the motion vector prediction module reads. Because the display feeder is responsible for outputting image data for display, it should be given the highest priority among these requests. In addition, because the DDR SDRAM clock is asynchronous to the system clock, an asynchronous read buffer (rfifo) must be used to buffer read-back data, so there is a long wait between a module issuing a read request and the arbiter (arbitor) returning the data acknowledge signal data_ack. The arbiter should therefore give read requests higher priority than write requests. The priority order of the arbiter is: display output request > read request > write request.
Referring to the device structure diagram of Fig. 1, the present invention comprises the following components:
1. Arbiter (arbitor). It is connected to each client module and to the write buffer and read buffer of the memory controller interface. It determines, according to the priority of the client modules' read and write requests, which module's request is served this time, generates the interface signals of each client module and the read and write control signals of the write buffer and read buffer in the controller interface, and performs input and output service for the data of the client modules' requests. A client module sends a request valid signal (client_req), the number of requested data words in the horizontal direction (size), the number of requested data lines in the vertical direction (line), the difference between line start addresses (pitch), and the read/write indicator (rw). client_req is active high and indicates a valid request; line and pitch are used for two-dimensional requests; rw is used by modules that both read and write, with 1 meaning a read request and 0 a write request. The write data of a writing module is Client_wdata. The arbiter returns the signals req_ack, data_ack, rd_end and wr_end. req_ack is active high and indicates that the module's request has been accepted; data_ack is used for read requests and indicates that read data are being returned; rd_end indicates that all data of a read request have been returned and the read service is finished; wr_end is used for write requests and indicates that the write service is finished.
The arbiter works in the following steps:
Step 1: the arbiter (arbitor) checks whether it can serve a new request. If it cannot, a service is in progress, so it continues that service and sends the request address and the read or write command to the write buffer.
Step 2: if the arbiter finds that it can serve a new request, it checks whether any request is pending. If there is a request and the write buffer is not full, it decides which client to serve according to the priority order, records that client in the corresponding bit of the token register, and begins serving it.
2. Controller interface (Ctrl_if)
Ctrl_if connects the arbiter and the memory controller. It contains two asynchronous FIFO queues, wfifo and rfifo. wfifo buffers the address, command and write data sent by the arbiter; rfifo buffers the data read back from the DDR SDRAM. The Ctrl_if module is the interface between the arbiter and the controller core. On one side, under the control of the controller core, it converts the 20-bit logical address buffered in wfifo into an SDRAM physical address and passes it to the controller, passes the write data in wfifo to the controller, and writes the data read back from the DRAM into rfifo, driven by the DRAM read data latch signal ddr_dqs (the DDR SDRAM data strobe). On the other side, under the control of the arbiter, it outputs the data in rfifo to the client.
3. Controller (ddr_sdr)
The controller generates the interface control signals of the memory according to the logic function timing of the memory. From the state of the write buffer and the specific characteristics of the read or write command, such as whether it is a one-dimensional or two-dimensional access and how many banks and rows need to be opened, it determines the number of banks and rows to open, opens the banks and rows to be accessed, and writes data to or reads data from the memory. The controller interfaces with the DDR SDRAM and generates the control signals it requires according to the DDR SDRAM read and write timing.
Fig. 2 is a schematic diagram of the arbiter's control flow. Read and write requests have fixed priorities. The display feeder is responsible for video data output and must therefore have the highest priority. Next, because the working clock of the DDR SDRAM is asynchronous to the system clock, an asynchronous FIFO must be used to buffer read-back data, so the latency of a read request is longer than that of a write request; if a read request goes unanswered for a long time, the system decoding pipeline may stall. Write requests have no such latency.
Referring to the write timing diagram of Fig. 3, once the request acknowledge signal (req_ack) is returned, write data are written into wfifo continuously, and the write operation completes after a number of system clocks equal to the number of requested data words (size × lines). Read requests must therefore be given higher priority. The complete order is: display feeder read > motion compensation read > vld read > audio read > transport read > premv read > host interface read > reference store write > premv write > transport write > host interface write.
In Fig. 2, at each rising edge of the clock (clk), the arbiter checks whether a service is in progress; if so, it continues the current service. If the previous service is just finishing or has finished, it evaluates (|requester) && !i_wfifo_full. If the result is 0, either wfifo is full or there is no read or write request, so nothing is done in this clock cycle. If the result is 1, there is a request and wfifo is not full, so the next request can be served. The arbiter then examines requester, the request flag register. requester is an 11-bit register whose bits, from the most significant to the least significant, represent: display output module read request (display feeder read), motion compensation module read request (motion compensation read), variable length decoding module read request (vld read), audio decoding module read request (audio read), transport stream demodulation module read request (transport read), motion vector prediction module read request (premv read), host interface module read request (host interface read), reference picture store module write request (reference store write), motion vector prediction module write request (premv write), transport stream demodulation module write request (transport write) and host interface module write request (host interface write). Priority decreases bit by bit from the most significant bit to the least significant bit, which gives display output module priority > read request module priority > write request module priority. Whenever a higher-priority request is pending, lower-priority requests must wait.
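The fixed-priority selection over the 11-bit requester register can be sketched in software. The Python model below is a minimal illustration, not the hardware implementation: the bit order follows the text, while the string labels and the function names `can_serve` and `select_client` are hypothetical.

```python
# Bit order of the 11-bit requester register, most significant bit first.
# The order follows the text; the labels themselves are illustrative.
REQUESTERS = [
    "display_feeder_read",
    "motion_compensation_read",
    "vld_read",
    "audio_read",
    "transport_read",
    "premv_read",
    "host_interface_read",
    "reference_store_write",
    "premv_write",
    "transport_write",
    "host_interface_write",
]
WIDTH = len(REQUESTERS)  # 11 bits


def can_serve(requester: int, wfifo_full: bool) -> bool:
    """True when at least one request is pending and wfifo has room."""
    return requester != 0 and not wfifo_full


def select_client(requester: int, wfifo_full: bool):
    """Return the highest-priority pending request, or None if none can be served."""
    if not can_serve(requester, wfifo_full):
        return None
    for i, name in enumerate(REQUESTERS):
        bit = WIDTH - 1 - i  # bit 10 is the display feeder, bit 0 the host write
        if requester & (1 << bit):
            return name
    return None
```

For example, with both the reference store write bit and the host interface write bit set, `select_client` returns the reference store write, because it sits at a more significant bit position.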
Among the requests issued by the eight client modules, read requests, mainly from the motion compensation module, far outnumber write requests. To keep the system's pipelined decoding uninterrupted, a client module must not have to wait for all data of a read request to return before issuing a new read request. The present invention therefore sends two acknowledge signals to modules that issue read requests and only one acknowledge signal to client modules that issue only write requests. For this scheme to work correctly, the client modules with read requests and the interface of this design must use FIFO queues to record outstanding read requests. These FIFOs can be very small, so the two-acknowledge scheme keeps the pipeline of every processing module running without interruption at very little extra hardware cost.
Referring to the read timing diagram of Fig. 4, every module except the write-only reference picture store module (reference store) may issue read requests. A client module that issues a read request receives two acknowledge signals: the request acknowledge signal (req_ack) and the data acknowledge signal (data_ack). Unlike the continuous req_ack of the write timing, the req_ack of a read request is asserted for one system clock cycle after the arbiter accepts the read request. Once the client module sees req_ack asserted, it can store the information of its current request in its FIFO and then issue the next request. Correspondingly, when the arbiter accepts a read request it not only sends the request address to wfifo, as for a write request, but also stores the read-request tag in a FIFO inside the arbiter; this FIFO temporarily holds the tag of the read request and its number of requested data words. Whenever the arbiter sees rfifo_nempty = 1 returned by the controller interface (ctrl_if) module, meaning that rfifo contains read-back data, it pops a read request from the internal tag FIFO, sends the data_ack for that request to the requesting module, and sends the rfifo_strb signal to the ctrl_if module to fetch data from rfifo for that module. Only when all data of that request have been sent does it pop the tag of the next read request from the tag FIFO. Thus a reading module can issue its next request as soon as it sees req_ack, without waiting for all the data of the first request to return, so read requests are issued sooner. Moreover, the arbiter sends request addresses and delivers data from rfifo to the reading module at the same time. This mechanism pipelines data input and output, shortens both the arbiter's response time and the time from a module issuing a request to receiving its read data, and improves service efficiency.
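The interaction between the read-request tag FIFO and rfifo can be modeled in a few lines. The Python sketch below is a simplified behavioral illustration under stated assumptions: the class `ReadReturnPath` and its method names are hypothetical, one data word is delivered per clock, and the asynchronous clock crossing of rfifo is not modeled.

```python
from collections import deque


class ReadReturnPath:
    """Toy model of the arbiter's read-request tag FIFO and data return."""

    def __init__(self):
        self.tag_fifo = deque()  # entries: (client_id, data_count)
        self.rfifo = deque()     # data words read back from the DDR SDRAM
        self.current = None      # [client_id, words_remaining] being served

    def accept_read(self, client_id, data_count):
        """req_ack phase: record the tag so the client may issue its next request."""
        self.tag_fifo.append((client_id, data_count))

    def memory_returns(self, word):
        """A word arrives in rfifo from the memory side."""
        self.rfifo.append(word)

    def clock(self):
        """One system clock: deliver at most one word with data_ack.

        Returns (client_id, word, rd_end), or None when nothing is delivered.
        """
        if not self.rfifo:                   # rfifo_nempty == 0
            return None
        if self.current is None:
            if not self.tag_fifo:
                return None
            client_id, count = self.tag_fifo.popleft()
            self.current = [client_id, count]
        word = self.rfifo.popleft()          # corresponds to rfifo_strb
        self.current[1] -= 1
        client_id, remaining = self.current
        rd_end = remaining == 0
        if rd_end:
            self.current = None              # next tag is popped on a later clock
        return (client_id, word, rd_end)
```

Because tags are queued at acceptance time, data are returned strictly in request order, which is why a client can keep issuing requests after each req_ack without tracking which data belong to which request beyond its own small FIFO.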
The controller interface (Ctrl_if) module outputs read and write commands from wfifo to the controller and tells the controller in advance how many banks and how many rows must be opened.
Usually, in mainstream DDR SDRAM chips the capacity of each bank is fixed, so DDR SDRAMs of different capacities have different numbers of banks. Within a DDR SDRAM chip all banks have the same structure. This multi-bank organization enlarges the storage space and makes its use more flexible, but it also makes the DDR SDRAM controller more complex to design.
The internal bus of the video decoding system is 128 bits wide. First, the data bandwidth requirement dictates a 64-bit external bus. Second, because logical addresses are not accessed contiguously, physical addresses are not contiguous either, so the number of data words read or written consecutively after an address is issued to the SDRAM (burst_length) must be 2. To reduce the cost of the data storage system of the picture decoding system, the present invention uses two 1M × 32 bit × 4 bank DDR SDRAMs in an improved ball grid array (FBGA) package. The row address row_addr and the column address col_addr are multiplexed; row_addr has 12 bits and col_addr has 8 bits. Because burst_length = 2, the lowest bit of col_addr is always binary 1'b0 when an address is issued.
As shown in Fig. 5, to use the storage space of the four banks of the SDRAM effectively, the mapping from logical address to physical address is very important, and a sensible storage structure must be determined first. One luma macroblock (MB) of an image is 16 × 16 = 256 bytes, so one row of one bank can hold (2^8 × 32 bit × 2 chips) / (256 × 8 bit) = 8 luma macroblocks.
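The row capacity figure can be checked with simple arithmetic. The short Python sketch below restates the numbers given in the text; the constant names are illustrative only.

```python
COLUMN_ADDRESS_BITS = 8      # col_addr width given in the text
CHIP_DATA_WIDTH_BITS = 32    # each memory chip is 32 bits wide
CHIP_COUNT = 2               # two chips side by side form the 64-bit bus
MACROBLOCK_BYTES = 16 * 16   # one luma macroblock at one byte per pixel

# Bits stored in one row of one bank across both chips.
row_bits = (2 ** COLUMN_ADDRESS_BITS) * CHIP_DATA_WIDTH_BITS * CHIP_COUNT
macroblock_bits = MACROBLOCK_BYTES * 8
macroblocks_per_row = row_bits // macroblock_bits
```

With these values `row_bits` is 16384 and `macroblocks_per_row` is 8, which is the capacity that the 2 × 4 macroblock arrangement fills exactly.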
Next, the address request patterns of the modules are analyzed. The display output module (Display feeder), audio decoding module (audio), variable length decoding module (vld), transport stream demodulation module (transport) and host interface module (host interface) can read eight 128-bit data words with consecutive logical addresses in one request. Each luma request of the motion compensation module (MC, Motion compensation) reads the data of a 13 × 13 pixel block, and each chroma request reads a 5 × 5 pixel block. If the 13 × 13 luma block or the 5 × 5 chroma block straddles a macroblock boundary, the read request of the motion compensation module reads two 128-bit data words in the horizontal direction of the image; otherwise it reads one. In the vertical direction, in frame mode the motion compensation module reads 13 consecutive rows of luma data and 5 consecutive rows of chroma data. The amount of data written by one write request of the reference picture store module (Reference store) varies: it can be 6, 8 or 10 consecutive rows of luma data, or 7, 8 or 9 consecutive rows of chroma data. In field mode, the motion compensation module reads 13 interlaced rows of luma data and 5 interlaced rows of chroma data, and the reference picture store module writes 6, 8 or 10 interlaced rows of luma data, spanning 12, 16 or 20 rows vertically, or 7, 8 or 9 interlaced rows of chroma data, spanning 14, 16 or 18 rows vertically. A read request of the motion vector prediction module (Premv) takes one of two forms: it reads size 128-bit data words at consecutive addresses, or it reads the data of two adjacent macroblock rows; this variable is supplied when premv issues the request. One write request of the motion vector prediction module (Premv), the transport stream demodulation module (transport) or the host interface module (host interface) writes size 128-bit data words at consecutive logical addresses, where size is supplied with each request. Of all requests, mc accounts for more than 70 percent, reference store and display feeder for about 20 percent, and the other modules for very little.
Because the SDRAM incurs extra overhead when it changes bank or row, and especially when it changes row within a bank, where the previously opened row must be closed before a new row can be opened, the address mapping should keep the data of one request in the same row of one bank as far as possible. With 2 × 4 macroblocks stored in one row of a bank, a single access by a one-dimensional client such as the host interface module (host interface), transport stream demodulation module (transport), audio decoding module (audio), variable length decoding module (vld) or display output module (display feeder) may need to open two or even three banks, depending on the starting logical address and the number of data words requested in the horizontal direction.
As shown in Fig. 5, for example, if a write request of the transport module has start address 20'h12b06 (the 20-bit hexadecimal number 12b06) and size = 8, the request must write bank1, bank0 and bank1 in turn. Because the first bank1 and the second bank1 correspond to different rows, three bank-open operations are actually needed. The same holds for the other one-dimensional requests. Storing 2 × 4 macroblocks in one row of a bank thus makes one-dimensional requests relatively expensive. However, this arrangement minimizes the probability that motion compensation read requests and reference picture store write requests cross banks in the vertical direction, and because mc requests make up a large share of all requests, the 2 × 4 arrangement is more efficient than storing 1 × 8 = 8 luma macroblocks in one row of a bank.
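The bank sequence in this example can be reproduced with a small sketch. The Python code below rests on an assumption that is not stated explicitly in the text: that the low bank bit is taken from bit 2 of the horizontal index, so that every group of four horizontally adjacent macroblocks alternates between an even and an odd bank. The function name and the 7-bit high-definition index are illustrative.

```python
from itertools import groupby


def horizontal_bank_bit_sequence(start_addr: int, size: int) -> list:
    """Sequence of low bank bits visited by a one-dimensional request.

    Assumes the low bank bit equals bit 2 of the 7-bit horizontal index
    (high-definition mode); this mapping is an illustrative assumption.
    """
    xs = [(start_addr + i) & 0x7F for i in range(size)]
    bits = [(x >> 2) & 1 for x in xs]
    # Collapse runs of equal values: each run is one bank visit.
    return [b for b, _ in groupby(bits)]
```

Under this assumption, start address 20'h12b06 with size = 8 yields the sequence 1, 0, 1, matching the bank1, bank0, bank1 order described above, while start address 20'h12b00 yields 0, 1.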
In the storage structure shown in Fig. 5, one row of each bank holds 8 = 2 × 4 luma macroblocks. Adjacent bank0, bank1, bank2 and bank3 correspond to the same row, so as long as a request falls within the unit formed by adjacent bank0, bank1, bank2 and bank3, no row change is needed, and naturally none is needed while the request stays within one bank; the storage overhead of this arrangement is therefore minimal. In Fig. 5, cells of different colors correspond to the four different banks of the SDRAM, and cells of the same color belong to the same bank. Each dashed square represents one image macroblock (MB), which is an array of 16 × 16 pixels.
Once the storage structure has been determined, the mapping from logical address to physical address follows easily.
In high-definition mode each row has 2048/16 = 128 ('h80) macroblocks, and in standard-definition mode each row has 1024/16 = 64 ('h40) macroblocks, so the start addresses of adjacent macroblock rows differ by 128 × 16 = 'h800 (high definition) or 64 × 16 = 'h400 (standard definition). In Fig. 5, the unit formed by bank0, bank1 to its right, bank2 below it and bank3 on the diagonal lies in the same row. The address mapping for the high-definition and standard-definition modes is therefore as follows:
Table 1. Address mapping in high-definition and standard-definition modes
As described above, size is the number of logical addresses a request accesses in the horizontal direction and lines is the number in the vertical direction. Once the address mapping is fixed, the starting logical address of a request together with its size and lines determines how many banks and rows the request will use. When several banks are used, the controller needs to be told in advance which banks and rows will be used. The method adopted by the present invention is that, in the first cycle of serving a request, the arbiter writes the size and lines values into wfifo together with the request address. size and lines are valid in wfifo only at the start address of a request and are 0 otherwise. When ctrl_if pops a command from wfifo and finds size != 0, it knows that this is the start of a new request, and after popping the command it can determine from the start address, size and lines which banks will be used and their corresponding row addresses. At the start of the next cycle the controller sees this information and performs the corresponding preparatory operations. The drawback of this method is that it wastes part of the storage space of wfifo, but it simplifies the controller.
The calculation of open_bank_cnt, the number of banks a request needs to open, and open_row_cnt, the number of rows it needs to open, is described in detail below.
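The convention that size and lines are non-zero only on the first entry of a request can be illustrated with a short model. In the Python sketch below, the `WfifoEntry` fields and the helper names are hypothetical; the specification does not give the actual wfifo word layout.

```python
from collections import namedtuple

# Hypothetical wfifo word: the real field layout is not given in the text.
WfifoEntry = namedtuple("WfifoEntry", "addr rw size lines")


def build_entries(addresses, rw, size, lines):
    """Create the wfifo entries of one request.

    Only the first entry carries size and lines; all later entries hold 0.
    """
    entries = []
    for beat, addr in enumerate(addresses):
        first = beat == 0
        entries.append(
            WfifoEntry(addr, rw, size if first else 0, lines if first else 0)
        )
    return entries


def request_starts(entries):
    """Positions where ctrl_if would detect size != 0, i.e. a new request."""
    return [i for i, e in enumerate(entries) if e.size != 0]
```

Concatenating the entries of an 8-word request and a 2-word request gives request starts at positions 0 and 8, so the consumer can find request boundaries without any separate framing signal.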
1. Determine how many banks to open. If the number of banks to open in the horizontal direction is bank_x and the number in the vertical direction is bank_y, then bank_x × bank_y banks must be opened in total. For one-dimensional requests, with 2 × 4 macroblocks stored in one row of a bank, the number of banks a request must open depends on the starting logical address and size. Taking the transport access described earlier as an example, if the starting logical address is 20'h12b00 and size = 8, only bank0 and bank1 need to be opened. If the start address is 20'h12b06 and size = 8, however, bank0_0, bank1 and bank0_1 must be opened. bank0_0 and bank0_1 correspond to different row addresses, so bank0 must be closed and then opened again. This costs nearly 10 cycles and inevitably lowers access efficiency. The number of logical addresses accessed in the horizontal direction by a one-dimensional client module is therefore limited to size <= 8; otherwise a one-dimensional request might need to open more than three banks, which would greatly reduce access efficiency. Accordingly, if Kx denotes the number of banks to open in the horizontal direction, then Kx <= 3, and Kx = 3 can occur only for a one-dimensional access.
In the horizontal direction: let start_addr be the start address of a request, and define the variables X_addr[6:0], tmp0[6:0], tmp1[6:0] and Kx[1:0]. In standard definition X_addr = {1'b0, start_addr[5:0]}; in high definition X_addr = start_addr[6:0]. Because there are 128 macroblocks in the horizontal direction of the image in high-definition mode and 64 in standard-definition mode, X_addr is in effect the horizontal index of the logical address, with X_addr in the range 0 to 127 in high definition and 0 to 63 in standard definition. Let tmp0 = X_addr, tmp1 = X_addr + (size - 1) and Kx = tmp1[3:2] - tmp0[3:2], where size is the number of data words requested in the horizontal direction. In Fig. 5, one row of a bank holds 4 macroblocks in the horizontal direction of the image, so when the difference Kx is binary 2'b00 the request occupies one bank horizontally, when it is 2'b01 it occupies two banks, and when it is 2'b10 it occupies three banks.
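The horizontal calculation can be expressed directly as bit arithmetic. The Python sketch below follows the definitions of X_addr, tmp0, tmp1 and Kx given in the text; the function names are illustrative, and the explicit masks stand in for the fixed register widths of the hardware.

```python
def x_addr(start_addr: int, high_definition: bool) -> int:
    """Horizontal index: start_addr[6:0] in HD, {1'b0, start_addr[5:0]} in SD."""
    return start_addr & (0x7F if high_definition else 0x3F)


def banks_horizontal(start_addr: int, size: int, high_definition: bool) -> int:
    """Number of banks a request occupies in the horizontal direction."""
    tmp0 = x_addr(start_addr, high_definition)
    tmp1 = (tmp0 + (size - 1)) & 0x7F            # tmp1 is a 7-bit value
    # Kx = tmp1[3:2] - tmp0[3:2], computed as a 2-bit difference.
    kx = (((tmp1 >> 2) & 0x3) - ((tmp0 >> 2) & 0x3)) & 0x3
    return kx + 1                                # 2'b00 -> 1, 2'b01 -> 2, 2'b10 -> 3
```

For the transport example, `banks_horizontal(0x12B00, 8, True)` gives 2 and `banks_horizontal(0x12B06, 8, True)` gives 3, in agreement with the bank counts described for those two start addresses.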
In the vertical direction: the variables Y_addr[12:0], Ky, tmp2[12:0], tmp3[12:0] and lines[4:0] are defined, where lines is the number of logical address lines a request spans in the vertical direction. In frame mode a two-dimensional request accesses memory line by line, so lines equals the number of lines the client module accesses in the vertical direction. In field mode a two-dimensional request accesses interlaced lines, so lines is twice the number of lines the client module accesses in the vertical direction. In standard definition Y_addr = start_addr[18:6]; in high definition Y_addr = start_addr[19:7]. Y_addr is thus the coordinate of the logical address in the vertical direction. Because the storage scheme adopted by the present invention guarantees that a two-dimensional request uses at most two banks in the vertical direction, let tmp2 = Y_addr, tmp3 = Y_addr + (lines - 1) and Ky = (tmp3[5] == tmp2[5]); that is, Ky = 1 when tmp3[5] = tmp2[5], and Ky = 0 otherwise. As shown in Fig. 5, one bank is occupied in the vertical direction when Ky = 0, and two banks are occupied when Ky = 1.
2. After determining how many banks to open, determine how many rows to open, that is, the numerical relationship between the two.
If one bank is occupied in the horizontal direction, check how many banks are occupied in the vertical direction. If one bank is occupied in the vertical direction, the request only needs to open one bank and therefore one row. If two banks are occupied in the vertical direction, let bank be the bank address corresponding to the start address. If bank[1] = 1, the physical address corresponding to the start logical address is in bank2 (the next bank must then be bank0) or in bank3 (the next bank must then be bank1). In the vertical direction, the row addresses of the adjacent bank2 and bank0 are not equal, and the row addresses of the adjacent bank3 and bank1 are not equal, so two rows must be opened. Under the address mapping of the present invention, one macroblock line of a standard-definition image occupies 64/8 = 8 row addresses, and one macroblock line of a high-definition image occupies 128/8 = 16 row addresses. If the row address of the start logical address is row, the second row address is row+8 (standard definition) or row+16 (high definition). Otherwise, if bank[1] = 0, the start address maps to bank0 or bank1; as shown in Fig. 5, all addresses of the request then lie in one row, and only one row needs to be opened.
If two banks are occupied in the horizontal direction, check how many banks are occupied in the vertical direction. If one bank is occupied in the vertical direction, let bank be the bank address corresponding to the start address. If bank[0] = 0, the starting bank is bank0 (the next bank must then be bank1) or bank2 (the next bank must then be bank3); as shown in Fig. 5, one row needs to be opened. Otherwise bank[0] = 1, the starting bank is bank1 or bank3, and two rows must be opened; if the row address of the start logical address is row, the second row address is row+1. If two banks are occupied in the vertical direction, four banks are occupied in total, and the starting bank address is examined again. If the start address maps to bank0, only one row needs to be opened. If the starting bank is bank3, four rows must be opened; with row the row address of the start logical address, the four row addresses are row, row+1, row+8 (or row+16) and row+9 (or row+17). If the starting bank is bank1 or bank2, two rows must be opened; with row the row address of the start logical address, the row addresses of the four banks are row, row+1, row, row+1 when the starting bank is bank1, and row, row, row+8 (or row+16), row+8 (or row+16) when the starting bank is bank2.
If three banks are occupied in the horizontal direction, two rows must be opened. Let bank be the bank corresponding to the start logical address and row the corresponding row address. If bank[0] = 0, the starting bank is bank0 or bank2, and the three row addresses are row, row, row+1. If bank[0] = 1, the three row addresses are row, row+1, row+1.
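The horizontal bank count and the row addresses described above can be sketched as follows. This is only an illustrative model under stated assumptions: the function names horizontal_banks and rows_to_open are hypothetical, the vertical bank count is passed in rather than derived, and the three-bank case follows the pattern in which a request starting in bank0 or bank2 opens row, row, row+1.

```python
def horizontal_banks(start_addr, size, high_definition):
    """Number of banks occupied horizontally (Kx difference + 1)."""
    # X_addr is start_addr[6:0] in HD and {1'b0, start_addr[5:0]} in SD.
    x_addr = start_addr & (0x7F if high_definition else 0x3F)
    tmp0 = x_addr
    tmp1 = x_addr + (size - 1)
    # Kx = tmp1[3:2] - tmp0[3:2], kept to two bits as in Kx[1:0].
    kx = (((tmp1 >> 2) & 0x3) - ((tmp0 >> 2) & 0x3)) & 0x3
    return kx + 1


def rows_to_open(h_banks, v_banks, bank, row, high_definition):
    """Row address used by each bank the request touches, in access order.

    bank is the 2-bit bank address of the start logical address.
    """
    # One macroblock line spans 8 row addresses in SD and 16 in HD.
    step = 16 if high_definition else 8
    # Horizontally, the row address advances when moving past bank1/bank3.
    horizontal = [((bank & 1) + i) // 2 for i in range(h_banks)]
    # Vertically, the row address changes only when starting in bank2/bank3.
    vertical = [0] if v_banks == 1 else [0, step if bank & 2 else 0]
    return [row + v + h for v in vertical for h in horizontal]


if __name__ == "__main__":
    print(horizontal_banks(0x12B00, 8, True))      # start 20'h12b00, size 8
    print(horizontal_banks(0x12B06, 8, True))      # start 20'h12b06, size 8
    print(rows_to_open(2, 2, 3, 100, False))       # 2 x 2 banks from bank3, SD
```

The number of distinct values in the returned list is the number of rows that must be opened.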
The controller is responsible for interfacing with the DDR SDRAM and issues read/write commands according to the operating timing of the SDRAM.
The present invention uses the storage space of the four banks of the DDR SDRAM. When different requests access different banks, the corresponding RAS (row access strobe) command must therefore be issued to open a row of a given bank; only after the SDRAM timing is satisfied can a CAS (column access strobe) command be issued to read or write particular column addresses of the row opened in that bank. Within one request, RAS and CAS commands could thus be interleaved, which would raise the efficiency and performance of the controller. However, operating several banks simultaneously is already fairly complex, and interleaving RAS and CAS commands makes the state machine extremely complicated to control and consumes many hardware resources. A control method is therefore needed that meets the data throughput requirement of the system while consuming few hardware resources. Because the addresses requested by the eight client modules connected to the arbiter can be regarded as unrelated to one another, nearly every new request changes the row address. The present invention therefore adopts the following scheme: each time a new request is popped from wfifo, the controller first closes all banks, then opens the banks to be accessed, and then issues the read/write commands. This control scheme is simpler and meets the performance requirement of the system.
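The close-all, open, then read/write ordering described above can be illustrated with a small sketch. The function name command_sequence and the tuple encoding of commands are hypothetical, and SDRAM timing delays between commands are deliberately left out.

```python
def command_sequence(is_read, bank_rows, accesses):
    """Ordered command list for one request (timing gaps not modeled).

    bank_rows: list of (bank, row) pairs to open with RAS commands.
    accesses:  list of (bank, column) pairs to read or write with CAS commands.
    """
    # Close every bank first, since a new request almost always changes the row.
    commands = [("PRECHARGE_ALL",)]
    # Open all banks the request needs before any column access.
    commands += [("ACTIVE", bank, row) for bank, row in bank_rows]
    # Issue the column commands only after all required banks are open.
    op = "READ" if is_read else "WRITE"
    commands += [(op, bank, col) for bank, col in accesses]
    return commands


if __name__ == "__main__":
    for cmd in command_sequence(True, [(0, 5), (1, 5)], [(0, 0), (1, 0)]):
        print(cmd)
```

Keeping RAS and CAS commands in separate phases is what lets the state machine stay small, at the cost of the interleaving gains mentioned above.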
Suppose that the worst-case operating condition of the system is that all frames are B frames and that all prediction modes within the B frames are bidirectional. This extreme case cannot occur in practice, but it is the case in which each module issues the largest number of requests. In high definition, the decoding of each macroblock must be completed within 610 system clocks. The Memory Controller must therefore finish serving the read/write requests of each macroblock within 610 system clocks.
The requests issued by each module within one macroblock are listed in Table 2.
Table 2. Requests of the client modules within each macroblock
Clients | CAS cycles | Access | Overhead | Notes
host interface write | 1 | 0.125 | 1.2125 |
Transport write | 1 | 0.125 | 1.2125 |
VLD read | 1 | 0.125 | 1.4625 |
MV Prediction (read for B) | 1 | 0.125 | 1.4625 |
MC RefRead | 232 | 16 | 173.152 |
Reference store | 24 | 3 | 25.08 |
DispFeeder Read | 24 | 3 | 62.1 |
Audio read | 1 | 0.125 | 1.2125 |
Refresh | | | 8.54 |
SubTotal | 285 | 22.395 | 279.7345 | 564.7345
Total | | | | 636.9186
In this table, Clients lists the client requests connected to the Memory Controller. CAS cycles is the number of cycles of effective read/write commands that each request occupies on the Memory Controller during the processing of one macroblock. Access is the number of requests each module issues within one macroblock processing time; once Access has been calculated for each module, the ratio of read requests to write requests within one macroblock processing time is known. Overhead is the ineffective cost incurred when the Memory Controller handles the read/write requests of each module within one macroblock processing time, that is, the number of cycles that are not effective read/write commands.
1. The host interface module (host interface) reads or writes eight data items with consecutive logical addresses at a time. Although the host interface module issues read requests, they are used only for debugging, so their memory access overhead is not counted as effective memory access. Although the requests of the host interface module do not involve the notion of a macroblock, the unified address mapping adopted by the present invention allows them to be handled like macroblock requests. Within one macroblock processing time, the CAS cycle of the host interface module is therefore 8/8 = 1 and Access is 1/8 = 0.125.
2. The transport stream demultiplexing module (transport) both reads and writes, but at any time it is either reading or writing, so only the overhead of its write request needs to be calculated. CAS cycle is 8/8 = 1 and Access is 1/8 = 0.125.
3. Variable length decoding module (vld): CAS cycle is 8/8 = 1 and Access is 1/8 = 0.125.
4. The motion vector prediction module (premv), like transport, both reads and writes, and only one of the two needs to be considered. premv issues one request every 8 macroblocks, so CAS cycle is 8/8 = 1 and Access is 1/8 = 0.125.
5. Within one macroblock, the motion compensation module issues eight bidirectional-reference luma requests for 13 × 13 pixel blocks and eight chroma requests for 5 × 5 pixel blocks. A 13 × 13 luma block crosses a macroblock boundary in the horizontal direction with probability 12/16, and a 5 × 5 chroma block does so with probability 4/16. Therefore CAS cycle = (2 × 12/16 + 1 × 4/16) × 13 × 8 + (2 × 4/16 + 1 × 12/16) × 5 × 8 = 232, and Access = 2 × 4 × 2 = 16.
6. The reference picture store module issues two luma write requests and one chroma write request per macroblock. Each write request can be assumed to write 8 data items, so CAS cycle = 2 × 8 + 8 = 24 and Access = 2 + 1 = 3.
7. The requests of the display output module (display feeder) are similar to those of the reference picture store module: CAS cycle = 2 × 8 + 8 = 24 and Access = 2 + 1 = 3.
8. The audio decoder module requests 8 data items with consecutive logical addresses each time, so CAS cycle = 8/8 = 1 and Access = 1/8 = 0.125.
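The motion compensation figure in item 5 can be checked with a short calculation. This is a sketch under the assumption that a line crossing a macroblock boundary costs two CAS cycles and a non-crossing line costs one, and that each of the eight chroma blocks contributes five lines (the factor of eight for chroma is inferred from the stated total of 232). The helper name expected_cas_cycles is hypothetical.

```python
from fractions import Fraction


def expected_cas_cycles(lines_per_block, p_cross, blocks):
    """Expected CAS cycles for `blocks` blocks of `lines_per_block` lines each."""
    # A line that crosses a macroblock boundary needs 2 commands, otherwise 1.
    per_line = 2 * p_cross + 1 * (1 - p_cross)
    return per_line * lines_per_block * blocks


# Eight 13 x 13 luma blocks, crossing probability 12/16.
luma = expected_cas_cycles(13, Fraction(12, 16), 8)
# Eight 5 x 5 chroma blocks, crossing probability 4/16.
chroma = expected_cas_cycles(5, Fraction(4, 16), 8)
total = luma + chroma

if __name__ == "__main__":
    print(luma, chroma, total)
```

Exact fractions are used so that the comparison with the table entry does not depend on floating-point rounding.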
From the analysis above, read requests account for 19.375/22.395 = 0.86 of all accesses. The probability that a request follows a read command can therefore be taken as approximately 0.86. For the DDR SDRAM chip used in the present invention, the read/write turnaround overhead, that is, the interval from one effective read/write command to the next effective read/write command, is: read command to read command, 9 cycles; read command to write command, 7 cycles; write command to read command, 14 cycles; write command to write command, 12 cycles. In addition, each additional bank that is opened adds an overhead of two cycles.
The overhead incurred by the Memory Controller in serving the requests of each module is therefore calculated as follows:
1. Host interface module: overhead = (0.86 × 7 + 0.14 × 12 + 2)/8 = 1.2125.
2. Transport module: overhead = (0.86 × 7 + 0.14 × 12 + 2)/8 = 1.2125.
3. Variable length decoding module: overhead = (0.86 × 9 + 0.14 × 14 + 2)/8 = 1.4625.
4. Motion vector prediction module: the read request is assumed to be one-dimensional, which is the case with the larger overhead. overhead = (0.86 × 9 + 0.14 × 14 + 2)/8 = 1.4625.
5. Motion compensation module: under the address mapping adopted in this design, the probability that a read operation needs to open one bank is P1 = (20/32) × (1 - 12/(4 × 16)) = 0.51; the probability of two banks is P2 = (20/32) × (12/(4 × 16)) + (12/32) × (1 - 12/(4 × 16)) = 0.421; the probability of four banks is 1 - P1 - P2 = 0.07. Therefore overhead = (0.86 × 9 + 0.14 × 14 + 0.421 × 2 + 0.07 × 4) × 4 × 2 × 2 = 173.152.
6. Reference picture store module: analysis of its request pattern shows that the probability that a write request occupies two banks in the vertical direction is 20% in frame mode and 33% in field mode; 33% is used. Therefore overhead = (0.86 × 7 + 0.14 × 12 + 0.33 × 2) × 3 = 25.08.
7. Display output module: the start address of its requests is always 'hxxxx1 or 'hxxxx9. In this design its one-dimensional read request therefore always occupies 3 banks in the horizontal direction, and overhead = 3 × (0.86 × 9 + 0.14 × 14 + 2 + 9) = 62.1.
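The overhead figures in the list above can be reproduced with the following sketch. It assumes the read probability of 0.86 and the four turnaround intervals given earlier; the names GAP, turnaround and overheads are hypothetical and introduced only for illustration.

```python
P_READ, P_WRITE = 0.86, 0.14

# Interval in cycles from the previous command kind to the next command kind.
GAP = {("read", "read"): 9, ("read", "write"): 7,
       ("write", "read"): 14, ("write", "write"): 12}


def turnaround(kind):
    """Expected gap before a command of `kind`, weighted by the previous kind."""
    return P_READ * GAP[("read", kind)] + P_WRITE * GAP[("write", kind)]


overheads = {
    # One request per 8 macroblocks, plus 2 cycles for one extra bank.
    "host_interface": (turnaround("write") + 2) / 8,
    "transport": (turnaround("write") + 2) / 8,
    "vld": (turnaround("read") + 2) / 8,
    "mv_prediction": (turnaround("read") + 2) / 8,
    # 16 requests; extra banks weighted by P2 = 0.421 and P4 = 0.07.
    "motion_compensation": (turnaround("read") + 0.421 * 2 + 0.07 * 4) * 4 * 2 * 2,
    # 3 write requests; a second bank is needed with probability 0.33.
    "reference_store": (turnaround("write") + 0.33 * 2) * 3,
    # 3 read requests that always span 3 banks horizontally.
    "display_feeder": 3 * (turnaround("read") + 2 + 9),
}

if __name__ == "__main__":
    for name, value in overheads.items():
        print(f"{name}: {value:.4f}")
```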
The value 564.7345 in the Notes column of Table 2 indicates that the read/write requests issued within one macroblock decode time can be completed in 564.73 DRAM clocks (133 MHz); converted to the system clock, this is 564.73 × (148.5/133) = 636.91 system clocks. 610/636 = 88%, which requires that B frames with all-bidirectional reference make up no more than 88% of the stream. This condition is satisfied in actual decoding, because the worst case considered by the present invention, in which every frame is a B frame, does not occur in actual decoding. The data throughput of the controller design of the present invention can therefore meet the performance requirement of the system.
Fig. 7 shows the control flow of the controller of the present invention. The state machine has 13 states in total: initialization (initialization), wait for command (wait_cmd), open bank (three cases: active_1bank, active_2bank and active_4bank), begin read (begin_rd), begin write (begin_wr), read data (read), write data (write), precharge (precharge_all), refresh 0 (rfsh_0), refresh 1 (rfsh_1) and refresh 2 (rfsh_2). After reset, the state machine first performs the initialization the SDRAM needs for normal operation, then enters the wait_cmd state and at the same time tells the ctrl_if module that read/write commands may be sent. In the wait_cmd state, if a refresh request is found, the state machine enters the refresh states and returns to wait_cmd when the refresh is finished. If no refresh request is found but there is a new request, the state machine decides from the open_bank_cnt value supplied by ctrl_if whether to open 1, 2 or 4 banks and enters active_1bank, active_2bank or active_4bank.
For a one-dimensional request that accesses 3 banks, the present invention splits the request into two accesses when the arbiter serves it: an access to two banks followed by an access to one bank. The request is decomposed in advance in the memory interface module ctrl_if, which reduces the number of states of the controller state machine and makes the controller easier to implement. Because a single request of a client module may use all four banks of the DDR SDRAM, the present invention can open all four banks in the open-bank (active) state in order to improve memory access efficiency. Normal read/write commands access banks that are already open, so the active state is the most important step in which the control state machine handles the interface timing of the DDR SDRAM. In the active state, the controller opens the banks that the request will access according to the address distribution information sent by the ctrl_if module, and issues the corresponding bank addresses and row (RAS) addresses; these were described in detail in the ctrl_if section above. The controller sends this address information to the SDRAM in sequence according to the timing of the DDR SDRAM.
After all banks to be accessed have been opened, the state machine enters the begin read (begin_rd) state if the command is a read, and issues the starting column address and the read command. Otherwise it enters the begin write (begin_wr) state and issues the starting column address and the write command. Once normal reading or writing has started, the state machine moves from begin_rd to the read data (read) state, or from begin_wr to the write data (write) state. Because all banks of the request are open, the controller can accept the read commands sent by ctrl_if continuously in the read state, and can handle the write commands sent by ctrl_if continuously in the write state, until all commands of the request have been issued and the processing of the request ends.
In the write or read state, when the request ends, the state machine checks whether there is a refresh request. If there is one, it enters the refresh states and passes through refresh 0 (close all banks), refresh 1 (issue the auto-refresh command to the SDRAM) and refresh 2 (refresh complete) before returning to wait_cmd. In the refresh 0 state the controller first issues a precharge command (precharge). The reason is that the storage space of the DDR SDRAM consists of a capacitor array, and reading or writing the capacitors directly would damage the capacitor array. When the storage space is accessed, the data (1 or 0) held in the row of capacitors corresponding to the access address is therefore first latched into a row of static registers, which is the function of the active command, and the static registers are then read and written. A refresh charges and discharges the capacitors, so the data in the registers must first be written back to the capacitors, which is the function of the precharge command. In the refresh 1 state the controller issues the refresh command to the DDR SDRAM, whose internal logic then completes the refresh automatically. In the refresh 2 state, once the timing has been satisfied, that is, once the DDR SDRAM has finished refreshing and can accept new commands, the state machine goes directly to the open-bank (active) state (active_1bank, active_2bank or active_4bank) and starts a new round of read/write operations if ctrl_if has sent a memory access command; otherwise it returns to wait_cmd and continues to wait for a memory access command. Refresh requests are generated by a refresh counter in the controller, which issues one refresh request every 1000 clock cycles.
If no refresh request is found when the processing of a request ends in the read or write state, the state machine checks whether there is a new memory access command and, if so, moves to the precharge (precharge_all) state. The logical addresses of successive accesses are assumed to be non-contiguous, so the corresponding physical addresses are non-contiguous as well. For example, if one client module accesses row a of bank0 and the next client module needs to access row b of bank0, row a must be closed and row b opened before the data in row b can be accessed. Before each access, the present invention closes all open banks and then opens the banks that are needed; because the number of banks accessed is not fixed from request to request, all banks are simply closed before new banks are opened. In the precharge_all state the controller therefore issues the precharge_all command to close all banks first. Normally the state machine moves from precharge_all to the open-bank (active) state to perform the RAS operations for the new round of reading or writing, but it may also move to the refresh 1 (rfsh_1) state. To increase efficiency and remove unnecessary waiting time, the handshake signals between the controller and the control unit interface module ctrl_if have been reduced as far as possible, so no refresh command is exchanged between them. Suppose the refresh counter has not yet issued a refresh request when the state machine moves from the read or write state to precharge_all. Refresh requests are not checked in the active state, and the read and write states check them only after the processing of a request has ended, so a refresh request raised afterwards would be answered only after the precharge_all, active and read or write states have all been passed through, and the data stored on the capacitors in the SDRAM might be lost. If a refresh request is detected in the precharge_all state, the state machine therefore enters refresh 1 (rfsh_1) directly and issues the refresh command. The longest time a refresh request can go unanswered occurs when a 13 × 13 read request of the mc module is being served; the time from the active state to the end of the read state is then no more than 40 cycles. A 128 Mbit DDR SDRAM must refresh all 4096 rows of its capacitor array within 32 milliseconds, and the controller runs at 133 MHz, so a refresh command must be issued every 32 ms/4096/7.5 ns = 1041 clock cycles. This is why the refresh counter issues a refresh request every 1000 clock cycles.
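The state transitions described above, together with the refresh interval arithmetic, can be sketched as follows. This is a simplified model rather than the circuit of Fig. 7: the function name next_state and its parameters are hypothetical, SDRAM timing counters are omitted, and the return to wait_cmd when a request ends with neither a refresh request nor a new command pending is an assumption, since the text does not state that transition explicitly.

```python
ACTIVE = {1: "active_1bank", 2: "active_2bank", 4: "active_4bank"}

# 32 ms for 4096 rows at a 7.5 ns clock period (133 MHz), in whole cycles.
REFRESH_INTERVAL_CYCLES = int(32e6 / 4096 / 7.5)


def next_state(state, refresh_req=False, new_req=False, is_read=True,
               open_bank_cnt=1, request_done=False):
    """Next state of the simplified controller state machine."""
    if state == "initialization":
        return "wait_cmd"
    if state == "wait_cmd":
        if refresh_req:                      # refresh has priority over requests
            return "rfsh_0"
        return ACTIVE[open_bank_cnt] if new_req else "wait_cmd"
    if state in ACTIVE.values():             # all required banks are now open
        return "begin_rd" if is_read else "begin_wr"
    if state == "begin_rd":
        return "read"
    if state == "begin_wr":
        return "write"
    if state in ("read", "write"):
        if not request_done:                 # keep streaming column commands
            return state
        if refresh_req:
            return "rfsh_0"
        # Assumption: with nothing pending, fall back to wait_cmd.
        return "precharge_all" if new_req else "wait_cmd"
    if state == "precharge_all":             # banks are already closed here
        return "rfsh_1" if refresh_req else ACTIVE[open_bank_cnt]
    if state == "rfsh_0":
        return "rfsh_1"
    if state == "rfsh_1":
        return "rfsh_2"
    if state == "rfsh_2":
        return ACTIVE[open_bank_cnt] if new_req else "wait_cmd"
    raise ValueError(f"unknown state: {state}")


if __name__ == "__main__":
    print(REFRESH_INTERVAL_CYCLES)
    print(next_state("wait_cmd", new_req=True, open_bank_cnt=2))
```

Entering rfsh_1 directly from precharge_all skips rfsh_0 because precharge_all has already closed every bank.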
Finally, it should be noted that the above embodiments are intended only to illustrate, and not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention may still be modified or equivalently replaced, and that any modification or partial replacement that does not depart from the spirit and scope of the present invention shall be covered by the scope of the claims of the present invention.