WO2001037098A1 - Dispositif et systeme informatique - Google Patents

Dispositif et systeme informatique Download PDF

Info

Publication number
WO2001037098A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
memory
burst
data
data processing
Prior art date
Application number
PCT/JP1999/006371
Other languages
English (en)
Japanese (ja)
Inventor
Masayuki Ito
Yutaka Yoshida
Original Assignee
Hitachi, Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd filed Critical Hitachi, Ltd
Priority to JP2001539124A priority Critical patent/JP3967921B2/ja
Priority to PCT/JP1999/006371 priority patent/WO2001037098A1/fr
Publication of WO2001037098A1 publication Critical patent/WO2001037098A1/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0879Burst mode

Definitions

  • the present invention relates to a data processing device such as a microprocessor having a cache memory or a digital signal processor (DSP), and a data processing system having a memory capable of performing a burst operation together with such a data processing device.
  • A synchronous DRAM (synchronous dynamic random access memory) is representative of memories that support a burst operation, also referred to as burst transfer.
  • In such a memory, continuous data can be read and written at high speed by an operation control system that includes a circuit such as an internal address counter, which makes it easier to improve the performance of the processing system.
  • the synchronous DRAM has a mode register, and the operation mode is designated by the mode register.
  • The mode information for the synchronous DRAM, including burst length information (also referred to as burst transfer length or block transfer length), is determined according to a setting program such as a system initialization program executed after power-on reset of the processing system, and is set in the mode register.
  • Setting information such as the burst length requires a setting period separate from the burst operation; that is, it takes a relatively long time to write the mode information into the mode register. For this reason, it is common that the burst length set at power-on reset is not changed afterwards.
  • If the burst length is set to a large value, the amount of data that can be transferred by a single access to the synchronous DRAM increases, so high data transfer performance is obtained when transferring a large amount of data. However, if only a smaller amount of data than the set burst length needs to be transferred, block transfer is still performed with the set burst length, which adds unnecessary data transfer cycles and lowers the data transfer performance.
  • A wrap-around function is supported that allows a burst transfer to start from the data at any address between burst transfer boundaries.
  • The start location of the data to be accessed is specified from the outside, and the subsequent addresses are generated by an internal counter such as the column address counter in the memory.
  • SDRAM has, for example, an access unit of 4 bytes and a burst length of 16 bytes.
  • The byte-unit column address is preset in the column address counter, and starting from the preset address as a base point, the count operation on the lower bits (up to the fourth bit) is performed three times in succession, whereby a continuous access operation is carried out.
  • the data requested by the CPU can be obtained from the external memory at the beginning of the burst transfer.
  • the number of cycles in which the CPU waits for data can be reduced.
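To make the wrap-around order just described concrete, the short sketch below (the function name and Python form are illustrative assumptions, not part of the patent) reproduces the address sequence of a single wrap-around burst with a 4-byte access unit:

```python
def wraparound_burst_order(start_addr, burst_len=16, unit=4):
    """Byte addresses touched by one wrap-around burst.

    The burst stays inside the aligned block of `burst_len` bytes containing
    `start_addr`; when the end of the block is reached the address wraps back
    to the block's beginning, as the internal column address counter does.
    """
    block_base = start_addr - (start_addr % burst_len)   # aligned block boundary
    offset = start_addr - block_base
    beats = burst_len // unit                             # number of 4-byte transfers
    return [block_base + (offset + i * unit) % burst_len for i in range(beats)]

# A 16-byte burst starting at byte address 8 returns the data at 8, 12, 0, 4,
# so the word requested first arrives first.
print(wraparound_burst_order(8))   # [8, 12, 0, 4]
```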
  • A memory that supports burst access, such as a synchronous DRAM, has characteristics that depend on the burst length as described above. It is therefore desirable that this type of memory can satisfy both requests to set a large burst length and requests to set a small burst length. The present inventors accordingly examined control methods for the case where memories that perform wrap-around with different burst lengths, for example a synchronous DRAM with a burst length of 32 bytes and a synchronous DRAM with a burst length of 16 bytes, are used.
  • the present inventors have clarified the following matters through examination.
  • A synchronous DRAM with a burst length of 16 bytes wraps around within a 16-byte column address location, using the accessed position (for example the 8th byte) as a base point, whereas a synchronous DRAM with a burst length of 32 bytes wraps around within a 32-byte location.
  • Consequently, the order of the data returned in the wrap-around operation differs according to the burst length. A memory control circuit for controlling the synchronous DRAM therefore needs to be configured either to recognize the data arrangement mismatch caused by the difference in burst length, or to take a measure that eliminates the mismatch.
  • The first method rearranges the two sets of 16-byte data returned from a memory with a block transfer length of 16 bytes into the same data order as a wrap-around operation with a burst length of 32 bytes.
  • For this purpose, a buffer memory and an aligner for data sorting are provided together with the memory control circuit.
  • The memory control circuit temporarily buffers the data output from the memory, rearranges it with the aligner so that the two sets of 16-byte wrapped data match the order obtained by a 32-byte wrap-around operation, and then outputs it.
  • Extra waiting time is therefore spent on buffering the data for reordering.
  • The second method is to constrain the memory access start address to a fixed 16-byte boundary so that no mismatch arises in the data return order.
  • The microprocessor is assumed to include a CPU, a cache memory, and a memory control circuit for accessing an external memory including an external synchronous DRAM.
  • Assume that the cache line length of the cache memory is 32 bytes, and consider the case where the CPU starts accessing memory at address N+08 (N being a multiple of 32), a cache miss occurs, the external memory is accessed accordingly, a cache fill of the cache line is performed, and the CPU thereafter requests the data at addresses N+12, N+16, N+20, N+24 and N+28 in succession. Access to such consecutive addresses occurs very frequently, for example in instruction access or when processing data placed in a contiguous area, so this is a very natural example.
  • In the following, the data at address N+08 is represented as @08, the data at address N+12 as @12, and so on.
  • The data order obtained from the memory in burst operations with a block transfer length of 16 bytes is then, for example, @08, @12, @00, @04, @24, @28, @16, @20.
  • Several penalty cycles are required to sort this data and return it to the cache memory in the same order as a burst operation with a burst length of 32 bytes, so the data transfer performance is degraded. The data order corresponding to the 32-byte burst operation is @08, @12, @16, @20, @24, @28, @00, @04.
  • The data @16, which would be the third item returned in a 32-byte burst operation, arrives only seventh when it must be returned from memory by 16-byte burst operations, resulting in at least four penalty cycles.
  • With the second method, even when the first data required by the CPU is @08, the data order from the external memory and the data order to the cache are both @00, @04, @08, @12, @16, @20, @24, @28; that is, the data @08 requested first by the CPU arrives only third.
  • The CPU therefore waits at least two cycles for the arrival of the first data it requires.
  • the second method also causes a drop in CPU performance.
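The mismatch the inventors identified can be reproduced with a small sketch (a hypothetical illustration; it assumes the second 16-byte burst simply starts 16 bytes after the first start address, which is what produces the order given above):

```python
def wrap_order(start, burst_len, unit=4):
    """Addresses returned by one wrap-around burst within its aligned block."""
    base = start - start % burst_len
    return [base + (start - base + i * unit) % burst_len
            for i in range(burst_len // unit)]

N = 0  # N is any multiple of 32; 0 keeps the printout readable

# One 32-byte wrap-around burst starting at the miss address N+8:
print(wrap_order(N + 8, 32))                     # [8, 12, 16, 20, 24, 28, 0, 4]

# Two 16-byte bursts when the controller merely adds 16 to the start address:
print(wrap_order(N + 8, 16) + wrap_order(N + 24, 16))
# [8, 12, 0, 4, 24, 28, 16, 20]  -> @16 arrives only seventh
```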
  • An object of the present invention is to provide a data processing device and a data processing system which, even when using a memory that performs burst operations of a size shorter than the cache line length of a cache memory and has a wrap-around function, can shorten the time the CPU waits for data relating to a cache miss and thereby contribute to improving data processing performance.
  • Another object of the present invention is to provide a data processing device and a data processing system in which memory access penalty cycles remain small and bus performance and CPU performance are maintained even when a plurality of memories having a wrap-around function but different burst lengths are connected and used.
  • Another object of the present invention is to provide a data processing device capable of coping with various connection configurations or usage forms of a memory having a wrap-around function and capable of performing a burst operation.
  • The data processing device includes a CPU, a cache memory accessible by the CPU, a cache control unit for controlling the cache memory, and a memory control unit that accesses memory in response to a cache miss of the cache memory.
  • the memory control unit accesses a burst operable memory in response to a cache miss.
  • Based on the first information, the cache control unit can control an operation of cache-filling the data obtained by one or more burst operations into the cache memory in a wrap-around manner.
  • The first information makes clear the burst length of the memory to be accessed relative to the cache line length; based on it, the number of burst operations for the memory to be accessed is controlled, and block data corresponding to the cache line length can be obtained from the memory by burst operation.
  • In the process of obtaining the block data, the cache control unit is enabled to cache-fill the block data, transferred in the wrap-around operation, into the cache memory according to the burst length. Therefore, the data output from the memory need not be rearranged by an aligner, and there is no need to impose a constraint fixing the access start address to the beginning of the boundary of the burst-operated data block. Consequently, even when a memory that performs burst operations of a size smaller than the cache line length of the cache memory and has a wrap-around function is used, the waiting time of the CPU before acquiring the data related to a cache miss can be reduced, contributing to improved data processing performance.
  • In the cache fill operation, the cache control unit may be configured to receive the address information relating to the cache miss, the first information, and a synchronization signal synchronized with each piece of data obtained by the burst operation of the memory control unit, to perform wrap-around control based on the address information within the burst length range defined by the first information, and to generate, in synchronization with the synchronization signal, a cache fill address that determines the order of the cache fill data.
  • The cache fill operation can thus proceed in step with the memory control unit sequentially reading data from the memory in a burst operation in response to the cache miss, so a fast cache fill operation can be guaranteed regardless of the memory burst length.
  • When performing memory access by a plurality of burst operations in response to a cache miss, the memory control unit may control the first burst operation as a wrap-around based on the data position of the address related to the cache miss, and may control each subsequent burst operation based on the beginning of the data block boundary defined by the burst length.
  • If, in the subsequent burst operations, memory access is performed from the beginning of the boundary specified by the burst length, the data that the CPU will access first during continuous data access reaches the cache memory or the CPU first, which is useful for improving data processing performance.
  • A plurality of block data output in wrap-around fashion from the memory performing the burst access operation can be combined and cache-filled into the cache memory.
  • When write data is written to the memory from a write-through buffer shorter than the cache line length (for example, 8 bytes), the relatively short burst length wastes few data transfer cycles. For the latter eight bytes of the burst access operation in this case, a data mask may be applied to suppress the actual data write operation.
  • the data processing system has a data processing device having a CPU and a cache memory, and a memory connected to the data processing device and capable of burst operation and constituting a main memory for the cache memory.
  • the memory may be singular or plural. Each burst length may be different or the same.
  • the cache memory has a cache line length of L bytes.
  • The memory can perform a wrap-around burst operation within a range of a burst length of L/n bytes (n being a natural number).
  • In response to a cache miss of the cache memory, the data processing device forms first information indicating the burst length of the memory with respect to the cache line length of the cache memory. Based on the first information, the memory is burst-operated one or more times to obtain a data length corresponding to the cache line length, and control is performed to return the plurality of block transfer data obtained by the burst operations, amounting to L bytes, to the cache memory.
  • When a burst length (for example, 16 bytes) relatively shorter than the cache line length of the cache memory (for example, 32 bytes) is set in a first memory and a burst access operation is performed, a plurality of block data output from the memory in wrap-around fashion can be combined and cache-filled into the cache memory. If the data processing system also includes a second memory whose burst length equals the cache line length, the handling of a cache miss directed to the second memory enables a cache fill operation according to the burst length of the second memory.
  • When write data is written to the first memory from a write-through buffer shorter than the cache line length (for example, 8 bytes), there is little waste in the data transfer cycle because of the relatively short burst length; in the latter half of the burst access operation an 8-byte data mask may be applied to suppress the actual data write operation.
  • If the second memory, in which a burst length equal to the cache line length is set, is written by write-through, the number of wasted cycles increases compared with the first memory even when the write mask is applied. Even so, if the second memory is temporarily excluded from caching, the amount of data that can be accessed or transferred at one time can be increased.
  • Based on the first information, the data processing device may perform control to cache-fill into the cache memory the data obtained by one or more burst operations and transferred in the wrap-around operation.
  • The data processing device may generate a synchronization signal synchronized with each break in the data obtained from the memory in the burst operation, perform wrap-around control within the burst length range indicated by the first information using the address information as a base point, and generate, in synchronization with the synchronization signal, a cache fill address that determines the order of the cache fill data.
  • In the first burst operation the data processing device may control the burst operation as a wrap-around using the data position of the address related to the cache miss as a base point, and in the subsequent burst operations it may control the burst operation based on the beginning of the data block boundary defined by the burst length.
  • FIG. 1 is a block diagram showing an example of a data processing system according to the present invention.
  • FIG. 2 is a block diagram showing a detailed example of a block transfer length determination unit.
  • FIG. 3 is a block diagram showing an example of an external memory address generation circuit.
  • FIG. 4 is an explanatory diagram showing an example of an address generation rule of the subsequent access address generation logic.
  • FIG. 5 is a timing chart illustrating a burst operation with respect to a synchronous DRAM having a burst length of 32 bytes.
  • FIG. 6 is a timing chart illustrating a burst operation with respect to a synchronous DRAM having a burst length of 16 bytes.
  • FIG. 7 is a block diagram showing an example of a logical configuration for generating a cache access address and a memory access address in the cache control unit.
  • FIG. 8 is an explanatory diagram illustrating the address generation logic of the cache address generation circuit.
  • FIG. 9 is a timing chart showing a cache fill operation by the microprocessor of FIG. 1 including a comparative example.
  • FIG. 10 is a block diagram showing another example of the data processing system according to the present invention.
BEST MODE FOR CARRYING OUT THE INVENTION
  • FIG. 1 shows an example of a data processing system according to the present invention.
  • The data processing system shown in FIG. 1 includes, as representative components, a microprocessor 1 as an example of a data processing device according to the present invention, a synchronous DRAM 2 as an example of an external memory capable of burst operation, and a ROM (read-only memory) 14.
  • a peripheral circuit may be provided in addition to the synchronous DRAM 2 and ROM 14.
  • the microprocessor 1 includes, but is not limited to, a CPU 3, a cache memory 4, a cache control unit 5, and a memory control unit 6, and is formed on, for example, one semiconductor substrate (semiconductor chip).
  • The data buses 8, 9, and 10 are 4 bytes (32 bits) wide, although not particularly limited to this.
  • the CPU 3 includes a control unit and an execution unit (not shown).
  • The execution unit includes, for example, a general-purpose register file and an arithmetic unit.
  • The control unit decodes fetched instructions and controls the operation of the execution unit, among other functions.
  • The cache memory 4 has a so-called data array. The data array is composed of, for example, SRAM (static random access memory) and has memory cells arranged in a matrix; the selection terminals of the memory cells are connected to word lines row by row, and the data input/output terminals of the memory cells are connected to complementary bit lines column by column.
  • The word line is selected by an index address given from the cache control unit 5.
  • the unit of each row selected by the index address in the data array is the cache line.
  • The cache line has a cache line length of 32 bytes. Within the selected cache line, a 4-byte selection is made by the longword selection signal provided from the cache control unit 5.
  • The index address and longword selection signals are shown as the cache access address signal 7.
  • the cache control unit 5 has a so-called address array and cache control logic.
  • the address array is also composed of SRAM as in the data array.
  • The address array has a tag field in one-to-one correspondence with each cache line.
  • The tag field holds the tag for the corresponding cache line and a valid bit indicating the validity of the cache line.
  • The tag field is also selected by the index address.
  • the cache control logic determines cache hits and cache misses, and performs cache fill control at the time of cache misses.
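For orientation, a minimal sketch of the hit/miss determination described above, assuming a direct-mapped organization and an illustrative number of cache lines (the line count and all names here are assumptions, not taken from the patent):

```python
CACHE_LINE_BYTES = 32
NUM_LINES = 256   # assumed for illustration; the patent does not state the array size

def split_effective_address(addr):
    """Split an effective address into tag, index and longword offset."""
    longword = (addr % CACHE_LINE_BYTES) // 4          # selects 4 bytes in the line
    index = (addr // CACHE_LINE_BYTES) % NUM_LINES     # selects the cache line
    tag = addr // (CACHE_LINE_BYTES * NUM_LINES)       # compared with the tag field
    return tag, index, longword

def is_cache_hit(addr, tag_array, valid_bits):
    """Hit if the indexed line is valid and its stored tag matches."""
    tag, index, _ = split_effective_address(addr)
    return valid_bits[index] and tag_array[index] == tag
```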
  • the memory control unit 6 performs bus control for accessing the synchronous DRAM 2 and the ROM 14 according to instructions from the CPU 3 and the cache control unit 5.
  • the memory control unit 6 is connected to the above-described representative synchronous DRAM 2 and the like via an external data bus 10 and an external address bus 13 and the like.
  • A control bus for transmitting control signals such as strobe signals for external bus access or external memory access is not shown.
  • the memory control unit 6 may be understood as a so-called bus state controller or a part of a memory controller included therein.
  • A part of the effective address 11 output by the CPU 3 serves as the index address.
  • The tag of the tag field indexed in the address array is compared by the cache control logic with the tag address contained in the effective address 11; if they match it is a cache hit, and if they do not match it is a cache miss.
  • In a read access by the CPU 3, if it is a cache hit (cache read hit), the corresponding 4 bytes of data in the indexed cache line are supplied to the CPU 3 via the data bus 8.
  • In the case of a cache miss (cache read miss) in the read access, the cache control unit 5 generates the memory access address 12 and sends a memory access request MREQ to the memory control unit 6 together with the memory access address 12.
  • the memory control unit 6 reads, for example, one cache line of data from the synchronous DRAM 2 and transfers the read data to the cache memory 4 via the data bus 9.
  • the cache control unit 5 generates the cache access address 7 in synchronization with the supply, and cache-fills the data into a required cache line.
  • The cache control unit 5 stores a tag corresponding to the data of the cache line in the tag field corresponding to that cache line. At this time, the data relating to the cache miss is given to the CPU 3 via the data bus 8.
  • In a write access, if it is a cache hit (cache write hit), the write data is supplied from the CPU 3 via the data bus 8 to the corresponding 4 bytes of the indexed cache line.
  • In the case of a cache miss in the write access (cache write miss), the cache control unit 5 generates the memory access address 12 and gives the memory control unit 6 the memory access request MREQ.
  • The memory control unit 6 reads data for one cache line from the synchronous DRAM 2 according to the memory access address 12 and supplies the read data to the cache memory 4 via the data bus 9. In synchronization with this, the cache control unit 5 fills the cache line with the data and stores a tag corresponding to the data of the cache line in the tag field corresponding to that cache line.
  • A write-through method is used to maintain consistency between the data held in the cache memory 4 and the data stored in an external memory such as the synchronous DRAM 2. For this purpose, the cache memory 4 has a write-through buffer (not shown) for holding the write data at the time of a cache write hit.
  • The cache control unit 5 writes the write data relating to the cache write hit into the cache memory 4, and then gives the memory control unit 6 an instruction to write it to the corresponding address of the external memory such as the synchronous DRAM 2.
  • the memory control unit 6 controls the writing of the data held in the write-through buffer into the synchronous DRAM 2.
  • The synchronous DRAM 2 has a memory cell array in which dynamic memory cells are arranged in a matrix; information is stored dynamically in a storage capacitor as in a DRAM, so refresh of the stored information is also needed.
  • The major differences from a DRAM are that operation is synchronized with an external clock signal and that burst operation with wrap-around is enabled. For example, there is a column address counter for latching an externally supplied column address signal; while the word line selection by the row address is maintained, the column address is sequentially updated by the column address counter starting from the preset value, so that continuous data access can be performed efficiently.
  • The number of consecutive data accesses is called the burst length, and the column address counter counts the number of times specified by the burst length.
  • For a burst length of 16 bytes, a byte-unit memory address is preset in the column address counter, and starting from this preset address the count operation on the lower bits (up to the fourth bit) is performed three times in succession to carry out a continuous access operation. Therefore, if the 4-byte access start point is not at the boundary of a 16-byte column data location, the address counted by the column address counter wraps back, partway through, from the boundary with the next 16-byte column data location to the beginning of the current 16-byte column data location; that is, the access order of the burst operation wraps around within the 16-byte column data location.
  • The burst length is set in the mode register of the synchronous DRAM 2. For example, a part of the memory control data 15 set from the CPU 3 into the memory control unit 6 during power-on reset processing is also set from the CPU 3 into the mode register as data indicating the burst length.
  • The burst length is not particularly limited, but can be selected and set to 16 or 32 bytes.
  • The operation of the synchronous DRAM 2 is directed by the states of signals such as row address strobe (RAS), column address strobe (CAS), write enable (WE), data mask (DM), and data strobe (DQS).
  • the signal is generated by the memory control unit 6. A command is defined for each specific state of the signal, and the synchronous DRAM 2 operates according to the command instruction.
  • An active command instructs a word line selection operation.
  • A read command accompanied by a column address signal instructs a read operation on the memory cells of the already-activated word line.
  • A write command accompanied by a column address signal instructs a write operation on the memory cells of the already-activated word line.
  • the read operation and the write operation are performed by burst access that can be wrapped around according to the burst length set in the mode register. In the write operation, in an access cycle in which the data mask (DM) signal is enabled, only the access cycle is spent, and actual data writing is suppressed.
  • the memory control unit 6 typically includes a block transfer length determination unit 20 and an external memory address generation unit 30.
  • The block transfer length determination unit 20 forms the wrap-around information WRPA, which is the first information indicating the burst length of the synchronous DRAM 2 with respect to the cache line length (32 bytes) of the cache memory 4.
  • Based on the wrap-around information WRPA, the external memory address generation unit 30 controls the one or more burst operations necessary to obtain a data length corresponding to the cache line length, and burst-reads the data from the synchronous DRAM 2.
  • The cache control unit 5 generates a cache fill address for writing the 32 bytes of data read by the memory control unit 6 in the burst read into the cache memory 4, four bytes at a time, in a wrap-around manner.
  • The data block of the wrap-around operation corresponds to the burst length of the synchronous DRAM 2: if the burst length is 16 bytes, the wrap-around operation is performed within each 16-byte address range, and if it is 32 bytes, within the 32-byte address range.
  • The cache fill address of the wrap-around operation consists of the above-mentioned index address and longword selection signal 7; the longword selection signal is generated in synchronization with the data ready signal DRDY, which indicates each break in the data that the memory control unit 6 reads out in the burst read and outputs onto the data bus 9 four bytes at a time.
  • FIG. 2 shows a detailed example of the block transfer length determination unit 20.
  • the block transfer length determination unit 20 includes an access request determination circuit 22, a memory control register 23, and a block transfer length determination circuit 24.
  • External memory information 15 such as the data bus width, the number of access cycles, and the burst length for the external address areas of the microprocessor 1 is initialized by the CPU 3 in the memory control register 23.
  • Burst length information indicating the burst length set in the synchronous DRAM 2 by the CPU 3 is therefore also held in the memory control register 23.
  • The access request determination circuit 22 receives the memory access request MREQ and the memory access address 12 from the cache control unit 5, and activates the detection signal 25 when the access target is the synchronous DRAM 2. After detecting the memory access request via MREQ, the access request determination circuit 22 decodes the memory access address 12, selects an area according to the access target, and generates an access area selection signal (not shown).
  • the area selection signal is used, for example, as a memory chip selection signal or a memory enable signal.
  • The block transfer length determination circuit 24 receives the detection signal 25 and the burst length information 26 of the synchronous DRAM 2 set in the memory control register 23, and outputs the wrap-around information WRPA.
  • The burst length of the synchronous DRAM 2 is 16 bytes or 32 bytes, and the cache line length of the cache memory 4 is 32 bytes. Therefore the wrap-around information WRPA can be, although it is not particularly limited to this, one bit of information: for example, the logical value "0" means a burst length of 16 bytes and the logical value "1" means a burst length of 32 bytes.
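As a rough model of the block transfer length determination (a sketch under the assumption of the 16/32-byte configuration above; the names are illustrative, not the patent's):

```python
CACHE_LINE_BYTES = 32

def wraparound_info(burst_len_bytes):
    """Return (WRPA, bursts per cache fill) for a memory's configured burst length."""
    wrpa = 1 if burst_len_bytes == CACHE_LINE_BYTES else 0   # "1" = 32-byte burst
    bursts_per_fill = CACHE_LINE_BYTES // burst_len_bytes     # 1 or 2 in this example
    return wrpa, bursts_per_fill

print(wraparound_info(16))   # (0, 2): two 16-byte bursts per 32-byte cache fill
print(wraparound_info(32))   # (1, 1): one 32-byte burst fills the whole line
```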
  • FIG. 3 shows an example of the external memory address generation circuit 30.
  • The external memory address generation unit 30 has an address buffer 31, a subsequent access address generation circuit 32, and a selector 33.
  • When the external memory address generation unit 30 receives the memory access address 12 from the cache control unit 5, it holds it in the address buffer 31. The address held in the address buffer 31 is then selected by the selector 33 and output to the address bus 13 as the external memory address 16. If the area selected by the access request determination circuit 22 at this time is the synchronous DRAM 2, the synchronous DRAM 2 is chip-selected, and commands such as read and write are supplied from the synchronous DRAM control logic (not shown) in the memory control unit 6. As a result, the synchronous DRAM 2 performs a burst operation. If the wrap-around information WRPA has the logical value "1", the access is completed with a single burst operation.
  • The subsequent access address generation circuit 32 generates, according to the address generation logic described later, the head address of the next burst operation by, for example, adding +16 to the address (byte address) in the address buffer 31. The generation logic of the subsequent access address is described in detail later.
  • the output of the subsequent access address generation circuit 32 is selected by the selector 33 and supplied to the synchronous DRAM 2.
  • FIG. 4 exemplifies the rules of the generation logic of the subsequent access address.
  • In FIG. 4, the cache line length is 32 bytes, the synchronous DRAM burst length is 32 bytes or 16 bytes, the data bus width is 4 bytes, and N is a multiple of 32.
  • The 4 bytes of data starting at address N are represented as D1, the 4 bytes starting at address N+4 as D2, and so on.
  • The first access address is the start address of the first burst operation, and the second access address is the start address of the second burst operation, which is required only for a burst length of 16 bytes.
  • The second access address is not obtained by uniformly adding 16 bytes to the first access address. If the burst length is 16 bytes and the first access address is N+4, N+8 or N+12, the second access address is N+16; therefore, in the second burst access, the data is output in address order.
  • If, in the second burst access, memory access is performed from the beginning of the boundary specified by the burst length, the data that the CPU 3 accesses first during continuous data access reaches the cache memory 4 or the CPU 3 first, which helps to improve data processing performance. Accordingly, when the first access address is N+20, N+24 or N+28, the second access address is N+0.
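A compact way to express the FIG. 4 rule is sketched below (an illustration only; the function name and the restriction to a 32-byte line with 16- or 32-byte bursts are assumptions):

```python
def burst_start_addresses(miss_addr, cache_line=32, burst_len=32):
    """Start addresses of the burst operations for one cache fill (FIG. 4 rule).

    The first burst starts at the miss address itself; when a second burst is
    needed (16-byte bursts on a 32-byte line) it starts at the beginning of the
    other 16-byte block, so the remaining data arrives in plain address order.
    """
    line_base = miss_addr - (miss_addr % cache_line)
    starts = [miss_addr]
    if burst_len < cache_line:
        first_block = (miss_addr - line_base) // burst_len   # 0 or 1
        other_block = 1 - first_block
        starts.append(line_base + other_block * burst_len)   # boundary start
    return starts

print(burst_start_addresses(8, burst_len=16))    # [8, 16]  (miss at N+8)
print(burst_start_addresses(24, burst_len=16))   # [24, 0]  (miss at N+24)
```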
  • FIG. 5 illustrates a timing chart of the burst operation for the synchronous DRAM 2 in which a burst length of 32 bytes is set.
  • In FIG. 5, the transfer start address given from the external memory address generation unit 30 is N+8, and the wrap-around information indicates a burst length of 32 bytes.
  • the synchronous DRAM does not need the second burst access.
  • It is assumed that a bank active command (not shown) has been issued before the read command and that the word line selection operation has already been completed.
  • The 32-byte burst read is performed in the order D3, D4, D5, D6, D7, D8, D1, D2 by wrap-around.
  • FIG. 6 illustrates a timing chart of the burst operation for the synchronous DRAM 2 in which a burst length of 16 bytes is set.
  • The first transfer start address given from the external memory address generation unit 30 is N+8, and the transfer start address of the second burst operation is set to N+16 in accordance with FIG. 4.
  • It is likewise assumed that a bank active command (not shown) is first issued before the read command and that the word line selection operation has already been completed.
  • In the first burst operation, burst reading is performed in the order D3, D4, D1, D2 by wrap-around.
  • In the second burst operation, the burst read is performed in the order D5, D6, D7, D8 from the beginning of the data block boundary.
  • FIG. 7 shows an example of a logical configuration for generating a cache access address and a memory access address in the cache control unit 5.
  • the cache control unit 5 includes an address buffer 40, a memory access address generation circuit 41, a cache fill address generation circuit 42, and a selector 43.
  • When the cache control unit 5 receives the effective address 11 from the CPU 3, it holds it in the address buffer 40.
  • The address held in the address buffer 40 is selected by the selector 43 and supplied to the cache memory 4 as the cache access address 7.
  • In the event of a cache miss, the memory access address generation circuit 41 generates the memory access address 12 relating to that cache miss.
  • the access control of the synchronous DRAM 2 by the memory control unit 6 using the memory access address 12 is as described above.
  • The cache fill address generation circuit 42 generates cache addresses for writing the 32 bytes of data read from the synchronous DRAM 2 by the memory control unit 6 in the burst read into the cache memory 4, four bytes at a time, in a wrap-around manner. The cache fill address generation circuit 42 receives the wrap-around information WRPA so that the data block of the wrap-around operation corresponds to the burst length of the synchronous DRAM 2: if the burst length is 16 bytes, the wrap-around operation is performed within each 16-byte address range, and if it is 32 bytes, within the 32-byte address range.
  • the head address of the cache address in the wrap-around operation is the address related to the cache miss held in the address buffer 40.
  • The cache fill address of the wrap-around operation consists of the above-mentioned index address and longword selection signal 7.
  • When the memory control unit 6 outputs the data read from the synchronous DRAM 2 in the burst read onto the data bus 9 four bytes at a time, it also outputs the data ready signal DRDY indicating each break in the data.
  • The cache fill address generation circuit 42 sequentially increments the cache fill address from the head address by +4 in synchronization with the data ready signal DRDY.
  • FIG. 8 illustrates the address generation logic of the cache fill address generation circuit 42.
  • The head address of the cache fill address is determined by the effective address associated with the cache miss; the addresses and the corresponding data are shown in pairs.
  • Let N be a multiple of 32, and let the data at address N be called D1, the data at address N+4 be called D2, and so on up to the data at address N+28, which is called D8.
  • The cache fill address generation circuit 42 updates the cache fill address according to the data ready signal DRDY, which the memory control unit 6 issues in synchronization with the switching of the data.
  • The cache access address 7 is generated, for example, in the order N+8, N+12, N, N+4, N+16, N+20, N+24, N+28.
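The cache fill address sequence of FIG. 8 can be modeled as below (a self-contained sketch under the same assumptions as the earlier ones; names are illustrative):

```python
def cache_fill_addresses(miss_addr, cache_line=32, burst_len=16, unit=4):
    """Cache fill addresses generated in step with DRDY, one per 4-byte datum.

    The block containing the miss address is filled wrap-around starting from
    the miss offset; every other burst-length block of the line is then filled
    from its boundary, mirroring the order in which the memory returns data.
    """
    line_base = miss_addr - (miss_addr % cache_line)
    first_block = line_base + ((miss_addr - line_base) // burst_len) * burst_len
    other_blocks = [line_base + b for b in range(0, cache_line, burst_len)
                    if line_base + b != first_block]
    pairs = [(first_block, miss_addr)] + [(b, b) for b in other_blocks]
    addrs = []
    for block_base, start in pairs:
        offset = start - block_base
        for i in range(burst_len // unit):
            addrs.append(block_base + (offset + i * unit) % burst_len)
    return addrs

# Miss at N+8 with 16-byte bursts: N+8, N+12, N, N+4, N+16, N+20, N+24, N+28.
print(cache_fill_addresses(8))   # [8, 12, 0, 4, 16, 20, 24, 28]
```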
  • FIG. 9 shows the cache fill operation by the microprocessor 1 described above, together with comparative examples.
  • The address related to the cache miss is N+8.
  • The data read out by the burst operation from a synchronous DRAM having a burst length of 32 bytes arrives in the order @08, @12, @16, @20, @24, @28, @00, @04.
  • The data read out in two burst operations from a synchronous DRAM having a burst length of 16 bytes arrives, for example, in the order @08, @12, @00, @04, @24, @28, @16, @20.
  • the order of data read out from the synchronous DRAM differs depending on the burst length.
  • In the comparative example of FIG. 9(C), a data aligner that unifies the data into the order of the 32-byte burst operation is provided in front of the cache fill so that the cache fill is not performed with this mismatch.
  • Four penalty cycles occur during the data reordering, and the bus performance decreases.
  • The conventional technology that imposes a fixed boundary on the access start address is also illustrated in FIG. 9.
  • When the data order from the synchronous DRAM is left as @08, @12, @00, @04, @24, @28, @16, @20, the data @16, which is the third item requested by the CPU, arrives from memory only seventh, causing a penalty of at least four cycles.
  • In the present embodiment, by contrast, the sequence of data from the synchronous DRAM 2 is @08, @12, @00, @04, @16, @20, @24, @28, and the third item requested by the CPU 3, @16, arrives from the synchronous DRAM 2 fifth, reducing the penalty to two cycles. Therefore the data processing performance of the CPU 3 can be improved. Since access to such consecutive addresses occurs very frequently, for example in instruction access and continuous data processing, a large effect is obtained in improving data processing efficiency.
  • Because the cache control unit 5 receives the wrap-around information WRPA together with the data from the synchronous DRAM 2, access can also start from other than the beginning of the data block boundary specified by the burst length, and the data processing performance of the CPU can be improved. Specifically, in FIG. 9(D), although the data required first by the CPU is @08, there is a restriction that the order of the data returned from the memory must be based on the beginning of the memory block; therefore the memory cannot be accessed starting from @08, the burst transfer start address becomes N+0, and the data returned from the memory is @00, @04, @08, @12, @16, @20, @24, @28.
  • In that case the data @08 is only the third item, so the CPU waits at least two cycles for the arrival of the first data it needs, causing a decrease in the data processing performance of the CPU.
  • In the present embodiment, by contrast, the burst transfer start address can be set to N+8, the data is returned from the synchronous DRAM 2 in the order @08, @12, @00, @04, @16, @20, @24, @28, and the data @08 requested first by the CPU 3 arrives first from the synchronous DRAM 2; the penalty can be reduced to two cycles, and the data processing performance of the CPU 3 can be improved.
  • FIG. 9(E) uses the cache fill address generation logic of FIG. 8; the penalty that amounted to four cycles in FIG. 9(C) can be reduced to two cycles, and in this regard the data processing performance of the CPU can be improved.
  • The memory control unit 6 grasps, from the wrap-around information WRPA, the burst length of the memory to be accessed (the synchronous DRAM 2) relative to the cache line length, and by controlling the number of burst operations for that memory accordingly, block data corresponding to the cache line length can be obtained from the synchronous DRAM 2 by burst operation. In this process, the cache control unit 5 cache-fills the cache memory 4 in a wrap-around manner according to the burst length obtained from the wrap-around information WRPA.
  • Therefore the data output from the synchronous DRAM 2 does not need to be rearranged by an aligner, and it is unnecessary to impose a constraint fixing the access start address to the beginning of the boundary of the burst-operated data block. Consequently, even when using a memory that performs burst operations of a size shorter than the cache line length of the cache memory and has a wrap-around function, the CPU wait time until the data related to a cache miss is acquired can be reduced, which contributes to improving data processing performance.
  • The cache control unit 5 proceeds with the cache fill operation while following the operation in which the memory control unit 6 sequentially reads out data from the synchronous DRAM 2 in a burst operation in response to a cache miss; therefore a high-speed cache fill operation can be guaranteed.
  • If, in the subsequent burst operations, memory access is performed from the beginning of the boundary specified by the burst length, the data that the CPU accesses first during continuous data access can be made to reach the cache memory or the CPU first, which helps to improve data processing performance.
  • In FIG. 1, one synchronous DRAM 2 is connected to the microprocessor 1. If a burst length (for example, 16 bytes) relatively shorter than the cache line length of the cache memory 4 (for example, 32 bytes) is set in the synchronous DRAM 2, a plurality of block data output in wrap-around fashion from the synchronous DRAM 2 performing the burst access operation can be combined and cache-filled into the cache memory 4. Also, when write-through is used to handle a cache write hit of the cache memory 4, and the write data is written to the synchronous DRAM 2 from a write-through buffer shorter than the cache line length (for example, 8 bytes), little of the data transfer cycle is wasted because of the relatively short burst length. For the latter eight bytes of the burst access operation in this case, the actual data write operation may be suppressed by masking the data with the data mask signal DM.
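To make the write-through masking concrete, a hypothetical sketch (names and the per-beat mask representation are assumptions for illustration) of which 4-byte beats of a 16-byte burst write carry real data and which are masked with DM when only 8 bytes come from the write-through buffer:

```python
def burst_write_beats(write_addr, write_len=8, burst_len=16, unit=4):
    """List (address, dm_masked) for each 4-byte beat of one burst write.

    Beats outside the bytes actually supplied by the write-through buffer are
    issued with the data mask (DM) asserted, so only the access cycle is spent
    and no real write takes place.
    """
    block_base = write_addr - (write_addr % burst_len)
    offset = write_addr - block_base
    beats = []
    for i in range(burst_len // unit):
        addr = block_base + (offset + i * unit) % burst_len
        masked = not (write_addr <= addr < write_addr + write_len)
        beats.append((addr, masked))
    return beats

# 8 bytes written at N+0 with a 16-byte burst: the latter two beats are masked.
print(burst_write_beats(0))   # [(0, False), (4, False), (8, True), (12, True)]
```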
  • FIG. 10 shows another example of the data processing device.
  • the data processing system shown in the figure is provided with a memory capable of performing a burst operation in a wraparound manner, for example, two synchronous DRAMs 2A and 2B.
  • Each of the synchronous DRAMs 2A and 2B has a configuration similar to that of the synchronous DRAM 2 described above.
  • A burst length of 16 bytes is set for one synchronous DRAM 2A, and a burst length of 32 bytes is set for the other synchronous DRAM 2B.
  • The burst lengths of the synchronous DRAMs 2A and 2B are individually set by software from the CPU 3 into the mode registers of the synchronous DRAMs 2A and 2B after power-on reset.
  • Bus control information such as the burst lengths of the external memories, such as the synchronous DRAMs 2A and 2B, is set in the memory control register 23 in the memory control unit 6.
  • Other configurations are the same as those in FIG. 1, and thus detailed description is omitted.
  • Since a burst length (for example, 16 bytes) relatively shorter than the cache line length of the cache memory 4 (for example, 32 bytes) is set in the synchronous DRAM 2A, a plurality of block data output in wrap-around fashion from the synchronous DRAM 2A can be combined and cache-filled into the cache memory 4.
  • Because the synchronous DRAM 2B, whose burst length equals the cache line length, is also included in the data processing system, a cache fill operation according to the burst length of the synchronous DRAM 2B is likewise enabled when a cache miss is directed to the synchronous DRAM 2B.
  • When write data is written from the write-through buffer, which is shorter than the cache line length (for example, 8 bytes), to the synchronous DRAM 2A, few data transfer cycles are wasted because of the relatively short burst length.
  • For the remaining bytes of the burst access operation, the data may be masked with the data mask signal DM to suppress the actual data write operation.
  • If the synchronous DRAM 2B, in which a burst length equal to the cache line length is set, is written by write-through, the number of wasted cycles increases compared with the synchronous DRAM 2A even if the write mask is applied.
  • Even so, if the synchronous DRAM 2B is temporarily removed from the cache target, the amount of data that can be accessed or transferred to the synchronous DRAM 2B at one time can be increased, which can contribute to improving the data processing performance of the CPU 3.
  • The control that temporarily removes the synchronous DRAM 2B from the cache target can be performed through the operation mode of the microprocessor 1 or by the CPU 3 setting a cache control register (not shown) of the cache control unit 5. Therefore, in the above data processing system in which different burst lengths are set for a plurality of synchronous DRAMs, wasted cycles can be minimized for transfers of relatively small data without impairing the performance of efficiently transferring a large amount of data, such as 32 bytes, to the cache memory or the like, and various connection configurations or usage forms of a plurality of memories having different burst lengths are realized.
  • For example, by having the synchronous DRAM 2B hold program code and data of a size equal to or larger than the cache line length, and the synchronous DRAM 2A hold data of a size smaller than the cache line length, the processing performance of the microprocessor 1 can be improved.
  • the cache memory may be for storing programs, or for storing data and programs in a mixed manner.
  • An associative format such as set-associative, fully associative, or direct-mapped can be used for the cache memory.
  • a write-back method may be employed for the cache memory instead of the write-through method.
  • the data processing device may incorporate other arithmetic units such as a floating-point arithmetic unit, other bus modules such as a direct memory access controller, and other peripheral circuits such as a timer or a RAM.
  • The memory capable of burst operation is not limited to a synchronous DRAM; a synchronous SRAM or the like may be used instead. The number of burst-operable memories included in the data processing system may also be increased as appropriate.
  • the present invention can be widely applied to a data processing device and a data processing system that can access a memory capable of burst operation.
  • The present invention can be applied to various types of semiconductor integrated circuit data processing devices, such as those called microprocessors, microcomputers, data processors, and DSPs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A cache controller (5) cache-fills into a cache memory (4) the data fetched from a memory (2) by one or more burst operations executed with wrap-around, in accordance with first information (WRPA) representing the burst length, relative to the cache line length, of the memory (2) capable of burst operation. The data output by the memory does not need to be realigned by an aligner, and the beginning of the boundary of the data block to be burst-operated does not need to be tied to the access start address. For this reason, even if the burst length of the memory (2) used is shorter than the cache line length, the time the CPU (3) waits to obtain the data involved in the cache miss is reduced.
PCT/JP1999/006371 1999-11-16 1999-11-16 Dispositif et systeme informatique WO2001037098A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2001539124A JP3967921B2 (ja) 1999-11-16 1999-11-16 データ処理装置及びデータ処理システム
PCT/JP1999/006371 WO2001037098A1 (fr) 1999-11-16 1999-11-16 Dispositif et systeme informatique

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1999/006371 WO2001037098A1 (fr) 1999-11-16 1999-11-16 Dispositif et systeme informatique

Publications (1)

Publication Number Publication Date
WO2001037098A1 true WO2001037098A1 (fr) 2001-05-25

Family

ID=14237286

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1999/006371 WO2001037098A1 (fr) 1999-11-16 1999-11-16 Dispositif et systeme informatique

Country Status (2)

Country Link
JP (1) JP3967921B2 (fr)
WO (1) WO2001037098A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006172240A (ja) * 2004-12-17 2006-06-29 Nec Corp データ処理システム及びそのメモリ制御方法
JP2011070253A (ja) * 2009-09-24 2011-04-07 Mitsubishi Electric Corp メモリ制御システム
WO2012172694A1 (fr) 2011-06-17 2012-12-20 富士通株式会社 Unité de traitement arithmétique, dispositif de traitement d'informations et procédé de commande d'unité de traitement arithmétique
KR20150100565A (ko) * 2014-02-24 2015-09-02 스펜션 엘엘씨 랩핑된 판독 대 연속적인 판독을 갖는 메모리 서브 시스템

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6272041A (ja) * 1985-09-25 1987-04-02 Nec Corp キヤツシユメモリ制御装置
US5394528A (en) * 1991-11-05 1995-02-28 Mitsubishi Denki Kabushiki Kaisha Data processor with bus-sizing function
US5715476A (en) * 1995-12-29 1998-02-03 Intel Corporation Method and apparatus for controlling linear and toggle mode burst access sequences using toggle mode increment logic

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6272041A (ja) * 1985-09-25 1987-04-02 Nec Corp キヤツシユメモリ制御装置
US5394528A (en) * 1991-11-05 1995-02-28 Mitsubishi Denki Kabushiki Kaisha Data processor with bus-sizing function
US5715476A (en) * 1995-12-29 1998-02-03 Intel Corporation Method and apparatus for controlling linear and toggle mode burst access sequences using toggle mode increment logic

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006172240A (ja) * 2004-12-17 2006-06-29 Nec Corp データ処理システム及びそのメモリ制御方法
JP2011070253A (ja) * 2009-09-24 2011-04-07 Mitsubishi Electric Corp メモリ制御システム
WO2012172694A1 (fr) 2011-06-17 2012-12-20 富士通株式会社 Unité de traitement arithmétique, dispositif de traitement d'informations et procédé de commande d'unité de traitement arithmétique
KR20150100565A (ko) * 2014-02-24 2015-09-02 스펜션 엘엘씨 랩핑된 판독 대 연속적인 판독을 갖는 메모리 서브 시스템
JP2015158910A (ja) * 2014-02-24 2015-09-03 スパンション エルエルシー ラップ読出しから連続読出しを行うメモリサブシステム
US10331359B2 (en) 2014-02-24 2019-06-25 Cypress Semiconductor Corporation Memory subsystem with wrapped-to-continuous read
KR102180975B1 (ko) 2014-02-24 2020-11-19 사이프레스 세미컨덕터 코포레이션 랩핑된 판독 대 연속적인 판독을 갖는 메모리 서브 시스템

Also Published As

Publication number Publication date
JP3967921B2 (ja) 2007-08-29

Similar Documents

Publication Publication Date Title
US7017022B2 (en) Processing memory requests in a pipelined memory controller
US5371870A (en) Stream buffer memory having a multiple-entry address history buffer for detecting sequential reads to initiate prefetching
US6295592B1 (en) Method of processing memory requests in a pipelined memory controller
US20180322054A1 (en) Multiple data channel memory module architecture
US5586294A (en) Method for increased performance from a memory stream buffer by eliminating read-modify-write streams from history buffer
US5659713A (en) Memory stream buffer with variable-size prefetch depending on memory interleaving configuration
US5490113A (en) Memory stream buffer
US5043874A (en) Memory configuration for use with means for interfacing a system control unit for a multi-processor system with the system main memory
US5388247A (en) History buffer control to reduce unnecessary allocations in a memory stream buffer
US5530941A (en) System and method for prefetching data from a main computer memory into a cache memory
US5461718A (en) System for sequential read of memory stream buffer detecting page mode cycles availability fetching data into a selected FIFO, and sending data without aceessing memory
EP0407119B1 (fr) Dispositif et procédé de lecture, écriture et rafraîchissement de mémoire avec accès physique ou virtuel direct
US6895475B2 (en) Prefetch buffer method and apparatus
US5752272A (en) Memory access control device with prefetch and read out block length control functions
JP2509766B2 (ja) キャッシュメモリ交換プロトコル
US20050253858A1 (en) Memory control system and method in which prefetch buffers are assigned uniquely to multiple burst streams
JPS6297036A (ja) 計算機システム
US5452418A (en) Method of using stream buffer to perform operation under normal operation mode and selectively switching to test mode to check data integrity during system operation
TW491970B (en) Page collector for improving performance of a memory
US5649232A (en) Structure and method for multiple-level read buffer supporting optimal throttled read operations by regulating transfer rate
WO2001037098A1 (fr) Dispositif et systeme informatique
KR100298955B1 (ko) 데이타처리시스템
JPH06342400A (ja) プロセッサ・メモリのアドレス制御方法
JP2851777B2 (ja) バス制御方法及び情報処理装置
JPH08227376A (ja) コンピュータシステム及びその動作方法

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 10070313

Country of ref document: US

122 Ep: pct application non-entry in european phase