AU607594B2

AU607594B2 - Data memory system

Info

Publication number: AU607594B2
Application number: AU31220/89A
Authority: AU
Inventors: John Richard Eaton; Robert Malcolm Lister; Keith Turley
Original assignee: Fujitsu Services Ltd
Current assignee: Fujitsu Services Ltd
Priority date: 1988-02-12
Filing date: 1989-03-10
Publication date: 1991-03-07
Anticipated expiration: 2009-03-10
Also published as: GB2215887A; GB8828848D0; ZA89345B; AU3122089A; GB2215887B; GB8805908D0

Description

COMMONWEALTH OF AUSTRALIA PATENTS ACT 1952-69 COMPLETE SPECIFICATION

(ORIGINAL)

Name of Applicant: INTERNATIONAL COMPUTERS LIMITED

Address of Applicant: ICL House, Putney, London SW15 1SW, England

Actual Inventor: JOHN RICHARD EASTON, ROBERT MALCOLM LISTER and KEITH TURLEY

Address for Service: EDWD. WATERS SONS, 50 QUEEN STREET, MELBOURNE, AUSTRALIA, 3000.

Complete Specification for the invention entitled: DATA MEMORY SYSTEM

The following statement is a full description of this invention, including the best method of performing it known to us:


C1099

DATA MEMORY SYSTEM

Background to the Invention.


This invention relates to a data memory system for storing data in a data processing system.

It is well known to provide a memory system comprising a relatively large, slow memory (referred to as a main memory) and a smaller, faster memory (referred to as the cache or slave store).


In operation of such a system, data items are accessed from the slave store if they are present in that store. If not, the data items are accessed from the main memory and are copied into the slave store so that they are available for future access. When the slave store becomes full, it is necessary to cast out data items from the slave store so as to create space for new data items. The item to be cast out may be selected, for example, on a least-recently-used basis. If the item to be cast out has been modified since it was loaded into the slave store, it is necessary to write the updated version of that item back to the main memory, so as to ensure that the main memory is kept up to date.

Such a two-level memory system can provide a high access speed, assuming that most of the accesses are from the slave store, at a relatively low cost per unit of storage.

The object of the present invention is to provide a novel data memory system which allows more effective use of the slave or cache store.

Summary of the Invention.

According to the present invention, there is provided a data memory system comprising: a main memory, a fast slave store of smaller size and faster access speed than the main memory, and a slow slave store of size and access speed intermediate those of the main memory and fast slave store, wherein in operation, upon receipt of a request for a data item:
(i) if the data item is present in the fast slave store, it is accessed in that store,
(ii) if the data item is not present in the fast slave store, but is present in the slow slave store, it is read from the slow slave store and loaded into the fast slave store, and
(iii) if the data item is present in neither the fast slave store nor the slow slave store, it is read out of the main memory and loaded into the fast slave store but not into the slow slave store, data items being loaded into the slow slave store only when they are cast out of the fast slave store.

Thus, it can be seen that the slow slave serves to hold data items that have been cast out of the fast slave. If the item is subsequently required again, it can be accessed from the slow slave, rather than from the main memory. It has been found that the use of a slow slave in this way is very cost-effective, since it is quite likely that an item which has been cast out of the fast slave will be required again. The invention allows a smaller fast slave to be used, without the penalty of an increased number of accesses to the main memory.
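By way of illustration only, the three cases (i) to (iii) can be modelled in a few lines of Python. This is a minimal sketch and not part of the specification: the class name `TwoSlaveMemory`, its method names, the dictionary-based stores, and the oldest-first replacement used for the slow slave are all assumptions made for the example, and write operations are not modelled.

```python
from collections import OrderedDict


class TwoSlaveMemory:
    """Toy read-only model: the slow slave is filled only by cast-outs."""

    def __init__(self, main, fast_size, slow_size):
        self.main = main              # dict: address -> value
        self.fast = OrderedDict()     # ordered least recently used first
        self.slow = OrderedDict()     # assumption: oldest entry replaced first
        self.fast_size = fast_size
        self.slow_size = slow_size

    def read(self, addr):
        """Return (value, level that supplied it)."""
        if addr in self.fast:                       # case (i)
            self.fast.move_to_end(addr)
            return self.fast[addr], "fast"
        if addr in self.slow:                       # case (ii)
            value, source = self.slow[addr], "slow"
        else:                                       # case (iii)
            value, source = self.main[addr], "main"
        self._load_fast(addr, value)                # never loads the slow slave
        return value, source

    def _load_fast(self, addr, value):
        if len(self.fast) >= self.fast_size:
            old_addr, old_value = self.fast.popitem(last=False)
            self._cast_out(old_addr, old_value)
        self.fast[addr] = value

    def _cast_out(self, addr, value):
        # The only path by which an item enters the slow slave.
        self.slow[addr] = value
        self.slow.move_to_end(addr)
        if len(self.slow) > self.slow_size:
            self.slow.popitem(last=False)
```

In this model an item fetched from the main memory appears in the slow slave only after it has been cast out of the fast slave, which is the behaviour the summary describes.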

Brief description of the Drawings.

One data memory system in accordance with the invention will now be described by way of example with reference to the accompanying drawings.

Figure 1 is an overall block diagram of a data processing system including a data memory system in accordance with the invention.

Figure 2 is a detailed block diagram of a fast data slave forming part of the data memory system.

Figure 3 is a detailed block diagram of a slow data slave forming part of the data memory system.

Description of an embodiment of the invention.


Referring to Figure 1, the data processing system comprises a scheduler unit 10, a main processing unit 11, a fast data slave 12, a fast code slave 13, a slow slave 14, and a main memory 15.

The scheduler 10 generates a sequence of instruction addresses, which are applied to the code slave 13. If the required instruction is in the code slave, it is returned to the scheduler, which decodes the instruction and passes it to the main processing unit 11 for execution. If, on the other hand, the required instruction is not present in the code slave, an access is made to the slow slave 14. If the required instruction is present in the slow slave, it is loaded into the code slave 13. The scheduler 10 will then be able to access the required instruction.

When the main processing unit 11 executes an instruction, it generates an operand address, which is applied to the fast data slave 12. If the required operand is present in the data slave, it is returned to the main processing unit. If, on the other hand, the required operand is not present in the data slave 12, an access is made to the slow slave 14. If the required operand is present in the slow slave, it is loaded from the slow slave into the data slave. The main processing unit will then be able to access the operand.

If a required data item (instruction or operand) is not present in the slow slave, an access is made to the main store 15 to retrieve that item. The data item is then loaded from the main store into the code slave 13 or data slave 12 as the case may be. It should be noted, however, that the data item is not loaded into the slow slave 14 at this point.

When the code slave or data slave becomes full, a block of data items is cast out of that slave so as to create space for a new block of data items. The block of data items to be cast out is selected on a least-recently-used basis, i.e. the block which has been accessed least recently is selected for casting out. The selected block is sent to the slow slave 14 as an "unloaded" block.

When the slow slave 14 receives an unloaded block, it first checks whether or not that block is already present in the slow slave. If it is present, then a check is made as to whether any of the data items in that block have been written to while the block was resident in the data slave, as indicated by a block write bit BW in the block. (In the case of a block unloaded from the code slave, no writes will have occurred, since the scheduler never modifies the instructions: it simply reads them).

If the block has not been written to (BW = 0), then no further action is required, since the copies of that block in the slow slave and main store will still be up-to-date.

If a block write has occurred (BW = 1) then the block is written into the slow slave, and also into the main memory, so as to provide them with the latest up-to-date copies of the data items in the block.

If the unloaded block is not already resident in the slow slave, then the action taken is as follows.

If the block has not been written to (BW = 0) then the block is written in the slow slave only (overwriting any existing block in that location of the slow slave). If, on the other hand, the block has been written to (BW = 1) then it is written into both the main memory and the slow slave.
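The four unload cases described above reduce to a small decision function. The following sketch is illustrative only; the function name `unload_actions` and its boolean return convention are assumptions introduced for the example.

```python
def unload_actions(present_in_slow, bw):
    """Decide what happens to a block unloaded from a fast slave.

    present_in_slow: True if the block is already resident in the slow slave.
    bw: block write bit, 1 if the block was written to while in the fast slave.
    Returns (write_slow_slave, write_main_memory).
    """
    write_main = bw == 1                          # main memory updated only if modified
    write_slow = (not present_in_slow) or bw == 1
    return write_slow, write_main
```

For example, `unload_actions(True, 0)` returns `(False, False)`, the case in which no further action is required because both existing copies are still up to date.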

Referring now to Figure 2, this shows the fast data slave 12 in more detail.

The fast slave comprises a fast random access memory (RAM) 20, which holds 1024 bytes of data, organised as 32 cells of 32 bytes.

The data slave receives operand virtual addresses VA from the main processing unit 11. These virtual addresses are stored in a buffer memory 21 while waiting for the data slave to become free.

When the data slave is free, it reads a virtual address out of the buffer memory 21. This address is translated, by means of a set of global segment registers 22, into a global virtual address GVA, identifying a 32-byte block which holds the required data item. At the same time, the least significant bits of the virtual address are decoded, in a decoder circuit 23, to provide a byte shift value signal BSV indicating the position of the required operand within the specified 32-byte block. The signals GVA and BSV are stored in respective registers 24, 25.
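As a rough illustration of the split between block address and byte position, the sketch below assumes byte addressing, so that the five least significant bits select a byte within a 32-byte block. The specification does not state the field widths, and the segment-register translation that produces GVA is not modelled; the names `split_address`, `BLOCK_SIZE` and `OFFSET_BITS` are hypothetical.

```python
BLOCK_SIZE = 32      # bytes per block, as in the described embodiment
OFFSET_BITS = 5      # assumption: 2**5 == 32, byte-addressed operands


def split_address(address):
    """Split an address into (block number, byte shift value)."""
    block = address >> OFFSET_BITS
    bsv = address & (BLOCK_SIZE - 1)
    return block, bsv
```

Under these assumptions, `split_address(0x1234)` yields block 145 and byte shift value 20.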

The virtual address GVA is applied to a contents addressable memory (CAM) 26, having 32 entries, corresponding to the 32 cells in the RAM 20. If a match is detected between GVA and one of the entries in the CAM, this indicates that the required data item is present in the corresponding cell of the RAM. The CAM then outputs a cell address CA, which is fed, by way of a multiplexer 27 and a register 28, to the address input of the RAM 20, so as to select the required cell. Data can then either be written into the selected cell from the main processing unit, or read out of the selected cell and returned to the main processing unit.
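The CAM lookup can be pictured in software as a search of 32 stored addresses that returns the index of the matching cell. The sketch below is a stand-in only: a hardware CAM compares all entries in parallel, whereas this loop is sequential, and the class name `Cam` and its methods are assumptions made for the example.

```python
class Cam:
    """Software stand-in for a contents addressable memory."""

    def __init__(self, entries=32):
        self.tags = [None] * entries      # one stored block address per RAM cell

    def lookup(self, gva):
        """Return the cell address CA on a match, or None on a miss."""
        for cell, tag in enumerate(self.tags):
            if tag is not None and tag == gva:
                return cell
        return None

    def store(self, cell, gva):
        """Record that the given RAM cell now holds block gva."""
        self.tags[cell] = gva
```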

In the case of a write operation, the write data WD from the main processing unit is applied to a byte alignment circuit 29, controlled by the signal BSV. The circuit 29 aligns the data with the required byte positions within the 32-byte block. The aligned data is then fed, by way of a register 30 and multiplexer 31, to the data input of the RAM 20.

In the case of a read operation, a 32-byte block of data is read out of the selected cell of the RAM 20. The block is fed by way of a multiplexer 32 and register 33 to a byte alignment circuit 34, which is controlled by the signal BSV, by way of registers 35, 36. The circuit 34 selects the required data item from the specified byte positions, to produce a read data signal RD which is returned to the main processing unit.
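The effect of the two byte alignment circuits can be sketched on Python `bytes` objects. This is a simplified illustration: the operand length parameter and the rule that an operand may not cross a block boundary are assumptions, since the specification does not give operand sizes, and the function names are hypothetical.

```python
BLOCK_SIZE = 32


def align_write(block, write_data, bsv):
    """Return a copy of the 32-byte block with write_data placed at offset bsv."""
    if len(block) != BLOCK_SIZE or bsv + len(write_data) > BLOCK_SIZE:
        raise ValueError("operand must fit inside one 32-byte block")
    return block[:bsv] + write_data + block[bsv + len(write_data):]


def align_read(block, bsv, length):
    """Select length bytes starting at offset bsv from a 32-byte block."""
    if len(block) != BLOCK_SIZE or bsv + length > BLOCK_SIZE:
        raise ValueError("operand must fit inside one 32-byte block")
    return block[bsv:bsv + length]
```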


If the virtual address GVA does not match any of the entries in the CAM 26, a MISS signal is produced, indicating that the required data item is not present in the data slave. This causes the address GVA to be sent to the slow slave, so as to request the slow slave to supply the required data item.

At the same time, one of the cells in the data slave is unloaded, so as to clear a space for the data item when it is returned from the slow slave or main store. The data block to be unloaded is selected by means of a least-recently-used (LRU) circuit 37 associated with CAM 26. The LRU circuit keeps a record of the usage of the entries in the CAM 26, and provides an output signal which indicates the least recently used entry. LRU mechanisms are well known in the art and so it is not necessary to describe the circuit 37 in further detail.

When it is required to unload a cell from the data slave, the LRU circuit selects the least recently used entry in the CAM 26. The virtual address in this entry is read out, and supplied to the slow slave as an unloaded address DUA. At the same time, the CAM 26 provides a cell address CA, which selects the least recently used cell in the RAM 20. The contents of this cell are then read out of the RAM and supplied to the slow slave as an unloaded data block DUD. The cell address CA is also stored in a buffer memory 38.
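The specification leaves the LRU circuit undescribed. Purely as one possible illustration, usage can be tracked by keeping the cell addresses in an ordered list, with the least recently used cell at the front. The class name `LruTracker` and its methods are hypothetical, and an ordered list is only one of several well-known ways to realise LRU selection.

```python
class LruTracker:
    """Tracks usage order of cells; the front of the list is least recently used."""

    def __init__(self, entries=32):
        self.order = list(range(entries))

    def touch(self, cell):
        """Mark a cell as the most recently used."""
        self.order.remove(cell)
        self.order.append(cell)

    def victim(self):
        """Return the cell address that would be unloaded next."""
        return self.order[0]
```

In such a model, `touch` would be called with the cell address CA on every hit, and `victim` consulted whenever a MISS requires a cell to be unloaded.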

When a data block is returned from the slow slave or main memory, the cell address CA is read out of the buffer 38 and is fed to the address input of the RAM by way of the multiplexer 27. This selects the cell which has been unloaded. Data MRD from the main memory can then be written into the selected cell of the RAM by way of a register 39, or alternatively data SRD from the slow slave can be written into the selected cell of the RAM by way of a register 40. The multiplexer 31 selects the required source of input data.

The fast code slave is similar to the fast data slave shown in Figure 2, with the exception that it receives instruction addresses from the scheduler, rather than the operand addresses from the main processing unit, and there is no connection for writing into the code slave from the scheduler, since the scheduler never requires to modify instructions.

Referring now to Figure 3, this shows the slow slave in more detail.

The slow slave includes a RAM 50, containing 32K bytes of data, organised in 32-byte cells. The RAM is implemented in slower memory technology than the fast slave RAM 20: each access to the slow slave RAM occupies two clock beats, whereas each access to the fast slave RAM 20 takes just one clock beat.

Each cell of the RAM 50 holds a block of 32 data bytes, along with a tag (VATAG) indicating the virtual address of that block. The data output of the RAM 50 is fed to a register 64, the output of which provides the signal SRD.

As mentioned above, when the fast data slave does not hold the required data item, it sends the virtual address GVA of that item to the slow slave.

Similarly, when the fast code slave does not hold a required code item, it sends the address CVA of that item to the slow slave. These addresses are fed by way of a multiplexer 51 to a first-in first-out (FIFO) memory 52, which holds a queue of addresses waiting to be handled by the slow slave.

Also as mentioned above, when the fast data slave unloads a block of data, it sends the address of that block DUA to the slow slave. Similarly, when the code slave unloads a block of code, it sends the address of that block CUA to the slow slave. The addresses are selected by a multiplexer 53 and then multiplexed with the output of the FIFO 52 by means of a multiplexer 54, to form an input address SSA for the slow slave.

The address SSA is fed, by way of a register to the input of a hash coding circuit 56, which forms a hash address HA. The hash address may be formed, for example, from predetermined bits of the address SSA or by forming the exclusive-OR of predetermined pairs of bits of SSA.

The hash address HA is fed by way of register 57 to the address input of RAM 50, so as to select one of the 32-byte cells of the RAM. The hashing function is a many-to-one mapping operation, such that each of the cells in the RAM 50 corresponds to several different values of the input address SSA. To resolve this ambiguity, whenever a cell in the RAM 50 is addressed, the virtual address tag (VATAG) contained in that cell is read out, and is compared with the input address SSA by means of a comparator 66.
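The combination of hash addressing and tag comparison can be sketched as follows. The sketch assumes that 32K means 32768 bytes, giving 1024 cells and a 10-bit hash address, and it picks one arbitrary exclusive-OR pairing (bits 0 to 9 with bits 10 to 19 of the block address). The specification does not say which bits are paired, so `hash_address` and the class `SlowSlaveRam` are hypothetical.

```python
NUM_CELLS = 32768 // 32      # 1024 cells, assuming 32K bytes == 32768 bytes


def hash_address(ssa):
    """Fold a block address into a 10-bit cell index by pairwise exclusive-OR."""
    return (ssa ^ (ssa >> 10)) & (NUM_CELLS - 1)


class SlowSlaveRam:
    """Each cell holds (VATAG, data block) or None when empty."""

    def __init__(self):
        self.cells = [None] * NUM_CELLS

    def lookup(self, ssa):
        """Return the data block on a HIT, or None if the tag does not match."""
        cell = self.cells[hash_address(ssa)]
        if cell is not None and cell[0] == ssa:
            return cell[1]
        return None

    def store(self, ssa, data):
        """Write a block and its tag, overwriting whatever the cell held."""
        self.cells[hash_address(ssa)] = (ssa, data)
```

Because the mapping is many-to-one, two different addresses can select the same cell; the tag comparison is what prevents the block stored for one address from being returned for the other.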

If they are equal, a HIT signal is generated, to indicate that the required data block is present in the slow slave RAM 50.

The data input of the RAM 50 comes, by way of registers 62 and 63, from a multiplexer 60, which selects either the unloaded block DUD from the fast data slave, or a corresponding unloaded block CUD from the fast code slave.

In operation, when the slow slave receives an unload address DUA or CUA, that address is hashed, and is used to address one of the cells in the RAM 50. If the data block being unloaded is already present in the slow slave, then a HIT signal will be produced.

If a HIT is produced, and if the block write bit BW of the unloaded block (DUD or CUD) is not set, then no further action is taken. However, if the bit BW is set, then the unloaded block (DUD or CUD) is written into the selected cell of the RAM 50.

If, on the other hand, the data block being unloaded is not present in the slow slave, the HIT signal will not be produced. In this case the unloaded data block (DUD or CUD) is written into the addressed cell of the RAM 50. At the same time, the virtual address of that block (DUD or CUD) is written into the virtual address tag of the block.

Whether or not the unloaded data block is already present in the slow slave, if it has been modified (BW = 1) it has to be copied to the main memory. The unloaded data block (DUD) is therefore sent to the main store, as a memory write data signal MWD. At the same time, the virtual address of the block (DUA) is translated into a real address by means of an address translation unit 61, and sent to the main memory, by way of register 65, as a memory address signal MRA.

When the slow slave is free, and does not have any unloaded data blocks to deal with, it examines the FIFO 52. It will be recalled that this contains a queue of addresses (GVA or CVA) of required data items which were not present in the fast data slave or code slave, and hence have to be brought in from the slow slave or main memory. The slow slave selects the first address in the FIFO 52, hashes it, and applies the resulting hash address to the RAM 50.

If a HIT signal is now produced, this indicates that the data block containing the required data item is present in the slow slave RAM 50. The data block is therefore read out of the RAM 50 and returned to the data slave and the code slave as the data signal SRD. As described above, the data block will then be written into the fast data slave RAM 20, or into the corresponding code slave RAM, into a block location which has been unloaded.

If, on the other hand, no HIT signal is produced, the data block must be retrieved from the main memory. The virtual address GVA or CVA is therefore translated into a real address by the circuit 61 as already described, and sent to the main memory. The data block will then be accessed in the main memory, and returned to the fast data slave and code slave as the signal MRD.

It should be noted that the data returned in this way from the main memory is not entered into the slow slave. As mentioned above, data items (operands and instructions) are entered into the slow slave only as a result of being unloaded from the fast data slave or code slave.
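The order in which the slow slave takes up work, with unloaded blocks handled before queued requests, can be sketched as a simple selection function. The specification describes a FIFO only for the request addresses; holding pending unloads in a queue as well is an assumption made for this example, as is the function name `next_job`.

```python
from collections import deque


def next_job(pending_unloads, request_fifo):
    """Pick the next piece of work for the slow slave.

    pending_unloads: deque of unload addresses (DUA or CUA) awaiting handling.
    request_fifo: deque of requested addresses (GVA or CVA), oldest first.
    Returns ("unload", address), ("request", address), or None when idle.
    """
    if pending_unloads:
        return "unload", pending_unloads.popleft()
    if request_fifo:
        return "request", request_fifo.popleft()
    return None
```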

The main memory 15 may be a conventional memory and need not be described in detail. It is substantially larger than the slow slave, for example, it may hold one million bytes of data, and is implemented in a slower memory technology, e.g. each access from the main memory may take 25 clock beats.
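To give a feel for the access times quoted in this embodiment (1, 2 and 25 clock beats), a deliberately simplified estimate of the mean access time can be computed as below. The model charges each access only the latency of the level that supplies the item and ignores miss detection, queueing in the FIFO and unload traffic. The hit rates are free parameters chosen by the reader, not figures from the specification.

```python
FAST_BEATS = 1     # fast slave RAM access
SLOW_BEATS = 2     # slow slave RAM access
MAIN_BEATS = 25    # example main memory access


def mean_access_beats(fast_hit_rate, slow_hit_rate):
    """Mean beats per access; slow_hit_rate applies to fast-slave misses only."""
    miss_cost = slow_hit_rate * SLOW_BEATS + (1.0 - slow_hit_rate) * MAIN_BEATS
    return fast_hit_rate * FAST_BEATS + (1.0 - fast_hit_rate) * miss_cost
```

With hypothetical hit rates of 0.9 in the fast slave and 0.5 in the slow slave, this model gives 2.25 beats per access, against 3.4 beats if every fast-slave miss went to the main memory.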

Claims

1. A data memory system comprising:
(a) a main memory,
(b) a fast slave store of smaller size and faster access speed than the main memory, and
(c) a slow slave store of size and access speed intermediate those of the main memory and fast slave store,
wherein, in operation, upon receipt of a request for a data item:
(i) if the data item is present in the fast slave store, it is accessed in that store,
(ii) if the data item is not present in the fast slave store, but is present in the slow slave store, it is read from the slow slave store and loaded into the fast slave store, and
(iii) if the data item is present in neither the fast slave store nor the slow slave store, it is read out of the main memory and loaded into the fast slave store but not into the slow slave store, data items being loaded into the slow slave store only when they are cast out of the fast slave store.

2. A system according to Claim 1 wherein the fast slave store is addressed by means of a contents addressable memory.

3. A system according to Claims 1 or 2 wherein the slow slave store is addressed by means of a hash coding circuit.

4. A system according to any preceding claim wherein data items are selected for casting out of the fast slave store on a least recently used basis.

5. A system according to any preceding claim wherein a data item cast out of the fast slave store is written into the slow slave store only if it is not already present in the slow slave store, or if it has been modified while in the fast slave store.


6. A data processing system comprising a data processing unit, a main memory, a fast slave store of smaller size and faster access speed than the main memory, and a slow slave store of size and access speed intermediate those of the main memory and fast slave store, wherein, in operation:
(i) when the processing unit requires a data item for processing it requests the item from the fast slave store,
(ii) when the fast slave store receives a request for a data item from the processing unit, it returns the item to the processing unit if that item is resident in the fast slave store, or else requests that item from the slow slave store,
(iii) when the slow slave store receives a request for a data item from the fast slave store, it returns the item to the fast slave store if that item is resident in the slow slave store or else requests that item from the main memory,
(iv) when the main memory receives a request for a data item, it returns that data item to the fast slave store but not to the slow slave store,
(v) when the fast slave store casts out data items to make room for new data items, the data items cast out are written into the slow slave store.

7. A data memory system substantially as hereinbefore described with reference to the accompanying drawings.

8. A data processing system substantially as hereinbefore described with reference to the accompanying drawings.

DATED this 9th day of March 1989.

INTERNATIONAL COMPUTERS LIMITED

EDWD. WATERS SONS PATENT ATTORNEYS QUEEN STREET MELBOURNE. VIC. 3000.