GB2276964A - Buffering write cycles. - Google Patents


Info

Publication number
GB2276964A
Authority
GB
United Kingdom
Prior art keywords
address
microprocessor
data
memory
write
Prior art date
Legal status
Withdrawn
Application number
GB9406387A
Other versions
GB9406387D0 (en)
Inventor
James Torossian
Neville Allan Clark
Current Assignee
HYPERTECH Pty Ltd
Original Assignee
HYPERTECH Pty Ltd
Priority date
Filing date
Publication date
Application filed by HYPERTECH Pty Ltd filed Critical HYPERTECH Pty Ltd
Publication of GB9406387D0
Publication of GB2276964A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0893 Caches characterised by their organisation or structure
    • G06F 12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G06F 12/0804 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with main memory updating


Abstract

A buffer system, operating as an array of contents addressable temporary storage locations, buffers write cycles between a processor and locations in main memory so that the processor is not constrained by the write cycle times of other devices. If an address is rewritten before the previous write data has left the buffer, the data of the previous write cycle is overwritten with the new data.

Description

CACHE MEMORY PROCESSOR SYSTEM

The present invention relates to computer systems, and in particular, to cache memory systems.
Background Art

It is known to use cache memory to provide efficient temporary storage for data processed by an associated microprocessor, storage which can be accessed at a significantly higher rate than the main memory within a computer system. Early cache memory systems comprised static random access memory (RAM) arranged adjacent to the microprocessor. More recently, microprocessor manufacturers have incorporated the cache memory into the microprocessor itself. In modern microprocessor systems, such an arrangement is highly desirable because the microprocessor is generally able to operate at speeds greater than those of the remainder of the computer system and the random access memory included therein. Accordingly, when the microprocessor is executing an entirely internal function without accessing peripheral devices, the internal cache provides a high-speed, internally accessible temporary storage area.
However, such an arrangement is not without its problems. The most significant of these is that when the internal cache is written to, the main memory of the personal computer system must also be updated to maintain coherence between the two memory blocks. Accordingly, when the microprocessor is operating at speeds in excess of the remainder of the computer system and a stream of write cycles is generated, those write cycles must be stored in both the internal cache and the main memory. Because the microprocessor operates at a speed in excess of the remainder, it is limited to issuing the write cycles at the speed of the remainder, and not at the higher microprocessor operating speed. Accordingly, the advantages of high microprocessor speed are not realised.
One attempted solution to this problem has been proposed by Compaq Computer Corporation of the United States. In the Compaq arrangement, a write-back cache is provided external to the microprocessor. The write-back cache operates as a cache memory to store all write cycles generated by the microprocessor. When a high-speed stream of write cycles is generated, the write-back cache stores the write cycles and marks the locations in the cache as "dirty", which indicates the need to update the main memory at a later time. Data written to the cache is then written to main memory only when the cache location needs to be reused, or when the cache is flushed to ensure coherence. However, this arrangement has the disadvantage that there can be a significant lag in the updating of the main memory, which is undesirable given that the main memory is shared with other devices which have direct access to it and which may wish to access the main memory immediately upon availability being signalled from the microprocessor.
In order to circumvent this time lag in the updating of main memory, current write-back cache techniques must provide the capability to off-load, on request from other devices or other processors connected to main memory, data from the write-back cache when the cache contains "dirty" data. This adds significant complexity and hence cost to the design implementation.
Summary of the Invention

It is therefore an object of the present invention to substantially overcome, or ameliorate, the abovementioned difficulties through provision of a cache memory system which provides for the timely updating of main memory to ensure coherence with cache memories internal and/or external to a microprocessor.
In accordance with one aspect of the present invention there is disclosed a method for handling write cycles each having address and corresponding data from a microprocessor to enable the microprocessor to issue write cycles at a faster rate than can be received by a destination of the write cycles, said method comprising the steps of: (a) buffering the address and corresponding data being written from said microprocessor in a first write cycle in respective temporary storage locations; (b) when said first or another temporarily stored write cycle is followed by another write cycle, comparing the address of said another write cycle with that of the first and other temporarily stored write cycles, and if they match, (i) overwriting the data of the matched said first or other write cycle with the data of said another write cycle; and if they do not match, (ii) buffering the address and corresponding data of said another write cycle in respective temporary storage locations distinct from those of the first and other write cycles; and (c) identifying an availability of access to the destinations of said write cycles and for each temporarily stored write cycle, using the stored address to write the corresponding stored data to its destination.
In accordance with another aspect of the present invention there is disclosed a memory processor system comprising: temporary storage means for storing address and data components of an information transfer from a source to a destination; comparator means associated with said temporary storage means for comparing an address component of a current information transfer with address components retained in said temporary storage means whereby if an address match occurs, the data associated with said information transfer overwrites the data stored at the corresponding data location for that stored address in said temporary storage means, and if no match occurs, the address and data associated with said current information transfer are stored at a new location in said temporary storage means; and output means for identifying when no transfer is occurring from said source to said destination for enabling outputting each of said temporarily stored address and corresponding data component to said destination.
In accordance with another aspect of the present invention there is disclosed a computer system comprising: a microprocessor; a cache memory connected to a processor address bus and a processor data bus both leading from said microprocessor; a cache memory processor system comprising an address processor interconnecting said processor address bus with a system address bus, and a data processor interconnecting said processor data bus with a system data bus, said system buses interconnecting other devices of said system, said other devices comprising a plurality of memory locations accessible by at least said microprocessor, characterised in that said cache memory system operates as a contents addressable array of temporary storage locations for write cycles from said microprocessor to said memory locations whereby substantial coherence between data stored in said cache memory and data stored at one or more selected said memory locations is maintained whilst said microprocessor is enabled to operate at a rate in excess of a rate at which said other devices operate.
Although the present invention is applicable to many different types of computer systems, in addition to being able to be integrated into the microprocessor, it finds particular application in microprocessor upgrade assemblies where, for example, a computer motherboard configured to operate using an Intel i286 microprocessor is upgraded to operate with an i386 microprocessor, which is capable of significantly higher data processing rates.
Brief Description of the Drawings

A preferred embodiment of the present invention will now be described with reference to the drawings, in which:
Fig. 1 is a schematic block diagram representation of a typical prior art microprocessor upgrade assembly;
Fig. 2 is a schematic block diagram representation of a microprocessor upgrade assembly incorporating one embodiment of the present invention;
Fig. 3 is a schematic block diagram representation of a microprocessor upgrade assembly incorporating a preferred embodiment of the present invention;
Fig. 4 is a schematic block diagram representation of the data ASIC of Fig. 3;
Fig. 5 is a representation of the dual-ported RAM of Fig. 4;
Fig. 6 is a schematic block diagram representation of the address ASIC of Fig. 3;
Fig. 7 is a representation of the comparative memory buffer of Fig. 6; and
Fig. 8 is a schematic representation of a multi-processor system incorporating an embodiment of the invention.
Best and Other Modes for Performing the Invention

Figs. 1 and 2 each show a microprocessor upgrade assembly, 1A and 1B respectively, which is configured for interconnection to a microprocessor socket 21 of a computer system motherboard 20. Microprocessor upgrade assemblies are known in the art and are used to re-configure an aging, but generally high quality, computer system motherboard to incorporate features available and speeds provided by newer model microprocessors. Examples of such boards are the HYPERACE 386SX and HYPERACE 486 (trade marks) systems manufactured by the present applicant, which are produced to provide for the upgrading of IBM-type 286 and 386 motherboards, respectively.
In Fig. 1, the upgrade assembly 1A has a microprocessor 2 such as an Intel 80486DX (trade mark) (when the motherboard 20 is an Intel 80386DX (trade mark) system) which includes an internal cache 3, and an external cache 4 connected to the microprocessor 2 via an address bus 5 and data bus 6. The assembly 1A also includes an array of control devices, which would be understood by those skilled in the art and which is schematically represented by a control block 7, which interfaces control signals of the microprocessor 2 to the socket 21. The control block 7 is also responsible for the control of the external cache 4, address buffers 100 and data buffers 101. The microprocessor 2 operates at a speed determined by its own clock 11.
The embodiment shown in Fig. 2 is implemented by replacing the address buffers 100 and data buffers 101 between the microprocessor 2 and the socket 21 with two application-specific integrated circuits (ASICs): an address ASIC 40 and a data ASIC 60. The ASICs 40, 60 implement the buffering previously provided by the address buffers 100 and the data buffers 101. Further to this, they provide for the storing of writes in a contents addressable array of write temporary storage locations, and allow multiple reads around data in the write temporary storage locations.
The various embodiments are also particularly useful when the speed of a system clock 22 residing on the motherboard 20 is lower than the speed of the processor clock 11 at which the microprocessor 2 operates, or when the microprocessor 2 has an internal cache.
The ASICs 40 and 60 are together configured to store writes from the microprocessor 2, at the clock speed of the microprocessor 2, into the contents addressable write temporary storage locations. When the microprocessor 2 writes to a memory address whose previously written data is still held in the write temporary storage locations, that data is overwritten by the new data currently being written by the microprocessor 2 to that same address. This process effectively merges writes to the same address, thereby reducing the number of writes needed to the main memory 23.
The advantages of this array of contents addressable write temporary storage locations become evident when one considers the nature of typical code executed by a microprocessor. For example, consider the code example below, written in the programming language called C. In this example, a function called fred(), that for the purposes of the example performs no function, is repeatedly called by the execution of a for loop in the function main():

/* the main program function */
main()
{
    int i;
    for (i = 0; i < 1000; i++)
        fred();
}

/* the function fred that, when called, just returns */
fred()
{
}

In each instance, when the function fred() is called, it is necessary to push a return address onto the stack (which implements a write from the microprocessor 2), execute the routine (in this case doing nothing) and then return to the function main(), which entails popping the return address (necessitating a read to the microprocessor 2).
The process of calling the function fred() in the example above generates 1,000 writes (pushes) of the return address to the program's stack. These writes will effectively be to the same memory address but, despite this, the main memory 23 must be updated by writing to it. In the prior art arrangement of Fig. 1, the process of off-loading this stream of writes to main memory 23 via the motherboard bus 24 will quickly slow the fast microprocessor 2 down to the speed determined by the system clock 22.
The contents addressable write temporary storage locations provided by the ASICs 40, 60 of Fig. 2 circumvent this slow-down by handling the writes generated by the above example in the following manner. When the first return address is pushed onto the stack, it is written out to cache memory (either 3 or 4) and, because data for that processor address is not currently stored in the write temporary storage locations, it is loaded into the first free storage location in the write temporary storage locations. This happens at a speed determined by the microprocessor 2 and its clock 11, with zero wait states. The data in the write temporary storage locations is now ready to be written to the motherboard main memory 23, and this will occur during the next motherboard bus cycle.
In the interim, the faster microprocessor 2 has already returned from the function fred() by reading (popping) the return address stored in the cache memory 3,4, incremented the variable i and again called the function fred(), in doing so again writing (pushing) the return address onto the stack stored in the cache 3,4. As a write has again been generated by the microprocessor 2, the main memory 23 must be updated to maintain coherency between the cache 3,4 and the main memory 23. As all writes to main memory 23 go via the contents addressable write temporary storage locations, this return address (data) must also be loaded into the write temporary storage locations.
As data is written into the temporary storage locations, its system address is compared to the addresses of all data already stored in the temporary storage locations.
Because the data representing the first return address that was written into the temporary storage locations is still there, waiting for a motherboard bus cycle, the logic of the contents addressable write temporary storage locations detects an address match between the data already stored in one of its locations and the new data (the second return address) now being presented at the input of the write temporary storage locations. As an address match has occurred, the temporary storage location already storing data for that microprocessor address is overwritten with the new data associated with that address.
This process will continue with individual storage locations having their contents overwritten several times before finally being written out to the main memory 23.
This process of overwriting data in the write temporary storage locations allows the microprocessor 2 to run at full speed, while allowing main memory 23 to be updated at the maximum rate allowed by the motherboard bus cycle timing. This in turn keeps the contents of main memory 23 closely matching the latest changes in the contents of the cache 3,4 thereby minimising the time taken for main memory 23 to be updated.
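The merging and unloading behaviour described above can be sketched as a software model in C. This is an illustrative sketch only: the names (wb_write, wb_drain_one), the 32-bit word granularity and the flat word-addressed memory are assumptions of the sketch, not features recited in the patent, and the drain order is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WB_SLOTS 16  /* 16 temporary storage locations, as in the preferred embodiment */

/* One temporary storage location: a buffered write address and its latest data. */
struct wb_slot { uint32_t addr; uint32_t data; int valid; };

struct write_buffer { struct wb_slot slot[WB_SLOTS]; };

/* Buffer one write cycle.  If the address matches a location still waiting
   for a motherboard bus cycle, its data is simply overwritten (the writes
   are merged); otherwise the first free location is taken.  Returns 0 on
   success, or -1 when every location is occupied and the processor would
   have to stall until one is emptied. */
int wb_write(struct write_buffer *wb, uint32_t addr, uint32_t data)
{
    int i, free_slot = -1;
    for (i = 0; i < WB_SLOTS; i++) {
        if (wb->slot[i].valid && wb->slot[i].addr == addr) {
            wb->slot[i].data = data;        /* address match: merge the write */
            return 0;
        }
        if (!wb->slot[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;                          /* buffer full */
    wb->slot[free_slot].addr = addr;
    wb->slot[free_slot].data = data;
    wb->slot[free_slot].valid = 1;
    return 0;
}

/* Retire one pending write to main memory when a motherboard bus cycle
   becomes available.  Returns 1 if a write was issued, 0 if none pending.
   (The hardware drains in issue order via cyclical pointers; this sketch
   simply takes the first occupied location.) */
int wb_drain_one(struct write_buffer *wb, uint32_t mem[], size_t mem_words)
{
    int i;
    for (i = 0; i < WB_SLOTS; i++) {
        if (wb->slot[i].valid) {
            assert((size_t)wb->slot[i].addr < mem_words);
            mem[wb->slot[i].addr] = wb->slot[i].data;
            wb->slot[i].valid = 0;
            return 1;
        }
    }
    return 0;
}
```

On this model, the 1,000 pushes of the fred() example all hit one occupied location between bus cycles, so main memory sees far fewer than 1,000 bus writes.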
When the microprocessor 2 generates an I/O cycle or locked memory cycle, the write temporary storage locations are unloaded to main memory 23 prior to executing the I/O cycle or locked memory cycle.
When the microprocessor 2 has to read a memory location not stored in either of the caches 3 and 4 (that is, the data it requires is located only in main memory 23), the address of the required data is checked against the addresses of all data stored in the write temporary storage locations. If data belonging to that address is not stored in the write temporary storage locations, the read from main memory 23 can proceed immediately. If data belonging to the address to be read is in the temporary storage locations, the microprocessor 2 waits for the data associated with that address to be written to main memory 23 before reading it from the same main memory 23 location.
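The read rule of the preceding paragraph can be modelled as follows. This is a hypothetical sketch: the name read_through_buffer and the flat word-addressed memory are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WB_SLOTS 16

/* A pending buffered write: address, data and a valid flag. */
struct pending { uint32_t addr; uint32_t data; int valid; };

/* Before a read from main memory proceeds, any pending buffered write to
   the same address is first written out, so the read always returns the
   current data.  If no buffered address matches, the read proceeds
   immediately. */
uint32_t read_through_buffer(struct pending pend[WB_SLOTS],
                             uint32_t mem[], uint32_t addr)
{
    int i;
    for (i = 0; i < WB_SLOTS; i++) {
        if (pend[i].valid && pend[i].addr == addr) {
            mem[pend[i].addr] = pend[i].data;  /* flush the matching write first */
            pend[i].valid = 0;
        }
    }
    return mem[addr];                           /* then read main memory */
}
```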
A preferred embodiment is shown in Fig. 3 in the form of a cache memory processor system 30 which incorporates a control arrangement 31 and the ASICs 40 and 60 described above, the system 30 being incorporated into a microprocessor upgrade assembly 25 similar to those already described. The control arrangement 31 is specifically configured (in a manner that would be apparent to those skilled in the art) to provide certain functions via a control connection 32 which connects to the ASICs 40, 60, a further connection 33 which permits the control arrangement to "snoop" or otherwise examine the address bus for certain types of data transfers, and a system control bus connection 34.
Turning now to Figs. 4 to 7, the arrangement of the address ASIC 40 and the data ASIC 60 is shown schematically. Not shown in Figs. 4 to 7, however, are internal control connections and components which would be understood by those skilled in the art.
Referring firstly to Figs. 4 and 6, when a READ of main memory 23 is performed, as seen in Fig. 6, the READ address is supplied by the microprocessor 2 through a bi-directional buffer 41, a latch 43, a multiplexer 102 and a bi-directional buffer 51 to the microprocessor socket 21. The data stored in main memory 23 and associated with the address presented to the microprocessor socket 21 is then read from the microprocessor socket 21 and, referring to Fig. 4, passes through a bi-directional buffer 62, a latch 63 and a bi-directional buffer 61, and is stored into the external cache 4, the internal cache 3, or the processor 2 for processing.
When a bus mastering device, or a direct memory access (DMA) device (not illustrated, but known per se), or another microprocessor device has control of the motherboard bus 24, that device will be periodically updating memory locations in the main memory 23. As an image of parts of the main memory 23 is contained in the external cache 4 and/or the internal cache 3, it is necessary to monitor the addresses generated by bus mastering and DMA devices, by snooping using the connection 33, to see if they have modified any data in main memory 23 which is also stored in either the external cache 4 or the internal cache 3. If an address is generated by such a device that corresponds to the address of data stored in either of the caches 3 or 4, then those locations in either cache must be invalidated. The address path for this process of snooping is (as seen in Fig. 6) from the microprocessor socket 21, through the bi-directional buffer 51, a latch 52 and the bi-directional buffer 41, to the external cache 4 and the internal cache 3. At this point the control arrangement 31 tells both caches 3, 4 to invalidate the contents of that address if data associated with that address is currently stored in either cache.
In order to guarantee that a bus mastering device, DMA device or other microprocessor device will not write to memory locations which are currently stored in the temporary storage locations, it is necessary to flush the temporary storage locations to main memory before allowing I/O and locked memory accesses to proceed. This is sufficient because a transfer by a bus mastering device, DMA device or other microprocessor device will be initiated by an I/O or locked memory access transferring control of the memory region in question to that device.
When the microprocessor 2 performs a WRITE, the WRITE address is supplied via the address bus 5 and buffered into the latch 43 on the address ASIC 40. The latch 43 de-couples the address ASIC 40 from the microprocessor 2 and loads the WRITE address into a vacant location in a comparative memory buffer 44. In the preferred embodiment, the comparative memory buffer 44 has 16 temporary storage locations each comprising multiple bytes of data. As a WRITE is performed from the microprocessor 2, each write address is sequentially loaded into the comparative memory buffer 44 which therefore acts as a temporary storage of write addresses.
The address ASIC 40 incorporates a read pointer address generator 46 and a write pointer address generator 47. The read pointer address generator 46 is a cyclical counter which provides a read pointer 54 that is supplied to an input of the comparative memory buffer 44. The write pointer generator 47 is a cyclical counter which outputs to a multiplexer 48, which in turn creates a write pointer 56 that is input to the comparative memory buffer 44 via a latch 50. The read pointer 54 is used to sequentially select a read address from the comparative memory buffer 44 and output that address, via the multiplexer 102, the bi-directional buffer 51 and the address connection 8, to enable updating of the main memory 23.
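The two cyclical counters can be sketched as follows. This is a software model under stated assumptions: the function names and the explicit occupancy count are illustrative additions, not features recited in the patent.

```c
#include <assert.h>

#define WB_SLOTS 16

/* Model of the two cyclical counters: the write pointer selects where the
   next buffered write cycle is stored, and the read pointer selects which
   stored write is next unloaded to the motherboard, so that writes reach
   main memory in the order they were issued. */
struct fifo_ptrs { unsigned rd, wr, count; };

/* Next free location for an incoming write, or -1 when all are occupied. */
int ptr_alloc(struct fifo_ptrs *p)
{
    int slot;
    if (p->count == WB_SLOTS)
        return -1;                        /* every location occupied */
    slot = (int)p->wr;
    p->wr = (p->wr + 1) % WB_SLOTS;       /* cyclical counter wraps around */
    p->count++;
    return slot;
}

/* Next location to unload to the motherboard, or -1 when none is pending. */
int ptr_retire(struct fifo_ptrs *p)
{
    int slot;
    if (p->count == 0)
        return -1;                        /* nothing pending */
    slot = (int)p->rd;
    p->rd = (p->rd + 1) % WB_SLOTS;
    p->count--;
    return slot;
}
```

The wrap-around means the sixteen locations are reused endlessly, with the oldest pending write always unloaded first.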
Simultaneously with the write address operation, when data is enabled upon the data bus 6, it is de-coupled via a latch 64 and sequentially stored in a dual-ported RAM 65. Each of the read pointer 54 and the write pointer 56 generated within the address ASIC 40 is supplied to the data ASIC 60 to permit appropriate selection of the data bytes stored within the dual-ported RAM 65. In this configuration, when the corresponding WRITE address is output from the address ASIC 40 to update the main memory 23, the corresponding data stored within the dual-ported RAM is buffered out as a WRITE data output 66 which is supplied to the motherboard via the data bus connection 9.
In operation, when a string of WRITEs is performed by the microprocessor 2, each WRITE address is buffered in the temporary storage locations provided by the address and data ASICs 40, 60, thereby enabling the microprocessor 2 to perform the next operation at the speed of the microprocessor 2. In the meantime, the address and data ASICs 40 and 60 can unload the WRITE address and data into the main memory 23 at the speed of the motherboard 20.
Those skilled in the art will appreciate that if a high-speed string of WRITEs instituted by the microprocessor 2 is randomised, then no significant advantage is obtained, as the comparative memory buffer 44 and the dual-ported RAM 65 will quickly fill, depending on their relative size (16 double-byte locations in each case in this embodiment).
However, most WRITEs that occur in microprocessors at high speed occur within loops, as indicated earlier, which invariably write to a limited set of addresses.
When a WRITE occurs, the write address is compared with each address stored in the comparative memory buffer 44. If the write address matches a previously stored address then, rather than the write being stored in the next free write temporary storage location, a match is indicated, via an output 107, by the address cell 120 storing the same address as the address presented on the unlatched address input 42. As seen in Fig. 7, this match is signalled by a HIT signal out of the address cell 120 of the comparative memory buffer 44 containing the matching address, which is fed to an encoder 103 that generates the match address 107 of the corresponding data cell 140 (as seen in Fig. 5) currently storing the data associated with the matched address. The output of the encoder 103 is fed into a write address decoder 104 in the data cell 140 via the multiplexer 48. The write address decoder 104 generates a write select signal 105 to the cell 140 containing the data associated with the matched address. When the selected cell 140 receives the write select signal 105, it stores the data currently on the latched data bus 106.
The above is a description of a write hit. In the case of a write miss, the write pointer generator 47 generates the address of the next free address cell 120 and corresponding data cell 140. The address is loaded into an address cell latch 108 in the cell 120, and the associated data is loaded into one or both of two byte buffers 141, 142 in the data cell 140.
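The parallel compare and encode of the hit path can be modelled in C as follows. This is a sketch only: the name encode_hit and the array representation of the cells are assumptions made for illustration, and the sequential loop stands in for the 16 comparisons that the hardware performs simultaneously.

```c
#include <assert.h>
#include <stdint.h>

#define CELLS 16

/* Model of the parallel compare of Fig. 7: every address cell compares its
   latched address with the unlatched address input and raises a HIT line;
   the encoder turns the HIT lines into the index of the matching data cell.
   Returns that index, or -1 for a write miss (no cell holds the address). */
int encode_hit(const uint32_t cell_addr[CELLS],
               const int cell_valid[CELLS], uint32_t addr)
{
    int i;
    for (i = 0; i < CELLS; i++)       /* in hardware, all compares occur at once */
        if (cell_valid[i] && cell_addr[i] == addr)
            return i;                 /* encoder output: the match address */
    return -1;                        /* no HIT line asserted: write miss */
}
```

On a hit, the returned index would drive the write address decoder to select the matched data cell; on a miss, the write pointer supplies the next free cell instead.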
In this manner, where multiple writes to the same address occur frequently (e.g. at least twice in every 16 write operations in the preferred embodiment), that address in the main memory need only be updated once, resulting in significant processing time savings.
Data is read out of the write temporary storage locations under control of the control arrangement 31, which provides timing signals compatible with the motherboard 20. The read pointer generator 46 provides the address of the next cell to have its contents passed on to the motherboard.
Under certain circumstances, dependent upon the state of the microprocessor 2 and/or the instruction sequence, the contents of the data cells 140, and their corresponding addresses in the address cells 120, are flushed, or written, to the main memory 23. For example, when a Read-Modify-Write sequence (such as used in the Intel architecture) is detected by the control arrangement 31, by virtue of its access to the address bus 5 via the connection 33 and to control data from the microprocessor 2 via the control bus 15, execution of the sequence is suspended until such time as the contents of the data cells 140 are flushed.
Where, in the computer architecture, for example, video data is not stored in the cache memory 3, 4, it is passed directly from the microprocessor 2 to a video controller (not shown, but well known in the art). In the preferred embodiment, because all writes from the microprocessor 2 pass through the ASICs 40, 60, it is therefore necessary, in some applications, for such write cycles to bypass the temporary storage locations. This is achieved through the control arrangement 31 detecting that particular type of write cycle and directing the write cycle data and address information through the latch 64/multiplexer 67 and the latch 43/multiplexer 102 respectively.
Where, in the computer architecture, memory associated with a device (for example, video memory) is cachable via the microprocessor internal or external cache, then this memory can be considered part of the total memory of the system, and write cycles to that memory can also be buffered into the temporary storage locations.
Whilst the preferred embodiment has been specifically described in relation to microprocessor upgrade assemblies and finds direct and advantageous application therein, it is by no means limited to such assemblies.
The invention can be applied to any processor system in which processing devices can create WRITE data at rates faster than it can be stored in memory. Furthermore, although the preferred embodiment is implemented in two application-specific integrated circuits, alternative embodiments can be configured, such as a single application-specific integrated circuit, or alternatively such devices can be integrated into the microprocessor itself.
Furthermore, the external write cache temporary storage locations of Fig. 2 are not essential, as they can be integrated into the microprocessor. In addition, where wide bus systems are used, the write cache processor system can be expanded by adding further data ASICs 60. For example, for a 32-bit data bus, two of the data ASICs 60 would be required.
In a multi-processor system, such as the computer system 200 shown in Fig. 8, a number of cache memory processor systems 30A, 30B, 30C incorporating multiple sets of temporary storage locations, one set for each respective microprocessor 2A, 2B, 2C, can be used either internal or external (as shown) to each microprocessor, such that write cycles to common memory storage locations 202 via a common memory bus 201 can be minimised. An Input/Output (I/O) device 203 is also allowed access to the memory locations 202, but only after each of the cache memory processor systems 30A, 30B and 30C has been flushed.
The foregoing describes only a number of embodiments of the present invention, and modifications obvious to those skilled in the art can be made thereto without departing from the scope of the present invention.

Claims (39)

CLAIMS:
1. A method for handling write cycles each having address and corresponding data from a microprocessor to enable the microprocessor to issue write cycles at a faster rate than can be received by a destination of the write cycles, said method comprising the steps of: (a) buffering the address and corresponding data being written from said microprocessor in a first write cycle in respective temporary storage locations; (b) when said first or another temporarily stored write cycle is followed by another write cycle, comparing the address of said another write cycle with that of the first and other temporarily stored write cycles, and if they match, (i) overwriting the data of the matched said first or other write cycle with the data of said another write cycle; and if they do not match, (ii) buffering the address and corresponding data of said another write cycle in respective temporary storage locations distinct from those of the first and other write cycles; and (c) identifying an availability of access to the destinations of said write cycles and for each temporarily stored write cycle, using the stored address to write the corresponding stored data to its destination.
2. A method as claimed in claim 1, wherein said method steps occur in the absence of control thereof by said microprocessor.
3. A method as claimed in claim 1 wherein said method comprises the further step of: (d) detecting an access by an I/O device to said destination, preventing the I/O access from proceeding until any contents of the temporary storage locations have been transferred to their respective destinations.
4. A method as claimed in claim 1 wherein said method comprises the further step of: (e) detecting a read-modify-write sequence of accesses to said destination, and then preventing the execution of each part of the sequence until the contents of the temporary storage locations have been transferred to main memory.
5. A method as claimed in claim 1, wherein steps (a) or (b) operate simultaneously with, and independently of, step (c).
6. A method as claimed in claim 5, wherein the number of temporary storage locations is limited, and when each is occupied subsequent to step (b)(ii), operation of said microprocessor, when performing a further write cycle, is suspended whilst step (c) acts to empty at least one temporary storage location into which said further write cycle can be temporarily stored.
7. A method as claimed in claim 1 wherein, upon said microprocessor instituting a read cycle, each said address stored in said temporary locations is checked for matching with the address of the read cycle, and (i) if a match occurs, the corresponding data retained in said temporary locations is written to the corresponding destination before being read by the microprocessor, and (ii) if not, the read cycle is permitted to execute.
8. A method as claimed in claim 7, wherein said checking comprises simultaneously examining each temporarily stored address.
9. A method as claimed in claim 1 wherein, upon said microprocessor instituting a read cycle, each said address stored in said temporary locations is checked for matching with the address of the read cycle, and (i) if a match occurs, the corresponding data retained in said temporary locations is read from said temporary locations directly by the microprocessor, and (ii) if not, the read cycle is permitted to execute.
10. A method as claimed in claim 9, wherein said checking comprises simultaneously examining each temporarily stored address.
11. A method as claimed in claim 1, wherein said temporary storage locations mirror complementary locations of a cache memory associated with said microprocessor, and said temporary storage locations provide for the maintenance of a main memory at said destination substantially in concert with said cache memory.
12. A method as claimed in claim 11, wherein said cache memory is externally connected to said microprocessor.
13. A method as claimed in claim 11, wherein said cache memory is integrally formed within said microprocessor.
14. A method as claimed in claim 1, wherein step (c) comprises cyclically writing out each of the occupied temporary storage locations to its corresponding destination without intervention of the microprocessor.
15. A memory processor system comprising: temporary storage means for storing address and data components of an information transfer from a source to a destination; comparator means associated with said temporary storage means for comparing an address component of a current information transfer with address components retained in said temporary storage means whereby if an address match occurs, the data associated with said information transfer overwrites the data stored at the corresponding data location for that stored address in said temporary storage means, and if no match occurs, the address and data associated with said current information transfer are stored at a new location in said temporary storage means; and output means for identifying when no transfer is occurring from said source to said destination for enabling outputting each of said temporarily stored address and corresponding data component to said destination.
16. A system as claimed in claim 15, further comprising: detection means for identifying accesses to said destination not initiated by said source, and for suspending said accesses until any contents of said temporary storage means have been transferred to said destination.
17. A system as claimed in claim 15, wherein said temporary storage means comprises separate address storage means and corresponding data storage means, said comparator means being configured with said address storage means in an address path between said source and said destination.
18. A system as claimed in claim 17, wherein said data storage means comprises a dual-ported memory configured in a data path between said source and said destination.
19. A system as claimed in claim 18, wherein said output means comprises a read address generator for selecting addresses from said address storage means and the corresponding data from said dual-ported memory to output same to said destination.
20. A system as claimed in claim 19 wherein, when an information transfer is to occur from said destination to said source, said comparator means compares an address component of said information transfer with address components stored in said temporary storage means, and if an address match occurs, said information transfer is suspended until said output means outputs the corresponding address and data components to said destination, before subsequently re-enabling said information transfer.
21. A system as claimed in claim 18, wherein said temporary storage means further comprises a write pointer generator for identifying the next empty temporary storage location.
22. A system as claimed in claim 21 wherein, when an information transfer is to occur from said source to said destination and an address component of said transfer does not match with any address component stored in said temporary storage locations, said write pointer generator is used to select the corresponding address in both said dual-ported memory and said address storage means, at which the address and data components of said transfer are temporarily stored.
23. A system as claimed in claim 22, wherein when a match occurs, with a particular one of said temporary storage locations, the address of said one location is selected to generate the corresponding address in said dual-ported memory at which said data component of said transfer is stored.
24. A computer system comprising: a memory processor system as claimed in claim 15; a microprocessor operating as said source; a main memory means operating as said destination; a first bus means interconnecting said microprocessor and said memory processor system, and a second bus means interconnecting said memory processor system with said main memory means; whereby said microprocessor is configured for operation at a speed greater than that of said main memory means.
25. A system as claimed in claim 24, further comprising a cache memory means interposed between said microprocessor and said memory processor system, whereby the latter operates to maintain substantial coherence between said cache memory means and said main memory means.
26. A system as claimed in claim 25, wherein said cache memory means is selected from the group comprising a cache memory integrally formed with said microprocessor, a cache memory separate from said microprocessor, and two cache memories, one integral with, and one external to said microprocessor.
27. A system as claimed in claim 25, wherein said cache memory means and said memory processor system are integrally formed within said microprocessor.
28. A computer system comprising a plurality of microprocessors each acting as a source for a memory processor system as claimed in claim 15 associated therewith, each said memory processor system being connected to a common bus through which access to a common destination is obtained.
29. A computer system comprising: a microprocessor; a cache memory connected to a processor address bus and a processor data bus both leading from said microprocessor; a cache memory processor system comprising an address processor interconnecting said processor address bus with a system address bus, and a data processor interconnecting said processor data bus with a system data bus, said system buses interconnecting other devices of said system, said other devices comprising a plurality of memory locations accessible by at least said microprocessor, characterised in that said cache memory processor system operates as a contents addressable array of temporary storage locations for write cycles from said microprocessor to said memory locations whereby substantial coherence between data stored in said cache memory and data stored at one or more selected said memory locations is maintained whilst said microprocessor is enabled to operate at a rate in excess of a rate at which said other devices operate.
30. A system as claimed in claim 29, wherein said address processor comprises: a first bidirectional buffer having a bidirectional port connected to said processor address bus; a second bi-directional buffer having a bidirectional port connected to said system address bus; a first latch interposed in an address path from a unidirectional output of said second bidirectional buffer to a unidirectional input of said first bidirectional buffer; a second latch for receiving a unidirectional output of said first bidirectional buffer supplying an unlatched address, said second latch supplying a latched address to a first input of a first multiplexer, an output of which being connected to a unidirectional input of said second bidirectional buffer; a comparative memory buffer for storing said latched address of a first write cycle in one of a plurality of address cells and comparing same with said unlatched address of a subsequent write cycle, said comparative memory buffer having a match output indicating a matching of the unlatched address with an address in one of said address cells, said match output being input to a first input of a second multiplexer, a second input of which being supplied with a write pointer from a write pointer generator, the second multiplexer outputting a write address to said data processor and to a third latch which generates a latched write address which is input to said comparative memory buffer, an output of said comparative memory buffer being supplied to a second input of said first multiplexer; said data processor comprises: a third bidirectional buffer having a bidirectional port connected to said processor data bus, a fourth bidirectional buffer having a bidirectional port connected to said system data bus, a fourth latch interposed in a data path from a unidirectional output of said fourth bidirectional buffer to a unidirectional input of said third bidirectional buffer, a fifth latch for receiving a unidirectional output of said third 
bidirectional buffer and supplying latched data to a first input of a third multiplexer, an output of which being connected to a unidirectional input of said fourth bidirectional buffer, the latched data also being input for storage in a dual-ported memory having a like plurality of data cells, said dual-ported memory having an output connected to a second input of said third multiplexer, said write address being supplied to a sixth latch which outputs to a write input of said dual-ported memory; and further comprising a read pointer generator for generating a read address that is input to both said comparative memory buffer and said dual-ported memory; whereby, when a write cycle is generated by said microprocessor, said cache memory is updated and the address and data components thereof are buffered into said cache memory processor system in which the address component is compared with any address components stored in said address cells and if a match occurs, the corresponding data cell is updated with the data component of the write cycle, and if a match does not occur, the address and data components are temporarily stored in respective address and data cells.
31. A method for handling write cycles each having address and corresponding data from a microprocessor substantially as herein described with reference to Fig. 2.
32. A method for handling write cycles each having address and corresponding data from a microprocessor substantially as herein described with reference to Figs. 3 to 7.
33. A method for handling write cycles each having address and corresponding data from a microprocessor substantially as herein described with reference to Fig. 8.
34. A memory processor system for connection to a microprocessor and substantially as herein described with reference to Fig. 2.
35. A memory processor system for connection to a microprocessor and substantially as herein described with reference to Figs. 3 to 7.
36. A memory processor system for connection to a microprocessor and substantially as herein described with reference to Fig. 8.
37. A computer system substantially as herein described with reference to Fig. 2.
38. A computer system substantially as herein described with reference to Figs. 3 to 7.
39. A computer system substantially as herein described with reference to Fig. 8.
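For explanation only, and forming no part of the claims, the write-cycle handling of claims 1, 6, 9 and 14 might be sketched as below. The class name WriteBuffer, the fixed depth of four locations, and the stall exception are assumptions of this sketch; a Python mapping stands in for the content-addressable array of temporary storage locations.

```python
class WriteBuffer:
    """Sketch of claims 1, 6, 9 and 14: a small content-addressable store of
    pending write cycles, merged on address match and drained to memory."""

    def __init__(self, depth=4):
        self.depth = depth   # limited number of temporary storage locations
        self.slots = {}      # address -> data (temporary storage locations)

    def write(self, address, data):
        # Step (b): a matching address overwrites the stored data (b)(i);
        # otherwise a distinct temporary location is occupied (b)(ii).
        if address not in self.slots and len(self.slots) >= self.depth:
            # Claim 6: the microprocessor is suspended until step (c)
            # empties at least one location; modelled here as an exception.
            raise RuntimeError("buffer full: microprocessor stalls")
        self.slots[address] = data

    def read(self, memory, address):
        # Claim 9: on an address match, data is supplied directly from the
        # temporary storage; otherwise the read cycle executes normally.
        if address in self.slots:
            return self.slots[address]
        return memory.get(address)

    def drain(self, memory):
        # Step (c) / claim 14: cyclically write each occupied location out
        # to its destination without intervention of the microprocessor.
        while self.slots:
            address, data = self.slots.popitem()
            memory[address] = data

main = {}
buf = WriteBuffer()
buf.write(0x100, 1)
buf.write(0x100, 2)  # same address: data merged, only one slot occupied
buf.write(0x104, 3)
assert buf.read(main, 0x100) == 2  # forwarded from the buffer (claim 9)
buf.drain(main)
# main now holds {0x100: 2, 0x104: 3}
```

The merge-on-match step is what lets the microprocessor issue write cycles faster than the destination can accept them: repeated writes to one address consume only one temporary storage location and one eventual bus cycle.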
GB9406387A 1993-03-31 1994-03-30 Buffering write cycles. Withdrawn GB2276964A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AUPL808893 1993-03-31

Publications (2)

Publication Number Publication Date
GB9406387D0 GB9406387D0 (en) 1994-05-25
GB2276964A true GB2276964A (en) 1994-10-12

Family

ID=3776814

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9406387A Withdrawn GB2276964A (en) 1993-03-31 1994-03-30 Buffering write cycles.

Country Status (1)

Country Link
GB (1) GB2276964A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0244540A2 (en) * 1986-05-05 1987-11-11 Silicon Graphics, Inc. Write request buffering apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Electronics & Wireless World, January 1989, pages 75-77, "Cacheing in the chips" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0800137A1 (en) * 1996-04-04 1997-10-08 International Business Machines Corporation Memory controller
US5778422A (en) * 1996-04-04 1998-07-07 International Business Machines Corporation Data processing system memory controller that selectively caches data associated with write requests


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)