AU628531B2 - Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units - Google Patents

Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units

Info

Publication number
AU628531B2
AU628531B2 AU53949/90A AU5394990A
Authority
AU
Australia
Prior art keywords
data
central processing
cpu
address
processing units
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU53949/90A
Other versions
AU5394990A (en)
Inventor
Mohamad Basheer-Uddin Ahmed
Scott Arnold
Stephen G. Delahunt
Michael E. Flynn
Ricky C. Hetherington
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Equipment Corp
Original Assignee
Digital Equipment Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Equipment Corp filed Critical Digital Equipment Corp
Priority to AU53949/90A priority Critical patent/AU628531B2/en
Publication of AU5394990A publication Critical patent/AU5394990A/en
Application granted granted Critical
Publication of AU628531B2 publication Critical patent/AU628531B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Description

COMMONWEALTH OF AUSTRALIA
PATENTS ACT 1952
COMPLETE SPECIFICATION
S F Ref: 128465
(ORIGINAL)
FOR OFFICE USE: Class: Int Class: Complete Specification Lodged: Accepted: Published: Priority: Related Art:
Name and Address of Applicant: Digital Equipment Corporation, 111 Powdermill Road, Maynard, Massachusetts 01754-1418, UNITED STATES OF AMERICA
Address for Service: Spruson Ferguson, Patent Attorneys, Level 33 St Martins Tower, 31 Market Street, Sydney, New South Wales, 2000, Australia

Complete Specification for the invention entitled: Method and Apparatus for Interfacing a System Control Unit for a Multiprocessor System with the Central Processing Units. The following statement is a full description of this invention, including the best method of performing it known to me/us.

METHOD AND APPARATUS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTIPROCESSOR SYSTEM WITH THE CENTRAL PROCESSING UNITS
ABSTRACT
A computer system includes a plurality of central processing units (CPUs), a main memory, and a system control unit (SCU) for controlling the transfer of data between the CPUs and main memory. In addition to main memory, each of the CPUs includes a high-speed cache of selected portions of the main memory. Accordingly, there is an inherent risk that data contained in the caches may be altered while the identical data contained in main memory remains in the unaltered state. This, of course, leads to data inconsistency problems. Thus, in order to avoid data inconsistency, the SCU effects control over all data transfers to ensure that multiple versions of the same data are not maintained in the computer system. Further, memory references can be speeded up by accessing the high-speed CPU caches when the desired data exists in the cache of another CPU.
METHOD AND APPARATUS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTIPROCESSOR SYSTEM WITH THE CENTRAL PROCESSING UNITS

The present application discloses certain aspects of a computing system that is further described in the following Australian patent applications and United States patents: Evans et al., AN INTERFACE BETWEEN A SYSTEM CONTROL UNIT AND A SERVICE PROCESSING UNIT OF A DIGITAL COMPUTER, Serial No. 53954/90, filed April 27, 1990; Gagliardo et al., METHOD AND MEANS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTI-PROCESSOR SYSTEM WITH THE SYSTEM MAIN MEMORY, Serial No. 53938/90, filed April 27, 1990; D. Fite et al., DECODING MULTIPLE SPECIFIERS IN A VARIABLE LENGTH INSTRUCTION ARCHITECTURE, Serial No.
53939/90, filed April 27, 1990; D. Fite et al., VIRTUAL INSTRUCTION CACHE REFILL ALGORITHM, Serial No. 53940/90, filed April 27, 1990, and issued on May 12, 1992 as U.S.
Patent 5,113,515; Murray et al., PIPELINE PROCESSING OF REGISTER AND REGISTER MODIFYING SPECIFIERS WITHIN THE SAME INSTRUCTION, Serial No. 53955/90, filed April 27, 1990; Murray et al., MULTIPLE INSTRUCTION PREPROCESSING SYSTEM WITH DATA DEPENDENCY RESOLUTION FOR DIGITAL COMPUTERS, Serial No. 53936/90, filed April 27, 1990; D. Fite et al., BRANCH PREDICTION, Serial No. 53937/90, filed April 27, 1990; Fossum et al., PIPELINED FLOATING POINT ADDER FOR DIGITAL COMPUTER, Serial No. 53948/90, filed April 27, 1990, and issued as U.S. Patent 4,994,996 on Feb. 19, 1991; Grundmann et al., SELF TIMED REGISTER FILE, Serial No. 53941/90, filed April 27, 1990, issued as U.S. Patent 5,107,462 on April 21, 1992; Beaven et al., METHOD AND APPARATUS FOR DETECTING AND CORRECTING ERRORS IN A PIPELINED COMPUTER SYSTEM, Serial No. 53945/90, filed April 27, 1990 and issued as U.S. Patent 4,982,402 on Jan.
1, 1991; Flynn et al., METHOD AND MEANS FOR ARBITRATING COMMUNICATION REQUESTS USING A SYSTEM CONTROL UNIT IN A MULTI-PROCESSOR SYSTEM, Serial No. 53946/90, filed April 27, 1990; E. Fite et al., CONTROL OF MULTIPLE FUNCTION UNITS WITH PARALLEL OPERATION IN A MICROCODED EXECUTION UNIT, Serial No. 53951/90, filed April 27, 1990, and issued on November 19, 1991 as U.S. Patent 5,067,069; Webb, Jr. et al., PROCESSING OF MEMORY ACCESS EXCEPTIONS WITH PRE-FETCHED INSTRUCTIONS WITHIN THE INSTRUCTION PIPELINE OF A VIRTUAL MEMORY SYSTEM-BASED DIGITAL COMPUTER, Serial No. 53943/90, filed April 27, 1990, and issued as U.S. Patent 4,985,825 on Jan. 15, 1991; Hetherington et al., METHOD AND APPARATUS FOR CONTROLLING THE CONVERSION OF VIRTUAL TO PHYSICAL MEMORY ADDRESSES IN A DIGITAL COMPUTER SYSTEM, Serial No. 53950/90, filed April 27, 1990; Hetherington et al., WRITE BACK BUFFER WITH ERROR CORRECTING CAPABILITIES, Serial No. 53934/90, filed April 27, 1990, and issued as U.S. Patent 4,995,041 on Feb. 19, 1991; Chinnaswamy et al., MODULAR CROSSBAR INTERCONNECTION NETWORK FOR DATA TRANSACTIONS BETWEEN SYSTEM UNITS IN A MULTI-PROCESSOR SYSTEM, Serial No.
53933/90, filed April 27, 1990 and issued as U.S. Patent 4,968,977 on Nov. 6, 1990; Polzin et al., METHOD AND APPARATUS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTI-PROCESSOR SYSTEM WITH INPUT/OUTPUT UNITS, Serial No.
53953/90, filed April 27, 1990, and issued as U.S. Patent 4,965,793 on Oct. 23, 1990; and Gagliardo et al., MEMORY CONFIGURATION FOR USE WITH MEANS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTI-PROCESSOR SYSTEM WITH THE SYSTEM MAIN MEMORY, Serial No. 53942/90, filed April 27, 1990 and issued as U.S. Patent 5,043,874 on August 27, 1991.
This invention relates generally to an interface between functional components of a computer system and, more particularly, to an interface between a system control unit of a multiprocessor computer system and its associated central processing units (CPUs).
In the field of computer systems, it is not unusual for a system to include a plurality of central processing units (CPUs) operating in parallel to enhance the system's speed of operation. Typically, each of the CPUs operates on a particular aspect of a single computer program and will, therefore, require access to the same program and variables stored in memory. It can be seen that each of the CPUs requires access to a shared common main memory, as well as input/output (I/O) units. The I/O allows the computer system, in general, and the CPUs, in particular, to communicate with the external world.
For example, the I/O includes such well known devices as disc and tape drives, communication devices, printers, plotters, workstations, etc.
This parallel operation presents some problems in the form of access conflicts to the shared memory and I/O. A system control unit (SCU) is employed to manage these inter-unit communications. The SCU links the CPUs to the main memory and to the I/O through a series of independent interfaces. Data requests are received by the SCU from each of the units which, owing to the parallel nature of the CPU operation, occur at unscheduled times and, in some cases, at the same time.
These requests for data transfer are scheduled according to an arbitration algorithm and processed through the appropriate interface to/from the identified unit.
Efficient communication between all system units ported into the SCU is critical to optimize parallel operation of the computer system. The speed of the interfaces is important to the overall operation of the computer system to ensure that a bottleneck of data does not develop. There is little point in individual units being able to operate at high speed if they must continually wait for data from other units.
The present invention provides an efficient, high-speed interface in a computer system having a plurality of central processing units, a main memory, and a system control unit for controlling the transfer of data between the central processing units and main memory, each of the central processing units having a cache for containing selected portions of the data maintained in the main memory. The system control unit comprises: means for receiving a request for data contained in a selected address of the main memory from one of the central processing units; means for initiating data retrieval from the main memory in response to receiving a data request; means for comparing the requested data address to the addresses of the data contained in each of the central processing unit caches and delivering one of a respective hit and miss signal in response to detecting a match and the absence of a match therebetween; means for aborting the data retrieval initiating means in response to receiving the hit signal; means for retrieving the requested data from the central processing unit cache and delivering the retrieved data to the requesting central processing unit cache in response to receiving a hit signal; and means for delivering the data retrieved from main memory to the requesting central processing unit cache in response to receiving the miss signal.
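For illustration only, the hit/miss flow just summarized can be sketched in software. The Python below is not part of the disclosure; the object model (dictionaries standing in for caches and main memory) and all names are invented, and multi-cycle timing, arbitration and parity are omitted.

    # Illustrative sketch only; dictionaries stand in for hardware structures.
    def service_request(requesting_cpu, address, caches, main_memory):
        """caches: dict cpu_id -> {address: data}; main_memory: {address: data}.
        Returns the data plus a note saying where it came from."""
        # A (slow) main-memory read is initiated immediately; it is aborted
        # below if the compare against the other CPU caches reports a hit.
        for cpu_id, cache in caches.items():
            if cpu_id == requesting_cpu:
                continue
            if address in cache:                         # "hit" signal
                data = cache[address]                    # abort memory read, use the cache
                caches[requesting_cpu][address] = data   # deliver to the requesting cache
                return data, "from CPU %d cache" % cpu_id
        data = main_memory[address]                      # "miss": the memory read completes
        caches[requesting_cpu][address] = data
        return data, "from main memory"

    main_memory = {0x100: "stale copy", 0x200: "B"}
    caches = {1: {}, 2: {0x100: "written copy"}, 3: {}, 4: {}}
    print(service_request(1, 0x100, caches, main_memory))   # served from CPU 2's cache
    print(service_request(1, 0x200, caches, main_memory))   # served from main memory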
Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the drawings in which: FIG. 1 is a top level block diagram of a computer system; FIG. 2 is a general block diagram of the SCU-CPU interface; FIG. 3 is a block diagram of the CPU control architecture portion of the interface; and FIGS. 4 and 5 are block diagrams of the SCU control architecture portion of the interface.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that it is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Turning now to the drawings and referring first to FIG. 1, there is shown a top level block diagram of a multiprocessing computer system 10 which includes a plurality of central processing units (CPU1-CPU4) 11, 12, 13, 14. The CPUs require access to a shared common main memory 16, as well as input/output units 18. The I/O 18 allows the computer system 10, in general, and the CPUs, in particular, to communicate with the external world. For example, the I/O 18 includes such well known devices as disc and tape drives, communication devices, printers, plotters, workstations, etc.

To take full advantage of the multiple CPUs, the system is configured to allow CPU1-CPU4 to operate in parallel. This parallel operation presents some problems in the form of access conflicts to the shared memory 16 and I/O 18. A system control unit (SCU) 20 is employed to manage these inter-unit communications. The SCU links CPU1-CPU4 to the main memory 16 and to the I/O 18 through a series of independent interfaces. Data requests are received by the SCU 20 from each of the units which, owing to the parallel nature of the CPU operation, occur at unscheduled times and, in some cases, at the same time. These requests for data transfer are scheduled according to an arbitration algorithm and processed through the appropriate interface to/from the identified unit.
Inside the CPUs, the execution of an individual instruction is broken down into multiple smaller tasks.
These tasks are performed by dedicated, separate, independent functional units that are optimized for that purpose.
Although each instruction ultimately performs a different operation, many of the smaller tasks into which each instruction is broken are common to all instructions. Generally, the following steps are performed during the execution of an instruction: instruction fetch, instruction decode, operand fetch, execution, and result store. Thus, by the use of dedicated hardware stages, the steps can be overlapped, thereby increasing the total instruction throughput.
The data path through the pipeline includes a respective set of registers for transferring the results of each pipeline stage to the next pipeline stage. These transfer registers are clocked in response to a common system clock. For example, during a first clock cycle, the first instruction is fetched by hardware dedicated to instruction fetch. During the second clock cycle, the fetched instruction is transferred and decoded by instruction decode hardware, but, at the same time, the next instruction is fetched by the instruction fetch hardware. During the third clock cycle, each instruction is shifted to the next stage of the pipeline and a new instruction is fetched. Thus, after the pipeline is filled, an instruction will be completely executed at the end of each clock cycle.
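As an informal illustration of that overlap (not part of the original disclosure; the stage names follow the five steps listed above, everything else is invented), the sketch below steps a five-stage pipeline one clock at a time and shows that, once the pipeline is filled, one instruction completes per cycle.

    STAGES = ["fetch", "decode", "operand fetch", "execute", "result store"]

    def simulate_pipeline(instructions):
        """Print which instruction occupies each stage on every clock cycle."""
        total_cycles = len(instructions) + len(STAGES) - 1
        for cycle in range(total_cycles):
            occupancy = []
            for stage_index, stage in enumerate(STAGES):
                instr = cycle - stage_index          # instruction number in this stage
                if 0 <= instr < len(instructions):
                    occupancy.append("%s=%s" % (stage, instructions[instr]))
            print("cycle %2d: %s" % (cycle, ", ".join(occupancy)))

    simulate_pipeline(["I1", "I2", "I3", "I4"])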
This process is analogous to an assembly line in a manufacturing environment. Each worker is dedicated to performing a single task on every product that passes through his or her work stage. As each task is performed, the product comes closer to completion. At the final stage, each time the worker performs his or her assigned task, a completed product rolls off the assembly line.
To accomplish this pipelining of instructions, the CPUs are partitioned into at least three functional units: the execution unit 22, the instruction unit 24, and the memory access unit 26. As its name suggests, the execution unit 22 is ultimately responsible for the actual execution of the instructions. The instruction unit 24 prefetches instructions, decodes opcodes to obtain operand and result specifiers, fetches operands, and updates the program counter.
The memory access unit 26 performs the memory related functions of the CPU. For example, the memory access unit 26 maintains a high-speed cache 28. The cache 28 stores a copy of a small portion of the information stored in main memory 16 and is employed to increase processing speed by reducing memory access time.
The main memory 16 is constructed of lower cost and slower memory components. If the CPU were required to access main memory 16 during each memory reference, the overall speed of the CPU would be reduced to match the main memory speed since the CPU could not execute the instruction until the memory reference had returned the desired data. Accordingly, the cache 28 is constructed of high-speed, high-cost semiconductor memory components, but owing to its high cost, the cache contains considerably fewer storage locations than does main memory. These relatively few high-speed storage locations are used to maintain that portion of main memory which will be most frequently used by the CPU.
Therefore, only those memory references which are not maintained in the cache 28 result in access to the main memory 16. Thus, the overall speed of the CPU is improved.
It should be noted that the memory maintained in the cache 28 changes as the program proceeds. For example, the memory locations which are frequently referenced at the beginning of the program may not be accessed in later stages of the program. Conversely, memory locations used frequently by the middle portion of the program may be of little use to the beginning or ending portions of the program. Thus, it can be seen that the contents of the cache 28 need frequent updates from the main memory 16 and there is a need for efficient communication between the cache 28 and the SCU 20.
The SCU 20 also links the various system units to a service processor unit (SPU) 30 which performs traditional console functions. The SPU 30 has responsibility for status determination and control of the overall operation of the processing system. In particular, the SCU 20 provides the SPU 30 with means for communicating with the plurality of CPUs and provides access to all storage elements within the CPUs.
Efficient communication between all system units ported into the SCU 20 is critical to optimize parallel operation of the computer system 10. I/O can reference memory, the CPUs can reference memory, and the CPUs can reference I/O, but the bulk of the traffic is between the CPUs and the memory. The SCU 20 is the central switching station for all of the message packets throughout the system.
Referring now to FIG. 2, the interface between the SCU 20 and the CPU 12 and its corresponding signals are illustrated. It should be noted that while each of the signals is represented by a single line, the signals are actually transmitted as differential pairs to reduce noise coupling. Further, because the interface employs series terminated emitter coupled logic, none of the lines are bidirectional. The interface includes separate lines in each direction for transmitting the same type of signals. For example, the interface includes sixty-four lines for transmitting data from the CPU to the SCU and sixty-four lines for transmitting data from the SCU to the CPU. Each set of these sixty-four lines allows a full quadword of data (8 bytes) to be transmitted in a single clock cycle. The signals which are common to both the CPU and the SCU are discussed first, followed by similar discussions of each unit's unique signals.
Parity signals also accompany the data signals. Once again, the parity signals are transmitted on separate lines by both units. The data parity is four bits wide, with each bit representing odd parity for a word of data (2 bytes). The receiving unit calculates the parity of the received data and compares this calculated parity to its corresponding data parity signal. If the two signals differ, then the receiving unit generates an error signal which causes an error recovery sequence to be initiated.
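That check can be made concrete with a short sketch (illustrative only; the real interface computes this in hardware, and the bit ordering assumed here is arbitrary). One odd-parity bit is derived for each 16-bit word of an 8-byte quadword and compared against the received parity field.

    def odd_parity_bits(quadword):
        """Return four parity bits, one per 16-bit word of a 64-bit quadword.
        Each bit is chosen so that the word plus its parity bit contain an odd
        number of ones (odd parity)."""
        bits = []
        for word_index in range(4):                       # word 0 = least significant
            word = (quadword >> (16 * word_index)) & 0xFFFF
            ones = bin(word).count("1")
            bits.append(0 if ones % 2 == 1 else 1)
        return bits

    def parity_error(quadword, received_parity):
        """True when the received four-bit parity field disagrees with the data."""
        return odd_parity_bits(quadword) != received_parity

    quadword = 0x0123456789ABCDEF
    parity = odd_parity_bits(quadword)
    print(parity)
    print(parity_error(quadword, parity))        # False: data and parity agree
    print(parity_error(quadword ^ 1, parity))    # True: a flipped bit is detected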
A four-bit command field is also delivered by each unit. The command field includes sixteen separate commands which cause the receiving unit to perform a function. Table I lists the sixteen commands and corresponding four-bit codes that the SCU issues to the CPU.
TABLE I
SCU TO CPU COMMANDS

CODE   COMMAND                      COMMENT
0000   GET DATA WRITTEN             64 BYTES LONG
0001   GET DATA READ                64 BYTES LONG
0010   GET DATA INVALIDATE          64 BYTES LONG
0011   RETURN DATA READ             64 BYTES LONG
0100   RETURN DATA WRITTEN          64 BYTES LONG
0101   OK TO WRITE
0110   INVALIDATE READ BLOCK
0111   RETURN I/O REG DATA          4 BYTES LONG
1000   RETURN READ ERROR STATUS
1001   LOCK ACKNOWLEDGE
1010   MEMORY READ NXM
1011   I/O READ NXM
1100   LOCK DENIED
1101   INVALIDATE WRITTEN BLOCK
1110   UNUSED
1111   UNUSED

GET DATA WRITTEN is a command issued by the SCU when data requested by I/O is present in a CPU. The SCU sends this command to the CPU containing the desired data and passes that data to the requesting I/O. The tag store is not altered.
GET DATA READ is a command issued by the SCU when the data requested by one CPU is present and written in another CPU, but is only intended to be read by the requesting CPU. Data is transferred to the requesting CPU and marked as "read" in the CPU of origin.
GET DATA INVALIDATE is a command issued by the SCU to retrieve data from one CPU which has been requested by another CPU. However, in this case the requesting CPU intends to alter the data by writing to it once the data is placed in the requesting CPU cache. To avoid data consistency problems, the data which had been present and written in the first CPU is transferred to the requesting CPU and then invalidated to prevent the old version of that data from being considered to be the most recent version of that data. To invalidate the data, the CPU sets a single bit present in the appropriate memory location of the tag RAMs 32 of the cache 28.
RETURN DATA READ is a command issued by the SCU to return data having been read from memory.
RETURN DATA WRITTEN is a command issued by the SCU to return data having been written or altered by a CPU.
OK TO WRITE is a command issued in response to a CPU request to write to data which had previously been retrieved for read purposes only. In other words, the CPU now desires to write to data which it did not believe it would need to write. Thus, the SCU simply checks the other CPUs to determine that no other CPU is writing to that data. It is not necessary to return the actual data to the CPU, only to grant permission to write.
INVALIDATE READ BLOCK is a command which resets the valid bit for a block of data that had previously been retrieved for read purposes only. This occurs when another CPU desires to retrieve the same block of data for write purposes. Thus, to avoid data consistency problems, the read block is invalidated. It should be recognized that since another CPU has requested the data with an intent to alter that data, the version of that data maintained in the first CPU is likely to be different if it is not similarly modified.
RETURN I/O REG DATA is a command issued to return I/O register data in response to a CPU request (READ I/O REG) for data from a designated I/O register.
RETURN READ ERROR STATUS is a command issued to return read error status information.
LOCK ACKNOWLEDGE and LOCK DENIED are alternative signals which are delivered by the SCU in response to a lock request by the CPU. The lock command is used by the CPU to perform the read portion of an interlock instruction. The SCU responds with either the LOCK ACKNOWLEDGE signal indicating that the request has been granted, or, conversely, LOCK DENIED when access has previously been granted to another CPU.
MEMORY READ NXM and I/O READ NXM are commands issued by the SCU when the CPU attempts to read non-existent memory and I/O locations. It is possible for the CPU to generate a reference location that does not physically exist. In an actual implementation of a computer system the addressable memory space is generally larger than the size of the actual physical memory. For example, the 32-bit address field is capable of identifying four gigabytes of storage locations; however, because of the intended use of the computer system, only one gigabyte of memory might physically exist. Accordingly, any memory reference outside of the one gigabyte of physical memory results in the SCU sending the NXM command.
INVALIDATE WRITTEN BLOCK is substantially similar to the INVALIDATE READ BLOCK signal with the exception that this signal is intended to invalidate a block of data that was previously retrieved for write purposes. This occurs when I/O direct memory accesses conflict with written blocks in CPU caches.
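For illustration only, the SCU-to-CPU encoding can be written as a small lookup. The four-bit code assignments below follow Table I as reconstructed above (the pairing of codes with commands is therefore an assumption of this sketch, not an additional disclosure); the two unused codes are omitted.

    from enum import IntEnum

    class ScuToCpuCommand(IntEnum):
        """Four-bit SCU-to-CPU command codes (per Table I as reconstructed above)."""
        GET_DATA_WRITTEN         = 0b0000
        GET_DATA_READ            = 0b0001
        GET_DATA_INVALIDATE      = 0b0010
        RETURN_DATA_READ         = 0b0011
        RETURN_DATA_WRITTEN      = 0b0100
        OK_TO_WRITE              = 0b0101
        INVALIDATE_READ_BLOCK    = 0b0110
        RETURN_IO_REG_DATA       = 0b0111
        RETURN_READ_ERROR_STATUS = 0b1000
        LOCK_ACKNOWLEDGE         = 0b1001
        MEMORY_READ_NXM          = 0b1010
        IO_READ_NXM              = 0b1011
        LOCK_DENIED              = 0b1100
        INVALIDATE_WRITTEN_BLOCK = 0b1101
        # codes 0b1110 and 0b1111 are unused

    print(ScuToCpuCommand(0b0101).name)   # decodes a received field -> OK_TO_WRITE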
Similarly, Table II lists the sixteen commands which the CPU issues to the SCU.
TABLE II
CPU TO SCU COMMANDS

CODE   COMMAND                        COMMENT
0000   READ REFILL
0001   WRITE REFILL
0010   READ REFILL LINKED
0011   WRITE REFILL LINKED
0100   WRITE REFILL LOCK
0101   WRITE REFILL UNLOCK
0110   SWEEP WRITEBACK                CACHE SWEEPS
0111   LONGWORD WRITE UPDATE
1000   READ I/O REG                   4 BYTES LONG
1001   WRITE I/O REG
1010   LINKED WRITEBACK
1011   LONGWORD WRITE UPDATE LINKED
1100   INVALIDATE
1101   WRITE REFILL LINK LOCK
1110   UNLOCK
1111   UNUSED

READ REFILL is a command issued by the CPU when the CPU has attempted to read memory, but the memory location was not present in the cache 28. The SCU responds to the command by fetching the desired block of data and returning it to the requesting CPU. It should be noted, however, that the retrieved memory is for read purposes only. If the CPU wishes to write to that memory block it must obtain permission from the SCU to do so.
WRITE REFILL is a command issued when the CPU wishes to retrieve data from memory which it intends to alter by writing. The CPU issues this command for two separate reasons. First, WRITE REFILL is issued when the CPU attempts to write data into a block in the cache 28, but the data block is not present therein. The SCU returns the data block to the cache 28 and the CPU proceeds to write to the designated memory locations within the block of data. The second reason for issuing this command is when the CPU attempts to write to data which is present in the cache 28, but is present as the result of a READ REFILL. In this case, the SCU looks up in a set of its tag RAMs and discovers that the status of that block of data in that CPU is read only. The SCU does not send the refill data, but instead returns an OK TO WRITE signal, which allows the CPU to proceed with the write operation.
At this point it is sufficient to understand that the SCU maintains a list of each block of data stored in the cache 28 of each CPU in order to prevent memory access conflicts. As a general rule, the SCU will not allow multiple CPUs to contain a copy of the same memory block if any of the copies are present as the result of a LONGWORD WRITE UPDATE or a WRITE REFILL command. This provision is implemented to prevent a CPU from erroneously reading a stale copy of data from its own cache. For example, if another CPU has the ability to write to its copy of that same data, then the first CPU will read the old data which has not been similarly written.
READ REFILL LINKED is a command issued by the CPU which is similar to READ REFILL, but in this case there is a writeback associated with this refill. In other words, the cache data which is to be overwritten by the refill data has previously been written by the CPU. Since the CPU has altered the data, it only exists in its present form in that cache location. Thus, if it is overwritten by the refill, it will be lost. Accordingly, the CPU has a provision for writing that displaced data back to the main memory 16. This writeback will occur immediately after the refill is completed. Therefore, READ REFILL LINKED simply informs the SCU that, while it is currently performing a refill, a writeback operation will immediately follow.
WRITE REFILL LINKED is a command issued by the CPU which is like the READ REFILL LINKED except that the refill data is available to be written by the CPU. In other words, WRITE REFILL LINKED is initiated when the CPU attempts to write to the cache 28, but misses. Thus, the refill data will be immediately altered once it is stored in the cache 28.
WRITE REFILL LOCK is a command issued by the CPU when the read portion of an interlock instruction is to be executed. Similarly, WRITE REFILL UNLOCK is a command issued by the CPU when the write portion of an interlock instruction is to be executed. The interlock instruction is further described below.
SWEEP WRITEBACK is a command used by the CPU when an error has been detected and the CPU wishes to send all written data blocks to the main memory. The command is issued for each "written" block that is being sent to main memory during a "sweep" of the cache. Sweeps are typically performed in response to a detected hardware error.
LONGWORD WRITE UPDATE is a command issued by the CPU when it has just written an aligned longword to an invalid block. The command tells the SCU to update its copy of the CPU's tag store.
READ I/O REG is a command issued by the CPU when information contained in a designated I/O register is desired by the program currently executing in the CPU. This command is substantially similar to a READ REFILL command, but accesses an I/O register rather than a memory location. Up to four bytes of data are returned. The address field becomes a 28-bit address while bits <33:30> become a mask field. In contrast to a read refill, the tag store in the SCU is not altered, since the tag store does not have entries which correspond to these addresses.
WRITE I/O REG is a command issued by the CPU when the program currently being executed by the CPU desires to send data to an external device by writing that data to an I/O register.
LINKED WRITEBACK is a command issued by the CPU when a written block is being replaced by a read refill, a write refill, or a longword write update.
LONGWORD WRITE UPDATE LINKED is a command issued by the CPU that is like LONGWORD WRITE UPDATE except that the CPU will displace some number of previously written bytes in that block. Thus, the CPU write necessarily generates a writeback to avoid losing that data.
INVALIDATE is a command issued by the CPU when an error has been detected and the CPU is sweeping its cache. The CPU sends a SWEEP WRITEBACK for each block that is written and an INVALIDATE for each block that has read status.
WRITE REFILL LINK LOCK is a command issued by the CPU that is like WRITE REFILL LOCK except that the read for the lock causes a writeback.
Returning again to the description of the common signals delivered between the SCU and CPU as shown in FIG. 2, a COMMAND PARITY signal provides odd parity over the command field and a collection of individual signals.
The SCU to CPU command parity signal reflects odd parity over the following signals: COMMAND FIELD, WHICH SET, LOAD COMMAND, COMMAND BUFFER AVAILABLE, SEND DATA, and FATAL ERROR. Likewise, the CPU to SCU COMMAND PARITY signal reflects odd parity over the following signals: COMMAND FIELD, WHICH SET, LOAD COMMAND, COMMAND BUFFER AVAILABLE, and DATA READY.
The cache 28 maintained in each of the CPUs is two-way set-associative. This means that two possible storage locations exist within the cache 28 for each main memory location. The CPUs employ an algorithm for determining which of the two memory locations to use; however, operation of the algorithm is autonomous to the CPU. Therefore, neither the SCU nor the CPU would know which set the other is referring to when a command is received. Accordingly, the "WHICH SET" signal is also delivered to indicate the particular set that the command references. For example, an asserted value indicates set 1 while an unasserted value indicates set 0.
The LOAD COMMAND signal acts to clock the command field and the address field into the opposite unit. This signal is asserted only if a buffer is available in the opposite unit to receive the associated command and address fields. The CPU contains only a single command and address buffer. Therefore, the SCU can send only a single command and address and then must wait until the CPU asserts its command buffer available signal.
Alternatively, the SCU contains three command and address buffers for each CPU. Thus, each CPU can send up to three command/address pairs before having to wait for the SCU to return its command buffer available signal. Both the SCU and the CPU assert their command buffer available signals when their buffers have been emptied and are ready to receive an additional command/address pair.
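That buffering discipline amounts to simple credit-based flow control. The sketch below is illustrative only: the buffer depths (one on the CPU side, three per CPU on the SCU side) come from the text, while the class and method names are invented.

    class CommandChannel:
        """One direction of the command/address interface with a fixed buffer depth."""
        def __init__(self, depth):
            self.depth = depth
            self.pending = []                 # buffered command/address pairs

        def can_load(self):
            return len(self.pending) < self.depth

        def load_command(self, command, address):
            if not self.can_load():
                raise RuntimeError("LOAD COMMAND asserted with no buffer available")
            self.pending.append((command, address))

        def retire(self):
            """Receiver empties a buffer, i.e. returns COMMAND BUFFER AVAILABLE."""
            return self.pending.pop(0)

    scu_to_cpu = CommandChannel(depth=1)      # CPU has a single command/address buffer
    cpu_to_scu = CommandChannel(depth=3)      # SCU has three buffers per CPU

    for i in range(3):
        cpu_to_scu.load_command("READ REFILL", 0x1000 + 64 * i)
    print(cpu_to_scu.can_load())              # False: must wait for buffer available
    cpu_to_scu.retire()
    print(cpu_to_scu.can_load())              # True again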
Clearly, both the SCU and the CPU need to be able to deliver an address along with their data and command fields. The address field indicates which memory or cache location the memory reference is intended to access. In this particular application, the physical memory can be as large as 4 gigabytes. Thus, in order to address all of the main memory, the address field must be at least thirty-two bits wide. The communication lines over which the address field is delivered are, however, only sixteen bits wide and must be double cycled to deliver the entire address. The first half of the address is sent in the same clock cycle as the command, and the second half of the address is delivered in the following clock cycle. Further, the transfer of data between the SCU and CPU is done in 64-byte blocks, and the lower order three bits of the address can be ignored since these bits are only indicative of byte locations within a quadword. The addresses can be to any quadword within the 64-byte block.
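The double-cycled transfer can be sketched as follows (illustration only; the helper names are invented, and the assumption that the high-order half travels in the first cycle is mine, since the text does not say which half goes first). The low three bits are dropped because they only select a byte within a quadword.

    def split_address(physical_address):
        """Split a 32-bit quadword address into the two 16-bit transfer cycles."""
        quadword_address = physical_address & ~0x7        # drop byte-within-quadword bits
        first_cycle = (quadword_address >> 16) & 0xFFFF   # assumed: high half with the command
        second_cycle = quadword_address & 0xFFFF          # sent the following clock cycle
        return first_cycle, second_cycle

    def join_address(first_cycle, second_cycle):
        """Receiver reassembles the address from the two cycles."""
        return (first_cycle << 16) | second_cycle

    high, low = split_address(0x12345677)
    print(hex(high), hex(low), hex(join_address(high, low)))   # 0x1234 0x5670 0x12345670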
To ensure the accuracy of the address fields, both the SCU and the CPU provide parity signals along with the address field. The address parity field is four bits wide with each parity bit providing odd parity for four bits of the 16-bit address.
The final common signal between the SCU and the CPU is the BEGINNING OF DATA signal. The SCU generally sends data to the CPU in 64-byte blocks. Since the data field is 8 bytes wide, eight consecutive clock cycles are required to complete the transfer. The BEGINNING OF DATA signal is asserted during the first of these eight cycles and succeeds the LOAD COMMAND signal by three cycles.

The SCU to CPU portion of the interface includes three signals which are unique to it: SEND DATA; FATAL ERROR; and the INTERRUPT CODE. SEND DATA is a final handshaking signal, which is asserted by the SCU to force the CPU to begin sending data from its writeback buffer, or directly from its cache. FATAL ERROR is a signal which is asserted when the SCU has detected a fatal error and is about to assert an attention line to the SPU 30. The attention line signals the SPU 30 to begin recovering from the error. FATAL ERROR signals are sent to all four CPUs simultaneously, and cause the CPUs to stop processing. The final SCU to CPU signal is the interrupt signal. A single line is used with the interrupt code being serially encoded. A start bit is followed in seven
The Cru to SCU portion of the interface includes two signals which are unique to it: DATA READY and LONGWORD MASK. The DATA READY signal is the companion handshaking signal to the SEND DATA signal. DATA READY is asserted by the CPU when it has received the data in response to a GET DATA command from the SCU. Use of this signal allows better utilization of the SCU resources since the SCU will not reserve the resources required for the get data *t writeback until receiving the DATA READY signal. The LONGWORD MASK signal is a two-bit wide field that is transmitted with each quadword of data and is used to 15 indicate the status of each of the two longwords within the quadword of data, that is, whether they should be written to memory or not.
At this point the interplay of signals and flow of 20 data between the SCU and CPU may best be described by way of example. Consider, for example, the GET DATA protocol. There are four possible scenarios which may be C* followed when the CPU receives a GET DATA command form the SCU. First, assume that the GET DATE address hits in the CPU cache and the writeback buffer is empty. Thus, 04.* the CPU moves the desired data into the writeback buffer So and sends DATA READY. The SCU responds to the DATA READY signal with the SEND DATA signal. Thereafter, the CPU responds with the BEGINNING OF DATA signal and the actual data, thereby emptying the writeback buffer.
In the second scenario, assume that the GET DATA address hits in cache; however, the writeback buffer is full. The CPU responds to the GET DATA signal by sending the DATA READY signal. The SCU then sends the SEND DATA PD88-0241 DIGM:003 FOREIGN: DIGM:036 i S'r0 1
I
-21rc tc t .tr 4l 4 *i 4 4444- 4.4 4 4 44 4- .44-4 4 44signal, intending to retrieve the data stored at the GET DATA address not the data stored in the writeback buffer.
The CPU then responds with the BEGINNING OF DATA signal and the actual data directly from the cache, bypassing the writeback buffer. Thereafter, the SCU will respond with a SEND DATA signal for the data contained in the writeback buffer, and the CPU will respond with a BEGINNING OF DATA signal and the actual data, emptying the writeback buffer. In the third scenario, the GET DATA address misses in the CPU cache; however, the writeback buffer is full. The assumption, in this case, is that the data for the GET DATA command is already coincidentally in the writeback buffer. The memory access unit 26 does not compare the address of the GET DATA command with the address of the data already in the writeback buffer. Rather, the CPU sends the DATA READY signal. The SCU responds with the SEND DATA signal, and the CPU sends a BEGINNING OF DATA signal and the actual data from the writeback buffer. Thereafter, the SCU 20 sends the second SEND DATA signal for the previous GET DATA request, and the CPU sends a BEGINNING OF DATA signal and the actual data from the writeback buffer, emptying the buffer. It should be noted that, in this case, the CPU does not allow the writeback buffer to be overwritten until after it has sent the data to the SCU for the second time.
In the fourth scenario, the GET DATA address misses in the cache, and the writeback buffer is empty. This case should not happen and will be flagged as an error by the CPU.
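The four scenarios can be restated as a compact decision sketch (illustrative Python only; the handshake sequencing is heavily simplified, and the return values merely name where each successive block of data comes from).

    def get_data_response(hit_in_cache, writeback_buffer_full):
        """Return the CPU's data sources, in order, for a GET DATA command."""
        if hit_in_cache and not writeback_buffer_full:
            # Scenario 1: move the block into the writeback buffer, then send it.
            return ["writeback buffer (holding the GET DATA block)"]
        if hit_in_cache and writeback_buffer_full:
            # Scenario 2: send the GET DATA block straight from the cache, then
            # empty the writeback buffer on the second SEND DATA.
            return ["cache (bypassing writeback buffer)", "writeback buffer"]
        if not hit_in_cache and writeback_buffer_full:
            # Scenario 3: the block is assumed to be the one already in the
            # writeback buffer; it is sent twice and only then overwritten.
            return ["writeback buffer", "writeback buffer"]
        # Scenario 4: a miss with an empty writeback buffer is an error.
        raise RuntimeError("GET DATA miss with empty writeback buffer: flagged as error")

    for hit in (True, False):
        for full in (True, False):
            try:
                print(hit, full, get_data_response(hit, full))
            except RuntimeError as err:
                print(hit, full, "ERROR:", err)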
Referring now to FIG. 3, a block diagram of the CPU portion of the interface is illustrated. The cache 28 of the memory access unit 26 is shown to include a set of tag RAMs 32, a set of data RAMs 34, and a cache tag manager 36. As noted previously, the cache 28 is two-way set-associative. Thus, the data and tag RAMs are split in half with each half capable of storing the addresses of 1024 different memory locations. The data RAMs 34 contain the actual data stored at the locations identified in the tag RAMs 32. Bits 33:6 of the physical address are delivered to the cache tag manager 36. Bits 15:6 of the address are forwarded to the tag RAMs 32 to address the tag RAMs. Bits 33:16 are compared against the contents of the tag RAMs. Therefore, it can be seen that the arrangement of the cache 28 is to provide two storage locations for a much larger section of main memory 16.
I!
IC-L l I I I -23requested data. Conversely, on a cache "miss" the cache tag manager 38 initiates a memory refill from the main memory 16.
The memory access unit 26 also includes a write back buffer (WBB) 40, which, as discussed previously in conjunction with the WRITEBACK commands, is employed to prevent the loss of data during a cache refill. The WBB is connected to the cache tag manager 36 from which it receives the LONGWORD MASK. The WWB 40 is also connected I n t to the CTM 38 from which it receives the DATA FIELD and the PARITY FIELD. The WWB 40 delivers the LONGWORD MASK, 0' the DATA FIELD and the PARITY FIELD over the interface to et* t the SCU o* The cache tag manager 36 also receives information from the SCU 20. To facilitate this transfer of information, the cache tag manager 36 includes a pair of one-deep buffers 46, 44. The buffer 46 is a command 20 buffer for holding command information, and the buffer 44 is an address buffer. The command field is 4-bits wide and is delivered in a single clock cycle, and therefore command buffer 46 is only 4-bits wide. Similarly, the address field is 32-bits wide (transferred as two consecutive 16-bit words), and the address buffer 44 is 32-bits wide to accommodate the entire address field.
44 The DTM 38 contains a single refill buffer 42. Its dimensions are 8 bytes wide per entry and it is eight entries deep. Transmissions of data from the SCU to the DTM 38 occur in eight consecutive cycles to achieve a 64 byte refill.
It should be noted that the SCU can only transfer a single group of related information and must then wait PD88-0241 DIGM:003 FOREIGN: DIGM:036 >1s~ -24until the cache tag manager 36 has cleared the buffers 42, 44, 46 before sending additional information.
Referring now to FIG. 4, a block diagram of the SCU portion of the interface is illustrated. Unlike the CPU portion of the interface, the SCU includes multiple buffers for receiving information from each of the multiple CPUs. Preferably, the SCU includes three respective address buffers, one respective data buffer, and three respective command buffers for receiving Lformation from each of the CPUs. For example, the SCU has a total of twelve address buffers 60, four data buffers 62, and twelve command buffers 64. It should be I understood that, while only one of each type of buffer is shown in FIG. 4, the remaining buffers of each type are of a substantially identical construction.
Each (i the twelve address and command buffers has an output connected to a respective twelve-to-one S 20 multiplexer 66, 68, 70i Arbitration logic 72 controls the select inputs of the address and command multiplexers 66, 60. The arbitration logic 72 also indirectly controls a data multiplexer 68 that makes the data path connections between the CPU's, 1/O units, and main memory that are required to execute the command selected by the arbitration logic. The data multiplexer 68 preferably is t L in the form of a crossbar switch matrix controlled by a searate data crossbar controller (69 in FIG. The preferred arbitration scheme implemented by the arbitration logic 72 is described in detail in the above referenced Flynn et al. U.S. patent application Ser. No.
filed and entitled "Method and Means for Arbitrating Communication Requests Using A System tCntrol Unit In A Multi-Processor System," incorporated herein by reference. The preferred data multiplexer 68 PD88-0241 DIGM:003 FOREIGN: DIGM:036 and its associated data crossbar controller (69 in FIG.
are described in detail in the above referenced Chinnaswamy et al. U.S. patent application Ser. No.
, filed and entitled "Modular Crossbar Interconnection Network For Data Transactions Between System Units in a Multi-Processor System," incorporated herein by reference. With respect to the present invention, the relevant aspects of the arbitration logic 72 and the connection resources provided by the data multiplexer or crossbar 68 are described below in connection with the execution of the commands issued by the CPUs and recognized by the SCU.
The LOAD COMMAND signals from each of the CPUs are delivered across the interface to the inputs of the arbitration logic 72. As indicated previously, the CPU LOAD COMMAND signals act to clock the command field and the address field into the SCU. However, it should be remembered that the operation of each of the CPUs is autonomous; therefore, it is potentially possible to receive load commands from all of the CPUs simultaneously. Clearly, the SCU is incapable of processing all four of the commands in parallel, but rather, selects the order in which to process these multiple commands. The arbitration logic 72 performs this prioritizing scheme and selects one of the commands to begin processing immediately. Further, since there are a total of twelve buffers available, it is possible to actually have a total of eleven commands awaiting processing while one command currently undergoes execution by the SCU.
The output of the arbitration logic 72 is connected to the select inputs of the command and address multiplexers 66, 70. Thus, the LOAD COMMAND which wins arbitration will result in its corresponding address and command buffers being selected and passed through the multiplexers 66, 70. The address field is received by a memory mapping RAM 74 which converts the raw address into a location definition used by the SCU. The main memory is organized into three levels of hierarchy: unit, segment, bank. There are two memory units, with each memory unit containing two memory segments and each of the four memory segments having two memory banks. So there are a total of two memory units, four memory segments, and eight memory banks. Effectively, the output of the memory mapping RAM 74 is a 3-bit field which defines which of the eight banks is being addressed.
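The mapping can be illustrated with a toy function. This is an assumption-laden sketch: the text fixes the hierarchy at two units, two segments per unit, and two banks per segment, but the actual address-to-bank mapping is not given, so the bit positions used below are hypothetical and chosen purely for illustration.

    def map_to_bank(address):
        """Return (unit, segment, bank) and the packed 3-bit bank field.
        The choice of which address bits select each level is hypothetical."""
        unit = (address >> 8) & 0x1        # 2 memory units
        segment = (address >> 7) & 0x1     # 2 segments per unit
        bank = (address >> 6) & 0x1        # 2 banks per segment
        field = (unit << 2) | (segment << 1) | bank   # 3-bit field, 8 banks total
        return unit, segment, bank, field

    for addr in (0x000, 0x040, 0x080, 0x1C0):
        print(hex(addr), map_to_bank(addr))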
The output of the command multiplexer 70 is connected to a command decoder 76. The command decoder 76 also receives the select signal from the arbitration logic 72. These two signals combined allow the decoder 76 to determine the meaning of the 4-bit command field. The select signal is necessary to identify which unit issued the command. This is important because units other than the CPUs can also issue commands, and these other unit commands differ from the CPU commands. Accordingly, by knowing the command and its source, the decoder 76 determines what the command means.
This 3-bit memory mapping field and the decoded command are delivered to resources required logic 78.
The resources required logic 78 performs the primary function of determining which SCU resources are required to perform the current command on the current address.
The SCU resources include, for example, the memory banks, data paths, etc. which the SCU will employ to perform the desired command on the specified memory location. The
output of the resources required logic is vectorized data having bits set therein to correspond to the particular resources needed.
Resources check logic 80 receives the resources required output and a resources available vectorized input. The resources available input is similar to the resources required output, but rather than reflecting the resources needed, it indicates the resources which are not currently being used. Each of the resources available to be used by the SCU includes an output (busy bit) which is set by the resource when that resource is active. The combination of all of the resource busy bits forms the resources available vectorized signal. The resources check logic 80 compares the resources required to those available. If all of the desired resources are available, such that the command could properly be executed, then the logic 80 delivers an OK signal and execution of the command begins. Otherwise, the logic delivers a BUSY signal. This busy signal is communicated back to the arbitration logic 72 to indicate that the selected command cannot be executed at this time and that another command should be selected. The unexecutable command is not retired, but simply remains in its corresponding buffers with its LOAD COMMAND competing for arbitration during subsequent cycles.
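Expressed as bit vectors, the check reduces to a mask comparison. The sketch below is illustrative only; the resource names are invented, and the busy-bit polarity is modeled directly as a "busy" mask rather than an "available" mask.

    def resources_check(required, busy):
        """required, busy: integers used as bit vectors (bit set = needed / in use).
        Returns 'OK' when no required resource is busy, otherwise 'BUSY'."""
        return "BUSY" if (required & busy) else "OK"

    memory_bank_3 = 1 << 3        # hypothetical resource bit assignments
    data_path_cpu1 = 1 << 8
    required = memory_bank_3 | data_path_cpu1

    print(resources_check(required, busy=0))               # OK: command starts
    print(resources_check(required, busy=memory_bank_3))   # BUSY: back to arbitration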
The OK signal initiates the setting up of any required data paths by sending a signal to the data crossbar controller 69. For the execution of memory requests, the OK signal also initiates two other operations in the SCU. First, the memory request is started. However, it should be recognized that the data corresponding to the current address can be in the cache of another CPU. Therefore, the OK signal also initiates an attempt to retrieve data from that CPU cache instead of from main memory 16. For data consistency purposes, data which is present in another CPU cache is retrieved to ensure that the most current version of the data is used. Further, the process of determining whether the data is present in the cache of another CPU requires two clock cycles of operation, while retrieving data from main memory requires considerably more clock cycles.
Thus, even when the data present in another cache is identical to the main memory data, it is a faster operation to retrieve the data from the CPU cache.
The purpose of starting the memory request before it is known whether the data is present in the cache of another CPU is to reduce the effective time required by a main memory request. For example, if the data is not present in the cache of another CPU, then the main memory request is allowed to continue and has begun processing two clock cycles before the cache status of the other CPUs was known. Conversely, if the data is present in the cache of another CPU, then the memory request is aborted. Therefore, starting the main memory request early has beneficial results when there are no cache conflicts.
For retrieving data from a CPU cache instead of main memory in response to a memory request, the OK output of the resources check logic 80 is fed to the input of a tag lookup queue 82. The tag lookup queue temporarily stores memory reference requests. Temporary storage is desired because the cache lookup requires two clock cycles to complete operation while it is possible for a LOAD COMMAND to be arbitrated on each clock cycle. Therefore, a temporary storage location is needed to prevent the arbitration logic 72 from stalling. The tag lookup queue
82 is preferably configured to maintain four consecutive cache lookup requests.
This select address signal is delivered to the select input of a tag address multiplexer 84. The inputs to the tag address multiplexer 84 are shared in common with the inputs to the address multiplexer 66.
Accordingly, the output of the tag address multiplexer 84 is identical to the output of.the address multiplexer 66 15 when its corresponding LOAD COMMAND won arbitration. In other words, the output of the tag address multiplexer 84 is the address which corresponds to the current tag lookup.
In order to determine whether the data requested by the CPU exists in the cache of any of the other parallel CPU's the output of the tag address multiplexer 84 is compared to the addresses of the data contained in each of the CPU caches. Bits 15:6 of the output of the tag 25 address multiplexer is connected to a set of tag RAM 86.
This 10 bit address identifies the tag RAM location corresponding to the selected address from the CPU. Thus the tag RAM 86 delivers bits 32:16 of the address of the data which is currently stored in the cache of one of the data which is currently stored in the cache of one of the other parallel CPU's. At the same time, bits 32:16 from the tag address multiplexer 84 are placed in a buffer 88 awaiting completion of retrieval from the Jag RAM 86.
Both of the upper 17- bits of the addresses from the address buffer 88 and the tag RAM 86 are delivered to 4 44 It #444
V
#4 4 lz 4 c 44 4 4 4 4 44 i L- PD88-0241 DIGM:003 FOREIGN: DIGM:036
I
i II inputs of a 17 bit comparator 90. In this manner, the comparator 90 produces an asserted signal when the address stored in the tag RAM 86 corresponds to the address requested from the CPU. Conversely, the comparator 90 produces an unasserted value when there is a miss between the address CPU and the address in the tag RAM 86. It should be recognized that the tag RAM 86, address buffer 88, and comparator 90 are replicated four times, one instance corresponding to each CPU cache. The outputs of each of the four comparators 90 are delivered to a microsequencer 92.
StE I*t The tag RAM 86 corresponds to the tag RAM 32 contained in each of the memory access units of the t, 15 CPU's. The tag RAM 86, 32 contain identical information and allow the SCU to keep track of the memory addresses currently contained in the CPU caches. Accordingly, the tag RAM 86 like the tag RAM 32 is also two-way setassociative. This two-way set-associative aspect of the 20 tag RAM 86 accounts for the cache lookup requiring two Ilt clock cycles to complete operation. Since the tag RAM 86 is two-way set-associative, each comparison requires comparing the addresses stored in set zero and set 1 in consecutive cycles., *E O t t t At this point the signals from the comparators merely indicate whether there is a hit or miss for the data requested by one CPU in the caches of the remaining CPU's. In order to determine whether a hit results in data consistency problems, the type of command being executed by the requesting CPU must be known, as well as, the state of the data in the "hit" CPU. Where the requesting CPU desires the data for WRITE REFILL purposes, then there is the risk of data inconsistency if the data is allowed to remain in the "hit" CPU and also PD88-0241 DIGM:003 FOREIGN: DIGM:036 -s~p -31placed in the requesting CPU. For example, if the requesting CPU merely wants to accomplish a READ REFILL and the data is in a read state in the "hit" CPU, there is no data inconsistency problem. Accordingly, the tag RAM 86 also stores a pair of cache data bits that indicate one of four states: invalid; read; written partial; or written full. These cache data bits are also delivered from the tag RAM 86 to the microsequencer 92.
Likewise, the original command is also stored in the tag lookup queue 82 and passed to the microsequencer 92.
The microsequencer 92 has sufficient inputs to c" determine whether the requested data exists in a parallel CPU cache and, if so, whether such a request produces S 15 data inconsistency problems. Accordingly, the S' microsequencer now has the option to abort the memory request initiated by the OK signal from the resources check lock 80 or allowing the memory request to continue operation. The microsequencer 92 will, of course, abort 20 the memory request only if the requested data is found to *ge be written in one of the parallel CPU caches. Otherwise, the memory request continues to its logical conclusion and returns data to the requesting CPU. Thus, by initiating the memory request before checking to see if the requested data is present in any of the parallel CPU caches, the relatively slow process of a memory request obtains a two cycle head start.
Referring now to FIG. 5, the microsequencer 92 is shown connected to a fixup queue 94. At this point the microsequencer 92 is capable of performing two alternative functions. First, the microsequencer 92 can allow the memory cycle to continue operating and retrieve the desired data from the indicated address in response to a "miss" in each of the CPU caches. Alternatively, the microsequencer 92 can proceed to retrieve the desired data from the CPU cache in which the hit was detected and move that data to the requesting CPU cache.
It should be recognized that in order to accomplish this transfer of data between CPU caches, certain resources, such as data paths, address paths, command paths, and miscellaneous other resources, are required. Therefore, in order to execute this "fixup," the resources necessary to accomplish it must be determined and an inquiry must be made as to whether those resources are currently available.
Accordingly, the microsequencer produces a resources required vector and places that vector in the fixup queue 94. The fixup queue 94 includes four locations for storing resources required vectors. Periodically, the fixup resources required logic 96 removes one of the resources required vectors from the fixup queue 94 and reserves those resources, thereby precluding the memory startup logic from using those same resources.
Therefore, even if the required resources are currently busy, reserving them against further use ensures that they will eventually become available to accomplish the desired movement of data between the requesting CPU cache and the "hit" CPU cache.
This reservation of resources is accomplished by sending the resources required vector to the resource check logic. This effectively indicates to the resource check logic that these resources are currently unavailable.
Thus, a subsequent memory request which requires some of the same resources will produce a busy signal from the resource check logic, and that memory request is placed back into the arbitration logic. In this manner, those memory requests which do not interfere with the resources required to accomplish the fixup are allowed to continue, whereas those memory requests that do conflict are given a lower priority so that the fixup can be accomplished.
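The resources required vector and its reservation can be pictured as a bit mask over the shared data, address and command paths, as in the C sketch below; the queue depth of four follows the text, while the mask encoding, the FIFO bookkeeping and the omission of a release step when the fixup completes are assumptions of the example.

#include <stdint.h>
#include <stdbool.h>

#define FIXUP_QUEUE_DEPTH 4                       /* four queue locations               */

static uint32_t reserved;                         /* resources reserved for fixups       */
static uint32_t fixup_queue[FIXUP_QUEUE_DEPTH];   /* pending resources required vectors  */
static int head, count;

/* Microsequencer 92 places a resources required vector in the fixup queue. */
bool enqueue_fixup(uint32_t resources_required_vector)
{
    if (count == FIXUP_QUEUE_DEPTH)
        return false;                             /* queue full                          */
    fixup_queue[(head + count++) % FIXUP_QUEUE_DEPTH] = resources_required_vector;
    return true;
}

/* Periodically the fixup resources required logic 96 removes a vector and
   reserves those resources, keeping the memory startup logic off them.     */
void reserve_next_fixup(void)
{
    if (count == 0)
        return;
    reserved |= fixup_queue[head];
    head = (head + 1) % FIXUP_QUEUE_DEPTH;
    count--;
}

/* Resource check for a new memory request: a request needing any reserved
   resource sees "busy" and returns to arbitration.                         */
bool request_is_busy(uint32_t resources_needed)
{
    return (resources_needed & reserved) != 0;
}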

Claims (4)

1. A multi-processor computer system having a plurality of central processing units, a main memory, and a system control unit for controlling the transfer of data between the central processing units and main memory, each of the central processing units having a cache for containing selected portions of the data maintained in the main memory, said system control unit comprising:
means for receiving a request for data contained in a selected address of the main memory from a first one of the central processing units;
means for initiating data retrieval from the main memory in response to receiving the data request, whereby main memory begins a multi-cycle operation to retrieve the requested data from the selected address of main memory and deliver the retrieved data to the first one of the central processing units;
means for comparing the selected address to addresses associated with data contained in at least the central processing unit caches of the other ones of the central processing units and delivering a hit signal in response to detecting a match between the selected address and an address associated with data in one of said central processing unit caches;
means for aborting the multi-cycle retrieve operation in main memory in response to receiving the hit signal; and
means for retrieving the requested data from said one of said central processing unit caches and delivering the retrieved data to the first one of said central processing units in response to receiving the hit signal.
2. The multi-processor computer system, as set forth in claim 1, wherein the comparing means includes a separate comparator associated with each of the plurality of central processing unit caches, the separate comparators being adapted for simultaneous, parallel operation.

3. The multi-processor computer system, as set forth in claim 1, wherein the comparing means includes tag random access memory containing memory address information that is identical to memory address information contained in each of said central processing unit caches.
4. The multi-processor computer system, as set forth in claim 1, wherein the system control unit includes a plurality of input buffers connected to said plurality of central processing units and adapted for receiving and storing requests for data from each of said central processing units, and means for arbitrating between the requests for data received from each of the central processing units to determine an order of processing the requests for data.
5. A method of operating a multi-processor computer system to control the transfer of data between central processing units and main memory in said computer system, each of the central processing units having a cache for containing selected portions of the data maintained in the main memory, said method comprising the steps of:
receiving a request for data contained in a selected address of the main memory from a first one of the central processing units;
initiating data retrieval from the main memory in response to receiving the data request, whereby main memory begins a multi-cycle operation to retrieve the requested data from the selected address of main memory and deliver the retrieved data to the first one of the central processing units;
comparing the selected address to addresses associated with data contained in the caches of at least the other central processing units and delivering a hit signal in response to detecting a match between the selected address and an address associated with data in one of the central processing unit caches;
aborting the retrieval of data from the selected address in main memory in response to receiving the hit signal; and
retrieving the requested data from said one of said central processing unit caches and delivering the retrieved data to the first one of the central processing units in response to receiving the hit signal.

6. A method, as set forth in claim 5, wherein said step of comparing includes substantially simultaneously comparing the requested data address to the addresses of the data contained in each of the central processing unit caches and delivering a hit signal in response to detecting a match between the requested data address and the address of the data contained in one of said central processing unit caches.

7. The method, as set forth in claim 5, wherein said step of receiving a request for data includes: receiving a plurality of requests for data from said plurality of central processing units; storing each of said plurality of requests for data in an input buffer; and arbitrating between said plurality of requests for data received from each of the central processing units to determine an order of processing the requests.

8. A multi-processor computer system substantially as described herein with reference to the drawings.

9. A method of operating a multi-processor computer system substantially as described herein with reference to the drawings.

DATED this EIGHTH day of JULY 1992
Digital Equipment Corporation
Patent Attorneys for the Applicant
SPRUSON FERGUSON
AU53949/90A 1990-04-27 1990-04-27 Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units Ceased AU628531B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU53949/90A AU628531B2 (en) 1990-04-27 1990-04-27 Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU53949/90A AU628531B2 (en) 1990-04-27 1990-04-27 Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units

Publications (2)

Publication Number Publication Date
AU5394990A AU5394990A (en) 1991-12-19
AU628531B2 true AU628531B2 (en) 1992-09-17

Family

ID=3739935

Family Applications (1)

Application Number Title Priority Date Filing Date
AU53949/90A Ceased AU628531B2 (en) 1990-04-27 1990-04-27 Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units

Country Status (1)

Country Link
AU (1) AU628531B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU657241B2 (en) * 1990-03-02 1995-03-02 Fujitsu Limited Bus control system in a multi-processor system


Also Published As

Publication number Publication date
AU5394990A (en) 1991-12-19

Similar Documents

Publication Publication Date Title
JP3218317B2 (en) Integrated cache unit and configuration method thereof
US5023776A (en) Store queue for a tightly coupled multiple processor configuration with two-level cache buffer storage
JP3158161B2 (en) Integrated cache unit and method for caching interlock variables in integrated cache unit
US5432918A (en) Method and apparatus for ordering read and write operations using conflict bits in a write queue
JP2881309B2 (en) Integrated circuit, computer system, and method for updating cache block status in a cache in an integrated circuit
US5347648A (en) Ensuring write ordering under writeback cache error conditions
US5404482A (en) Processor and method for preventing access to a locked memory block by recording a lock in a content addressable memory with outstanding cache fills
US5222224A (en) Scheme for insuring data consistency between a plurality of cache memories and the main memory in a multi-processor system
US5404483A (en) Processor and method for delaying the processing of cache coherency transactions during outstanding cache fills
US5185878A (en) Programmable cache memory as well as system incorporating same and method of operating programmable cache memory
US6681295B1 (en) Fast lane prefetching
US4926323A (en) Streamlined instruction processor
US5418973A (en) Digital computer system with cache controller coordinating both vector and scalar operations
US8255591B2 (en) Method and system for managing cache injection in a multiprocessor system
EP0372201B1 (en) Method for fetching potentially dirty data in multiprocessor systems
US4831581A (en) Central processor unit for digital data processing system including cache management mechanism
JPH0630060B2 (en) System interface unit
EP0380842A2 (en) Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units
US5091845A (en) System for controlling the storage of information in a cache memory
US5119484A (en) Selections between alternate control word and current instruction generated control word for alu in respond to alu output and current instruction
US6892283B2 (en) High speed memory cloner with extended cache coherency protocols and responses
US20040111576A1 (en) High speed memory cloning facility via a source/destination switching mechanism
US5226170A (en) Interface between processor and special instruction processor in digital data processing system
JP3218316B2 (en) Integrated cache unit and method for implementing cache function therein
AU628531B2 (en) Method and apparatus for interfacing a system control unit for a multiprocessor system with the central processing units