
GB2216306A - Load and synchronize computer architecture and process

Info

Publication number: GB2216306A
Application number: GB8903960A
Authority: GB
Inventors: Stephen C Johnson; Jr William S Worley; Dennis Way Ting; Richard Lowenthal; Jonathan Rubinstein; Iii William Spencer Worley
Original assignee: Ardent Computer Corp
Current assignee: Ardent Computer Corp
Priority date: 1988-02-29
Filing date: 1989-02-22
Publication date: 1989-10-04
Also published as: GB8903960D0; KR890013552A

Abstract

An architecture for a parallel processing computer system to implement a load and synch mutual exclusion indivisible operation is described. The individual processors have no special logic for implementing the instruction. However, additional circuitry in an interface unit and logic on the memory itself is used to detect and implement the indivisible operation. Thus, semaphore operations can be accomplished with a parallel processing computer architecture comprised of processing units that have no mutual exclusion primitives themselves. Each processor (as 50) uses virtual address mapping. A semaphore circuit 30 is coupled between the CPU and a multiplexer 21 for storing the virtual address of the semaphore (register 31) and the value of the semaphore (register 32). A load and synch instruction stores the virtual address of the semaphore into the semaphore address register 31 and then retrieves the semaphore's value from the semaphore value register 32.

Description

LOAD AND SYNCHRONIZE COMPUTER ARCHITECTURE AND PROCESS

BACKGROUND OF THE INVENTION

1. Field of the Invention.

The field of the invention is that of computer architectures for implementing mutual exclusion instructions. More specifically, the invention concerns an architecture lying outside of the central processing unit that implements a type of semaphore instruction.

2. Prior Art.

In order to share resources safely in parallel or concurrent computer programs, methods were developed to assure mutual exclusion.

However, in order to implement these methods, modifications in the computer architecture were necessary.

The problem of mutual exclusion exists when two or more executing computer programs, termed 'processes', attempt to utilize the same resource.

That resource could be a common area of computer memory, a shared peripheral device such as a printer, or a common communication line. In order to assure that only one process, or only a limited number of processes, can use a resource simultaneously, methods were developed to provide for reliable mutual exclusion. This is typically done by having a memory location used to store the current state of availability of the resource. Before a process attempts to access a resource, the process first checks the state to see if the resource is available. A problem with this method, in time sharing computer systems, is that between the instruction that loads the state into the processor and the instruction that compares the state to determine whether the resource is available, the process itself can become inactive. When the process is reactivated, it determines whether the resource is available on the basis of old and potentially incorrect information. Alternatively, the process could be temporarily halted after it had compared the state. In this case, the process would not have had an opportunity to reset the state, so other processes would be able to use the resource before the state is reset. This led to the need for an instruction that would simultaneously access, compare and store in one integrated operation.
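By way of illustration only, the stale-state problem described above can be sketched as a fixed interleaving of two processes. The names `state`, `load` and `claim_if_free`, and the encoding of 1 for "free" and 0 for "in use", are assumptions made for this sketch and do not appear in the original disclosure.

```python
# Hypothetical shared state word: 1 means "resource free", 0 means "in use".
state = {"free": 1}


def load(shared):
    """Load the availability state into a process-local copy."""
    return shared["free"]


def claim_if_free(shared, observed):
    """Compare the previously loaded copy and, if it said free, mark busy."""
    if observed == 1:
        shared["free"] = 0
        return True
    return False


# Process A loads the state and is then made inactive before comparing.
seen_by_a = load(state)

# Process B runs in the meantime: it loads, compares and stores.
seen_by_b = load(state)
b_got_it = claim_if_free(state, seen_by_b)

# Process A is reactivated and decides on the basis of its stale copy.
a_got_it = claim_if_free(state, seen_by_a)

# Both processes now believe they hold the resource.
both_claimed = a_got_it and b_got_it
```

Because the load and the compare-and-store are separate steps, both processes end up claiming the resource, which is the failure an integrated operation is meant to prevent.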

A type of mutual exclusion primitive which is well known in the art is a semaphore. Semaphores are described in the article "Cooperating Sequential Processes" by E.W. Dijkstra, appearing in Programming Languages, edited by F. Genuys (1968). Semaphores are conventionally implemented as a special instruction that is built into the processor. This may be an instruction that is hard wired into the processor or one that is simulated by an interrupt and trapping method. With the hard wired method the access, comparison and storage are all done by the processor in one execution cycle, so that the operation cannot be interrupted anywhere within the execution of the steps. In the interrupt and trapping method the same is implemented by masking all possible interrupts and exceptions during the execution of the subroutine located at the trap. At the end of the trap's execution the former interrupt level is restored. By either method the facility for providing an uninterruptable mutual exclusion instruction is provided for in the processor.

These prior art semaphores provide control information. That is, they signal a process as to whether to begin to consume a resource or to wait until the resource is available. There is no provision for sending non-control data to the process executing the semaphore operation.

In large time sharing systems the prior art system works well. However, with the advent of multiple processor parallel processing computer architectures there are problems. One processor no longer has exclusive control over the memory. In the parallel processing environment multiple processors, running multiple processes, have access to resources. Thus, even though a mutual exclusion instruction is indivisible with respect to an individual processor, other processors can access a semaphore's location between the first processor's access and its corresponding store, causing the mutual exclusion problems mentioned above. Also, there should be a way of sharing semaphores between processes and processors.

What is needed is a method for sharing resources in a parallel processing environment. One aspect of the present invention is the implementation of a semaphore instruction in a parallel processing environment. Another aspect of the present invention is to provide a semaphore operation that returns data as well as control information.

SUMMARY OF THE INVENTION

The present invention is a system for implementing an indivisible mutual exclusion instruction on a parallel processing computer architecture. It provides for an uninterruptable retrieval, comparison, and storage instruction on a memory location by a processor, without interference from any other processor, in a multi-processor architecture.

The present invention uses a processor that requires no special mutual exclusion instruction. This processor is coupled with a semaphore circuit, which is in turn coupled directly to an address bus and to an external translation look-aside buffer (ETLB). The ETLB is also coupled to the address bus. When a semaphore access is attempted, the semaphore circuit uses the ETLB in place of the normal connection line to the bus. By an extra signal transmitted as a part of the address on the bus, the memory boards are alerted to the semaphore access. Special memory controller logic circuits on the memory boards are designed so as to receive the semaphore access, execute the instruction as an indivisible instruction, and return the desired information.

The semaphore operation of the present invention includes returning information of more significance than control signals. The present invention's semaphore algorithm comprises first returning the current value of the semaphore. Then if the current value is non-negative, setting the semaphore to zero. If the current value is negative, the semaphore is incremented by one.

Thus the present invention provides for a computer architecture having a semaphore instruction that can be used in a parallel processing environment.

A semaphore can be shared between processes and processors. The semaphore can return data of a greater significance. In addition, the processors used need not have a special instruction for the implementation of mutual exclusion. Rather, mutual exclusion is provided by additional circuitry coupled to the processor in conjunction with logic circuitry in the memory board itself.

DETAILED DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic diagram of the parallel processing architecture of the present invention.

Figure 2 is a schematic diagram of the additional circuitry necessary for implementation of the present invention's semaphore instruction.

Figure 3 is a flow chart diagram of the memory board's semaphore algorithm.

Figure 4 is a flow chart diagram of a typical use of a semaphore of the present invention.

Figure 5 is a flow chart diagram of an emulation of the test and set operation using the semaphore of the present invention.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

The present invention describes a load and synchronize, or load and synch (L & S), computer architecture. In the following description numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well known methods have not been described in detail so as not to unnecessarily obscure the present invention.

Figure 1 shows the multiple processor architecture of the present invention. A number of processors 5 are coupled to both an address bus 1 and a data bus 2. Also, a number of memory boards 7 are coupled to the address bus 1 and the data bus 2. I/O interface 10 is coupled to both address bus 1 and data bus 2, serving to mediate between competing players to send messages on the buses. For the preferred embodiment of the present invention, I/O interface 10 only allows for one message to be sent on a bus for one period of time. However, it is only necessary that some protocol be established for using the buses to practice the present invention.

The present invention has no special reserved area for the location of semaphores. Semaphores can be created by any of processors 5 and stored on any of memory boards 7. Thus a semaphore is available to any process or processor.

Figure 2 shows the architecture of the hardware used to implement the semaphores of the present invention. A processor 50 is coupled to a memory board 40. In the preferred embodiment, the processor is a MIPS chip of an ordinary variety having virtual address mapping. Processor 50 and memory board 40 are coupled by address bus 1, multiplexor by-pass 21, then either address line 11 or alternate line 12 to semaphore circuit 30. Address line 11 is coupled directly from semaphore circuit 30 to multiplexor by-pass 21, while alternate line 12 is coupled through semaphore circuit 30 and external translation look-aside buffer (ETLB) 20.

Multiplexor by-pass 21 decides which of address line 11 or alternate line 12 is directed to address bus 1. By-pass 21 directs the lines according to the presence or absence of the load and synch signal on the address line 11.

If the L & S signal is not present on the address line 11, then the signal on address line 11 is directed to the address bus 1. However, if the load and synch signal is present on the address line 11, then by-pass 21 directs the signal from alternate line 12 to address bus 1.
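The selection rule of by-pass 21 can be sketched as a single function. This is an illustrative model only; the function name `bypass_select` and its parameters are assumptions, and real hardware would of course select between signal lines rather than integers.

```python
def bypass_select(ls_signal: bool, address_line_11: int, alternate_line_12: int) -> int:
    """Model of multiplexor by-pass 21.

    ls_signal         -- True when the load and synch signal is present
    address_line_11   -- address arriving directly from the semaphore circuit
    alternate_line_12 -- address arriving through the ETLB
    Returns the address that is driven onto address bus 1.
    """
    if ls_signal:
        return alternate_line_12
    return address_line_11
```

For an ordinary access the processor's own address passes straight through; only a load and synch access takes the ETLB path.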

Semaphore circuit 30 comprises an address decoder, a memory controller, and semaphore registers. The address decoder distinguishes between four address types. The first two are for the two semaphore registers.

The third is for access to the ETLB 20, in which the semaphore circuit 30 acts as a memory control to ETLB 20. The fourth is every other address type, which is sent on address line 11 to multiplexor by-pass 21.
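The four-way distinction made by the address decoder can be sketched as follows. The description does not give the addresses assigned to the semaphore registers or to the ETLB, so every constant below is a hypothetical placeholder chosen only to make the sketch runnable.

```python
from enum import Enum


class AccessType(Enum):
    SEM_ADDR_REGISTER = 1   # semaphore address (ADDR) register
    SEM_VAL_REGISTER = 2    # semaphore value (VAL) register
    ETLB = 3                # access to ETLB memory
    OTHER = 4               # every other address, passed on address line 11


# Hypothetical address assignments; not taken from the original description.
ADDR_REGISTER_ADDRESS = 0xFFFF0000
VAL_REGISTER_ADDRESS = 0xFFFF0004
ETLB_BASE = 0xFFFE0000
ETLB_SIZE = 0x1000


def decode(address: int) -> AccessType:
    """Classify an address the way the semaphore circuit's decoder would."""
    if address == ADDR_REGISTER_ADDRESS:
        return AccessType.SEM_ADDR_REGISTER
    if address == VAL_REGISTER_ADDRESS:
        return AccessType.SEM_VAL_REGISTER
    if ETLB_BASE <= address < ETLB_BASE + ETLB_SIZE:
        return AccessType.ETLB
    return AccessType.OTHER
```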

The ETLB 20 is coupled between semaphore circuit 30 and by-pass 21.

ETLB 20 has a semaphore address table 23 that stores virtual addresses of semaphores with their corresponding physical addresses. Semaphore address table 23, as well as any other memory in ETLB 20, can be accessed like an ordinary memory location, although this feature is not necessary for practicing the present invention.

The signal that by-pass 21 receives from alternate line 12 is generated by the semaphore circuit 30 working in conjunction with ETLB 20. The semaphore circuit 30 has both a semaphore address (ADDR) register 31 and a semaphore value (VAL) register 32. A load and synch instruction comprises an instruction to store the virtual address of the semaphore into ADDR 31, then an instruction to retrieve from VAL 32. The physical address of the semaphore is generated by ETLB 20's semaphore address table 23.

The memory board 40 has a memory board control 41. Memory board control 41 is for implementing the load and synch operation's algorithm. Once the memory board control 41 receives the load and synch signal and its accompanying address, the memory board control 41 indivisibly executes the operation. In the preferred embodiment, each memory board has eight memory board control logic circuits. The logic circuits are designed to implement the algorithm depicted in Figure 3. Additionally, only one memory board control logic circuit can perform a semaphore operation on a given location at any one time.

A semaphore is created by a user program independent of the special circuitry of the present invention. The semaphore can be stored at any location in memory. In fact, there is no special semaphore initialization routine necessary. It is only when the semaphore is accessed by a command to load a value from VAL 32 that the memory location becomes a semaphore, and a new entry is created in semaphore address table 23. In the preferred embodiment, ETLB 20 creates a new entry in the semaphore address table 23 by generating an exception. The exception handler makes a system call to find the physical address that corresponds to the virtual address in the ADDR 31. Then the system updates the semaphore address table before returning control.

However, all that is necessary for the practice of the present invention is that there be some means for determining the physical address of the virtual address in ADDR 31.
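The fill-on-first-access behaviour of semaphore address table 23 can be sketched as a small lookup table with a miss handler. The class name, the `translate` callback standing in for the exception handler's system call, and the `misses` counter are all assumptions introduced for illustration.

```python
class SemaphoreAddressTable:
    """Illustrative model of semaphore address table 23 in ETLB 20."""

    def __init__(self, translate):
        # translate: hypothetical stand-in for the system call that finds
        # the physical address corresponding to a virtual address.
        self._translate = translate
        self._entries = {}
        self.misses = 0  # number of simulated exceptions taken

    def lookup(self, virtual_address):
        """Return the physical address for the virtual address in ADDR."""
        if virtual_address not in self._entries:
            # First semaphore access to this location: simulate the
            # exception and let the system create the new entry.
            self.misses += 1
            self._entries[virtual_address] = self._translate(virtual_address)
        return self._entries[virtual_address]


# Usage with a made-up page mapping.
page_map = {0x4000: 0x9000}
table = SemaphoreAddressTable(lambda va: page_map[va])
first = table.lookup(0x4000)
second = table.lookup(0x4000)
```

Only the first access to a given semaphore pays the cost of the exception; later accesses are served from the table.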

The semaphore instruction is accomplished as follows. First, the process loads the address of the location of the semaphore desired to be called into ADDR 31. After the semaphore address is loaded into ADDR 31, the process executes a load requesting data from VAL 32. When the VAL read request instruction is received, the multiplexor is signaled. This load and synch request signal comes to the multiplexor 21 by an extra line or lines coming from the CPU 50 to the address bus 1. The by-pass 21 then waits for an address to be generated from ETLB 20. The ETLB 20 generates a physical address determined by the virtual address in ADDR 31. In order to access the correct physical address, the ETLB 20's semaphore address table 23 maintains a mapping of virtual addresses to physical addresses. The semaphore address table 23 provides one entry for every semaphore that is accessed. The ETLB 20 then locates the virtual address in its semaphore address table 23 and sends to by-pass 21 a corresponding physical address. In this method the memory board and its associated memory controller receive the appropriate physical address as well as the load and synch signal from the address bus 1.
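The two-instruction sequence above (a store to ADDR followed by a load from VAL) can be sketched from the point of view of what reaches the address bus. The names `BusRequest` and `BusInterface`, and the use of a plain dictionary for the semaphore address table, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusRequest:
    """What the memory board sees on address bus 1."""
    physical_address: int
    load_and_synch: bool  # the extra signal accompanying the address


class BusInterface:
    """Illustrative model of semaphore circuit 30, ETLB 20 and by-pass 21."""

    def __init__(self, semaphore_address_table):
        self.addr_register = None             # ADDR 31 (holds a virtual address)
        self.table = semaphore_address_table  # virtual -> physical mapping

    def store_addr(self, virtual_address):
        """First instruction: store the semaphore's virtual address in ADDR."""
        self.addr_register = virtual_address

    def load_val_request(self):
        """Second instruction: a load from VAL produces an L & S bus request."""
        physical = self.table[self.addr_register]
        return BusRequest(physical, load_and_synch=True)

    def ordinary_request(self, physical_address):
        """Any other access goes out without the L & S signal."""
        return BusRequest(physical_address, load_and_synch=False)


iface = BusInterface({0x4000: 0x9000})
iface.store_addr(0x4000)
request = iface.load_val_request()
```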

The indivisible load and synch operation is executed by the memory board control 41. The memory board control 41 has logic circuitry which is activated by the presence of the load and synch signal on the address bus 1. Once the memory board 40 receives the signal of the load and synch operation and the correct physical address, one of its memory board control 41 circuits will execute the semaphore algorithm, returning the appropriate value.

The semaphore algorithm implemented by the preferred embodiment of the present invention can be classified as a general semaphore of the classical busy wait definition, using a negative semaphore value to indicate resource availability. With this algorithm, when a load from semaphore VAL command is issued, the wait-on-semaphore decision rule proceeds as follows. The current value of the semaphore is sent to the VAL register. If the semaphore is non-negative, then the semaphore is set to 0, which indicates that the resource is not available. If the semaphore is less than 0, that indicates the resource is available and the semaphore is incremented by 1. By this method any finite number of processes can access the same resource, limited only by the storage word size of the semaphore. Also, information of greater significance than control information is returned. This can be quite useful in parallel processing programs dealing with arrays of information that require processing. The semaphore can be given an initial value equal to the negation of the number of elements in the array. As each semaphore call is issued, the calling process receives a new position in the array and the semaphore is incremented by one. When every element of the array has been accessed by a process, the semaphore will equal 0 and no further processing of the array will take place.

With the present invention's implementation of the general semaphore, many useful processes can be accommodated. A binary semaphore may be provided by using a value of -1, or any positive value as the value indicating that the semaphore's resource is available. When the process is finished with the resource the process must reset the semaphore to a -1 or positive value.

An array can be efficiently processed without mutual exclusion problems, as shown in Figure 4. An emulation of the test and set operation can also be made as shown in Figure 5.

Figure 3 shows a flow chart diagram of the semaphore algorithm used in the present invention. In the following diagrams, the symbol 'loc(ADDR)' refers to the value stored in the location pointed to by ADDR. The first step is 101 where the initial value of the semaphore pointed to by ADDR is copied to VAL.

Next is step 102 where the value of the semaphore is compared to zero. If the value of the semaphore is greater than or equal to zero, then step 104 is executed and the semaphore is set to zero. At this point the memory board control is finished and so terminates execution with step 105. However, if the semaphore's value is less than zero execution proceeds to step 103 where the semaphore is incremented by adding a positive one to the semaphore. Then the memory board control is finished and so terminates execution with step 105.
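The steps of Figure 3 can be sketched in software as follows. This is a simulation under stated assumptions, not the memory board logic itself: the class `MemoryBoard` is hypothetical, and a `threading.Lock` stands in for the indivisibility that the hardware provides.

```python
import threading


class MemoryBoard:
    """Illustrative software model of memory board 40 and its control 41."""

    def __init__(self, size):
        self.cells = [0] * size
        # Assumption: a lock models the hardware's indivisible execution.
        self._lock = threading.Lock()

    def load_and_synch(self, addr):
        """Run the semaphore algorithm on loc(addr) and return the old value."""
        with self._lock:
            val = self.cells[addr]           # step 101: copy loc(ADDR) to VAL
            if val >= 0:                     # step 102: compare with zero
                self.cells[addr] = 0         # step 104: set semaphore to zero
            else:
                self.cells[addr] = val + 1   # step 103: increment by one
            return val                       # step 105: finished


board = MemoryBoard(8)
board.cells[3] = -2
returned = [board.load_and_synch(3) for _ in range(4)]
```

Starting from -2, successive calls return -2, -1, 0 and 0, after which the semaphore remains at zero until a process stores a new value.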

It is important to note that the operations depicted in Figure 3 are executed by the memory board control, which is uninterruptable. The only step that is not totally within the memory board control is that of 101, where the value of loc(ADDR) is sent over a data bus to be stored in VAL. All the steps which modify, compare, or assign are indivisibly executed by the memory board control.

Figure 4 shows a flow chart diagram of the algorithm executed on a plurality of processors in a parallel processing computer for processing an array of information. Execution starts at step 201 where the ADDR register is loaded with the address of the semaphore used to synchronize access to the array. Next is step 202, where an instruction loads the VAL register into the local variable Var. Var can be a location in the memory or a register in the processor. The instruction of loading from VAL is the load and synch operation. The next step executed is that of checking the results of the load and synch. This is step 203, where the results as stored in Var are compared to zero. If Var is greater than or equal to zero, then the algorithm terminates in step 204. If Var is less than zero, execution continues at step 205. Step 205 is where an array element is processed. That array element could be the one at the index position of Var plus a predetermined constant value or one that is at the index position of the negation of Var. Assuming that ADDR was unmodified during the processing of the array element, after step 205 is finished the program execution resumes at step 202. But if ADDR has been changed by any of the steps following the initial step 201, then program execution must resume at step 201.
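The loop of Figure 4 can be sketched with threads standing in for processors. Everything named here is an assumption for illustration: the semaphore is a one-element list, a lock models the memory board's indivisibility, squaring stands in for "processing" an element, and the predetermined constant added to Var is taken to be the array length.

```python
import threading


def load_and_synch(cell, hw_lock):
    """Software stand-in for the indivisible L & S operation."""
    with hw_lock:
        val = cell[0]
        cell[0] = 0 if val >= 0 else val + 1
        return val


def worker(cell, hw_lock, data, out):
    constant = len(data)                   # predetermined constant (assumed)
    while True:
        var = load_and_synch(cell, hw_lock)    # step 202
        if var >= 0:                           # step 203
            return                             # step 204
        index = var + constant                 # Var plus a constant
        out[index] = data[index] * data[index]  # step 205 (example work)


def process_array(data, n_workers=4):
    cell = [-len(data)]        # semaphore starts at minus the element count
    hw_lock = threading.Lock()
    out = [None] * len(data)
    threads = [
        threading.Thread(target=worker, args=(cell, hw_lock, data, out))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, cell[0]
```

Each returned value is handed to exactly one worker, so every element is processed once without any further locking around the array itself.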

Figure 5 is a flowchart of a process using the L & S operation to emulate the test and set operation. The first step is step 301, where the semaphore's address is stored in ADDR. Step 302 is the load and synch operation, where the value stored in VAL is loaded into local variable Var. Next, at step 303 the value of Var is compared to zero. If Var is equal to zero, then control loops back to step 302. If Var is not equal to zero, then control proceeds to step 304, where the resource is processed. When the semaphore is equal to zero, the resource is busy. When the resource is available, the semaphore is not equal to zero. Note that for the test and set emulation to succeed, the semaphore must be initially set to some positive value or -1. Because any other initial value will not cause the semaphore to be at zero after the first access, improper initialization will defeat the mutual exclusion provided by the test and set operation. After finishing with the resource, the process then sets the semaphore to one in step 305. Any positive value, or -1, to be stored as the semaphore's value is adequate in step 305. After resetting the semaphore's value to indicate resource availability, the test and set emulation is complete at step 306.
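The test and set emulation of Figure 5 can be sketched as a busy-wait lock protecting a shared counter. As before this is a simulation under assumptions: threads stand in for processors, a `threading.Lock` models the memory board's indivisibility, and the helper names are hypothetical. The `time.sleep(0)` call merely yields to other threads while spinning and has no counterpart in the flow chart.

```python
import threading
import time


def load_and_synch(cell, hw_lock):
    """Software stand-in for the indivisible L & S operation."""
    with hw_lock:
        val = cell[0]
        cell[0] = 0 if val >= 0 else val + 1
        return val


def store(cell, hw_lock, value):
    """Ordinary store to the semaphore's memory location."""
    with hw_lock:
        cell[0] = value


def critical_worker(cell, hw_lock, counter, rounds):
    for _ in range(rounds):
        while load_and_synch(cell, hw_lock) == 0:   # steps 302 and 303
            time.sleep(0)                           # busy: yield, then retry
        tmp = counter[0]                            # step 304: use resource
        counter[0] = tmp + 1
        store(cell, hw_lock, 1)                     # step 305: mark available


def run_demo(n_workers=3, rounds=100):
    cell = [1]                  # positive initial value: resource available
    hw_lock = threading.Lock()
    counter = [0]
    threads = [
        threading.Thread(target=critical_worker,
                         args=(cell, hw_lock, counter, rounds))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], cell[0]
```

If the semaphore were initialized to a value such as -3, the first several callers would all see a non-zero value and enter together, which is the improper initialization the text warns against.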

In the preferred embodiment, the processor used has both a vector processor and an integer processor. The vector processor does not utilize the L & S instruction, only the integer processor does. The vector processor does not have a virtual mapping ability, and so it must have the ETLB perform its virtual mapping. The integer processor has an internal translation look-aside buffer (TLB) to produce the physical addresses. Thus any process running has access to only the semaphore's virtual address, so virtual addresses are the only type available to be stored in ADDR. During the exception generated on the first semaphore access to a particular location, the system uses the integer processor's TLB to determine the physical address of the semaphore.

If the computer system had only physical addressing and a means to signal the semaphore operation, the ETLB and the semaphore circuit would not be needed. All that would be necessary to perform a L & S operation is to execute a load instruction from a memory location, provided that there is a signal to identify the access as a L & S operation. That signal could be provided in a number of ways, including but not limited to having a separate semaphore signal line, mapping half of the address space to non-semaphore operations and the other half to semaphore operations, or having a separate semaphore instruction hard wired into the processor. The logic circuit of the memory board control can be designed to implement the semaphore operation regardless of the type of addressing scheme (virtual or physical), given there is a semaphore signal present.
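One of the signalling options mentioned above, mapping half of the address space to semaphore operations, can be sketched by reserving the top address bit as the L & S signal. The 32-bit address width and the choice of the top bit are assumptions made only for this illustration.

```python
ADDRESS_BITS = 32                         # hypothetical address width
SEMAPHORE_BIT = 1 << (ADDRESS_BITS - 1)   # upper half = semaphore operations


def semaphore_alias(physical_address):
    """Address a processor would load from to request an L & S operation."""
    return physical_address | SEMAPHORE_BIT


def decode_bus_address(bus_address):
    """Split a bus address into (memory location, is_load_and_synch)."""
    is_ls = bool(bus_address & SEMAPHORE_BIT)
    location = bus_address & ~SEMAPHORE_BIT
    return location, is_ls
```

Under this scheme an ordinary load instruction suffices: a load from the aliased address would be treated by the memory board control as a semaphore operation on the underlying location.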

Thus, a load and synch computer architecture is described.

Claims

1. In a parallel processing computer architecture, an apparatus for implementation of semaphores comprising: plurality of communication buses; plurality of computing means, coupled to said communication buses, for processing information and executing instructions including a semaphore operation; plurality of memory means coupled to said communication buses, for storing data and executing indivisible semaphore operations on data stored within.

2. The apparatus described in Claim 1 where said computing means comprises: computer processor having virtual address mapping; address line coupled between said computer processor and a multiplexor; semaphore means coupled between said computer processor and a multiplexor, for storing a semaphore's virtual address and returning the semaphore's value; multiplexor for selecting which signal, from either of said address line or said semaphore means, to send on one of said communication buses.

3. The apparatus described in Claim 2 where said semaphore means comprises: semaphore circuit having a semaphore address register and a semaphore value register; virtual memory table having a semaphore table mapping virtual addresses of semaphores to physical addresses.

4. The apparatus described in Claim 3 where said address line has a portion to indicate to said multiplexor the presence of a semaphore operation.

5. The apparatus described in Claim 4 where said memory means comprise: plurality of memory cells for storing information; plurality of memory control logic circuits for executing a semaphore operation on any of the memory controller processor's memory cells.

6. The apparatus described in Claim 5 where said memory control processor has a plurality of logic circuits to implement a semaphore operation, said semaphore operation comprising the steps: storing the value of a semaphore, whose address is stored in said semaphore address register, into said semaphore value register; comparing said value with zero; if said value is greater than or equal to zero then setting said semaphore to zero, else incrementing said semaphore's value by one.

7. The apparatus described in Claim 6 where said semaphore can be located at any location in memory.

8. The apparatus described in Claim 7 where said semaphore is accessible by any process or processor in said parallel processing computer architecture.

9. The apparatus described in Claim 8 where said semaphore operation returns a value that has greater significance than that of program control information.

10. In a parallel processing computer architecture, an apparatus for implementation of semaphores comprising: plurality of communication buses; plurality of computing means, coupled to said communication buses, for processing information and executing instructions including a semaphore operation; plurality of memory means coupled to said communication buses, for storing data and executing indivisible semaphore operations on data stored within, where said memory means has logic circuits to implement a semaphore operation, said semaphore operation comprising the steps: storing the value of a semaphore, whose address is stored in said semaphore address register, into said semaphore value register; comparing said value with zero; if said value is greater than or equal to zero then setting said semaphore to zero, else incrementing said semaphore's value by one.

11. The apparatus described in Claim 10 where said computing means comprises: computer processor having virtual address mapping; address line coupled between said computer processor and a multiplexor; semaphore means coupled between said computer processor and a multiplexor, for storing a semaphore's virtual address and returning the semaphore's value; multiplexor for selecting which signal, from either of said address line or said semaphore means, to send on one of said communication buses.

12. The apparatus described in Claim 11 where said semaphore means comprises: semaphore circuit having a semaphore address register and a semaphore value register; virtual memory table having a semaphore table mapping virtual addresses of semaphores to physical addresses.

13. The apparatus described in Claim 12 where said address line has a portion to indicate to said multiplexor the presence of a semaphore operation.

14. The apparatus described in Claim 13 where said memory means comprise: plurality of memory cells for storing information; plurality of memory control logic circuits for executing a semaphore operation on any of the memory controller processor's memory cells.

15. The apparatus described in Claim 14 where said semaphore can be located at any location in memory.

16. The apparatus described in Claim 15 where said semaphore is accessible by any process or processor in said parallel processing computer architecture.

17. The apparatus described in Claim 16 where said semaphore operation returns a value that has greater significance than that of program control information.

18. In a parallel processor computer system having a plurality of processors and a plurality of memory devices, each said memory device having a plurality of memory control logic circuits, a process for implementing a semaphore operation, comprising the steps: processor specifying a semaphore operation and providing a semaphore address; memory control logic circuit, said memory control logic circuit being an element of the memory device specified by said semaphore address, said memory control logic circuit performing the steps of: sending said semaphore's value to said processor; comparing said semaphore's value to zero; setting said semaphore to zero if said semaphore's value was greater than or equal to zero, else incrementing said semaphore's value by a positive one.

19. The process described in Claim 18 where the step of said processor specifying a semaphore operation and providing a semaphore address comprises: storing said semaphore address in a semaphore address register to provide said semaphore address; accessing a semaphore value register to specify a semaphore operation on a semaphore located at the address provided in said semaphore address register.

20. The process described in Claim 19 where said semaphore address stored in said semaphore address register is a virtual address, the step of accessing said semaphore value register further comprising translating said semaphore address register's virtual address to a physical address.

21. The process described in Claim 20 where said semaphore address can be any location in memory.

22. The process described in Claim 21 where said semaphore is accessible by any process or processor in said parallel processing computer architecture.

23. The process described in Claim 22 where said semaphore operation returns a value that has greater significance than that of program control information.

24. In a parallel processor computer system having a plurality of processors and a plurality of memory devices, each said memory device having a plurality of memory control logic circuits, a method for processing an array of information, having a plurality of processes executing in parallel, each of said parallel processes having a loop comprising the steps: executing a load and synchronize instruction, retrieving a semaphore value; comparing said semaphore value to zero; if said semaphore's value is less than zero then processing one element of said array determined by said semaphore's value; terminating execution of said loop when said semaphore's value is greater than or equal to zero, else resuming execution of said loop.

25. The method as described in Claim 24 where the step of processing an element of said array uses said semaphore's value added to a predetermined constant to determine which of said array elements to process.

26. The method as described in Claim 25 where the step of processing an element of said array uses the negation of said semaphore's value to determine which of said array elements to process.

27. The method described in Claim 26 where said semaphore address can be any location in memory.

28. The method described in Claim 27 where said semaphore is accessible by any process or processor in said parallel processing computer architecture.

29. In a parallel processor computer architecture, a system for implementing an indivisible memory operation, comprising: address and data buses coupled between processors and memory devices for sending addresses and data; plurality of memory devices, each said memory device having a plurality of memory control logic circuits for executing an indivisible memory operation; plurality of processors with virtual address translation, each said processor having a bus interface unit comprising: semaphore circuit having a semaphore address register, a semaphore value register, a decoder, and a memory controller for storing information about indivisible memory operations, said decoder distinguishing between access to said semaphore address register, indivisible memory operations, access to an external translation look aside buffer (ETLB), or other access, said memory controller for controlling accesses to said ETLB; ETLB having virtual address translation circuitry and having a semaphore address table, said semaphore address table for mapping a virtual address to a physical address during an indivisible memory operation; multiplexor for sending an address over said bus from said ETLB during an indivisible memory operation and sending an address over said address bus from said processor for other accesses.

30. An apparatus for implementation of semaphores in a parallel processing computer architecture substantially as hereinbefore described with reference to the accompanying drawings.

31. A process for implementing a semaphore operation in a parallel processor computer system having a plurality of processors and a plurality of memory devices, each said memory device having a plurality of memory control logic circuits substantially as hereinbefore described.