CA1234636A - Method and apparatus for handling interprocessor calls in a multiprocessor system - Google Patents

Method and apparatus for handling interprocessor calls in a multiprocessor system

Info

Publication number
CA1234636A
CA1234636A CA000485141A CA485141A
Authority
CA
Canada
Prior art keywords
address
processors
processor
data
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000485141A
Other languages
French (fr)
Inventor
Dennis L. Debruler
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NCR Voyix Corp
Original Assignee
American Telephone and Telegraph Co Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone and Telegraph Co Inc filed Critical American Telephone and Telegraph Co Inc
Priority to CA000485141A priority Critical patent/CA1234636A/en
Application granted granted Critical
Publication of CA1234636A publication Critical patent/CA1234636A/en
Expired legal-status Critical Current

Links

Landscapes

  • Multi Processors (AREA)

Abstract

A multiprocessor arrangement in which the individual program functions of a program process are executed on different processors. Data shared by different program functions is stored in shared memory and the programs are stored in local memory of the individual processors. One processor calls for the execution of a program function by another processor by causing the program address and a pointer to the program function context to be loaded into a work queue of the called processor. Input/output modules are treated as processors. Facilities are provided for the transfer of blocks of data over the interconnection bus system. Virtual addresses are translated to physical addresses in one facility common to all processors.

Description

D. L. DeBruler 1

METHOD AND APPARATUS FOR HANDLING INTERPROCESSOR CALLS IN A MULTIPROCESSOR SYSTEM

Technical Field

This invention relates to multiprocessor systems and, more specifically, to means for transferring data and program control in such systems.
Background of the Invention

A multiprocessor system is a data processing system in which a number of processors cooperate to execute the total overall task of the system. It is used when one processor cannot handle the full data processing load demanded of the system. While a great deal of progress has been made in solving the problems of multiprocessor systems having a small number of processors, no satisfactory arrangement exists for the achievement of very high throughput in a system having a large number of modest-performance processors.
Multiprocessor systems, in common with other data processing systems, use random access storage, such as semiconductor random access memories, and bulk storage, such as magnetic disks or tapes. When a particular task is being performed by the system, the program and data associated with this task are stored in random access memory so that the data can be processed. At other times, the data is stored in bulk storage, ready to be transferred or paged into random access storage when the need arises.
No really satisfactory technique exists in prior art systems for efficiently and economically sharing random access memory, especially that containing programs, among many processors. Some prior art systems share all random access memory, including programs and data, among all processors. When program memory is to be fully shared among all processors, a bottleneck exists in accessing and communicating program instructions from common memory to each of the processors upon demand. Either an extremely high throughput bus or a complex bus interconnection scheme is used to transmit the instructions from the memory to several processors. Such prior art buses are expensive, and the systems are limited to a small number of processors since they require the sending of vast quantities of program instructions over the buses with an inevitable loss of performance capability.
In other prior art systems, local random access memory is provided for each processor. Multiprocessor systems generally operate in the multiprocessing mode, wherein the system executes a number of broad tasks, called program processes, simultaneously. Associated with each program process are a number of variables and parameters, stored in an area of memory called the program function context. Each of these program processes accomplishes its objectives by executing a number of subtasks or program functions which utilize the data of the associated program function context. In prior art multiprocessing systems, a program process is usually confined to a single processor. Placing an entire program process on one processor requires an expensive, large local memory for that processor and degrades performance by requiring a great deal of manipulation of memory contents. The alternative of breaking large processes down into small processes is also inefficient and leads to an unwieldy software structure.
Prior art multiprocessor systems use restricted and specialized communication means between processors and input/output controllers, and among input/output controllers, in order to avoid overloading the common system bus. Input/output controllers are associated with various combinations of devices such as magnetic disk or tape memories, input/output terminals and displays, high speed printers, and punched card readers and punches.

Usually, these controllers are interconnected by arrangements with limited access; all processors cannot directly access all input/output units without considerable expense. This means that system performance is degraded if large amounts of data must be exchanged between two input/output controllers which were initially designed to exchange very little data.
Many modern processors and multiprocessor systems use a highly flexible method of addressing memory called virtual addressing. A virtual address is an address of main (random access) memory in a simulated processor system; the virtual address is translated into a physical address in the actual processor system before it is used to access random access memory. The translation mechanism is flexible so that at different times a given virtual address may correspond to different physical addresses of random access memory; a virtual address may also correspond to an address of bulk storage. Virtual addresses tend to be fixed in a program; physical addresses are assigned to a given segment of virtual addresses when needed. A page fault occurs when a virtual address does not correspond to a physical address of random access memory, i.e., when the translation mechanism fails to find such a correspondence. Page faults always require the adjustment of the translation mechanism and, sometimes, the paging of data from bulk storage into newly assigned random access memory space.
The design of economical address translation mechanisms for translating virtual addresses to physical addresses presents a problem in a multiprocessor system. In prior art multiprocessor systems, these mechanisms are implemented using very fast circuits because the delay of address translation is added to each access of random access memory. The size of address translation mechanisms is usually restricted by cost because of the high speed requirement. A result is that many page faults, i.e., system indications that a desired memory location cannot be accessed, occur because, although the required segment is available in storage, the translation to reach that location is not currently in the address translation mechanism. In prior art multiprocessor systems with many individual processors and address translation mechanisms, address translation is expensive and tends to limit system performance. Furthermore, prior art bus schemes interconnecting the processors and shared memories of a multiprocessing system are frequently a limitation on the total throughput.
Summary of the Invention

In accordance with an aspect of the invention there is provided, in a multiprocessing system having a plurality of processors, each of the processors having an associated work queue, the method of executing a program process, having an associated program function context stored in memory accessible by each of said processors, by different processors, comprising the steps of: initiating a process comprising a first and a second function in any of said plurality of processors; storing an indication of the identity of a first processor and of the address of a first function at an address specified as part of a second function; executing said second function by means of a predetermined one of said processors, including entering data into said program function context stored in memory accessible by each of said processors; storing a link to said program function context and said indication of the address of said first function in the work queue associated with said first processor; and executing said first function by means of said first processor using said program function context containing said data entered into said program function context during execution of said second function, whereby a first program function executed by means of a first processor is linked to a second program function executed by any processor and uses a program function context containing data entered during execution of said second function.
In accordance with another aspect of the invention there is provided a multiprocessor system for executing a plurality of program processes each process having an associated program function context, comprising: a plurality of processors; storage means for storing an indication of the identity of a designated one of said plurality of processors and of a first program function and for storing said program function contexts; bus means for accessing said storage means; each of said plurality of processors comprising means for accessing said program function contexts in said storage means via said bus means and each operative under program control to initiate a process; work queue means associated with each of said processors; said processors operative under program control to store a link to a predetermined program function context and to said first program function in the work queue means associated with said designated processor; wherein said designated processor is operative to execute said first program function using said predetermined program function context.
In accordance with this invention, each processor of a multiprocessor system is adapted to cause a program function to be executed by any processor of the system by linking a request for such execution to a work queue of the called processor. The calling processor links the program function context, specifying the called program function, to the work queue of the called processor. The called processor then executes that program function. In one embodiment of this invention, the link to the program function context is augmented by a link to a program address of the called program function; the latter link need not be provided by the program function context in this case. Alternatively, the link can be part of the data provided by the program function context.
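The interprocessor call described above can be sketched in software terms. This is a minimal, hypothetical model, not the patent's hardware: the class and field names are illustrative, the work queue is modeled as an in-memory deque, and "addresses" are simply dictionary keys into a dispatch table.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ProgramFunctionContext:
    """Variables and parameters shared by the functions of one process."""
    data: dict = field(default_factory=dict)

@dataclass
class Processor:
    ident: int
    work_queue: deque = field(default_factory=deque)  # per-processor queue

    def call(self, callee: "Processor", function_addr: int,
             context: ProgramFunctionContext) -> None:
        # Link the called function's address and a pointer to the shared
        # context onto the callee's work queue (held in shared memory in
        # the patent's arrangement).
        callee.work_queue.append((function_addr, context))

    def run_one(self, dispatch: dict) -> None:
        # Dequeue one work item and execute the named program function
        # against the shared program function context.
        function_addr, context = self.work_queue.popleft()
        dispatch[function_addr](context)

# Usage: processor 0 asks processor 1 to run the function at "address" 0x40,
# which reads and updates data placed in the shared context by the caller.
p0, p1 = Processor(0), Processor(1)
ctx = ProgramFunctionContext()
ctx.data["x"] = 41
dispatch = {0x40: lambda c: c.data.update(x=c.data["x"] + 1)}
p0.call(p1, 0x40, ctx)
p1.run_one(dispatch)
print(ctx.data["x"])  # 42
```

The essential point the sketch captures is that only a link and an address cross between processors; the bulk of the state stays in the one shared context.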


In one embodiment of this invention, each processor has local memory directly accessible only to the central processing unit of that processor, used for storing programs to be executed by the associated processor. Preferably, shared memory means also exist, accessible to all processors, and used to link the called program function context to the work queue of another processor. Advantageously, by permitting individual program functions to be assigned to any processor, thus allowing a process to be spread over several processors, the size of local memory for each processor can be reduced. Repeated copies of a function used by many processes can be eliminated.
In one embodiment of this invention, external bus means interconnect processors and shared memory means. Further, input/output controllers have the same access to the bus means as processors, can call or be called for the execution of a program function using the same techniques, and can similarly access shared data. Advantageously, this provides full access among all input/output controllers, processors, and shared memory.
In one embodiment of this invention, the shared memories, processors, and bus means are adapted to transfer blocks of data rapidly and efficiently by treating the words of the block together instead of as separate and independent entities. Consecutive addresses need not be transmitted repetitively for such a block transfer. Advantageously, this reduces the number of address translations required and permits the system to operate efficiently even with a slower address translation mechanism. Advantageously, the bus means are used for transferring blocks of data among input/output controllers, shared memory means, and processors, thus providing highly flexible interconnection apparatus among these units.
In one embodiment of this invention, virtual addressing is used to access memory. The virtual address translation mechanism used in one specific embodiment is a single common facility accessible by all processors, and comprises one or several independent translation modules. These modules operate on different portions of the virtual address spectrum. Advantageously, the use of a common facility makes it economically feasible to make this facility large enough to eliminate those page faults which in prior systems result from required translations not being available in the translation mechanism. In one embodiment of this invention, each program process occupies a different portion of the virtual address spectrum. Advantageously, this facilitates a relatively uniform distribution of translation load among the translation modules. In one embodiment, data indicating the identity of a processor designated to execute a called program function and an indication of the address of that called program function is stored in memory. Advantageously, such memory can be shared memory. If virtual addressing is used in a system, such memory can be addressed using virtual addressing means. Alternatively, such data may be stored in the virtual address translation mechanism.
In an alternative embodiment of a virtual address translator, such a translator is implemented using a group of at least two serially connected blocks to perform the translation. Each block comprises memory, register and adder means. The input to the first block represents an initial virtual address; the output of each block is the input to the next block, and the output of the last block includes the physical address corresponding to the initial virtual address. The blocks can be operated in a pipeline mode to allow action on several translations to proceed simultaneously. In a specific alternative embodiment, three blocks are used in tandem. The alternative embodiment of an address translator is used in an address translation module, of which there may be one or a plurality of independent modules.
In one embodiment of this invention, processors are arranged to generate physical addresses after having generated a translation from a virtual to a physical address. The address translation means are adapted to transmit such physical addresses without translation. Physical addresses may, for example, occupy a dedicated portion of the total address spectrum. This arrangement reduces the number of translations required and permits the system to operate efficiently even with slower address translation mechanisms. Advantageously, the use of slower address translation mechanisms makes it economically possible to greatly expand the address capacity of those mechanisms, thus reducing page faults.
In one embodiment of this invention, the bus means are split into three segments operating simultaneously and independently. One segment transmits signals from processor output means to virtual address translation means. A second segment transmits signals from the output of virtual address translation means to shared memory addressing and data input means. A third segment transmits signals from shared memory output means to processor input means. Intersegment connections are provided for cases in which the output of one bus segment can be transmitted directly to the next bus segment. In addition, each segment of the bus system can carry many simultaneous transactions. Each memory result and each data word to be written is tagged with the identification of the requesting or sending unit, respectively, so that several memory responses and several data words to be written can be interleaved. The bus means can also be used for transmitting data among processors and input/output units. Advantageously, a bus system in accordance with this invention provides high throughput without the prohibitive expense of prior art high performance buses.
Brief Description of the Drawing

The invention will be better understood from the following detailed description when read with reference to the drawing in which:

FIG. 1 is a block diagram of a multiprocessor system representing an illustrative embodiment of the invention;

FIG. 2 is a memory layout of memory entries used in executing programs in the system of FIG. 1; and

FIGS. 3A, 3B, 3C and 4 are a memory layout of work queues for the processors of the system of FIG. 1.


Detailed Description

FIG. 1 is a block diagram of an illustrative multiprocessing system consisting of processors 10.1,...,m, input/output (I/O) modules 12.1,...,n, address translators 15.1,...,p and shared memory modules 18.1,...,q, interconnected by bus system 51, 52, 53. The processors are identical in structure and each includes a central processing unit (CPU) 22, a local memory 24, and bus interface circuits 21 and 23. The CPU 22, which may be a commercially available unit, is connected to the local memory 24, which stores programs and data dedicated to the CPU. The interface circuits 21 and 23 provide an interface between the CPU and buses 51 and 53. The I/O modules 12.1,...,n are identical in structure and each comprises a central processing unit (CPU) 32, interface circuits 31 and 33 by which the CPU is connected to buses 51 and 53, and input/output equipment 34, local memory 35, and bulk memory 36, all connected to the CPU. The CPU 32 may be the same kind of machine as the CPU 22. Local memory 35 is adapted to store data and the programs for the CPU 32. The input/output equipment 34 includes input and output terminals. Bulk memory 36 consists of bulk storage devices such as magnetic disks and tapes.

Memory modules 18.1,...,q are identical in structure, each including a standard random access memory unit such as memory 62 and bus interface circuits 61 and 63. Any of the memory modules may be addressed by any of the processors or I/O modules. The memory addresses transmitted on bus 51 may be either virtual addresses or addresses of physical memory locations (physical addresses). A virtual address is the identification of an item (program or data) which may be stored at any physical address in one of the memory modules or in bulk memory in one of the I/O modules. Virtual addresses occurring on bus 51 are translated by means of the address translators 15.1,...,p into physical addresses defining


physical locations in the memory modules. Each of the address translators includes a translator unit 42 and interface circuits 41 and 43. The translator units are well-known virtual address translators which include information defining the present physical location of the item identified by the virtual address.

If the data contained in the translator 42 indicates that the physical address corresponding to the translated virtual address is in one of the memory modules 18.1,...,q, the translator module will transmit the physical address to the appropriate memory module via bus 52. In the event that the corresponding physical address does not fall within the range of addresses of the memory modules, the item identified by the virtual address is obtained from the bulk memory location corresponding to the virtual address and placed in a selected one of the memory modules. Furthermore, the information in the translators is changed to reflect the selected memory module location for the virtual address. The memory access operation may be completed using the new physical address. Address tables stored in memory define the present location of all data, whether in bulk storage or in a memory module.
In this embodiment of the invention, the bus system is broken into three parts. Bus 51 is used to transmit address and data outputs of the processors and I/O modules to the translators 15.1,...,p. Bus 52 is used to transmit addresses and data to the memory modules 18.1,...,q. Bus 53 is used to transmit data from the memory modules to processors 10.1,...,m and I/O modules 12.1,...,n. Also connected between buses 51 and 52 is a pass-through unit 17, consisting of interface circuits 44 and 46, to allow signals, such as physical address signals generated by a processor or I/O module, to be passed directly from bus 51 to 52. A similar arrangement allows the direct passage of signals from bus 52 to bus 53 via a second bypass unit and from bus 53 to bus 51 via bypass unit 14.
Each of the units connected between buses 51 and 52 has an input interface circuit C and an output interface circuit D. Similarly, each of the units connected between buses 52 and 53 has an input interface circuit E and an output interface circuit F, and units connected between buses 53 and 51 have an input interface circuit A and an output interface circuit B. In each instance, interface circuits identified with the same letter are identical, and each of the circuits is designed to assure compatibility between the bus circuitry and the circuitry to which it is connected.
The address spectrum of the system is broken down into three parts. The total range is from 0 to (2^32 - 1). The first part, 0 to (2^24 - 1), is reserved for local addresses within each processor. The same addresses can be used by all the processors and input/output modules for addressing their own local memories. The second range, 2^24 to (2^26 - 1), is dedicated to physical addresses of shared memory. Each such physical address defines a unique location in the physical memory accessed by bus 52. The third range, 2^26 to (2^31 - 1), is used for virtual addresses of shared memory.

Each processor is adapted to recognize local addresses and retain these internally. Thus, local (first range) addresses are not transmitted on bus 51. The bypass module 17 is adapted to recognize addresses in the second range and to transmit such addresses directly to bus 52 without translation. Only third range addresses are translated by modules 15.1,...,p. Module 17 is also adapted to recognize and transmit data (as opposed to addresses) directly to bus 52.
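The three-way address decode above can be expressed compactly. The range boundaries are taken from the text; the function name and the returned labels are illustrative, not part of the patent.

```python
# Three-range decode of a 32-bit address, per the address spectrum above:
# local below 2**24, physical shared-memory 2**24..2**26-1, and
# virtual shared-memory 2**26..2**31-1.

LOCAL_TOP = 2**24
PHYSICAL_TOP = 2**26
VIRTUAL_TOP = 2**31

def route(addr: int) -> str:
    if addr < LOCAL_TOP:
        return "local"      # retained inside the processor, never on bus 51
    if addr < PHYSICAL_TOP:
        return "bypass"     # module 17 passes it to bus 52 untranslated
    if addr < VIRTUAL_TOP:
        return "translate"  # handled by translation modules 15.1,...,p
    raise ValueError("address outside the defined spectrum")

print(route(0x00FFFF))   # local
print(route(0x1000000))  # bypass
print(route(0x4000000))  # translate
```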
In the illustrative multiprocessing system, different portions of the total virtual address spectrum are devoted to different processes. This helps to prevent unwanted inter-process interference, and makes it possible to protect memory, common to several processes but associated at one instant with only one process, from being illegally accessed by an unauthorized process. The translation mechanism generates memory protection codes for all memory accesses, and can generate a different memory protection code for the same location if it is associated with a different process.
The translation mechanism is broken up into different modules, each of which treats a different portion of the total address spectrum. This makes it unnecessary to store any address translation in more than one module. Each process has its own virtual address set, and each address translation module then operates upon a different set of processes. This arrangement is used to equalize the load on the different address translation modules.
The use of a single overall address translation mechanism which stores each translation only once makes it economically feasible, using conventional address translation methods, to store enough translations so that page faults due to missing translation data in the translation mechanism can be substantially eliminated. An alternative is to use a translation mechanism implemented through the use of random access memory and successive look-up operations based on the process number, segment and page. Since this embodiment of the invention has sharply reduced the number of translations required, such an approach is feasible and offers an economical, very large translator module.
If a processor or input/output module must perform a number of memory operations in a particular block, it can access the basic address translation tables of the system, stored here in shared memory, to generate the physical address of that block. Thereafter, the processor or input/output module can execute programs using the physical address of that block. These physical addresses (second range) are then transmitted from bus 51 to 52 by module 17, without requiring address translation.


This reduces the address translation load on the system. Alternatively, it is also possible to add a special read command to the system which would return to a requesting processor a physical address instead of the data stored at that address; such a facility would speed up the process of generating a physical address.

In a multiprocessor system, it is frequently necessary to transfer substantial amounts of data between bulk memories, or from bulk memory to random access memory and vice versa. In this embodiment of the invention, this is accomplished by using buses 51, 52, and 53 to implement block transfers of data among input/output modules, the local memories of the processors, and the shared memories.
The write transfer of a block of data is accomplished by sending to the appropriate memory an initial address, a block write command, and the length of a block of data, and thereafter transmitting only data. The length and initial address are stored in interface E (61), which subsequently controls the writing of the individual words of data until the block write has been accomplished. A block read is accomplished the same way except that a block read command is sent and the individual words of data are sent from the shared memory to the input/output or processor modules. Interface A (21, 31) is used to store the initial address and length of block for a block read, in order to control the action of writing into the local memory (24, 35).
In order to implement a block transfer from a processor or input/output module to another such unit, a special command to alert the destination unit of a block transfer is provided. This command includes the identification of the destination unit and is recognized by interface A (21, 31) of that unit. Interface A then interrupts the associated CPU (22, 32). The destination unit is now ready to receive the initial address and length of block for a block read or write, and to accept the data words for a block write or to send out the data words associated with a block read.
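The block-write protocol described above sends addressing information once and then only data. The following sketch models the memory-side interface (E in the text) latching the initial address and length; class and method names are illustrative assumptions, not terms from the patent.

```python
# Model of a block write: one (address, length) header, then bare data
# words. The interface supplies the consecutive addresses itself, so no
# address traffic accompanies the data.

class SharedMemoryModule:
    def __init__(self, size: int):
        self.cells = [0] * size
        self._addr = None   # latched initial address of the block
        self._left = 0      # words remaining in the current block

    def block_write(self, start: int, length: int) -> None:
        # Header phase: initial address and block length are sent once.
        self._addr, self._left = start, length

    def data_word(self, word: int) -> None:
        # Data phase: each word is stored at the next consecutive address.
        assert self._left > 0, "no block transfer in progress"
        self.cells[self._addr] = word
        self._addr += 1
        self._left -= 1

mem = SharedMemoryModule(64)
mem.block_write(start=8, length=3)
for w in (10, 20, 30):        # only data crosses the bus from here on
    mem.data_word(w)
print(mem.cells[8:11])        # [10, 20, 30]
```

A block read would run the same handshake in the opposite direction, with the requester's interface (A) latching the address and length to steer incoming words into local memory.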
Interface circuits A through F associated with the various modules of this system are adapted to transmit signals to and receive signals from the bus system. In some cases, they are further adapted to store extra signals and to control block operations. These interface circuits are made up of registers, counters, range recognition circuits, module identification recognition circuits, first-in, first-out register stacks, bus access arbiter circuits to resolve bus access conflicts, bus receivers, and bus transmitters, all well known in the art.
Interfaces A (21, 27, 31) and B (23, 29, 33) are adapted to implement the block transfer operations, and to recognize those signals on bus 53 which are destined for a particular processor. They include counters and addressing registers to allow the successive reads or writes associated with a block transfer to be read from or stored into the correct portion of local memory. In addition, interface B is adapted to resolve bus access conflicts between different processors.
Interface C (41), in address translation modules 15.1,...,p, is adapted to recognize addresses within the assigned range of the associated translator. Interface C has storage to accept a number of almost simultaneous requests for translations by the same translator module. Interface C (44) in module 17 is similarly adapted to recognize the range of addresses not requiring translation and to recognize data to be passed directly to bus 52.
Interface D (43, 46) is adapted to resolve bus access conflicts to bus 52 between different address translator modules and the bypass module 17.
Interface E (61) is adapted to recognize data signals and addresses on bus 52 destined for a particular shared memory module, and is adapted to control block transfer operations. It may be desirable for some applications to further adapt interface E to accept isolated read or write requests in the middle of a block transfer operation. In addition, interface E is adapted to accept additional requests while memory 62 is reading a given location. Interface F (63) is adapted to resolve bus access conflicts to bus 53.
Since a number of shared memory read and write transactions, including some block transactions, are taking place simultaneously, interfaces B are adapted to tag address, block length, and data signals with the identification of the requesting processor module 10.1,...,m or input/output module 12.1,...,n, and interfaces F are adapted to tag data returned from a shared memory module 18.1,...,q with the identification of the requesting module. Interfaces A are adapted to check for this tag on any response from a read or block read command. Interfaces E are adapted to recognize the tag of a previously sent address that is attached to a subsequent block length or write data signal, and to be responsive to such tagged signals. In the case of a block transfer from a processor or input/output unit to another such unit, interface A of the destination unit looks for the tag of the source unit. In addition, interfaces F are adapted to generate an acknowledgement including the identification tag of the requesting processor or I/O module on a write or block write command, and interfaces A are adapted to recognize such acknowledgements. Interfaces C and D are adapted to transmit the identity of the requesting processor or input/output module along with address, block length, or data sent to shared memories.
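The tagging scheme above is what lets replies to different requesters interleave freely on bus 53. A minimal sketch, with assumed names (the patent specifies only that each reply carries the requester's identification):

```python
# Each reply on the result bus is a (tag, payload) pair, where the tag is
# the identity of the requesting module. A requester's input interface
# (A in the text) keeps only replies bearing its own tag, so slow and
# fast transactions may complete out of order and interleaved.

def deliver(bus_traffic, my_tag):
    """Filter interleaved tagged replies down to one requester's words."""
    return [payload for tag, payload in bus_traffic if tag == my_tag]

# Replies for processors 10.1 and 10.2 arrive interleaved on bus 53.
traffic = [("10.1", 0xAA), ("10.2", 0x55), ("10.1", 0xBB)]
print(deliver(traffic, "10.1"))  # [0xAA, 0xBB]
print(deliver(traffic, "10.2"))  # [0x55]
```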
FI~. 2 shows an alteTnate implementation for the translators 42 of the address translation modules.
Address translator 42 comprises three random access memories (RAM's) 310, 311, and 312, each with an associated adder (320, 321, 322, respectively) for combining input signals to produce the appropriate output, and an associated register (330, 331, 332, respectively). A virtual address is composed of a process number, segment number, page number, and page offset. The process number is used to address RAM 310, while the segment number, page number, and page offset are stored in register 330. The output of RAM 310 (segment address table location) is added to the segment number selected from register 330 and used as an address of RAM 311. The contents of RAM 311 (page address table location) are then added to the page number in register 331 to locate the data specifying the specific page address in RAM 312. The output of RAM 312 (page address) is added to the page offset to generate the physical address corresponding to the original virtual address. In effect, RAM 310 stores the locations of virtual address segment tables of each process in RAM 311; the increment of the segment number then locates the data in RAM 311 specifying the location of the page address table in RAM 312; the increment of the page number then locates the page address stored in RAM 312. Note that address translator 42 can work in a pipeline mode, in which each of the three RAM's is simultaneously reading data for the translation of a different virtual address.
Note further that address translator 42 is composed of three essentially identical blocks 300, 301, 302, differing only in the portion of the register 330, 331, 332 connected to adder 320, 321, 322. In some systems, it is possible to use concatenation arrangements or other specialized adding arrangements for some or all of the adders 320, 321, 322. In more or less complex virtual addressing arrangements, it is possible to use only two blocks, or more than three blocks, for implementing the translator.
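The three-stage lookup just described can be sketched as follows. This is a hedged software model of FIG. 2, with Python dictionaries standing in for RAMs 310, 311, and 312 and ordinary addition standing in for adders 320, 321, and 322; the table contents below are invented solely for illustration.

```python
# Software model of the three-stage translator of FIG. 2.

def translate(vaddr, ram310, ram311, ram312):
    """vaddr = (process_number, segment_number, page_number, page_offset)."""
    process, segment, page, offset = vaddr
    seg_table_base = ram310[process]                    # segment address table location
    page_table_base = ram311[seg_table_base + segment]  # page address table location
    page_address = ram312[page_table_base + page]       # physical page address
    return page_address + offset                        # final physical address

# Invented example contents: process 0's segment table starts at 0 in RAM 311;
# segment 1's page table starts at 4 in RAM 312; page 2 maps to frame 0x1000.
ram310 = {0: 0}
ram311 = {0 + 1: 4}
ram312 = {4 + 2: 0x1000}
```

In hardware the three lookups can be pipelined, with each RAM serving a different in-flight virtual address; the sequential function above models only the data flow of a single translation.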
FIGS. 3 and 4 show memory layouts associated with the flow of program control in the multiprocessor system.
The individual major tasks which a data processing system must carry out in order to accomplish its total objective are usually called program processes. These processes are carried out by executing a number of subtasks or program functions. In a multiprocessor system, several processes are normally being executed simultaneously on different processors, and using different portions of shared memory.
In the multiprocessor system of this invention, the various program functions associated with a given program process need not be executed on one processor; instead, one processor can call for the execution by another processor of a program function within the same process.
Each process that is active in the system has a process control block (FIG. 3A) of memory dedicated to keeping track of status information and parameters needed by the process. The time that the process was originally invoked is stored in location 211, and the time that the most recent function being executed on this process was invoked is stored in location 212 in the typical process control block 210. FIG. 3B shows the layout of a program function context 220, which is maintained during a program process. The context is a last in, first out memory stack. The most recently generated parameters in locations 222 and variables in locations 223 can then be found by going a small number of levels into the stack.
The program status indication of the program function execution is kept with the program function context in location 221.
FIG. 3C shows two process tables 230 and 240, one for each of two processes. Each table contains a list of processor identifications and addresses for all the functions used by a process. For example, as indicated in locations 234 and 231 respectively, the first function, G(1), used by the first process shown is designated to be executed on processor P(G(1)); the address inside that processor in which this first function is stored is A(G(1)). In similar fashion, as indicated in locations 235,...,236, the processors P(G(2)),...,P(G(N)) are designated to execute functions G(2),...,G(N); as indicated in locations 232,...,233, these functions are stored at internal addresses A(G(2)),...,A(G(N)) within these processors. Similarly, the processor identifications and addresses for the M functions H(1),H(2),...,H(M) of a second process are shown in table 240 of FIG. 3C. Alternatively, it is possible to store indicators, such as an index or the address of a pointer, from which a processor executing a program can generate a program address and/or a processor identification.
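A minimal software stand-in for a process table such as 230 might look like the following. The identifiers are invented for illustration; the real table holds processor identifications and internal addresses in memory locations, not Python strings.

```python
# Invented stand-in for process table 230: one entry per function of the
# process, recording the designated processor and the internal address at
# which the function is stored in that processor's local memory.

process_table = {
    "G1": {"processor": "P1", "address": 0x0100},  # cf. locations 234 and 231
    "G2": {"processor": "P2", "address": 0x0200},  # cf. locations 235 and 232
}

def lookup(table, function_name):
    """Return the (processor identification, internal address) pair
    designated for the named function."""
    entry = table[function_name]
    return entry["processor"], entry["address"]
```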
Table 230 is initialized when the system first recognizes the need to execute the first program process and loads the various program functions required to execute that process in various processors. The table entries are filled in as each program function is loaded into its processor. When virtual memory addressing is used, as in the system being described, this table is stored at a virtual address, either defined within the code of the various program functions which may call any of the program functions of this program process, or found via pointers located at such a virtual address. In a similar manner, table 240 is initialized when the system recognizes the need to execute the second program process.
Alternatively, the contents of tables 230, 240, and other similar tables can be used as the primary translation table mapping from virtual to physical addresses. This primary translation table could be stored in memory and inserted into the translation mechanism as needed, or, in case the translation mechanism were implemented through the use of random access memory and successive look-up operations, could be stored directly in the address translation mechanism.
FIG. 4 shows the memory layouts of work queues which are used to allow one program function being executed on one processor to call for a second program function to be executed either on the same processor or on any other processor in the system. Block 100 shows memory associated with a first processor, block 101 with a second processor,..., block r with a last processor. In this discussion, input/output controllers are included among processors. In a system with i processors and j input/output modules, a total of i + j such blocks would be maintained.
Each processor has associated with it a work queue such as 110 containing a list of program functions which that processor has been called upon to execute and which it has not yet executed. In the described system, there is both a regular work queue (110, 160) and a high priority work queue (120, 170), in order to allow special high priority functions to be executed before regular functions. It is possible, using techniques well known in the art, to implement more complex priority schemes in which there can be more than two priorities of tasks.
Moreover, methods of interrupting the current execution of a function in order to execute a special high priority task are also well known in the art.
In this system, in order to simplify the problems which are associated with assuring that access to the work queue of one processor is not simultaneously sought by several other processors, each processor has an associated output queue (130, 140). The calling processor loads a request for the execution of a called function into its own output queue. A separate process, controlled, for example, by a third processor, is used to examine called program function requests in the output queues of each processor and load corresponding entries into the work queue of the processor designated to execute the called program function.
Each entry in a queue includes a program address or an address in a process table, such as 230, and the address of the program function context. Each queue has a load and an unload pointer, for example, 111, 112. The load pointer is controlled by the processor loading the queue, the unload pointer by the processor unloading the queue. Thus, the first processor controls load pointer 131 and writes entries such as 133 and 134 into its output queue as it executes program functions which call for the execution of other program functions. The first processor also controls unload pointers 112 and 122 of its work queues and reads entries such as 113, 114, 123, and 124 preparatory to executing the corresponding called functions which have been requested through the execution of corresponding calling functions.
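The load/unload pointer discipline can be sketched as a simple queue in which each pointer is advanced by exactly one party, which is what lets the loading and unloading processors share the queue without interlocking. The fixed size and wraparound behavior below are assumptions of this sketch, not details given in the patent.

```python
# Single-writer / single-reader queue driven by a load pointer (advanced
# only by the loading processor) and an unload pointer (advanced only by
# the unloading processor), as in work queue 110 or output queue 130.

class PointerQueue:
    def __init__(self, size=8):
        self.slots = [None] * size
        self.load = 0      # owned by the loading (writing) processor
        self.unload = 0    # owned by the unloading (reading) processor

    def put(self, entry):
        """Writer stores an entry, then advances only the load pointer."""
        self.slots[self.load % len(self.slots)] = entry
        self.load += 1

    def get(self):
        """Reader sees an empty queue when the pointers match; otherwise
        it takes an entry and advances only the unload pointer."""
        if self.load == self.unload:
            return None
        entry = self.slots[self.unload % len(self.slots)]
        self.unload += 1
        return entry
```

Because no pointer has two writers, comparing the pointers is a race-free emptiness test for this one-writer, one-reader arrangement.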
The third processor, executing a separate process, executes a program function which examines the contents of output queues such as 130 and 180 and loads entries into the work queues such as 110, 120, 160 and 170. The third processor reads entries such as 133 in the output queue of each processor. After it reads one of these entries, it changes unload pointer 132. It examines the corresponding table entry in FIG. 3C to find the selected processor and program address and writes the program address into the appropriate work queue of the selected processor. The third processor then modifies the load pointer 111 or 121 of the appropriate work queue of the selected processor. In this arrangement, there is no restriction preventing the calling and called processor from being the same. The procedure for the call is identical.
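The third processor's dispatch step might be sketched as below. Plain lists and dictionaries stand in for the output queues, the process table of FIG. 3C, and the work queues, and function names stand in for table-entry addresses; all of these are simplifications invented for illustration.

```python
# Illustrative dispatch loop for the third processor: drain each
# processor's output queue, resolve each request through the process
# table, and append (program address, context pointer) to the work queue
# of the designated processor.

def dispatch(output_queues, process_table, work_queues):
    for out_q in output_queues:
        while out_q:                    # pointers differ -> entries pending
            function_name, context_ptr = out_q.pop(0)  # advance unload pointer
            entry = process_table[function_name]
            target = entry["processor"]
            # Write the program address and context pointer, then advance
            # the target work queue's load pointer (modeled here by append).
            work_queues[target].append((entry["address"], context_ptr))
```

Nothing in the loop prevents the calling and called processors from being the same; the request simply lands back in the caller's own work queue.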
Consider now a case in which the first processor is executing a first function and recognizes that there is a need to call for the execution of a second function.
Both functions are in the first process in FIG. 3C and are labeled G(1) and G(2). Processors P(G(1)) and P(G(2)) are the first and second processors in this example. During the course of execution of function G(1), the first processor will have entered data into a program function context including the status of the program and the parameters and variables that function G(2) will need in order to execute. In order to execute its part of the work of the function call, the first processor will load into its output queue 130 the address of entry 235 of table 230 of FIG. 3C. The first processor will then modify the load pointer 131 to point to this new entry.

The third processor will subsequently examine the load pointer 131 and unload pointer 132 of the output queue of the first processor and will recognize that the two do not match. This is an indication that the first processor has made an entry in its output queue. The third processor will examine the word pointed to by the entry in 133 of the output queue, where it will find the identity of the processor P(G(2)) (the second processor, in this case) which will execute function G(2), and the address A(G(2)) in that processor. The third processor will use the identity of the processor P(G(2)) to find the regular work queue 160 of the second processor and will enter the address of G(2), A(G(2)), in that work queue.
In addition, it will copy into work queue 160 the pointer 135 to the program function context that the second processor will need in order to execute the second function. The third processor will then increment the load pointer of the regular work queue 160 of the second processor.
Subsequently, the second processor, when it has finished other work, will check to see if there is additional work by looking at the load pointer and unload pointer of its work queues 160 and 170. If these do not match, for either work queue, new work has been loaded into one of its work queues. The second processor will prepare to execute the next called function, whose address, and the address of whose program function context, it will find in its work queue. It will then update the unload pointer so that, after finishing the execution of this program function, it will be prepared to execute the next requested program function.
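The called processor's side of the protocol — checking the high priority queue before the regular one and executing whatever (address, context) entry it finds — might be sketched as follows. The program registry is an invented stand-in for the processor's local program memory.

```python
# Illustrative worker loop for a called processor with a high-priority
# work queue and a regular work queue.

def next_entry(high_queue, regular_queue):
    """Prefer high-priority work; return None when both queues are empty
    (load and unload pointers match)."""
    if high_queue:
        return high_queue.pop(0)   # pop models advancing the unload pointer
    if regular_queue:
        return regular_queue.pop(0)
    return None

def run_worker(high_queue, regular_queue, programs):
    """Execute queued functions until no work remains; 'programs' maps
    internal program addresses to callables (an invented stand-in)."""
    results = []
    while True:
        entry = next_entry(high_queue, regular_queue)
        if entry is None:
            break
        address, context = entry
        results.append(programs[address](context))
    return results
```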
In the case of a function call in which the calling program function expects a return, the return from the called program function is implemented as a call to the original calling function. The return address data is stored in the program function context by the original calling program function and found there by the called program function; values derived by the called program function are stored in locations specified by the calling program function.
In this embodiment, program function contexts, output queues, work queues, and their associated pointers are all in shared memory. Since facilities, such as those used for block transfers, exist to allow the reading from and storage into local memory of other processors, it is possible to place some of these items in local storage.
An alternate solution to the process of calling for the execution of a process is for the calling processor to write directly into the work queue of the called processor. This bypasses the process carried out in the above example by the third processor. It is also possible to implement the work queue using a first-in, first-out register stack in interface A (e.g., 21, 31) between a central processing unit and bus 53. This would allow one processor to write a call request directly into memory associated with a called processor.
In this embodiment, the address of the program function context and of the called program function are both stored in the work queue of the called processor. An alternative solution is to store only the address of the program function context, and to store the address of the program function, or a pointer to such an address, in the program function context. Furthermore, in this embodiment, program function addresses are recorded directly in the work queue. An alternative solution is to record a pointer or other indicator of such an address.
It is to be understood that the above-described embodiment is merely illustrative of the principles of this invention; other arrangements may be devised by those skilled in the art without departing from the spirit and scope of the invention.

Claims (33)

Claims:
1. In a multiprocessing system having a plurality of processors each of the processors having an associated work queue, the method of executing a program process, having an associated program function context stored in memory accessible by each of said processors, by different processors, comprising the steps of:
initiating a process comprising a first and a second function in any of said plurality of processors;
storing an indication of the identity of a first processor and of the address of a first function at an address specified as part of a second function;
executing said second function by means of a predetermined one of said processors, including entering data into said program function context stored in memory accessible by each of said processors;
storing a link to said program function context and said indication of the address of said first function in the work queue associated with said first processor; and executing said first function by means of said first processor using said program function context containing said data entered into said program function context during execution of said second function, whereby a first program function executed by means of a first processor is linked to a second program function executed by any processor and uses a program function context containing data entered during execution of said second function.
2. In the system of claim 1 in which each of said plurality of processors comprises local memory, and in which said indication of the address is an indication of a local memory address in said first processor, said method further comprising the steps of:
storing said first program function in local memory of said first processor; and storing said second program function in local memory of said predetermined one of said processors.
3. In the system of claim 2 in which said system comprises storage accessed by virtual addressing means wherein said step of storing an indication comprises the step of storing said indications at a virtual address specified by said second function.
4. In the system of claim 1 in which said system comprises storage accessed by virtual addressing means, wherein said step of storing an indication comprises the step of storing said indications at a virtual address specified by said second function.
5. The method of claim 4 in which said linking step further comprises the step of storing said indication of the address of said first function in said work queue associated with said first processor.
6. The method of claim 4 in which said linking step further comprises the step of storing said indication of the address of said first function in said program function context.
7. The method of claim 4 in which said step of linking comprises the step of storing the address of said program function context in said work queue associated with said first processor.
8. The method of claim 4 further comprising the step of initializing the system with the virtual addresses specified for the program functions executed by a given program process prior to executing said given program process.
9. The method of claim 8 in which said processors generate virtual and physical addresses occupying distinct virtual and physical address ranges further comprising the method of accessing said storage accessed by virtual addressing means comprising the steps of:
generating a specific address;
determining whether said specific address is within said physical address range or said virtual address range;
translating said specific address to a corresponding physical address and addressing said storage accessed by virtual addressing means if said specific address is within said virtual address range; and addressing said storage accessed by virtual addressing means without translation if said specific address is within said physical address range.
10. The method of claim 4 in which said processors generate virtual and physical addresses occupying distinct virtual and physical address ranges further comprising the method of accessing said storage accessed by virtual addressing means comprising the steps of:
generating a specific address;
determining whether said specific address is within said physical range or said virtual address range;
translating said specific address to a corresponding physical address and addressing said storage accessed by virtual addressing means if said specific address is within said virtual address range; and addressing said storage accessed by virtual addressing means without translation if said specific address is within said physical address range.
11. A multiprocessor system for executing a plurality of program processes each process having an associated program function context, comprising:
a plurality of processors;
storage means for storing an indication of the identity of a designated one of said plurality of processors and of a first program function and for storing said program function contexts;
bus means for accessing said storage means;
each of said plurality of processors comprising means for accessing said program function contexts in said storage means via said bus means and each operative under program control to initiate a process;
work queue means associated with each of said processors;

said processors operative under program control to store a link to a predetermined program function context and to said first program function in the work queue means associated with said designated processor;
wherein said designated processor is operative to execute said first program function using said predetermined program function context.
12. The system of claim 11 further comprising:
local memory means associated with each of said processors;
means for storing said first program function in local memory means associated with said designated processor; and means for storing a second program function in local memory means associated with a predetermined one of said processors, said second program function specifying the address of said indication of the identity of said designated processor and said indication of the address of said first function.
13. The system of claim 12 in which said processors are operative to generate virtual addresses, in which said storage means is accessed by virtual addressing means, and in which said system further comprises address translation means to translate from virtual addresses to physical addresses.
14. The system of claim 13 in which said address translation means comprise a plurality of address translation modules each of which translates addresses in different virtual address ranges.
15. The system of claim 14 in which different processes occupy different virtual address ranges.
16. The system of claim 11 in which said processors are operative to generate virtual addresses, in which said storage means is accessed by virtual addressing means, and in which said system further comprises address translation means to translate from virtual addresses to physical addresses.
17. The system of claim 16 in which said address translation means comprise a plurality of address translation modules each of which translates addresses in different virtual address ranges.
18. The system of claim 16 in which at least one of said processors is an input/output controller and in which said bus means comprises first bus means connecting output signals of said processors to inputs of said translator means, second bus means connecting output signals of said address translator means to inputs of said storage means, and third bus means connecting output signals of said storage means to inputs of said processors.
19. The system of claim 18 further comprising means for selectively sending signals directly from said first to said second bus means.
20. The system of claim 19 further comprising means for selectively sending signals directly from said second bus means to said third bus means.
21. The system of claim 20 further comprising means for selectively sending signals directly from said third bus means to said first bus means.
22. The system of claim 19 in which said address translation means comprises a plurality of serially interconnected translator blocks, each of said translator blocks comprising memory, register, and adder means, interconnected so that memory data and register data of each block are added to generate memory addresses for the next block, for generating intermediate translator data and physical address data, said plurality of blocks comprising:
a first block responsive to said virtual addresses to generate first intermediate translator data, and a last block responsive to intermediate translator data from the preceding block to generate physical address data.
23. The system of claim 18 further comprising means for selectively sending signals directly from said second bus means to said third bus means.
24. The system of claim 18 further comprising means for selectively sending signals directly from said third bus means to said first bus means.
25. The system of claim 16 in which said processors are further operative to generate physical address signals and in which said address translation means further comprise means for recognizing and transmitting physical address signals without translation.
26. The system of claim 25 in which at least one of said processors is an input/output controller and in which said bus means comprise first bus means connecting output signals of said processors to inputs of said translator means, second bus means connecting output signals of said address translator means to inputs of said storage means, and third bus means connecting output signals of said shared memory means to inputs of said processors.
27. The system of claim 26 in which said means for accessing said program function contexts in said storage means are further operative to transmit to said bus means signals representing a first address and a processor identification, a first length of a block of data and said processor identification, and a first series of data words each with said processor identification, and signals representing a second address and said processor identification and a second length of a block of data and said processor identification, and to receive from said bus means signals representing a second series of data words, each with said processor identification, corresponding to said second address and length; and in which said storage means comprise means for receiving from said bus means signals representing said first address and said processor identification, said first length of a block of data and said processor identification, and said first series of data words each with said processor identification, and signals representing said second address and said processor identification, and said second length of a block of data and said processor identification, and is responsive to said second address and said second length of a block to transmit to said bus means signals representing a series of data words each with said processor identification, whereby the consecutive data words for a number of transactions from and to different processors can be interleaved on the bus and can be identified by said processor identification.
28. The system of claim 18 in which said means for accessing said program function contexts in said storage means are further operative to generate and transmit signals representing a processor identification with each address and read/write command word, and each write data word; in which said storage means further comprises means for storing said identifying signals transmitted with said address words and to store data corresponding to write data words transmitted with identifying signals matching said stored identifying signals, and responsive to read commands to transmit said stored identifying signals with data read in response to said read command; and in which said means for accessing said program function context in said storage means are further operative to recognize said data read in response to said read command by recognizing said identifying signal; whereby the consecutive data words for a number of transactions from and to different processors can be interleaved on the bus and are identified by said identifying signal.
29. The system of claim 28 in which said address translation means comprises a plurality of serially interconnected translator blocks, each of said translator blocks comprising memory, register, and adder means, interconnected so that memory data and register data of each block are added to generate addresses for the next block, for generating intermediate translator data and physical address data, said plurality of blocks comprising:
a first block responsive to said virtual addresses to generate first intermediate translator data, and a last block responsive to intermediate translator data from the preceding block to generate physical address data.
30. The system of claim 11 in which at least one of said processors is an input/output controller.
31. The system of claim 16 in which said address translation means comprises a plurality of serially interconnected translator blocks for generating translator data, each of said translator blocks comprising memory means and register means, and adder means for adding the contents of said memory means and said register means to generate memory addresses for the next block, said plurality of blocks comprising:
a first block responsive to said virtual addresses to generate first intermediate translator data, and a last block responsive to intermediate translator data from the preceding block to generate physical addresses.
32. In a multiprocessing system having a plurality of processors each of the processors having an associated work queue, comprising storage accessed by virtual addressing and a translation table specifying virtual to physical address mapping, the method of executing a program process having an associated program function context, stored in memory accessible by each of said processors, by different processors, comprising the steps of:
initiating a process comprising a first and a second function in any of said plurality of processors;
storing an indication of the physical identity of a first processor and an indication of the address of a first function in said translation table;

executing said second function by means of a second processor, said executing step including the step of entering data into said associated program function context stored in memory accessible by each of said processors;
storing a link to said associated program function context and said indication of the address of said first function in the work queue associated with said first processor; and executing said first function by means of said first processor using said program function context containing said data entered into said program function context during execution of said second function, whereby said first program function executed by means of said first processor is linked to said second program function executed by said second processor and uses said program function context containing data entered during execution of said second function.
33. A multiprocessor system for executing a plurality of program processes each process having an associated program function context stored in memory, comprising:
bus means;
storage means accessed by virtual address signals via said bus means for storing said program function contexts;
a plurality of processors connected to said external bus means for accessing data in any of said program function contexts;
work queue means associated with each of said processors;
address translation means to translate from virtual addresses to physical addresses;
said address translation means operative to store table data specifying the identity of a processor designated to execute a first program function and specifying an indication of the address of said first program function;
said system operative under program control to store a link to a predetermined program function context and said indication of the address of said first program function in the work queue means associated with said designated processor; and said designated processor further operative to execute said first program function linked to the queue means associated with said designated processor and further specified by said predetermined program function context.
CA000485141A 1985-06-25 1985-06-25 Method and apparatus for handling interprocessor calls in a multiprocessor system Expired CA1234636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA000485141A CA1234636A (en) 1985-06-25 1985-06-25 Method and apparatus for handling interprocessor calls in a multiprocessor system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA000485141A CA1234636A (en) 1985-06-25 1985-06-25 Method and apparatus for handling interprocessor calls in a multiprocessor system

Publications (1)

Publication Number Publication Date
CA1234636A true CA1234636A (en) 1988-03-29

Family

ID=4130824

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000485141A Expired CA1234636A (en) 1985-06-25 1985-06-25 Method and apparatus for handling interprocessor calls in a multiprocessor system

Country Status (1)

Country Link
CA (1) CA1234636A (en)

Similar Documents

Publication Publication Date Title
US4539637A (en) Method and apparatus for handling interprocessor calls in a multiprocessor system
US4173783A (en) Method of accessing paged memory by an input-output unit
US4550368A (en) High-speed memory and memory management system
EP0113612B1 (en) Address conversion unit for multiprocessor system
US4419728A (en) Channel interface circuit providing virtual channel number translation and direct memory access
EP0088789B1 (en) Multiprocessor computer system
US3573855A (en) Computer memory protection
US4092715A (en) Input-output unit having extended addressing capability
US4428043A (en) Data communications network
EP0085048B1 (en) Extended addressing apparatus and method for direct storage access devices
KR970029014A (en) Data Processing System and Method
US3510844A (en) Interprocessing multicomputer systems
CN87106353A (en) The invalid markers of digital data system contents of cache memory
EP0175620B1 (en) Access verification arrangement for digital data processing system which has demand-paged memory
US5146605A (en) Direct control facility for multiprocessor network
US7007126B2 (en) Accessing a primary bus messaging unit from a secondary bus through a PCI bridge
US3546680A (en) Parallel storage control system
US4338662A (en) Microinstruction processing unit responsive to interruption priority order
US4430710A (en) Subsystem controller
EP0212152A2 (en) Microprocessor assisted memory to memory move apparatus
CA1234636A (en) Method and apparatus for handling interprocessor calls in a multiprocessor system
EP0164972A2 (en) Shared memory multiprocessor system
US5404549A (en) Method for efficient access of data stored in a nexus table using queue tag indexes in a table portion
US6401144B1 (en) Method and apparatus for managing data transfers between peripheral devices by encoding a start code in a line of data to initiate the data transfers
EP0321775B1 (en) Secure data processing system using commodity devices

Legal Events

Date Code Title Description
MKEX Expiry