EP0139727A1

EP0139727A1 - Multi-computer computer architecture

Info

Publication number: EP0139727A1
Application number: EP19840901711
Authority: EP
Inventors: Richard Lowenthal; Jonathan Huie; Milan Momirov; Ben Wegbreit; David Cline; John P. Burger
Original assignee: Convergent Technologies Inc
Current assignee: Convergent Technologies Inc
Priority date: 1983-04-15
Filing date: 1984-04-12
Publication date: 1985-05-08
Also published as: WO1984004190A1

Abstract

A multiple processor computer system includes a plurality of independent, parallel, asynchronous processors with specialized functions (11-14). Each processor has a discrete and independent operating system; the processors are interconnected for transparent interprocessor communication at the operating system level over a parallel, asynchronous bus (10). Each of the processors includes a central processing unit (18) and a memory (17). The processors send messages to each other by placing them in the memory of the receiving processor. The receiving processor is notified of the presence of the message by a "doorbell" interrupt signal received from the sending processor. The processors are coupled to each other through a plurality of connection slots that define an enclosure (15), which forms an independent, functional multiple computer unit. A plurality of enclosures (15, 16) can be transparently interconnected to define a multiple computer system. In this way, the resulting computer system can be expanded from a minicomputer system to a large mainframe, depending on the application and use.

Description

MULTI-COMPUTER COMPUTER ARCHITECTURE

A Microfiche Appendix having 41 frames on one fiche is included with this document.

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention relates to computer architectures. More particularly, the present invention relates to a computer architecture including a plurality of parallel asynchronous independent computers interconnected by a transparent parallel bus to form a computer network.

II. Description of the Prior Art

One of the traditional drawbacks of prior art shared logic computer systems is the inability of the computer system to grow as system users are added. When using a prior art shared logic computer system, system users confront the reality that computing power is finite. Additional system users draw against the single central processing unit (CPU) resource, significantly impairing system performance.

As additional system users are added, the computer system is required to grow in any or all of three directions by adding additional:

1) terminals or communications ports;

2) disc and file capabilities; and

3) additional applications processing power.

The first system bottleneck is that of terminal I/O. Terminals interrupting a CPU a character at a time drastically slow down prior art shared logic computer systems. A partial remedy for this situation involves dedicating a front end processor to off-load the communications overhead from the main CPU, such as the IBM 3705 front end processor manufactured by International Business Machines of Armonk, New York.

The second system bottleneck involves file I/O. File access demands that the CPU spend some of its time handling the disc and file system, rather than executing main line code. A partial remedy for this situation was provided by dedicating back end processors to off-load the file processing overhead from the main CPU.

The third system bottleneck results from the fact that existing large scale minicomputers and mainframe computers are limited by the fixed amount of processing power inherent in the single CPU which executes the applications code. The amount of applications processing available to the system user is thereby limited. The Digital Equipment Corporation VAX-11/782, manufactured by Digital Equipment Corporation of Maynard, Massachusetts, is an example of a system that can add an additional processor. It should be noted, however, that on most mainframes, users are unable to add multiple processors or field upgrade their existing processor to get increased processing power. Thus a user requiring more computing power must purchase a new system.

The mentioned bottlenecks in mainframe performance have been caused by the mainframe's dependence on the traditional shared logic architecture which has dominated computer system design for the last thirty years. The prior art design philosophy has recognized that the CPU and memory are the most expensive elements of any computer system and that therefore they should be shared among a large number of users - timesharing/resource sharing. Until recently this philosophy was sound.

With the advent of very large scale integrated circuit (VLSI) technology, dependence on this outdated philosophy has tended to inhibit advances in computer architecture. Manufacturers of the so-called "supermicros" have taken advantage of the low cost of VLSI by implementing monolithic versions of traditional shared logic designs at greatly reduced cost. Unfortunately, the "supermicro" manufacturers have also copied every one of the traditional mainframe architecture bottlenecks - limited terminal and file I/O, and finite applications processing power. Since prior art solutions to these bottlenecks were known to mainframe manufacturers for over thirty years, the "supermicro" systems have tended to adopt the same shared logic solutions without advancing the art. Therefore, the inherent weakness in all shared logic approaches is still as serious a problem as it was several computer generations ago.

SUMMARY OF THE INVENTION

The present invention addresses the three single processor bottlenecks - file, communications, and application processing - by providing multiple concurrent processors. The multi-computer computer architecture of the present invention provides a series of independent parallel processors of which any number may be added to structure the system as desired. Each of the file, terminal, application, and communications processors runs its own operating system, and they all execute in parallel. As more users are added to the present invention, multiple applications, file, terminal, and communications processors may also be added to meet the additional computing requirements. The system resources gracefully grow to meet user requirements.

The present invention is a system of multiple processors tied together on a high speed asynchronous bus. Each of the processors on the bus consists of a CPU and memory; the processors can also include I/O interfaces. The bus in the present invention may be extended across multiple enclosures, each with a multi-slot backplane. Each enclosure supports integral mass storage.

The present invention uniquely provides a file system which executes on parallel processors concurrently with applications execution; multiple file processors may be added as system needs grow. Terminal handling executes on parallel processors concurrently with application execution, and multiple terminal processors may also be added as system needs grow. The total available computing power provided by the present invention can grow by adding multiple applications processors, each of which may run a distributed version of the UNIX operating system, developed and licensed by Bell Laboratories of Murray Hill, New Jersey. The present invention can support a mix of dumb terminals, intelligent terminals and work stations, and can thereby allow system users to tailor a system to their needs.

The hardware and software architecture provided by the present invention is modular and includes selected system entry points and multiple upgrade paths as system needs grow. Thus, an eight-user minicomputer configuration may be upgraded to a 128-user mainframe configuration without software modification.

Traditional computer architectures use a single, synchronized operating system to control the overall operation of the computer and to perform such tasks as assigning places in memory to programs and data, processing interrupts, scheduling jobs, and controlling the overall input/output of the computer. The present invention combines a message passing operating system with the UNIX operating system to maximize reliability and software compatibility. Thus, two or more operating systems can run concurrently in a manner completely transparent to the application or the system user. Accordingly, each processor is supplied with its own independent and unique operating system.

The architecture includes virtual memory hardware on each applications processor to provide a demand paged virtual memory system. The memory management hardware provides a high speed two-level paging scheme.

The present invention provides a departure for operating systems, such as UNIX, which have previously been executed as a monolithic program on a single processor. In the present invention, multiple processors are dedicated to each function. The file processor, a back-end data base processor, runs the file system under the message-based operating system. Up to twelve file processors may be run in parallel in any system. The terminal processor, a front-end processor, runs all communications protocols and terminal handling under the message-based operating system. Up to sixteen terminal processors can be run in parallel on any system. The applications processor is capable of being replicated up to sixteen times in the present invention. Each applications processor runs its own copy of the UNIX kernel.

The UNIX operating system is distributed across multiple processors in the following manner:

1) A copy of UNIX concurrently runs on each applications processor in the system;

2) UNIX terminal handling code runs on one or more separate processors, each of which runs a message-based operating system;

3) The file system code runs on one or more file processors, each of which also runs the message-based operating system; and

4) Each of the processors and their applications communicate over the system bus.

Each parallel processor within the present invention communicates via short messages which allow the processors to DMA directly into each other's memory. The only connection the UNIX kernel has with the other operating systems in the system is through the inter-computer communications (ICC) module. The ICC provides request and response blocks to the kernel and all processors. The software running on the applications processor communicates with the file system, paging, and terminals via the ICC module.

The system bus is a high speed asynchronous backplane interconnect. It provides the throughput necessary to ensure that all of the processors in a system can communicate and process in parallel. In the exemplary embodiment of the invention, the system bus provides a 32-bit wide data path and has a maximum transfer rate of 11 Mbytes/second. The system bus is a central feature in the present architecture, in which hardware and software functionality are bundled in subsystems rather than specific devices. Device dependent information is not transferred across the system bus, but logical concepts are. Thus, the system bus allows the processors in the system to communicate without using the interrupt structure necessary in conventional uniprocessor systems. The hardware provides:

1) A doorbell interrupt that enables one processor to pass requests to another;

2) Separate hardware registers for local and system bus memory access; and

3) Dual and triple ported shared memory.

Each processor gains the attention of another processor as if it were accessing its own memory rather than interrupting an external processor. The inter-CPU bus traffic consists of the request and response blocks, to and from all processor boards, and DMA transfers to and from the discs.

The computer architecture of the present invention consists of three main processing elements or computers: the file processor, the applications processor, and the cluster processor. Various embodiments of the invention can also include a disc processor and a terminal processor.

The applications processor contains a 10-MHz Motorola 68010 CPU, memory management hardware to support a two-level paging scheme, and 512 Kbytes - 4 Mbytes of dual-ported error-correcting RAM. The applications processor is dedicated to running both the UNIX operating system and UNIX applications. Multiple applications processors can be added to the system to increase available processing power and improve response time in multiple user environments.

The memory management hardware provides a high speed (no wait states), two-level paging scheme with 8 Mbytes of virtual address space per applications processor. Each page is 4 Kbytes and has an associated status: the page is either not present, present but not accessed, accessed but not written, or written (dirty). There may be up to 32 pages per segment. A segment map provides protection for up to 64 system segments. Each segment can be an execute-only, read-only, writeable, or user segment.

The system handles multiple processor addressing via an extended address. When addressing memory "off board", the processors issue a 5-byte address (CPU number, address). The appropriate CPU recognizes its CPU number and uses the address as an input to its map.

The file processor contains an 8-MHz INTEL 80186 processor with 256 Kbytes - 760 Kbytes of triple-ported RAM including full error correction and an LSI Winchester controller. The high speed bus and memory allow the system to provide DMA access to and from other processor boards. The file processor runs the UNIX file system concurrently with application execution on the applications processor. Additionally, the file processor runs file-oriented data management tools. The file processor can support up to three 5¼-inch Winchester disc drives of 50 Mbytes or more and a removable 5 Mbyte cartridge. Four additional drives may be provided by expansion units. As users are added to the system, additional file processors - each with its own parallel file system - and up to four additional discs may be added, off-loading disc and file system overhead. Each file processor is responsible only for the discs and files that reside on the particular processor. All file processors are "known" to the first file processor installed within the system, which is designated as the master file processor. The master file processor redirects file requests from one file processor to another.
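The routing rule in the preceding paragraph can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `volumes` field, and lookup by volume name are hypothetical. The source states only that each file processor serves its own discs and files and that the master file processor knows the others and redirects requests.

```python
class FileProcessor:
    """Owns a fixed set of volumes (discs) and serves requests only for them."""

    def __init__(self, name, volumes):
        self.name = name
        self.volumes = set(volumes)

    def serve(self, volume, path):
        if volume not in self.volumes:
            raise LookupError(f"{self.name} does not own volume {volume!r}")
        return f"{self.name} served {volume}:{path}"


class MasterFileProcessor(FileProcessor):
    """The first-installed file processor; every other file processor is known to it."""

    def __init__(self, name, volumes):
        super().__init__(name, volumes)
        self.known = {}       # hypothetical table: volume -> owning file processor
        self.register(self)   # the master also serves its own discs

    def register(self, file_processor):
        for volume in file_processor.volumes:
            self.known[volume] = file_processor

    def route(self, volume, path):
        """Redirect a request to whichever file processor owns the volume."""
        owner = self.known.get(volume)
        if owner is None:
            raise LookupError(f"no file processor owns volume {volume!r}")
        return owner.serve(volume, path)
```

In this sketch a second file processor is added with `master.register(FileProcessor("FP1", ["d1"]))`, after which `master.route("d1", "/etc/passwd")` is served by `FP1` rather than by the master itself.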

The cluster processor in the present invention contains an 8-MHz INTEL 80186 processor having 256 Kbytes - 768 Kbytes of triple-ported RAM with full error correction. The cluster processor controls two cluster RS-422 ports and can run terminals at speeds from 300 Kbits/second up to 500 Kbits/second; work stations may be run at 1.8 Mbits/second. All RS-422 lines have DMA to provide high throughput.

In the exemplary embodiment of the invention, the cluster processor supports three RS-232 ports; two RS-232 port lines are synchronous or asynchronous, while the third line is a serial printer interface (asynchronous-only in the present embodiment of the invention).

The terminal processor contains an 8-MHz INTEL 80186 processor having up to 768 Kbytes of dual-ported RAM and including full error correction. The terminal processor contains ten RS-232 ports, four of which support synchronous or asynchronous operation, while six ports support asynchronous operation only. Each RS-232 line can operate at up to 19.2 Kbaud.

The terminal processor includes an operating system kernel and provides a virtual terminal interface for dumb terminals. Additionally, the terminal processor can run communications oriented products, such as modems. The terminal processor has access to a table kept on disc that describes the default characteristics of the devices attached to each port. After the system is reset, the terminal processor monitors each terminal for activity, and if active state data are received, it requests service from one of the applications processors. Upon power-up, a list of file and applications processors is made known to the terminal processor. If the system contains applications processors, the terminal processor dispenses initial requests in a round-robin fashion. When additional terminals log on to a system that already has terminals assigned to all applications processors, the terminal processor assigns the latest requests to the least loaded applications processor.

A storage processor is provided which contains an 8-MHz INTEL 80186 processor having up to 768 Kbytes of triple-ported RAM and including full error correction. Additionally, the storage processor provides a tape interface for [illegible]-inch tape drive units. The storage processor also provides memory, DMA, and compute power for the disc controller. The disc controller contains a microcontroller circuit and controls up to six 600 Mbyte disc drives. The controller interfaces to the system bus via the storage processor.

BRIEF DESCRIPTION OF THE FIGURES

Fig. 1 is a block diagram of the multi-computer computer architecture basic components;

Fig. 2 is a block diagram showing individual computer structure and message passing across the system bus within the multi-computer computer architecture;

Fig. 3 is a block diagram of a file processor;

Fig. 4 is a block diagram of operating system intercommunications via the inter-CPU communication module;

Fig. 5 is a block diagram of operating system file structure;

Fig. 6 is a block diagram of interprocess communication;

Fig. 7 is a block diagram of an inter-CPU request;

Fig. 8 is a block diagram of remote DMA initiation by a file processor;

Fig. 9 is a block diagram of file system request routing;

Fig. 10 is a schematic diagram of an exemplary computer microprocessor circuit;

Fig. 11 is a schematic diagram of an exemplary computer memory circuit;

Fig. 12 is a schematic diagram of an exemplary computer system bus interface circuit; and

Fig. 13 is a schematic diagram of an exemplary doorbell interrupt PAL circuit.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

A significant feature of the present invention (Fig. 1) is that it dedicates several types of single board computers to specific functions. The computer boards provide high throughput in four functional areas by providing the following specialized computers:

1) Applications processor 11;

2) File processor 12;

3) Cluster processor 13; and

4) Terminal processor 14.

The self-contained computers are interconnected via a high-speed, 11 Mbytes/second, 32-bit asynchronous system bus 10. The bus is used primarily for inter-computer communications. Although bus speed and width have been stated for the exemplary embodiment of the invention, other speeds and system bus widths may be provided in different embodiments of the invention. The present invention may be expanded by the inclusion of an expansion enclosure (Fig. 2), which is added to form a powerful collection of computers linked together on bus 10. The exemplary embodiment of the invention provides a bus that may span up to six system enclosures 15, 16. If one of the four above defined functional areas requires extra processing, more processors of that type may be added to an enclosure and/or additional enclosures may be added.
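As a rough illustration of what the quoted bus figures imply, the arithmetic below derives the peak number of 32-bit transfers per second and the best-case time to move one 4-Kbyte page across the bus. It assumes 1 Mbyte = 10^6 bytes and ignores arbitration and protocol overhead, neither of which the source specifies; the function names are hypothetical.

```python
BUS_WIDTH_BITS = 32
PEAK_RATE_BYTES_PER_S = 11_000_000   # assumption: 1 Mbyte = 10**6 bytes
PAGE_BYTES = 4 * 1024                # 4-Kbyte page size quoted for the applications processor


def transfers_per_second(rate=PEAK_RATE_BYTES_PER_S, width_bits=BUS_WIDTH_BITS):
    """Peak number of full-width bus transfers per second."""
    return rate / (width_bits // 8)


def best_case_transfer_time(n_bytes, rate=PEAK_RATE_BYTES_PER_S):
    """Lower bound in seconds for moving n_bytes at the peak rate."""
    return n_bytes / rate


print(transfers_per_second())               # 2750000.0 transfers per second
print(best_case_transfer_time(PAGE_BYTES))  # about 3.7e-4 s for one page
```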

Each applications processor is a 68010 based computer with 512 K - 4 Mbytes of error correcting memory. The 68010 is manufactured by Motorola of Phoenix, Arizona and supports true virtual memory with instruction restart capability, accessing virtual memory through a two-level segment/page map. The first level of the segment/page map consists of up to 16 contexts in use at once, where a context is a region in memory in which a process runs. A context may contain up to 64 segments of 64 Kbytes each, providing up to 4 Mbytes of virtual space to each processor.

In each segment there are 16 pages of 4 Kbytes each. Segments are protected against unauthorized access, and both segments and pages are protected against accessing non-present entries.
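The segment and page sizes above fix the field widths of a virtual address within one context: 64 segments need 6 bits, 16 pages need 4 bits, and a 4-Kbyte page needs a 12-bit offset, for 22 bits (4 Mbytes) in total. The sketch below splits an address accordingly. The ordering of the fields (segment in the high bits, then page, then offset) is an assumption, and the function name is hypothetical.

```python
SEGMENT_BITS = 6    # 64 segments per context
PAGE_BITS = 4       # 16 pages per segment
OFFSET_BITS = 12    # 4-Kbyte pages
ADDRESS_BITS = SEGMENT_BITS + PAGE_BITS + OFFSET_BITS   # 22 bits = 4 Mbytes


def split_virtual_address(va):
    """Return (segment, page, offset) for a virtual address within one context."""
    if not 0 <= va < (1 << ADDRESS_BITS):
        raise ValueError("address outside the 4-Mbyte context")
    offset = va & ((1 << OFFSET_BITS) - 1)
    page = (va >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = va >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset
```

For example, `split_virtual_address(0x12345)` yields segment 1, page 2, offset 0x345 under this layout.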

At the page level there is a two bit code that indicates the status of that page. The four page states are: 1) not present;

2) present but not accessed;

3) accessed but not modified; and

4) modified.
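The four page states can be modelled as a small state machine. The sketch below is an illustration only: the numeric two-bit encodings and the names `PageState`, `PageFault`, and `touch` are assumptions, since the source gives the four states but not their bit patterns or the exact hardware update rule.

```python
from enum import IntEnum


class PageState(IntEnum):
    # The bit patterns are assumed; the source lists only the four states.
    NOT_PRESENT = 0b00
    PRESENT = 0b01       # present but not accessed
    ACCESSED = 0b10      # accessed but not modified
    MODIFIED = 0b11


class PageFault(Exception):
    """Raised when a non-present page is touched (demand paging would load it)."""


def touch(state, write):
    """Return the page state after one read (write=False) or write (write=True)."""
    if state == PageState.NOT_PRESENT:
        raise PageFault("page is not present")
    if write:
        return PageState.MODIFIED
    # A read never demotes a modified page back to merely accessed.
    return max(state, PageState.ACCESSED)
```

With this ordering, a pager can pick cheap eviction candidates by preferring lower state values: pages that were never accessed first, and modified pages, which must be written back, last.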

The file processor and storage processors (Fig. 3) are dedicated to controlling secondary storage devices 21. These devices include high capacity interface Winchester disc drives that can be in the 5¼-inch format, large capacity SMD interface disc drives, and [illegible]-inch streaming tape drives. The file processor may be provided in two embodiments: the first embodiment for Winchester type discs and the second embodiment for SMD discs and [illegible]-inch tape units.

The first embodiment of the file processor computer uses an 80186 microprocessor, manufactured by Intel of Sunnyvale, California. The first embodiment includes 256 K of random access memory (memory 17) and controls the 5¼-inch Winchester discs. Disc control is provided by disc controller circuit 19, which performs all formatting, sector ID scans with sector interleaving, CRC calculations, data encoding, multiple sector reads and writes, and implied seek operations.

The second embodiment of the file processor computer controls SMD discs and streaming tape drives. This embodiment of the file processor includes a disc controller 20, disc drives 21, memory 17, and an Intel 80186 microprocessor 18.

The file processor is coupled to the high speed bus via one connection only. A local (private) interconnection is used for command and data transfer.

The file processor and associated software provide back-end support for secondary storage I/O. The file processor computer is a critical element in the present multi-computer computer system. Therefore, high bandwidth between the secondary storage devices and applications is provided. To accomplish this, the file processor performs DMA transfers directly between a disc and any remote processor's memory (Fig. 2). The file processor executes a DMA transfer across the high speed bus to or from another processor's memory using only a small buffer in the controlling file processor's main memory.

Each file processor may control up to four disc drives in the first embodiment of the file processor, or up to six SMD devices and four tape drives in the second embodiment. When more devices are needed, more file processors can be added. The addition of autonomous file processors does not affect the software appearance of the secondary storage because software communication modules assure that all file processors act in concert. The hardware is configured to provide further assistance in the interaction of various file processors by designating one file processor the master. Such designation ensures a reliable bootstrap procedure, allows a central CPU to coordinate certain types of operations, and permits a unique connection to control the system via a system control panel.

The file processors do not operate in a master-slave relationship. Rather, each can initiate transfers over the bus. The master file processor serves only as a single point of coordination for events which have system-wide implications.

The hardware architecture is shown in Fig. 3 for the file processor and is similar to that for all of the specialized function computers in the present invention. That is, each computer includes a microprocessor, a memory store, and (with the exception of the applications processor) I/O circuitry.

The cluster processor computer also contains an Intel 80186 with 256 K of random access memory. The major function of the cluster processor is to provide back-end support for work stations via two RS422 multi-drop lines running at 307 Kbits/second to 1.8 Mbits/second in this embodiment of the invention. The cluster processor also includes three RS232 ports, two of which are intelligent HDLC USARTs, and a parallel interface for supporting a line printer.

The terminal processor computer contains an Intel 80186 microprocessor and 256 K of random access memory. To support asynchronous terminals, the terminal processor contains ten RS232 ports, four of which are intelligent HDLC USARTs. The terminal processor also provides a parallel printer interface. The terminal processor serves as a front-end processor dedicated to support up to ten RS232 compatible user terminals.
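The summary describes how the terminal processor hands terminals to applications processors: round-robin for the initial requests, then the least loaded applications processor once every one already has a terminal. A minimal sketch of that policy follows. The class name, the `load` mapping, and the load metric itself are hypothetical, since the source does not say how load is measured.

```python
class TerminalDispatcher:
    """Assigns terminals to applications processors (APs)."""

    def __init__(self, app_processors):
        self.aps = list(app_processors)
        if not self.aps:
            raise ValueError("at least one applications processor is required")
        self.assigned = {ap: [] for ap in self.aps}
        self._next = 0   # index of the next AP in the initial round-robin pass

    def assign(self, terminal, load):
        """load: mapping AP -> current load figure (an assumed metric)."""
        if self._next < len(self.aps):
            # Initial phase: hand out terminals in round-robin order.
            ap = self.aps[self._next]
            self._next += 1
        else:
            # Every AP already has a terminal: pick the least loaded one.
            ap = min(self.aps, key=lambda a: load[a])
        self.assigned[ap].append(terminal)
        return ap
```

Passing the load figures in on each call keeps the sketch independent of how an actual system would collect them.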

The present invention can provide multiple copies of the same operating system, running on different processors, which appear to the user as a single processor system. The operating system known as "CTOS", which runs on all but the applications processor, is a message based operating system with a real time kernel. All system services needed by the operating system (e.g. terminal handling and file management) are built on top of the kernel. Accordingly, each computer is endowed with only the necessary functions to enable it to perform its unique specialized function. For example, the terminal processor has no file management capability, but has extensive terminal handling capability. The file processor, however, has no terminal handling capability, but does have a sophisticated file management capability. Much of the operating system effort is therefore generalized for all 80186 based computers. Thus, each separate computer within the present invention architecture runs its own specialized operating system. This is in contrast to the traditional approach of using a single, synchronized operating system to control the overall operation of the computer and to perform such tasks as assigning places in memory to programs and data (file processing), processing interrupts, scheduling jobs, and controlling the overall input/output of the computer (terminal processing).

To communicate between operating systems, a protocol called "inter-CPU communications" (ICC) is used. The protocol allows the passing of discrete messages across the system bus. Each message is either a request for a task to be performed or a response saying whether or not the task was performed and, if not, why. Every processor can access the memory of any other processor, permitting the message based transfers of information.

The present invention includes a message based operating system, CTOS. Because the applications processor incorporates the UNIX operating system, developed by Bell Laboratories of Murray Hill, New Jersey, an additional layer of code was added to the UNIX system. This layer, the ICC, allows the operating system to communicate with all other processors in a system (Fig. 4). To improve performance and allow sharing of file resources, the UNIX file system is off-loaded and included as a server process in the present invention's operating system on the file processor computer.

The UNIX system is properly interfaced with the operating system of the present invention by converting the file system into a message based server process (Fig. 4). As a result, the file system is now single-threaded in a multi-processor environment. That is, the file system executes any given task from beginning to end without interruption. A parallel server process which uses the same data structures as the regular file system, but does not modify them, is added. Any operation that modifies the data structures is sent to the single-threaded file system server for completion. This arrangement allows high volume traffic, such as normal reads and writes, to flow quickly around the single-threaded bottleneck. This approach also eliminates concurrency problems that arise in a multiple threaded operation.

Thus, each file system in UNIX is managed as if it were a continuous piece of raw disc (Fig. 5). In reality, each UNIX file system is a separate operating system file in the present invention. The files are stored such that the file processor CPU can read/write as a raw device to understand the control structures within the file.

The UNIX version incorporated within the present invention allows as many simultaneous reads and writes as desired. In prior art multiple processor architectures, it is often difficult to keep track of the processor with the most recent file update. Generally, there is only one write to a particular file at a given time. In the present invention, even if there are multiple writes on the same processor, the UNIX system is buffered properly. A problem could arise when a file is already opened in any mode, and the same file is opened for write by a different processor. This situation is known as a controlled access situation and in the present invention causes the file processor to pick a file master for that file. From that point on, or until the file is closed by the last processor, all operations concerning this file are funnelled through the file master to the file processor.

The UNIX system is also configured to consider all of its processes to be running on the same processor, which implies that all information concerning the processes is in one central place. In the multi-computer computer architecture, the problem of process location is solved by assigning unique process identification numbers and arranging the utilities so that they use the information to look the same as for a standard UNIX system.

Running multiple operating systems in the same computer architecture also requires a method of sharing service resources and data. To let the UNIX system share resources with the operating system of the present invention, it is necessary to enhance the message handler running in the UNIX kernel. In this way, the message passing operating system of the present invention is available with full capability, including direct sending of messages to another processor, which messages may be used on the UNIX system.

It is now possible with the present invention to route messages to the proper service process, allowing any UNIX user to access standard operating system services.

The message structure used by the present invention (and in the adapted version of UNIX) is a request/response algorithm. For each request sent to a process, the process must respond in some way (even by dying). To reach a process, a request or a response must pass through an exchange - a place where messages wait to be received and where processes wait for messages to arrive.

The entire function is driven by a request code and associated service exchange (Fig. 6). A request code is a number which specifies the basic format of a request and location of the service exchange. A service exchange is an exchange on which a request is queued for service by some process. In turn, the service process de-queues each message and services its request. On completion, the service process calls a subroutine to respond to the request. Embedded in the request is the exchange to which the response is queued. Because request blocks are self-describing, they can be checked for validity regardless of the actual requests. The request/response function is illustrated in Fig. 6.
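The request/response flow through exchanges can be sketched as below. This is an illustrative model only: the names `Exchange`, `Kernel`, and `respond`, the dictionary layout of a request block, and the `"no server"` status string are hypothetical. The source specifies only that a request code selects a service exchange, that a service process de-queues and services the request, and that the request carries the exchange on which the response is queued.

```python
from collections import deque


class Exchange:
    """A place where messages wait to be received."""

    def __init__(self):
        self.queue = deque()

    def send(self, message):
        self.queue.append(message)

    def receive(self):
        """Return the oldest waiting message, or None if the exchange is empty."""
        return self.queue.popleft() if self.queue else None


class Kernel:
    def __init__(self):
        self.service_exchange = {}   # request code -> service exchange

    def register_service(self, code, exchange):
        self.service_exchange[code] = exchange

    def request(self, code, payload, response_exchange):
        """Build a self-describing request block and queue it for service."""
        rq = {"code": code, "payload": payload,
              "response_exchange": response_exchange,
              "status": None, "result": None}
        exchange = self.service_exchange.get(code)
        if exchange is None:
            respond(rq, "no server")   # no service for this code: answer with an error
        else:
            exchange.send(rq)
        return rq


def respond(rq, status, result=None):
    """Called by the service process; queues the response on the embedded exchange."""
    rq["status"] = status
    rq["result"] = result
    rq["response_exchange"].send(rq)
```

A service process in this model loops on `receive()` for its service exchange, does the work, and calls `respond`; the requester then finds the completed block on the exchange it named in the request.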

Where there is communication from one processor or computer to another - inter-CPU communications (ICC) - the request and response circular queues are accessible by all processors (Fig. 7). ICC is achieved through the setting of a lock flag in the requestee's (server's) memory. Because this operation must be uninterruptable, a lock test and set instruction is used at all processors. Once the flag is set, the requester (client) places a processor identification code and the address of the request/response in its memory and in the circular queue on the server processor, and updates its queue pointers. The lock is then removed and the processor is interrupted with the doorbell interrupt. A doorbell interrupt is used to inform a processor that an addition has been made to the ICC request or response queue. Once the processor receives the interrupt, it determines whether the message is a request or a response. The message is then copied from the client processor to an area in the server processor. If the message is a response, the processor finds out which exchange the response is to be queued on. If there is a process waiting for something to be queued on that exchange, the process is woken up to process the response. If the message is a request, the processor finds out if there is a functional unit that services the requests on this particular processor. If not, an error code is set within the request and a response is initiated. If there is a server for the request, the message is queued to the proper service exchange; if the server is waiting for a request, it is woken up. Once the server finishes with the request and is ready to send a response, the processor repeats the same sequence as above with a response. The particular design of the computer architecture allows the software to be functionally partitioned. Most significant in the software structure is the ICC module (included as a Microfiche Appendix with this document).
The ICC module is discussed in more detail below.

In a system configured for an intelligent work station, a cluster processor needs only a file processor to complete the architecture. The cluster processor controls communication on the high speed cluster lines, on which both intelligent work stations and terminals can be placed. The cluster processor polls each work station and terminal connected to the cluster line; it also allows multiple printers to be operated at the same time.

The terminal processor is a sophisticated RS232 communication module designed to handle up to ten RS232 lines at speeds of up to 19,200 baud. To run the lines at this speed, it is necessary to poll for input characters every 500 microseconds. A polling loop is included which consumes 18% of the total processor bandwidth in this embodiment of the invention. The rest of the processor's bandwidth is used to run communication utilities. The application processor runs a UNIX ® kernel operating system. In the present embodiment of the invention, the UNIX ® kernel has been converted from a swapping system to a virtual memory system. In the present architecture there can be many processors running the UNIX ® system concurrently. Lacking the described adaptations, locating a single process is difficult. Assigning a unique process identification makes a processor data command look as it would on any standard UNIX ® system. The command is modified to clearly process information for processors across the bus. This information is then processed to be displayed in the same manner as in a standard UNIX ® process command.
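
The polling interval stated for the terminal processor can be checked with a short calculation. The sketch below reads the interval as 500 microseconds and assumes 10 bits per character on the line (one start bit, eight data bits, one stop bit); the framing is an assumption, as the text gives only the baud rate.

```python
BAUD = 19_200            # bits per second, from the text
BITS_PER_CHAR = 10       # assumption: 1 start + 8 data + 1 stop bit
POLL_INTERVAL_US = 500   # polling interval, read as microseconds

# Time for one character to arrive on a single line, in microseconds.
char_time_us = BITS_PER_CHAR * 1_000_000 / BAUD

# Number of polling passes per second.
polls_per_second = 1_000_000 // POLL_INTERVAL_US

# True if every character is seen by at least one poll before the next arrives.
keeps_up = POLL_INTERVAL_US < char_time_us
```

Under these assumptions a character arrives roughly every 521 microseconds per line, so a 500 microsecond poll visits each line before the next character can overwrite the previous one.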

Each processing module in the present computer architecture invention executes only from its own local memory, but has the ability to read and write the local memory of all other processors. Additionally, each computer module has the ability to interrupt other processors and to perform indivisible read-modify-write accesses to remote memory. Although all of the memory is sharable, only the ICC module makes remote references. In this way, bus bandwidth is preserved for operations that cannot tolerate latency, such as disc DMA.

The ICC module is message based and performs all of the transport, routing, and presentation functions transparently to the user. Messages in this system are entirely self-describing. It is not necessary for the ICC module to understand anything of the content of the message to be able to write, transport, or present it. At the highest level, a client process makes a request of a service process and receives a response. This high level view is exactly what occurs if the client and the server are actually located on the same processor. This is the user's model of the transaction and, in fact, the user is not generally aware that the request is actually being serviced at a remote location.

At a lower level, location transparency is accomplished by the client agent on the requester's processor and the service agent on the remote processor. The term "agent" is used in a descriptive sense. In order to provide efficient message transport, some of the agent functions are implemented at different levels in the kernel. One function of the ICC module is to avoid burdening service processes with the details of the addressing structure of the client processor, as in dissimilar operating systems. A presentation function is provided to reconcile the internal architectures of the 68010 and 80186 processors resident in the various computers. The 80186 stores words in low byte/high byte order, while the 68010 stores words in high byte/low byte order.
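
The byte-order difference that the presentation function reconciles can be illustrated with Python's standard `struct` module. The helper name `swap_word` is an assumption for the example and does not come from the ICC module.

```python
import struct

value = 0x1234

# 80186: low byte stored first (little-endian).
as_80186 = struct.pack("<H", value)

# 68010: high byte stored first (big-endian).
as_68010 = struct.pack(">H", value)

def swap_word(raw):
    """Convert a 16 bit word between the two storage orders."""
    return raw[::-1]

# A word written by an 80186 and read on a 68010 after conversion.
recovered = struct.unpack(">H", swap_word(as_80186))[0]
```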

Addresses on the 80186 consist of a 16 bit segment address together with a 16 bit offset; effective address calculation consists of shifting the segment address up 4 bits and adding the offset to achieve a 20 bit memory address. Addresses on the 68010 are 32 bit quantities wherein a 24 bit address is presented to the memory bus. The presentation function maps the remote data into local format, completely isolating the service processes from these variations between the microprocessor circuits. Thus, various microprocessor-based operating systems transparently intercommunicate one with the other. This enables each computer in the architecture to have a diverse operating system adapted to the computer's specialized function. Each processor (computer) in the present computer architecture is identified by an 8 bit hardware assigned "slot number". The pair (slot number, local memory address) can address any byte on any computer in the computer architecture. The processors address remote memory using special hardware registers to establish a mapping of a portion of remote address space into local address space. Normal memory reference instructions are used to manipulate remote data. Typically, an address in the present architecture is a 40 bit quantity - 8 bits of slot number and 32 bits of address. Such addresses are called full bus addresses (FBAs). The address portion of an FBA is stored in 68010 format (high order byte at lowest memory address). The present format accommodates the fact that the 80186 format cannot address more than 1 Mbyte, while an application processor can have considerably more than 1 Mbyte of memory. This innovation avoids the necessity of creating a new addressing format which neither of the processors used in the architecture could support directly. The FBAs allow the lowest layers of the ICC module to be ignorant of the remote processor type. The FBAs also simplify the initialization and configuration of the system.
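
The two address calculations described above can be sketched as follows. The function names are hypothetical, the 20 bit wrap-around mask and the placement of the slot byte ahead of the address are assumptions for illustration; the text specifies only that an FBA holds 8 bits of slot number and a 32 bit address stored high order byte first.

```python
import struct

def effective_address_80186(segment, offset):
    """Shift the 16 bit segment up 4 bits and add the 16 bit offset."""
    # Assumption: the result is truncated to 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF

def pack_fba(slot, address):
    """Build a 40 bit full bus address: 8 bit slot, 32 bit address.

    The address is stored in 68010 format (high order byte first).
    Placing the slot byte first is an assumption.
    """
    return struct.pack(">BI", slot, address)

def unpack_fba(raw):
    """Return the (slot, address) pair held in a 5 byte FBA."""
    return struct.unpack(">BI", raw)

linear = effective_address_80186(0x1234, 0x0010)
fba = pack_fba(3, linear)
```
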
When the present invention architecture is first powered on, each processor executes the local read only memory (ROM) program. The ROM program performs a self-test diagnostic, initializes its CPU description table (CDT), arms the interrupt system, and waits to be awakened. All of the processors in the system perform this sequence with the exception of the master file processor.

The master file processor is hardware designated by slot location at a control panel portion (not shown) of the system. The master file processor bootstraps a system image from disc, rather than waiting to be awakened. It then probes the CDT of the potential processors in the process. There is a pointer to the CDT at the same location in every processor within the architecture, although the CDT itself may be located anywhere in memory. The CDT contains a three byte signature field which is used to distinguish between a valid CDT and random memory contents. One of the fields which is initialized by each processor is the CPU type field. CPU type information allows the master file processor to build a map of the slot numbers-to-processor types. Initialization failures encountered during these self tests result in an entry in the CDT.

If the processor fails in the initialization, then this information is logged. The master file processor otherwise reads a configuration file, downloads the appropriate operating system image into the remote processors, initializes certain information in the remote CDTs, and awakens remote processors with a doorbell (ICC) interrupt. The remote processor then performs its initialization and marks the CDT as ready for operation.

The master file processor is fault tolerant. That is, it brings up the system even if one or more of the remote processors fail during system initialization. The master file processor also "watchdogs" each of the remote processors once per second by setting a flag word in the remote CDT. If the flag is not reset by the time the master file processor checks, the remote processor is assumed to have died and the master file processor begins logging a shutdown sequence.

During the bootstrap sequence, the master file processor initializes certain fields in the remote CDTs. These fields tell the remote processor what the system configuration is and which processor is the master file processor.

In addition to the fields already mentioned, each CDT contains a lock byte and the request and response circular buffer pointers. The request and response circular buffers are used during message transport. Each circular buffer is described by four pointers (START, END, GET, and PUT). Each of these pointers is a single 16 bit word. The words are taken to be offsets relative to the start of the CDT. Double words (a full 32 bit address) are not storable in an 80186 microprocessor in a single individual operation and are therefore not used. Because multiple remote processors may attempt to update the PUT pointers simultaneously, they must lock the CDT. Only the local processor removes information from the buffers; this operation does not require a lock if the operation of updating the pointers is an indivisible one. The master file processor CDT also contains additional information which is used for routing requests.
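
A circular buffer described by four offsets relative to the start of the CDT might behave as in the sketch below. The class name, the 5 byte entry size (one FBA per entry), the "one slot left empty" full test, and the reading of the four pointers as start, end, get, and put are assumptions made for illustration.

```python
class CircularBuffer:
    ENTRY = 5  # bytes per entry; assumes each entry holds one 40 bit FBA

    def __init__(self, cdt, start, end):
        # start/end are offsets from the start of the CDT; (end - start)
        # is assumed to be a multiple of ENTRY.
        self.cdt = cdt
        self.start = start
        self.end = end
        self.get = start   # next entry the local processor will remove
        self.put = start   # next free entry for a remote writer

    def _advance(self, pointer):
        pointer += self.ENTRY
        return self.start if pointer >= self.end else pointer

    def insert(self, entry):
        """Store one ENTRY-byte item; the caller must hold the CDT lock."""
        following = self._advance(self.put)
        if following == self.get:
            return False                      # buffer full
        self.cdt[self.put:self.put + self.ENTRY] = entry
        self.put = following
        return True

    def remove(self):
        """Take the oldest item; done only by the local processor."""
        if self.get == self.put:
            return None                       # buffer empty
        entry = bytes(self.cdt[self.get:self.get + self.ENTRY])
        self.get = self._advance(self.get)
        return entry

cdt = bytearray(64)
requests = CircularBuffer(cdt, start=16, end=36)
```

Keeping the pointers as 16 bit offsets, as the text describes, lets each one be updated with a single store on either processor type.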

Messages are routed by a request code which is one of the fields in the fixed length portion of the message header. The first step in routing a request is to look the code up in a local table to determine the routing class. Possible values for the routing class include "local", "possibly remote", "route to master file processor", and several other types of remote (that is, see request block for exact destination). In the latter two types of requests, the target is known and the message is transported immediately. If the routing code has the value "local", the request is routed locally by consulting a table that maps request codes to exchanges. If the routing class is "possibly remote", a special table in the master file processor CDT is consulted to determine if there is a server for this request and if so, where it is to be found (slot number). If the result of any of the routing lookups is the client processor's slot number, then the request is treated exactly as a local request.

Keeping the master routing table in the master file processor allows the dynamic installation of various operating services, the UNIX ® filing system, etc. without the necessity of either distributing the information (which creates inconsistencies during distributed updates) or hardwiring the locations of the service processors.
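
The routing decision can be sketched as a pair of table lookups. All request codes, slot numbers, exchange names, and the `route` function itself are hypothetical values chosen for the example; only the three routing classes shown come from the text.

```python
LOCAL = "local"
POSSIBLY_REMOTE = "possibly remote"
TO_MASTER = "route to master file processor"

MASTER_SLOT = 0  # hypothetical slot of the master file processor

# Local table: request code -> routing class (hypothetical codes).
routing_class = {10: LOCAL, 20: POSSIBLY_REMOTE, 30: TO_MASTER,
                 40: POSSIBLY_REMOTE}

# Local table: request code -> service exchange on this processor.
local_exchanges = {10: "exchange-A", 20: "exchange-B"}

# Table kept in the master file processor CDT: request code -> server slot.
master_routing_table = {20: 5}

def route(code, my_slot=2):
    cls = routing_class[code]
    if cls == LOCAL:
        target = my_slot
    elif cls == TO_MASTER:
        target = MASTER_SLOT
    else:
        # "possibly remote": ask the master table where the server lives.
        target = master_routing_table.get(code)
    if target is None:
        return ("error", None)                 # no server installed
    if target == my_slot:
        # A lookup that lands on the client's own slot is handled locally.
        return ("local", local_exchanges.get(code))
    return ("remote", target)
```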

Additional user defined requests may be added to the system and routed using the same mechanisms.

Once the routing information is determined, sending a message consists of the following steps:

1) lock the target CDT,

2) insert the FBA of the request into the request circular buffer,

3) unlock the CDT, and

4) ring the doorbell.
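
The four sending steps can be sketched as below. A `threading.Lock` stands in for the hardware test and set lock byte, a `deque` for the request circular buffer, and a counter for the doorbell interrupt; the class and function names are assumptions made for the example.

```python
import threading
from collections import deque

class Processor:
    """Hypothetical model of a target processor and its CDT."""
    def __init__(self, slot):
        self.slot = slot
        self.cdt_lock = threading.Lock()   # stands in for the CDT lock byte
        self.request_buffer = deque()      # stands in for the circular buffer
        self.doorbells = 0

    def ring_doorbell(self):
        # Stands in for the doorbell interrupt.
        self.doorbells += 1

def send_request(target, client_slot, request_address):
    fba = (client_slot, request_address)   # full bus address of the request
    with target.cdt_lock:                  # 1) lock the target CDT
        target.request_buffer.append(fba)  # 2) insert the FBA
    # 3) the CDT is unlocked on leaving the "with" block
    target.ring_doorbell()                 # 4) ring the doorbell

server = Processor(slot=1)
send_request(server, client_slot=4, request_address=0x2000)
```

Only the FBA crosses the bus at this stage; the request itself stays in the client's memory until the server agent copies it in.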

When the remote processor receives the doorbell interrupt, it awakens the ICC module in the server agent. The ICC module server agent removes the FBA from the request circular buffer, allocates space to hold the request, and copies the request in from the client's address space. The request is converted to local format (the previously mentioned presentation service) and the request is sent to the local exchange, which serves the request. The request is then sent directly to the proper exchange to avoid potential loops in the routing function. For example, global printer name resolution is performed by a routing process on the master file processor. A cluster processor routes requests which require printer name resolution to the master file processor. The result of the master file processor's name resolution process may in fact be the cluster processor that originated the request.

The cluster processor server agent must route the requests locally to avoid a loop. In the process of making a local copy of the request, it is modified so that when the user responds, the response goes to the ICC module response process. The server processes the request just as it would process a local request. In fact, it cannot tell the difference. When finished, the server processor puts output data into the request block and responds, which awakens the ICC module response process.

The ICC response process copies the output data back to the client. The client CDT is then locked, the FBA of the original request is put into the client's response circular buffer, the CDT is unlocked, and the client's doorbell is rung. The client ICC module server is then awakened by the doorbell interrupt handler. In response thereto the response is removed from the circular buffer and a response is made to the process that initiated the request. The present embodiment of the invention is able to handle several hundred such messages per second.

Several efficiency optimizations have been made to avoid unnecessary copying whenever possible. Because the message format clearly distinguishes between input and output data, the server agent avoids copying in a request's output data areas and the ICC module response agent only copies back these output data areas. Remote read and remote write requests are treated specially, an exception to the general rule that the ICC module knows nothing of the type of the request. Remote read and remote write requests are converted to a special, remote read or remote write request format, which includes the FBA of the client data area, so that the I/O routine can do direct DMA.

Character output in the present architecture is as follows: Characters to be output are stored directly into a circular buffer maintained on the terminal processor or requester processor that is handling the terminal. The processors periodically poll their output circular buffers and emit any characters that are found. The terminal processor can support I/O to all ten of its RS232 ports at the full 19,200 baud rate. High speed input is also supported with a request to "read up to X characters in Y milliseconds". If "Y" milliseconds elapse before "X" characters are received, the characters received up to that time are returned. The "X" and "Y" parameters are adjustable, but the defaults, which are based on the terminal baud rate, are acceptable for interactive use.
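
The "read up to X characters in Y milliseconds" request can be illustrated with a small simulation. The function name `timed_read` and the representation of input as (arrival time in milliseconds, character) pairs are assumptions for the example; a real terminal processor would read from its line hardware.

```python
def timed_read(arrivals, max_chars, timeout_ms):
    """Return up to max_chars characters that arrive within timeout_ms.

    arrivals is a list of (arrival_time_ms, character) pairs in time order.
    """
    received = []
    for arrival_ms, char in arrivals:
        if arrival_ms > timeout_ms or len(received) == max_chars:
            break
        received.append(char)
    return "".join(received)

line_input = [(1, "a"), (3, "b"), (9, "c")]
short_wait = timed_read(line_input, max_chars=5, timeout_ms=5)
small_count = timed_read(line_input, max_chars=2, timeout_ms=100)
```

The request completes on whichever limit is reached first, so a fast typist and a bulk transfer are both served without per-character interrupts.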

Several benefits are derived from the combination of a procedure oriented operating system with a message based operating system. Exporting the UNIX ® file system to the file processors balances the system processing load and increases throughput. Off-loading the communications applications also favors the interactive user. The terminal input/output function supports an extremely high bandwidth with no interrupt overhead on the applications processor. The file processor is of major importance in the present architecture, not only as a device controller, but as a complete computer that runs the operating system configured with the file system and that runs the UNIX ® file system interface subsystem. Additionally, the file processor may run other service processes.

The primary role of the file processor is to provide direct service between the other processors and the secondary storage devices. The master file processor serves as a coordination point for many of the activities outlined above. One activity controlled by the master file processor is an essential name service that is used by any processor that needs to access a system resource known only by its name. The essential name service determines where the resource is located.

As the system file server, the file processor provides all the services that implement a base file system. The base file system supports the operating system files directly, the UNIX ® system being built on top of it. The base file system provides both I/O on a disc sector level and provides directory services, such as creation and deletion of files and directories.

The base file system has a simple directory hierarchy - the top level addresses the physical disc device, the middle level a particular directory within the device, and the bottom level a particular file within that directory. The interface subsystems allow a specific file and directory access method to be based on these capabilities, such as the UNIX ® access methods, and provide the code that maps the particular structures to the base file system. The base system remains constant while retaining flexibility in types of application file access methods that can be built on top of it. Such structure affords several benefits. First, the structure provides flexibility in supporting diverse operating systems with their special need for file services. Second, the base system allows the storage devices to have one unique format, such that backup and restoration operations always function regardless of the type of application file system supported in present embodiments or future embodiments of the invention. Third, the base system allows the basic operating system file system software running in the file processor to remain unchanged when a new file system or file access method is added in other embodiments of the invention. In providing directory services, each file processor controls all accesses to the files physically resident on the devices actually connected to that file processor, and only those files and those devices. That is, each file processor has complete control over file storage on the devices connected to it, but it cannot operate on files residing on devices connected to any other file processor.

Additionally, a file processor (other than the master) has no information about the devices and files on the other file processors. Each file processor is only aware that other file processors exist. The particular parameters describing the directories, files, or open files reside exclusively in the controlling file processor. The separateness of this approach is further imposed on file handles which are created for open files, and on the device names themselves so that the device is uniquely addressed from any computer in the system. As a result of the functional partitioning of the device control, requests which originated on other processors must be routed to a particular file processor by the ICC. Thus, the ICC plays an important role in the interaction of multiple file processors.

In its support of a base file system, the file processor performs all the functions required of an efficient secondary storage driver. For disc devices, the file processor optimizes the execution of multiple sector transfers by transferring as much data as is physically possible for each single I/O operation, depending upon the characteristics of the disc controller and the drive itself. The disc driver schedules all pending I/O operations using an elevator structure. The driver code performs overlapping seeks by issuing buffered seeks to each drive at the highest possible rate.
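
One common form of elevator scheduling orders pending requests by cylinder so that the head sweeps in one direction before reversing. The sketch below shows that ordering only; the function name and the choice of an upward first sweep are assumptions, and the text does not describe the driver's actual data structure.

```python
def elevator_order(head, cylinders):
    """Order pending cylinder requests for one upward then one downward sweep."""
    # Requests at or beyond the head position are served on the way up.
    upward = sorted(c for c in cylinders if c >= head)
    # The remaining requests are served on the way back down.
    downward = sorted((c for c in cylinders if c < head), reverse=True)
    return upward + downward

pending = [10, 95, 52, 30, 70]
schedule = elevator_order(50, pending)
```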

The file processors function as the main data servers for the distributed processing system. In fact, they are the only type of processor that controls storage devices. This status puts a high demand on the file processors for producing and consuming secondary storage data. The demand comes from all the other processors, but particularly from the applications processor. Generally, the file processor has two responsibilities:

1) Providing secondary storage service for the system when on-line; and

2) Bootstrapping and downloading code to all other processors (as described above for the master file processor).

Because file processors have the ability to establish a direct DMA channel between the disc device and the memory of remote processors, the operating system uses this capability to achieve a high disc bandwidth. The most common service a file processor provides is reading and writing the discs.

When a file processor receives a request to read some number of sectors (Fig. 8), the destination of the disc data may be a remote processor. That is, the requesting process could be running in a processor other than the one receiving the request. As part of the request block information, an address of the buffer where the disc data is to be delivered is given. As described above, the address has two basic components:

1) A single byte hardware encoded bus address of the processor; and

2) The linear address relative to the beginning of the destination processor's memory.

When the file processor determines that the disc is ready to start the data transfer, it issues a read operation to the disc controller along with a start remote DMA operation to the DMA logic. This causes the entire disc transfer to run to completion, although the hardware is performing several discrete steps, as follows: As the disc controller starts to transfer data, the file processor hardware captures the byte stream as it is sent by the controller and assembles it into four byte word aligned packets called quads, placing them into the small circular buffer in the file processor main memory. The quads are then transferred with a hardware DMA operation over the main bus to the correct location within the destination processor's memory. Each discrete DMA transfer length (or "burst") is 8 quads. After the transfer, the processor releases the buffer used by other processors.

When all the bytes from the disc have been transferred, the operation ends. The disc transfers data at 5 Mbits/second. Because file processor buffering is minimal, it must transfer disc data across the system bus at a high speed. The very high speed at which the bus runs in burst mode - 11 Mbytes/second - makes the high transfer rates possible.
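
The relationship between the disc rate, the burst size, and the bus rate can be worked through numerically. The sketch assumes that "Mbits" and "Mbytes" mean 10^6 bits and 10^6 bytes, and it ignores bus arbitration and setup overhead, none of which is quantified in the text.

```python
DISC_BITS_PER_S = 5_000_000      # 5 Mbits/second, from the text
BUS_BYTES_PER_S = 11_000_000     # 11 Mbytes/second burst rate, from the text
QUAD_BYTES = 4                   # a quad is a four byte packet
QUADS_PER_BURST = 8              # each DMA burst is 8 quads

disc_bytes_per_s = DISC_BITS_PER_S / 8
burst_bytes = QUAD_BYTES * QUADS_PER_BURST

# Time for the disc to deliver one burst's worth of data, in microseconds.
fill_time_us = burst_bytes / disc_bytes_per_s * 1_000_000

# Time the burst occupies the system bus, in microseconds.
bus_time_us = burst_bytes / BUS_BYTES_PER_S * 1_000_000

# Fraction of bus time one streaming disc transfer consumes.
bus_fraction = disc_bytes_per_s / BUS_BYTES_PER_S
```

Under these assumptions the disc needs about 51 microseconds to fill a 32 byte burst that occupies the bus for roughly 3 microseconds, which is why a small circular buffer in the file processor is sufficient.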

When the operation is determined to be completed, a signal is issued to the requester process, posting the status of the request in the requester's address space. If the destination buffer is within the servicing file processor, the operation is essentially the same as described above, except that the inter-computer bus is not used and there is no intermediate buffering of the data. Rather, the data is transferred directly to the destination memory address.

The downloading function is a special responsibility of the master file processor 12a (Fig. 9). During the downloading process, the master file processor is in control and all other processors in the system act as slaves. During system boot time, the following occurs in all processors except the master file processor: Each processor enters its ROM code, which inserts its processor type code in a special table in RAM, along with a signature bit pattern. In this way, the CDT is assembled. The processor then runs ROM-based diagnostics. If the diagnostics succeed, the processor sets a flag indicating it is okay, sets another flag representing a request to be bootstrapped and downloaded, enables interrupts so that the master file processor can communicate with it, and enters an idle loop waiting for service. The master file processor executes the bootstrap ROM in the same way as the other processors. If the master file processor finds that all is well after polling the various system processors, it reads each processor's request for service, downloads the appropriate system image into each processor, and issues an interrupt to that processor, waking it up and causing it to execute the code that was downloaded to it. The master file processor is thus the critical element in bringing the system to life.

The power and flexibility of the present architecture invention comes from its ability to permit several autonomous processors of the same or different types to function by interacting with each other and by delivering higher throughput. Multiple file processors in the same system act in concert to support a very large unified data area capability.

The ICC module is the functional part of the invention that ties all the multiple file processors together, allowing them to function as a unit. Multiple file processors use the ICC in three different ways to achieve unified file system service. First, the master file processor "broadcasts" requests to ensure that all file processors are synchronized. Second, the ICC routes all file system requests which involve a path name (that is, the device/directory/file specification) to the master, which then determines which file processor services the device name specified and which performs another ICC route to that file processor. Third, the file processor uses the ICC's ability to route file system requests containing a file handle for an open file directly to the file processor serving that file.

Although each file processor manages the devices and the data on them almost completely independently, there is a small class of information and activity that is "synchronized" among all the file processors. When this synchronization is required, the relevant original request is routed to the master file processor, which implements the synchronization. The shared information is the operating system user profile information. Since a user or some application process may potentially access data on any file processor, each file processor must have the user profile information available for every other user.

The activities that must be synchronized are those which have implications global to all file system storage. There are two such cases: First, the file system supports requests which permit a user to close all files that the user currently has open. Because the user may have files open on any device, each processor must receive the requests so that any of the user's open files controlled by that processor can be closed. Second, the file system requires the ability to quiesce all disc activity. This request must be broadcast to all file processors so that all activity can be quieted. The enhanced multiple file processor code running in the master file processor executes and duplicates the request, and sends it to each of the other file processors for execution. When all of the other file processors have replied to the master that they have executed the request, the master file processor posts the completion to the original requesting user, indicating the activity has been globally executed.

Two other capabilities are used in handling a normal series of requests for file service. A typical series of file requests may be for simple open-read/write-close files. When a user application issues an open request that specifies the name of a file, the ICC module server local to the processor in which the request is made routes the request to the master file processor. When the master receives the request and determines, via a table maintained therein, which file processor serves the device being addressed, it routes the request via the ICC module to that processor. The master is not initiating a new request, but merely passing on the original request to the destination file processor. The result is that when the request is completed, the completion status and response are sent to the original requester. That is, the master file processor functions as a filter process.

One exception to this procedure is when the master is itself the processor which services the request. In such instance, the master does not act as a filter but, instead, actually services the request and posts completion to the original requester using the ICC module.

One of the primary functions of the open request in terms of request routing is to establish a logical connection between the requesting user application and the file processor serving the file being opened (Fig. 9). Each file processor controls the volumes with names as labelled. For example, an applications processor can request OPEN FILE [c] <JOB> NAME, where file "NAME" is in directory "JOB" on volume "C". When a file is opened, the file system returns to the user a file handle which uniquely identifies the file. The file handle is then used when subsequent requests for service on that file are issued.

To enable the ICC module to route the subsequent requests for that open file which pass file handles, the file processor servicing the open request places an encoding of its processor bus address into the file handle. The ICC module uses this encoding to route requests, such as read sector or write sector, from the user application directly to the servicing file processor. This procedure produces the logical connection that allows the efficient, direct routing to take place. When the user application is finished with the file, it issues a close file request using the file handle. The servicing file processor closes the file and considers that file handle invalid. The logical connection is now severed.
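
Embedding the servicing processor's bus address in a file handle can be sketched as below. The text does not give the handle's width or bit layout; the 16 bit handle with the slot number in the high byte and a local file index in the low byte is purely an assumption for illustration, as are the function names.

```python
def make_handle(slot, local_index):
    """Hypothetical layout: slot number in the high byte, local index in the low byte."""
    if not (0 <= slot <= 0xFF and 0 <= local_index <= 0xFF):
        raise ValueError("slot and local_index must each fit in one byte")
    return (slot << 8) | local_index

def slot_of(handle):
    """Recover the servicing file processor's slot, as the ICC module would for routing."""
    return handle >> 8

def index_of(handle):
    """Recover the index that only the servicing file processor interprets."""
    return handle & 0xFF

handle = make_handle(slot=6, local_index=17)
```

With an encoding of this kind the ICC module can route a read or write request from the handle alone, without consulting the master file processor.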

In the event of a request from another processor, such as an applications processor, the request is first routed to the master file processor. The file processor volume location is determined by the master file processor which redirects the request accordingly. The appropriate file processor completes the request and responds directly to the requesting applications processor.

The ICC module itself knows nothing of the file system activities. If the user application erroneously issues another file request using that file handle, the ICC module routes it to the correct file processor. Upon arrival at the file processor, it is determined that the file handle is not currently valid and a completion status is posted with an error indicating an invalid file handle. The ICC module and various functions in the file processor operate in concert to form a unified storage system distributed among the devices controlled by several processors.

Schematic diagrams of a processor's functional components are provided in Figs. 10-13. The schematics are provided for an exemplary embodiment of the file processor computer. It should be appreciated that similar computers may readily be constructed for the applications processor, terminal processor, and cluster processor in view of the teachings of the present patent application. Fig. 10 is a schematic diagram of the exemplary computer microprocessor circuit. An Intel 80186 microprocessor integrated circuit 16E is shown coupled via a microprocessor bus to a plurality of address latches 13F-16F and 22E. Address latches 15F, 16F, and 22E produce an internal memory address; address latches 13F and 14F produce an internal I/O address. CPU data is transported to and from microprocessor 16E by data latches 13G and 14G.

Microprocessor bus 30 couples microprocessor 16E to local ROM 13E/14E, which is the boot strap ROM mentioned above. The microprocessor bus is coupled to a file processor internal bus via microprocessor bus transceiver latches 9G-12G.

Fig. 11 is a schematic diagram of an exemplary computer memory circuit as is present in a file processor. The memory circuit provides a three bus structure including a memory address bus 31, a data bus 32, and a control bus 33. The circuit shown in Fig. 11 is configured for 256 K of local random access memory. Latches U8, U19, U30, and U41 provide memory address decoding of the memory address supplied to the RAM memory chips shown in the figure.

Fig. 12 is a schematic diagram of an exemplary computer system bus interface circuit by which the system bus 34 is coupled to the local memory data bus 35 and to CPU data bus 36. Latches 8H-11H decode the system bus to produce the memory data bus.

The processor address in terms of slot number is decoded from the CPU data bus by a slot number decoding circuit 20H and coupled to a slot number converter 22H. The local slot number - my slot - is also coupled to converter 22H. When a processor receives a message, a slot compare is performed. If the message is intended for the processor, a slot match signal is produced by converter 22H which is also coupled to program array logic (PAL 20G). Depending on input terminal states, PAL 20G determines if the slot match refers to a memory access from a remote processor or if slot match refers to a doorbell interrupt.

Fig. 13 is a schematic diagram of an exemplary doorbell interrupt PAL circuit. Referring to the inputs (marked I) it can be seen that various states produce various outputs (marked O). A logic flow listing for the PAL function, which controls the bus special function reset, interrupt, and valid address check, follows:

SLOTEQ A8 A9 A10 A11 A12 A13 A14 A15 GND

A7 /BUSSF /LATCH /VADR /DBI /INVADDR /SFRES /INV1 /SFACK VCC

IF(VCC) SFACK = SLOTEQ * SFRES +
                SLOTEQ * DBI +
                SLOTEQ * INVADDR

IF(VCC) VADR = SLOTEQ * /A15 * /A14 * /A13 * /A12 * /A11 * /A10 * /A9 * /A8 * /A7 * /LATCH +
               SLOTEQ * VADR

IF(VCC) SFRES = SLOTEQ * A15 * A14 * A13 * A12 * A11 * /LATCH +
                SLOTEQ * SFRES

IF(VCC) DBI = SLOTEQ * A15 * A14 * A13 * A12 * /A11 * /LATCH +
              SLOTEQ * DBI

IF(VCC) INV1 = /A14 * A13 +
               A14 * /A13 +
               /A15 * A14 +
               A15 * /A14 +
               /A15 * /A14 * /A13 * A12 +
               A15 * A14 * A13 * /A12

IF(VCC) INVADDR = SLOTEQ * INV1 * /LATCH +
                  SLOTEQ * /A15 * /A14 * /A13 * /A12 * A11 * /LATCH +
                  SLOTEQ * /A15 * /A14 * /A13 * /A12 * A10 * /LATCH +
                  SLOTEQ * /A15 * /A14 * /A13 * /A12 * A9 * /LATCH +
                  SLOTEQ * /A15 * /A14 * /A13 * /A12 * A8 * /LATCH +
                  SLOTEQ * /A15 * /A14 * /A13 * /A12 * A7 * /LATCH +
                  SLOTEQ * INVADDR

IF(VCC) LATCH = SLOTEQ * SFRES +
                SLOTEQ * DBI +
                SLOTEQ * INVADDR +
                SLOTEQ * VADR

IF(VCC) BUSSF = SLOTEQ * SFRES +
                SLOTEQ * DBI
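To make the address decoding in the listing above easier to follow, the sketch below evaluates the first product term of the VADR, SFRES, DBI, and INVADDR equations for a single bus cycle. It is an interpretation rather than a model of the device: the feedback (hold) terms of the form SLOTEQ * X are omitted, all signals are treated as logically true when asserted regardless of pin polarity, a "+" is assumed between the last two INV1 terms, and address bits A7 to A15 are assumed to sit at bit positions 7 to 15 of the integer `a`. The name `pal_terms` is illustrative.

```python
def pal_terms(sloteq: bool, a: int, latch: bool = False) -> dict[str, bool]:
    """Evaluate the unlatched decode terms for address bits A7..A15 of `a`."""
    A = {n: bool((a >> n) & 1) for n in range(7, 16)}
    enable = sloteq and not latch

    top_zero = not (A[15] or A[14] or A[13] or A[12])   # A15..A12 = 0000
    top_ones = A[15] and A[14] and A[13] and A[12]      # A15..A12 = 1111
    low_any = any(A[n] for n in (11, 10, 9, 8, 7))      # any of A11..A7 set

    # INV1 is true when A15..A12 are neither all zeros nor all ones.
    inv1 = (
        (not A[14] and A[13]) or (A[14] and not A[13])
        or (not A[15] and A[14]) or (A[15] and not A[14])
        or (not A[15] and not A[14] and not A[13] and A[12])
        or (A[15] and A[14] and A[13] and not A[12])
    )

    return {
        "VADR": bool(enable and top_zero and not low_any),
        "SFRES": bool(enable and top_ones and A[11]),
        "DBI": bool(enable and top_ones and not A[11]),
        "INVADDR": bool(enable and (inv1 or (top_zero and low_any))),
    }


for addr in (0x0000, 0xF000, 0xF800, 0x0080, 0x8000):
    hits = [name for name, on in pal_terms(True, addr).items() if on]
    print(hex(addr), hits)
```

Under these assumptions the four terms are mutually exclusive and exhaustive: with a slot match and the latch clear, every pattern of A15 to A7 asserts exactly one of them.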

The present invention provides a significant step in full realization of microprocessor based systems. The use of true distributed processing within a local environment produces a powerful high bandwidth system. By supporting multiple operating systems, the present invention provides a powerful base for a diverse set of applications encompassing all data processing environments. The use of multiple back-end file processors for modular additions of disc storage enhances system throughput by off-loading the major portions of I/O activity from the other processors and is a critical feature of the present invention. Because the file processors are true computer systems themselves, they help support the sophisticated applications that may be required in such a computer system, such as data base management, which applications now have the advantage of system-wide availability.

The foregoing was given for purposes of illustration and example and the embodiments recited herein are not considered exhaustive of the invention. For example, the applications to which the present invention is put dictate the number of specialized functional computers within the architecture. Adding more terminals requires additional terminal or cluster processors, adding more mass storage requires the addition of more file processors, etc. Additionally, the present invention is considered to teach the coordination of foreign and dissimilar operating systems. To that end, the operating systems recited herein are provided for exemplary purposes. The present invention may as well be practiced with any computer operating systems, the crucial link between operating systems being the ICC module as described herein. Therefore, the scope of the invention should be limited only by the claims.

Claims

1. A multi-computer computer architecture, comprising: a plurality of specialized function, independent asynchronous parallel computers, each computer having a discrete and independent operating system, said computers being coupled for transparent inter-computer communication at an operating system level across an asynchronous parallel bus.

2. The computer architecture of claim 1, each of said computers further comprising a central processing unit.

3. The computer architecture of claim 2, each of said computers further comprising a memory store.

4. The computer architecture of claim 2, wherein said computers pass messages to each other by placing a message from a message sending computer in the memory store of a message receiving computer.

5. The computer architecture of claim 4, further comprising a doorbell interrupt structure whereby a message sending computer notifies a message receiving computer of the presence of a message in the receiving computer's memory store.

6. The computer architecture of claim 1, further comprising: a plurality of interconnecting slots, each slot configured to couple one of said computers to each of the other computers; and an enclosure defining a plurality of slots and forming an independent, functional multi-computer unit.

7. The computer architecture of claim 6, further comprising a plurality of enclosures, transparently interconnected to define a multi-computer system.

8. A multi-computer computer architecture comprising: a plurality of specialized function, independent asynchronous parallel computers, each computer having a discrete independent operating system, each of said computers including: a central processing unit; and a memory store; and a high-speed asynchronous system bus interconnecting said computers for inter-computer communications, wherein said computers pass messages to each other over said bus by placing a message from a message sending computer in the memory store of a message receiving computer.

9. The computer architecture of claim 8, further comprising: a plurality of interconnecting slots, each slot configured to couple one of said computers to each of the other computers; and an enclosure defining a plurality of slots and forming an independent, functional multi-computer unit.

10. The computer architecture of claim 9, further comprising a plurality of enclosures, transparently interconnected to define a multi-computer system.

11. The computer architecture of claim 8, further comprising a doorbell interrupt structure whereby a message sending computer notifies a message receiving computer of the presence of a message in the receiving computer's memory store.

12. The computer architecture of claim 11, further comprising at least one file processor for controlling secondary storage devices.

13. The computer architecture of claim 12, further comprising a master file processor for initializing and coordinating said computer architecture operation.

14. The computer architecture of claim 11, further comprising an applications processor for processing computer architecture applications.

15. The computer architecture of claim 11, further comprising at least one cluster processor for interconnecting peripheral devices to said computer architecture.

16. The computer architecture of claim 11, further comprising at least one terminal processor for interfacing peripheral devices to said computer architecture.

17. The computer architecture of claim 11, further comprising at least one storage processor for controlling disc and tape storage devices.

18. The computer architecture of claim 11, further comprising: an inter-computer communications module for interfacing said discrete and independent computer operating systems to each other to provide transparent inter-communications therebetween.

19. A multi-computer computer architecture comprising: a plurality of specialized function, independent asynchronous parallel computers, each computer having a discrete independent operating system, each of said computers including: a central processing unit; and a memory store; a high-speed asynchronous system bus interconnecting said computers for inter-computer communications, wherein said computers pass messages to each other over said bus by placing a message from a message sending computer in the memory store of a message receiving computer; a doorbell interrupt structure whereby a message sending computer notifies the message receiving computer of the presence of a message in the receiving computer's memory store; and an inter-computer communications module for interfacing said discrete and independent computer operating systems to each other to provide transparent inter-communications therebetween.

20. The computer architecture of claim 19, further comprising at least one file processor for controlling secondary storage devices.

21. The computer architecture of claim 19, further comprising a master file processor for initializing and coordinating said computer architecture operation.

22. The computer architecture of claim 19, further comprising an applications processor for processing computer architecture applications.

23. The computer architecture of claim 19, further comprising at least one cluster processor for interconnecting peripheral devices to said computer architecture.

24. The computer architecture of claim 19, further comprising at least one terminal processor for interfacing peripheral devices to said computer architecture.

25. The computer architecture of claim 19, further comprising at least one storage processor for controlling disc and tape storage devices.

26. In a multi-computer computer architecture including a plurality of specialized function, asynchronous parallel computers, each computer having a discrete independent operating system, a method for interfacing said discrete independent computer operating systems to each other to provide transparent inter-communications therebetween, comprising: queueing a request code on a service exchange which request code specifies a format of the request and the location of the service exchange for service by a computer architecture process; de-queueing each message by said service exchange; servicing said request; and responding to said request.

27. The method of claim 26, further comprising: setting a lock flag in a communication requestee processor's memory; placing a processor identification code in a communication requestor processor's memory; placing the requestee processor's address in the requestor processor's memory; placing the processor identification code and requestee's address in a circular queue at a requestee's processor; updating all queue pointers; removing said lock flag; interrupting said requestee processor with a doorbell interrupt; determining at the requestee processor whether the message is a request or a response; copying the message from the requestor processor to an area in the requestee processor's memory; determining which exchange the response is to be queued on if the message from the requestor processor is a response; determining if there is a functional unit which services the request of the requestee processor if a message is a request; queueing the message to an addressed service exchange if the message is a request; and sending a response to the requestor processor.