GB2206714A

GB2206714A - Multiprocessing architecture

Info

Publication number: GB2206714A
Application number: GB08811530A
Authority: GB
Inventors: Michael Harry Field
Original assignee: Link Miles Ltd
Current assignee: Link Miles Ltd
Priority date: 1987-05-18
Filing date: 1988-05-16
Publication date: 1989-01-11
Anticipated expiration: 2008-05-16
Also published as: GB8811530D0; GB2206714B; GB8711663D0

Abstract

A multiprocessing architecture for use in a frame-based software system has a system bus connecting a plurality (N) of processor circuit cards, one (GLOBAL PROC) of which controls the system operation while the others execute application code. Global data are broadcast between identical areas of memory on each circuit card via the system bus, blocks of data being transferred from one card to all the others simultaneously at a single short interval in a frame, and task control instructions are transmitted in each frame by the global processor (GLOBAL PROC) to the other processor circuit cards. The architecture described ameliorates reduction of performance associated with increasing the number of processors sharing a common bus.

Description

MULTIPROCESSING ARCHITECTURE

The present invention relates to a multiprocessing architecture for use with frame-based software systems.

The relatively low cost of a microprocessor compared with the cost of a minicomputer has made microprocessors attractive for many computationally modest applications, especially as current 32-bit microprocessor performance (in terms of instructions per second) has reached that previously expected of minicomputers. To increase further the computational performance of a system while retaining the flexibility and cost advantages of microprocessors, techniques have been developed to allow a number of microprocessors to be interconnected and to share the processing tasks between them. An arrangement of a plurality of microprocessors or similar processing agents by such techniques is referred to as a multiprocessing architecture.

Typically a multiprocessing architecture has been achieved by employing a multi-master system bus with global memory (e.g., MULTIBUS I, VME), wherein data which is required by more than one agent (i.e. processing circuit) is stored in global memory on the system bus, which may be accessed by any agent. The process of accessing the data involves agents arbitrating for use of the system bus, since only one agent may "own", i.e. transfer data to or from, the bus at any one time. Whenever an agent attempts to gain access to the global memory while the bus is in use, that agent must wait until the agent currently using the bus has completed its operating cycle(s). If many agents are requesting access simultaneously the delay becomes significant. Consequently, as further processing power is added to the bus and therefore more system bus accesses are required, all processors spend more time waiting. Typically a limit is reached with seven processors, above which the increase in processing power is balanced by the performance degradation due to bus loading.

Two recent multiprocessing architectures developed to minimise the performance impact of inter-processor communication are the INMOS (Registered Trade Mark) Transputer Link and the Bus Interface Chip developed by INTEL (Registered Trade Mark) for MULTIBUS II. In the former architecture inter-processor communication is provided by point-to-point communication links. Since no arbitration is required for communication link access, communication bandwidth is independent of the number of processors; however, global data must propagate along many links in order to reach all processors. In the latter architecture, the process of accessing a system bus is removed from the CPUs of the individual microprocessors and given to respective interface processors. The CPU is thus not slowed down by waiting for access to the bus, but there is a latency associated with the passing of messages (including system interrupts) that degrades performance, particularly where a computation is unable to proceed until certain system data is acquired.

It is an object of the present invention to provide an improved architecture for interconnecting microprocessors or other processing agents which ameliorates the performance reduction associated with increasing the number of microprocessors or other processing agents sharing a common system bus.

It is also an object of the present invention to provide a multiprocessing architecture adapted for a frame-based system which, by taking advantage of the predictability of data transfers in such a system, allows a larger number of processing agents sharing a common system bus to operate at a speed not degraded by the bus or by the number of agents on the bus.

A frame-based software system is a processing system in which processing is carried out on a periodic basis, each period having the same duration and having a predetermined internal sequence of phases.

According to the present invention there is provided a multiprocessing architecture for use with frame-based software systems comprising a system bus connecting a number of processor circuits, wherein the operation of the processor circuits is determined by the broadcasting of global data between the processor circuits and the transmission of task control instructions from one of the said processor circuits to the other processor circuits in each frame.

In a multiprocessing architecture as defined in the preceding paragraph, system variables can be held in memory in each processor circuit, and a CPU of the respective circuit can access the said memory as local memory at full speed, so that the performance of each individual processor circuit is not influenced by the number of processor circuits connected to the system bus. Preferably the system memory transfers are performed at a single short moment in the frame, whereby there is no uncertainty over the transfer latency of data issued by one processing circuit for use by another, and all the data is transferred in one block, whereby the risk of a CPU reading a partially updated block of data is removed, thus obviating the need for software precautions and/or bus locking. When each broadcast is performed during a small part of the respective frame, the system bus itself is free for other use (e.g. I/O communication) during the rest of the frame.

The architecture may form part of a larger system comprising additional I/O buses. For example, the invention may be embodied in a multiprocessing architecture comprising eighteen processor circuits in a single cardcage and achieving a system performance of roughly 50 MIPS, the cardcage being a unit of a real-time dynamic image processing system such as the image processing system of a flight simulator.

The present invention is based on the realisation that a multiprocessing architecture with a frame-based software system, in which each of a plurality of processors has the same library of application routines, can be implemented in which it is possible to stipulate which agent is to perform a particular task, and hence compute a particular packet of system data, for each frame, and that the frame may comprise two distinct types of phase: broadcast phases and process phases. During a broadcast phase all the data generated in the previous frame can be broadcast between agents and/or commands can be issued to tell the agents which task each is to perform in the current frame, and in one or more process phases the processing is executed. The data broadcasting technique transfers a packet of data from a source agent to all other agents simultaneously. Each agent in turn can be the source agent, so that an identical copy of system data is present in each agent's system memory at the end of the broadcast phase. The supervisor (a global processor, for example) is an intelligent agent responsible for synchronising the processors within the frame-based software system and for the allocation of tasks and broadcast structure.
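The broadcast phase can be illustrated with a small software simulation. This is a minimal sketch and not the hardware of the invention: the `Agent` class, the layout in which each agent owns one contiguous region of system memory, and the sizes used are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

WORDS_PER_AGENT = 4  # assumed size of the region each agent owns


@dataclass
class Agent:
    index: int
    n_agents: int
    system_memory: list = field(default_factory=list)

    def __post_init__(self):
        # Every agent holds a full-size copy of system memory.
        self.system_memory = [0] * (self.n_agents * WORDS_PER_AGENT)

    def own_region(self):
        start = self.index * WORDS_PER_AGENT
        return range(start, start + WORDS_PER_AGENT)

    def compute(self, frame):
        # Process phase: write results only into this agent's own region.
        for offset, addr in enumerate(self.own_region()):
            self.system_memory[addr] = 1000 * frame + 10 * self.index + offset


def broadcast_phase(agents):
    # Each agent in turn acts as source; its block is written to all others.
    for source in agents:
        region = source.own_region()
        block = [source.system_memory[a] for a in region]
        for dest in agents:
            if dest is not source:
                for addr, word in zip(region, block):
                    dest.system_memory[addr] = word


agents = [Agent(i, 3) for i in range(3)]
for agent in agents:
    agent.compute(frame=1)
broadcast_phase(agents)
```

After `broadcast_phase` returns, every agent's `system_memory` holds the same contents, which is the property the description relies on for the following process phase.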

During process phases, each processing agent has unrestricted access to the system data, which is stored in its own system memory, and the total number of processing agents using the system data has no effect on the time to acquire access to system data. Furthermore, the time taken by a broadcast phase in which system data is written into the system memories of the processing agents is small compared with the accumulated delays which would occur if all the processing agents were attempting to access a single global memory.

The broadcast phase or phases (e.g., 10% of one frame) do not constitute idle time for the processing agents since they may continue processing restricted to respective local memories during this time.

Since all system data is made available to all agents and processing agents have the same library of applications routines, tasks can be dynamically reassigned by the supervisor (global processor) on a frame basis, i.e. for each frame if necessary. Since identical software is provided in each agent, a performance increase can be obtained by adding another agent having the same library of applications routines and informing the supervisor (global processor) of its presence.
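A minimal sketch of per-frame task assignment is given below. The description does not specify how the supervisor distributes tasks, so the round-robin policy, the function name `allocate_tasks` and the task and agent names are assumptions chosen only to show that the allocation can change from one frame to the next when an agent is added.

```python
def allocate_tasks(task_names, agent_ids):
    """Assign tasks to agents round-robin (an assumed policy, for illustration)."""
    allocation = {agent: [] for agent in agent_ids}
    for i, task in enumerate(task_names):
        allocation[agent_ids[i % len(agent_ids)]].append(task)
    return allocation


tasks = ["t0", "t1", "t2", "t3", "t4", "t5"]

# Frame n: three agents are known to the supervisor.
frame_n = allocate_tasks(tasks, ["A", "B", "C"])

# Frame n+1: a fourth agent with the same library has been added.
frame_n1 = allocate_tasks(tasks, ["A", "B", "C", "D"])
```

Because every agent carries the same library, any task name can be given to any agent without loading new code.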

Thus in a preferred embodiment of the present invention, one processing agent, typically a microprocessor, termed the global processor or supervisor, establishes the frame time and allocates the tasks performed by the other processors, and the tasks allocated to any given processor may be different at each broadcast.

In a preferred embodiment more than one broadcast may be effected in each frame. To achieve a 60 megabytes/second bandwidth, each agent may employ a two-stage pipeline such that the first stage copies a 32 bit word from the system memory of the agent to a register in the agent, and the second stage transfers the 32 bit word from the register to the system bus and hence into the system memory of the other agents.
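The two-stage pipeline can be modelled in software as follows. This is a sketch under stated assumptions: one loop iteration stands for one bus cycle, the function name `pipelined_broadcast` is hypothetical, and "60 megabytes/second" is taken to mean 60 × 10^6 bytes per second.

```python
def pipelined_broadcast(source_memory):
    """Simulate the two-stage transfer; returns (words on bus, cycles used)."""
    register = None          # holding register between the two stages
    bus_words = []           # words driven onto the system bus, in order
    cycles = 0
    read_index = 0
    while read_index < len(source_memory) or register is not None:
        # Stage 2: drive the word latched in the previous cycle onto the bus.
        if register is not None:
            bus_words.append(register)
        # Stage 1: latch the next word from system memory into the register.
        if read_index < len(source_memory):
            register = source_memory[read_index]
            read_index += 1
        else:
            register = None
        cycles += 1
    return bus_words, cycles


words, cycles = pipelined_broadcast([0x11, 0x22, 0x33, 0x44])

BYTES_PER_WORD = 4
word_rate = 60e6 / BYTES_PER_WORD      # 32-bit words per second (assumed decimal megabytes)
word_period_ns = 1e9 / word_rate       # time available per word on the bus
```

In this model a block of N words occupies N + 1 cycles, because the memory read of one word overlaps the bus transfer of the previous one. Under the stated assumption the bus must accept one 32-bit word roughly every 67 ns.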

The invention will now be described in more detail solely by way of example with reference to the accompanying drawings, in which:

Fig. 1 is a block diagram of a number of processor circuit cards connected to a common system bus in an example of an embodiment of the invention;

Fig. 2 is a block diagram of a processor circuit card of Fig. 1; and

Fig. 3 is a timing diagram illustrating phases of a sequence of frames in the operation of the embodiment of Fig. 1.

In Fig. 1, one processor circuit card, the "Global Processor", is responsible for controlling system synchronisation and communication using the system control signals while the remainder execute only application code. As shown in Fig. 2, on each processor card there is an area of system memory which is used for all inter-processor communication. At any particular time, the system memory is connected either to the system bus or to the local bus of the CPU as dictated by the global processor. System control signals from the global processor instruct a respective system memory control in each processor and this control determines the connection of the system memory. A frame-based real-time software system is employed. At the start of each frame the global processor issues a start-of-frame interrupt and, as soon as the global processor has determined that no system memories are being accessed, switches the system memory of each processor to the system bus and causes data to be broadcast over the system bus to the system memories of all the processors which have the common library of applications routines. In response to each interrupt from the global processor, each processor coupled to the system bus issues a signal to indicate to the global processor that its particular system memory is available for broadcasting. When all these processors have issued this signal, the global processor initiates a broadcast. As a precaution, a time-out is effected by the global processor to ensure broadcasting even if another processor on the system bus has failed.
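The start-of-frame handshake with its time-out can be sketched as below. The function name `wait_for_ready`, the use of abstract time units and the representation of a failed processor as `None` are assumptions for illustration; the description states only that the global processor waits for every availability signal and times out if a processor has failed.

```python
def wait_for_ready(ready_times, timeout):
    """Return (broadcast start time, list of agents that missed the time-out).

    ready_times maps agent name -> time at which it signals that its system
    memory is available, or None if the agent never responds.
    """
    late = [name for name, t in ready_times.items() if t is None or t > timeout]
    if late:
        # At least one agent did not respond in time: broadcast anyway.
        return timeout, late
    # All agents responded: broadcast as soon as the last signal arrives.
    return max(ready_times.values(), default=0), late


all_ok = wait_for_ready({"P1": 3, "P2": 5, "P3": 4}, timeout=20)
one_failed = wait_for_ready({"P1": 3, "P2": None, "P3": 4}, timeout=20)
```

In the first call the broadcast starts when the slowest processor reports; in the second it starts at the time-out and the silent processor is identified.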

In the example shown in Fig. 3, there are two broadcasts. A first broadcast, termed the host broadcast, transfers data from a host interface processor to all the other processors on the system bus. The host interface processor does not have the same library of applications routines as the other processors on the system bus, but acts as an intelligent interface between the present multiprocessor architecture and a separate host computer (not shown). After the host broadcast is complete, the global processor issues a start-of-foreground interrupt and in response the "common library" processors process the host data now stored in their system memories in accordance with respective first foreground programs. The global processor determines when all the first foreground program activity has ceased and causes a second broadcast to occur in which each "common library" processor in turn acts as source and simultaneously broadcasts over the system bus to the system memories of the other "common library" processors the contents of a portion of its own system memory. Upon completion of the second broadcast each system memory contains identical data (including that of the global processor itself). In this particular example, the host processor also receives this system data. The CPUs are then given ownership of their system memories again and a continue-foreground interrupt is issued by the global processor. Each processor executes a respective second foreground program on receipt of the continue-foreground interrupt issued by the global processor. Upon completion of the second foreground programs the "common library" processors return to a continuous loop of instructions ("background") until the next start-of-frame interrupt is received.
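The order of events within one frame of this example can be written out as a small table-driven sketch. The names `Owner`, `FRAME_STEPS`, `INTERRUPT_BEFORE` and `run_frame` are hypothetical; only the sequence of interrupts, broadcasts and foreground programs is taken from the description.

```python
from enum import Enum


class Owner(Enum):
    BUS = "system bus"
    CPU = "local CPU bus"


# (step, connection of each system memory during that step)
FRAME_STEPS = [
    ("host broadcast", Owner.BUS),
    ("first foreground", Owner.CPU),
    ("second broadcast", Owner.BUS),
    ("second foreground", Owner.CPU),
]

# Interrupt issued by the global processor before a given step, if any.
INTERRUPT_BEFORE = {
    "host broadcast": "start-of-frame",
    "first foreground": "start-of-foreground",
    "second foreground": "continue-foreground",
}


def run_frame():
    """Return the ordered log of events for one frame."""
    log = []
    for step, owner in FRAME_STEPS:
        interrupt = INTERRUPT_BEFORE.get(step)
        if interrupt:
            log.append(f"interrupt: {interrupt}")
        log.append(f"{step} (system memory on {owner.value})")
    log.append("background until next start-of-frame")
    return log


frame_log = run_frame()
```

Printing `frame_log` line by line gives a textual counterpart to the phases of one frame: the system memories alternate between the system bus for broadcasts and the local CPU bus for foreground processing.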

During foreground computation each "common library" processor has access to its system memory for the purpose of executing application software, whereas during the background it only carries out housekeeping tasks since during at least part of the background it does not have access to its system memory. Advantage is taken of the fact that global data read by processors or other agents need not alter during a foreground pass and will not be accessed in background. When the global processor detects that all agents have completed the previous frame's foreground, and hence are all executing background, the global processor is able to start a new frame.

The broadcast mechanism involves the transfer of blocks of data from the one agent acting as source to all the others simultaneously. To minimise the time taken for the broadcast, a 32-bit multiplexed address/data bus is employed with counters in each system memory control to address the memory. Data is transferred at 60 Mbytes/sec.
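One way the counters could be used is sketched below. It is an assumption, not a statement of the actual design, that a start address is sent once on the multiplexed bus and that each receiving memory control then increments its own counter for every data word; the class and function names are likewise hypothetical.

```python
class SystemMemoryControl:
    """Receiving side: a memory plus an address counter (assumed behaviour)."""

    def __init__(self, size):
        self.memory = [0] * size
        self.counter = 0

    def load_address(self, address):
        # Address cycle: latch the start address from the multiplexed bus.
        self.counter = address

    def write_word(self, word):
        # Data cycle: store the word and advance the local counter.
        self.memory[self.counter] = word
        self.counter += 1


def broadcast_block(start_address, block, receivers):
    """Write one block into every receiver; returns the bus cycles used."""
    for r in receivers:
        r.load_address(start_address)
    for word in block:
        for r in receivers:
            r.write_word(word)
    return 1 + len(block)  # one address cycle plus one cycle per data word


receivers = [SystemMemoryControl(8) for _ in range(3)]
bus_cycles = broadcast_block(2, [7, 8, 9], receivers)
```

With this arrangement the address occupies the shared lines only once per block, so almost every bus cycle carries data.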

A detailed description of one implementation of a multiprocessing architecture in which global data is broadcast at regular intervals over a common bus to processing agents connected to the bus is given in British patent application no. 8512097, publication no. GB 2175421A, and the techniques described therein can be used to implement the data broadcasts of an embodiment of the present invention.

As well as transferring global data between agents the broadcast mechanism allows the global processor to instruct other agents as to which tasks they should perform in the ensuing foreground, and allows the agents to report their status back to the global processor. This permits the global processor to reallocate tasks to ensure an even distribution of workload among agents and, in the event of a failure, to elect to shut an agent down. This latter facility renders the architecture particularly suitable for applications requiring fault tolerance.
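The use of status reports for workload balancing and shut-down can be sketched as follows. The description does not give the balancing rule, so the greedy "most expensive task to the least-loaded agent" policy, the boolean status encoding, the per-task cost estimates and the name `rebalance` are all assumptions.

```python
import heapq


def rebalance(task_costs, status):
    """Return (allocation for healthy agents, agents to shut down).

    task_costs maps task name -> estimated cost (hypothetical units).
    status maps agent name -> True if the agent reported itself healthy.
    """
    healthy = sorted(a for a, ok in status.items() if ok)
    shut_down = sorted(a for a, ok in status.items() if not ok)
    if not healthy:
        raise RuntimeError("no healthy agents available")
    heap = [(0, a) for a in healthy]       # (current load, agent)
    heapq.heapify(heap)
    allocation = {a: [] for a in healthy}
    # Place the most expensive tasks first, each on the least-loaded agent.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, agent = heapq.heappop(heap)
        allocation[agent].append(task)
        heapq.heappush(heap, (load + cost, agent))
    return allocation, shut_down


allocation, shut_down = rebalance(
    {"a": 5, "b": 3, "c": 3, "d": 1},
    {"P1": True, "P2": True, "P3": False},
)
```

Here the failed agent `P3` receives no tasks, and the remaining work is split so that both healthy agents carry an equal estimated load.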

The "data" broadcast to the system memories of the "common library" processors from the system memory of the global processor can consist of task allocation instructions, each such instruction being stored at a location or group of locations uniquely allocated to a respective one of the "common library" processors so that, in operation after the broadcast, each "common library" processor reads the contents of its respective task allocation instruction location or group of locations and accordingly executes the allocated application routine. The library of applications routines can be stored in each "common library" processor in read-only or read/write memory, e.g. ROM or RAM. In a preferred embodiment the library is stored in RAM and is loaded at each power-up of the system.
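The task allocation slots can be sketched as a lookup followed by a dispatch into the common library. The slot base address, the one-word-per-processor layout, the routine identifiers and the routines themselves are hypothetical; the description states only that each processor reads its own uniquely allocated location and runs the routine named there.

```python
# Hypothetical library of application routines, identical on every processor.
LIBRARY = {
    1: lambda x: x + 1,
    2: lambda x: x * 2,
    3: lambda x: x * x,
}

TASK_SLOT_BASE = 0x10  # assumed start of the task-allocation area


def slot_address(processor_id):
    # One word per processor, at a location unique to that processor.
    return TASK_SLOT_BASE + processor_id


def run_allocated_task(processor_id, system_memory, operand):
    # Read this processor's slot, then execute the routine it names.
    routine_id = system_memory[slot_address(processor_id)]
    return LIBRARY[routine_id](operand)


# Contents as they might appear after the global processor's broadcast.
system_memory = [0] * 32
system_memory[slot_address(0)] = 2
system_memory[slot_address(1)] = 3

result_p0 = run_allocated_task(0, system_memory, 5)
result_p1 = run_allocated_task(1, system_memory, 5)
```

Since every processor holds the same broadcast copy of `system_memory`, each one can find its instruction locally without further bus traffic.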

The system bus can be used as an Input/Output bus whenever there is no broadcast.

Claims

1. A multiprocessing architecture for use with frame-based software systems comprising a system bus connecting a number of processor circuits, wherein the operation of the processor circuits is determined by the broadcasting of global data between the processor circuits and the transmission of task control instructions from one of the said processor circuits to the other processor circuits in each frame.

2. A multiprocessing architecture according to claim 1, wherein system variables are held in memory in each processor circuit, and a CPU of the respective circuit accesses the said memory as local memory at full speed.

3. A multiprocessing architecture according to claim 1 or 2, wherein the broadcasting of global data from any one of the said processor circuits to the others is effected by transferring the global data in one block.

4. A multiprocessing architecture according to claim 1, wherein each broadcast is performed during a small part of the respective frame, and the system bus itself is free for other use (e.g. I/O communication) during the rest of the frame.

5. A multiprocessing architecture according to any preceding claim, wherein the architecture forms part of a larger system comprising additional I/O buses.