GB2206714A - Multiprocessing architecture - Google Patents

Multiprocessing architecture

Info

Publication number
GB2206714A
Authority
GB
United Kingdom
Prior art keywords
processor
frame
global
data
broadcast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB08811530A
Other versions
GB8811530D0 (en)
GB2206714B (en)
Inventor
Michael Harry Field
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Link Miles Ltd
Original Assignee
Link Miles Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Link Miles Ltd
Publication of GB8811530D0
Publication of GB2206714A
Application granted
Publication of GB2206714B
Anticipated expiration
Expired - Fee Related

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/161 Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)

Abstract

A multiprocessing architecture for use in a frame-based software system has a system bus connecting a plurality (N) of processor circuit cards, one (GLOBAL PROC) of which controls the system operation while the others execute application code. Global data are broadcast between identical areas of memory on each circuit card via the system bus, blocks of data being transferred from one card to all the others simultaneously at a single short interval in a frame, and task control instructions are transmitted in each frame by the global processor (GLOBAL PROC) to the other processor circuit cards. The architecture described ameliorates reduction of performance associated with increasing the number of processors sharing a common bus.

Description

MULTIPROCESSING ARCHITECTURE
The present invention relates to a multiprocessing architecture for use with frame-based software systems.
The relatively low cost of a microprocessor compared with the cost of a minicomputer has made microprocessors attractive for many computationally modest applications, especially as the current 32-bit microprocessor performance (in terms of instructions per second) has reached that previously expected of minicomputers. To increase further the computational performance of a system while retaining the flexibility and cost advantages of microprocessors, techniques have been developed to allow a number of microprocessors to be interconnected and share the processing tasks between them. An arrangement of a plurality of microprocessors or similar processing agents by such techniques is referred to as a multiprocessing architecture.
Typically a multiprocessing architecture has been achieved by employing a multi-master system bus with global memory (e.g., MULTIBUS I, VME) wherein data which is required by more than one agent (i.e. processing circuit) is stored in global memory on the system bus which may be accessed by any agent. The process of accessing the data involves agents arbitrating for use of the system bus since only one agent may "own", i.e. transfer data to or from, the bus at any one time. Whenever an agent attempts to gain access to the global memory while the bus is in use, that agent must wait until the agent currently using the bus has completed its operating cycle(s). If many agents are requesting access simultaneously the delay becomes significant.
Consequently, as further processing power is added to the bus and therefore more system bus accesses are required, all processors spend more time waiting. Typically a limit is reached with seven processors, above which the increase in processing power is balanced by the performance degradation due to bus loading.
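The saturation of a shared global-memory bus can be illustrated with a deliberately crude model. The sketch below assumes that every processor needs the bus for a fixed fraction of its work and that the bus serves one agent at a time; the function name and the choice of a 1/7 bus fraction are illustrative assumptions, picked only so that the knee falls at seven processors, and are not measurements of any real bus.

```python
def effective_throughput(n_processors: int, bus_fraction: float) -> float:
    """Aggregate throughput in units of one unloaded processor.

    Assumption: every processor spends `bus_fraction` of its work on the
    shared bus, and the bus can serve only one agent at a time, so the
    total bus demand can never exceed 1.0.
    """
    if n_processors < 1 or not 0.0 < bus_fraction <= 1.0:
        raise ValueError("need n_processors >= 1 and 0 < bus_fraction <= 1")
    bus_limit = 1.0 / bus_fraction  # throughput at which the bus saturates
    return min(float(n_processors), bus_limit)


if __name__ == "__main__":
    for n in (1, 4, 7, 12, 18):
        print(n, round(effective_throughput(n, 1 / 7), 2))
```

Under these assumptions the throughput grows linearly up to the saturation point and stays flat beyond it, which is the behaviour the broadcast architecture is intended to avoid.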
Two recent multiprocessing architectures developed to minimise the performance impact of inter-processor communication are the INMOS (Registered Trade Mark) Transputer Link and the Bus Interface Chip developed by INTEL (Registered Trade Mark) for MULTIBUS II. In the former architecture inter-processor communication is provided by point-to-point communication links. Since no arbitration is required for communication link access, communication bandwidth is independent of the number of processors; however, global data must propagate along many links in order to reach all processors. In the latter architecture, the process of accessing a system bus is removed from the CPUs of the individual microprocessors and given to respective interface processors. The CPU is thus not slowed down by waiting for access to the bus, but there is a latency associated with the passing of messages (including system interrupts) that degrades performance, particularly where a computation is unable to proceed until certain system data is acquired.
It is an object of the present invention to provide an improved architecture for interconnecting microprocessors or other processing agents which ameliorates the performance reduction associated with increasing the number of microprocessors or other processing agents sharing a common system bus.
It is also an object of the present invention to provide a multiprocessing architecture adapted for a frame-based system which, by taking advantage of the predictability of data transfers in such a system, allows a larger number of processing agents sharing a common system bus to operate at a speed not degraded by the bus or by the number of agents on the bus.
A frame-based software system is a processing system in which processing is carried out on a periodic basis, each period having the same duration and having a predetermined internal sequence of phases.
According to the present invention there is provided a multiprocessing architecture for use with frame-based software systems comprising a system bus connecting a number of processor circuits, wherein the operation of the processor circuits is determined by the broadcasting of global data between the processor circuits and the transmission of task control instructions from one of the said processor circuits to the other processor circuits in each frame.
In a multiprocessing architecture as defined in the preceding paragraph system variables can be held in memory in each processor circuit, and a CPU of the respective circuit can access the said memory as local memory at full speed, so that the performance of each individual processor circuit is not influenced by the number of processor circuits connected to the system bus. Preferably the system memory transfers are performed at a single short moment in the frame, whereby there is no uncertainty over the transfer latency of data issued by one processing circuit for use by another, and all the data is transferred in one block, whereby the risk of a CPU reading a partially updated block of data is removed, thus obviating the need for software precautions and/or bus locking. When each broadcast is performed during a small part of the respective frame, the system bus itself is free for other use (e.g. I/O communication) during the rest of the frame.
The architecture may form part of a larger system comprising additional I/O buses. For example, the invention may be embodied in a multiprocessing architecture comprising eighteen processor circuits in a single cardcage and achieving a system performance of roughly 50 MIPS, the cardcage being a unit of a real time dynamic image processing system such as the image processing system of a flight simulator.
The present invention is based on the realisation that a multiprocessing architecture with a frame-based software system in which each of a plurality of processors has the same library of application routines can be implemented in which it is possible to stipulate which agent is to perform a particular task, and hence compute a particular packet of system data, for each frame, and that the frame may comprise two distinct types of phase: broadcast phases and process phases. During a broadcast phase all the data generated in the previous frame can be broadcast between agents and/or commands can be issued to tell the agents which task each is to perform in the current frame, and in one or more process phases the processing is executed. The data broadcasting technique transfers a packet of data from a source agent to all other agents simultaneously. Each agent in turn can be the source agent, so that an identical copy of system data is present in each agent's system memory at the end of the broadcast phase. The supervisor (a global processor, for example) is an intelligent agent responsible for synchronising the processors within the frame-based software system and for the allocation of tasks and broadcast structure.
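The replication performed in a broadcast phase can be sketched in software as follows. This is only an illustrative model: the dictionaries standing in for system memories, the `owned_regions` layout and the function name are assumptions, and the inner loop stands in for what the description presents as a single simultaneous bus transfer.

```python
def broadcast_phase(system_memories, owned_regions):
    """Replicate every agent's freshly computed packet into all memories.

    system_memories: list of dicts, one per agent (address -> value).
    owned_regions:   owned_regions[i] lists the addresses agent i wrote
                     during the previous frame (hypothetical layout).
    """
    for source, region in enumerate(owned_regions):
        # The source agent's packet: only the addresses it is responsible for.
        packet = {addr: system_memories[source][addr] for addr in region}
        # One bus transfer reaches every other agent at the same time;
        # the loop below models that single simultaneous write.
        for dest, memory in enumerate(system_memories):
            if dest != source:
                memory.update(packet)
    return system_memories


if __name__ == "__main__":
    memories = [
        {0: 11, 1: 0, 2: 0},
        {0: 0, 1: 22, 2: 0},
        {0: 0, 1: 0, 2: 33},
    ]
    broadcast_phase(memories, [[0], [1], [2]])
    print(memories)
```

After every agent has taken its turn as source, all three dictionaries hold the same contents, mirroring the identical copies of system data described above.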
During process phases, each processing agent has unrestricted access to the system data, which is stored in its own system memory, and the total number of processing agents using the system data has no effect on the time to acquire access to system data. Furthermore, the time taken by a broadcast phase in which system data is written into the system memories of the processing agents is small compared with the accumulated delays which would occur if all the processing agents were attempting to access a single global memory.
The broadcast phase or phases (e.g., 10% of one frame) do not constitute idle time for the processing agents since they may continue processing restricted to respective local memories during this time.
Since all system data is made available to all agents and processing agents have the same library of applications routines, tasks can be dynamically reassigned by the supervisor (global processor) on a frame basis, i.e. for each frame if necessary. Since identical software is provided in each agent, a performance increase can be obtained by adding another agent having the same library of applications routines and informing the supervisor (global processor) of its presence.
Thus in a preferred embodiment of the present invention, one processing agent, typically a microprocessor, termed the global processor or supervisor, establishes the frame time and allocates the tasks performed by the other processors, and the tasks allocated to any given processor may be different at each broadcast.
In a preferred embodiment more than one broadcast may be effected in each frame. To achieve a 60 megabytes/second bandwidth, each agent may employ a two-stage pipeline such that the first stage copies a 32 bit word from the system memory of the agent to a register in the agent, and the second stage transfers the 32 bit word from the register to the system bus and hence into the system memory of the other agents.
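The two-stage pipeline can be modelled cycle by cycle. The sketch below is a software analogy only: lists stand in for system memories, a tuple stands in for the register, and the notion of a "cycle" is an assumption made for illustration. It shows why overlapping the memory read with the bus transfer costs just one extra cycle for a whole block.

```python
def pipelined_broadcast(source_memory, destinations):
    """Copy source_memory into every destination via a two-stage pipeline.

    Stage 1 latches a word from memory into a register; stage 2 drives the
    previously latched word onto the bus. Both stages work in the same
    cycle, so n words (n >= 1) take n + 1 cycles. Returns the cycle count.
    """
    register = None      # (address, word) latched by stage 1
    cycles = 0
    address = 0
    while address < len(source_memory) or register is not None:
        # Stage 2: put last cycle's word on the bus; all agents store it.
        if register is not None:
            bus_address, bus_word = register
            for memory in destinations:
                memory[bus_address] = bus_word
        # Stage 1: fetch the next word into the register, if any remain.
        if address < len(source_memory):
            register = (address, source_memory[address])
            address += 1
        else:
            register = None
        cycles += 1
    return cycles


if __name__ == "__main__":
    source = [10, 20, 30, 40]
    others = [[0] * 4, [0] * 4]
    print(pipelined_broadcast(source, others), others)
```

Without the overlap each word would need a read cycle followed by a bus cycle, so the pipelined form roughly halves the transfer time for long blocks in this model.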
The invention will now be described in more detail solely by way of example with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a number of processor circuit cards connected to a common system bus in an example embodiment of the invention;
Fig. 2 is a block diagram of a processor circuit card of Fig. 1; and
Fig. 3 is a timing diagram illustrating phases of a sequence of frames in the operation of the embodiment of Fig. 1.
In Fig. 1, one processor circuit card, the "Global Processor", is responsible for controlling system synchronisation and communication using the system control signals while the remainder execute only application code. As shown in Fig. 2, on each processor card there is an area of system memory which is used for all inter-processor communication. At any particular time, the system memory is connected either to the system bus or to the local bus of the CPU as dictated by the global processor. System control signals from the global processor instruct a respective system memory control in each processor and this control determines the connection of the system memory.
A frame-based realtime software system is employed. At the start of each frame the global processor issues a start-of-frame interrupt and, as soon as the global processor has determined that no system memories are being accessed, switches the system memory of each processor to the system bus and causes data to be broadcast over the system bus to the system memories of all the processors which have the common library of applications routines. In response to each interrupt from the global processor, each processor coupled to the system bus issues a signal to indicate to the global processor that its particular system memory is available for broadcasting. When all these processors have issued this signal, the global processor initiates a broadcast. As a precaution, a timing out is effected by the global processor to ensure broadcasting even if another processor on the system bus has failed.
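The ready-signal handshake and its time-out can be expressed as a small decision function. This is a sketch under stated assumptions: the tick-based timing, the `timeout_ticks` parameter and the function name are hypothetical, since the description does not say how the time-out is measured.

```python
def may_start_broadcast(ready_flags, elapsed_ticks, timeout_ticks):
    """Decide whether the global processor may start the broadcast.

    ready_flags:   one boolean per processor, True once it has signalled
                   that its system memory is available.
    elapsed_ticks: time since the start-of-frame interrupt.
    timeout_ticks: hypothetical limit after which the broadcast proceeds
                   regardless, so one failed card cannot stall the frame.
    Returns (start, missing) where `missing` lists the silent processors.
    """
    missing = [i for i, ready in enumerate(ready_flags) if not ready]
    start = not missing or elapsed_ticks >= timeout_ticks
    return start, missing


if __name__ == "__main__":
    print(may_start_broadcast([True, True, True], 2, 10))
    print(may_start_broadcast([True, False, True], 2, 10))
    print(may_start_broadcast([True, False, True], 10, 10))
```

Returning the list of silent processors alongside the decision lets a supervisor note which card failed to respond, which fits the fault-handling role the description gives to the global processor.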
In the example shown in Fig. 3, there are two broadcasts. A first broadcast, termed the host broadcast, transfers data from a host interface processor to all the other processors on the system bus. The host interface processor does not have the same library of applications routines as the other processors on the system bus, but acts as an intelligent interface between the present multiprocessor architecture and a separate host computer (not shown). After the host broadcast is complete, the global processor issues a start-of-foreground interrupt and in response the "common library" processors process the host data now stored in their system memories in accordance with respective first foreground programs. The global processor determines when all the first foreground program activity has ceased and causes a second broadcast to occur in which each "common library" processor in turn acts as source and simultaneously broadcasts over the system bus to the system memories of the other "common library" processors the contents of a portion of its own system memory. Upon completion of the second broadcast each system memory contains identical data (including that of the global processor itself). In this particular example, the host processor also receives this system data. The CPUs are then given ownership of their system memories again and a continue-foreground interrupt is issued by the global processor. Each processor executes a respective second foreground program on receipt of the continue-foreground interrupt issued by the global processor. Upon completion of the second foreground programs the "common library" processors return to a continuous loop of instructions ("background") until the next start-of-frame interrupt is received.
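The frame sequence of this example can be written down as an ordered table of phases. The table and helper names below are illustrative assumptions; the phase order and the triggering events follow the description, with the simplification that the ready-signal handshake preceding the host broadcast is folded into the start-of-frame entry.

```python
FRAME_PHASES = [
    # (phase, event that starts it, may application code use system memory?)
    ("host broadcast", "start-of-frame interrupt", False),
    ("first foreground", "start-of-foreground interrupt", True),
    ("second broadcast", "all first foreground programs finished", False),
    ("second foreground", "continue-foreground interrupt", True),
    ("background", "second foreground program finished", False),
]


def run_frames(count):
    """Yield (frame number, phase name) in execution order."""
    for frame in range(count):
        for phase, _trigger, _memory_ok in FRAME_PHASES:
            yield frame, phase


def memory_accessible(phase):
    """True if a common-library CPU may use system memory in `phase`."""
    for name, _trigger, memory_ok in FRAME_PHASES:
        if name == phase:
            return memory_ok
    raise KeyError(phase)


if __name__ == "__main__":
    for frame, phase in run_frames(2):
        print(frame, phase, memory_accessible(phase))
```

Keeping the access flag next to each phase makes the rule explicit: application code touches system memory only in the two foreground phases, never while a broadcast is in progress or during background.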
During foreground computation each "common library" processor has access to its system memory for the purpose of executing application software, whereas during the background it only carries out housekeeping tasks since during at least part of the background it does not have access to its system memory. Advantage is taken of the fact that global data read by processors or other agents need not alter during a foreground pass and will not be accessed in background. When the global processor detects that all agents have completed the previous frame's foreground, and hence are all executing background, the global processor is able to start a new frame.
The broadcast mechanism involves the transfer of blocks of data from one agent acting as source to all the others simultaneously. To minimise the time taken for the broadcast a 32-bit multiplexed address/data bus is employed with counters in each system memory control to address the memory. Data is transferred at 60 Mbytes/sec.
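The stated transfer rate can be turned into a rough timing budget. The arithmetic below assumes that 1 Mbyte means 10^6 bytes; the 60 Hz frame rate used in the usage example is a hypothetical figure, while the 10% broadcast share echoes the example given earlier in the description.

```python
BUS_BYTES_PER_SECOND = 60_000_000   # 60 Mbytes/s, assuming 1 Mbyte = 10**6 bytes
WORD_BYTES = 4                      # one 32-bit word per bus transfer


def words_per_second():
    """Number of 32-bit words the bus moves each second."""
    return BUS_BYTES_PER_SECOND // WORD_BYTES


def block_transfer_seconds(block_bytes):
    """Time to broadcast one block of `block_bytes` bytes."""
    return block_bytes / BUS_BYTES_PER_SECOND


def bytes_per_broadcast_window(frame_hz, broadcast_percent):
    """Bytes that fit into the broadcast share of a single frame.

    frame_hz and broadcast_percent are integers so the result is exact.
    """
    return BUS_BYTES_PER_SECOND * broadcast_percent // (100 * frame_hz)


if __name__ == "__main__":
    print(words_per_second())                   # words per second
    print(block_transfer_seconds(60_000))       # seconds for a 60 kbyte block
    print(bytes_per_broadcast_window(60, 10))   # bytes per frame at 60 Hz, 10%
```

Under these assumptions the bus carries 15 million words per second, so a frame that devotes a tenth of its time to broadcasting at a 60 Hz frame rate could move about 100 kbytes of global data.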
A detailed description of one implementation of a multiprocessing architecture in which global data is broadcast at regular intervals over a common bus to processing agents connected to the bus is given in British patent application no. 8512097, publication no. GB 2175421A, and the techniques described therein can be used to implement the data broadcasts of an embodiment of the present invention.
As well as transferring global data between agents the broadcast mechanism allows the global processor to instruct other agents as to which tasks they should perform in the ensuing foreground, and allows the agents to report their status back to the global processor. This permits the global processor to reallocate tasks to ensure an even distribution of workload among agents and, in the event of a failure, to elect to shut an agent down. This latter facility renders the architecture particularly suitable for applications requiring fault tolerance.
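One way a supervisor might use the reported status to rebalance work is sketched below. The greedy "most expensive task first" heuristic, the cost figures, the task names and the status strings are all assumptions introduced for illustration; the description does not specify an allocation algorithm.

```python
def allocate_tasks(task_costs, agent_status):
    """Assign each task to the least loaded healthy agent.

    task_costs:   dict task name -> estimated cost (hypothetical units).
    agent_status: dict agent id -> "ok" or "failed", as reported back to
                  the global processor during the previous broadcast.
    Returns dict agent id -> list of task names; failed agents get none.
    """
    healthy = sorted(a for a, s in agent_status.items() if s == "ok")
    if not healthy:
        raise RuntimeError("no healthy agents left")
    load = {agent: 0 for agent in healthy}
    plan = {agent: [] for agent in agent_status}
    # Place the most expensive tasks first (a simple greedy heuristic).
    for task in sorted(task_costs, key=lambda t: (-task_costs[t], t)):
        target = min(healthy, key=lambda a: (load[a], a))
        plan[target].append(task)
        load[target] += task_costs[task]
    return plan


if __name__ == "__main__":
    costs = {"terrain": 5, "lights": 3, "fog": 2, "sky": 2}
    status = {1: "ok", 2: "failed", 3: "ok"}
    print(allocate_tasks(costs, status))
```

Because every agent holds the same routine library and the same system data, a plan like this can be recomputed each frame and a failed agent simply receives an empty task list.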
The "data" broadcast to the system memories of the "common library" processors from the system memory of the global processor can consist of task allocation instructions, each such instruction being stored at a location or group of locations uniquely allocated to a respective one of the "common library" processors so that, in operation after the broadcast, each "common library" processor reads the contents of its respective task allocation instruction location or group of locations and accordingly executes the allocated application routine. The library of applications routines can be stored in each "common library" processor in read only or read/write memory, e.g. ROM or RAM. In a preferred embodiment the library is stored in RAM and is loaded at each power up of the system.
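The per-processor task allocation locations can be pictured as a small table indexed by processor number. In the sketch below the base address, the one-slot-per-processor layout and the use of a routine index as the instruction are hypothetical choices; the description only requires that each processor has its own uniquely allocated location or group of locations.

```python
TASK_SLOT_BASE = 0x100  # hypothetical address of processor 0's slot


def slot_address(processor_id):
    """Address of the task allocation slot reserved for one processor."""
    return TASK_SLOT_BASE + processor_id


def run_allocated_task(processor_id, system_memory, library):
    """Read this processor's slot and execute the routine it names.

    system_memory: dict address -> value, identical on every card after
                   the broadcast.
    library:       list of callables, the same on every card.
    """
    routine_index = system_memory[slot_address(processor_id)]
    return library[routine_index]()


if __name__ == "__main__":
    library = [lambda: "idle", lambda: "update terrain", lambda: "update lights"]
    # Slots as written by the global processor and then broadcast.
    memory = {slot_address(0): 1, slot_address(1): 2, slot_address(2): 0}
    for pid in range(3):
        print(pid, run_allocated_task(pid, memory, library))
```

Since the memory contents and the library are the same on every card, the only thing that differs between processors is which slot each one reads.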
The system bus can be used as an Input/Output bus whenever there is no broadcast.

Claims (5)

1. A multiprocessing architecture for use with frame-based software systems comprising a system bus connecting a number of processor circuits, wherein the operation of the processor circuits is determined by the broadcasting of global data between the processor circuits and the transmission of task control instructions from one of the said processor circuits to the other processor circuits in each frame.
2. A multiprocessing architecture according to claim 1, wherein system variables are held in memory in each processor circuit, and a CPU of the respective circuit accesses the said memory as local memory at full speed.
3. A multiprocessing architecture according to claim 1 or 2, wherein the broadcasting of global data from any one of the said processor circuits to the others is effected by transferring the global data in one block.
4. A multiprocessing architecture according to claim 1, wherein each broadcast is performed during a small part of the respective frame, and the system bus itself is free for other use (e.g. I/O communication) during the rest of the frame.
5. A multiprocessing architecture according to any preceding claim, wherein the architecture forms part of a larger system comprising additional I/O buses.
GB8811530A 1987-05-18 1988-05-16 Multiprocessing architecture Expired - Fee Related GB2206714B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB878711663A GB8711663D0 (en) 1987-05-18 1987-05-18 Multiprocessing architecture

Publications (3)

Publication Number Publication Date
GB8811530D0 GB8811530D0 (en) 1988-06-22
GB2206714A GB2206714A (en) 1989-01-11
GB2206714B GB2206714B (en) 1991-05-15

Family

ID=10617500

Family Applications (2)

Application Number Title Priority Date Filing Date
GB878711663A Pending GB8711663D0 (en) 1987-05-18 1987-05-18 Multiprocessing architecture
GB8811530A Expired - Fee Related GB2206714B (en) 1987-05-18 1988-05-16 Multiprocessing architecture

Country Status (1)

Country Link
GB (2) GB8711663D0 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2175421A (en) * 1985-05-13 1986-11-26 Singer Link Miles Ltd Computing system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0431467A1 (en) * 1989-12-04 1991-06-12 BULL HN INFORMATION SYSTEMS ITALIA S.p.A. Multiprocessor system having distributed shared resources and dynamic global data replication
EP0472753A1 (en) * 1990-08-28 1992-03-04 BULL HN INFORMATION SYSTEMS ITALIA S.p.A. Multiprocessor system having selective global data replication
GB2302743B (en) * 1995-06-26 2000-02-16 Sony Uk Ltd Processing apparatus
WO2005096143A1 (en) * 2004-03-31 2005-10-13 Coware, Inc. Resource management in a multicore architecture
US8533716B2 (en) 2004-03-31 2013-09-10 Synopsys, Inc. Resource management in a multicore architecture
US9779042B2 (en) 2004-03-31 2017-10-03 Synopsys, Inc. Resource management in a multicore architecture
US10268609B2 (en) 2004-03-31 2019-04-23 Synopsys, Inc. Resource management in a multicore architecture
US9164953B2 (en) 2005-09-30 2015-10-20 Synopsys, Inc. Scheduling in a multicore architecture
US9286262B2 (en) 2005-09-30 2016-03-15 Synopsys, Inc. Scheduling in a multicore architecture
US9442886B2 (en) 2005-09-30 2016-09-13 Synopsys, Inc. Scheduling in a multicore architecture

Also Published As

Publication number Publication date
GB8811530D0 (en) 1988-06-22
GB2206714B (en) 1991-05-15
GB8711663D0 (en) 1987-06-24

Similar Documents

Publication Publication Date Title
US4959781A (en) System for assigning interrupts to least busy processor that already loaded same class of interrupt routines
US5437042A (en) Arrangement of DMA, interrupt and timer functions to implement symmetrical processing in a multiprocessor computer system
EP0306252B1 (en) Fault tolerant computer system input/output interface
EP0166272B1 (en) Processor bus access
US5367690A (en) Multiprocessing system using indirect addressing to access respective local semaphore registers bits for setting the bit or branching if the bit is set
US4449183A (en) Arbitration scheme for a multiported shared functional device for use in multiprocessing systems
US4754398A (en) System for multiprocessor communication using local and common semaphore and information registers
JPH02236735A (en) Data processing method and apparatus
US5271020A (en) Bus stretching protocol for handling invalid data
CA2009055A1 (en) Arbitration of bus access in digital computers
US6085273A (en) Multi-processor computer system having memory space accessible to multiple processors
JPS5837585B2 (en) Keisan Kisouchi
JPH04348451A (en) Parallel computer
US7565659B2 (en) Light weight context switching
GB2206714A (en) Multiprocessing architecture
US5590338A (en) Combined multiprocessor interrupt controller and interprocessor communication mechanism
US5517671A (en) System for designating a plurality of I/O devices to a plurality of I/O channels and connecting and buffering the plurality of I/O channels to a single system bus
KR960005395B1 (en) Minimum contention processor and system bus system
EP0318270B1 (en) A multiprocessor system and corresponding method
KR100921504B1 (en) Apparatus and method for communication between processors in Multiprocessor SoC system
Toong et al. A general multi-microprocessor interconnection mechanism for non-numeric processing
KR950008393B1 (en) Arbeiter delay circuit for multiprocessor system
Dejian et al. Asymmetric hardware and software integration design based on multi-core processor
KR930005843B1 (en) Method for controlling subprocessor in multiprocessor system
JPS6145348A (en) Bus priority control system

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20000516