DE10134981B4

DE10134981B4 - Massively parallel coupled multiprocessor system

Info

Publication number: DE10134981B4
Application number: DE10134981.5A
Authority: DE
Inventors: Inventor is the same as the patent holder
Original assignee: Individual
Current assignee: Individual
Priority date: 2001-07-16
Filing date: 2001-07-16
Publication date: 2024-05-29
Anticipated expiration: 2021-07-17
Also published as: DE10134981A1

Abstract

Multiprocessor system, characterized in that
- a plurality (1, 2, 3, 4, 5..10, 11..20 or more) of process modules (PMs) is present, each PM consisting of at least one processor (PZ) chip with internal or external memory (so physical memory does not necessarily have to be present on the PM itself);
- at least one central logic chip is present, referred to by its function as the CentralArray (CA);
- the individual process modules (PMs) are connected to the CentralArray (CA) either directly (via the connections relevant to processor communication, typically address, data and WR/RD control connections) or via an intermediate component called the SignalExpander (SE), explained further below, with at least one connection/pin of the CA being occupied for each connected PM (or PM, PN, PZ or SE); these individual ports in the CA are referred to as SignalExpanderPorts (SEPT);
- the CentralArray (CA) is supplied with at least one clock signal (or oscillator);
- the logic circuits and register sets in the CentralArray (CA) are designed/connected such that
  - the SignalExpanderPort (SEPT) is connected to the SignalExpanderRegisterSet (SERS) by means of logic elements;
  - a logic state machine (SM) associated with each SEPT is present which, based on the command sent by the PZ (called the PZ command), controls the transfer of the required addresses and data between the SEPT and the relevant logic elements of the CA, so that this SM (in the CA) controls the transfer PZ - CA or PZ - SE - CA;
- the logic circuits and register sets in the CentralArray (CA) are designed/connected such that
  - its SignalExpanderRegisterSet (SERS) has separate registers for addresses and data (each with the required bit width) and for control flags, the control flags containing at least flags corresponding to 'WR' for write and 'RD' for read;
  - logic circuits are present which, based on the address bits supplied (by PM, PN, PZ or SE), effect the transfer of the required data from/to the SERS and thus from/to the SEPT;
- the logic circuits and register sets in the CentralArray (CA) are designed/connected such that
  - the state machine (SM) associated with the respective SignalExpanderPort (SEPT), when in the inactive state, scans the signals on the SEPT for known bit patterns and thus for known commands (from the PM, PN, PZ (with a PZ command) or SE (with a PZ command delivered in PZ packets)), and, when a PZ command is recognized, the correspondingly required state is set in the SM, whereby
    - in the case of a recognized write command, the supplied addresses and data are transferred into the registers provided for them, and after the transfer the SM returns to the inactive state;
    - in the case of a recognized read command, the supplied addresses are transferred into the registers provided for them, then the data corresponding to the supplied address are transferred to the SEPT, and after the transfer the SM returns to the inactive state;
- logic circuits with sequential control are present in the CentralArray (CA) which (through continuous access to a plurality of logic elements connected to connection ports, and the transfer of the corresponding addresses and data to/from a memory, the central memory (CentralSpeicher, CS)) function as a variable multi-port memory machine (MultiPortSpeicherMaschine, MPSM), defined here, that is scalable in port size and port count, that is, as a machine through which the central memory (CS) can be accessed via several ports completely asynchronously, as explained in the following:
  - at least one memory (module), named the central memory (CS), is present inside or outside the CentralArray (CA) (with address, data and control signals);
  - the logic circuit present in the MPSM is connected to the signal connections of the central memory (CS) by means of logic elements called the central memory transceiver (CentralSpeicherTransceiver, CSTR);
  - the logic circuit present in the MPSM contains a plurality of connection ports, whereby
    - individual connection ports are intended for the completely independent and asynchronous read or write access of an individual process node (PN) to the central memory (CS);
    - each connection port has connections and is connected to logic elements called the port register set (PTRS), so that it can transmit address, data and control signals;
    - each port register set (PTRS) has separate registers for addresses, data and control flags, the control flags containing at least flags corresponding to 'WR' for write and 'RD' for read;
  - the port register sets (PTRSs) are designed/connected such that (initiated by the PN)
    - for writing data, the address and data bits of the data to be written are stored in the address and data registers of the PTRS, and the WR flag is activated;
    - for reading data, the address bits of the data to be read are stored in the address register of the PTRS, the RD flag is activated, and the data can be read from the data register of the PTRS;
  - the port register sets (PTRSs) and the central memory transceiver (CSTR) are designed/connected, and the connection structures are created, such that the address and data bits in any PTRS can be transferred to the CS, and the data bits of the CS can be transferred to any PTRS;
  - a sequential logic control in the form of a state machine (SM), named the RamMachine (RM), is present in the MPSM, whereby
    - at least one clock signal and the required flags of the PTRSs lead to the inputs of the RM;
    - the outputs of the RM lead to the control connections of the PTRSs and of the CSTR;
  - the RamMachine (RM), the port register sets (PTRSs) and the relevant registers are designed/connected, and the PTRSs and the relevant registers are controlled by the RM, such that
    - to detect a required data transfer between individual PTRSs and the central memory (CS), at least the WR and RD flags of the corresponding PTRSs are sampled (hereinafter called scanning), whereby an active WR flag initiates the writing of the data into the CS, an active RD flag initiates the reading of the data from the CS, and no transfer is carried out if all flags are inactive;
    - the writing of the data from the corresponding PTRS to the CS, initiated by the RM, takes place in that the address and data bits from the PTRS are transferred to the corresponding address and data connections of the CS, the control signals for applying the addresses and for writing the data into the CS are switched, and the WR flag is deactivated;
    - the reading of the data from the CS into the corresponding PTRS, initiated by the RM, takes place in that the address bits from the PTRS are transferred to the address connections of the CS, the control signals for reading the data out of the CS are switched, the data read from the CS are transferred to the data register of the corresponding PTRS, and then the RD flag is deactivated;
- the RamMachine (RM) is designed/connected such that the scanning of the port register sets (PTRSs) takes place with a fixed, deterministic time division, so that a time slot of equal length is used for each PTRS, which makes it irrelevant whether or not a transfer from or to the CS has to be carried out for the currently scanned PTRS.
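The fixed-slot scanning of the port register sets described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class names `PortRegisterSet` and `RamMachine`, the memory size, and the rule that WR is served before RD when both flags are set are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PortRegisterSet:
    """One PTRS: separate address and data registers plus WR/RD flags."""
    address: int = 0
    data: int = 0
    wr: bool = False
    rd: bool = False

    def request_write(self, address: int, data: int) -> None:
        # Process node side: store address and data, then raise WR.
        self.address, self.data, self.wr = address, data, True

    def request_read(self, address: int) -> None:
        # Process node side: store the address, then raise RD.
        self.address, self.rd = address, True


class RamMachine:
    """Scans all PTRSs in a fixed order, one equal time slot per PTRS."""

    def __init__(self, ports, memory_size=256):
        self.ports = ports
        self.memory = [0] * memory_size  # stands in for the central memory (CS)
        self.slot = 0                    # index of the PTRS scanned next
        self.elapsed_slots = 0

    def tick(self) -> None:
        """Consume exactly one time slot, whether or not a transfer is pending."""
        port = self.ports[self.slot]
        if port.wr:    # assumption: WR wins if both flags are set
            self.memory[port.address] = port.data
            port.wr = False              # WR is cleared after the write
        elif port.rd:
            port.data = self.memory[port.address]
            port.rd = False              # RD is cleared after the read
        self.slot = (self.slot + 1) % len(self.ports)
        self.elapsed_slots += 1

    def scan_round(self) -> None:
        """One full pass over all PTRSs."""
        for _ in self.ports:
            self.tick()


ports = [PortRegisterSet() for _ in range(4)]
rm = RamMachine(ports)
ports[0].request_write(address=10, data=0xAB)
ports[2].request_read(address=10)
rm.scan_round()
print(hex(ports[2].data), rm.elapsed_slots)
```

In this model a full scan always costs as many slots as there are ports, so the worst-case wait of any port is bounded by the port count and does not depend on what the other ports are doing.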

Description

State of the art

Single-processor systems

Such a system consists of at least the following modules: a processor, memory for instructions and data, and an input/output (InOut) system. These modules can be separate chips or a single microcontroller that already integrates all of them on one chip.

Instruction and data memory can lie in the same logical memory area (von Neumann; one port to memory) or in different memory areas (Harvard; instruction and data memory have separate ports to memory). There can also be several different, separate areas for instructions and data (modified Harvard, for example in a digital signal processor (DSP)), and various complex combinations of the above are possible.

The processor processes the user data and IO data according to the stored instructions. It can be, among others, of the SISD (Single Instruction Single Data) or SIMD (Single Instruction Multiple Data) type. SIMD machines have the advantage of being able to execute several operations of the same kind during one instruction cycle, which saves program memory and makes them correspondingly more efficient.

If several different tasks that are not synchronized in time are to be processed by one CPU, this must be done by multitasking-capable base system software, which is normally interrupt-driven.

(For the definition of processor etc., see 'Abbreviations used (intuitive) and definitions'.)

It should be noted that many processors, despite a high clock frequency, have very poor interrupt behavior: they may need 10 to 100 or more clock cycles before entering the interrupt service routine. In addition, several register-handling instructions must be executed inside the interrupt service routine before the actual routine itself is processed.

Depending on the basic architecture of the software or the type of operating system architecture, this entry latency can, in the worst case, grow by a further several hundred to several thousand CPU clock cycles. If a single processor has to handle several tasks that are allowed only short run times and must also run in real time, the CPU overhead may become too large. Under some circumstances the overhead alone could amount to many times the useful routine itself, so that the useful CPU throughput would be very low and most of the CPU capacity would be spent unproductively. Even the use of DMA (direct memory access) transfers changes this only very little.
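The relationship between overhead and useful work can be made concrete with a short calculation. The following sketch uses a hypothetical helper, `useful_fraction`, and purely illustrative cycle counts that are not taken from the patent; it only shows how quickly the useful share of CPU time shrinks when the entry latency dominates.

```python
def useful_fraction(useful_cycles: int, entry_cycles: int, housekeeping_cycles: int) -> float:
    """Share of CPU cycles spent on the actual routine for one interrupt-driven task.

    entry_cycles:        cycles until the interrupt service routine is entered
    housekeeping_cycles: register-handling instructions inside the routine
    """
    total = useful_cycles + entry_cycles + housekeeping_cycles
    if total <= 0:
        raise ValueError("total cycle count must be positive")
    return useful_cycles / total


# Illustrative numbers only: a 50-cycle routine, 500 cycles of worst-case
# entry latency, and 20 cycles of register save/restore.
frac = useful_fraction(useful_cycles=50, entry_cycles=500, housekeeping_cycles=20)
print(f"{frac:.3f}")
```

With these assumed values less than a tenth of the CPU time goes into the routine itself, which is the situation the text describes as overhead exceeding the useful work many times over.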

If such real-time requirements were to be met with only one processor, not only a very high clock frequency but also extremely short total context-switch times (a few ns) would be required, which is not achievable.

With such real-time requirements, a system implementation with only one processor is therefore no longer feasible, and other solutions must be found.

Multiprocessor systems

Verschiedene Realisierungsmöglichkeiten für ein Multiprozessorsystem sind:

1) Die Prozessoren sind mittels eines, normalerweise asynchronen, Schnittstellenbausteins mit einer seriellen Busleitung verbunden. Dies ist unter anderem bei einem typischen Feldbusystem zu finden, wobei typische Buszugriffsverfahren wie Master-Slave oder Multi-Master verwendet werden. Eine oft benutzte Zugriffssteuerung bei Multi-Master ist CSMA/CD (Carrier Sense with Multiple Access and CollisionDetect) oder CSMA/CA (... and CollisionArbitration). Bedingt durch diese Zugriffssteuerungsverfahren ist der Bussystemdurchsatz zum Teil sehr klein und/oder die Signallaufzeiten der Busleitung fallen bei der Übertragung stark ins Gewicht. Dieses System wird innerhalb Anlagen, Räumen oder Gebäuden, eventuell auch geräteintern eingesetzt.
2) Ähnlich 1), jedoch mit zusätzlichen Busrepeatern, welche eine grössere Bus-Reichweite und -Ausdehnung erlauben, maximal einige kilometer. Durch diese starke Verlängerung der Signallaufzeiten werden CollisionDetect (/CD) und/oder CollisionArbitration (/CA) allerdings schwerer zu handhaben, die entsprechenden Zeitabschnitte werden länger, und die Systemleistung nimmt weiter stark ab.
3) Wird das System 2) mit weiteren Repetern, Bridges, Routern usw erweitert, lässt sich die Bus-Reichweite und -Ausdehnung nochmals sehr stark erhöhen, wobei die entsprechenden technischen Probleme ebenfalls grösser werden. Das Internet, welches ja auch in gewisser Weise ein Multiprozessorsystem, allerdings extrem langsam, darstellt, gehört auch hierzu.
4) Wird beim Feldbussystem nach 1) die räumliche Ausdehnung verkleinert, lässt sich normal auch die Geschwindigkeit erhöhen und die Collisionslaufzeiten senken, da die Signallaufzeiten in etwa proportional zur Leitungslänge sind.
5) Prozessoren sind mittels integrierter serieller Schnittstelle, Beispiel SPI, IIC oder UART, verbunden. Dies wird oft innerhalb des Geräts oder der Leiterplatte eingesetzt.
6) Prozessoren sind mittels verschiedener integrierter Schnittstellen verbunden, wobei die Daten hauptsächlich über DMA übertragen werden. Zwar sind DMA-gesteuerte Daten-Transfers zwischen internen oder externen Periferielementen des Prozessors und des Speichers effizienter als der Daten-Transfer über CPU, aber während des DMA-Transfers sinkt die Rechenleistung des betreffenden Prozessors stark ab. So wird bei Hardware-DMA im CycleStealMode abwechselnd ein Takt für die CPU und für die DMA-Steuerung genutzt, womit die CPU während dieses Zustandes nur zur Hälfte arbeiten kann. Bei Hardware-DMA im ExclusivMode wird die CPU während des Betriebes der DMA-Steuerung complett angehalten und kann somit während dieser Zeit nicht auf den betreffenden Speicher zugreifen. Bei Software-DMA, das zwar mit weniger Siliciumfläche auf dem Chip realisierbar ist, werden aber auch bis zu circa 8.. 12 CPU-Takte Overhead in Anspruch genommen und die Transfer-Ausführung ist langsamer als bei Hardware-DMA.
7) Als weitere System-Leistungssteigerung sind Parallelrechensysteme realisierbar, mit verschieden leistungsfähigen und verschieden aufwendigen Funktionselementen.
- -- Ein mögliches System besteht aus einer Vielzahl von Elementen mit Prozessor und SinglePortSpeicher. Zwischen Prozessor und Speicher befindet sich eine Speicherportschaltung, die mehrere Ports auf den Speicher bereitstellt. Zu einem Zeitpunkt kann immer nur ein Port aktiv sein und somit nur ein Prozessor auf den Speicher zugreifen. Ein Port der Speicherportschaltung ist mit dem zugehörigen Prozessor verbunden, während die anderen Ports mit anderen Prozessoren verbunden sind. Für jede Verbindung des Prozessors mit dem Speicher ist ein separater Port und somit zusätzliche Logikschaltungen und -Leitungen erforderlich. Da viele dieser Prozessoren gegenseitig mit diesen zusätzlichen Ports verbunden sind ist der Aufwand der zusätzlichen Hardware und auch der Aufwand für die Verbindungsleitungen sehr gross. Will ein Prozessor nun auf den Speicher eines anderen Prozessors zugreifen, muss der entsprechende Port zuerst vom momentan aktiven Prozessor freigegeben werden. Zu diesem Zweck löst der sendewillige Prozessor bei dem momentan aktiven Prozessor einen Interrupt aus. Dies verursacht natürlich einen grossen Systemoverhead, weil bis zur Einsprungroutine viele Prozessor-Taktzyklen vergehen. Ausserdem müssen alle Prozessoren, die auch auf diesen Speicher zugreifen wollen, abwarten, bis der momentan aktive Prozessor den Speicher wieder freigegeben hat. Durch Zufügen von, in der Grösse begrenzten, FIFO-Speicher (FirstInFirstOut) und/oder Multiport-Speicher zu der Speicherportschaltung kann die Interrupt-Belastung zwar etwas gesenkt werden und die Systemleistung erhöht werden, aber die Hardware-Kosten steigen ebenfalls stark an.
- -- Eine weitere Realisierungsmöglichkeit besteht darin, dass als Speicher nun Dual- oder Multiport-Speicher oder Dual- oder Multiport-FIFOs benutzt werden. Die Prozessoren können dann direkt an die vorhandenen Ports der Multiport-Bausteine angeschlossen werden. Dual- oder Multiport-Speicher oder -FIFOs haben jedoch relativ wenige Ports und sind in der Speicher-Grösse stark begrenzt und ausserdem sehr teuer.
- -- Weiter kann ein System dadurch bestehen, das die Prozessoren über ein ringförmiges Bussystem, das verschieden breit sein kann, miteinander verbunden sind. Dem geringeren Hardwareaufwand steht eine geringere Systemleistung gegenüber, weil der Bus wesentlich langsamer ist und hier auch nur ein Prozessor gleichzeitig zugreifen kann. Auch wenn statt einem Bus mehrere Busse parallel benutzt werden, steigt zwar die Systemleistung an, aber die grundsätzlichen Nachteile sind noch immer vorhanden.

Various implementation options for a multiprocessor system are:

1) The processors are connected to a serial bus line using an interface module, which is usually asynchronous. This can be found in a typical fieldbus system, where typical bus access methods such as master-slave or multi-master are used. A frequently used access control for multi-master is CSMA/CD (Carrier Sense with Multiple Access and CollisionDetect) or CSMA/CA (... and CollisionArbitration). Due to these access control methods, the bus system throughput is sometimes very low and/or the signal propagation times of the bus line are very significant during transmission. This system is used within systems, rooms or buildings, and possibly also within devices.
2) Similar to 1), but with additional bus repeaters, which allow a larger bus range and extension, a maximum of several kilometers. Due to this strong extension of the signal propagation times, CollisionDetect (/CD) and/or CollisionArbitration (/CA) become more difficult to handle, the corresponding time periods become longer, and the system performance continues to decrease significantly.
3) If system 2) is expanded with additional repeaters, bridges, routers, etc., the bus range and expansion can be increased significantly, although the associated technical problems also become greater. The Internet, which is also a multiprocessor system in a certain sense, albeit extremely slow, is also one of these.
4) If the spatial extent of the fieldbus system according to 1) is reduced, the speed can normally also be increased and the collision propagation times reduced, since the signal propagation times are approximately proportional to the cable length.
5) Processors are connected via an integrated serial interface, for example SPI, IIC or UART. This is often used inside the device or circuit board.
6) Processors are connected via various integrated interfaces, with data being transferred primarily via DMA. DMA-controlled data transfers between internal or external peripherals of the processor and the memory are more efficient than data transfer via the CPU, but the computing power of the processor in question drops significantly during DMA transfer. For example, with hardware DMA in CycleStealMode, one clock is used alternately for the CPU and for the DMA control, meaning that the CPU can only work halfway during this state. With hardware DMA in ExclusiveMode, the CPU is completely stopped while the DMA control is running and cannot access the memory in question during this time. With software DMA, which can be implemented with less silicon area on the chip, up to around 8 to 12 CPU clocks of overhead are used and the transfer execution is slower than with hardware DMA.
7) As a further increase in system performance, parallel computing systems can be implemented with functional elements of varying performance and complexity.
- -- A possible system consists of a large number of elements with a processor and single-port memory. Between the processor and the memory there is a memory port circuit that provides several ports to the memory. Only one port can be active at a time and thus only one processor can access the memory. One port of the memory port circuit is connected to the associated processor, while the other ports are connected to other processors. A separate port is required for each connection between the processor and the memory, and thus additional logic circuits and lines are required. Since many of these processors are mutually connected to these additional ports, the additional hardware and the connection lines are very expensive. If a processor now wants to access the memory of another processor, the corresponding port must first be released by the currently active processor. To do this, the processor that wants to send triggers an interrupt in the currently active processor. This naturally causes a large system overhead because many processor clock cycles pass before the jump routine. In addition, all processors that also want to access this memory must wait until the currently active processor has released the memory again. By adding limited-size FIFO (FirstInFirstOut) and/or multiport memory to the memory port circuit, the interrupt load can be reduced somewhat and system performance can be increased, but the hardware cost also increases significantly.
- -- Another possible implementation is to use dual or multiport memories or dual or multiport FIFOs as memory. The processors can then be connected directly to the existing ports of the multiport components. However, dual or multiport memories or FIFOs have relatively few ports and are very limited in terms of memory size and are also very expensive.
- -- A system can also consist of processors connected to one another via a ring-shaped bus system, which can be of different widths. The lower hardware expenditure is offset by lower system performance, because the bus is much slower and only one processor can access it at a time. If several buses are used in parallel instead of one, system performance increases, but the basic disadvantages remain.

Problems and disadvantages of previous systems

-If processors are connected to each other via a common bus, only one processor can access the bus, and thus the memory, at any one time, while all other processors have to wait. If, for example, the system contains 20 processors, then with time shared evenly each processor can use at most 1/20 of the gross bus bandwidth.

-If communication between the processors is to take place via separate logic and separate connections,

-either communication performance is limited due to a lack of component and wiring resources, or
-the costs for components and wiring are very high.

In previous systems, there are always processors that cannot currently communicate with other processors, or can only do so with limited performance, because the connection bus or the connection lines are currently occupied by other processors, be it due to bus occupancy, interrupt overhead or general CPU overhead.

The multiprocessor systems known to date thus all have the disadvantage of being too rigid and inflexible in terms of architecture, and therefore not very efficient.

Abstraction

Combining all technical possibilities, it can be said that in principle every multiprocessor topology or architecture can be technically enabled so that any processor can read or write the global or local memory data of another processor, or so that any processor can communicate with any other processor. This applies to all topologies, regardless of whether the bus system spans the world, extends over one kilometer, lies within a building, extends over a few meters, lies within a device, lies within a circuit board or lies within a chip, and regardless of whether the connections between the processors are realized with a linear bus line, with a ring system, with chain-shaped connecting lines, or with separate lines from each processor to special memory modules.

This communication is made possible by appropriate, more or less complex hardware interconnection, by appropriate software that in part transfers the data at a low machine level, or by a combination of hardware interconnection and special software.

A comparison of the communication performance of different possible systems can yield ratios of up to 1 : 10^6 to 1 : 10^9 or more.

The key question with all these multiprocessor systems is:

How powerful are the systems, how intelligently and how quickly can communication between the processors be realized, and how complex is the system to implement?

The new multi-processor architecture

The multi-processor architecture according to the invention newly presented here

- eliminates the disadvantages, problems and limitations of previously known systems
- represents a completely new system architecture with many advantages and
- opens up a whole new dimension of possibilities and completely new aspects of multi-processor communication.

A system architecture with many individual processors becomes better and more powerful the more freely available logic switching elements or logic gates are available for coupling, and the closer and denser these logic switching elements are arranged. Many logic switching elements, because this increases the possible complexity of the transfers between them and because many functions can therefore be carried out in parallel; a dense arrangement, because this makes the logic fast.
These properties are present, for example, in a programmable logic component such as an FPGA/CPLD, where the individual logic functions can be freely interconnected and created, so to speak, via programmable bits.

For high system performance, it is therefore ideal if all relevant processor control signals (address, data, control) from all processors lead into a centrally located, complex, possibly programmable logic component.
This centrally arranged logic component is called the central array (CA). Communication could also be established with fewer processor pins, but this would be much less efficient.

If all relevant processor control signals and the associated processor control lines (Adr, Data, Control, special port pins and possibly interrupt pins) from all participating processors are to flow together in the central array, the possible number of processors is greatly limited because the logic chip in which the central array is implemented only has a limited number of pins, and even if the central array consists of several chips, the available number of pins is limited.

The key question is therefore how to route all processor control signals relevant for communication into the central array without requiring the high number of pins.

To accomplish this task, an additional logic chip, referred to below as the signal expander (SE), is connected between each processor and the central array, with the processor pins leading to the signal expander and various lines of the signal expander leading to the central array. The signal expander and the central array are equipped with special logic functions for communication.

These logic functions in the signal expander and central array are designed in such a way that various signals transmitted by the processor, or individual parts or all parts of the processor's memory and InOut protocol, are broken down into several small command, address and data packets, which are then transmitted to the central array. For example, individual protocol components transmitted by the processor can be stored in the signal expander and immediately afterwards sent to the central array as small, fast command, address and data packets. These individual command, address and data packets, which are referred to as communication packets or comm packets, are then reconstructed in the central array, so that the complete, original information is available in the central array.

When designing the system, the bit width of the comm packets, and thus the number of lines of the main connection from the signal expander to the central array, can be freely selected, among other things. At the same bit transmission frequency, the amount of data transmitted per unit of time naturally grows with increasing comm packet bit width or number of lines.

As an example, the information of a processor write command, in which 28 address bits and 32 data bits are active for one processor clock, could be stored in the signal expander. When the processor write command becomes valid (e.g. /WR signal = low), a comm packet with the content of the write command is transmitted to the central array, and the addresses and data are then transmitted as 4x 16-bit, 8x 8-bit, 16x 4-bit or 32x 2-bit comm packets, or with other comm packet bit widths. If the signal expander sends 8x 8-bit comm packets to the central array, at least 8, typically around 10, lines are required from the signal expander to the central array, whereas a direct connection from the processor to the component would require around 60 lines. This corresponds to a factor of 6, which means that in this case 6 times as many processors can be connected to the central array as would otherwise be possible.
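The arithmetic of this example can be checked with a short Python sketch. It is only an illustration: the function names are hypothetical, and the 64-bit payload assumes that the 28 address bits are padded to 32 bits, which is one reading that reproduces the packet counts named above.

```python
def packets_needed(payload_bits: int, packet_width: int) -> int:
    """Number of comm packets needed to carry payload_bits (rounded up)."""
    return -(-payload_bits // packet_width)


def connection_factor(direct_lines: int, expander_lines: int) -> int:
    """How many times more processors fit on the same number of pins."""
    return direct_lines // expander_lines


# Assumption: 28 address bits padded to 32, plus 32 data bits = 64 bits.
PAYLOAD_BITS = 32 + 32

for width in (16, 8, 4, 2):
    print(f"{packets_needed(PAYLOAD_BITS, width)} x {width}-bit comm packets")

# About 60 lines for a direct connection versus typically about 10 lines
# between signal expander and central array.
print("factor:", connection_factor(60, 10))
```

Narrower comm packets save pins at the central array but need more transfer cycles per processor command, so the packet width trades pin count against transfer time.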

(For the definition of logic switching elements, registers, etc., see "Abbreviations used (intuitive) and definitions".)

In order for the communication and signal flow between the processor (PZ), signal expander (SE) and central array (CA) to take place, these chips are connected to each other accordingly.

The connection group on the signal expander (SE) that is connected to the processor connections is called the processor port (PZPR).

On the signal expander (SE), a connection group called process node A port (PAPR) is connected to a connection group on the central array (CA) called process node B port (PBPR).

Inside the signal expander (SE) chip, the following are connected to each other:

the processor port (PZPR) with the processor register set (PZRS) and
the process node A port (PAPR) with the process node A register set (PARS). And in the central array (CA),
the process node B port (PBPR) is connected to the process node B register set (PBRS).

The logic functions of the signal expander (SE) and of the logic part in the central array (CA) that is connected to the signal expander (SE) via the corresponding process node B register set (PBRS) are designed in such a way that the relevant registers or register sets contained in the respective chip can exchange the relevant data and information bits with each other within the chip.

The chip contains the corresponding connection structures for this. To enable this data exchange between the register sets, the individual control connections of the relevant register sets, and thus of the relevant logic elements, are connected to the outputs of a state machine (SM), which is typically built from various registers.
The command-triggering control bits or flags of the 'command' in the corresponding register sets lead to the inputs of the state machine (SM), which can then react accordingly.

Because the state machine drives the control connections of the register sets with individual, sequential signals, the register sets can selectively switch individual information bits and address or data bits onto the connected connecting lines. Thus, by means of the state machine, the register sets and the corresponding interconnections, the desired information bits can be exchanged.

The state machine of the signal expander, and that of the logic part in the central array that is connected to the signal expander via the corresponding process node B register set, can be built internally in such a way that all individual states are represented by a synchronous binary counter controlled by the clock signal. The smallest time unit for a state, and for the smallest possible command or control state, therefore corresponds to the period of the clock signal, which is normally always the same length, or to half the period with two-edge evaluation.

To enable this synchronous binary counter to control the individual register sets, a combinational decoding network can be connected downstream of the counter, which activates certain outputs depending on the counter reading at its input. The outputs of the decoding network are at the same time the outputs of the state machine, and they lead to the individual control connections on the relevant register sets. This means that specific state machine outputs can be activated depending on the counter reading.

The control connections on the respective register set can, for example, control a multiplexer or demultiplexer on the register set, which addresses individual bits or groups of bits and switches these bits to the input or output, or to the input or output group, of the register set. This makes it possible, among other things, to transfer certain bits or groups of bits, mainly the comm packets, between different register sets.

Each individual binary state of the synchronous binary counter can be interpreted as its own separate command or separate signal for controlling the register sets. It is also possible to interpret groups of more significant counter bits as command groups.

A concrete division with a 14-bit synchronous binary counter can look like this:

The top bit (B13) indicates whether the state machine is active. If it is active, the next bit (B12) indicates whether the transfer refers to program or data memory. If it refers to data memory, the next bit (B11) indicates the action Write or Read. The bit (B10) marks whether the comm packets sent from the signal expander to the central array

- represent internal command packets, such as a Write command followed by 16 address/data packets, a Write command followed by 8 address/data packets or a Read command followed by 10 address/data packets, or
- represent address and data packets.

The next 6 bits (B9...4) represent the number of the comm packet currently being transmitted, meaning that a maximum of 64 comm packets can be transmitted with these 6 bits, which can be command packets or address/data packets. However, only very few command packets, or even just one, are required, which is why the upper bits of these 6 bits have no meaning in that case.

The 4 lowest bits (B3...0) indicate the current state of the individual control lines of the individual register sets within the respective comm packet. This means that a maximum of 16 individual logic and register states are possible per comm packet, whereby these individual states cause, among other things, individual control connections of the relevant register set to be switched.
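The bit allocation described above can be illustrated with a minimal Python sketch of a decoder for the 14-bit counter value. The names `CounterFields` and `decode_counter` are hypothetical, and because the text does not state which bit value stands for program or data memory, or for Write or Read, the sketch returns the raw bit values without interpreting their polarity.

```python
from typing import NamedTuple


class CounterFields(NamedTuple):
    active: bool         # B13: state machine active
    memory_select: int   # B12: program or data memory (polarity not specified)
    write_read: int      # B11: Write or Read (polarity not specified)
    packet_kind: int     # B10: command packets or address/data packets
    packet_number: int   # B9..B4: number of the current comm packet (0..63)
    step: int            # B3..B0: control state within the comm packet (0..15)


def decode_counter(value: int) -> CounterFields:
    """Split a 14-bit counter reading into the fields described in the text."""
    if not 0 <= value < (1 << 14):
        raise ValueError("counter value must fit in 14 bits")
    return CounterFields(
        active=bool((value >> 13) & 1),
        memory_select=(value >> 12) & 1,
        write_read=(value >> 11) & 1,
        packet_kind=(value >> 10) & 1,
        packet_number=(value >> 4) & 0x3F,
        step=value & 0xF,
    )


# Example: active, B11 set, comm packet number 5, control state 3.
example = (1 << 13) | (1 << 11) | (5 << 4) | 3
print(decode_counter(example))
```

In hardware, this decoding would be done by the combinational decoding network connected downstream of the counter; the sketch only mirrors the field boundaries.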

Concrete functional example for - writing data - with 26 address bits and 32 data bits, with a comm packet bit width of 4 bits:

The processor applies the address and data bits to the processor port of the signal expander, and with the active WR signal of the processor, the addresses and data are stored in the processor register set of the signal expander. At the same time, the WR flag is activated in the processor register set.

This active WR flag causes the state machine in the signal expander to enter the write command mode.

In this mode, the state machine first causes the process node A register set to store a specific bit pattern, which is then transferred to the process node A port. This bit pattern sent from the signal expander to the central array is recognized by the central array and interpreted as a write command, after which the state machine of the central array goes into receive mode.

From the next clock cycle, the state machine of the signal expander causes the addresses and data contained in the processor register set to be sent to the central array as separate comm packets. These are first 7 address packets and then 8 data packets, with each comm packet being 4 bits wide. The last 2 bits of the last address packet have no meaning because there are only 26 address bits.

Because the state machine in the central array is already in the correct mode, it can now interpret the incoming comm packets in the correct form and transfer them in the correct order to the corresponding registers in the central array. Once all comm packets have been transferred, the WR flag is deactivated again, and the state machines of the signal expander and central array return to inactive mode, ready to receive new commands. The state machine can recognize the end of the comm packet transmission because the number of comm packets to be transferred is fixed as soon as the WR command is received by the central array.
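The write example can be sketched in Python as follows. This is a simplified model rather than the circuit itself: the function names are hypothetical, the command bit pattern is left out, and it is assumed that each value is sent most significant bits first, so that the two unused bits end up at the end of the last address packet.

```python
PACKET_WIDTH = 4
ADDRESS_BITS = 26
DATA_BITS = 32


def split_into_packets(value: int, bit_count: int) -> list[int]:
    """Split a value into 4-bit comm packets, most significant bits first."""
    packet_count = -(-bit_count // PACKET_WIDTH)          # round up
    padded = value << (packet_count * PACKET_WIDTH - bit_count)
    mask = (1 << PACKET_WIDTH) - 1
    return [
        (padded >> (PACKET_WIDTH * (packet_count - 1 - i))) & mask
        for i in range(packet_count)
    ]


def join_packets(packets: list[int], bit_count: int) -> int:
    """Rebuild the original value from its comm packets."""
    padded = 0
    for packet in packets:
        padded = (padded << PACKET_WIDTH) | packet
    return padded >> (len(packets) * PACKET_WIDTH - bit_count)


def expander_send_write(address: int, data: int) -> list[int]:
    """Signal expander side: 7 address packets followed by 8 data packets."""
    return split_into_packets(address, ADDRESS_BITS) + split_into_packets(data, DATA_BITS)


def central_array_receive_write(packets: list[int]) -> tuple[int, int]:
    """Central array side: reconstruct address and data registers."""
    address_packets = -(-ADDRESS_BITS // PACKET_WIDTH)
    return (
        join_packets(packets[:address_packets], ADDRESS_BITS),
        join_packets(packets[address_packets:], DATA_BITS),
    )


packets = expander_send_write(0x2ABCDEF, 0xDEADBEEF)
print(len(packets), central_array_receive_write(packets))
```

Because the packet count is fixed once the write command is known, the receiving side needs no end marker; it simply counts 15 packets.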

Concrete functional example for - reading data - with 26 address bits and 32 data bits, with a comm packet bit width of 4 bits:

The processor applies the address bits to the processor port of the signal expander, and with the active RD signal of the processor, the addresses and data are stored in the processor register set of the signal expander. At the same time, the RD flag is activated in the processor register set.

This active RD flag causes the state machine in the signal expander to enter the read command mode.

In this mode, the state machine first causes the process node A register set to store a specific bit pattern and then transfer it to the process node A port. This bit pattern sent from the signal expander to the central array is recognized by the central array and interpreted as a read command, after which the state machine of the central array goes into output mode.

Ab dem nächsten Takt veranlasst die Statemaschine des Signalexpanders, dass die im Prozessorregistersatz enthaltenen Adressen als separate Kommpakete zum Centralarray verschickt werden. Diese sind 7 Adress-Pakete, wobei diese Kommpakete jeweils 4 Bit breit sind. Die letzten 2 Bits des letzten Adr-Pakets haben keine Bedeutung, weil nur 26 AdrBits vorhanden sind.From the next clock cycle, the state machine of the signal expander causes the addresses contained in the processor register set to be sent to the central array as separate comm packets. These are 7 address packets, each of which is 4 bits wide. The last 2 bits of the last address packet have no meaning because there are only 26 address bits.
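The packet arithmetic above (26 address bits in 4-bit comm packets gives 7 packets with 2 unused bits) can be illustrated with a short Python sketch. The function names and the bit order (most significant packet first, unused bits at the low end of the last packet) are assumptions made for this illustration; the text only states the packet count and that the last 2 bits carry no meaning.

```python
def split_into_packets(value, value_bits, packet_bits):
    """Split a value into fixed-width comm packets.

    Assumed order: most significant packet first, so the unused
    padding bits end up at the low end of the last packet.
    Returns (list of packets, number of unused padding bits).
    """
    n_packets = -(-value_bits // packet_bits)   # ceiling division
    total_bits = n_packets * packet_bits
    pad = total_bits - value_bits               # bits without meaning
    padded = value << pad                       # left-align the value
    mask = (1 << packet_bits) - 1
    packets = [
        (padded >> (total_bits - packet_bits * (i + 1))) & mask
        for i in range(n_packets)
    ]
    return packets, pad


def join_packets(packets, value_bits, packet_bits):
    """Reassemble the value on the receiving side and drop the padding."""
    padded = 0
    for packet in packets:
        padded = (padded << packet_bits) | packet
    pad = len(packets) * packet_bits - value_bits
    return padded >> pad


address = 0x2ABCDEF                              # example 26-bit address
packets, pad = split_into_packets(address, 26, 4)
restored = join_packets(packets, 26, 4)
```

With these assumptions, `packets` holds 7 entries, `pad` is 2, and `restored` equals the original address.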

Because the state machine in the central array is already in the correct mode, it can interpret the incoming comm packets correctly and transfer them in the correct order to the corresponding registers in the central array, so that the address then sits in the corresponding register.

Once all address packets have been transmitted, the state machine of the central array takes the data belonging to the address and transmits it, likewise in comm packets, to the signal expander.

The state machine of the signal expander, for its part, has switched over after transmitting the addresses. It now reads the comm packets transmitted by the central array, which represent data, and transfers them to the data register of the processor register set of the signal expander.

Once all comm packets have been transmitted, the RD flag is deactivated again and the state machines of the signal expander and the central array return to inactive mode, ready to receive new commands. The state machine can recognize the end of the comm packet transmission because the number of comm packets to be transmitted is fixed as soon as the central array receives the RD command.

The data now held in the processor register set of the signal expander can be read out by the processor.
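The read sequence described above can be modelled as a small state machine. The following Python sketch is only an illustration: the class and function names, the value of the read bit pattern (`READ_PATTERN`), the packet order, and the use of a dictionary as memory are all assumptions that do not come from the text.

```python
from enum import Enum, auto

PACKET_BITS = 4
ADDR_BITS = 26
DATA_BITS = 32
MASK = (1 << PACKET_BITS) - 1
READ_PATTERN = 0b0010   # hypothetical bit pattern for the read command


def n_packets(bits):
    return -(-bits // PACKET_BITS)   # ceiling division


class State(Enum):
    IDLE = auto()             # inactive mode, scanning for a command
    RECEIVE_ADDRESS = auto()  # collecting address packets
    SEND_DATA = auto()        # output mode, returning data packets


class CentralArrayPort:
    """Model of the central array side of one port (assumed structure)."""

    def __init__(self, memory):
        self.memory = memory
        self.state = State.IDLE
        self.shift = 0
        self.count = 0
        self.out = []

    def clock(self, packet_in=None):
        """Advance one clock; returns a data packet in SEND_DATA, else None."""
        if self.state is State.IDLE:
            if packet_in == READ_PATTERN:
                self.state = State.RECEIVE_ADDRESS
                self.shift, self.count = 0, 0
            return None
        if self.state is State.RECEIVE_ADDRESS:
            self.shift = (self.shift << PACKET_BITS) | packet_in
            self.count += 1
            if self.count == n_packets(ADDR_BITS):
                # Packet count is known in advance, so the end is detected here.
                pad = self.count * PACKET_BITS - ADDR_BITS
                data = self.memory.get(self.shift >> pad, 0)
                self.out = [
                    (data >> (DATA_BITS - PACKET_BITS * (i + 1))) & MASK
                    for i in range(n_packets(DATA_BITS))
                ]
                self.state = State.SEND_DATA
            return None
        packet = self.out.pop(0)
        if not self.out:
            self.state = State.IDLE   # back to inactive mode
        return packet


def signal_expander_read(port, address):
    """Model of the signal expander side: command, address packets, data packets."""
    port.clock(READ_PATTERN)
    total = n_packets(ADDR_BITS) * PACKET_BITS
    padded = address << (total - ADDR_BITS)
    for i in range(n_packets(ADDR_BITS)):
        port.clock((padded >> (total - PACKET_BITS * (i + 1))) & MASK)
    data = 0
    for _ in range(n_packets(DATA_BITS)):
        data = (data << PACKET_BITS) | port.clock()
    return data


port = CentralArrayPort({0x2ABCDEF: 0xDEADBEEF})
result = signal_expander_read(port, 0x2ABCDEF)
```

In this model both sides count packets instead of exchanging an end marker, which mirrors the statement that the number of comm packets is fixed once the command is known.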

Reading program data from the central array works analogously to reading data.

Because the comm packet bit width can differ from one signal expander to another, the entire central array bandwidth can be used, and this bandwidth can be distributed variably among the individual process modules.
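To give a feel for how the comm packet bit width trades pins against transfer time, the following Python sketch counts the clock cycles of one read access for several widths. It assumes one comm packet per clock, a single clock for the command bit pattern, and no turnaround cycles; these assumptions and the function names are illustrative and not taken from the text.

```python
def packets_needed(bits, width):
    """Number of comm packets needed to carry `bits` at a given packet width."""
    return -(-bits // width)   # ceiling division


def read_cycles(addr_bits, data_bits, width, command_cycles=1):
    """Clock cycles for one read: command, address packets, data packets.

    Assumes one packet per clock and no idle cycles in between.
    """
    return (command_cycles
            + packets_needed(addr_bits, width)
            + packets_needed(data_bits, width))


# Cycles per read for 26 address bits and 32 data bits at several widths
cycles_by_width = {w: read_cycles(26, 32, w) for w in (2, 4, 8, 16)}
```

Under these assumptions a 4-bit port needs 16 clocks per read, an 8-bit port 9, and a 16-bit port 5, so a process module that needs more bandwidth can be given a wider port at the cost of more central array pins.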

Integrating a signal expander between each individual processor and the central array thus makes it possible to couple a very large number of processors to the central array very efficiently, because

- no special transfer instructions of the processor are required
- no additional wait cycles have to be executed
- with regard to communication, the processors are permanently connected to the central array, so that the complete addresses and data are transmitted permanently and automatically without any additional programming measures.

Because any processor or microcontroller can be integrated into this system very easily by means of a signal expander, the best and most effective microcontroller or processor can be used for each task.

The system is therefore very scalable in hardware while making maximum use of the processors and controllers employed.

This way of increasing system performance not only extends the software functionality. Because hardware parallelism grows with further scaling, the runtime requirements of routines running on other processors are not impaired, and so the overall system interrupt latency and overall context switch time, calculated across the whole system, become shorter and shorter as the hardware is scaled.

The MultiPort storage machine implemented in the central array and described in detail below functions, owing to its special, completely new architecture, as a variable multiport memory that is scalable in port size and port count, fully asynchronously operable, readable and writable. It offers further significant advantages that open up a completely new dimension of possibilities and perspectives for multiprocessor communication.

The processors that are integrated into the system by means of the massively parallel coupled multiprocessor system described here have the following advantages, among others:

- Any processor can serve as a directly hardware-coupled, broadband co-processor for any other processor without adding dedicated hardware. This is very remarkable, because this normally requires dedicated hardware separately for each processor concerned. Furthermore, in this new system
- any CPU overhead that normally arises during communication, such as overhead for interrupts, wait cycles or the like, is eliminated
- in typical cases, an explicit data transfer is not required at all, because the processor can automatically write the data directly into the corresponding memory area during its actual processing routine (not a data transfer routine), and this memory area is also accessible to the other processors. Because the processor-specific bus protocol is converted equivalently in the signal expander and transferred to the central array, there is no difference at all between physical memory near the CPU and external memory with regard to the concrete assembler and machine instructions and the CPU clocks they require. The physically external memory is thus fully integrated into the internal memory area without any restrictions.
- The processor can automatically write the data directly into the corresponding memory area during its actual processing routine (not a data transfer routine), and this memory area is also accessible to the other processors.

What is special about this is that each processor executes its own program on its logically internal data memory, and this data is automatically available to all other processors without a specially executed data transfer routine. Each processor therefore only needs to execute its own internal processing routines, and the processors are coupled to one another via this massively parallel coupled multiprocessor system in such a way that each processor can automatically access the memory of every other processor without any special routines.

One could say that the individual processor has the entire knowledge of the whole multi-system at its disposal for its own local processing routines, and that it can contribute to extending this knowledge without having to perform special data transfers.

Because the program data can also be made accessible via the central array in the same way, there is yet another significant advantage:

For example, special data and values that one processor in the system calculates with its internal processing routine can be used by another processor as program memory immediately, without a separate data transfer. This program could be a critical, very fast procedure in the µs or ns range whose properties depend on various external factors that are accessible to the MultiSystem.

This multitude of aspects and viewpoints of the massively parallel coupled multiprocessor system opens up fundamentally new approaches and completely new dimensions in the communication possibilities and performance of multiprocessor systems.

Abbreviations used (intuitive) and definitions (also used in particular for the patent claims and explanations) (this chapter is completely new):

- the term logic elements can represent any combination of arbitrarily complex switches, transceivers, gates, registers, flip-flops, latches or similar logic gates
- the term or part of the term processor (PZ) can also stand for microprocessor, controller, microcontroller, MCU, CPU, DSP, special processor or special module with an independently sequentially operating data transfer unit, and the term chip or the addition chip refers to a physical electrical component whose used electrical connections are connected to the conductor tracks of a carrier board.
- the registers, register sets or additional register sets defined here (registers are normally implemented with flip-flops, latches, etc.) have control connections which, among other things, make it possible to store bits into the register set and to output stored bits from the register set

(The plural is formed by appending a lowercase 's', e.g. PRSs, where needed for presentation.) (Combinations are possible, e.g. SE-PZPR means the processor port in the signal expander.)

Abbreviations for terms in the claims:

(The plural is formed by appending a lowercase 's', e.g. PTRSs, where needed for presentation.)

Claims

Multiprocessor system, characterized in that
- a plurality (1, 2, 3, 4, 5..10, 11..20 or more) of process modules (PMs) is present, where a PM consists of at least one processor (PZ) chip (with internal or external memory, so physical memory does not necessarily have to be present on the PM)
- at least one central logic component chip is present (referred to as CentralArray (CA) because of its function)
- the individual process modules (PMs) are connected to the CentralArray (CA) directly (with the connections relevant to processor communication, such as typically address, data and WR/RD control connections) or via an intermediate component (explained below, called SignalExpander (SE)), where at least one connection/pin on the CentralArray (CA) is occupied for each connected PM (or PM, PN, PZ or SE) (these individual ports in the CA are referred to as SignalExpanderPort (SEPT))
- the CentralArray (CA) is supplied with at least one clock signal (or oscillator)
- in the CentralArray (CA) the logic circuits and register sets are designed/connected in such a way that
- the SignalExpanderPort (SEPT) is connected to the SignalExpanderRegisterSet (SERS) by means of logic elements
- a logical state machine (SM) associated with each SEPT is present, which (based on the command sent by the PZ, called PZ command) controls the transfer of the required addresses and data between the SEPT and the relevant logic elements of the CA, so that this SM (in the CA) controls the transfer between PZ - CA or PZ - SE - CA
- in the CentralArray (CA) the logic circuits and register sets are designed/connected in such a way that
- its SignalExpanderRegisterSet (SERS) has separate registers for addresses and data (each with the required bit width) and control flags, and the control flags contain at least the flags 'WR' for Write and 'RD' for Read
- logic circuits are present which, based on the address bits supplied (by PM, PN, PZ or SE), effect the transfer of the required data from/to the SERS and thus from/to the SEPT
- in the CentralArray (CA) the logic circuits and register sets are designed/connected in such a way that
- the state machine (SM) associated with the respective SignalExpanderPort (SEPT), when in the inactive state, scans the signals on the SEPT for known bit patterns and thus for known commands (from the PM, PN, PZ (with PZ command) or SE (with the PZ command supplied in PZ packets)), and when a PZ command is recognized, the correspondingly required state is set in the SM, where
- in the case of a recognized write command, the supplied addresses and data are transferred to the registers provided for them, and after the transfer the state machine (SM) returns to the inactive state
- in the case of a recognized read command, the supplied addresses are transferred to the registers provided for them, then the data corresponding to the supplied address is transferred to the SignalExpanderPort (SEPT), and after the transfer the state machine (SM) returns to the inactive state
- in the CentralArray (CA) logic circuits with sequential control are present which (through continuous access to a large number of logic elements connected to connection ports, and the transfer of the corresponding addresses and data to/from a memory (Central Memory (CS))) function as a variable MultiPort Storage Machine (MPSM), scalable in port size and number (defined here), that is, as a machine with which multiple ports can access the memory (Central Memory (CS)) fully asynchronously (as explained below)
- at least one memory (module) (named Central Memory (CS)) is present inside or outside the CentralArray (CA) (with address, data and control signals)
- the logic circuit present in the MultiPort Storage Machine (MPSM) is connected to the signal connections of the Central Memory (CS) by means of logic elements (called Central Memory Transceiver (CSTR))
- the logic circuit present in the MultiPort Storage Machine (MPSM) contains a large number of connection ports, where
- individual connection ports are intended for the completely independent and asynchronous read or write access of the individual Process Node (PN) to the Central Memory (CS)
- the individual connection port has connections and is connected to logic elements (called Port Register Set (PTRS)) so that it can transmit address, data and control signals
- the individual Port Register Set (PTRS) has separate registers for addresses, data and control flags, and the control flags contain at least the flags 'WR' for Write and 'RD' for Read
- the port register sets (PTRSs) are designed/wired in such a way that (initiated by the PN)
- for writing data,
- the address and data bits of the data to be written are stored in the address and data registers of the PTRS
- the WR flag is activated
- for reading data,
- the address bits of the data to be read are stored in the address register of the PTRS
- the RD flag is activated
- the data can be read from the data register of the PTRS
- the port register sets (PTRSs) and the Central Memory Transceiver (CSTR) are designed/wired, and the connection structures are created, in such a way that the address and data bits in any PTRS can be transferred to the CS, and the data bits of the CS can be transferred to any PTRS
- a sequential logic control in the form of a state machine (SM), named RamMachine (RM), is present in the MultiPort Storage Machine (MPSM), where
- at least one clock signal and the required flags of the PTRSs lead to the inputs of the RM
- the outputs of the RM lead to the control connections of the PTRSs and the CSTR
- the RamMachine (RM), the port register sets (PTRSs) and the relevant registers are designed/connected in such a way, and the PTRSs and the relevant registers are controlled by the RM in such a way, that
- to detect a necessary data transfer between individual port register sets (PTRSs) and the Central Memory (CS), at least the WR and RD flags of the corresponding PTRSs are scanned (called scanning in the following), where, if the WR flag is active, the writing of the data into the CS is initiated, if the RD flag is active, the reading of the data from the CS is initiated, and if all flags are inactive, no transfer is carried out
- the writing of the data from the corresponding port register set (PTRS) into the Central Memory (CS), initiated by the RamMachine (RM), takes place in that
- the address and data bits from the PTRS are transferred to the corresponding address and data connections of the CS
- the control signals for applying the addresses and for writing the data into the CS are switched
- the WR flag is deactivated
- the reading of the data from the Central Memory (CS) into the corresponding port register set (PTRS), initiated by the RamMachine (RM), takes place in that
- the address bits from the PTRS are transferred to the address connections of the CS
- the control signals for reading the data from the CS are switched
- the data read from the CS is transferred to the data register of the corresponding PTRS, then
- the RD flag is deactivated
- the RamMachine (RM) is designed/wired in such a way that the scanning of the port register sets (PTRSs) takes place with a fixed, deterministic time division, so that a time slot of the same length is used for each PTRS, regardless of whether a transfer from or to the CS has to be carried out for the currently scanned PTRS
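As an illustration only, and not part of the claim, the scanning of the port register sets with one fixed time slot per PTRS can be sketched in Python. The class names, the list used as Central Memory, the slot counter, and the rule that WR takes priority when both flags are set are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class PortRegisterSet:
    """Model of one PTRS: address, data and the WR/RD control flags."""
    address: int = 0
    data: int = 0
    wr: bool = False
    rd: bool = False


class RamMachine:
    """Model of the RM scanning all PTRSs in a fixed order (assumed structure)."""

    def __init__(self, ports, memory_size):
        self.ports = ports
        self.memory = [0] * memory_size   # stands in for the Central Memory (CS)
        self.slots = 0                    # time slots elapsed so far

    def scan_once(self):
        """One full pass: every PTRS gets exactly one time slot,
        whether or not it has a transfer pending."""
        for port in self.ports:
            if port.wr:                   # assumed priority: WR before RD
                self.memory[port.address] = port.data
                port.wr = False           # WR flag deactivated after the write
            elif port.rd:
                port.data = self.memory[port.address]
                port.rd = False           # RD flag deactivated after the read
            self.slots += 1               # slot is consumed even when idle


ports = [PortRegisterSet() for _ in range(3)]
ports[0].address, ports[0].data, ports[0].wr = 5, 42, True   # node 0 writes
ports[1].address, ports[1].rd = 5, True                      # node 1 reads
rm = RamMachine(ports, memory_size=16)
rm.scan_once()
```

After one pass, the value written through the first port is available in the data register of the second port, and three time slots have elapsed although the third port had nothing to transfer.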

Multiprocessor system according to one of the preceding claims, characterized in that in the CentralArray (CA) the logic circuits and register sets are designed/connected in such a way that - the SignalExpanderRegisterSet (SERS) additionally contains the control flag with the appropriate designation 'PRG' for ProgramMemory - logic circuits are present which, based on the address bits supplied (by PM, PN, PZ or SE), effect the transfer of the required ProgramMemory values to the SERS and thus to the SEPT - a state machine (SM) associated with each SEPT is present which, based on the command sent by the PZ (PZ command), controls the transfer of the correspondingly required addresses and ProgramMemory values between the SEPT and the relevant logic elements of the CA, so that this SM (in the CA) controls the transfer between PZ - CA or PZ - SE - CA - logic circuits are present which enable the information bits sent (by PM, PN, PZ or SE) to be passed on to various logic elements within the CA, so that the ProgramMemory values can be transferred (in a compatible form) to the PM, PN, PZ or SE

Multiprocessor system according to one of the preceding claims, characterized in that the central array (CA) consists of several logic component chips that are closely wired together, the logic functions in these chips being designed/connected in such a way that the chips as a whole work like a large, complex central array (CA), and the lines coming from the PMs, PNs, PZs or SEs to the CA can also lead to several chips of the CAs in parallel.

Multiprocessor system according to one of the preceding claims, characterized in that - at least one port register set (PTRS) has the flag with the appropriate designation 'WB' for WriteBusy - at least one connection port and port register set (PTRS) is designed/wired in such a way that - the PM, PN, PZ, SE can read the WB flag - the WB flag is activated when the PM, PN, PZ, SE writes data into the PTRS - the function of the RAM machine (RM) and the relevant registers are designed/wired in such a way that the writing of the data into the central memory (CS) initiated by the RAM machine (RM) from the corresponding PTRS takes place in that, after the data has been transferred to the central memory (CS), the WB flag is deactivated.

Multiprocessor system according to one of the preceding claims, characterized in that - at least one port register set (PTRS) has the flag with the appropriate designation 'RE' for ReadEnd - at least one connection port and port register set (PTRS) is designed/connected in such a way that - the PM, PN, PZ, SE can read the RE flag - if the PM, PN, PZ, SE writes a data read request into the PTRS, the RE flag is deactivated - the function of the RamMachine (RM) and the relevant registers are designed/connected in such a way that the reading of the data from the Central Memory (CS) into the corresponding port register set (PTRS) initiated by the RM is done by activating the RE flag after the data has been transferred to the data register of the PTRS

Multiprocessor system according to one of the preceding claims, characterized in that at least one connection port and port register set (PTRS) is designed/connected in such a way that certain control bits and addresses transmitted by the PM, PN, PZ, SE are interpreted as short address commands, after which the currently recognized short address command is converted into a valid address and stored in the address register of the PTRS.

Multiprocessor system according to one of the preceding claims, characterized in that at least one connection port and port register set (PTRS) is designed/connected in such a way that certain control bits and addresses transmitted by the PM, PN, PZ, SE are interpreted as extended address commands, after which, depending on the currently recognized extended address command, a previously defined valid address appears in the address register of the PTRS

Multiprocessor system according to one of the preceding claims, characterized in that the RAM machine (RM) is designed/connected in such a way that the scanning of the port register sets (PTRSs) takes place in such a way that, if the need for data transfer is not determined when scanning a PTRS, the RM directly scans the next PTRS and does not wait for the time that would be required for the data transfer.

Multiprocessor system according to one of the preceding claims, characterized in that the RAM machine (RM) is designed/connected in such a way that the order of the port register sets (PTRSs) scans can be changed by means of control signals, the required control connections leading to the inputs of the RM

Multiprocessor system according to one of the preceding claims, characterized in that the RAM machine (RM) is designed/connected in such a way that during a complete run of PTRS scans one or more port register sets (PTRSs) are scanned more than once

Multiprocessor system according to one of the preceding claims, characterized in that the RAM machine (RM) is designed/connected in such a way that the RM, before it carries out the transfer between CS and PTRSs, first reads in the individual orders from all port register sets (PTRSs), sorts them according to data read and data write, and then carries out the data transfers for data read and data write directly one after the other.

Multiprocessor system according to one of the preceding claims, characterized in that the RAM machine (RM) is designed/connected in such a way that a certain status of the RM can be triggered by a port register set (PTRS)

Multiprocessor system according to one of the preceding claims, characterized in that several MultiPort Storage Machines (MPSMs) are implemented in the Central Array (CA)

Multiprocessor system according to one of the preceding claims, characterized in that the logic circuit in the CentralArray (CA) is designed/connected in such a way that the write, read or program read orders pending at the corresponding SignalExpanderRegisterSet (SERS) are processed by transferring the corresponding address and data bits and the corresponding control signals between the respective SERS in the CA and the further processing logic part of the circuit in the CA or the corresponding port register set (PTRS) of the MultiPortStorageMachine (MPSM).

Multiprocessor system according to one of the preceding claims, characterized in that - in the CentralArray (CA) there is a combinatorial logic decoding network associated with the SignalExpanderRegisterSet (SERS), hereinafter referred to as pre-address descriptor, whose output bit patterns depend on the bit patterns at its input and on an algebraic function, where this algebraic function can be fixed or can change depending on control signals at the pre-address descriptor - in the CentralArray (CA) the logic circuit is designed/connected in such a way that - a control logic associated with the corresponding SignalExpanderRegisterSet (SERS) is present, and the reading of the program data initiated by this control logic takes place in that - the address bits from the SERS are directed to the inputs of the pre-address descriptor - the bits at the outputs of the pre-address descriptor are forwarded to the further processing logic part of the circuit or to the corresponding port register set (PTRS) of the MPSM - the control signals required for reading the program data are switched to the further processing logic part of the circuit or to the corresponding port register set (PTRS) of the MultiPort Storage Machine (MPSM).

Multiprocessor system according to one of the preceding claims, characterized in that - at least one port register set (PTRS) additionally contains the control flag with the appropriate designation 'PRG' for ProgramMemory, and that the signals of these PRG flags lead to the inputs of the RAM machine (RM) - in the system there is a combinatorial logic decoding network associated with each port register set (PTRS), hereinafter referred to as address descriptor, which makes the bit patterns at its output depend on the bit patterns at its input and on an algebraic function, whereby this algebraic function can be fixed or can change depending on control signals at the address descriptor - the port register sets (PTRSs) are designed/connected in such a way that, from the PM, PN, PZ or SE, - the reading of the program data takes place in that - the address bits of the program data to be read are stored in the address register of the PTRS - the PRG flag is activated - the program data can be read from the corresponding registers of the port register set (PTRS) - the RAM machine (RM), port register sets (PTRSs) and the relevant registers are designed/connected in such a way, and the PTRSs and the relevant registers are controlled by the RM in such a way that - to detect a necessary data transfer between individual PTRSs and central memory (CS), the PRG flags of the corresponding PTRSs are also scanned, whereby, if the PRG flag is active, the reading of the program data from the CS is initiated - the reading of the program data from the central memory (CS) into the relevant port register set (PTRS) initiated by the RAM machine (RM) takes place in that - the address bits from the port register set (PTRS) are transferred to the inputs of the address descriptor - the bits at the outputs of the address descriptor are transferred to the address connections of the central memory (CS), - the control signals for reading the data from the central memory (CS) are switched, - the data read from the central memory (CS) are transferred to the corresponding register of the corresponding PTRS, - then the PRG flag is deactivated
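
The program-memory read described in this claim can be illustrated with a small behavioural model. The following is only a sketch under stated assumptions: the class `PortRegisterSet`, the functions `address_descriptor` and `ram_machine_scan`, and the add-and-mask mapping are hypothetical names and choices made for illustration; the claim itself only requires some fixed or controllable algebraic function between the PTRS address bits and the CS address connections.

```python
from dataclasses import dataclass


@dataclass
class PortRegisterSet:
    """Hypothetical model of one PTRS: address register, data register, PRG flag."""
    address: int = 0
    data: int = 0
    prg: bool = False


def address_descriptor(addr, offset=0, mask=0xFFFF):
    """Combinatorial mapping from PTRS address bits to CS address bits.

    Assumption: an add-and-mask function stands in for the 'algebraic
    function' of the claim; `offset` plays the role of a control signal.
    """
    return (addr + offset) & mask


def ram_machine_scan(ptrs_list, central_memory, offset=0):
    """One scan pass of the RAM machine (RM) over all PTRSs."""
    for ptrs in ptrs_list:
        if ptrs.prg:                                    # PRG flag active?
            cs_addr = address_descriptor(ptrs.address, offset)
            ptrs.data = central_memory[cs_addr]         # read CS into the PTRS
            ptrs.prg = False                            # then deactivate PRG


# Usage: a processor requests program data at address 5.
memory = list(range(100, 356))          # stand-in for the central memory (CS)
port = PortRegisterSet(address=5, prg=True)
ram_machine_scan([port], memory, offset=10)
```

In this model the processor side only sets the address and the PRG flag; the scan pass performs the actual memory access and clears the flag, which mirrors the order of steps given in the claim.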

Multiprocessor system according to one of the preceding claims, characterized in that - in a process module (PM) at least one (optional (*)) logic chip (called SignalExpander (SE) due to its function) is present, whereby the connections relevant to processor communication (typically address, data, WR-RD control connections) of the PZ are connected to the SE (this port on the SE is referred to as ProcessorPort (PZPT)), ((*) optional because the SE essentially (only) transmits the relevant processor signals (typically address, data, WR-RD control signals) to the CA) - the logic function of the (optional) SE is designed/wired in such a way that the fed-in processor signals (typically as above) relevant to data transmission can be passed on to a connection group (present on the SE) (referred to as CentralArrayPort (CAPT)) - the SE is supplied with at least one clock signal (or oscillator)

Multiprocessor system according to one of the preceding claims, characterized in that - in the signal expander (SE) - the processor port (PZPT) is connected to the processor register set (PZRS), - the central array port (CAPT) is connected to the central array register set (CARS), - in the SignalExpander (SE) there is a state machine (SM) built with logic switching elements, which (based on the command sent by the PZ, called PZ command) controls the transfer of the correspondingly required addresses and data between (SE-)PZPT and (SE-)CAPT, and this SM (in the SE) thus controls the transfer between PZ - SE - CA - in the CA there are further logic circuits with which the reassembled PZ commands sent by the SE can be passed on to the relevant logic elements (in the CA) - the logic functions of the SE are compatible with the CA logic part, which is connected to the corresponding SE via the ports, so that the transfer between SE and CA can take place in an orderly manner - the transfer PZ - SE - CA (initiated by the PZ) takes place in such a way that - for data writing, - the PZ commands supplied by the PZ (information bits of data to be written) are temporarily stored in the PZRS - these temporarily stored information bits are transported by the SM in packets (called PZ packets) to the CARS and thus to the CAPT - for reading data, - the PZ commands (information bits of data to be read) supplied by the PZ are temporarily stored in the PZRS - these temporarily stored information bits are transported by the SM in packets (called PZ packets) to the CARS and thus to the CAPT - the resulting data bits (based on the address) are then transported by the SM in packets (called PZ packets) from the SEPT (in the CA) to the CAPT, and these data bits are then transported on to the PZRS and PZPT - these data bits can then be read out by the PZ from the PZRS.

Multiprocessor system according to one of the preceding claims, characterized in that the signal expander (SE) is designed/wired in such a way that - its processor register set (PZRS) has separate registers for addresses and data (each with the correspondingly required bit width) and control flags, and the control flags contain at least the corresponding flags 'WR' for Write and 'RD' for Read - its processor register set (PZRS) is designed/wired in such a way that (each initiated by the PZ) - for writing data, - the address and data bits relevant to communication (i.e. to be written by the PZ) are stored in the address and data register of the PZRS by means of the appropriate control signal (coming from the PZ or contained in the PZ command) - the flag WR is activated - for reading data, - the address bits relevant to communication (i.e. to be written by the PZ) of the data to be read are stored in the address register of the PZRS by means of the appropriate control signal (coming from the PZ or contained in the PZ command) - the flag RD is activated - then the processor (PZ) can read the corresponding data from the data register of the PZRS - its central array register set (CARS) is designed/wired in such a way that, depending on the control signals applied to the CARS (or contained in PZ commands), it can create certain bit patterns on the CAPT so that the connected CA can receive and decode these bits as a corresponding command - its PZRS is designed/connected to the CARS in such a way that - the address and data bits (with the bit width required for CARS) can be transferred from the PZRS to the CARS, - the data bits (with the bit width required for CARS) can be transferred from the CARS to the PZRS - its SM inputs carry at least the signals of the WR and RD flags of the PZRS, and a clock signal - its SM outputs lead to the control connections of the PZRS and the CARS - its SM, PZRS, CARS and the relevant registers are designed/connected and controlled in such a way that - an active WR or RD flag in the PZRS activates the SM, after which it - executes a write command when the WR flag is active and - executes a read command when the RD flag is active - the write command initiated by the SM is executed in that - the CARS outputs a bit pattern to the CAPT, this bit pattern then serving as a write command for the CA - the address and data bits from the PZRS are transferred to the CAPT by means of sequentially (using a clock) transmitted individual packets (called PZ packets) (where the bit width and the number of these PZ packets depend on the width of the CAPT; the smaller the CAPT width, the more PZ packets are required) - then the WR flag of the PZRS is deactivated - the read command initiated by the SM is executed in that - the CARS outputs a bit pattern to the CAPT, this bit pattern then serving as a read command for the CA - the address bits from the PZRS are transferred to the CAPT by means of sequentially (using a clock) transmitted individual packets (PZ packets) (where the bit width and number of these PZ packets also depend on the width of the CAPT) - the data that have arrived in the CARS (sent by the CA) are transferred to the PZRS by means of sequentially (using a clock) transmitted individual packets (PZ packets) (where the bit width and number of these PZ packets depend on the width of the CAPT) - then the RD flag of the PZRS is deactivated
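
The dependence of the PZ packet count on the CAPT width can be made concrete with a short sketch. The function names `split_into_pz_packets` and `join_pz_packets` and the least-significant-packet-first order are assumptions made for illustration; the claim does not fix a packet order.

```python
def split_into_pz_packets(value, total_bits, capt_width):
    """Split `total_bits` of `value` into packets of `capt_width` bits.

    Assumption: the least significant packet is sent first.
    """
    if capt_width <= 0:
        raise ValueError("capt_width must be positive")
    count = -(-total_bits // capt_width)      # ceiling division
    mask = (1 << capt_width) - 1
    return [(value >> (i * capt_width)) & mask for i in range(count)]


def join_pz_packets(packets, capt_width):
    """Reassemble packets produced by split_into_pz_packets (receiver side)."""
    value = 0
    for i, packet in enumerate(packets):
        value |= packet << (i * capt_width)
    return value


# Usage: a 16-bit address over a 4-bit CAPT needs four packets,
# over an 8-bit CAPT only two.
narrow = split_into_pz_packets(0xABCD, 16, 4)
wide = split_into_pz_packets(0xABCD, 16, 8)
```

Halving the CAPT width doubles the number of clocked packets per transfer, which is the trade-off between pin count and transfer time that the claim points to.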

Multiprocessor system according to one of the preceding claims, characterized in that in the signal expander (SE) - the processor register set (PZRS) additionally contains the control flag with the appropriate designation 'PRG' for program memory - whose processor register set (PZRS) is designed/connected in such a way that (initiated by the PZ) - for reading program memory values - the address bits of program memory values to be read that are relevant for communication (i.e. to be written by the PZ) are stored in the address register of the PZRS by means of a control signal (coming from the PZ or contained in the PZ command) - the PRG flag is activated - then the processor (PZ) can read the associated program memory values from the corresponding register of the PZRS

Multiprocessor system according to one of the preceding claims, characterized in that - all relevant processor connections that are necessary for transmitting the processor-specific memory and InOut bus protocol of the PZ used in the PM are connected to the SignalExpander (SE) - the logic function of the SE is designed/wired in such a way that the entire processor-specific memory and InOut bus protocol of the PZ used in the respective PM is broken down into PZ packets, these PZ packets are transferred to the (SE-)CARS and thus to the (SE-)CAPT and these PZ packets are thus transferred to the CA, whereby the CA reassembles these PZ packets, whereby all the information contained in the bus protocol of the PZ is transferred from the SE to the CA.

Multiprocessor system according to one of the preceding claims, characterized in that an evaluation circuit is present in the signal expander (SE) which reports to the outside by means of connections whether the SE-internal SM is free or still busy

Multiprocessor system according to one of the preceding claims, characterized in that in the signal expander (SE) a logic circuit is designed/connected in such a way that the clock signal can be made available for the connected PZ, whereby this logic circuit enables a clean stopping of the clock signal as long as the SE-internal SM is still busy

Multiprocessor system according to one of the preceding claims, characterized in that an address comparator and evaluation logic for the addresses (transmitted by the PZ) is implemented in the SignalExpander (SE), which enables only short address commands to be transferred to the CA instead of the concrete addresses, after which the currently recognized short address command is converted into a valid address in the CA and stored in the corresponding register.
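
One possible reading of the short address commands is sketched below. The command set ("FULL", "INC", "SAME") and the classes `ShortAddressEncoder` and `ShortAddressDecoder` are purely hypothetical; the claim only states that the SE compares addresses and sends a short command that the CA converts back into a valid address.

```python
class ShortAddressEncoder:
    """SE side: compare each address with the previous one (assumed scheme)."""

    def __init__(self):
        self.last = None

    def encode(self, addr):
        if self.last is not None and addr == self.last:
            cmd = ("SAME",)            # repeat of the previous address
        elif self.last is not None and addr == self.last + 1:
            cmd = ("INC",)             # previous address plus one
        else:
            cmd = ("FULL", addr)       # no match: send the concrete address
        self.last = addr
        return cmd


class ShortAddressDecoder:
    """CA side: turn the received command back into a valid address."""

    def __init__(self):
        self.address = None

    def decode(self, cmd):
        if cmd[0] == "FULL":
            self.address = cmd[1]
        elif cmd[0] == "INC":
            self.address += 1
        # "SAME" leaves the stored address unchanged
        return self.address


# Usage: sequential accesses need only one full address transfer.
encoder, decoder = ShortAddressEncoder(), ShortAddressDecoder()
commands = [encoder.encode(a) for a in (0x100, 0x101, 0x101, 0x200)]
addresses = [decoder.decode(c) for c in commands]
```

With such a scheme, runs of consecutive or repeated addresses cost far fewer PZ packets than sending every address in full.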

Multiprocessor system according to one of the preceding claims, characterized in that an address comparator and evaluation logic for the addresses (transmitted by the PZ) is implemented in the signal expander (SE), which enables extended address commands to be transferred to the CA instead of the concrete addresses, after which a previously defined valid address then appears in the CA in the corresponding register depending on the currently recognized extended address command

Multiprocessor system according to one of the preceding claims, characterized in that the free logic resources and free connections of the signal expander (SE) are used to expand the existing direct InOut port signals and connections of the connected PZ by means of additionally implemented registers and additional inclusion of connections.

Multiprocessor system according to one of the preceding claims, characterized in that a signal expander (SE) consists of several logic component chips that are closely wired together, the logic functions in these chips being designed/connected in such a way that the chips as a whole work like a large, complex SE, and the lines coming from the CA to the SE can also lead to several components of the SE in parallel.

Multiprocessor system according to one of the preceding claims, characterized in that at least one connection line from the CA to the SE is also routed in parallel to at least one further SE from another PM

Multiprocessor system according to one of the preceding claims, characterized in that one or a few additional lines lead from the CA to the SEs, whereby implemented logic circuits in the CA and SE ensure that commands and special messages can be transmitted from the CA to the SE via these additional lines

Multiprocessor system according to one of the preceding claims, characterized in that the signal expander (SE) consists of several chips and various chips thereof are standard shift register components, whereby these shift registers are controlled by suitable logic circuits and the shift registers carry out the parallel-serial or serial-parallel conversion of the addresses and the read and written data
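
The parallel-serial and serial-parallel conversion performed by the shift registers can be modelled in a few lines. This is a behavioural sketch only; the function names and the most-significant-bit-first order are assumptions and are not taken from the claim.

```python
def parallel_to_serial(word, width):
    """Shift a `width`-bit word out one bit per clock, MSB first (assumed)."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


def serial_to_parallel(bits):
    """Shift received bits in one per clock and return the assembled word."""
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    return word


# Usage: an 8-bit data byte travels over a single line and is rebuilt.
line = parallel_to_serial(0xA5, 8)
received = serial_to_parallel(line)
```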

Multiprocessor system according to one of the preceding claims, characterized in that the logic circuits for controlling the shift registers for several SEs are combined in one chip

Multiprocessor system according to one of the preceding claims, characterized in that the logic circuit in the signal expander (SE) is designed/connected in such a way that the data to be read is transmitted from the CA to the SE at the time of the active ALE signal (AddressLatchEnable) of the processor (PZ) and this data is only transmitted to the PZ in the case of the RD signal generated by the PZ.

Multiprocessor system according to one of the preceding claims, characterized in that there is a line from the signal expander (SE) to the central array (CA), via which the ALE signal and the WR and RD signals of the PZ can be transmitted to the CA using a special protocol

Multiprocessor system according to one of the preceding claims, characterized in that a clock-synchronous signal sampling takes place on the connecting line and that the protocol is designed/wired in such a way that the first active level on the line is interpreted as an ALE signal of the PZ and the subsequent level change on the line is interpreted as an active WR/RD signal of the PZ and at the next clock the specific WR or RD signal is detected, depending on the existing signal level.
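
The single-line protocol of this claim can be sketched as a small clocked decoder. The function `decode_line`, the choice of 1 as the active level, and the mapping of a high level to WR and a low level to RD are assumptions for illustration; the claim only specifies the order of the three steps.

```python
def decode_line(samples, active=1, wr_level=1):
    """Decode clock-synchronous samples of the SE-to-CA line.

    Assumed convention: `active` is the active level; at the clock after
    the level change, `wr_level` means WR and the opposite level means RD.
    """
    events = []
    state = "IDLE"
    for level in samples:
        if state == "IDLE":
            if level == active:            # first active level: ALE
                events.append("ALE")
                state = "ALE"
        elif state == "ALE":
            if level != active:            # level change: WR/RD is active
                state = "STROBE"
        elif state == "STROBE":            # next clock: which one was it?
            events.append("WR" if level == wr_level else "RD")
            state = "IDLE"
    return events


# Usage: one sample per clock edge.
write_cycle = decode_line([0, 1, 1, 0, 1])
read_cycle = decode_line([0, 1, 0, 0])
```

Encoding all three processor signals as a level sequence on one line saves two connections per SE at the cost of a few extra clocks per access.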