WO1993011485A1 - Method for ordering events in a parallel data processing system

Info

Publication number: WO1993011485A1
Application number: PCT/DK1992/000352
Authority: WO
Inventors: S. M. G. B. S. Harlequin; Richard John Bird
Original assignee: KLAUSTRUP, Edel, Kirstine; KLAUSTRUP ANDERSEN, Henning
Priority date: 1991-11-26
Filing date: 1992-11-26
Publication date: 1993-06-10
Also published as: DK192491D0; AU3081892A

Abstract

A method for controlling many processes or threads of processes operating in parallel with transparent intercommunication between these processes or threads of processes, recursively driven by events. The processes or threads of processes transparently address sets of pairs of complementary vectors which address queues in a shared memory area and are recursively updated after each access which encounters an event whereby the event itself becomes stacked. Stacked events will thus control the sequence of execution in an optimally stochastic manner, not the linear sequence in which the problem was presented, as such a sequence would not execute in parallel.

Description

METHOD FOR ORDERING EVENTS IN A PARALLEL DATA PROCESSING SYSTEM

Background to The Invention

Everywhere in the industrialised world there is an overwhelming and ever-increasing demand for computers with ever higher speed, greater capacity and improved structures for data control. This requirement may be for purposes of research, space programs, meteorology, database programs, pattern recognition, CAD/CAM, artificial intelligence, neural network models, genetics, to give a few examples only. One obvious way to achieve these objectives is through the development of faster processor technology, e.g. the use of new materials for chip construction, bus communication etc., and the development of designs such as RISC. Another approach is that of parallel processing, in which essentially many processors carry out parts of the same task(s) simultaneously. By the use of such a parallel processing approach a task can be speeded up by many orders of magnitude, even using conventional chip designs.

The invention (method) achieves parallel processing using a new structural principle in the organisation of computers. It is embodied in a chip (the QCC-chip, specified as the kernel). Though the cardinal principle is one of parallel processing among a plurality of processors, it may be used for many other purposes, e.g. communication switching, computer network control, etc. Parallel processing has been attempted in many forms, using hardware, software and mixes of the two approaches. What distinguishes this invention from such previous approaches is the use of a totally dynamic and implicitly self-organising method which transparently leads to an automatic ordering of events. This ordering is optimal in processing efficiency, allowing the processors to approach the ideal "theoretical peak" performance as defined by Dongarra in his paper "Performance of Various Computers Using Standard Linear Equations Software" (Mathematical Sciences Section, Oak Ridge National Laboratory, Knoxville USA). To summarise this approach one could say that it implies the structural sorting of chaos into order with coherence, implicitly carried out at the lowest possible level, i.e. machine instruction level.

The State of the Art

To meet the demands for parallel processing many different methods have been developed, and these may be divided into three classes: hardware solutions, software solutions and mixes of the two.

Hardware designs for parallel processing raise many problems involving communication between processors. More or less exotic bus geometries have been attempted, such as rings, stars, hypercubes, butterflies and so on.

Conventional parallel processing systems are extremely expensive in hardware and even more so in software. Hardware solutions may involve elaborate bus structures which raise problems of process scheduling.

Software solutions require special parallel programming languages. These need a massive reprogramming effort if existing software is to be reproduced in the new environment. Also, programming for parallel processes is very complex, since the three major problems, synchronisation, coherence and ordering of events, have not yet found a satisfactory solution.

This is why parallel processing has been, up until now, for the very few.

The Inventive Step

The QCC-system, according to the invention (method), is original in embodying a method distinguished by a new transparent event ordering system, be it physical and/or logical event ordering, which implies synchronisation and coherence in a recursive manner, thus enabling serial processes to be parallelised. This is achieved by introducing a massive number of variable-sized transparent Queue/Stack cultures named PIPES, spanning the complete address space for which event ordering is needed and dynamically controlled by the QCC-Chip. That is, the invention is distinguished and original by being a totally transparent, dynamic, recursive system adhering to memory management, as opposed to other systems using Queue/Stack principles.

The QCC is a pipe controller with the ability to transparently handle multiple pipes recursively, arbitrarily organized as queues and/or stacks, thus enabling cross referencing between pipes, between pipes and their related functions, and between functions. It was originated for the purpose of high performance parallel processing, but can be used in any design demanding high performance intercommunication with synchronisation, event-ordering and coherence.

Kernel Description

Fig. 2 relates completely to this paragraph.

The function of the kernel is essentially that of an address handler, which generates the addresses of data in a main memory device according to the sequence of events requesting their access. The kernel performs this function by the use of recursively handled vector pointers to sets of complementary stacks and queues, generically referred to as pipes.

The kernel receives inputs U, L, F and E, these designating the Upper and Lower vector addresses of pipes and the Full and Empty queue bits, together with a Read/Write from a participant bit (R/W), a Queue/Stack select bit (Q/S), a Refocus request bit and a Bit Zero status bit (1 or 0). It also receives an Adders Control, which is the value of a pipe step increment/decrement and which may in the simplest case be unity. The output from the kernel is principally an Address at which data may be found in the memory interlocked between locals, globals and the QCC system, together with Bit Zero, which may be a control bit for queue and/or stack selection, and NMI, a non-maskable interrupt signal.

The mode of operation of the kernel is as follows. If an F & W or an E & R (i.e. full and write, or empty and read) condition is present, the status of the Q/S signal is tested. If it is S, a non-maskable interrupt is emitted by the kernel. If it is Q, then Bit Zero is set to 1 and this is output by the kernel. If neither the F & W nor the E & R combination is detected, the status of the R/W bit is tested. If it is found to be low (write signal), the status of the Q/S bit is tested. If this is Q, the Upper vector address is immediately asserted as the Address output from the kernel and the Refocus bit is tested. If the Refocus bit is zero, U is incremented by the amount of the Adders Control. The R/W bit is then tested and if it is high (read signal) the values of U and L are compared. If they are equal F is set to 1 and if they are unequal F is set to 0. If the value of the R/W bit is low (write signal) the values of U and L are compared. If they are equal E is set to 1 and if they are unequal E is set to 0.

If after testing the F & W and E & R combination the R/W signal is found to be high (read), then the status of the Q/S bit is tested. If it is Q, then L is immediately asserted as the Address output from the kernel and the Refocus bit is tested. If the Refocus bit is zero, L is incremented by the amount of the Adders Control. The R/W bit is then tested and if it is high (read signal) the values of U and L are compared. If they are equal F is set to 1 and if they are unequal F is set to 0. If the value of the R/W bit is low (write signal) the values of U and L are compared. If they are equal E is set to 1 and if they are unequal E is set to 0.

If after testing the F & W and E & R combination the R/W signal is found to be high (read) and the Q/S signal is found to be S, then L is immediately asserted as the Address output from the kernel and the Refocus bit is tested. If the Refocus bit is zero, L is decremented by the amount of the Adders Control. The R/W bit is then tested and if it is high (read signal) the values of U and L are compared. If they are equal F is set to 1 and if they are unequal F is set to 0. If the value of the R/W bit is low (write signal) the values of U and L are compared. If they are equal E is set to 1 and if they are unequal E is set to 0.
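The queue-mode part of this procedure can be illustrated with a short Python sketch. It is not the annexed implementation. The names `PipeState`, `KernelOutput`, `kernel_step`, `base` and `size` are hypothetical, wrap-around of U and L within a fixed-size region is assumed, and stack-mode addressing is left out. The sketch also assumes the conventional circular-buffer reading of the flags, in which a write that brings U level with L marks the pipe full and a read that does so marks it empty. This agrees with the later statement that writing clears E and reading clears F, but the text above pairs the flags with the R/W bit the other way round, so it should be treated as one interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipeState:
    """One queue-mode pipe; U, L, F and E follow the description's names."""
    base: int        # hypothetical: first memory address of the pipe
    size: int        # hypothetical: slot count, with wrap-around assumed
    U: int = 0       # upper vector (offset of the next write)
    L: int = 0       # lower vector (offset of the next read)
    F: bool = False  # full
    E: bool = True   # empty; a fresh pipe is assumed to start empty


@dataclass
class KernelOutput:
    address: Optional[int] = None  # None when an event blocked the access
    bit_zero: int = 0
    nmi: bool = False


def kernel_step(pipe, write, queue=True, refocus=False, step=1):
    """Perform one access and return the kernel outputs."""
    # F & W or E & R: the access cannot proceed, so an event is raised.
    if (write and pipe.F) or (not write and pipe.E):
        return KernelOutput(bit_zero=1) if queue else KernelOutput(nmi=True)
    if not queue:
        raise NotImplementedError("stack-mode addressing is not modelled here")
    if write:
        address = pipe.base + pipe.U       # U is asserted as the Address
        if not refocus:                    # Refocus set: vector not advanced
            pipe.U = (pipe.U + step) % pipe.size
            pipe.F = pipe.U == pipe.L      # assumed: write catching up means full
            pipe.E = False
    else:
        address = pipe.base + pipe.L       # L is asserted as the Address
        if not refocus:
            pipe.L = (pipe.L + step) % pipe.size
            pipe.E = pipe.U == pipe.L      # assumed: read catching up means empty
            pipe.F = False
    return KernelOutput(address=address)
```

With a two-slot pipe, two writes set F, a third write returns Bit Zero = 1 instead of an address, and two reads drain the pipe and set E again.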

Further annexed hereto is an implementation of the QCC chip kernel, entitled Q, made to the above specification, including Fig. 3, by Derik Renton of EVJ Electronics, thus incidentally demonstrating that a technician who is an outsider can follow the above specification.

The QCC-System - Principles of Operation

The QCC-system is built around two complementarily operating vector memories, each with between 512 (minimal system) and 4 giga (maximal in a 32 bit system) entries. Each entry is minimally a 64 bit word, made up of two 32 bit words formatted as shown in "THE COMPLEMENTARY VECTOR FORMAT", page 14. These complementary vector memories contain the upper addresses (U), lower addresses (L), queue/stack offset (D), full (F) and empty (E) status of the queues/stacks, and the semaphores (S0-Sn). The semaphores and their purpose may be defined at will by the users, and the word width of the vector memories may also be expanded for any purpose, such as restart procedures.

The U contains the upper vector addresses of the queues/stacks. The L contains the lower vector addresses of the queues/stacks. Both are held together with their appropriate status, which is updated as a result of their last operation according to the rules embodied in the chip design as shown in the drawing named Fig. 2 and described in "Kernel Description", page 4.
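As a rough illustration of what one vector entry holds, the following Python sketch models the vector memory as a table of records. The class names `VectorEntry` and `VectorMemory` are hypothetical, the two complementary memories are folded into a single record for brevity, and no bit layout is implied: the actual format is the one given in "THE COMPLEMENTARY VECTOR FORMAT", which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIN_ENTRIES = 512      # minimal system, per the description
MAX_ENTRIES = 2 ** 32  # "4 giga" entries in a 32 bit system


@dataclass
class VectorEntry:
    U: int = 0       # upper address of the queue/stack
    L: int = 0       # lower address of the queue/stack
    D: int = 0       # queue/stack offset
    F: bool = False  # full status
    E: bool = True   # empty status; a fresh pipe is assumed to start empty
    S: List[bool] = field(default_factory=list)  # user-defined semaphores S0..Sn


class VectorMemory:
    """Table of vector entries, created lazily so large sizes stay cheap."""

    def __init__(self, entries=MIN_ENTRIES):
        if not MIN_ENTRIES <= entries <= MAX_ENTRIES:
            raise ValueError("entry count must lie between 512 and 2**32")
        self.entries = entries
        self._table: Dict[int, VectorEntry] = {}

    def __getitem__(self, vector_address):
        if not 0 <= vector_address < self.entries:
            raise IndexError("vector address out of range")
        return self._table.setdefault(vector_address, VectorEntry())
```

Indexing `VectorMemory()` with a vector address returns the same `VectorEntry` each time, so status updates made after one operation are visible to the next.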

Whenever an attempt is made to either write to a full pipe (F=1) or read from an empty pipe (E=1), the sequence is terminated and the address of the vectors concerned is recursively stacked in the neighbouring event pipe, asserted by means of bit zero. The attemptator (the process or thread of a process which attempted the read or write) receives a signal that an event occurred (is interrupted), whereafter the attemptator may start a new sequence of operations by recursively reading a new vector address from a stack in the neighbouring event pipe, or may initiate a new task. The beginning of a sequence is implicit and inherent in the nature of the attemptator's normal interrupt procedure and will succeed when the condition for the termination has been resolved by a process or a thread of a process writing, making it possible to read again (E=0), or by reading, making it possible to write again (F=0). Thus the system is stochastic and operates transparently and recursively in concordance with e.g. a Markov chain principle, fully exploited to any depth of recursivity. It can be seen that the system is alive by any means, i.e. its logical behaviour is complete towards any condition whatsoever which may arise. It will solve a problem of any complexity, even its own, by its recursivity. It cannot, by its own rules, produce disorder. On the contrary, it will reduce entropy to the minimum possible level. The system may therefore be regarded as an "entropy machine", similar in operation to the concept of a Maxwell demon used to illustrate thermodynamic theory and by Norbert Wiener in his book "Cybernetics" (page 57).
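The event-stacking behaviour described above can be sketched in a few lines of Python. This is a software analogy only, under simplifying assumptions: a single `event_stack` list stands in for the neighbouring event pipe, the `EVENT` sentinel stands in for the signal sent to the attemptator, and the names `EventOrderedPipes`, `add_pipe` and `next_event` are hypothetical.

```python
EVENT = object()  # sentinel returned when an access hits a full or empty pipe


class EventOrderedPipes:
    def __init__(self):
        self.pipes = {}        # vector address -> (list of items, capacity)
        self.event_stack = []  # stands in for the neighbouring event pipe

    def add_pipe(self, vector_address, capacity):
        self.pipes[vector_address] = ([], capacity)

    def write(self, vector_address, value):
        items, capacity = self.pipes[vector_address]
        if len(items) >= capacity:                 # F=1: write to a full pipe
            self.event_stack.append(vector_address)
            return EVENT
        items.append(value)
        return True

    def read(self, vector_address):
        items, _ = self.pipes[vector_address]
        if not items:                              # E=1: read from an empty pipe
            self.event_stack.append(vector_address)
            return EVENT
        return items.pop(0)

    def next_event(self):
        """Pop the most recently stacked vector address, or None if none."""
        return self.event_stack.pop() if self.event_stack else None


system = EventOrderedPipes()
system.add_pipe(4, capacity=1)
first_try = system.read(4)           # pipe is empty, so address 4 is stacked
system.write(4, "payload")           # another participant resolves the condition
retry_address = system.next_event()  # the attemptator picks the address up again
second_try = system.read(retry_address)
```

The first read fails and leaves its vector address on the event stack. Once a write has cleared the empty condition, the address is popped and the retried read succeeds, so the order of execution follows the stacked events and not the order in which the accesses were first issued.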

Claims

1. A method for ordering events and activities in a mechanical, electromechanical, electronic or other similar device using the principle of queues and stacks controlled by complementary vectors, distinguished by its transparency and the ability that any auxiliary information may become contents of the address vectors and also that the addresses of the address vectors may recursively become the contents of queues or stacks, allowing the system to operate automatically in a recursive manner with synchronization and coherence.

2. A method according to claim 1 in which a process or a number of processes can be controlled by the recursive ordering of queues and stacks controlled by complementary vectors.

3. A method according to claims 1 and 2 in which the queues, stacks, vectors, addresses, control lines for addressing data, instructions and environments are embodied in a microprocessor chip or other device or devices.

4. A method according to claims 1, 2 and 3 in which the control device is embodied in one or more full custom designed chips or uncommitted logic array(s) or other device(s).

5. Methods according to claims 1, 2, 3 and 4 in which this concept may be used wholly and/or partly in any device and/or apparatus for any other purpose(s).