GB2168182A - Data-driven processor - Google Patents

Data-driven processor

Info

Publication number
GB2168182A
GB2168182A
Authority
GB
United Kingdom
Prior art keywords
token
data
bus
processor
tokens
Legal status
Withdrawn
Application number
GB08525140A
Other versions
GB8525140D0 (en)
Inventor
John Wayne Van Zandt
Current Assignee
Conic Corp
Original Assignee
Conic Corp
Application filed by Conic Corp
Publication of GB8525140D0
Publication of GB2168182A

Classifications

    • G06F13/4031 Coupling between buses using bus bridges with arbitration
    • G06F13/366 Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using a centralised polling arbiter
    • G06F9/4494 Execution paradigms, e.g. implementations of programming paradigms, data driven


Abstract

A data flow processor comprises a bus (21) over which information tokens can be communicated, each token comprising a tag portion and a data portion, and a plurality of token processors (23) connected to the bus. Each token processor includes at least one processing unit (24) for carrying out a specified function on certain input data identified by one or more specific tags, an input token processor (30) for monitoring the tag portion of each token on the bus and for storing the data portion of each such token which has a tag portion identifying input data for the processing unit, the processing unit carrying out its specified function on the resulting stored data as soon as the data is assembled, and an output token processor (31) for assembling a token consisting of the data produced by the processing unit and a tag identifying such output, the token thereafter being communicated to the bus.

Description

SPECIFICATION

Data flow processor

The present invention relates to a data flow processor type of computer. Unlike a conventional computer, in which instruction execution is under program-flow control, the data flow processor is driven by the availability of operand data.
In a conventional computer, the flow of data is controlled by the sequential execution of a program. From a flow chart point of view, the program is a sequential set of steps executed one after the other. If the program requires the computation of several functions, these typically are performed seriatim.
By way of example, if a program is to carry out two operations, the first a function using data inputs A, B and C and producing an output D, and the second a function having data inputs B and C and producing an output E, these are performed sequentially. Typically, all data is stored in a common memory. The computer (1) accesses, in order, the data A, then B, then C, from the memory, (2) performs the necessary arithmetic or logical operations to implement the function, and (3) produces the output data D which then is entered into the memory. Subsequently, the memory again is accessed to obtain the data values B and C, which are used to perform the arithmetic or logic operations of the next function.
The resultant output value E in turn is entered into the memory.
In such a conventional program-flow controlled computer, the speed at which processing can take place is limited by the rate at which the instructions can be carried out in serial fashion, and by the sequential data access time to the common memory. The latter limitation is related to the so-called "Von Neumann bottleneck", a term denoting the reality that no matter how fast the processor operates, all of the data must line up single file before it can get processed.
In the example of the conventional serial program flow-controlled computer just described, the second function could not be carried out until after completion of the first function, even though the data (B and C) required as inputs for the second function was already available in the memory at the time when computation of the first function began.
One prior art approach to overcoming this speed limitation has been the use of multiple processors operating in parallel. For the example just described, a computer with parallel processors could be programmed so that the first function was carried out by one processor while the second function was carried out by a separate parallel processor. However, even in such arrangement the processors must compete for access to the common data memory, thereby bringing into play the same Von Neumann bottleneck. The first processor must sequentially access the data A, B and C, while the second processor must separately, sequentially access the data B and C. This serial lineup of the data thus poses an inherent speed limitation even for a parallel processing computer of the program-flow controlled type.
By contrast, a data flow processor has a fundamentally different organization. Parallel processing is utilized, but the Von Neumann bottleneck is eliminated by distributing the data directly to the various parallel processing units. At each such unit, as soon as the data is available to perform the requisite function, that operation is carried out. Significant improvement in performance speed is achieved.
In the example described above, the various data A, B and C is distributed directly and concurrently to the various processing units. The processing unit assigned to perform the first function accepts the data A, B and C and immediately carries out computation of that function to provide the output value D. Meanwhile, as the data B and C is being provided to the first processing unit, it is also supplied to the separate processing unit configured to perform the second function. That unit accepts the data B and C, and immediately computes the second function, producing an output value E.
Thus in a data flow processor, functional operations are carried out not in a sequential program-flow controlled manner, but rather are carried out in response to operand data availability.
An object of the present invention is to provide an implementation of such a data flow processor.
This objective is achieved by providing a data flow processor in which operand data is communicated through the system in the form of tokens each having a tag field which contains an identification, and a data field which contains an operand data value.
The tokens are communicated on a bus, herein called a "flowbus", to which a plurality of token processors or "data flow cards" are attached. Each such token processor includes one or more processing units, each configured or controllable to perform a certain function requiring specific input data.
Also included on each data flow card is an input token processor (ITP) which monitors the identification or tag field of every token on the flowbus. The ITP looks for tokens which contain operand data required by the processing units on the same data flow card. When the ITP recognizes, from the identification in the tag field of the token on the flowbus, that the present token is required by such a processing unit, the ITP enters the operand data from that token into a memory on the data flow card itself.
When all of the requisite operand data needed by a particular processing unit has been received and entered into the ITP memory, the processing unit executes the function.
For example, if the processing unit is to carry out a function requiring input data A, B and C, the ITP looks for tokens on the flowbus having the tag field identification values "A", "B" and "C". When these tokens are detected, the corresponding operand data is stored by the ITP. As soon as operand values for all three inputs "A", "B" and "C" are received, the processing unit can immediately carry out the function.
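The tag-matching behaviour just described can be sketched in software. The patent describes hardware, so the following Python fragment is illustrative only; the tag values "1", "2" and "3" come from the example above, and the function names are invented for the sketch.

```python
# Illustrative sketch of the ITP's tag matching: a token is a (tag, data)
# pair, and the ITP accepts only tokens whose tag appears in its set of
# required tags (here "1", "2", "3" for input arcs A, B and C).

REQUIRED_TAGS = {"1", "2", "3"}  # tag values for arcs A, B, C in the example

def accept(token, store):
    """Store the data portion of a required token, keyed by its tag."""
    tag, data = token
    if tag in REQUIRED_TAGS:
        store.setdefault(tag, []).append(data)

def ready(store):
    """True once at least one operand has arrived for every required tag."""
    return all(store.get(tag) for tag in REQUIRED_TAGS)

store = {}
for tok in [("5", 9.9), ("1", 40.0), ("3", 0.3), ("2", 20.0)]:
    accept(tok, store)
print(ready(store))  # True: tags "1", "2" and "3" all present; "5" ignored
```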
Each token processor or data flow card also has an output token processor (OTP). As the processing unit produces an output data value, the OTP assembles this into a new output token. For example, if the processing unit performs a function having an output identified as "D", the OTP will produce a new token having an identification in the tag field corresponding to "D" and containing in the data field the actual data value produced by the processing unit.
The OTP queues the newly produced output tokens for delivery onto the flowbus. Associated with the flowbus is an arbitrator circuit which polls each of the token processors to determine which have tokens ready for output.
The arbitrator grants access to one token processor at a time, on an appropriate rotating priority basis. The token processor receiving the grant then outputs to the flowbus the next token in the queue.
The present invention also encompasses multiple flowbus configurations. Interconnecting the flowbusses are appropriate bus-to-bus token processors which transfer from one flowbus to another only those tokens which are required by the recipient bus.
Computer configuration control of the inventive data flow processor permits arbitrary assignment of particular functions to each of the processing units on individual data flow cards.
Thus the system can be selectively reconfigured to optimize performance for a particular application. In effect, such reconfiguration constitutes "programming" of the data flow processor to implement a certain set of interrelated functions.
Figure 1 is a block diagram of the basic data flow processor in accordance with the present invention.
Figure 2 illustrates the format of an information token utilized in the data flow processor of Fig. 1.
Figure 3 is a schematic representation of a typical function implemented by a processing unit included in the data flow processor of Fig. 1.
Figure 4 is a schematic representation or "datagraph" of a plurality of interrelated functions which may be implemented by the data flow processor of Fig. 1.
Figure 5 is a block diagram of a typical token processor utilized in the system of Fig. 1.
Figure 6 is a block diagram of another embodiment of the inventive data flow processor utilizing two separate token-carrying busses, with facilities for the selective intercommunication of selected tokens between these busses.
Figure 7 is a block diagram of an embodiment of the inventive data flow processor utilizing plural interconnected token-carrying busses.
Figure 8 is a block diagram showing an arrangement for the input and/or output of data to the inventive data flow processor.
Figure 9 is a block diagram of a system facilitating computer controlled reconfiguration of a data flow processor in accordance with the present invention.
Fig. 1 shows a basic data flow processor 20 in accordance with the present invention. It includes a bus 21 (sometimes referred to as a "flowbus") over which information tokens are communicated. The token format 22 (Fig. 2) includes a tag field 22t containing a "tag" or identifier, and a data field 22d containing an actual operand data value.
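The token format 22 can be modelled in a few lines. The 16-bit field widths come from the description below; the packing order (tag in the high half-word) is an assumption of this sketch.

```python
# Sketch of token format 22: a 16-bit tag field 22t and a 16-bit data
# field 22d packed into one 32-bit word. The tag-high packing order is
# an assumption; the patent only specifies the two 16-bit fields.

def pack_token(tag, data):
    """Combine tag field 22t and data field 22d into a 32-bit token."""
    assert 0 <= tag < 2**16 and 0 <= data < 2**16
    return (tag << 16) | data

def unpack_token(word):
    """Recover (tag, data) from a 32-bit token word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF

word = pack_token(4, 1234)   # tag "4" (arc D), illustrative data value
print(unpack_token(word))    # (4, 1234)
```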
Attached to the flowbus 21 are a plurality of token processors 23 (each sometimes called a "data flow card" and individually designated 23a, 23b etc.). Within each token processor 23 there is a function performing section 24 containing one or more processing units 25. These processing units carry out the actual arithmetic, mathematical, logical or other processing of data received in token format from the flowbus.
Each processing unit 25 is configured to carry out a certain function (f) on certain input data. This is schematically illustrated in Fig. 3, wherein the circle or "node" 26 represents a certain function (fa) which is to be carried out by one of the processing units 25. The function (fa) requires three data inputs identified by the arrows or "arcs" labelled A(1), B(2) and C(3). The result of performing the function (fa) is the production of output data identified by the arrow or arc labelled D(4).
As a simple example, the function (fa) may be the computation of a temperature gradient g across a wall having a thickness w, with an outside temperature to and an inside temperature ti, in accordance with the formula (Equation 1):

g = (to - ti)/w

To carry out this operation, the function unit 26 must receive three input data values. In the example of Fig. 3, the outside temperature to may be delivered on the path or arc A.
The tag for this arc is the number "1". Thus the input data representing the outside temperature to might be represented by a token having the format 22 of Fig. 2, in which the tag portion 22t contains the value "1" that identifies the accompanying data as being associated with the arc or path A (and therefore as representing a value of outside temperature to). The data field 22d then would contain the actual numerical value of the outside temperature to.
Correspondingly, the inside temperature value ti may be supplied on the arc B having a tag value "2", so that its associated token format would have the value "2" in the tag field 22t and the appropriate value of inside temperature ti in the data field 22d. The same is true with the wall thickness value w delivered on the arc C having a tag value "3", so that its token would include the value "3" in the tag field 22t and the appropriate value of wall thickness w in the data field 22d.
When the function 26 is carried out (e.g., in accordance with the example of equation (1) above), the resultant output value is provided on an output arc D having the tag "4". The calculated data value for the gradient g thus can be delivered in token format by a token having a tag field value "4" (associated with the output arc D) and containing in the data field 22d the value of g computed using the input data presently supplied to the node 26.
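The firing of node 26 can be summarized in a short sketch: given operands for tags "1", "2" and "3", compute Equation (1) and emit an output token with tag "4" for arc D. The tag-to-operand mapping follows the example above; the function name is invented for the sketch.

```python
# Sketch of node 26 (Fig. 3) firing: consume operands for tags "1", "2"
# and "3", apply Equation (1), emit a token tagged "4" for output arc D.

def fire_gradient(inputs):
    """inputs maps tag -> operand value; returns the output token for arc D."""
    t_out = inputs["1"]   # arc A: outside temperature to
    t_in = inputs["2"]    # arc B: inside temperature ti
    w = inputs["3"]       # arc C: wall thickness
    g = (t_out - t_in) / w    # Equation (1): g = (to - ti)/w
    return ("4", g)           # tag "4" identifies output arc D

print(fire_gradient({"1": 40.0, "2": 20.0, "3": 0.5}))  # ('4', 40.0)
```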
Relating Fig. 3 to the actual data flow processor 20 of Fig. 1, one of the processing units 25 would be configured to carry out equation (1) above. Thus the processing unit 25a could comprise a microprocessor programmed to carry out such equation, or it might comprise dedicated hardware or firmware to carry out the equation (1).
To accumulate the data needed by the processing units 25, each data flow card 23 includes an input token processor (ITP) 30 described in more detail below. The ITP 30 monitors all of the tokens passed on the flowbus 21 to determine if any contain data needed by one of the processing units 25 contained on the same data flow card. In the example just described, where the processing unit 25a carries out equation (1) above, the ITP 30 looks for tokens which have tag field values of "1", "2" and "3", since these are the tokens that contain the data values for the outside temperature to, inside temperature ti and wall thickness w that are required as input data to the processing unit 25a. If any such tokens are present on the flowbus 21, the input token processor 30 will accept the token and will store the data contained in the token in an appropriate memory.
As soon as all three data values to, ti and w are received and stored in the ITP 30, the processing unit 25a can immediately carry out the function (fa), i.e., can immediately calculate the corresponding gradient g value in accordance with equation (1).
The resultant value of the gradient g calculated by the processing unit 25a itself is supplied in token format for delivery to the flowbus 21. To accomplish this, the data flow card 23 (Fig. 1) includes an output token processor (OTP) 31 which assembles tokens for delivery to the bus. As soon as the processing unit 25a has computed the new value of g, this OTP 31 assembles a token which has in its tag field 22t the value "4" corresponding to the arc D(4) for the node or function 26 (Fig. 3). The OTP 31 will place into the data field 22d of that token the new value for the gradient g which has just been computed by the processing unit 25a. The OTP 31 will then signal to the bus 21 that this token processor 23a has a new token ready to be delivered onto the flowbus. Later, when that token is outputted to the bus 21, it will become available to any processing unit which requires as a data input the gradient value g.
The flow of tokens on the bus 21 is controlled by an arbitrator unit 32 which performs a poll and grant operation. In an illustrative embodiment, the flowbus 21 may include a dedicated separate poll signal line and a dedicated separate grant signal line from the OTP 31 of each token processor 23 to the arbitrator 32.
At any one time, only a single token can be communicated on the flowbus 21. A typical system may have a 4 MHz rate, so that four million tokens per second can be communicated, one at a time, via the flowbus 21. Each token may comprise 32 bits, including a 16-bit tag field 22t and a 16-bit data field 22d, which 32 bits are communicated in parallel on the flowbus 21.
At any time, the arbitrator 32 recognizes from the poll signal lines which of the token processors 23 presently contains a token waiting to be passed onto the flowbus. At each cycle, the arbitrator 32 sends a grant signal to only one of the token processors 23 associated with the bus 21, thereby enabling that processor to output one token onto the flowbus.
Advantageously, but not necessarily, the arbitrator 32 utilizes a "rotating" priority scheme. Initially, the arbitrator scans the token processors sequentially. (For example, if there are sixteen processors 23 connected to the flowbus 21, the arbitrator will first check if processor #1 has a token to be delivered, then will check processors #2, #3, etc.) As soon as one processor is determined to have a token ready for delivery, a grant signal is sent to that processor. Say, for example, this is processor #5. As soon as the token is delivered from this processor #5 to the flowbus, the corresponding processor #5 is resequenced by the arbitrator 32 to the lowest priority number. Thereafter, as each grant signal is generated, the priority order of the corresponding token processor likewise is set to the lowest value. This arrangement guarantees that each data flow card will be accessed once each 16 cycles. It should be understood, however, that this arbitration scheme is exemplary only, and any appropriate bus arbitration scheme could be used in conjunction with the inventive data flow processor 20.
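The rotating-priority poll-and-grant behaviour can be captured in a short sketch. This is a software model of the hardware arbitrator, with the class and method names invented for illustration.

```python
# Sketch of the arbitrator's rotating-priority scheme: scan processors in
# the current priority order, grant the first with a pending token, then
# resequence the granted processor to the lowest priority.

class Arbitrator:
    def __init__(self, n):
        self.order = list(range(n))  # priority order, highest priority first

    def grant(self, has_token):
        """has_token[i] is True if processor i has a queued output token.
        Returns the granted processor index, or None if none is polling."""
        for pos, proc in enumerate(self.order):
            if has_token[proc]:
                # move the granted processor to the lowest priority slot
                self.order.append(self.order.pop(pos))
                return proc
        return None

arb = Arbitrator(4)
pending = [False, True, True, False]
print(arb.grant(pending))  # 1: first processor in priority order with a token
print(arb.grant(pending))  # 2: processor 1 now has the lowest priority
```

Because a granted card always drops to the bottom of the order, no card can monopolize the bus, which is the fairness property the text describes.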
With the data flow processor 20 of Fig. 1, complex computations can be carried out asynchronously, with the various constituent functions being separately performed in individual processing units 25. Each such unit carries out its dedicated function as soon as its requisite input data has been assembled by the associated input token processor. The "datagraph" of Fig. 4 shows schematically the type of complex computation that can be carried out by the data flow processor 20. Each of the nodes or functions f1 through f5 shown in Fig. 4 represents a particular mathematical, arithmetic, logical or other process carried out by a respective one of the processing units 25. For example, the node f2 may correspond to the function (fa) in Fig. 3, and may be carried out by the processing unit 25a. The function f3 may be assigned to the processing unit 25b also contained in the data flow card 23a.
The functions f1, f4 and f5 may be assigned to three respective processing units 25c, 25d, 25e in the same token processor 23a. Alternatively, some or all of these functions f1, f4 and f5 may be assigned to processing units 25 in separate token processors 23.
A significant characteristic of the inventive system is that the assignment of the various functions to the particular processing units is completely arbitrary. Any function can be assigned for performance to any processing unit.
Moreover, the inventive system will also accommodate the situation in which there are more functions to be performed than there are processing units. This can be accomplished by time sharing one or more of the processing units 25 to separately perform two or more functions. For example, the processing unit 25a in the token processor 23 may be configured to perform both the functions f2 and f3.
For example, the processing unit 25a may be implemented by a microprocessor with two separate stored programs, one for carrying out the function f2 and another for carrying out the function f3.
Furthermore, the times during which such processing unit 25a carries out either the function f2 or the function f3 need not be fixed. Rather, this can be dynamically determined in accordance with the availability of input data necessary to perform the particular function.
For example, the function f2 requires data values from the arcs A, B and C, corresponding to tokens having tags "1", "2" and "3".
On the other hand, the function f4 requires input data from arcs G and H, characterized by tokens having tags "7" and "8". The associated input token processor 30 will accumulate all of these tokens, i.e., tokens with tags 1, 2, 3, 7 and 8. The input token processor then can indicate to the processing unit 25a when the appropriate data has been assembled for one or the other of the functions f2 and f4. The processing unit 25a then can carry out the requisite function in accordance with the availability of data.
For example, the ITP 30 may first receive a token with tag "1", then a token with tag "7", then a token with tag "8". At this point, all of the data is available to perform the function f4, but all of the data needed for the function f2 is not yet available. Accordingly, the ITP 30 can indicate this situation to the processing unit 25a. That processor will then condition itself to carry out the function f4, immediately perform this function, and provide an output value that is formatted into a token (by the output token processor 31) having the tag "9" associated with the output arc I(9) of the function f4.
Subsequently, the ITP 30 may receive tokens having tags "2" and "3". At that point, all of the data (having tags 1, 2 and 3) required to carry out the function f2 will have been assembled by the ITP 30. The processing unit 25a, in response to this situation, will condition itself and carry out the function f2.
Of course, the functions f2 and f4 could be (and typically are) assigned to separate processing units, e.g. units 25a and 25c respectively. In that case, as soon as the ITP 30 has received tokens with the tags "1", "2" and "3", the processing unit 25a will carry out the function f2. Likewise, as soon as the ITP 30 has received tokens with the tags "7" and "8", the processing unit 25c will perform the function f4. For example, if tokens are received in the order (by tag value) "2", "7", "1", "3" and "8", the processing unit 25a will begin to carry out the function f2 when the token "3" is received, since all of the requisite input data then are available. The unit 25c will begin function f4 when the token "8" is received.
Furthermore, as described further below, several values for one or more arcs may be received before a complete requisite set of data is assembled to carry out a particular function. The function will be performed when the complete set is received, using the input data for each arc on a "first received" basis.
For example, suppose tokens are received in the order (by tag value) "2", "1", "1", "3". At this point, all input data will be available to carry out the function f2, but two data values for the arc A(1) will have been received. The processing unit 25a will carry out the function f2, but will use the first to be received of the two arc A(1) data values (i.e. it will use the data value from the earliest received of the two tokens with the tag "1").
The data value from the later received token with the tag "1" will be retained for use by the processing unit 25a as soon as second data values for the arcs B(2) and C(3) also are received.
The same input token processor 30 (Fig. 1) accepts tokens required for all of the functions carried out by all of the processing units 25a, 25b, 25c etc. in the function performing section 24 of the same data flow card. Thus if the processing units 25a, 25b and 25c are configured to carry out the functions f2, f5 and f4 respectively, the ITP 30 in the token processor 23 will assemble all of the tokens required for all of these functions.
Note in that example that one input arc I(9) of the function f5 corresponds to an output of another node (f4) which is processed on the same data flow card. The tag "9" tokens in this case will nevertheless be received by the ITP 30 from the flowbus 21 when they are outputted to that bus by the OTP 31 in the same data flow card.
An illustrative arrangement of the input token processor 30 is shown in Fig. 5. The input token processor 30 includes a required tag memory 33 which stores identification numbers corresponding to the tags of tokens required by the processing units in the function performing section 24 of the same token processor. In the example, where the functions f2, f4 and f5 are performed by processing units on this token processor or data flow card, tokens for the arcs A, B, C, F, G, H and I are required. Accordingly, the corresponding tag identification numbers "1", "2", "3", "6", "7", "8" and "9" are stored in the memory 33.
As each token appears on the flowbus, a tag comparator 34 determines if that token is required by this DFC by comparing the tag field of the token with all of the required tag identifications contained in the memory 33.
If the token is required, the comparator 34 signals an appropriate memory data entry control circuit 35 to accept the token and to load the data from the token into an appropriate position in a memory 36.
Advantageously, but not necessarily, the memory 36 may be of the first-in-first-out (FIFO) type. Also advantageously, the memory 36 is organized to have separate data storage areas 36a, 36b, 36c..., each operating on a FIFO basis, for storing respectively only data associated with tokens of the corresponding tag identification. For example, the storage area 36a may be assigned to store data field values from tokens associated with the arc A having the tag identification "1". Similarly, the storage areas 36b through 36g respectively may store only data from the data fields of tokens having respective tag values "2", "3", "6", "7", "8" and "9".
The order in which various tokens appear on the flowbus 21 is somewhat arbitrary or random. However, the input token processor 30 assembles the incoming data in the appropriate order for supply to the function performing section 24.
For example, the first token on the bus 21 may have a tag "5". If this token is not needed by this DFC, the tag comparator 34 will recognize that the tag "5" is not one of those stored in the memory 33. Accordingly, this token will not be accepted.
The next token on the bus 21 may have a tag "3". In this case, the tag comparator 34 will direct the entry control circuit 35 to accept the token. The corresponding data value (identified symbolically by the term "C1") will be entered into the first memory storage position in storage area 36c. The next accepted token may have a tag "2" and its data B1 will be entered into the first position in the memory storage 36b. It may be that the next token accepted by the ITP 30 also has a tag "3". Its data C2 will be entered into the next in order position in the storage area 36c. Note how two values of data from arc C will be stored in the FIFO memory 36. However, the function performing section 24 will not yet have performed the initial calculation of the function f2, since a first value of data from arc A has yet to be received.
If that information is next received, the tag comparator 34 will cause entry of the corresponding data value A1 from the token with the tag "1" into the first memory position in the storage area 36a. The contents of the FIFO memory 36 will then have the appearance shown in Fig. 5. At this time, a memory output control 37 will recognize that one complete set of data (herein comprising the values A1, B1 and C1) required to perform the function f2 is present in the memory 36. The control 37 will then take these values (on a first-in-first-out basis) from the memory 36, and provide them to the function performing section 24 for execution of the f2 function.
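The interplay of the tag comparator 34, entry control 35, per-arc FIFO memory 36 and memory output control 37 can be sketched as follows. This is a software model of the described hardware, with names invented for the sketch; the token sequence reproduces the example above.

```python
# Sketch of the ITP's per-arc FIFO memory 36 and memory output control 37:
# each required tag has its own FIFO; when the FIFO for every input arc of
# a function is non-empty, one value is dequeued from each (first received
# first) and the complete set is handed to the function performing section.

from collections import deque

class InputTokenProcessor:
    def __init__(self, required_tags):
        self.fifos = {tag: deque() for tag in required_tags}  # memory 36

    def accept(self, tag, data):
        if tag in self.fifos:             # tag comparator 34
            self.fifos[tag].append(data)  # entry control circuit 35

    def try_fire(self, input_tags):
        """Memory output control 37: dequeue a full operand set, or None."""
        if all(self.fifos[t] for t in input_tags):
            return [self.fifos[t].popleft() for t in input_tags]
        return None

itp = InputTokenProcessor({"1", "2", "3"})
for tag, data in [("3", "C1"), ("2", "B1"), ("3", "C2"), ("1", "A1")]:
    itp.accept(tag, data)
print(itp.try_fire(["1", "2", "3"]))  # ['A1', 'B1', 'C1']; C2 is retained
```

As in the text, the second arc C value C2 stays queued until second values for arcs A and B also arrive.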
When this function has been performed, the function performing section 24 will output a data value (i.e., an output D1 of the function f2 computed from the input data values A1, B1 and C1) associated with the arc D. This data value D1 will be provided to a token forming circuit 38 (Fig. 5) in the output token processor 31. The circuit 38 recognizes that the output data D1 has been produced by the processing unit carrying out the function f2, so that this data represents data on the arc D and should therefore be incorporated into a token having a tag value "4". Accordingly, the token forming circuit 38 forms a token having the tag value "4" and the data value D1.
This newly formed token is placed in an output token queue 39 which may comprise a FIFO memory. Entry of the token into the queue 39 also causes a poll and grant circuit 40 to send a poll signal via the bus 21 to the arbitrator 32. This indicates that this particular DFC has a token ready for transmission to the bus. When the arbitrator 32 provides a grant signal back to that DFC, the control circuit 40 outputs the token first stored in the queue 39 to the bus 21.
There are certain practical limits to the number of token processors 23 which can be associated with a single flowbus 21 in the configuration of Fig. 1. Clearly, as more token processors 23 are added to a single flowbus, more concurrent processing capability is available, and hence more tokens are produced in a given interval of time. The transfer rate of tokens on the flowbus, however, sets an upper limit to the number of tokens that can be passed between data flow cards in a given interval of time.
To overcome this limitation, the configuration of Fig. 6 shows how the system capability can be substantially increased by utilizing two or more independent but interrelated flowbus subsystems, each having the general configuration of Fig. 1. Provision is made, through the use of appropriate bus-to-bus token processors 41A, 41B, for communication of tokens between the separate flowbus subsystems.
The exemplary system 20' of Fig. 6 combines the data flow processor 20 of Fig. 1, having data flow cards (DFC's) 23a, 23b ... 23i, with a like data flow processor 20A having a flowbus 21A and associated token processors or DFC's 23j, 23k ... 23m.
Associated with the flowbus 21A is an arbitrator 32A configured like the arbitrator 32.
Interconnecting the flowbusses 21 and 21A are the like-configured bus-to-bus token processor cards 41A and 41B. Together, these facilitate the transfer of tokens between the busses 21 and 21A. However, not all tokens present on each bus are transferred to the other. Rather, only those tokens required for the functions performed in the token processors 23j-23m associated with the bus 21A are passed from the bus 21 to the bus 21A.
Conversely, only those tokens required for the functions performed in the processors 23a-23i connected to the bus 21 are transferred from the bus 21A to the bus 21. In this manner, optimum throughput of tokens on each of the flowbusses 21 and 21A can be accomplished, without requiring that both flowbusses process all the tokens required in the entire system. Expressed differently, each of the flowbusses 21 and 21A handles only those tokens necessary to the performance of the functions in the token processors associated with that respective bus.
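The selective transfer performed by a bus-to-bus card amounts to a tag-set filter, sketched below in Python. The sample traffic, including the tag "7" and value "G1", is invented for illustration.

```python
def bus_to_bus_filter(tokens, remote_required_tags):
    """Model of the ITP on a bus-to-bus card: the tag comparator opens
    the gate only for tokens whose tags appear in the required-tag
    memory, i.e. the union of tags needed by every DFC on the other bus."""
    return [(tag, data) for tag, data in tokens if tag in remote_required_tags]

# Hypothetical traffic on bus 21; only tags 4 and 7 are needed on bus 21A.
bus21_traffic = [(1, "A1"), (4, "D1"), (7, "G1")]
print(bus_to_bus_filter(bus21_traffic, remote_required_tags={4, 7}))
# -> [(4, 'D1'), (7, 'G1')]
```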
The bus-to-bus card 41A itself resembles a token processor. That is, from the point of view of the flowbus 21, the bus-to-bus card 41A looks just like any of the other token processors attached to that bus.
In particular, the bus-to-bus card 41A includes an input token processor 30A which has a required tag memory 33A. Stored in this memory 33A, however, is a list of the tags for all of the tokens required by the function performing sections 24 of every one of the DFC's 23j-23m associated with the flowbus 21A. A tag comparator 34A compares the tags of each of the tokens on the bus 21 with those required by the bus 21A (as stored in the memory 33A). If the comparator 34A indicates that the token present on the bus 21 is required by the bus 21A, the comparator enables a gate 42A which passes the token from the bus 21 to a driver circuit 43A. In turn, driver 43A transmits the token to a receiver circuit 44B in the associated bus-to-bus card 41B.
The card 41B, which is configured just like the bus-to-bus card 41A, includes an output token processor 31B. It has an output token queue 39B which enters the transferred token from the receiver 44B into a queue or FIFO memory. The OTP 31B also has a poll and grant control 40B. The presence of a token in the queue 39B causes this circuit to produce a poll signal on the flowbus 21A. The arbitrator 32A treats the bus-to-bus card 41B in the same manner as it treats any of the token processors 23j-23m associated with the flowbus 21A. That is, on receipt of a poll from the bus-to-bus card 41B, it applies, in the appropriate priority order, a grant signal. In response to this grant signal, the OTP 31B transfers the token from the queue 39B to the flowbus 21A. Token transfer from the bus 21 to the bus 21A thus is accomplished.
Conversely, token transfer from the bus 21A to the bus 21 is accomplished via the input token processor 30B of the bus-to-bus card 41B, an associated driver 43B, a receiver 44A in the bus-to-bus card 41A and an output token processor 31A. Of course, the required tag memory 33B in the ITP 30B contains a list of the tags of all of the tokens required for performance of all functions on the bus 21.
In the system 20' of Fig. 6, each of the subsystems 20 and 20A functions completely independently of the other. Indeed, each such subsystem 20, 20A is unaware of the presence of the other system. The interface, consisting of the associated bus-to-bus cards 41A or 41B, looks to the individual system 20 or 20A just like any other token processor attached to the associated flowbus 21 or 21A.
The present invention is not limited to the one or two flowbus configurations of Figs. 1 and 6. To the contrary, a plurality of subsystems like that of Fig. 1, each with its own flowbus, can be interconnected to form an overall system. Moreover, the interconnections between the flowbusses are completely selectable or arbitrary.
An example of such a multiple flowbus system is shown schematically in Fig. 7. There, the separate data flow processor subsystems 20B through 20F each has its own respective flowbus 21B through 21F. Each of the subsystems is configured like that of Fig. 1, with its associated token processors 23B through 23F and its own associated arbitrator 32B-32F.
In the configuration of Fig. 7, the subsystem 20B is interconnected to the subsystems 20C, 20D and 20F by respective pairs of bus-to-bus cards 41B-C, 41B-D and 41B-F. Similarly, the subsystem 20C is interconnected with the subsystems 20D and 20F by the bus-to-bus cards 41C-D and 41C-F, and the subsystems 20E and 20F are interconnected with the bus-to-bus cards 41E-F.
As in the system described in Fig. 6, each pair of bus-to-bus cards 41 operates to transfer between the interconnected busses only those tokens that are required by the recipient bus.
As shown in Fig. 8, data input and output to the data flow processor 20 may be implemented by one or more input/output (I/O) cards 45 associated with the flowbus 21.
Such an I/O card 45 is configured substantially like a token processor 23. However, instead of having a processing unit 25 contained within the function performing section 24, it has an appropriate input receiver 46 and/or output driver 47.
The input token processor 30i in the I/O card 45 stores a list of the tags for all tokens having data which is to be transmitted via the output driver 47 to some outside device such as a modem, printer, CRT or the like. Thus, any time a token is present on the bus 21 which contains data to be output via the driver 47, that token is accepted by the ITP 30i and forwarded to the output device via the driver 47.
Conversely, data which is to be input to the flowbus 21 is supplied via the receiver 46 to an output token processor 31i where it is assembled into a token with the appropriate tag.
The new token is then queued and delivered to the bus 21 in the same manner as is carried out by the output token processor 31 described in connection with Fig. 5.
The overall data flow processor system configuration and initialization may be implemented by an auxiliary computer or CPU 50 that is interconnected, as shown in Fig. 9, with the token processors or data flow cards associated with the data flow processor 20.
In particular, each of the token processor cards 23a, 23b ... 23i associated with the flowbus 21 (as described in Fig. 1) may also be interconnected to the CPU 50 via a separate bus (herein the "LDF bus") 51. One purpose of this arrangement is to allow the specific functions that are performed by each of the processing units 25 in each of the data flow cards 23 to be selected and initialized under control of the computer 50.
By way of example, the processing unit 25a may itself comprise a microprocessor and a program memory into which is stored the appropriate program or programs to cause that microprocessor to carry out one or more specified functions f1, f2, etc. These functions and their corresponding programs need not be set in advance into the processing unit 25a.
Rather, they may be entered into the processing unit 25a from the computer 50 when the data flow processor 20 is initialized to carry out some computational effort. Such an arrangement allows great flexibility in the applications to which the inventive data flow processor is put.
If a particular user, carrying out a certain project, desires to have the processing unit 25a carry out the function (f1) of Fig. 3, he would, in advance of carrying out the actual computation, cause the computer 50 to enter into the processing unit 25a the appropriate computer program for carrying out the function (f1). Correspondingly, the CPU 50 may be used to load appropriate other functions into others of the processing units 25 in the function performing sections 24 of some or all of the data flow cards or token processors 23 associated with the flowbus 21.
This function loading operation need not be carried out only before the data flow processor 20 actually does its computation. Thus the CPU 50 can actually monitor the processing being done by the processor system 20, and dynamically introduce changes in the system at appropriate times. To this end, a performance monitor and debug card 52 interconnects the flowbus 21 with the LDF bus 51. This card 52 may be configured (by including an input token processor like the ITP 30 of Fig. 5) to detect the presence of tokens having certain particular tags, and to monitor the data values associated therewith. The computer 50 can then be programmed, e.g., to respond to certain data conditions in tokens having certain tags. In response to detection of these conditions, the computer 50 can then change or alter the functions performed by one or more of the processing units in the token processors 23. Thus dynamic control of the system can be achieved.
Another but cognate function of the computer 50, utilized in both initialization and in dynamically changing the functions of the processing units, is to load into the required tag memories 33 (Fig. 5) of each input token processor 30 the list of tags of tokens required by the associated function performing section 24. Thus, in the example described above, when the CPU 50 sets up the processing unit 25a to carry out the function f2 of Fig. 4, the CPU 50 at the same time will enter into the required tag memory 33 the tags "1", "2" and "3" associated with the tokens containing the data requisite to carry out that function.
Similarly, the CPU 50 will load into the token forming circuit 38 of the OTP 31 (Fig. 5) the necessary information to ensure that output data from the processing unit 25a and associated with the arc D will be formed into a token having the necessary tag "4".
Similarly, if the data flow processing system incorporates more than one flowbus, the CPU 50 will also initialize the bus-to-bus cards 41A, 41B (Fig. 6) so as to enter in the requisite tag memories 33A and 33B the tag values of the tokens which must respectively be passed from the bus 21 to the bus 21A and vice versa.
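The initialization role of the CPU 50 can be pictured as compiling a datagraph description into per-card configuration, as in this hypothetical Python sketch. Only the f2 entry follows Fig. 4; the function f3, card 23b and tag 5 are invented for illustration, and the datagraph format itself is an assumption of this sketch, not something the patent specifies.

```python
# Each datagraph entry names a function, the card whose processing unit
# should run it, its input arc tags (for required-tag memory 33), and
# its output arc tag (for token forming circuit 38).
datagraph = {
    "f2": {"card": "23a", "inputs": [1, 2, 3], "output": 4},  # per Fig. 4
    "f3": {"card": "23b", "inputs": [4], "output": 5},        # hypothetical
}

def initialize(datagraph):
    """Derive, per card, everything the CPU 50 must load over the LDF bus:
    the program to run, the required-tag list, and the output tag."""
    config = {}
    for fn, spec in datagraph.items():
        config[spec["card"]] = {
            "function": fn,
            "required_tags": sorted(spec["inputs"]),
            "output_tag": spec["output"],
        }
    return config

print(initialize(datagraph)["23a"])
# -> {'function': 'f2', 'required_tags': [1, 2, 3], 'output_tag': 4}
```

A bus-to-bus card's required-tag memory could be initialized from the same description, by taking the union of the input tags of every function assigned to cards on the remote bus.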
Yet another function of the CPU 50 (Fig. 9) is to monitor malfunctions in the processor 20 as it is operating. For example, if a particular processing unit, say the processing unit 25a in the token processor 23a, should fail during a processing operation, this situation may be recognized by the CPU 50. For example, it may be recognized by the absence of any tokens with the tag "4" (which should be output from that function unit) for a certain interval of time. In that case, the CPU 50 could dynamically alter the system so as to cause some other, but operative, processing unit 25 to perform the function which previously had been assigned to the malfunctioning processing unit 25a.
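The timeout-based malfunction check can be sketched as a watchdog per watched tag. This is an illustrative Python model; the injected clock callable is an assumption made so the example runs deterministically, and the 5-second timeout is arbitrary.

```python
class TokenWatchdog:
    """If no token with the watched tag appears within a timeout, the
    producing unit is presumed failed and the CPU 50 can reassign its
    function to another, operative processing unit."""

    def __init__(self, tag, timeout, clock):
        self.tag, self.timeout, self.clock = tag, timeout, clock
        self.last_seen = clock()

    def observe(self, tag):            # called for every token seen on the bus
        if tag == self.tag:
            self.last_seen = self.clock()

    def failed(self):
        return self.clock() - self.last_seen > self.timeout

class FakeClock:
    """Stand-in for a real-time source, advanced manually for the example."""
    def __init__(self): self.t = 0.0
    def __call__(self): return self.t

clk = FakeClock()
wd = TokenWatchdog(tag=4, timeout=5.0, clock=clk)
clk.t = 3.0
assert not wd.failed()     # tag "4" only 3 s overdue: still healthy
clk.t = 6.0
assert wd.failed()         # > 5 s without a tag-4 token: flag the unit
wd.observe(4); clk.t = 8.0
assert not wd.failed()     # a fresh tag-4 token resets the watchdog
```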
Another function of the CPU 50 is to optimize the configuration of the data flow processor 20 for a particular computation. For example, if it is known that a particular computation requires a far larger number of operations of a certain particular function fK, as compared to the number of computations for other functions that are required, the CPU 50 may configure the system so that more than one processing unit 25 is configured to carry out the function fK.

A particular feature of the present data flow processor 20 is that it utilizes no global memory to which all functions are concurrently attempting to gain access. The computation speed in such a global memory system often was limited by the competing demands to obtain access to the common data base. This and the associated Von Neumann bottleneck thus are eliminated.

Claims (13)

1. A data flow processor comprising: a bus over which information tokens can be communicated, each token comprising a tag portion and a data portion, and a plurality of token processors connected to said bus, each token processor including: at least one processing unit for carrying out a specified function on certain input data identified by one or more specific tags, an input token processor for monitoring the tag portion of each token on said bus and for storing the data portion of each such token which has a tag portion corresponding to one of said specific tags identifying input data for a processing unit in said processor, said processing unit carrying out its specified function on the resultantly stored data as soon as said data is assembled, and an output token processor for assembling a token consisting of the data produced as an output from said processing unit and a tag identifying such output, said token thereafter being communicated to said bus.
2. A data flow processor according to claim 1 further comprising: arbitrator means, communicating via said bus to each of said token processors connected thereto, for polling all of said token processors connected to said bus to determine which have tokens awaiting communication to said bus, and for enabling said token processors to communicate said tokens to said bus on a rotating priority basis.
3. A plurality of data flow processors each according to claim 1, the bus of each such processor being interconnected to the bus of at least one other of said processors by a bus-to-bus token processor having: means for identifying, from the tag portions of tokens present on one of the interconnected busses, which of such tokens are required by processing units of a second of the interconnected busses, and for transmitting only those required tokens from said one to said second interconnected bus.
4. A data flow processor according to claim 1 further comprising: computer means, interconnected to said token processors, for conditioning the processing units of said token processors to perform the associated specific functions and for correspondingly conditioning input token processors to store data only from tokens having tag portions representing input arcs of said specific functions.
5. A data flow processor comprising: a bus over which data tokens are communicated, each token having an identification tag field and an operand data field, and plural data flow cards associated with said bus, each having at least one processor unit adapted to perform a specifiable function on operand data obtained from tokens delivered via said bus and having certain tag values identifying input arcs of that function, and to produce, for delivery onto said bus, tokens containing data resultant from performance of said function and having tag values identifying an output arc of that function.
6. A data flow processor according to claim 5 and being configurable to perform the interrelated functions of a "datagraph", further comprising: computer means, interconnected to said plural data flow cards, for specifying to the processor units of said cards the particular functions to be performed thereby, said particular functions being those interrelated by said "datagraph", and to specify to said cards the identities of the input and output arcs corre sponding to each of said specified functions.
7. A data flow processor according to claim 5 wherein each of said data flow cards includes an input token processor comprising: a required tag memory means for storing the values identifying the input arcs for all of the functions performed by processor units on that data flow card, and a tag comparator means for comparing the tag values of each token on said bus with the tag values stored in said required tag memory means, and for accepting from said bus for use by that data flow card only those tokens found to have corresponding tag values.
8. A data flow processor according to claim 7 further comprising: memory means for temporarily storing at least the operand portions of those accepted tokens, and operand set providing means for providing the temporarily stored operand portions to each processor unit as soon as a complete set of operand data for all input arcs required to perform the specified function of that processor unit has been stored in said memory means.
9. A data flow processor according to claim 5 wherein each of said data flow cards includes an output token processor comprising: token forming circuit means for forming tokens each having a tag value identifying an output arc of a function performed by a processing unit on that data flow card and having in the operand data field the data value produced by that processing unit for the corresponding output arc.
10. A data flow processor according to claim 9 wherein said output token processor further comprises: output queue means for temporarily storing tokens formed by said token forming circuit means, and poll and grant means for indicating that a token in said queue means is ready for supply to said bus and for delivery of that token from said queue means to said bus in response to a grant signal.
11. A data driven token processor associated with a bus on which data tokens are communicated, each token including a tag field containing an arc identifier and an operand data field, comprising: at least one processing unit for performing a specifiable function on operand data for one or more input arcs of that function and producing one or more outputs on respective output arcs of that function, input token processor means for identifying from the tag fields thereof only those tokens communicated on the bus which correspond to input arcs associated with said at least one processing unit, and for temporarily storing the operand data from said identified tokens, and said processing unit performing said specified function as soon as a complete set of input operand data for all of the input arcs of that function has been stored by said input token processor means.
12. A token processor according to claim 11 further comprising: output token processor means for assembling and delivering to said bus new tokens each including a tag field value identifying an output arc of a function performed by said at least one processing unit and an operand data field value corresponding to an output value produced for that output arc by said processing unit.
13. A data flow processor substantially as hereinbefore described with reference to the accompanying drawings.
GB08525140A 1984-12-05 1985-10-11 Data-driven processor Withdrawn GB2168182A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US67835884A 1984-12-05 1984-12-05

Publications (2)

Publication Number Publication Date
GB8525140D0 GB8525140D0 (en) 1985-11-13
GB2168182A true GB2168182A (en) 1986-06-11

Family

ID=24722458

Family Applications (1)

Application Number Title Priority Date Filing Date
GB08525140A Withdrawn GB2168182A (en) 1984-12-05 1985-10-11 Data-driven processor

Country Status (5)

Country Link
JP (1) JPS61184641A (en)
DE (1) DE3542436A1 (en)
FR (1) FR2574197A1 (en)
GB (1) GB2168182A (en)
IL (1) IL77145A0 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1503321A (en) * 1974-02-28 1978-03-08 Burroughs Corp Digital data processors and methods of operating such processors
EP0036766A1 (en) * 1980-03-21 1981-09-30 Concurrent Processing Systems Proprietary Limited Computer system and interface therefor
WO1982003711A1 (en) * 1981-04-16 1982-10-28 Ncr Co Data processing system including subsystems having local memories
GB2107497A (en) * 1981-10-15 1983-04-27 Nat Res Dev Digital computers

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2538140B1 (en) * 1982-12-21 1988-06-24 Thomson Csf Mat Tel BUS COUPLING DEVICE FOR MULTIPLE BUS DATA PROCESSING SYSTEM

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0357445A2 (en) * 1988-09-02 1990-03-07 Tektronix, Inc. Single bus graphics data processing pipeline
EP0357445A3 (en) * 1988-09-02 1992-01-08 Tektronix, Inc. Single bus graphics data processing pipeline
EP0408810A1 (en) * 1989-07-20 1991-01-23 Kabushiki Kaisha Toshiba Multi processor computer system
US5586256A (en) * 1989-07-20 1996-12-17 Akebia Limited Computer system using multidimensional addressing between multiple processors having independently addressable internal memory for efficient reordering and redistribution of data arrays between the processors
GB2325318A (en) * 1997-03-21 1998-11-18 Abbotsbury Software Ltd Processing information in a plurality of processing stages
GB2325318B (en) * 1997-03-21 2002-05-01 Abbotsbury Software Ltd Apparatus for the management of digital information
US6374319B1 (en) 1999-06-22 2002-04-16 Philips Electronics North America Corporation Flag-controlled arbitration of requesting agents
CN100401270C (en) * 1999-09-17 2008-07-09 特博数据实验室公司 Parallel computer architecture, and information processing unit using architecture

Also Published As

Publication number Publication date
GB8525140D0 (en) 1985-11-13
DE3542436A1 (en) 1986-06-05
JPS61184641A (en) 1986-08-18
FR2574197A1 (en) 1986-06-06
IL77145A0 (en) 1986-04-29

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)