GB2251964A

GB2251964A - Processor arrays

Info

Publication number: GB2251964A
Application number: GB9100852A
Authority: GB
Inventors: Peter Eastty
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1991-01-15
Filing date: 1991-01-15
Publication date: 1992-07-22
Anticipated expiration: 2011-01-15
Also published as: GB9100852D0; GB2251964B; JPH04288659A

Abstract

A processor array includes a number of processors SP arranged in modules (sub-arrays) A, B etc. Each module includes a rectangular configuration of processors SP, preferably 4 x 4, interconnected by vertical and horizontal buses. Clocked buffers R are provided at each boundary between adjacent modules so as to provide a communication link for the corresponding buses. By effectively splitting the horizontal and vertical processor interconnections into a number of short buses, high operating speed can be maintained, and also high signal bandwidth can be provided. A modularised practical implementation using processor and buffer cards is also described (Figs 5, 6).

Description

PROCESSOR ARRAYS

This invention relates to processor arrays, such as may be provided in computer systems in which a number of processors are connected together for increased processing power.

In order to increase the size and hence the processing power of a computer system, a number of individual processors can be connected together in such a way that the tasks to be performed by the computer system are shared among the processors. It is known to connect the processors in such a system by a two-dimensional rectangular configuration, or even a three-dimensional cube-type configuration of common buses, such that the buses in a row or column extend fully across the resulting array of processors. A typical application for such a processor array is in the field of digital audio processing, in which a requirement may be to process digital audio signals on a large number of separate channels rapidly and simultaneously prior to mixing or the like.

A problem with such processor arrays is that, with increasing size of the system and hence an increasing number of individual processors, the common buses become longer, and this has an adverse effect on processing speed since the system clock rate must be set sufficiently low to allow for proper communication along the maximum length of the bus between the furthest-apart processors or input/output means.

Another problem is that with, for example, sixteen (or even more) processors connected by a common bus, access in one clock cycle is restricted to a single one of the processors; it would not, for example, be possible for two other processors elsewhere on that bus to communicate together at the same time.

Thus it will be apparent that, with a common bus interconnection system, increase in size of the processor array will result in a slowing down of operational speed and in a decrease of operational flexibility.

According to the present invention there is provided a processor array comprising a plurality of processor modules each of which includes n.m processors interconnected within the module by n buses in a first direction and m buses in a second direction, and clocked buffers arranged at each boundary between adjacent modules such that each of the clocked buffers is operative to connect a respective one of the buses in one module with the corresponding bus in the adjacent module.

In a preferred embodiment of the invention, there are the same number of buses in the first direction as in the second direction in each module, and preferably n = 4.

Thus each module behaves effectively as a pseudo-self-contained unit, and the clock rate and access speed may be high since they need only have values appropriate to a relatively small module. Since communication between modules is by way of the clocked buffers, expansion of the processor array by addition of further processor modules will have no effect on the processing speed and operation within each module, since the bus lengths within the modules are fixed.

In the preferred embodiment, the clocked buffers are bidirectional, and this allows communication to take place in either direction across each boundary between the processors. The preferred arrangement of the bidirectional buffers and their clocking enables each buffer to act as a signal passing place, one signal (constituting one bit of a digital word) passing across the boundary by way of the buffer in one direction at the same time as another signal (constituting one bit of another digital word) passes across the boundary in the opposite direction.

Input/output means, in the form of input/output clocked buffers, can be provided at two opposite sides of the array such that, for example, the vertically-interconnected buses of the modules terminate at top and bottom of the array in input/output clocked buffers.

The buses extending in one or both of the two directions can be connected in effectively circular modes, such that processors from different modules are physically interleaved with the interconnecting buses of one module alternating in physical location with the interconnecting buses of another module.

A processor array embodying the invention allows for modularisation of the processor mounting arrangement. The processors may be mounted on processor cards, and a preferred version involves the processors within each module which are connected to a particular bus (for example, in the vertical direction) being mounted on a respective processor card, such that each processor module is then made up of more than one processor card. In the case of each module including sixteen processors, this means that each processor card carries four (vertically interconnected) processors, and that four such processor cards make up a module. Each such processor card may include clocked buffers at each end of its bus. As well as processor cards, the array may include buffer cards on which are mounted clocked buffers for interconnection of adjacent processor modules. In the case described above of processor cards carrying four vertically interconnected processors, the buffer cards will be used for connection between horizontally-adjacent processor modules. The buffer cards may also be used for input/output connection.

Embodiments of the invention can thus provide significant advantages.

Firstly, as outlined above, since the buses within each module are kept electrically short, due to the use of the clocked buffers between modules, high operational speed of the processor array can be maintained.

Secondly, the processor array is readily extendible, the array being capable of expansion in size without incurring a speed penalty, since no individual bus gets longer (and hence slower) as the array gets larger.

Thirdly, the processor array can have a high operating bandwidth.

The large scale parallelism provided by the many separate horizontal and vertical buses leads to an increase in bandwidth compared to single bus systems. The separation of each horizontal or vertical bus into a number of independent shorter buses connectable together allows for more transfers to be active in each access time slot than with a single bus.

Fourthly, the processor array can be implemented in a practical manner. Compared with complex full-interconnect, hyper-cube style bus arrangements, fully parallel implementations are possible using standard printed circuit board technology.

The invention will now be described by way of example with reference to the accompanying drawings, throughout which like parts are referred to by like references, and in which:

Figure 1 is a schematic diagram of a simplified arrangement of a processor array according to an embodiment of the invention;

Figure 2 is a schematic diagram of an interleaved arrangement of a processor array according to another embodiment of the invention, with input/output clocked buffers connected to the vertical buses and with the horizontal buses arranged in circular mode;

Figure 3 is an enlarged view of part of Figure 2, showing the manner in which the processors of two modules are interleaved;

Figure 4 is a schematic diagram of an interleaved arrangement of a processor array according to a further embodiment of the invention, with both vertical and horizontal buses arranged in circular mode;

Figure 5 is a perspective view of part of a practical implementation of the invention using processor and buffer cards, corresponding to the top left-hand corner of the array of Figure 2 and the enlarged view of Figure 3; and

Figure 6 is a perspective view similar to that of Figure 5, but corresponding to the top centre of the array shown in Figure 2.

Referring to Figure 1, a simplified arrangement of a processor array embodying the invention is shown as including sixteen modules of sixteen single processors SP each; thus the total number of individual processors in the array is 256.

The modules are referenced A to P such that the top left-hand module is referenced A and the bottom right-hand module is referenced P.

The processors SP in each module are arranged in a 4 x 4 matrix with a vertical bus VB (only module A referenced) connecting each column of processors SP, and a horizontal bus HB connecting each row of processors SP. Thus each module has four vertical buses VB and four horizontal buses HB. At the boundary between adjacent modules (referenced only for the modules A and B), there is a bidirectional clocked buffer R for connecting the respective bus from one module (e.g. A) with its corresponding bus in the adjacent module (e.g. B).
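The sizes in the Figure 1 arrangement can be checked with a little arithmetic. The following is a minimal sketch, not part of the specification: the class and method names are hypothetical, and the buffer count is derived from the statement that one buffer R joins each bus to its counterpart at every boundary between adjacent modules. It is not read off the drawing, and it excludes input/output buffers and circular links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayShape:
    """Hypothetical description of an array of modules (defaults: Figure 1)."""
    module_rows: int = 4  # rows of modules in the array (A..P is 4 x 4)
    module_cols: int = 4  # columns of modules in the array
    proc_rows: int = 4    # rows of processors SP inside one module
    proc_cols: int = 4    # columns of processors SP inside one module

    def processors(self):
        return (self.module_rows * self.module_cols
                * self.proc_rows * self.proc_cols)

    def buses_per_module(self):
        # One horizontal bus HB per processor row, one vertical bus VB per column.
        return self.proc_rows, self.proc_cols

    def boundary_buffers(self):
        # Boundaries between horizontally adjacent modules carry the HB buses;
        # boundaries between vertically adjacent modules carry the VB buses.
        horizontal_boundaries = self.module_rows * (self.module_cols - 1)
        vertical_boundaries = self.module_cols * (self.module_rows - 1)
        return (horizontal_boundaries * self.proc_rows
                + vertical_boundaries * self.proc_cols)


shape = ArrayShape()
print(shape.processors(), shape.buses_per_module(), shape.boundary_buffers())
```

Under these assumptions the sixteen-module array has 256 processors, four buses in each direction per module, and 96 inter-module buffers.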

Each bidirectional buffer R is shown as being formed by two oppositely-directed clocked buffer parts, such as a pair of registers, connected in parallel. Figure 1 is a simplified diagram and so no input/output means for the processor array are shown.

It will be clear from Figure 1 that the processors SP within each module may readily communicate with each other by means of the vertical and horizontal buses VB, HB. The vertical and horizontal buses VB, HB are synchronously re-clocked at regular intervals in the array.

Synchronous clocking of the bidirectional buffers R means that, not only can communication take place in either direction across the module boundaries, but that each buffer R can act as a signal passing place with one signal passing across the boundary by way of the respective unidirectional part of the buffer at the same time as another signal passes across the boundary by way of the opposite unidirectional part of the buffer in the opposite direction.
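The "passing place" behaviour can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class and attribute names are hypothetical, each unidirectional part is modelled as a single register, and bus arbitration and electrical timing are ignored.

```python
class BidirectionalBuffer:
    """Two oppositely-directed registers sharing one clock (behavioural sketch)."""

    def __init__(self):
        self.a_to_b = None  # register that drives the bus in module B
        self.b_to_a = None  # register that drives the bus in module A

    def clock(self, from_a, from_b):
        # Both registers latch on the same clock edge, so one value crosses
        # the boundary in each direction during the same cycle.
        self.a_to_b, self.b_to_a = from_a, from_b
        return self.a_to_b, self.b_to_a


buf = BidirectionalBuffer()
seen_by_b, seen_by_a = buf.clock(from_a=1, from_b=0)
print(seen_by_b, seen_by_a)
```

After one clock, module B sees the bit that module A presented and module A sees the bit that module B presented, which is the simultaneous two-way crossing described above.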

If a larger processor array is required, further modules can be added without increasing the effective length of the vertical and horizontal buses VB, HB within each module. Thus, since the buses within each module are kept electrically short, high operational speeds of the processor array can be achieved. The processor array may also have a high bandwidth since the effective controlled separation of each of the horizontal and vertical buses HB, VB allows for more transfers to be possible at one time than if the horizontal and vertical buses were to extend continuously across the full length and width of the array.
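The bandwidth argument can be made concrete with a rough upper bound. The sketch below assumes, as the background discussion of common buses suggests, that each independent bus carries at most one transfer per clock cycle. The function name and parameters are hypothetical, and a transfer that crosses a module boundary occupies more than one short bus, so real figures would be lower.

```python
def max_transfers_per_cycle(module_rows, module_cols, proc_rows, proc_cols,
                            segmented):
    """Upper bound on simultaneous transfers, one per independent bus."""
    horizontal_lines = module_rows * proc_rows  # processor rows across the array
    vertical_lines = module_cols * proc_cols    # processor columns across the array
    if segmented:
        # Each line is split into one short bus per module it passes through.
        return horizontal_lines * module_cols + vertical_lines * module_rows
    # Unsegmented: every line is a single bus spanning the whole array.
    return horizontal_lines + vertical_lines


full_length = max_transfers_per_cycle(4, 4, 4, 4, segmented=False)
short_buses = max_transfers_per_cycle(4, 4, 4, 4, segmented=True)
print(full_length, short_buses)
```

For the sixteen-module array this gives a bound of 32 transfers per cycle with full-length buses against 128 with the buses split at the module boundaries.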

The modules in the processor array do not need to be physically separate as shown in Figure 1, and the arrangement of Figure 2 in which the horizontal buses are connected in circular mode has particular advantages. The circular mode of the horizontal buses is such as to lead to an effective interleaving of processors and horizontal buses.

Thus, as shown in Figure 2, the modules A and D are interleaved, as are the modules B and C; the other modules are interleaved in similar manner. It will be apparent from an inspection of the electrical connections that the interleaved modules remain electrically independent; the reason for using the circular mode of connection is that the horizontal bus interconnections are improved since, for example, not only are the modules A, B, C and D serially connected (as in Figure 1), but the modules D and A are also directly connected thereby completing the circular connection mode. This has the effect of improving communication between the modules A and D (which, in Figure 1, would need to be synchronously clocked via the modules B and C).
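The benefit of the circular mode can be expressed as a count of module boundaries crossed along one row of modules. This is an illustrative sketch only: the function name is hypothetical, modules are numbered 0 to 3 for A to D, and the count is of boundaries rather than clock cycles.

```python
def boundary_crossings(src, dst, modules_in_row=4, circular=False):
    """Module boundaries crossed between two modules in the same row."""
    direct = abs(dst - src)
    if circular:
        # In circular mode the last module also links back to the first.
        return min(direct, modules_in_row - direct)
    return direct


A, B, C, D = range(4)
print(boundary_crossings(A, D), boundary_crossings(A, D, circular=True))
```

With the serial connection of Figure 1, a signal from A to D crosses three boundaries (via B and C). With the circular connection it crosses one.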

The array shown in Figure 2 also includes input/output means in the form of bidirectional input/output clocked buffers R' at the top and bottom sides of the array, connected to the vertical buses of the adjacent modules.

In order to explain more clearly the configuration of Figure 2, reference is made to Figure 3 which shows an enlarged view of the top left-hand corner of Figure 2, namely the interleaved modules A and D.

The processors of the module A are referenced A11 to A44, and the processors of the module D are in similar manner referenced D11 to D44.

The processors A11 to A44 are interconnected by horizontal buses HBA1 to HBA4, and by vertical buses VBA1 to VBA4. The processors D11 to D44 are interconnected by horizontal buses HBD1 to HBD4, and by vertical buses VBD1 to VBD4. As will be seen, there is no interconnection between the processors of the two modules A and D, other than by the edge-mounted buffers R at the left-hand side of Figure 3. Input/output communication is effected in this embodiment by means of the input/output buffers R' connected to respective ones of the vertical buses VBA1 to VBA4 and VBD1 to VBD4. It can be seen that both the processors and the bus lines of each module are interleaved, without affecting the electrical interconnections. The remaining modules of Figure 2 are connected in a similar manner.

Figure 4 shows a processor array in which both vertical and horizontal buses are arranged in a circular mode. This leads to a more complex interleaving arrangement than that shown in Figures 2 and 3, in that now four modules are interleaved together. There are four groups of modules, namely A, D, M, P; B, C, N, O; E, H, I, L; and F, G, J, K.
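The groupings listed above are consistent with a simple folding rule in which a module shares its physical location with the module mirrored about the centre of the array in each circularly connected direction. The sketch below reproduces the groups on that assumption. The rule is inferred from the listed groups rather than stated in the specification, the function name is hypothetical, and modules are lettered row by row as in Figure 1.

```python
from collections import defaultdict


def interleave_groups(size=4, fold_rows=True, fold_cols=True):
    """Group module letters that are interleaved in a size x size array."""
    groups = defaultdict(list)
    for index in range(size * size):
        name = chr(ord("A") + index)  # A..P for size 4
        row, col = divmod(index, size)
        # Folding pairs position k with its mirror image (size - 1 - k).
        key = (min(row, size - 1 - row) if fold_rows else row,
               min(col, size - 1 - col) if fold_cols else col)
        groups[key].append(name)
    return ["".join(members) for members in groups.values()]


print(interleave_groups())                 # both directions circular
print(interleave_groups(fold_rows=False))  # horizontal buses only
```

With both directions folded the result is the four groups ADMP, BCNO, EHIL and FGJK. With only the horizontal direction folded, the modules pair off as AD, BC, EH, FG and so on, matching the arrangement in which only the horizontal buses are circular.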

Connection between modules is via the buffers R, both at the boundaries between groups of modules and also at the array edges. Input/output connections, although not shown in Figure 4, may be arranged suitably such as at the array edges.

The advantages of the array shown in Figure 4 are similar to those of Figure 2, namely that processing speed and flexibility are improved because modules at the array edges no longer need to clock synchronously through other modules in the array in order to communicate with each other. However, these advantages are increased in the Figure 4 array since vertical circular communication is possible as well as horizontal circular communication.

Figure 5 is a perspective view of the top left-hand part of a practical implementation of the processor array of Figures 2 and 3.

The processors are mounted on processor cards such that the processors within each module which are connected to a particular vertical bus are on one processor card. Thus, in the 4x4 module arrangement as illustrated, each processor card will have four individual processors, and there will be four processor cards to each module. Figure 5 shows eight processor cards PCA1, PCD1, PCA2, PCD2, PCE1, PCH1, PCE2, PCH2 and each one, for example the card PCA1, includes four individual processors A11 to A41 and two buffers R. Although the input/output buffers are designated R', in practice they can be identical to the module interconnection buffers R. As well as the processor cards, edge buffering for the circular connection mode is performed by buffers R mounted on buffer cards RCAD, RCEH. The processor and bus designations are similar to those used in Figure 3, except that E and H have been added for the adjacent modules E and H as shown in Figure 2.
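The designation scheme for cards and processors can be sketched as follows. The helper names are hypothetical. The alternation of cards from the two interleaved modules is inferred from the order in which the cards are listed above, and the sketch assumes that a processor reference is the module letter followed by row and column numbers, as in A11 to A41 for the first card.

```python
def processors_on_card(module, column, rows=4):
    """Processors sharing one vertical bus, e.g. card PCA1 carries A11..A41."""
    return [f"{module}{row}{column}" for row in range(1, rows + 1)]


def card_order(module_pair, columns=4):
    """Processor cards of two interleaved modules, alternating by column."""
    first, second = module_pair
    order = []
    for column in range(1, columns + 1):
        order.append(f"PC{first}{column}")
        order.append(f"PC{second}{column}")
    return order


print(processors_on_card("A", 1))
print(card_order("AD", columns=2) + card_order("EH", columns=2))
```

The second call lists the eight cards in the order given for Figure 5. At four cards per module, a sixteen-module array would use 64 processor cards of a single design.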

Figure 6 is similar to Figure 5, but shows the practical implementation of the top centre of the processor array of Figure 2.

The same designation scheme is used as in Figure 5, and it will be seen that four buffer cards RCAD', RCBC', RCEH', RCFG' are interposed between the processor cards.

It will be noted that, in the arrangements of Figures 5 and 6, there are effectively two buffers in series at each vertical module boundary. The synchronous clocking is arranged to take account of this.

The use of the processor cards and buffer cards of Figures 5 and 6 represents a very efficient and advantageous implementation of the invention. All configurations can be represented by just one design of processor card, and similarly just one design of buffer card.

Servicing of the processor array is very simple since it requires only the replacement of any faulty card. The previously-mentioned advantages of the invention are retained since the electrical layout of the modules is the same as that described above. In particular, the processing speed advantage is maintained, as is the flexibility of the design, requiring only the addition of further processor and buffer cards in order to expand the processor array.

Although embodiments of the invention have been described in the preferred context of modules of 4 x 4 processor configuration, the number of processors in each module may be varied as required. Also, the number of columns in a module may be different from the number of rows, thereby providing rectangular rather than square modules.

Claims

1. A processor array comprising a plurality of processor modules each of which includes n.m processors interconnected within the module by n buses in a first direction and m buses in a second direction, and clocked buffers arranged at each boundary between adjacent modules such that each of the clocked buffers is operative to connect a respective one of the buses in one module with the corresponding bus in the adjacent module.

2. A processor array according to claim 1, wherein there are the same number of buses in the first direction as in the second direction in each module, whereby n = m.

3. A processor array according to claim 2, wherein there are sixteen processors in each module interconnected by four buses in the first direction and by four buses in the second direction.

4. A processor array according to claim 1, claim 2 or claim 3, wherein the clocked buffers are bidirectional thereby allowing communication to take place in either direction across each boundary.

5. A processor array according to claim 4, wherein the bidirectional buffers are clocked in such a way that a signal can pass across the boundary in one direction at the same time as another signal passing across the boundary in the opposite direction.

6. A processor array according to any one of the preceding claims, including input/output means provided at one side of the array.

7. A processor array according to claim 6, wherein the input/output means are provided at two opposite sides of the array.

8. A processor array according to claim 7, wherein the input/output means comprise input/output clocked buffers connected to the respective buses at each of the opposite sides of the array.

9. A processor array according to claim 7 or claim 8, wherein the buses extending in the other direction to that of the buses connected to the input/output means are connected in a circular mode.

10. A processor array according to any one of claims 1 to 8, wherein the buses in the first direction and the buses in the second direction are both connected in circular modes.

11. A processor array according to claim 9 or claim 10, wherein, according to the circular mode of bus connection, processors from different modules are physically interleaved with the interconnecting buses of one module alternating in physical location with the interconnecting buses of another module.

12. A processor array according to any one of the preceding claims, wherein the processors within each module which are connected to a particular bus in the first direction are mounted on a respective processor card, such that each module comprises more than one processor card.

13. A processor array according to claim 12, wherein each processor card includes clocked buffers at each end of the particular bus which is interconnecting the processors on that processor card.

14. A processor array according to claim 12 or claim 13, including buffer cards on which are mounted clocked buffers for interconnection of adjacent processor modules and/or for input/output connection.

15. A processor array substantially as herein described with reference to the accompanying drawings.