WO1992003792A1 - Method and apparatus for routing and partitioning a multistage interconnection network and for determining network passability - Google Patents

Publication number
WO1992003792A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
stage
permutation
bpc
inputs
Prior art date
Application number
PCT/US1991/005667
Other languages
French (fr)
Inventor
Chien-Yi R. Chen
Jyan-Ann C. Hsia
Original Assignee
Syracuse University
Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q3/00 Selecting arrangements
    • H04Q3/64 Distributing or queueing
    • H04Q3/68 Grouping or interlacing selector groups or stages
    • H04Q3/685 Circuit arrangements therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17368 Indirect interconnection networks non hierarchical topologies
    • G06F15/17393 Indirect interconnection networks non hierarchical topologies having multistage networks, e.g. broadcasting scattering, gathering, hot spot contention, combining/decombining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/104 Asynchronous transfer mode [ATM] switching fabrics
    • H04L49/105 ATM switching elements
    • H04L49/106 ATM switching elements using space switching, e.g. crossbar or matrix
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/25 Routing or path finding in a switch fabric
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/101 Packet switching elements characterised by the switching fabric construction using crossbar or matrix
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/15 Interconnection of switching modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/30 Peripheral units, e.g. input or output ports
    • H04L49/3009 Header conversion, routing tables or routing tags

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method and apparatus are described for simulating on one multistage interconnection network (MIN) (210) the operation of a second MIN. By means of two algorithms, we generate characteristic O and I vectors for the first and second MINs and generate from these vectors two additional vectors, U and V, which are used to specify a reordering of the inputs (240) and outputs (250) of one MIN (210) in order that it may simulate the operation of the second MIN. A method and apparatus are also described for using the characteristic O and I vectors of a MIN (210) to implement either distributed or centralized routing of messages through that MIN (210). In distributed routing, the inverse of the characteristic I vector is used to calculate the tags which then accompany the messages passing through the MIN (210) in order to route those messages to the desired network output. In centralized routing, all switch states are determined outside the network by calculations made using the MIN's (210) characteristic vectors. A method and apparatus are also described for implementing network partitioning by means of algorithms which make certain determinations based on the values of that network's characteristic O and I vectors. A method and apparatus are also described for determining what permutations or groups of permutations can be realized (i.e., passed) by an arbitrary BPC MIN. Another closely related method and apparatus are described which allow the inputs of any BPC MIN to be reordered so that the MIN can pass any specified permutation.

Description

METHOD AND APPARATUS FOR ROUTING AND PARTITIONING A MULTISTAGE INTERCONNECTION NETWORK AND FOR DETERMINING NETWORK PASSABILITY
Cross-Reference to Related Applications
This is a continuation-in-part of application Serial No. 07/414852, filed September 29, 1989 for "Method and Apparatus for Simulating an Interconnection Network." This application is incorporated herein by reference.
Field of the Invention
The present invention relates to a method and apparatus for routing and partitioning a multistage interconnection network and for determining network passability.
Background of the Invention
One technique for increasing the performance of present day computer systems is to provide multiple, interconnected processors which all operate concurrently on the same problem. On some types of problems, such a multiprocessor system of N processors can achieve a speedup factor of nearly N over a uniprocessor system.
In a multiprocessor system, communication between processors and between processors and memory takes place via an interconnection subsystem. The interconnection subsystem may have many different forms, the least expensive of which is a time shared bus. The time shared bus, however, has relatively low bandwidth and thus becomes inadequate even for a system with a relatively small number of processors.
For greater bandwidth, an interconnection network is used instead of a time shared bus. Interconnection networks are categorized as static or dynamic. In a static network such as the ring, binary tree or hypercube, the network links are permanently fixed and links within the network cannot be connected to nodes other than the ones to which they are fixed. In contrast, a dynamic network such as the crossbar, Benes, or a member of the Banyan family of multistage interconnect networks (MINs), possesses switching elements which are active and allow the network to be reconfigured so that the network connects any input directly to any output.
The crossbar network provides the maximum bandwidth of any interconnection network. However, for systems using large numbers of processors, it is by far the most expensive and most difficult to build because the number of switching elements is directly proportional to the product of the number of inputs and the number of outputs of the network.
The Banyan family of MINs, on the other hand, is far less expensive to build because the number of switching elements required to implement such a network is directly proportional to N log(N) where N is the number of inputs to the network. However, Banyan networks can be blocked by attempts to make certain simultaneous connections of inputs to outputs, resulting in a contention of internal network communication links. This blocking condition causes some input to output connections to fail and requires that these failed connections be attempted again later.
Blocking conditions are avoided in networks such as the Benes network. Although these networks can realize all simultaneous connections between inputs and outputs, they require approximately twice the hardware of Banyan type MIN's to achieve the overall network state necessary to implement a particular connection requirement.
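The hardware comparison above can be made concrete with a short sketch. It assumes square N x N networks built from 2x2 switching elements, counts a crossbar as N x N crosspoints, and uses the standard stage count of 2 log2(N) - 1 for a Benes network; the function names are illustrative and do not come from the application.

```python
def _log2_exact(n_ports: int) -> int:
    """Return log2(n_ports), requiring n_ports to be a power of two."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two >= 2")
    return n_ports.bit_length() - 1


def crossbar_crosspoints(n_ports: int) -> int:
    """Crosspoints in an N x N crossbar: one per input/output pair."""
    return n_ports * n_ports


def banyan_switches(n_ports: int) -> int:
    """2x2 switches in a Banyan type MIN: log2(N) stages of N/2 switches."""
    return (n_ports // 2) * _log2_exact(n_ports)


def benes_switches(n_ports: int) -> int:
    """2x2 switches in a Benes network: 2*log2(N) - 1 stages of N/2 switches."""
    return (n_ports // 2) * (2 * _log2_exact(n_ports) - 1)


if __name__ == "__main__":
    for n in (8, 16, 64, 1024):
        print(n, crossbar_crosspoints(n), banyan_switches(n), benes_switches(n))
```

For a 16 x 16 network this gives 256 crosspoints for the crossbar against 32 switches for a Banyan type MIN and 56 for a Benes network, which is consistent with the "approximately twice the hardware" figure quoted above.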
The Banyan family of MINs, as well as the Benes type nonblocking networks, provide practical and economical interconnection means for multiprocessor systems of any size. Both network families have good bandwidth capabilities relative to their degree of complexity and have proven effective in implementing various multiprocessor systems to date. However, the proliferation of different types of Banyan networks and the recognition that different types of such networks are especially suited for the solution of different types of problems lead to difficult choices for a designer or user in deciding which one of the many available networks to implement or acquire.
Banyan Type MINs
Banyan type MINs are defined by L.R. Goke and G.J. Lipovski in "Banyan Networks for Partitioning Multiprocessor Systems," 1st Annual Symposium on Computer Architecture, pp. 21-28 (Dec. 19, 1973), as networks having a unique path between any input and any output. Examples of networks that are Banyan type MINs include the Delta family of networks (shown in Fig. 1), the Omega (Fig. 2), Baseline and Reverse Baseline networks (Fig. 3), the Indirect Binary Cube and Cube networks (Fig. 4), the Flip network (Fig. 5) and the Modified Data Manipulator (Fig. 6). Although these networks perform differently on different types of computations, they have all been determined to be topologically equivalent multistage interconnect networks. Topological equivalence implies that for any MIN in a class of equivalent Banyan type MINs, any other MIN in this class can be obtained by appropriate renumbering and rearrangement of the inputs and outputs. For an extensive discussion of these network types and topological equivalence, see D.S. Parker, Jr., "Notes on Shuffle/Exchange-Type Switching Networks," IEEE Trans. Comput., Vol. C-29, pp. 213-222 (March 1980); C. Wu and T.Y. Feng, "On a Class of Multistage Interconnection Networks," IEEE Trans. Comput., Vol. C-29, pp. 694-702 (Aug. 1980); J.H. Patel, "Performance of Processor-Memory Interconnections for Multiprocessors," IEEE Trans. Comput., Vol. C-30, pp. 771-780 (Oct. 1981); and H.J. Siegel, Interconnection Networks for Large-Scale Parallel Processing, pp. 125-150 (Lexington Books, 1985), which are incorporated herein by reference.
As is evident from Figs. 1-6, Banyan type MINs are built in a multistage fashion using many smaller crossbar switches that are connected together via communication links. The outputs of the crossbar switches in one stage are fed via the communication links into the inputs of the crossbar switches in the next stage until an input to output path has been established.
In general, the number of network inputs need not equal the number of network outputs and the crossbar switches that are used to build the overall Banyan type MIN may be of any size. If a x b crossbar switches (i.e., switches having a inputs and b outputs) are used, the resulting Banyan type MIN will have $a^n$ network inputs and $b^n$ network outputs if n stages are used. However, as depicted in the conventional MINs illustrated in Figs. 1-6, Banyan type MINs are commonly built using 2x2 crossbar switches 30 and commonly have the same number N of inputs 40 and outputs 50. Such a square NxN network is built with n (where $n = \log_2 N$) stages 60 of crossbar switches 30, with each stage consisting of N/2 2x2 crossbar switches 30. The stages are interconnected by N communication links 70 between each pair of successive stages as well as between the inputs and the first stage and between the last stage and the outputs. The links provide distinctive interconnection patterns or permutations between these elements and in conjunction with the crossbar switches establish paths or mappings between the inputs and outputs.
Figs. 1-6 depict illustrative 16x16 Delta, Omega, Baseline, Indirect Binary Cube, Flip and Modified Data Manipulator Networks, respectively. As will be apparent, each network has sixteen inputs 40, sixteen outputs 50 and four stages of eight 2x2 crossbar switches 30. Also represented by Figs. 3 and 4, respectively, are the Reverse Baseline and Cube networks, which are simply reversed versions of the Baseline and Indirect Binary Cube networks in which the network inputs are located in place of the outputs shown in these figures and the network outputs are located in place of the inputs.
For any network of N inputs and N outputs, the number of different mappings of all N inputs each to a different output is N!. Any of these mappings can be implemented in a crossbar network. Like a crossbar network, the Banyan type MINs are able to connect any of their inputs to any of their outputs. Unlike a crossbar network, however, pairs of the input to output paths of a Banyan type MIN share switching elements at each stage within the network and contentions will result if both paths need to use the same output from a switching element in order to reach their final destination. The Banyan type MIN in Fig. 1, for example, cannot realize the identity mapping in which input 0 is connected to output 0, input 1 is connected to output 1 and so on. Other Banyan type MINs exhibit similar difficulties in realizing other mappings. Thus, unlike crossbar networks, an NxN Banyan type MIN cannot realize all N! possible mappings.
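The gap between what a crossbar and a Banyan type MIN can realize can be estimated with a brief sketch. It assumes a square network of 2x2 switches and assumes that each distinct combination of switch settings produces a distinct mapping, which follows from the unique-path property; under those assumptions the MIN realizes 2 raised to the number of switches, compared with N factorial for the crossbar. The function names are illustrative.

```python
import math


def realizable_mappings(n_ports: int) -> int:
    """Mappings realizable by a Banyan type MIN of 2x2 switches.

    Assumes every combination of switch settings gives a different mapping,
    so the count is 2 ** (number of switches).
    """
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two >= 2")
    stages = n_ports.bit_length() - 1          # log2(N)
    switches = (n_ports // 2) * stages         # N/2 switches per stage
    return 2 ** switches


def all_mappings(n_ports: int) -> int:
    """All one-to-one mappings of N inputs to N outputs (N factorial)."""
    return math.factorial(n_ports)


if __name__ == "__main__":
    for n in (4, 8, 16):
        fraction = realizable_mappings(n) / all_mappings(n)
        print(f"N={n}: {realizable_mappings(n)} of {all_mappings(n)} ({fraction:.6f})")
```

For N = 8 the sketch reports 4096 of 40320 mappings, and the fraction shrinks rapidly as N grows.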
As a result of these differences in the ability of a MIN to realize a specific mapping, one Banyan type MIN may be better suited than another to implement a specific type of parallel processing algorithm. Consequently, in the prior art, different multiprocessor systems are customarily constructed with different interconnection networks depending on the nature of the algorithms to be run on the system. However, if an algorithm that should ideally be implemented on one network is implemented on another network, software redesign normally is required if that algorithm is to run optimally.
Hence, for reasons of flexibility, it would be very desirable for any given multiprocessor system to be able to implement many different networks so that many types of algorithms could be optimally implemented in a convenient and efficient manner. Such a system would yield maximum performance for any algorithm type without the need for algorithm or software redesign specific to the network being used in the overall multiprocessor system.
A Simulator for Multistage Interconnection Networks
The above-referenced application Serial No. 07/414,852 describes a method and apparatus for simulating on one MIN the operation of a second MIN. In that technique vectors are generated which characterize the first and second MINs. From these characteristic vectors, two other vectors are generated which are used to make the first MIN simulate the second MIN. In particular, the first of these vectors is used to reorder the inputs to the first MIN and the second of these vectors is used to reorder the outputs. As a result of this, a set of inputs to said first MIN, when reordered, is mapped by the first MIN to a set of outputs which, when reordered, simulates the operation of the second MIN.
More particularly, by means of two algorithms, first and second vectors, $I_1$, $O_1$, are generated which characterize the first MIN; and by means of the same two algorithms, third and fourth vectors, $I_2$, $O_2$, are generated which characterize the second MIN. Fifth and sixth vectors, $U$, $V$, are generated where $U = O_2 * O_1^{-1}$ and $V = I_1^{-1} * I_2$, where $O_1^{-1}$ and $I_1^{-1}$ are the inverses, respectively, of $O_1$ and $I_1$, and $*$ is a two-operand permutation operation which permutes elements of a first operand (e.g., $O_2$) in accordance with an order specified by a second operand (e.g., $O_1^{-1}$). The fifth vector is then used to reorder the inputs to the first MIN; and the sixth vector is used to reorder the outputs from said first MIN. As a result, inputs to the first MIN are mapped to outputs from said first MIN in accordance with the input to output mapping of the second MIN.
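The construction of the fifth and sixth vectors can be sketched as follows. This is only an illustration under stated assumptions: vectors are stored as zero-based Python lists, and the two-operand operation is taken to mean that element i of the result is the element of the first operand found at the position given by element i of the second operand. The index convention, the function names and the example vectors are all hypothetical and are not taken from the '852 application.

```python
from typing import List


def inverse(p: List[int]) -> List[int]:
    """Inverse of a permutation vector: inverse(p)[p[i]] == i."""
    inv = [0] * len(p)
    for i, value in enumerate(p):
        inv[value] = i
    return inv


def permute(first: List[int], order: List[int]) -> List[int]:
    """Assumed meaning of the '*' operation: pick elements of `first`
    in the order given by `order`."""
    return [first[k] for k in order]


def simulation_vectors(o1, i1, o2, i2):
    """Return (U, V) with U = O2 * O1^-1 and V = I1^-1 * I2."""
    u = permute(o2, inverse(o1))
    v = permute(inverse(i1), i2)
    return u, v


if __name__ == "__main__":
    # Hypothetical characteristic vectors for two 3-stage networks.
    o1, i1 = [2, 0, 1], [1, 2, 0]
    o2, i2 = [0, 2, 1], [2, 1, 0]
    print(simulation_vectors(o1, i1, o2, i2))
    # A network simulating itself needs no reordering under this convention.
    print(simulation_vectors(o1, i1, o1, i1))
```

A useful sanity check on any chosen convention is that a network simulating itself yields identity vectors for both U and V, which holds for the convention assumed here.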
As is described in the '852 application, the first, second, third and fourth vectors are generated by numbering the inputs to each MIN sequentially in binary notation and recording the paths of the inputs to the outputs of the MIN in the form of a series of permutations, represented by permutation vectors, performed on the digits of the binary notation by the connection patterns of the communication links. For the MINs of interest these permutations shift each digit of the binary notation of the input into the least significant bit position at one of the stages of the MINs. For each MIN, the order in which the binary digits are shifted into the least significant bit position determines the O vector. The I vector is determined by identifying for each bit position in the binary notation of the outputs the stage at which the bit in that position was shifted into the least significant bit position.
By using the U and V vectors to reorder its inputs and outputs, the first MIN is able to simulate the second MIN in that any permutation which can be implemented by the first MIN can also be implemented by the second MIN. Moreover, if a permutation can be routed through the first MIN, then it is possible to route that permutation through the second MIN.
In routing a multistage network, one of two general techniques is normally used: distributed routing or centralized (global) routing. When distributed routing is used to route a MIN, control bits known as tags accompany messages through the MIN and configure the switching elements of the MIN to route the messages they accompany to the desired network output. On the other hand, in centralized routing the state of the individual crossbar switching elements is determined outside the network and control signals to the individual crossbar switching elements configure the network to implement the desired routing.
When a first BPC MIN is used to simulate a second BPC MIN, the second MIN is functionally equivalent to the first MIN. When distributed routing is used on either the first or second MIN, the routing tags that are used to establish paths through the second MIN (i.e., the simulated network) can be used to correctly establish paths in the first MIN (i.e., the simulating network). That is to say, the routing tags for the simulated and simulating networks are identical because the two networks are functionally equivalent. If, on the other hand, centralized routing is used, both the first MIN and the second MIN will route the same inputs to the same outputs if each switching element in one MIN is set to the same state as its equivalent switching element in the other MIN.
It is well known that equivalent switching elements in two MINs do not necessarily appear in the same location in schematic representations of the respective networks. It is also well known that if the inputs, outputs, and switching elements depicted in the schematic of one MIN are properly rearranged, the schematic of an equivalent MIN will result. Switching elements in two equivalent networks which can be rearranged in one schematic to take the place of the other switching element in the second schematic are therefore considered to be equivalent switching elements.
The above referenced application Serial No. 07/414,852 describes a large class of multistage networks known as BPC MINs which can be uniquely characterized by the two vectors known as the O and the I vectors. Because this class of networks is so large, it is desirable to route any member of the class of BPC MINs using an approach common to all networks of that class. To date, such standardized routing approaches have only been considered for a relatively small number of well known networks such as the Omega network.
Summary of the Invention
We have found that the characteristic vectors of networks provide a vehicle for making significant improvements in three areas of considerable practical importance in the operation of multistage interconnection networks: routing, partitioning, and determining permutation passability.
Routing
We have devised a method and apparatus for using the characteristic O and I vectors of any network which is a BPC MIN to perform distributed routing on that MIN. We have also devised a method and apparatus for using the characteristic O and I vectors of a BPC MIN to perform centralized routing, also known as global routing, on that BPC MIN.
In accordance with the invention, we have found that routing tags may be determined for any BPC MIN by permuting the binary destination address in accordance with the $I^{-1}$ vector of the network being routed.
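A minimal sketch of this tag calculation follows. It assumes that entry p of the I vector gives the zero-based stage at which output bit position p is set, so that the inverse vector gives, for each stage, the destination bit that serves as that stage's tag bit. The stage numbering, the function names and the example I vector (chosen to resemble an Omega-style network, where the most significant destination bit is consumed first) are assumptions for illustration, not values from the application.

```python
from typing import List


def inverse(p: List[int]) -> List[int]:
    """Inverse of a permutation vector: inverse(p)[p[i]] == i."""
    inv = [0] * len(p)
    for i, value in enumerate(p):
        inv[value] = i
    return inv


def routing_tag(destination: int, i_vector: List[int]) -> List[int]:
    """Return one tag bit per stage for the given destination address.

    Assumption: i_vector[p] is the stage that sets output bit position p,
    so inverse(i_vector)[s] is the destination bit used at stage s.
    """
    bit_for_stage = inverse(i_vector)
    return [(destination >> bit_for_stage[s]) & 1 for s in range(len(i_vector))]


if __name__ == "__main__":
    # Hypothetical I vector for a 3-stage network: bit 2 is set at stage 0,
    # bit 1 at stage 1, bit 0 at stage 2.
    i_vector = [2, 1, 0]
    for dest in range(8):
        print(dest, routing_tag(dest, i_vector))
```

Under this convention the tag is simply a rearrangement of the destination bits, so no table lookup is needed at the source.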
When, on the other hand, centralized routing is used to route a BPC MIN, the settings of the individual switching elements may be determined in accordance with the invention by logical functions which are performed outside the network being routed. These logical functions involve the use of the BPC MIN's permutation vectors $P_0, P_1, \ldots, P_n$ and the inverse of the MIN's characteristic I vector to determine the states of each of the MIN's switching elements. Once the state of each of the MIN's switching elements has been determined, control signals are used to configure the MIN accordingly and to thereby accomplish routing.
Partitioning
It is often desirable to partition a single network into multiple smaller disjoint networks known as subnetworks. Network partitioning of this type is necessary and useful for a variety of interconnection network applications such as fault tolerance and for applications in which multiple tasks are run simultaneously.
It therefore is desirable to partition any one of this large class of networks. Moreover, it is desirable to be able to partition these networks using an approach common to all networks of that class.
We have devised a method and apparatus for partitioning any BPC type MIN into multiple disjoint BPC type MINs. The resulting partition and multiple disjoint BPC type MINs can be completely and uniquely specified by the characteristic O and I vectors of the original BPC MIN being partitioned.
A BPC MIN can be partitioned into multiple disjoint subnetworks by setting all of the switching elements in one or more of the stages of that network to either the pass-through state or the exchange state. If all of the switching elements of i stages of a network are set to the pass-through or exchange state, the original BPC MIN is effectively divided into $2^i$ disjoint subnetworks. The source and destination addresses which compose these subnetworks are determined by the states to which the i stages of the network are set.
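The effect of fixing whole stages can be observed in a small simulation. The sketch below models an Omega-style network in which each stage applies a perfect shuffle followed by a 2x2 switch that either passes the index through or flips its least significant bit; this model and all names in it are assumptions chosen for illustration and are not the network of Fig. 7. Sources are grouped by the set of outputs they can still reach once some stages are forced.

```python
from itertools import product
from typing import Dict, FrozenSet, List


def shuffle(x: int, n: int) -> int:
    """Perfect shuffle: circular left shift of the n-bit index x."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def reachable_outputs(src: int, n: int, forced: Dict[int, int]) -> FrozenSet[int]:
    """Outputs reachable from src when the stages in `forced` are fixed.

    forced maps a stage index to 0 (pass-through) or 1 (exchange);
    every other stage is free to take either state.
    """
    free = [s for s in range(n) if s not in forced]
    outputs = set()
    for choice in product((0, 1), repeat=len(free)):
        state = dict(forced)
        state.update(zip(free, choice))
        x = src
        for s in range(n):
            x = shuffle(x, n) ^ state[s]   # shuffle, then pass or exchange
        outputs.add(x)
    return frozenset(outputs)


def partition(n: int, forced: Dict[int, int]) -> Dict[FrozenSet[int], List[int]]:
    """Group sources by the set of outputs they can reach."""
    groups: Dict[FrozenSet[int], List[int]] = {}
    for src in range(1 << n):
        groups.setdefault(reachable_outputs(src, n, forced), []).append(src)
    return groups


if __name__ == "__main__":
    for forced in ({0: 0}, {0: 0, 1: 1}):
        print("forced stages:", forced)
        for outs, srcs in partition(3, forced).items():
            print("  sources", srcs, "-> outputs", sorted(outs))
```

In this 8 x 8 model, forcing one stage splits the sources into two groups of four, and forcing two stages splits them into four groups of two, with each group reaching its own disjoint set of outputs.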
The characteristic O and I vectors of the network being partitioned can be used to determine the subnetworks that are formed when certain stages of the BPC MIN are set to certain states. Conversely, the O and I vectors of a BPC MIN being partitioned can also be used to determine the stages which should be set to some state in order to partition a BPC MIN so that certain specific subnetworks are produced. The O and I vectors can then be used to determine the specific states to which those stages should be set. Because of this, partitioning of any BPC MIN may be performed according to the number and size of the disjoint subnetworks which are required to be produced from that partition.
Network Passability
Although any single N x N BPC type MIN can realize only a relatively small fraction of all N! possible permutations, or mappings of inputs to outputs, the complete set of all BPC type MINs taken together can indeed realize all N! possible permutations. That is to say, although every BPC MIN of a given size can realize exactly the same number of permutations, one particular BPC MIN may be able to realize a certain permutation or group of permutations that another BPC MIN cannot. As a result of these differences, one BPC MIN may be better suited than another to implement a particular type of parallel processing algorithm. For these reasons, it is desirable to know which permutations or groups of permutations can be realized (i.e., passed) by an arbitrary BPC MIN. Additionally, it is desirable to be able to specify a BPC MIN which can realize any specified permutation.
We have devised a method and apparatus to determine which permutations or groups of permutations are passable by an arbitrary BPC MIN. Additionally, we have devised a closely related method and apparatus for determining a BPC MIN which can pass any specified permutation.
The method and apparatus to determine permutation passability utilizes the characteristic O and I vectors of the BPC MIN under consideration. Permutation passability is determined by first partitioning the BPC MIN under consideration in several specially chosen ways and by then making certain determinations based on the value of the destination routing tags of the permutation being checked for passability.
Like the method and apparatus to determine permutation passability, the method and apparatus to determine a BPC MIN which can pass any specified permutation also utilizes the characteristic O and I vectors of the BPC MIN being used to pass that permutation. This method and apparatus provides for the input of a Modified Data Manipulator network to be reordered to allow that network to pass any permutation. Once this reordering has been accomplished, the inputs and outputs of any other BPC MIN can be reordered to allow that network to simulate the Modified Data Manipulator network. In this way, any BPC MIN can be configured to pass any arbitrary permutation.
Brief Description of Drawing
These and other objects, features and advantages of the invention will be more readily apparent from the following detailed description of a preferred embodiment of the invention in which:
Fig. 1 is a schematic illustration of a prior art Delta interconnection network;
Fig. 2 is a schematic illustration of a prior art Omega interconnection network;
Fig. 3 is a schematic illustration of a prior art Baseline interconnection network;
Fig. 4 is a schematic illustration of a prior art Indirect Binary Cube interconnection network;
Fig. 5 is a schematic illustration of a prior art Flip interconnection network;
Fig. 6 is a schematic illustration of a prior art Modified Data Manipulator interconnection network;
Fig. 7 is a schematic illustration of an interconnection network used in an illustrative example;
Fig. 8 is a flowchart of a preferred embodiment of the method of our invention;
Fig. 9 is a flowchart depicting an algorithm used in the practice of our invention;
Figs. 10a, 10b and 10c are a flowchart depicting another algorithm used in the practice of our invention.
Fig. 11 is a schematic illustration of a preferred embodiment of a multi-stage interconnection network of our invention;
Fig. 12 is a schematic illustration of a preferred embodiment of certain elements of the MIN of Fig. 11;
Fig. 13 is a schematic illustration of the use of the interconnection network of Fig. 11 to simulate another interconnection network; and
Fig. 14 is a schematic illustration of a prior art 2n-1 stage rearrangeable interconnection network.
Fig. 15 is a schematic illustration of the MIN of Fig. 7 which has been configured to connect one of its inputs to one of its outputs.
Figs. 16a, 16b, and 16c are a flowchart depicting another algorithm used in the practice of our invention.
Fig. 17 is a schematic illustration of the MIN of Fig. 7 after it has been partitioned into four smaller networks.
Figs. 18a, 18b, and 18c are a flowchart depicting another algorithm used in the practice of our invention.
Figs. 19a, 19b, and 19c are a flowchart depicting another algorithm used in the practice of our invention.
Figs. 20a and 20b are schematic illustrations of the MIN of Fig. 7 partitioned into two subnetworks.
Figs. 21a and 21b are schematic illustrations of the MIN of Fig. 7 partitioned into four subnetworks.
Figs. 22a and 22b are schematic illustrations of the MIN of Fig. 7 partitioned into eight subnetworks.
Fig. 23 is a schematic illustration of the MIN of Fig. 7 configured in accordance with one of the examples.
Figs. 24a, 24b, and 24c are a flowchart depicting another algorithm used in the practice of our invention.
Fig. 25 is a pictorial representation of one of the steps of an example.
Fig. 26 is a pictorial representation of another step of that example.
Fig. 27 is a pictorial representation of another step of that example.
Fig. 28 is a pictorial representation of another step of that example.
Fig. 29 is a schematic illustration of the Data Manipulator Network of Fig. 6 with its inputs reordered in order to pass a particular permutation.
Fig. 30 is a schematic illustration of a preferred embodiment of a certain multistage network of our invention.
Fig. 31 is a schematic illustration of a preferred embodiment of certain elements of the MIN of Fig. 30.
Fig. 32 is a schematic illustration of a preferred embodiment of our invention.
Detailed Description of Preferred Embodiment of Invention
Table of Contents
For the convenience of the reader, the detailed description is divided into the following sections:
Permutation Vectors
Characteristic Vectors
Simulation of Networks
Algorithms to Calculate Characteristic Vectors
Possible Implementations of Network Simulation
Other Network Types
Distributed Routing
Centralized Routing
Partitioning of Bit-Permute-Complement Type MINs
Determination of Subnetworks from User Specified Switching Element Settings
Determination of Switching Element Settings From User Specified Subnetworks
Permutation Passability
Determining Permutation Passability for an Arbitrary Permutation and BPC MIN
Possible Implementations of Network Routing, Partitioning, and Permutation Passability
Determining the Set of All Permutations Which are Passable by an Arbitrary BPC MIN
Determining a BPC MIN Which Can Pass an Arbitrary Permutation
Possible Implementations of Determining a BPC MIN to Pass a Specified Permutation
Permutation Vectors
In describing a MIN, it is customary to identify or index the inputs and the outputs of the MIN by numbering them sequentially starting with zero. The same numbering scheme is also applied to the inputs and outputs of each stage of switching elements. As a result, the interconnection pattern established by the communication links between any two stages or between the inputs and the first stage or between the last stage and the outputs can be specified by a label having two numbers, the first of which identifies the front end of the link and the second of which identifies the distal end of the link. In certain instances, where the relationship between the front ends and distal ends is regularly ordered, it is possible to specify this relationship by a mathematical formula. In the case of several MINs, the formula can be specified in terms of a sequence of numbers that define a permutation of the binary digits which identify the front end of each link to generate the binary digits which identify the distal end of each link. We call this sequence of numbers a permutation vector, $P_i$.
In particular, each input of the N inputs to any of the MINs shown in Figs. 1-6 can be identified by a decimal number x having a binary equivalent value such that
$x = [x_{n-1} \dots x_0]$
where
$x = 2^{n-1} x_{n-1} + \dots + 2 x_1 + x_0$.
As will be apparent, the number of binary digits required to identify an input is the same as the number of stages in the MIN.
Following the above-cited paper by Parker, it is possible to characterize the MINs of Figs. 1-6 by permutations on these binary equivalent values. For example, for the perfect shuffle MIN shown in Fig. 1, the perfect shuffle permutation σ is a circular left shift of the bits of the binary equivalent value of each input. This may be represented mathematically by
$\sigma(x) = \sigma([x_{n-1} \dots x_0]) = [x_{n-2} \dots x_0\, x_{n-1}]$.
The unshuffle, $\sigma^{-1}$, is simply a circular right shift.
Similarly, a bit reversal permutation ρ is defined by
$\rho([x_{n-1}\, x_{n-2} \dots x_1\, x_0]) = [x_0\, x_1 \dots x_{n-2}\, x_{n-1}]$
and the kth butterfly permutation $\beta_k$, which interchanges the first and kth bits of the binary index, is defined by
$\beta_k([x_{n-1} \dots x_0]) = [x_{n-1} \dots x_{k+1}\, x_0\, x_{k-1} \dots x_1\, x_k]$.
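The index permutations defined above can be sketched in a few lines of Python operating on an integer index x of n bits. This is only an illustration under the bit numbering used in the text (bit 0 is the least significant bit); the function names are hypothetical and do not come from the source.

```python
def shuffle(x: int, n: int) -> int:
    """Perfect shuffle: circular left shift of the n-bit index x."""
    msb = (x >> (n - 1)) & 1
    return ((x << 1) & ((1 << n) - 1)) | msb


def unshuffle(x: int, n: int) -> int:
    """Inverse shuffle: circular right shift of the n-bit index x."""
    lsb = x & 1
    return (x >> 1) | (lsb << (n - 1))


def bit_reverse(x: int, n: int) -> int:
    """Bit reversal: bit i of x moves to bit n-1-i."""
    y = 0
    for i in range(n):
        y |= ((x >> i) & 1) << (n - 1 - i)
    return y


def butterfly(x: int, k: int) -> int:
    """k-th butterfly: interchange bit 0 and bit k of x."""
    if ((x >> k) & 1) != (x & 1):
        x ^= (1 << k) | 1  # the two bits differ, so flipping both swaps them
    return x
```

For a four-bit index, `shuffle(0b1011, 4)` moves the leading 1 around to the least significant position, and `unshuffle` undoes it.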
Of interest to the present invention is the entire class of MINs which can be specified by permutation vectors. This class includes all the MINs of Figs. 1-6 and many others. Further, the invention is also applicable to those MINs which can be specified by permuting the binary digits of the inputs and complementing one or more of those digits. We will refer to this entire class of MINs on which the invention may be practiced as the bit-permute-complement MINs.
In considering the routing of inputs to outputs through a MIN it is also necessary to consider the effect of the switching elements. The switching elements of a MIN have at least two operations, one of which is a pass-through that connects each input to an output that is indexed by the same number and the other of which is an exchange operation that swaps the two available outputs. Since the inputs are numbered sequentially from zero, the two inputs to a switching element have binary values that differ only in their least significant bit. Thus, the exchange operation will be seen to have the effect of complementing the least significant bit while the pass-through leaves it unchanged. We specify the switching elements of a stage that are set to effect an exchange by an exchange vector Ei.
By use of permutation vectors Pi and exchange vectors Ei, it is possible to describe many MINs succinctly. Thus, each of the MINs of Figs. 1-6 is described generally by
$\Gamma(x) = P_0 E_0 P_1 E_1 P_2 E_2 P_3 E_3 P_4([x_{n-1} \dots x_0])$
where the vectors Pi and Ei are applied from the left; and the perfect shuffle MIN of Fig. 1, for example, may be described more particularly by
$\Gamma_\sigma(x) = \sigma_0 E_0 \sigma_1 E_1 \sigma_2 E_2 \sigma_3 E_3 \sigma_4([x_{n-1} \dots x_0])$.
While the exchange vectors are critical in determining the routing path from an input to an output, they operate only on the value or sign of the least significant bit and do not affect the position of that bit in the set of bits provided to the switching element. For this reason, the change in the position of the digits in the binary representation of an input or output can be specified by the permutation vectors alone.
A better understanding of the operation of a set of permutation vectors may be obtained from a consideration of the bit-permute-complement network depicted in Fig. 7. The five permutation vectors that characterize the routing of the communication links of this network are
P0 = ( 2 , -1 , -0 , 3 )
P1 = ( 2 , 3 , 0 , -1 )
P2 = ( -0 , 1 , 3 , -2 )
P3 = ( 2 , -0 , 3 , 1)
P4 = ( 1 , 3 , 0 , 2 )
Each of these vectors specifies a permutation performed in turn on a set of binary values S = (s3, s2, s1, s0) which identify an input to the network. The use of a minus sign indicates that the bit in that position is complemented.
Thus, in the case of the permutation vector P0, this vector rearranges the binary digits s3, s2, s1, s0, which identify an input, in the order s2, s1, s0, s3, and complements the value of the s1 and s0 bits. The permutation vector P1 then operates on the rearranged binary digits and so on. If we apply this set of permutation vectors to an input we have the permutations set forth in Table 1.
[Table 1: the binary digits s3, s2, s1, s0 after permutation by each of P0 through P4 in turn]
Thus for the network of Fig. 7, the relationship between the inputs and outputs that is established by the communication links is such that an input specified by the binary digits s3, s2, s1, s0 is mapped or routed to an output specified by the binary digits $\bar{s}_0$, $s_3$, $\bar{s}_1$, $s_2$, where $\bar{s}_i$ is the complement of $s_i$.
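The successive permutations for the network of Fig. 7 can be traced mechanically. The sketch below is an illustration rather than the method of the source: it assumes that each vector is written most significant position first, that an entry k places the digit currently in bit position k into that position, and that a minus sign complements the digit. Vectors are parsed from strings because a plain integer cannot distinguish -0 from 0; the helper names are hypothetical.

```python
def parse(text):
    """Parse '2 -1 -0 3' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def apply_symbolic(vector, word):
    """Permute a word of (source_index, complemented) digits by a BPC vector."""
    n = len(word)
    # word[n - 1 - idx] is the digit currently in bit position idx.
    return [(word[n - 1 - idx][0], word[n - 1 - idx][1] ^ neg) for idx, neg in vector]


def show(word):
    """Render a word as text, using ~ to mark a complemented digit."""
    return " ".join(("~" if neg else "") + f"s{src}" for src, neg in word)


P = [parse(t) for t in ("2 -1 -0 3", "2 3 0 -1", "-0 1 3 -2", "2 -0 3 1", "1 3 0 2")]

word = [(3, False), (2, False), (1, False), (0, False)]  # (s3, s2, s1, s0)
rows = [show(word)]
for vec in P:
    word = apply_symbolic(vec, word)
    rows.append(show(word))

for label, row in zip(["input"] + [f"after P{i}" for i in range(len(P))], rows):
    print(f"{label:9s} {row}")
```

Under these assumptions the last row printed is the output word, and the rightmost digit of the rows for P0 through P3 is the digit presented to the least significant bit position of each stage.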
Characteristic Vectors
As set forth in the '852 application, we have found that any bit-permute-complement interconnection network can be characterized by two vectors. One of these vectors is determined by the order in which the permutation vectors shift the digits of the binary notation identifying the inputs into the least significant bit position. We call this characteristic vector the O vector. The other vector is determined by the number of the stage at which each digit in the binary notation identifying the outputs was located in the least significant bit position. We call this characteristic vector the I vector.
In the example of Fig. 7 and Table 1, the righthand column is the least significant bit position. Hence the O vector can be determined by inspection of the subscripts of the inputs to the four stages of switching elements to be (-1, -2, 0, 3) where a minus sign is used if the input was complemented. The I vector is likewise determined by inspection by identifying for each of the digits of the output the stage at which that digit was in the least significant bit position. For the example of Table 1, I is seen to be (-1, 0, 3, -2) where a minus sign is used if either the output digit is complemented or the digit was complemented when in the least significant bit position, but not if both digits were complemented.
Simulation of Networks
Further, we have found that these vectors can be used to provide a practical simulation of one MIN by another MIN. In particular, as depicted in the flowchart of Fig. 8, we simulate a second MIN on a first MIN by determining first and second vectors, $I_1$, $O_1$, which characterize the first MIN (box 110), and third and fourth vectors $I_2$, $O_2$, which characterize the second MIN (box 112). We then determine the inverse values $O_1^{-1}$ and $I_1^{-1}$ of the first and second vectors and use these values to determine fifth and sixth vectors, U, V, where $U = O_2 * O_1^{-1}$ and $V = I_1^{-1} * I_2$ and where * is a two-operand permutation operation which permutes elements of a first operand (e.g., $O_2$) in accordance with an order specified by a second operand (e.g., $O_1^{-1}$) (box 114). The fifth and sixth vectors are permutation vectors which specify a reordering of the communication links in the first MIN. We use the fifth vector to reorder the inputs to the first MIN (box 116) and the sixth vector to reorder the outputs (box 118).
For example, to simulate the network of Fig. 7 on an Omega network such as shown in Fig. 2, we need to compute the O and I vectors of the Omega network. The Omega network is defined by four successive circular left shifts. Hence, the digit s3, then the digit s2, then the digit s1 and finally the digit s0 will be shifted into the least significant bit position, and the O vector is therefore (0, 1, 2, 3). Four shifts of an Omega network constitute an identity operation, hence the I vector is also (0, 1, 2, 3). For the operation * an inverse is defined such that $x * x^{-1} = (3, 2, 1, 0)$. Hence the inverse of (0, 1, 2, 3) is also (0, 1, 2, 3). Accordingly, to simulate the network of Fig. 7 on the Omega network of Fig. 2, we set $U = O_2 * O_1^{-1}$ = (-1, -2, 0, 3) * (0, 1, 2, 3) = (3, 0, -2, -1) and we set $V = I_1^{-1} * I_2$ = (0, 1, 2, 3) * (-1, 0, 3, -2) = (-2, 3, 0, -1). The implementation of these permutation vectors U, V on an Omega network is shown in the network depicted in Fig. 13.
It can be shown that each input to the networks of Figs. 7 and 13 will be routed by those networks to the same outputs as long as the settings of the equivalent switching elements are the same. Hence, the network of Fig. 13 can be substituted for that of Fig. 7 and can be regarded as a simulation of that network.
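The operation * and its inverse can be sketched as follows. The reading assumed here is that entry j of x * v is entry |v_j| of x, with the sign flipped when v_j is negative; under that reading the sketch reproduces the values of U and V given above. The function names and the string encoding of vectors are illustrative assumptions.

```python
def parse(text):
    """Parse '-1 -2 0 3' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def fmt(vec):
    """Render a vector back in the notation of the text, keeping -0 distinct from 0."""
    return " ".join(("-" if neg else "") + str(idx) for idx, neg in vec)


def compose(x, v):
    """x * v: permute the elements of x in the order specified by v."""
    n = len(x)
    return [(x[n - 1 - idx][0], x[n - 1 - idx][1] ^ neg) for idx, neg in v]


def inverse(v):
    """Return v^-1 such that v * v^-1 is the identity (n-1, ..., 1, 0)."""
    n = len(v)
    inv = [None] * n
    for list_pos, (idx, neg) in enumerate(v):
        position = n - 1 - list_pos      # bit position holding this entry
        inv[n - 1 - idx] = (position, neg)
    return inv


O1, I1 = parse("0 1 2 3"), parse("0 1 2 3")        # Omega network
O2, I2 = parse("-1 -2 0 3"), parse("-1 0 3 -2")    # network of Fig. 7

U = compose(O2, inverse(O1))
V = compose(inverse(I1), I2)
print(fmt(U), "|", fmt(V))
```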
Algorithms to Calculate Characteristic Vectors
Details of an algorithm to calculate the O vector are set forth in Fig. 9 and details of an algorithm to calculate the I vector are set forth in Figs. 10a, 10b and 10c. Figure 9 depicts a flowchart of the algorithm to calculate the characteristic vector O. Specifically, the algorithm requires that the binary digits representing the network input be sequentially permuted by each of the first n permutation vectors Pi (box 120, 126). For example, in the four stage network in Fig. 1, the binary bits representing the input would be permuted by each of P0, P1, P2 and P3.
For each of these n permutations, one unique bit of the binary digits representing the input to that particular stage is shifted into the least significant bit (LSB) position. The order in which this occurs determines the O vector. For example, if bit n-1 of the binary digits representing the input is shifted into the LSB position by stage 0 of the network (i.e., by P0) then $O_0 = n-1$. If a binary digit is complemented after it is shifted into the LSB position, the appropriate component of the O vector is defined to be negative (box 122, 124). When $(O_{n-1}, O_{n-2}, \dots, O_1, O_0)$ have each been determined, the characteristic vector O has been calculated.
Figs. 10a, 10b and 10c depict a flowchart of the algorithm to calculate the characteristic vector I. Specifically, the algorithm requires that the binary digits representing the network input be sequentially permuted by each of the n+1 permutation vectors Pi (box 130, 132). For each of the first n permutations, one unique bit of the binary digits representing the input to that particular stage is shifted into the LSB position. After the binary digits have been permuted by all of the n+1 permutation vectors Pi, the resulting binary number represents the output of the overall network. This output is defined to be $D = (d_{n-1}, d_{n-2}, \dots, d_1, d_0)$ (box 133, 134).
The I vector is determined by identifying for each of the n bit positions in the binary representation of the network output the stage at which the bit in that position was shifted into the LSB position (box 131, 135). If the binary digit of the output has been complemented subsequent to having been shifted into the LSB position, the corresponding component of the I vector is defined to be negative (box 135).
When, for all n bit positions in the binary representation of the network output, the stage at which that bit was shifted into the LSB position has been determined, the entire I vector has been computed (box 136).
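The two calculations described above can be sketched together in Python. This is an illustrative reading, not a transcription of the flowcharts of Figs. 9 and 10: it assumes the vector conventions used for Fig. 7 (most significant position first, minus sign for complement) and that every input digit reaches the LSB position exactly once in the first n permutations. All names are hypothetical.

```python
def parse(text):
    """Parse '2 -1 -0 3' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def fmt(vec):
    """Render a vector in the notation of the text, keeping -0 distinct from 0."""
    return " ".join(("-" if neg else "") + str(idx) for idx, neg in vec)


def apply_symbolic(vector, word):
    """Permute a word of (source_index, complemented) digits by a BPC vector."""
    n = len(word)
    return [(word[n - 1 - idx][0], word[n - 1 - idx][1] ^ neg) for idx, neg in vector]


def characteristic_vectors(perms):
    """Return (O, I), each MSB-first, for the n+1 permutation vectors of a MIN."""
    n = len(perms) - 1
    word = [(k, False) for k in range(n - 1, -1, -1)]   # (s_{n-1}, ..., s_0)
    lsb_at_stage = []
    for vec in perms[:n]:
        word = apply_symbolic(vec, word)
        lsb_at_stage.append(word[-1])        # digit entering the LSB position
    output = apply_symbolic(perms[n], word)

    o_vec = list(reversed(lsb_at_stage))     # (O_{n-1}, ..., O_0)
    stage_of = {src: (stage, neg) for stage, (src, neg) in enumerate(lsb_at_stage)}
    # Negative when exactly one of (output digit, digit at the LSB) is complemented.
    i_vec = [(stage_of[src][0], stage_of[src][1] ^ neg) for src, neg in output]
    return o_vec, i_vec


P = [parse(t) for t in ("2 -1 -0 3", "2 3 0 -1", "-0 1 3 -2", "2 -0 3 1", "1 3 0 2")]
O, I = characteristic_vectors(P)
print("O =", fmt(O), " I =", fmt(I))
```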
Possible Implementations of Network Simulation
Illustrative apparatus for simulating a network is shown in Figs. 11 and 12. As depicted in Fig. 11, the apparatus comprises a MIN 210, illustratively a perfect shuffle or Omega network, an input permutation matrix 220, and an output permutation matrix 225. Switching elements 230, inputs 240, outputs 250, stages 260 of MIN 210 and communication links 270 are the same as those of any one of the networks depicted in Figs. 1-6 and bear the same numbers incremented by 200. The input permutation matrix and the output permutation matrix advantageously are of identical construction. Each matrix has the capability of connecting any one of its inputs to any one of its outputs. The input matrix implements the fifth vector U and the output matrix implements the sixth vector V.
Illustratively, as shown in Fig. 12, each permutation matrix 220, 225, comprises an array of multiplexers 222, one for each output from the matrix. Each multiplexer has inputs from every one of the inputs to the matrix and a single output to a different one of the outputs of the matrix. Thus, for the specific embodiment shown in Figs. 11 and 12, each matrix comprises sixteen sixteen-to-one multiplexers 222. Each of the sixteen multiplexers has sixteen inputs, one from each of the inputs to the matrix, and a single output to a different one of the outputs of the matrix. One of the inputs of each of the multiplexers is selected for connection to the multiplexer output by means of four control lines. The signals on the four control lines to each of the sixteen multiplexers are generated by control logic (not shown) which computes for the vector U or V, as the case may be, which input is to be connected to which output and generates the control signals which cause the multiplexer connected to that output to select the appropriate input.
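One way the control logic might derive the four-bit select value for each multiplexer is sketched below. The direction of the mapping is an assumption made for illustration: matrix input x is taken to be connected to the matrix output whose address is obtained by applying the vector to the bits of x, so the multiplexer on output y selects the input x that maps to y. The source does not state this convention, and the names are hypothetical.

```python
def parse(text):
    """Parse '3 0 -2 -1' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def apply_bpc(vector, x, n):
    """Apply a BPC vector to the n-bit integer address x."""
    y = 0
    for list_pos, (idx, neg) in enumerate(vector):
        bit = ((x >> idx) & 1) ^ int(neg)
        y |= bit << (n - 1 - list_pos)
    return y


def mux_selects(vector, n):
    """selects[y] is the input chosen by the multiplexer feeding output y."""
    selects = [0] * (1 << n)
    for x in range(1 << n):
        selects[apply_bpc(vector, x, n)] = x
    return selects


U = parse("3 0 -2 -1")            # input-side vector from the Fig. 7 example
selects = mux_selects(U, 4)
for y, x in enumerate(selects):
    print(f"output {y:2d} <- input {x:2d}  (select lines {x:04b})")
```

Because a bit-permute-complement vector is a bijection on addresses, every multiplexer receives exactly one select value.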
As will be apparent to those skilled in the art, numerous alternatives may be employed in the practice of the invention. Other ways may be found to determine the O and I vectors and to calculate the U and V vectors.
Other Network Types
The techniques described herein may also be practiced on other types of networks such as the (2n-1) stage rearrangeable interconnection networks. These networks may be regarded as two back-to-back n stage networks in which one stage is shared in common. Characteristic vectors O and I may be defined for each of these two back-to-back networks following the techniques set forth above. Of particular interest, it can be determined that the (2n-1) stage interconnection network is rearrangeable if for the I vector of the input side of the two back-to-back networks and the O vector of the output side of the two networks I * O = [0, 1, . . . n-1].
Distributed Tag Routing
In some networks, the switching elements are controlled by tags which are routed through the network as part of a message instead of by control signals applied externally to the switching elements. The routing tags indicate which output (e.g. 0 or 1) of the individual crossbar switching elements is the intended destination of the message accompanied by that tag. As an n-bit routing tag passes through the network's n stages of crossbar switching elements, each switching element uses the tag bit corresponding to the stage in which it resides to determine its switch state. When each crossbar switching element has determined its own switch state, the desired routing of the network is complete.
If the pass-through state of the switching elements is defined to be state 0 and the exchange state is defined to be state 1, each switch can determine its own state (i.e., pass-through = 0; exchange = 1) by exclusive-oring the binary digit which identifies the input where the tag enters the switch (e.g., 0 or 1) with the appropriate tag bit.
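As a minimal sketch of this rule for a 2 x 2 switching element, the state is the exclusive-or of the input port number and the tag bit. The function name is hypothetical.

```python
def switch_state(input_port: int, tag_bit: int) -> int:
    """Return 0 (pass-through) or 1 (exchange) for a 2 x 2 switching element."""
    return input_port ^ tag_bit


# Enumerate all four cases: the message leaves on the output named by the tag bit.
table = {(port, bit): switch_state(port, bit) for port in (0, 1) for bit in (0, 1)}
for (port, bit), state in table.items():
    print(f"enters on input {port}, tag bit {bit} -> state {state}")
```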
For the prior art Delta, Omega, Baseline, and Modified Data Manipulator networks depicted in Figs. 1, 2, 3, and 6, respectively, the tag is ordinarily the binary address of the intended destination (assuming the most significant tag bit controls the leftmost network stage, the second most significant tag bit controls the second leftmost stage, and so on).
In accordance with the present invention, however, the routing tag for either the simulated or the simulating bit-permute-complement MIN is determined by permuting the binary address of the destination in accordance with the vector specified by $I^{-1}$ of the network being simulated. Thus the tag for any bit-permute-complement MIN is calculated in accordance with the present invention by the equation $\mathrm{Tag} = D * I_2^{-1}$ where D is the binary representation of the destination address and $I_2^{-1}$ is the inverse of the characteristic vector I of the simulated network.
Distributed Switch State Routing
Alternatively, distributed switch state routing may be used to control the switch states for each switching element in the network. If distributed switch state routing is used, switch states are calculated without the routing tag actually passing through the stages of the network. Rather, a binary representation of the switch states passes through the network stages and each switching element is thereby not required to determine its own respective switch state as in the distributed tag routing method described above.
In the prior art Omega network depicted in Fig. 2, the switch states may be determined for any source to destination connection by exclusive-oring the n-bit routing tag described above with the n-bit binary source address. Since for the Omega network a routing tag is simply the binary address of the intended destination as described above, the switch states to implement a particular source to destination link in an Omega network may be calculated by the equation
Switch States = S XOR D
where S and D are the binary representations of the source and destination addresses, respectively, and XOR denotes a bit by bit exclusive-or function.
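For the Omega network the equation reduces to a single integer exclusive-or. The sketch below assumes, as stated earlier for the prior art networks, that the most significant bit controls the leftmost stage; the function name is hypothetical.

```python
def omega_switch_states(source: int, destination: int, n: int) -> list[int]:
    """Switch states for one connection, listed from the leftmost stage onward."""
    word = source ^ destination
    return [(word >> (n - 1 - stage)) & 1 for stage in range(n)]


# Source 13 = (1 1 0 1) to destination 11 = (1 0 1 1) in a 16 x 16 Omega network.
states = omega_switch_states(13, 11, 4)
print(states)
```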
In accordance with the present invention, the switch states for any bit-permute-complement MIN may be similarly determined without the tag actually passing through the stages of the network. The switch states are calculated by exclusive-oring the n-bit destination tag with the n-bit source address permuted in accordance with the characteristic vector O of the network being simulated. It has already been shown that the tags for a bit-permute-complement MIN are calculated in accordance with the present invention by the equation $\mathrm{Tag} = D * I_2^{-1}$.
The switch states for these networks are calculated in accordance with the present invention from the equation
Switch States = $(S * O_2)$ XOR (Tag)
= $(S * O_2)$ XOR $(D * I_2^{-1})$
where S and D are the binary representations of the source and destination addresses, respectively, $O_2$ and $I_2$ are the characteristic vectors O and I, respectively, of the network being simulated, and XOR denotes the bit by bit exclusive-or function.
In certain situations when using either of the two distributed routing methods discussed above, a conflict may arise in which the desired combination of source to destination connections requires one or more crossbar switching elements to be simultaneously set to two different states. In the event of such a conflict, one or more source to destination connections will be unable to be routed at that time and must be attempted again later.
A distributed tag routing example and a distributed switch state routing example are now presented. To calculate the destination tags for either of the networks depicted in Figs. 7 and 13, the binary destination address must be permuted by the inverse of the characteristic I vector of the simulated network according to the equation
$\mathrm{Tag} = D * I_2^{-1}$.
Since $I_2^{-1} = (-1, 0, 3, -2)^{-1} = (1, -0, -3, 2)$,
$\mathrm{Tag} = (d_1, \bar{d}_0, \bar{d}_3, d_2)$.
More specifically, if destination address D = (1 0 1 1) (decimal equivalent 11) is the destination of a communication originating from any arbitrary source address, the destination tag for that communication is determined by
Tag = (t3, t2, t1, t0) = $D * I_2^{-1}$
= (1 0 1 1) * (1, -0, -3, 2)
= (1 0 0 0).
Thus for stage 3 (the rightmost stage) of either of the networks depicted in Figs. 7 and 13, the tag bit is a binary 1 and for stages 2, 1 and 0 of the network, the tag bit is a binary 0. When using distributed tag routing and beginning at any arbitrary input of either network, if the crossbar switching elements in stages 0, 1 and 2 route their respective inputs to their 0 (i.e., upper) outputs and the switch in stage 3 routes its input to its 1 (i.e., lower) output, destination address 11 (decimal) will be reached at the network's output. If this distributed tag method of routing is used, each crossbar switching element must determine its own switch state (e.g., pass-through state or exchange state) as a function of the tag bit designated for use by that switch.
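The tag calculation of this example can be checked with a short Python sketch. It assumes the same reading of the * operation used for the permutation vectors (each entry names the bit of D to take, and a minus sign complements it); the helper names are hypothetical.

```python
def parse(text):
    """Parse '1 -0 -3 2' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def apply_bpc(vector, bits):
    """Permute and complement an MSB-first list of bits."""
    n = len(bits)
    return [bits[n - 1 - idx] ^ int(neg) for idx, neg in vector]


I_INV = parse("1 -0 -3 2")      # inverse of the I vector of the Fig. 7 network
D = [1, 0, 1, 1]                # destination 11 as (d3, d2, d1, d0)

tag = apply_bpc(I_INV, D)       # (t3, t2, t1, t0)
for stage in range(4):
    print(f"stage {stage}: tag bit {tag[3 - stage]}")
```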
Alternatively, if distributed switch state routing is used, the switch states for either of the networks depicted in Figs. 7 and 13 are calculated without the tag passing through the n stages of the network and the individual crossbar switching elements do not have to determine their own switch states. For any particular pair of source and destination addresses that are to be linked together, it has already been shown that the switch states to implement that connection are calculated in accordance with the distributed switch state routing method by the equation
Switch States = $(S * O_2)$ XOR (Tag)
= $(S * O_2)$ XOR $(D * I_2^{-1})$.
Specifically, for the networks in Figs. 7 and 13 for which $O_2$ = (-1, -2, 0, 3), if source address S = (1 1 0 1) (decimal equivalent 13) is to be connected to destination address D = (1 0 1 1) (decimal equivalent 11), the switch states are calculated by
Switch States = (ss3, ss2, ss1, ss0)
= $(S * O_2)$ XOR (Tag)
= ((1 1 0 1) * (-1, -2, 0, 3)) XOR (1 0 0 0)
= (1 0 1 1) XOR (1 0 0 0)
= (0 0 1 1).
Thus for either of the networks depicted in Figs. 7 and 13, the switch in stage 0 (the leftmost stage) through which the communication between source address 13 and destination address 11 passes should be set to its exchange state (designated by a binary 1). Similarly, the switch through which this communication passes in stage 1 should be set to its exchange state (designated by a binary 1) and the switches through which the communication passes in stages 2 and 3 should both be set to their pass-through states (designated by a binary 0).
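The switch state calculation for source 13 and destination 11 can be reproduced with the sketch below, which evaluates (S * O2) XOR (D * I2^-1) bit by bit. The helper names are hypothetical and the reading of the * operation is the one assumed for the permutation vectors.

```python
def parse(text):
    """Parse '-1 -2 0 3' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def apply_bpc(vector, bits):
    """Permute and complement an MSB-first list of bits."""
    n = len(bits)
    return [bits[n - 1 - idx] ^ int(neg) for idx, neg in vector]


O2 = parse("-1 -2 0 3")
I2_INV = parse("1 -0 -3 2")
S = [1, 1, 0, 1]                                  # source 13
D = [1, 0, 1, 1]                                  # destination 11

permuted_source = apply_bpc(O2, S)                # S * O2
tag = apply_bpc(I2_INV, D)                        # D * I2^-1
states = [a ^ b for a, b in zip(permuted_source, tag)]   # (ss3, ss2, ss1, ss0)

by_stage = {stage: states[3 - stage] for stage in range(4)}
print(states, by_stage)
```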
The distributed routing of source address 13 to destination address 11 by the network of Fig. 7 is depicted in Fig. 15. The network switch settings for this source to destination connection are identical regardless of whether the individual crossbar switching elements determine their own switch states (i.e., distributed tag routing) or whether those switch states are determined without the tag actually passing through the stages of the networks (i.e., distributed switch state routing).
Distributed routing may be used to implement connections between any arbitrary pair of source and destination addresses. An overall network may be controlled with distributed routing by using a distributed routing method to implement each of the individual source to destination connections.
Centralized Routing
In some networks, the settings of the individual switching elements are determined by a process external to the network itself. This type of routing method in which control signals are externally generated for each of the network's crossbar switching elements is known as centralized or global routing. If centralized or global routing is used, it is not necessary for any tags or switch state information to accompany the data being communicated through the network.
In accordance with the invention, any bit-permute-complement MIN may be globally routed by a method in which the permutation vectors P0, P1, ... Pn, and the inverse of the characteristic I vector are used to determine the states of the network's switching elements. It should be noted that a MIN's characteristic O and I vectors are determined from that MIN's permutation vectors.
For any particular pair of source and destination addresses that are to be linked together, the binary source address is first permuted according to P0. Bit 0 of the binary source address after this permutation by P0 is then exclusive-ored (XORed) with and subsequently replaced by bit 0 of the binary destination address permuted by the inverse of the characteristic I vector. The n-1 most significant bits of the remaining n-bit binary number determine the number of the switch to be set in stage 0 in order to implement the desired connection.
The result of the exclusive-or operation determines the state of that switch if it is assumed that the pass-through state of the switching element is defined to be state 0 and the exchange state defined to be state 1.
Similarly, to determine the states of the switches in stage 1, the n-bit binary number that remains after the above operations is then permuted according to P1. Bit 0 of the n-bit binary number that remains after this permutation by P1 is exclusive-ored with and subsequently replaced by bit 1 of the binary destination address permuted by $I^{-1}$. The n-1 most significant bits of this newly remaining n-bit binary number determine the number of the switch to be set in stage 1 in order to implement the desired connection. The result of the most recent exclusive-or operation determines the state of that switch.
In order to route the entire network, this process is continued for all of the remaining network stages and for all desired source address to destination address connections. In certain situations, a routing conflict may arise in which the desired combination of source to destination connections requires one or more crossbar switching elements to be simultaneously set to two different states. In the event of such a conflict, one or more source to destination connections will be unable to be routed at that time and must be attempted again later.
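The stage-by-stage procedure described above, together with detection of routing conflicts, can be sketched as follows. This is an illustrative reading of the text rather than the claimed apparatus: vectors are written most significant position first, states are 0 for pass-through and 1 for exchange, and the function and variable names are hypothetical. The vectors used are those of the network of Fig. 7.

```python
def parse(text):
    """Parse '2 -1 -0 3' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def apply_bpc(vector, bits):
    """Permute and complement an MSB-first list of bits."""
    n = len(bits)
    return [bits[n - 1 - idx] ^ int(neg) for idx, neg in vector]


def to_bits(x, n):
    return [(x >> k) & 1 for k in range(n - 1, -1, -1)]


def from_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def route(perms, i_inv, src, dst):
    """Return ([(stage, switch, state), ...], output address reached)."""
    n = len(perms) - 1
    word = to_bits(src, n)
    d_prime = apply_bpc(i_inv, to_bits(dst, n))   # destination permuted by I^-1
    settings = []
    for stage in range(n):
        word = apply_bpc(perms[stage], word)
        tag_bit = d_prime[n - 1 - stage]          # bit `stage` of D'
        state = word[-1] ^ tag_bit                # XOR with bit 0 of the word
        word[-1] = tag_bit                        # then replace bit 0
        settings.append((stage, from_bits(word[:-1]), state))
    return settings, from_bits(apply_bpc(perms[n], word))


def route_all(perms, i_inv, pairs):
    """Route several connections; report switches asked for two different states."""
    chosen, conflicts = {}, set()
    for src, dst in pairs:
        for stage, switch, state in route(perms, i_inv, src, dst)[0]:
            if chosen.setdefault((stage, switch), state) != state:
                conflicts.add((stage, switch))
    return chosen, conflicts


P = [parse(t) for t in ("2 -1 -0 3", "2 3 0 -1", "-0 1 3 -2", "2 -0 3 1", "1 3 0 2")]
I_INV = parse("1 -0 -3 2")

settings, reached = route(P, I_INV, 13, 11)
_, conflicts = route_all(P, I_INV, [(13, 11), (5, 3)])
print(settings, reached, sorted(conflicts))
```

In the second call, sources 13 and 5 enter the same stage 0 switching element and both need its upper output, so that element is reported as a conflict.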
An example of global routing for the network depicted in Fig. 7 is now presented. For this network, N=16, n=4 and it has already been shown that the permutation vectors are given by
P0 = ( 2, -1, -0, 3)
P1 = ( 2, 3, 0, -1)
P2 = (-0, 1, 3, -2)
P3 = ( 2, -0, 3, 1)
P4 = ( 1, 3, 0, 2)
Likewise, the characteristic vector I and its inverse have been shown to be
I = (-1, 0, 3, -2) and
$I^{-1}$ = ( 1, -0, -3, 2), respectively.
The binary source address is defined to be S = (s3, s2, s1, s0). If the binary destination address is defined to be D = (d3, d2, d1, d0), then the binary destination address D permuted by the inverse of the characteristic I vector is given by
$D' = D * I^{-1} = (d_1, \bar{d}_0, \bar{d}_3, d_2)$.
D' is calculated in the same way that the destination tag would be calculated if distributed routing were being used to implement the desired source to destination connections.
To globally route the network depicted in Fig. 7, for each stage i = 0, 1, 2 and 3 of that network, the binary source address for that stage is permuted by the appropriate permutation vector Pi and bit 0 of the result is exclusive-ored with and subsequently replaced by bit i of D'. The result determines the switch states for each of the network's n stages. The steps to route the network of Fig. 7 are set forth in Table 2. For this network, $D' = D * I^{-1} = (d_1, \bar{d}_0, \bar{d}_3, d_2)$ and the crossbar switching elements of each stage are numbered consecutively from 0 through 7 with switch 0 defined to be the switching element located at the top of each stage.
[Table 2: steps to route the network of Fig. 7]
Thus to implement an arbitrary source address to destination address connection, crossbar switching element $(s_2\ \bar{s}_1\ \bar{s}_0)$ in stage 0 is set to state $(s_3 \text{ XOR } d_2)$. Similarly, switch $(\bar{s}_1\ s_2\ d_2)$ in stage 1 is set to state $(s_0 \text{ XOR } \bar{d}_3)$, switch $(d_3\ d_2\ \bar{s}_1)$ in stage 2 is set to state $(\bar{s}_2 \text{ XOR } \bar{d}_0)$, and switch $(d_2\ d_0\ d_3)$ in stage 3 is set to state $(\bar{s}_1 \text{ XOR } d_1)$.
For example, if in the network depicted in Figure 7, source address S = (1 1 0 1) (decimal equivalent 13) is to be connected to destination address D = (1 0 1 1) (decimal equivalent 11), the switch states may be calculated using global routing as presented below.
Stage 0: Switching element $(s_2\ \bar{s}_1\ \bar{s}_0)$ is set to state $(s_3 \text{ XOR } d_2)$, which is to say, switching element (1 1 0) (decimal equivalent 6) is set to state (1 XOR 0) = 1 (i.e., exchange state).
Stage 1: Switching element $(\bar{s}_1\ s_2\ d_2)$ is set to state $(s_0 \text{ XOR } \bar{d}_3)$, which is to say, switching element (1 1 0) (decimal equivalent 6) is set to state (1 XOR 0) = 1 (i.e., exchange state).
Stage 2: Switching element $(d_3\ d_2\ \bar{s}_1)$ is set to state $(\bar{s}_2 \text{ XOR } \bar{d}_0)$, which is to say, switching element (1 0 1) (decimal equivalent 5) is set to state (0 XOR 0) = 0 (i.e., pass-through state).
Stage 3: Switching element $(d_2\ d_0\ d_3)$ is set to state $(\bar{s}_1 \text{ XOR } d_1)$, which is to say, switching element (0 1 1) (decimal equivalent 3) is set to state (1 XOR 1) = 0 (i.e., pass-through state).
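The four stage settings worked out above can be compared with the distributed switch state equation, Switch States = (S * O2) XOR (D * I2^-1), for every source and destination pair of the network of Fig. 7. The sketch below is an illustrative check under the vector conventions assumed throughout (most significant position first, minus sign for complement); the function names are hypothetical.

```python
def parse(text):
    """Parse '2 -1 -0 3' into MSB-first (index, complemented) pairs."""
    return [(int(tok.lstrip("-")), tok.startswith("-")) for tok in text.split()]


def apply_bpc(vector, bits):
    """Permute and complement an MSB-first list of bits."""
    n = len(bits)
    return [bits[n - 1 - idx] ^ int(neg) for idx, neg in vector]


def to_bits(x, n):
    return [(x >> k) & 1 for k in range(n - 1, -1, -1)]


P = [parse(t) for t in ("2 -1 -0 3", "2 3 0 -1", "-0 1 3 -2", "2 -0 3 1", "1 3 0 2")]
O = parse("-1 -2 0 3")
I_INV = parse("1 -0 -3 2")


def central_states(src, dst, n=4):
    """Switch states indexed by stage, computed stage by stage from the Pi."""
    word = to_bits(src, n)
    d_prime = apply_bpc(I_INV, to_bits(dst, n))
    states = []
    for stage in range(n):
        word = apply_bpc(P[stage], word)
        tag_bit = d_prime[n - 1 - stage]
        states.append(word[-1] ^ tag_bit)
        word[-1] = tag_bit
    return states


def distributed_states(src, dst, n=4):
    """Switch states indexed by stage, computed as (S * O) XOR (D * I^-1)."""
    permuted_source = apply_bpc(O, to_bits(src, n))
    tag = apply_bpc(I_INV, to_bits(dst, n))
    msb_first = [a ^ b for a, b in zip(permuted_source, tag)]   # (ss3, ..., ss0)
    return msb_first[::-1]                                      # index = stage


agree = all(
    central_states(s, d) == distributed_states(s, d)
    for s in range(16)
    for d in range(16)
)
print(central_states(13, 11), agree)
```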
To summarize, if the crossbar switching elements of the network in Fig. 7 are set so that switch 6 of stage 0 is set to state 1, switch 6 of stage 1 is set to state 1, switch 5 of stage 2 is set to state 0, and switch 3 of stage 3 is set to state 0, source address 13 will be connected to destination address 11. In order to use centralized routing to completely route the network of Fig. 7, similar calculations must be made for all desired source to destination connections. The centralized routing of source address 13 to destination address 11 by the network of Fig. 7 is depicted in Fig. 15. It has been shown that Fig. 15 also depicts the distributed routing of source address 13 to destination address 11. For the network of this example and for any BPC MIN, the crossbar switching element settings to implement a source to destination connection are identical regardless of whether distributed or centralized routing of the network is used. Fig. 15 does not show the control circuitry which generates the signals that configure the crossbar switching elements to implement the desired connection between the source addresses 40 and destination addresses 50.
Partitioning of Bit-Permute-Complement Type MINs
The present invention provides a method and apparatus for partitioning any BPC type MIN into multiple disjoint BPC type MINs. Furthermore, the resulting partition and multiple disjoint BPC type MINs may be completely and uniquely specified by the characteristic O and I vectors of the original BPC MIN being partitioned. The resulting multiple disjoint BPC MINs are considered to be disjoint because they do not have any switching elements in common. Since the multiple disjoint networks that result from partitioning are themselves BPC MINs, the partitioning method and apparatus claimed herein may therefore be applied recursively to the subnetworks that result from the partitioning of a large network.
Specifically, a BPC MIN can be partitioned into multiple disjoint subnetworks by setting all of the switching elements in one or more stages of the MIN being partitioned to either a pass-through state (i.e., state 0) or an exchange state (i.e., state 1). When partitioning a BPC MIN into multiple disjoint BPC MINs, every switching element in a stage being set to state 0 or state 1 must be forced to the same state. If the switching elements in more than one stage are set to some designated state in order to implement a particular partition, the switching elements in one stage need not be set to the same state as the switching elements in another stage.
A BPC MIN may be partitioned in many different ways to yield many different combinations of smaller disjoint BPC MINs. More specifically, for an arbitrary BPC MIN, a different partition (i.e., a different combination of smaller disjoint BPC MINs) will result for each stage or combination of stages that are selected to be set to a particular state. Moreover, for any given selection of stages that are to be set to a particular state, each combination of states to which those selected stages may be set yields a different partition.
Once a BPC MIN has been partitioned into multiple disjoint subnetworks, it is possible to use the characteristic O and I vectors of the original BPC MIN to specify exactly which source and destination addresses of the original BPC MIN will be assigned to each of the subnetworks produced by the partition. That is to say, for a given BPC MIN with characteristic vectors O and I, it is possible to specify a particular partition according to which source and destination addresses of the original BPC MIN will be assigned to each of the smaller disjoint networks that result from the partition.
Although n stage BPC MINs may, in general, be built with a x b crossbar switching elements and have $a^n$ source addresses and $b^n$ destination addresses, BPC MINs are commonly built using 2 x 2 crossbar switches and commonly have $2^n$ source and $2^n$ destination addresses. A BPC MIN of arbitrary size which employs 2 x 2 crossbar switching elements may be partitioned into multiple smaller disjoint BPC MINs according to the method and apparatus presented herein.
The method and apparatus of this invention may be used to determine the disjoint subnetworks that are formed when the switching elements of certain stages of a BPC MIN are set to certain arbitrary states. A subnetwork is typically specified by which source and destination
addresses of the original BPC MIN compose that subnetwork.
Alternatively, the method and apparatus of the present invention may be also used to determine to what state certain specific stages should be set in order to implement a partition that will yield a desired subnetwork or set of subnetworks. Both applications of the method and apparatus of the present invention are presented below.
Determination of Subnetworks from User
Specified Switching Element Settings
An n stage BPC network can be partitioned into 2^i subnetworks of size 2^(n-i) x 2^(n-i) by setting the switching elements of any arbitrary i stages of that network to either the pass-through state (i.e., state 0) or the exchange state (i.e., state 1). That is to say, 2^i disjoint (n-i) stage networks are formed when all of the switching elements in i stages of an n stage BPC MIN are set to either state 0 or 1.
The present invention provides a method and apparatus to determine the subnetworks that are formed when the crossbar switching elements of certain stages of a BPC MIN are set by the user to either state 0 or 1.
If any one stage of the 16 x 16 4 stage network depicted in Fig. 7 is forced to a 0 or a 1 state, for example, two disjoint 8 x 8 3 stage subnetworks are
produced. Similarly, if two or three stages of the network depicted in Fig. 7 are forced to a 0 or a 1 state, four disjoint 4 x 4 subnetworks or eight disjoint 2 x 2
subnetworks, respectively, are formed.
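The counts quoted above follow directly from the 2^i and 2^(n-i) expressions. The following is a minimal sketch, not part of the disclosed apparatus; the function name partition_shape is hypothetical and chosen only for illustration.

```python
def partition_shape(n, i):
    """Return (number of subnetworks, inputs per subnetwork) when all
    switching elements in i of the n stages are forced to state 0 or 1."""
    if not 0 <= i <= n:
        raise ValueError("i must satisfy 0 <= i <= n")
    return 2 ** i, 2 ** (n - i)


# The 16 x 16, 4 stage network of Fig. 7 with one, two or three forced stages.
for i in (1, 2, 3):
    count, size = partition_shape(4, i)
    print(f"i={i}: {count} disjoint {size} x {size} subnetworks")
```

For n = 4 this reproduces the two 8 x 8, four 4 x 4 and eight 2 x 2 subnetworks stated in the text.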
The multiple subnetworks that are formed by a particular partition may be specifically determined by the definitions and relations set forth below and in the flowchart depicted in Figs. 16a, 16b and 16c. The set of all i stages which are to be set to either state 0 or state
1 (Box 140) is defined by set K where
K = {w1, w2, ..., wi} and where
K SUB {0, 1, ..., n-1} where SUB denotes "is a subset of" (Box 141)
For any stage wj where wj ELT K (ELT denotes "is an element of") that is set to state 0 or state 1 to implement a partition, the specific state to which all of the switching elements in that stage are set is defined to be t_wj; t_wj is therefore equal to either 0 or 1 for all wj (Box 144).
Given the above definitions, if i stages of an n stage BPC MIN are set to either state 0 or 1 (i.e., state t_wj), 2^i disjoint (n-i) stage subnetworks are formed.
Furthermore, for each subnetwork formed, i bits of the source addresses of that subnetwork are always identical. For any subnetwork, these bits are defined by set E (Box 142) where
Figure imgf000043_0001
where O is the characteristic O vector of the original BPC MIN being partitioned, S is the binary representation of that network's source addresses, [S * O]_wj denotes the wjth bit of the binary source address permuted by O, and ELT denotes "is an element of".
Similarly, for each subnetwork formed, i bits of the destination addresses of that subnetwork are also always identical and are defined by set F (Box 143). Set F, which denotes a particular subnetwork's i identical destination address bits, is given by
Figure imgf000043_0003
where I represents the original BPC MIN's characteristic I vector, D is the binary representation of the destination address of that network, and ELT denotes "is an element of".
Furthermore, there is a definitive relationship between the destination addresses of each subnetwork and the source addresses of that same subnetwork. A subnetwork's i identical destination address
bits together with the relationship between that
subnetwork's source and destination addresses is given by set G (Box 145) where set G is a set of equalities and where
Figure imgf000044_0001
where O and I represent the original BPC MIN's characteristic O and I vectors, respectively, S and D are the binary representations of the source and destination addresses, respectively, of that network, XOR denotes the bit by bit exclusive-or function, ELT denotes "is an element of", and t_wj denotes the state to which the switching elements of stage wj have been set in order to implement the desired partition (Box 144).
A BPC MIN partitioning example is now presented.
This example illustrates how to determine what subnetworks are formed when the apparatus of the present invention is used to implement a particular partition. For either of the two equivalent 4 stage MINs depicted in Figs. 7 and 13, if the switching elements of any two stages of either of those networks are set to either a 0 or a 1 state, four disjoint 4 x 4 subnetworks are formed. Which particular four
subnetworks are formed is a function of which two stages of the original 4 stage MIN are set to state 0 or 1 and of whether those two stages are set to state 0 or to state 1.
For this particular partitioning example, the switching elements of arbitrary stages 0 and 1 are
arbitrarily chosen to be set to state 0 and 1, respectively.
Therefore, for this example,
O = (-1, -2, 0, 3) ;
I^-1 = (-1, 0, 3, -2)^-1 = (1, -0, -3, 2);
n = 4;
i = 2;
K = {w1, w2} = {0, 1};
t0 = 0 ; and
t1 = 1.
Since i = 2 and n = 4 for this example, this particular partition yields 2^i = 2^2 = 4 disjoint subnetworks, each of size 2^(n-i) x 2^(n-i) = 2^(4-2) x 2^(4-2) = 4 x 4.
It has already been shown that, for each of the four subnetworks formed by this partition, the i identical bits of any subnetwork's source addresses are denoted by set
E where
Figure imgf000045_0001
Substituting in the specific values of O and K for this example yields
Figure imgf000045_0002
Set E's elements {s3, s0} indicate that, within each of the four subnetworks formed by this partition, bits s3 and s0 take the same values in every binary source address of that subnetwork. The four possible combinations of values of the two bits s3 and s0 correspond to the four subnetworks formed by this partition. More specifically, if c is defined to be a binary variable which can equal either 0 or 1, the source addresses of the four subnetworks formed by setting the switching elements of stages 0 and 1 to state 0 and 1, respectively, are
Subnetwork    Source Addresses {(s3, c, c, s0)}
1             {(0, c, c, 0)} (binary) = {0, 2, 4, 6} (decimal);
2             {(0, c, c, 1)} (binary) = {1, 3, 5, 7} (decimal);
3             {(1, c, c, 0)} (binary) = {8, 10, 12, 14} (decimal); and
4             {(1, c, c, 1)} (binary) = {9, 11, 13, 15} (decimal).
The first of the four subnetworks formed by this partition has source addresses equal to {(0, c, c,
0)} (binary) = {0, 2, 4, 6} (decimal); the second of the four subnetworks has source addresses equal to {(0, c, c,
1)} (binary) = {1, 3, 5, 7} (decimal); the third of the four subnetworks has source addresses equal to {(1, c, c,
0)} (binary) = {8, 10, 12, 14} (decimal) and the fourth of the four subnetworks has source addresses equal to {(1, c, c, 1)} (binary) = {9, 11, 13, 15} (decimal).
To summarize, by setting the switching elements of stages 0 and 1 of the BPC MIN depicted in Figs. 7 or 13 to state 0 and 1, respectively, four 4 x 4 disjoint
subnetworks are created with source addresses specified in decimal form as follows:
Subnetwork 1: 4 source addresses are {0, 2, 4, 6};
Subnetwork 2: 4 source addresses are {1, 3, 5, 7};
Subnetwork 3: 4 source addresses are {8, 10, 12, 14};
Subnetwork 4: 4 source addresses are {9, 11, 13, 15}.
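The grouping of source addresses by their identical bits can be checked mechanically. The sketch below is illustrative only; the helper name group_by_fixed_bits is hypothetical, and bit positions are numbered so that s0 is the least significant bit, as in the address tuples (s3, s2, s1, s0) used in this example.

```python
def group_by_fixed_bits(n, fixed_bits):
    """Group all n-bit addresses by the values of the bits in fixed_bits.

    fixed_bits lists bit positions, e.g. (3, 0) for (s3, s0). The result maps
    each tuple of bit values to the sorted list of addresses that share it.
    """
    groups = {}
    for addr in range(2 ** n):
        key = tuple((addr >> b) & 1 for b in fixed_bits)
        groups.setdefault(key, []).append(addr)
    return groups


# E = {s3, s0} for the 16 x 16 network: four groups of four source addresses.
source_groups = group_by_fixed_bits(4, (3, 0))
for (s3, s0), addrs in sorted(source_groups.items()):
    print(f"s3={s3}, s0={s0}: {addrs}")
```

Each of the four (s3, s0) combinations yields one of the four source address sets listed above.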
The corresponding destination addresses for each of these four subnetworks may now be determined. It has already been shown that for each subnetwork formed, the i identical destination address bits for that subnetwork and the relationship between those i bits and the i identical source address bits for that same subnetwork are given by set G where set G is a set of equalities and where
Figure imgf000047_0001
Substituting in the specific values of I^-1, O, and K for this example yields
Figure imgf000047_0002
Figure imgf000048_0001
Set G gives the i = 2 identical destination address bits and the relationship between those bits and the i = 2 identical source address bits for each of the four subnetworks formed by this partition. That is to say, set G specifies the relationship between the source addresses and the destination addresses that compose each of the four subnetworks formed by the partition in which all of the switching elements in stages 0 and 1 are set to state 0 and 1, respectively.
Specifically, for any of the four subnetworks formed by this particular partition, bit d3 and bit d2 of any binary destination address of that subnetwork will be equal to bit s0 and bit s3, respectively, of any binary source address of that same subnetwork. Therefore, the source and destination addresses of the four subnetworks are denoted by
Source Address = {(s3, c, c, s0)} (binary)
Destination Address = {(d3, d2, c, c)} (binary) where d3 = s0 and d2 = s3 for all source and destination addresses of any single subnetwork. The source and
destination addresses composing the four subnetworks of this partition are therefore as follows:
               4 Source Addresses                                    4 Destination Addresses
Subnetwork 1:  {(0, c, c, 0)} (binary) = {0, 2, 4, 6} (decimal)      {(0, 0, c, c)} (binary) = {0, 1, 2, 3} (decimal)
Subnetwork 2:  {(0, c, c, 1)} (binary) = {1, 3, 5, 7} (decimal)      {(1, 0, c, c)} (binary) = {8, 9, 10, 11} (decimal)
Subnetwork 3:  {(1, c, c, 0)} (binary) = {8, 10, 12, 14} (decimal)   {(0, 1, c, c)} (binary) = {4, 5, 6, 7} (decimal)
Subnetwork 4:  {(1, c, c, 1)} (binary) = {9, 11, 13, 15} (decimal)   {(1, 1, c, c)} (binary) = {12, 13, 14, 15} (decimal)
The above listed four sets of source and
destination addresses completely and uniquely define each of the four subnetworks formed by the partition set forth in this example. That is because, in general, the subnetworks formed by a partition definitively specify that partition and, conversely, any partition definitively specifies the subnetworks created by that partition.
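The source and destination sets of this example can be reproduced from the two equalities stated for it, d3 = s0 and d2 = s3. The following sketch hard-codes those two equalities for this one partition (stages 0 and 1 set to states 0 and 1); it does not implement the general set G relation, and the function name example_subnetworks is hypothetical.

```python
def example_subnetworks():
    """Return [(sources, destinations), ...] for the partition in which
    stages 0 and 1 of the 16 x 16 network are set to states 0 and 1."""
    result = []
    for s3 in (0, 1):
        for s0 in (0, 1):
            # Source addresses share bits s3 (bit 3) and s0 (bit 0).
            sources = [a for a in range(16)
                       if ((a >> 3) & 1) == s3 and (a & 1) == s0]
            # Equalities stated for this example: d3 = s0 and d2 = s3.
            d3, d2 = s0, s3
            dests = [a for a in range(16)
                     if ((a >> 3) & 1) == d3 and ((a >> 2) & 1) == d2]
            result.append((sources, dests))
    return result


for number, (src, dst) in enumerate(example_subnetworks(), start=1):
    print(f"Subnetwork {number}: sources {src} -> destinations {dst}")
```

The four pairs produced are disjoint and together cover all 16 source and all 16 destination addresses, which is what makes the result a partition.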
Fig. 17 depicts the network of Fig. 7 partitioned into four 4 x 4 subnetworks in accordance with the above example. The four 4 x 4 subnetworks have each been
highlighted for clarity. The four sets 80 of source
addresses 40 and four sets 85 of destination addresses 50 which compose each of the four 4 x 4 subnetworks are also depicted in Fig. 17.
Determination of Switching Element Settings
From User Specified Subnetworks
The previous partitioning presentation and example illustrate how to determine the subnetworks which are formed when the switching elements of certain stages of a BPC MIN are set to state 0 or 1. Conversely, the BPC MIN partitioning method and apparatus of the present invention can also be used to determine and implement, for an
arbitrary BPC MIN, the switch settings that will produce a specific desired partition which will in turn yield a desired subnetwork. That is to say, the method and
apparatus of the present invention can be used to partition an arbitrary BPC MIN into multiple disjoint networks whose size and source and destination addresses can be specified by the user.
It has already been shown that an arbitrary n stage BPC MIN can be partitioned into 2^i disjoint networks of size 2^(n-i) x 2^(n-i) by setting all of the switching elements in i stages of that network to either state 0 or state 1. A user specified partition that yields subnetworks of desired size and which are composed of desired source and
destination addresses can be created by properly selecting the BPC MIN's i stages which are to be set to a particular state and by then properly selecting the states to which the switching elements of those i stages should be set.
If it is desired that 2^i disjoint (n-i) stage subnetworks of size 2^(n-i) x 2^(n-i) are to be generated from a single 2^n x 2^n n stage network, then the switching elements of exactly i stages of the original n stage BPC MIN must be set to either state 0 or 1. The determination and implementation of which stages are to be set to a particular state and the
determination of each of those states are functions of the desired source and destination addresses of the subnetworks that will result from the partition. The determination and implementation of these values is presented below and in the flowchart depicted in Figs. 18a, 18b, and 18c.
When a 2^n x 2^n n stage network is partitioned into 2^i disjoint (n-i) stage subnetworks of size 2^(n-i) x 2^(n-i) (Box 150), it has been shown that i bits of the source addresses of each of the resulting subnetworks will always be
identical. When determining the switch settings to
implement a desired partition that will in turn yield a desired subnetwork, the desired subnetwork is specified by the source and destination addresses that compose that subnetwork.
More specifically, the user of the present invention specifies the source and destination addresses that compose the desired subnetwork or group of subnetworks that will result from the partition by specifying the i identical source and destination address bits for those subnetworks (Box 151). The set of all i source address bits which are identical defines the source addresses for the desired subnetwork or group of subnetworks and is given by set E where
Figure imgf000052_0001
Set K of all i stages which must be set to a particular state to implement the desired subnetworks with desired source addresses as specified by set E can be determined from set E and from the characteristic O vector of the original n stage BPC MIN being partitioned (Box 152). Set K has been previously defined to be the set of all i stages which are to be set to either state 0 or 1 in order to implement the desired partition where
K = {w1, w2, . . . wi} and where
K SUB {0, 1, . . ., n-1}
Set K can be determined from set E and from the original n stage BPC MIN's characteristic O vector by using E and O to find all i stages wj that must be set to a 0 or 1 state so that the desired i source address bits are identical as specified by set E. Set K = {w1, w2 , ... , wi} can be
determined by solving the relation
Figure imgf000053_0001
for all possible wj where ABS denotes "the absolute value of".
Once K = {w1, w2, ..., wi} has been determined, the i stages that must be set to either state 0 or 1 to implement the subnetwork with source addresses specified by set E are thereby specified. Since each of the i stages wj which are denoted by set K may be set to one of two possible different states (i.e., state 0 or state 1), there are therefore a total of 2^i possible combinations of states to which these i stages may be set.
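The determination of set K from set E and the O vector can be sketched as follows. The relation itself appears only as a figure in this text, so the sketch rests on two assumptions drawn from the surrounding description: that stage wj belongs to K when ABS(O_wj) equals the index of one of the identical source address bits, and that a vector such as O = (-1, -2, 0, 3) is written with the entry for stage n-1 first. The function name stages_for_fixed_bits is hypothetical. Plain integers cannot distinguish -0 from 0, which does not matter here because only absolute values are used.

```python
def stages_for_fixed_bits(o_by_stage, fixed_source_bits):
    """Return the sorted stage numbers w with abs(O_w) in fixed_source_bits.

    o_by_stage[w] is the signed O-vector entry for stage w (assumed layout).
    fixed_source_bits holds the indices of the identical source bits (set E).
    """
    wanted = set(fixed_source_bits)
    return sorted(w for w, entry in enumerate(o_by_stage) if abs(entry) in wanted)


# O = (-1, -2, 0, 3), assumed to be written with the stage n-1 entry first.
o_written = (-1, -2, 0, 3)
o_by_stage = tuple(reversed(o_written))  # (3, 0, -2, -1)

# E = {s3, s0}: source bits 3 and 0 are the identical bits.
print(stages_for_fixed_bits(o_by_stage, {3, 0}))
```

Under these assumptions the sketch returns stages 0 and 1, in agreement with the set K = {0, 1} used with this O vector and E = {s3, s0} in the examples of this disclosure.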
These 2^i possible combinations of states in turn represent the 2^i possible combinations of sets of destination addresses that the user of this invention may assign to the subnetworks formed by the partition. I^-1 of the original n stage BPC MIN and set K can be used to determine set F which in turn will yield the 2^i possible sets of destination addresses that can be assigned to the subnetwork which contains the source addresses as specified by set E (Box 153).
It has already been shown that for each subnetwork resulting from a partition, there are i identical destination address bits which are directly related to that subnetwork's i identical source address bits. Those i identical destination address bits are denoted by set F where
Figure imgf000054_0001
Once set F has been determined from set K = {w1, w2, ..., wi} and from I^-1 of the BPC MIN being partitioned, the subnetwork's i identical destination address bits have thereby also been determined. Once these i identical destination address bits have been determined, the 2^i possible sets of destination addresses that may be assigned to the subnetwork with source addresses specified by set E are then also known. The apparatus of the present invention can then be further used to specify the exact binary values of those i destination address bits.
By specifying the binary values of these i destination address bits, the user can specify the exact set of destination addresses that are to compose the subnetwork with source addresses specified by set E. The i destination address bits which are identical for all destination
addresses of the subnetworks are given by set F where
Figure imgf000054_0002
It is possible to further specify the exact binary value of those destination address bits according to the relation
Figure imgf000054_0003
where v_wj is the binary value of the destination address bit denoted by [D * I^-1]_wj (Box 154). The above relation specifies the i identical destination address bits and their values, thereby specifying the 2^(n-i) destination addresses for the desired subnetwork with source addresses specified by set E. The above source and destination address bit relationships completely define a unique subnetwork because any 2^(n-i) x 2^(n-i) subnetwork's 2^(n-i) source addresses and 2^(n-i) destination addresses fully and uniquely define that subnetwork.
It is now possible to determine the n stage BPC
MIN's crossbar switching element settings that will
implement the partition which will yield the subnetwork with specified source and destination addresses. If the set of all i identical source address bits for the desired
partition is defined by set E where
Figure imgf000055_0001
and the set F of all i identical destination address bits and their values for that partition are denoted by the relation
Figure imgf000055_0002
then the state to which the crossbar switching elements of each stage wj which is an element of K should be set in order to implement the desired partition is denoted by t_wj (Box 155) where
Figure imgf000055_0003
Figure imgf000055_0004
To summarize, the switching elements in each stage wj which is an element of set K should be set to state t_wj in order to implement a partition that will yield a subnetwork with source and destination addresses as specified by sets E and F together with the additional relation (Box 156).
Figure imgf000056_0001
A second BPC MIN partitioning example is now presented. This example ultimately implements the same partition as is depicted in the previous BPC MIN
partitioning example and in Fig. 17. The previous
partitioning example differs from this example, however, in that the previous example demonstrates how the method and apparatus of the present invention may be used to determine the source and destination addresses of the subnetworks which result from setting the switching elements of certain stages of an n stage BPC MIN to either state 0 or state 1.
The example set forth below, on the other hand, demonstrates how the method and apparatus of this invention may be used to determine and implement a particular n stage BPC MIN's switching element's settings in order to produce a partition that will yield a desired subnetwork with user specified size and source and destination addresses.
For this example, the objective is to partition the four stage 16 x 16 BPC MIN depicted in Fig. 7 into four two stage 4 x 4 subnetworks. It is a further objective of this example that one of these four 4 x 4 subnetworks be composed of the set of source addresses S = (s3, s2, s1, s0) equal to {(0, c, c, 0)} (binary), which is equivalent to {0, 2, 4, 6} (decimal), and of the set of destination addresses D = (d3, d2, d1, d0) equal to {(0, 0, c, c)} (binary), which is equivalent to {0, 1, 2, 3} (decimal).
Set E, which is defined to be the set of all i identical source address bits for the desired subnetwork and therefore for the desired partition is hence given by
Figure imgf000057_0001
Therefore, for this particular example,
O = (-1, -2, 0, 3);
I^-1 = (-1, 0, 3, -2)^-1 = (1, -0, -3, 2);
n = 4;
i = 2;
E = {s0, s3}.
For this example, since E = {s0, s3}, the source addresses of all four 4 x 4 subnetworks are thereby specified as follows:
               Source Addresses
Subnetwork 1:  {(0, c, c, 0)} (binary) = {0, 2, 4, 6} (decimal)
Subnetwork 2:  {(0, c, c, 1)} (binary) = {1, 3, 5, 7} (decimal)
Subnetwork 3:  {(1, c, c, 0)} (binary) = {8, 10, 12, 14} (decimal)
Subnetwork 4:  {(1, c, c, 1)} (binary) = {9, 11, 13, 15} (decimal)
Set K where K = {w1, w2, ..., wi} and where
K SUB {0, 1, ..., n-1} is given for this particular example by
K = {w1, w2} SUB {0, 1, 2, 3} and can be determined from set E and from the characteristic O vector of the 16 x 16 BPC MIN being partitioned.
Specifically, it has already been shown that set K = {w1, w2} can be determined by solving the relation
Figure imgf000058_0003
for all possible wj as shown below:
Figure imgf000058_0002
Substituting in the specific values for O and E for this example yields the relation
Figure imgf000058_0001
Figure imgf000059_0001
From this relation, it can be determined that for
Figure imgf000059_0002
Since set K denotes the i stages that must be set to a particular state to implement a desired partition and since the above equations have shown that
K = {w1, w2} = {0, 1}, it is therefore known that in order to implement the desired partition, all switching elements of stages 0 and 1 (as denoted by set K) must be set to either state 0 or 1. The determination of whether stages 0 and 1 (as denoted by set K) are to be set to state 0 or to state 1 is made according to the destination addresses that the user of the present invention designates as the desired outputs of the subnetwork with source addresses {0, 2, 4, 6} (decimal) as previously specified in this example by set E.
It is now possible to calculate set F from I^-1 and K where set F denotes the i = 2 identical destination address bits for the 2^(n-i) = 4 binary destination addresses of the subnetwork with source addresses {0, 2, 4, 6} (decimal) where
Figure imgf000060_0001
Substituting in the specific values for I^-1 and K for this example yields
Figure imgf000060_0002
That is to say, for the subnetwork with source addresses {0, 2, 4, 6} (decimal), the i = 2 bits d3 and d2 will be the same for each of the four binary destination addresses of that subnetwork. Since i = 2, there are 2^i = 4 possible sets of destination addresses that the user of the present invention may assign to the subnetwork with source addresses {0, 2, 4, 6} (decimal). These four sets of destination addresses are as follows:
Source Addresses = {0, 2, 4, 6}
4 Possible Sets of 4 Destination Addresses {(d3, d2, d1, d0)}
Set A: {(0, 0, c, c)} (binary) = {0, 1, 2, 3} (decimal)
Set B: {(0, 1, c, c)} (binary) = {4, 5, 6, 7} (decimal)
Set C: {(1, 0, c, c)} (binary) = {8, 9, 10, 11} (decimal)
Set D: {(1, 1, c, c)} (binary) = {12, 13, 14, 15} (decimal)
The binary values of bit d2 and of the complement of bit d3 can be selected to specify the set of destination addresses from the above list for the desired subnetwork. This is accomplished by selecting the appropriate value for each of these i = 2 destination address bits in accordance with the equation
Figure imgf000061_0001
Figure imgf000062_0005
where v0 represents the binary value of destination address bit d2 and v1 represents the binary value of the complement of destination address bit d3.
Since it has already been arbitrarily decided at the beginning of this example that the destination addresses {(d3, d2, d1, d0)} of the subnetwork with source addresses {0, 2, 4, 6} (decimal) are to be {0, 1, 2, 3} (decimal) = {(0, 0, c, c) } (binary), it is thereby specified that
Figure imgf000062_0004
It has already been shown that for this example, i = 2, and set K = {w1, w2, ..., wi} = {w1, w2} = {0, 1}, which denotes that stages 0 and 1 must be set to either state 0 or state 1 in order to implement a subnetwork with source addresses {0, 2, 4, 6} (decimal). Once the binary values of destination bit d2 and of the complement of destination bit d3 are determined according to which set of destination addresses is chosen to compose the desired subnetwork, the state t_wj to which each stage wj should be set in order to implement that subnetwork can then be determined by the equation
Figure imgf000063_0001
Substituting the specific values of O and K for this example yields
Figure imgf000063_0003
Figure imgf000063_0002
This equation is equivalent to the two equations
t0 = s3 XOR v0;
t1 = s0 XOR v1.
Since the set of source addresses S = {(s3, s2, s1, s0)} for the desired subnetwork has been chosen for this example to be {0, 2, 4, 6} (decimal), which is equivalent to {(0, c, c, 0)} (binary), it is known that
s3 = 0 and s0 = 0.
Similarly, because the destination addresses D = {(d3, d2, d1, d0)} for the desired subnetwork have been chosen for this example to be {0, 1, 2, 3} (decimal), which is equivalent to {(0, 0, c, c)} (binary), and because it has also been determined for this example that v0 = d2 and that v1 is the complement of d3, it is likewise known that
Figure imgf000064_0001
Therefore, the states t0 and t1 of stages 0 and 1, respectively, that will implement the partition to yield the subnetwork with source and destination addresses {0, 2, 4, 6} (decimal) and {0, 1, 2, 3} (decimal), respectively, can now be calculated by the equations
t0 = s3 XOR v0 = 0 XOR 0 = 0
t1 = s0 XOR v1 = 0 XOR 1 = 1
Thus, the desired subnetwork with source and destination addresses {0, 2, 4, 6} (decimal) and {0, 1, 2, 3} (decimal), respectively, will be produced by setting all of the switching elements in stage 0 and 1 to states 0 and 1, respectively.
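The two equations of this example, t0 = s3 XOR v0 and t1 = s0 XOR v1, can be evaluated for each of the four candidate destination sets. The sketch below is specific to this example and is not the general relation of Box 155; it assumes v0 = d2 and v1 = the complement of d3, the latter being the reading of the negative entry -3 in I^-1 that makes the arithmetic above (t1 = 0 XOR 1) consistent. The function name example_stage_states is hypothetical.

```python
def example_stage_states(s3, s0, d3, d2):
    """Return (t0, t1) for the 16 x 16 example with K = {0, 1}."""
    v0 = d2        # I^-1 entry 2 for stage 0 selects bit d2
    v1 = d3 ^ 1    # I^-1 entry -3 for stage 1: complement of d3 (assumed)
    t0 = s3 ^ v0   # t0 = s3 XOR v0
    t1 = s0 ^ v1   # t1 = s0 XOR v1
    return t0, t1


# Subnetwork with source addresses {0, 2, 4, 6}: s3 = 0 and s0 = 0.
candidates = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}  # (d3, d2)
for name, (d3, d2) in candidates.items():
    t0, t1 = example_stage_states(0, 0, d3, d2)
    print(f"Set {name}: d3={d3}, d2={d2} -> t0={t0}, t1={t1}")
```

Set A, with destination addresses {0, 1, 2, 3}, gives t0 = 0 and t1 = 1 as derived above; each of the other three destination sets maps to a different one of the remaining state combinations.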
Fig. 17 depicts the network of Fig. 7 with the switching elements of stage 0 set to state t0 = 0 and the switching elements of stage 1 set to state t1 = 1 in
accordance with the above example. As Fig. 17 also depicts, the subnetwork with the set 80 of source addresses 40 equal to {0, 2, 4, 6} (decimal) and the set 85 of destination addresses 50 equal to {0, 1, 2, 3} (decimal) is denoted as subnetwork 1. Since the switching elements of stages 0 and 1 are set to states 0 and 1, respectively, a total of four 4 x 4 subnetworks are created and are denoted in Fig. 17 as subnetworks 1-4.
Any BPC MIN of arbitrary size may be partitioned into multiple disjoint BPC MIN's of specified size and source and destination addresses by using the method and apparatus set forth in the above disclosures and examples. Furthermore, since each subnetwork that results from the partitioning of a BPC MIN is itself a BPC MIN, the method and apparatus of this invention may be used to further partition any subnetwork into multiple subnetworks.
Permutation Passability
A network permutation is defined to be a one-to-one mapping of each input of a network to exactly one output of that network. For any network which contains N inputs and N outputs, there are a total of N! possible different
permutations. Although any single N x N BPC type MIN can realize only a relatively small fraction of all N! possible permutations, the complete set of all BPC type MINs taken together can indeed realize all N! possible permutations.
That is to say, although all BPC MINs of the same size can realize exactly the same number of permutations, one particular BPC MIN may be able to realize a certain permutation or group of permutations that another BPC MIN cannot. As a result of these differences in the ability of one BPC MIN to realize a specific permutation that another BPC MIN cannot, one BPC MIN may therefore be better suited than another to implement a particular type of parallel processing algorithm. For these reasons, it is desirable to know which permutations or groups of permutations can be realized (i.e., passed) by an arbitrary BPC MIN.
This invention provides a method and apparatus to determine, for any arbitrary permutation and arbitrary N x N
BPC MIN, whether that permutation can be realized (i.e., is passable) by that MIN. This invention also provides a method and apparatus to determine the complete set of all
permutations that are passable by an arbitrary BPC MIN.
Additionally, this invention provides a method and apparatus to specify a BPC MIN which can realize any particular
arbitrary permutation.
Determining Permutation Passability for an
Arbitrary Permutation and BPC MIN
For any arbitrary n stage BPC MIN, it is possible to determine from that MIN's characteristic O and I vectors whether any arbitrary permutation is passable through that BPC MIN. More specifically, permutation passability is determined by determining whether the permutation in question will cause a blocking condition in any of the stages of the BPC MIN being considered. If the permutation in question causes a blocking condition in one or more of the stages of the BPC MIN under consideration, then that permutation is not passable through that MIN. If, on the other hand, the permutation can indeed be routed through all of the stages of the BPC MIN without creating any blocking conditions, then that permutation is routable through the BPC MIN being considered.
In order to determine permutation passability for an n stage BPC MIN, it is first necessary to partition that
MIN into several different specific partitions which are specially chosen for the purpose of determining permutation passability. It is then necessary to determine the
destination routing tags for the permutation and network in question and to make certain determinations based on the values of these routing tags. More specifically, it is necessary to determine whether certain requirements are satisfied by the values of the routing tags that would, during normal network routing, accompany the messages passing through each of the individual subnetworks formed by the specially chosen partitions.
It should be noted that it is not necessary to actually route a permutation through a BPC MIN in order to determine whether that permutation is passable through that MIN. Rather, it is only necessary to determine the
destination routing tags that would be used to route that permutation through the BPC MIN in question and then to make certain determinations based on the values of those
destination routing tags. The determination of permutation passability is thus unrelated to the method and apparatus that is actually used to route a permutation through a BPC MIN.
The determination of permutation passability incorporates the concept of Complete Residue Systems (CRSs). A Complete Residue System modulo m (CRS mod m) set is defined to be a set of m integers whose elements, when each is
individually used as an operand of the modulo m function, yield exactly one instance of every possible remainder resulting from that modulo m function (i.e., exactly one instance each of {0, 1, 2,..., m-1}).
For example, the sets {1, 0} and {8, 13} each constitute a CRS mod 2 set because when the two elements in either of these two sets are used as operands of the modulo 2 operation, the result yields exactly one instance of every possible remainder of the modulo 2 operation (i.e., one instance each of 0 and 1). Specifically, the set given by { 1, 0 } constitutes a CRS mod 2 set because
1 mod 2 = 1 and
0 mod 2 = 0
and the set given by {8, 13} constitutes a CRS mod 2 set because
8 mod 2 = 0 and
13 mod 2 = 1.
Similarly, {0, 9, 2, 11} and {1, 20, 15, 14} each constitute a CRS mod 4 set because when the four elements of either of these two sets are used as operands of the modulo 4 operation, the result yields exactly one instance of every possible remainder of the modulo 4 operation (i.e., one instance each of 0, 1, 2, and 3).
Specifically, the set given by {0, 9, 2, 11} constitutes a CRS mod 4 set because
0 mod 4 = 0,
9 mod 4 = 1,
2 mod 4 = 2 and
11 mod 4 = 3.
The set given by {1, 20, 15, 14} constitutes a CRS mod 4 set because
1 mod 4 = 1,
20 mod 4 = 0,
15 mod 4 = 3 and
14 mod 4 = 2.
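The Complete Residue System test defined above is straightforward to state in code. The following is a minimal sketch of the definition only, with a hypothetical function name is_crs; it says nothing about how the apparatus of the invention performs the check.

```python
def is_crs(values, m):
    """Return True if values is a Complete Residue System modulo m, i.e. it
    has exactly m elements whose remainders mod m are 0, 1, ..., m-1 with
    each remainder occurring exactly once."""
    values = list(values)
    return len(values) == m and sorted(v % m for v in values) == list(range(m))


print(is_crs([1, 0], 2), is_crs([8, 13], 2))                  # CRS mod 2 examples
print(is_crs([0, 9, 2, 11], 4), is_crs([1, 20, 15, 14], 4))   # CRS mod 4 examples
print(is_crs([0, 2, 4, 6], 4))                                # remainders 0, 2, 0, 2
```

The first four sets are the ones worked through above and all satisfy the definition; the last set does not, because the remainders 1 and 3 never occur.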
Although n stage BPC MINs may, in general, be built with a x b crossbar switching elements and have a^n source addresses and b^n destination addresses, BPC MINs are commonly built using 2 x 2 crossbar switches and commonly have 2^n source and 2^n destination addresses.
This invention can be used to determine permutation passability for an arbitrary permutation and an arbitrary n stage N x N BPC MIN which employs 2 x 2 crossbar switching elements. Substantially equivalent methods and apparatus may be used to determine permutation passability for BPC MIN's built with switching elements of arbitrary size.
Permutation passability is determined by first partitioning the BPC MIN under consideration into exactly n-1 specifically chosen partitions which in turn will yield a total of exactly N-2 different subnetworks. The destination tags for the permutation in question must then be determined and the N-2 sets of destination tags that would, during normal network routing, appear at the inputs of each of these respective N-2 subnetworks are checked to determine whether each set of tags constitutes a Complete Residue System.
If, for each of the N-2 subnetworks in question, the tags that, during normal network routing, would appear at the inputs of those subnetworks constitute a Complete Residue System, then the permutation in question is passable through the BPC MIN under consideration. If, on the other hand, one or more of the N-2 sets of tags that would, during normal network routing, appear at the inputs of the N-2 subnetworks does not constitute a Complete Residue System, then the permutation in question is not passable through the BPC MIN under consideration.
As depicted in the flowchart of Figs. 19a, 19b, and 19c, to determine permutation passability for an n-stage BPC MIN, n-1 specially chosen partitions must first be determined. These n-1 specially chosen partitions in turn yield a total of N-2 different subnetworks. Specifically, the n-1 partitions that must be determined are the partitions which are created by setting all of the switching elements in the i stages, numbered n-i through n-1 (inclusive), to either state 0 or 1, where i is an integer such that 1 ≤ i ≤ n-1 (Box 160, 163). Each of these n-1 partitions in turn forms 2^i disjoint (n-i)-stage subnetworks, each of size 2^(n-i) x 2^(n-i).
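The total of N-2 subnetworks can be checked by summing the number of subnetworks over the n-1 partitions. The short derivation below assumes N = 2^n, as is the case for a network built from 2 x 2 switching elements:

```latex
\sum_{i=1}^{n-1} 2^{i} \;=\; 2^{n} - 2 \;=\; N - 2,
\qquad \text{e.g. for } n = 4:\quad 2 + 4 + 8 = 14 = 16 - 2 .
```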
Therefore, the first of the n-1 specially chosen partitions that must be considered when determining
permutation passability for a BPC MIN such as the one
depicted in Fig. 7 is the partition which is created by setting all of the crossbar switching elements of the
rightmost stage to either state 0 or 1. The second of the n-1 specially chosen partitions to be considered is the one which is created by setting all of the crossbar switching elements of the two rightmost stages to state 0 or 1; the third partition is the one which is created by setting all of the crossbar switching elements of the three rightmost stages to state 0 or 1.
It should be noted that whenever partitioning a BPC MIN in order to determine permutation passability, the crossbar switching elements of the leftmost stage of that MIN are never set to state 0 or 1. It should also be noted that when partitioning BPC MINs for the purpose of determining permutation passability, stages n-i through n-1 may be arbitrarily set to either state 0 or 1 as long as all of the switching elements in any single stage are set to the same state. The 2^i disjoint 2^(n-i) x 2^(n-i) subnetworks formed by each of the individual n-1 partitions divide the n stage BPC MIN's N source addresses {0, 1, 2, ..., N-1} into 2^i disjoint groups of 2^(n-i) source addresses each. For any single i, each of these 2^i disjoint groups of 2^(n-i) source addresses is defined to be ψ(i, u) (Box 161) where 0 ≤ u ≤ 2^i - 1 for any i (Box 162). Therefore, for the set of all i such that 1 ≤ i ≤ n-1, the total of all N-2 groups of 2^(n-i) source addresses is given by ψ(i, u) where 1 ≤ i ≤ n-1 and where 0 ≤ u ≤ 2^i - 1. That is to say, ψ(i, u), where 1 ≤ i ≤ n-1 and where 0 ≤ u ≤ 2^i - 1, represents all of the disjoint sets of the n stage BPC MIN's source addresses that are assigned to each of the individual multiple disjoint subnetworks that are created when the switching elements of stages n-i through n-1 (inclusive) are set to either state 0 or 1 (Box 164).
To determine permutation passability, once the n-1 specially chosen partitions and each of those partitions' respective disjoint subnetworks have been determined for the BPC MIN in question, the permutation being examined is then considered. Specifically, the destination routing tags for every source to destination connection in the permutation must be calculated (Box 165) and then certain determinations are made based on the values of these destination routing tags.
A permutation for an N x N network is defined to be a one to one mapping of that network's N source addresses to its N destination addresses. A permutation for an N x N network is represented by
Source Address S:          0     1     2     . . .  N-1
Destination Address D(S):  D(0)  D(1)  D(2)  . . .  D(N-1)
where D(0), D(1), D(2) and D(N-1) are the binary representations of the destination addresses to which source addresses 0, 1, 2, and N-1, respectively, are to be connected as specified by that permutation.
It has been shown that the destination routing tags for an arbitrary permutation and BPC MIN may be uniquely determined from the characteristic I vector of that BPC MIN according to the equation
TAG = D * I^-1
where D is the binary representation of that MIN's destination addresses and I^-1 is the inverse of the characteristic I vector of that BPC MIN. Thus, a permutation for an N x N network together with the destination routing tags for that permutation are represented by
Source Address S:             0          1          2          . . .  N-1
Destination Address D(S):     D(0)       D(1)       D(2)       . . .  D(N-1)
Destination Routing Tag
Tag(S) = D(S)*I^-1:           D(0)*I^-1  D(1)*I^-1  D(2)*I^-1  . . .  D(N-1)*I^-1
where Tag(0) = D(0)*I^-1, Tag(1) = D(1)*I^-1, Tag(2) = D(2)*I^-1 and Tag(N-1) = D(N-1)*I^-1 represent the destination routing tags for source addresses 0, 1, 2, and N-1, respectively, and I^-1 represents the inverse of the characteristic I vector of the N x N BPC MIN for which permutation passability is being determined.
Once the destination routing tags for the permutation in question have been determined, they may be used together with the N-2 sets of source addresses, ψ(i, u), to determine permutation passability for the permutation and BPC MIN under consideration. It has been shown that the N-2 sets of source addresses ψ(i, u), where 1 ≤ i ≤ n-1 and 0 ≤ u ≤ 2^i - 1, are the individual sets of source addresses that compose each of the individual N-2 subnetworks that are formed when the n stage BPC MIN in question is partitioned in n-1 specially chosen different ways for the purpose of determining permutation passability.
For any permutation and any BPC MIN that has been specially partitioned in n-1 different ways as described above, that permutation is passable through that BPC MIN if and only if for each 1 ≤ i ≤ n-1 and each 0 ≤ u ≤ 2^i - 1, every set of destination tags
{D(j) * I^-1 | j ELT ψ(i, u)}
constitutes a CRS mod 2^(n-i) set, where D(j) * I^-1 = Tag(j) represents the destination routing tag for source address j of the n stage BPC MIN and permutation under consideration, ψ(i, u) represents the set of 2^(n-i) source addresses that compose the subnetwork denoted by u which is formed by the partition created by setting i stages (stage n-i through n-1, inclusive) to state 0 or 1, and ELT denotes "is an element of" (Box 166). That is to say, the permutation in question is passable through the n stage BPC MIN being considered if and only if, for each of the 2^i subnetworks formed by each of the n-1 specially chosen partitions, the set of 2^(n-i) destination tags which appear at the 2^(n-i) inputs of that subnetwork constitutes a CRS mod 2^(n-i) set. If one or more sets of destination tags appearing at the 2^(n-i) inputs of any one of the 2^i subnetworks formed by any one of the n-1 partitions does not constitute a CRS mod 2^(n-i) set, then the permutation in question is not passable through the n stage BPC MIN being considered.
To determine permutation passability, a total of
N-2 different subnetworks must be determined and N-2
corresponding sets of destination routing tags must be checked to determine if they each constitute a CRS set.
Because these N-2 events take place independently of one another, it is possible to use N-2 sets of identical but independent control circuitry to rapidly determine
permutation passability.
A permutation passability example is now presented. The objective in this example is to determine whether the four stage 16 x 16 BPC MIN depicted in Fig. 7 is capable of realizing the permutation given by
Source Address S:            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Destination Address D(S):   12  4 14  6 13  5 15  7  8  0 10  2  9  1 11  3
in which source address 0 is connected to destination address 12, source address 1 is connected to destination address 4, and so on. For the network of Fig. 7, it has been shown that
N = 16,
n = 4,
O = (-1, -2, 0, 3), and
I^-1 = (-1, 0, 3, -2)^-1 = (1, -0, -3, 2),
where a leading minus sign marks a complemented address bit, consistent with the tag calculation carried out later in this example.
To determine whether this four stage BPC MIN can realize the above permutation, it is first necessary to determine n-1 = 4-1 = 3 specially chosen partitions of that BPC MIN and to identify the N-2 = 16-2 = 14 disjoint sets of source addresses ψ(i, u) that compose the N-2 = 14
subnetworks formed by those 3 specially chosen partitions. It will then be possible to determine whether the sets of destination routing tags that correspond to the sets of source addresses ψ(i, u) composing those subnetworks
constitute CRS sets. If all 14 sets of destination routing tags constitute CRS sets, then the permutation in question is indeed passable. If, on the other hand, one or more of those 14 sets of destination routing tags does not constitute a CRS set, then the permutation in question will be blocked by the network of Fig. 7 and is therefore not passable.
The 3 specially chosen partitions that must be considered in order to determine permutation passability have been shown to be the 3 partitions that are created by setting all of the switching elements in the i stages numbered n-i through n-1 (inclusive) to either state 0 or 1 where 1 ≤ i ≤ n-1. Since n-1 = 4-1 = 3 for this particular example, the 3 partitions of interest for which i = 1, 2, and 3, respectively, are given by
i       How to Implement Partition
i = 1   Set the switching elements in stage 3 to either state 0 or 1;
i = 2   Set the switching elements in stages 2 and 3 to either state 0 or 1;
i = 3   Set the switching elements in stages 1, 2 and 3 to either state 0 or 1.
It has been shown that if the switching elements of certain specific stages of any BPC MIN are set to state 0 or 1 for the purpose of partitioning, the laws of
partitioning can be used to determine the sets of source addresses ψ(i, u) that compose the subnetworks formed by that partition. Furthermore, these sets of source addresses may be uniquely determined from the characteristic O vector of the BPC MIN being partitioned. Specifically, if i stages of an n stage BPC MIN are set to either state 0 or 1, 2i disjoint (n-i) stage subnetworks are formed and i bits of the source addresses assigned to each subnetwork are always identical.
For an arbitrary n stage BPC MIN, the set of i stages which are set to either state 0 or 1 for the purpose of partitioning has been defined to be set K where
K = {w1, w2, ..., wi}
and where K SUB {0, 1, ..., n-1}, where SUB denotes "is a subset of". Furthermore, the set of i identical source address bits for each disjoint subnetwork is defined by set E where
Figure imgf000077_0001
where O is the characteristic O vector of the original BPC MIN being partitioned, S is the binary representation of that network's source addresses, the symbol shown in Figure imgf000077_0002 denotes the wj-th bit of the binary source address permuted by O, and ELT denotes "is an element of".
For this example, set K and set E must be determined for each of the three partitions of interest for which i = 1, 2, and 3.
For i = 1, K = {w1} = {n-1} = {3};
for i = 2, K = {w1, w2} = {n-2, n-1} = {2, 3}; and
for i = 3, K = {w1, w2, w3} = {n-3, n-2, n-1} = {1, 2, 3}.
Set E can now be determined from set K and from the characteristic O vector of the four stage BPC MIN of Fig. 7. For this particular example in which there are three different partitions that must be considered, set E must be determined for each of these three partitions for which i = 1, 2, and 3. For i = 1 where K = {3},
Figure imgf000078_0001
Similarly, for i = 2 where K = {2, 3},
Figure imgf000078_0002
and for i = 3 where K = {1, 2, 3},
Figure imgf000079_0001
Set E has thus been determined for each of the three specially chosen partitions that must be considered in order to determine permutation passability in this example.
The partition for which i = 1 in this particular example yields 2^i = 2^1 = 2 disjoint subnetworks of size 2^(n-i) x 2^(n-i) = 2^(4-1) x 2^(4-1) = 8 x 8; the partition for which i = 2 yields 2^2 = 4 disjoint subnetworks of size 2^(4-2) x 2^(4-2) = 4 x 4; the partition for which i = 3 yields 2^3 = 8 disjoint subnetworks of size 2^(4-3) x 2^(4-3) = 2 x 2. It has been shown that for each specially chosen partition which is specified by i, the 2^i sets of 2^(n-i) source addresses composing the 2^i disjoint 2^(n-i) x 2^(n-i) subnetworks that are formed by that partition are specified by ψ(i, u) where 1 ≤ i ≤ n-1 and 0 ≤ u ≤ 2^i - 1.
It has been shown that when partitioning a BPC MIN into multiple disjoint subnetworks, the set of source addresses ψ(i, u) that composes each subnetwork can be determined from set E, which denotes the set of i identical source address bits for each of those same subnetworks.
Specifically, for the partition for which i = 1 and E =
Figure imgf000079_0002
the 2^i = 2 disjoint subnetworks formed by that partition have 2 corresponding sets of source addresses ψ(i, u) which are denoted by ψ(1, 0) and ψ(1, 1). ψ(1, 0) and ψ(1, 1) are given by
Figure imgf000080_0001
such that
ψ(1, 0) = {(c, c, 0, c)} (binary) = {0, 1, 4, 5, 8, 9, 12, 13} (decimal) and
ψ(1, 1) = {(c, c, 1, c)} (binary) = {2, 3, 6, 7, 10, 11, 14, 15} (decimal)
where c represents a binary variable which is equal to either 0 or 1.
Similarly, for the partition for which i = 2 and E =
Figure imgf000080_0002
the 2^i = 4 disjoint subnetworks formed by that partition have 4 corresponding disjoint sets of source addresses ψ(i, u) which are denoted by ψ(2, 0), ψ(2, 1), ψ(2, 2), and ψ(2, 3). ψ(2, 0), ψ(2, 1), ψ(2, 2) and ψ(2, 3) are given by
Figure imgf000080_0003
such that
ψ(2, 0) = {(c, 0, 0, c)} (binary) = {0, 1, 8, 9} (decimal),
ψ(2, 1) = {(c, 0, 1, c)} (binary) = {2, 3, 10, 11} (decimal),
ψ(2, 2) = {(c, 1, 0, c)} (binary) = {4, 5, 12, 13} (decimal), and
ψ(2, 3) = {(c, 1, 1, c)} (binary) = {6, 7, 14, 15} (decimal).
For the partition for which i = 3 and E =
Figure imgf000081_0001
Figure imgf000081_0002
the 2^i = 8 disjoint subnetworks formed by that partition have 8 corresponding disjoint sets of 2 source addresses ψ(i, u) which are equal to ψ(3, 0), ψ(3, 1), ψ(3, 2), ψ(3, 3), ψ(3, 4), ψ(3, 5), ψ(3, 6), and ψ(3, 7). These sets are given by
Figure imgf000081_0003
such that
ψ(3, 0) = {(c, 0, 0, 0)} (binary) = {0, 8} (decimal),
ψ(3, 1) = {(c, 0, 0, 1)} (binary) = {1, 9} (decimal),
ψ(3, 2) = {(c, 0, 1, 0)} (binary) = {2, 10} (decimal),
ψ(3, 3) = {(c, 0, 1, 1)} (binary) = {3, 11} (decimal),
ψ(3, 4) = {(c, 1, 0, 0)} (binary) = {4, 12} (decimal),
ψ(3, 5) = {(c, 1, 0, 1)} (binary) = {5, 13} (decimal),
ψ(3, 6) = {(c, 1, 1, 0)} (binary) = {6, 14} (decimal), and
ψ(3, 7) = {(c, 1, 1, 1)} (binary) = {7, 15} (decimal).
To summarize, the N-2 = 14 disjoint sets of source addresses ψ(i, u) for the three partitions for which i = 1, 2, and 3 are given (in decimal) by
for i = 1,
ψ(1, 0) = {0, 1, 4, 5, 8, 9, 12, 13} and
ψ(1, 1) = {2, 3, 6, 7, 10, 11, 14, 15};
for i = 2,
ψ(2, 0) = {0, 1, 8, 9},
ψ(2, 1) = {2, 3, 10, 11},
ψ(2, 2) = {4, 5, 12, 13}, and
ψ(2, 3) = {6, 7, 14, 15}; and
for i = 3,
ψ(3, 0) = {0, 8}, ψ(3, 1) = {1, 9}, ψ(3, 2) = {2, 10}, ψ(3, 3) = {3, 11},
ψ(3, 4) = {4, 12}, ψ(3, 5) = {5, 13}, ψ(3, 6) = {6, 14}, and ψ(3, 7) = {7, 15}.
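The 14 source address sets of this example can be regenerated mechanically. The sketch below is only illustrative: the table `FIXED_BITS` is an assumption read off the binary patterns of this worked example (for instance, (c, c, 0, c) means that source address bit 1 is held fixed for i = 1), not a general derivation from the O vector, and the names `psi` and `FIXED_BITS` are hypothetical.

```python
# Assumption: for the network of Fig. 7, partition i fixes these source
# address bit positions (most significant fixed bit listed first).
FIXED_BITS = {1: [1], 2: [2, 1], 3: [2, 1, 0]}


def psi(i, u, fixed_bits=FIXED_BITS, n=4):
    """Source addresses whose fixed bits, read in order, spell out u."""
    members = []
    for s in range(2 ** n):
        value = 0
        for position in fixed_bits[i]:
            value = (value << 1) | ((s >> position) & 1)
        if value == u:
            members.append(s)
    return members


for i in (1, 2, 3):
    for u in range(2 ** i):
        print(f"psi({i}, {u}) = {psi(i, u)}")
```

Running the loop lists 2 + 4 + 8 = 14 sets, matching the summary above.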
For i = 1, in this example, the 2^i = 2 disjoint 8 x 8 subnetworks that are formed when the crossbar switching elements in stage 3 of the network of Fig. 7 are set to state 0 or 1 are depicted in Figs. 20a and 20b.
Fig. 20a depicts the BPC MIN of Fig. 7 with the two 8 x 8 subnetworks corresponding to ψ(1, 0) and ψ(1, 1), respectively, highlighted for clarity. The 8 inputs 40 that compose each of the two subnetworks are also shown in Fig. 20a. Fig. 20b depicts a separate view of each of the two 8 x 8 subnetworks 75 which correspond to ψ(1, 0) and ψ(1, 1) and the 8 inputs 40 that compose each of those subnetworks.
Fig. 21a depicts the BPC MIN of Fig. 7 with the four 4 x 4 subnetworks corresponding to ψ(2, 0), ψ(2, 1), ψ(2, 2), and ψ(2, 3) highlighted for clarity. The four inputs 40 that compose each of the four subnetworks are also shown in Fig. 21a. Fig. 21b depicts a separate view of each of the four 4 x 4 subnetworks 75 which correspond to ψ(2, 0), ψ(2, 1), ψ(2, 2), and ψ(2, 3) and the 4 inputs 40 that compose each of those subnetworks.
Fig. 22a depicts the BPC MIN of Fig. 7 with the eight 2 x 2 subnetworks corresponding to ψ(3, 0), ψ(3, 1), ψ(3, 2), ψ(3, 3), ψ(3, 4), ψ(3, 5), ψ(3, 6), and ψ(3, 7) highlighted for clarity. The 2 inputs 40 that compose each of the eight subnetworks are also shown in Fig. 22a. Fig. 22b depicts a separate view of each of these eight 2 x 2 subnetworks 75 and the 2 inputs 40 that compose each of those subnetworks.
Once the n-1 = 3 specially chosen partitions and the total of N-2 = 14 disjoint subnetworks and corresponding disjoint sets of source addresses ψ(i, u) have been
determined for the BPC MIN of Fig. 7, the destination routing tags for the permutation being considered must be determined. Once these destination routing tags are determined, those tags that would, during normal network routing, accompany the messages originating at each of the N-2 = 14 individual disjoint sets of source addresses ψ(i, u) are considered.
If, for each set of source addresses, ψ(i, u), the set of destination tags corresponding to these source addresses constitutes a CRS set, then the permutation in question is passable through the network of Fig. 7. If, on the other hand, one or more sets of destination tags corresponding to those sets of source addresses does not constitute a CRS set, then the permutation being considered is not passable through the network of Fig. 7.
For this particular example, the permutation for which permutation passability is to be determined has been shown to be
Source Address S:            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Destination Address D(S):   12  4 14  6 13  5 15  7  8  0 10  2  9  1 11  3.
The destination routing tag corresponding to each source address from 0 through N-1 is given by Tag(S) = D(S) * I^-1 where D(S) is the binary representation of the destination address to which source address S is to be routed, and I^-1 is the inverse of the characteristic I vector of the n stage BPC MIN under consideration.
For source address S = 0, for example, the destination address D(S) to which that source address should be routed during normal network routing is given by D(S) = 12. Thus, the destination routing tag Tag(S) for source address S = 0 is given by
Tag(S) = D(S) * I^-1.
Substituting in the specific values for S, D(S), and I^-1 for this example yields
Tag(0) = D(0) * I^-1
= (1 1 0 0) * (1, -0, -3, 2)
= (0 1 0 1) (binary)
= 5 (decimal).
The destination routing tags for the entire permutation may be determined by using the method set forth above. The permutation for this example, together with the destination routing tags for that permutation, are given by
Source Address S:                 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Destination Address D(S):        12  4 14  6 13  5 15  7  8  0 10  2  9  1 11  3
Destination Routing Tag
Tag(S) = D(S)*I^-1:               5  7 13 15  1  3  9 11  4  6 12 14  0  2  8 10.
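The tag calculation can be reproduced with a short sketch. It assumes, based on the worked value Tag(0) = 5, that I^-1 = (1, -0, -3, 2) is read from the most significant tag bit to the least significant one, that each entry names the destination address bit copied into that tag position, and that a minus sign means the bit is complemented. The names `I_INV` and `routing_tag` are hypothetical.

```python
# Assumption: each entry is (destination bit index, complement flag),
# listed from tag bit 3 down to tag bit 0, i.e. I^-1 = (1, -0, -3, 2).
I_INV = [(1, False), (0, True), (3, True), (2, False)]


def routing_tag(dest, i_inv=I_INV):
    """Permute and complement the bits of `dest` as described by i_inv."""
    tag = 0
    for bit_index, complement in i_inv:
        bit = (dest >> bit_index) & 1
        if complement:
            bit ^= 1
        tag = (tag << 1) | bit
    return tag


D = [12, 4, 14, 6, 13, 5, 15, 7, 8, 0, 10, 2, 9, 1, 11, 3]
print([routing_tag(d) for d in D])
# [5, 7, 13, 15, 1, 3, 9, 11, 4, 6, 12, 14, 0, 2, 8, 10]
```

Under these assumptions the output agrees with the tag row of the table above.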
Once the destination routing tags for the permutation under consideration have been determined, they can be used together with the N-2 = 14 sets of source addresses ψ(i, u) that compose the 14 disjoint subnetworks in order to determine permutation passability. Specifically, it has been shown that a permutation is passable through a BPC MIN if and only if for each 1 ≤ i ≤ n-1 and each 0 ≤ u ≤ 2^i - 1, every set of destination tags
{Tag(j) | j ELT ψ(i, u)}
which corresponds to the set of source addresses ψ(i, u) constitutes a CRS mod 2^(n-i) set. In this equation, Tag(j) represents the destination routing tag corresponding to source address j of the n stage BPC MIN under consideration, ψ(i, u) represents the individual sets of 2^(n-i) source addresses that compose the subnetwork u formed by the partition that results from setting a particular i stages (stage n-i through n-1, inclusive) to state 0 or 1, and ELT denotes "is an element of". If one or more sets of destination tags represented by the above equation does not constitute a CRS mod 2^(n-i) set, then the permutation being considered is not passable through the network in question.
For example, for i = 1, it must be determined whether each set of destination routing tags {Tag(j) | j ELT ψ(i, u)} constitutes a CRS mod 2^(n-i) = CRS mod 8 set for all u.
Substituting in the specific values for i = 1 for this particular example yields the 2^i = 2 sets of destination tags denoted by
{Tag(j) | j ELT ψ(1, u)}.
For each u such that 0 ≤ u ≤ 2^i - 1 = 2^1 - 1 = 1, this equation yields the two sets of destination routing tags
{Tag(j) | j ELT ψ(1, 0)} and
{Tag(j) | j ELT ψ(1, 1)},
each of which must constitute a CRS mod 8 set in order for the permutation in question to be passable.
For the first of these two sets of destination routing tags in which ψ(i, u) = ψ(1, 0), that set of tags is given by {Tag(j) | j ELT ψ(1, 0)} = {Tag(j) | j ELT {0, 1, 4, 5, 8, 9, 12, 13}} = {5, 7, 1, 3, 4, 6, 0, 2} which does indeed constitute a CRS mod 8 set because
5 modulo 8 = 5,
7 modulo 8 = 7,
1 modulo 8 = 1,
3 modulo 8 = 3,
4 modulo 8 = 4,
6 modulo 8 = 6,
0 modulo 8 = 0, and
2 modulo 8 = 2 and the set of remainders {5, 7, 1, 3, 4, 6, 0, 2} includes one instance of every possible remainder of the modulo 8 operation (i.e., one instance each of the integers from 0 through 7).
Similarly, for the set of destination routing tags in which ψ(i, u) = ψ(1, 1), that set of tags is given by
{Tag (j) | j ELT ψ(1, 1)}
= {Tag (j) | j ELT {2, 3, 6, 7, 10, 11, 14, 15}} = {13, 15, 9, 11, 12, 14, 8, 10} which also constitutes a CRS mod 8 set because
13 modulo 8 = 5,
15 modulo 8 = 7,
9 modulo 8 = 1,
11 modulo 8 = 3,
12 modulo 8 = 4,
14 modulo 8 = 6,
8 modulo 8 = 0, and
10 modulo 8 = 2 and the set of remainders {5, 7, 1, 3, 4, 6, 0, 2} includes one instance of every possible remainder of the modulo 8 operation.
For i = 2, for this example, it must similarly be determined whether each set of destination routing tags denoted by {Tag(j) | j ELT ψ(2, u)}, that is, the sets corresponding to ψ(2, 0), ψ(2, 1), ψ(2, 2), and ψ(2, 3), constitutes a CRS mod 2^(n-i) = CRS mod 4 set.
For the first of these four sets of destination routing tags in which ψ(i, u) = ψ(2, 0), that set of tags is given by
{Tag(j) | j ELT ψ(2, 0)} = {Tag(j) | j ELT {0, 1, 8, 9}}
= {5, 7, 4, 6} which indeed constitutes a CRS mod 4 set.
Similarly, for ψ(2, 1),
{Tag(j) | j ELT ψ(2, 1)}
= {Tag(j) | j ELT {2, 3, 10, 11}}
= {13, 15, 12, 14};
for ψ(2, 2),
{Tag(j) | j ELT ψ(2, 2)}
= {Tag(j) | j ELT {4, 5, 12, 13}}
= {1, 3, 0, 2};
and for ψ(2, 3),
{Tag(j) | j ELT ψ(2, 3)}
= {Tag(j) | j ELT {6, 7, 14, 15}}
= {9, 11, 8, 10}.
Each of the sets of destination tags {5, 7, 4, 6}, {13, 15, 12, 14}, {1, 3, 0, 2}, and {9, 11, 8, 10} does indeed constitute a CRS mod 4 set.
Finally, for i = 3 in this example, it must be determined whether each set of destination routing tags denoted by {Tag(j) | j ELT ψ(3, u)}, that is, the sets corresponding to ψ(3, 0), ψ(3, 1), ψ(3, 2), ψ(3, 3), ψ(3, 4), ψ(3, 5), ψ(3, 6), and ψ(3, 7), constitutes a CRS mod 2^(n-i) = CRS mod 2 set.
For ψ(3, 0), for example,
{Tag(j) | j ELT ψ(3, 0)} = {Tag(j) | j ELT {0, 8}} = {5, 4} and
for ψ(3, 1),
{Tag(j) | j ELT ψ(3, 1)} = {Tag(j) | j ELT {1, 9}} = {7, 6}.
Similar calculations can be made to determine the sets of destination routing tags that correspond to the remaining sets of source addresses ψ(3, u) where 0 ≤ u ≤ 7. The results of these calculations are summarized below.
ψ(i, u)    {Tag(j) | j ELT ψ(i, u)}
ψ(3, 0)    {5, 4}
ψ(3, 1)    {7, 6}
ψ(3, 2)    {13, 12}
ψ(3, 3)    {15, 14}
ψ(3, 4)    {1, 0}
ψ(3, 5)    {3, 2}
ψ(3, 6)    {9, 8}
ψ(3, 7)    {11, 10}
Each of the sets of destination tags {5, 4}, {7, 6}, {13, 12}, {15, 14}, {1, 0}, {3, 2}, {9, 8}, and {11, 10} constitutes a CRS mod 2 set.
Therefore, the permutation of this example is indeed passable through the network of Fig. 7 because for this network and permutation and for each 1 ≤ i ≤ 3 and each 0 ≤ u ≤ 2^i - 1, the set of destination routing tags denoted by
{Tag(j) | j ELT ψ(i, u)} constitutes a CRS mod 2^(n-i) set.
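The whole check of this example can be condensed into one routine. The sketch below is a minimal illustration under stated assumptions: the tags are copied from the table of this example, and `FIXED_BITS` (the source address bits held fixed by each partition of the Fig. 7 network) is read off the ψ(i, u) sets listed earlier rather than derived from the O vector. All function and variable names are hypothetical.

```python
def is_crs(values, m):
    """True if `values` is a Complete Residue System modulo m."""
    values = list(values)
    return len(values) == m and {v % m for v in values} == set(range(m))


def psi(i, u, fixed_bits, n):
    """Source addresses whose fixed bits for partition i spell out u."""
    members = []
    for s in range(2 ** n):
        value = 0
        for position in fixed_bits[i]:
            value = (value << 1) | ((s >> position) & 1)
        if value == u:
            members.append(s)
    return members


def is_passable(tags, fixed_bits, n):
    """Check all N-2 tag sets; every one must be a CRS mod 2^(n-i)."""
    for i in range(1, n):
        for u in range(2 ** i):
            tag_set = [tags[j] for j in psi(i, u, fixed_bits, n)]
            if not is_crs(tag_set, 2 ** (n - i)):
                return False
    return True


# Assumptions taken from the worked example for the network of Fig. 7.
FIXED_BITS = {1: [1], 2: [2, 1], 3: [2, 1, 0]}
TAGS = [5, 7, 13, 15, 1, 3, 9, 11, 4, 6, 12, 14, 0, 2, 8, 10]

print(is_passable(TAGS, FIXED_BITS, n=4))  # True
```

Each of the 14 checks is independent of the others, so the two loops could equally be run in parallel.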
Fig. 23 depicts the network of Fig. 7 with its crossbar switching elements 30 configured to the states necessary to route the permutation of this example. The control circuitry which generates the signals to properly configure the crossbar switching elements is not shown. It should be noted that either distributed or centralized routing may be used to route the permutation of this example through the network of Fig. 7.
There are many permutations, however, that are not passable through the BPC MIN of Fig. 7. An example of one such permutation is given by
Source Address S:                 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Destination Address D(S):        12  4  7  6 13  5 15 14  8  0 10  2  9  1 11  3
This permutation, together with the destination routing tags for the permutation, is given by
Source Address S:                 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Destination Address D(S):        12  4  7  6 13  5 15 14  8  0 10  2  9  1 11  3
Destination Routing Tag
Tag(S) = D(S)*I^-1:               5  7 11 15  1  3  9 13  4  6 12 14  0  2  8 10.
For ψ(i, u) = ψ(2, 1) for this example, the set of destination routing tags given by
{Tag(j) | j ELT ψ(2, 1)}
= {Tag(j) | j ELT {2, 3, 10, 11}}
= {11, 15, 12, 14}
does not constitute a CRS mod 2^(n-i) = CRS mod 2^(4-2) = CRS mod 4 set because
11 modulo 4 = 3,
15 modulo 4 = 3,
12 modulo 4 = 0, and
14 modulo 4 = 2
and the set of remainders {3, 3, 0, 2} clearly does not include one instance of every possible remainder of the modulo 4 operation (i.e., one instance each of the integers from 0 through 3). The permutation of this particular example is therefore not passable through the network depicted in Fig. 7 and will cause a blocking condition in that network if routing is attempted.
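For a blocked permutation it can be useful to see which subnetworks fail the test rather than only a yes/no answer. The sketch below starts from the destination addresses of this second example and reports every failing pair (i, u). It rests on the same assumptions as the worked example: `I_INV` encodes I^-1 = (1, -0, -3, 2) with a minus sign read as a complemented bit, and `FIXED_BITS` is read off the ψ(i, u) sets for the network of Fig. 7. All names are hypothetical.

```python
I_INV = [(1, False), (0, True), (3, True), (2, False)]   # assumption
FIXED_BITS = {1: [1], 2: [2, 1], 3: [2, 1, 0]}           # assumption
N_STAGES = 4


def routing_tag(dest):
    """Permute and complement destination bits according to I_INV."""
    tag = 0
    for bit_index, complement in I_INV:
        tag = (tag << 1) | (((dest >> bit_index) & 1) ^ int(complement))
    return tag


def subnetwork_index(source, i):
    """Value u of the subnetwork psi(i, u) that contains `source`."""
    u = 0
    for position in FIXED_BITS[i]:
        u = (u << 1) | ((source >> position) & 1)
    return u


def failing_subnetworks(destinations):
    """Return the (i, u) pairs whose tag sets are not a CRS mod 2^(n-i)."""
    tags = [routing_tag(d) for d in destinations]
    failures = []
    for i in range(1, N_STAGES):
        modulus = 2 ** (N_STAGES - i)
        for u in range(2 ** i):
            residues = sorted(tags[s] % modulus
                              for s in range(len(tags))
                              if subnetwork_index(s, i) == u)
            if residues != list(range(modulus)):
                failures.append((i, u))
    return failures


D_BLOCKED = [12, 4, 7, 6, 13, 5, 15, 14, 8, 0, 10, 2, 9, 1, 11, 3]
print(failing_subnetworks(D_BLOCKED))  # [(2, 1), (2, 3)]
```

An empty list would indicate that the permutation is passable under these assumptions.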
Possible Implementations of Network Routing, Partitioning, and Permutation Passability
Illustrative apparatus for implementing the invention is shown in Fig. 32. The apparatus comprises a MIN 300, a processor array 305 which uses the MIN 300 to communicate with a device 310 which is also connected to the MIN 300, and control circuitry 320 which determines the control signals necessary to implement routing in the MIN 300. The control circuitry 320 illustratively is comprised of a central processor unit (CPU) 321, a random access memory (RAM) 322, a read only memory (ROM) 323, and input-output circuitry 324. CPU 321 is comprised at least partly of an arithmetic logic unit (ALU) 325 which performs the logical operations necessary to generate the control signals that implement routing in the MIN 300.
A MIN can be routed using either distributed or centralized routing. If distributed routing is used, the control circuitry 320 determines the destination routing tags necessary to implement the desired communication by calculating the I vector, determining its inverse and using said inverse to calculate the destination routing tag for each message. These tags are determined using information (i.e., the destination address) communicated from the processor array 305 to the control circuitry 320 via communication bus 340. The tags are then sent over the communication bus 330 to the processor array 305 so that the tags may accompany the messages passing from the processor array to the device 310 through the MIN 300.
If, on the other hand, centralized routing is used, the control circuitry 320 determines the states of all switching elements (not shown) of the MIN 300 by calculating the I vector, determining its inverse and using said inverse together with the MIN's permutation vectors to calculate said switch states as set forth above. Again, these switch states are determined using destination address information
communicated from the processor array 305 to the control circuitry 320 via communication bus 340.
Because partitioning involves setting the states of certain switching elements to a predetermined value, the apparatus depicted in Fig. 32 can also be used to partition the MIN 300 into multiple smaller disjoint networks. The desired partition as specified by the processor array 305 is communicated to the control logic 320 via communication bus 340. The control logic then calculates the O vector, I vector, and the inverse of the I vector and determines the switch settings to implement the desired partition. The necessary crossbar switching element settings are then implemented using either distributed or centralized routing as described above.
The apparatus depicted in Fig. 32 can also be used to determine permutation passability. The permutation which the processor array 305 desires to pass is communicated to the control logic 320 via communication bus 340. The O vector, I vector, and the inverse of the I vector are then calculated and permutation passability is determined and communicated to processor array 305 via communication bus
340. If the desired permutation is passable, the apparatus of Fig. 32 can then implement the routing of that permutation as described herein.
As will be apparent to those skilled in the art, numerous alternatives may be employed in the practice of this invention. There are many other ways to determine
destination routing tags and switch states in order to implement network routing. There are also many ways to implement partitioning and the determination of permutation passability.
Determining the Set of All Permutations
Which are Passable by an Arbitrary BPC MIN
The method and apparatus of this invention can be used to determine the complete set of all permutations that are passable by any arbitrary BPC MIN. In this way, it can be determined which particular BPC MIN is best suited to implement a particular type of parallel processing algorithm. This is because different types of parallel processing algorithms typically require that different permutations or groups of permutations be implemented.
It has been shown that for any permutation for which the destination routing tags have been calculated, that permutation is passable on an arbitrary N x N n stage BPC MIN if and only if for each 1 ≤ i ≤ n-1 and each 0 ≤ u ≤ 2^i - 1, every set of destination tags denoted by
{Tag(j) | j ELT ψ(i, u)}
= {D(j) * I^-1 | j ELT ψ(i, u)}
constitutes a CRS mod 2^(n-i) set, where D(j) * I^-1 = Tag(j) represents the destination routing tag for source address j of that n stage BPC MIN and that permutation, ψ(i, u) represents the set of 2^(n-i) source addresses that compose the subnetwork u formed by the partition created by setting i stages (stage n-i through n-1, inclusive) to state 0 or 1, and ELT denotes "is an element of".
Therefore, for an arbitrary BPC MIN, any permutation that satisfies these requirements is passable through that MIN. Thus, for an arbitrary BPC MIN, the entire set of permutations that satisfy these requirements is the complete set of all permutations passable through that MIN.
Determining a BPC MIN Which Can
Pass an Arbitrary Permutation
The method and apparatus of this invention can be used to determine a BPC MIN which is capable of passing any specified permutation. In this way, a BPC MIN can be
selected for a particular multiprocessor application
according to which permutations are required in order to implement that application.
It has been shown that it is possible to simulate any BPC MIN on any other BPC MIN by properly reordering the inputs and outputs of the simulating MIN. Therefore, once a first BPC MIN which can pass an arbitrary permutation has been determined, the inputs and outputs of any other second BPC MIN can be appropriately reordered to allow that second BPC MIN to simulate the first BPC MIN, which will in turn allow the second BPC MIN to pass the arbitrary permutation. In this way, any BPC MIN can be used to implement an arbitrary permutation if the inputs and outputs of that BPC MIN are reordered appropriately.
Although n stage BPC MINs may, in general, be built with a x b crossbar switching elements and have $a^n$ source addresses and $b^n$ destination addresses, BPC MINs are commonly built using 2 x 2 crossbar switching elements and commonly have $2^n$ source and $2^n$ destination addresses. For the purpose of this presentation, the n stages of an n stage BPC MIN are assumed to be numbered from 0 through n-1 with stage 0 defined to be the leftmost stage. Additionally, the N/2 crossbar switching elements composing any single stage are assumed to be numbered from 0 through N/2 - 1 with crossbar switching element 0 defined to be the topmost switching element in any stage.
This invention allows any arbitrary n stage N x N BPC MIN which employs 2 x 2 crossbar switching elements to be configured so that it can pass any arbitrary permutation. More specifically, this invention allows for the inputs of a Modified Data Manipulator network, such as the one depicted in Fig. 6, to be reordered in a way which will enable that network to pass any specific arbitrary permutation. The Modified Data Manipulator network is particularly well suited for input reordering for the purpose of passing arbitrary permutations because of the nature of that network's characteristic O and I vectors. Once the input reordering scheme that will allow the Modified Data Manipulator to pass the arbitrary permutation under consideration has been determined, any BPC MIN may be used to simulate and replace the Modified Data Manipulator network. This effectively allows any BPC MIN to be used to implement any arbitrary permutation. It should be noted that certain networks other than the Modified Data Manipulator network also have characteristic O and I vectors that make those networks well suited for input reordering and passing arbitrary permutations by using methods similar to the method presented herein. Because a Modified Data Manipulator network can be simulated by any BPC MIN, however, it does not matter which network is initially selected for input reordering because, ultimately, any BPC MIN can be chosen to implement the arbitrary permutation that must be passed.
It has been shown that a permutation for an N x N network is a one to one mapping of that network's N source addresses to its N destination addresses and is represented by
Source Address S:          0     1     2     . . .  N-1
Destination Address D(S):  D(0)  D(1)  D(2)  . . .  D(N-1)
where D(0), D(1), D(2) and D(N-1) are the binary representations of the destination addresses to which source addresses 0, 1, 2, and N-1, respectively, are to be connected according to that permutation.
As depicted in the flowchart in Figs. 24a, 24b, and 24c, in order to reorder a Modified Data Manipulator's inputs to allow that network to pass any arbitrary permutation, it is first necessary to reorder the permutation that must be passed so that the destination addresses appear from left to right in ascending order (Box 170). The general permutation given above, after it has been reordered in such a manner, is given by
Destination Address D:  0     1     2     . . .  N-1
Source Address S(D):    S(0)  S(1)  S(2)  . . .  S(N-1)
where 0, 1, 2, and N-1 are the destination addresses to which source addresses S(0), S(1), S(2), and S(N-1), respectively, are to be connected according to that reordered permutation.
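Reordering the permutation so that the destination addresses ascend amounts to inverting the source to destination mapping. The following Python sketch is illustrative only; the function name `reorder_by_destination` and the list representation (index = source address, value = destination address) are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def reorder_by_destination(dest_of_source: Sequence[int]) -> List[int]:
    """Return S(D): entry d is the source address connected to destination d.

    dest_of_source[s] is D(s), the destination address of source address s.
    Raises ValueError if the mapping is not one to one on 0..N-1.
    """
    n = len(dest_of_source)
    source_of_dest: List[Optional[int]] = [None] * n
    for s, d in enumerate(dest_of_source):
        if not 0 <= d < n or source_of_dest[d] is not None:
            raise ValueError("not a one to one mapping of sources to destinations")
        source_of_dest[d] = s
    return source_of_dest  # every slot is filled once the loop completes
```

For instance, `reorder_by_destination([2, 0, 1])` returns `[1, 2, 0]`, since destination 0 is reached from source 1, destination 1 from source 2, and destination 2 from source 0.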
Once the permutation has been reordered as described above, the permutation is then divided into two groups which will be referred to as group 1 and group 2 (Box 171). Group 1 is composed of the source to destination address pairs which include destination addresses 0 through N/2 - 1 (inclusive); group 2 is composed of the source to destination address pairs which include destination addresses N/2 through N-1 (inclusive).
The source addresses in each of the two groups are then assigned in a certain way to the N inputs of stage 0 of the Modified Data Manipulator network. The assignment of these permutation source addresses to the network inputs specifies the pattern in which the inputs of the Modified Data Manipulator network should be reordered in order for that network to pass the desired permutation under consideration.
More specifically, the dividing of an arbitrary reordered permutation given by
Destination Address D:  0     1     2     . . .  N-1
Source Address S(D):    S(0)  S(1)  S(2)  . . .  S(N-1)
into two groups designated as group 1 and group 2 is given by
Group 1
Destination Address D:  0     1     . . .  N/2 - 1
Source Address S(D):    S(0)  S(1)  . . .  S(N/2 - 1)
and
Group 2
Destination Address D:  N/2     N/2 + 1     . . .  N-1
Source Address S(D):    S(N/2)  S(N/2 + 1)  . . .  S(N-1)
To reorder the Modified Data Manipulator's inputs, the first source address (source address S(0)) which appears in group 1 of the divided reordered permutation is paired together with the corresponding first source address (source address S(N/2)) that appears in group 2 of that reordered permutation. These two source addresses S(0) and S(N/2) are then assigned to the inputs of crossbar switching element 0 of stage 0 of the Modified Data Manipulator network (Box 172). Similarly, the second source addresses in group 1 and group 2 (source addresses S(1) and S(N/2 + 1), respectively) are paired together and are assigned to the inputs of switching element 1 of stage 0. This process is continued until all of the source addresses of the permutation under consideration have been assigned to one of the Modified Data Manipulator network's stage 0 switching element inputs (Boxes 173, 174, and 175).
The assignment of source addresses to stage 0 inputs of the Modified Data Manipulator network defines the reordering of the network's inputs and is given by
Permutation Source Address S(D)    Stage 0 Crossbar Switching Element
S(0)                               0
S(N/2)                             0
S(1)                               1
S(N/2 + 1)                         1
. . .                              . . .
S(N/2 - 2)                         N/2 - 2
S(N-2)                             N/2 - 2
S(N/2 - 1)                         N/2 - 1
S(N-1)                             N/2 - 1
The pattern in which the reordered permutation's source addresses are assigned to the stage 0 crossbar switching element inputs specifies the reordering of the Modified Data Manipulator network's inputs. Once the inputs of the network have been reordered according to the specified assignment of the permutation's source addresses to the network's stage 0 inputs, the Modified Data Manipulator network will pass the desired permutation without any blocking.
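The stage 0 assignment described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `stage0_input_order` is hypothetical, the input is assumed to be the reordered permutation as a list in which entry d holds S(d), and the choice of placing the group 1 address on the upper input and the group 2 address on the lower input of each switching element is an assumption (it is the convention that permits every stage 0 element to remain in the pass-through state).

```python
from typing import List, Sequence


def stage0_input_order(source_of_dest: Sequence[int]) -> List[int]:
    """List the source addresses at the stage 0 inputs, top to bottom.

    source_of_dest[d] is S(d). Switching element k of stage 0 receives
    S(k) from group 1 (assumed upper input) and S(N/2 + k) from group 2
    (assumed lower input).
    """
    n = len(source_of_dest)
    if n % 2 != 0:
        raise ValueError("N must be even")
    half = n // 2
    order: List[int] = []
    for k in range(half):
        order.append(source_of_dest[k])         # group 1 member, element k
        order.append(source_of_dest[half + k])  # group 2 member, element k
    return order
```

For a 4 x 4 case with S(D) = [3, 0, 2, 1], the sketch returns [3, 2, 0, 1]: element 0 receives S(0) = 3 and S(2) = 2, and element 1 receives S(1) = 0 and S(3) = 1.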
If each of the two groups of the divided reordered permutation described above is, itself, divided into two groups, and each of those resulting four groups is again divided into two groups and so on, then it can be seen why the characteristic O and I vectors of the Modified Data Manipulator network make it particularly well suited for input reordering for the purpose of passing arbitrary permutations (Box 176). This may be demonstrated by first dividing group 1 and group 2 of the divided reordered permutation into two groups each, thus producing a total of four groups, each containing N/4 source to destination address pairs. These four groups are given by
                        Group 1                    Group 2
Destination Address D:  0 through N/4 - 1          N/4 through N/2 - 1
Source Address S(D):    S(0) through S(N/4 - 1)    S(N/4) through S(N/2 - 1)
                        Group 3                    Group 4
Destination Address D:  N/2 through 3N/4 - 1       3N/4 through N-1
Source Address S(D):    S(N/2) through S(3N/4 - 1) S(3N/4) through S(N-1)
The source addresses in group 1 are paired with those in group 2 and the source addresses in group 3 are paired with those in group 4 in a manner similar to that used to pair the source addresses to determine the reordering of the Modified Data Manipulator network's inputs. This pairing of the source addresses in groups 1 and 2 and in groups 3 and 4 produces an assignment of permutation source addresses to the network's stage 1 inputs. More specifically, this assignment of permutation source addresses to stage 1 inputs results in an interconnection pattern identical to that which connects stage 0 and stage 1 of a Modified Data Manipulator network.
For example, the first source address of group 1 (source address S(0)) is paired with the first source address of group 2 (source address S(N/4)) and these two source addresses, once they have passed through stage 0 and are connected to the outputs of that stage, are assigned to the inputs of crossbar switching element 0 of stage 1. If source address S(0) from group 1 is assigned to the upper input of switching element 0 of stage 1 and source address S(N/4) from group 2 is assigned to the lower input of switching element 0 of stage 1, then the resulting connection is identical to the one which exists between stages 0 and 1 of a Modified Data Manipulator network.
If the remainder of the divided permutation's source addresses in groups 1 and 2 and then 3 and 4 are assigned to the inputs of stage 1 so that the source address from the lower numbered group is always assigned to the upper input of the designated switching element and the source address from the higher numbered group is always assigned to the lower input of that switching element, the interconnection pattern that results will be identical to the interconnection pattern connecting stages 0 and 1 in a Modified Data Manipulator network.
Similarly, if these 4 groups composing the reordered permutation are then divided into two groups each and the source addresses from groups 1 and 2 and from groups 3 and 4 and so on are paired together and assigned to the inputs of stage 2 in the manner set forth above, then the interconnection pattern which results will be that which connects stages 1 and 2 of a Modified Data Manipulator network. Therefore, in general, if the permutation groups which compose the reordered permutation are recursively divided into two groups each and the source addresses are paired together and are assigned to switching element inputs in an orderly fashion as specified above, the interconnection patterns which result will be those which connect the stages of a Modified Data Manipulator network.
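The recursive division can be sketched in Python as shown below. The sketch is illustrative only: the function name `stage_pairings` is hypothetical, N is assumed to be a power of two, and the numbering of switching elements within stages 2 and beyond (consecutive elements for consecutive pairs, taken group pair by group pair) is an assumption extrapolated from the stage 0 and stage 1 assignments described in the text.

```python
from typing import List, Sequence, Tuple


def stage_pairings(source_of_dest: Sequence[int]) -> List[List[Tuple[int, int]]]:
    """Return, for each stage, the (upper, lower) source pair of each element.

    source_of_dest[d] is S(d). At stage t the reordered permutation is split
    into 2**(t+1) groups; adjacent groups are paired member by member, the
    lower numbered group supplying the upper input.
    """
    n_inputs = len(source_of_dest)
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("N must be a power of two and at least 2")
    stages: List[List[Tuple[int, int]]] = []
    size = n_inputs // 2  # group size at the current stage
    while size >= 1:
        pairs: List[Tuple[int, int]] = []
        for base in range(0, n_inputs, 2 * size):  # start of each group pair
            for k in range(size):
                pairs.append((source_of_dest[base + k],
                              source_of_dest[base + size + k]))
        stages.append(pairs)  # list index of a pair = switching element number
        size //= 2
    return stages
```

For S(D) = [3, 0, 2, 1], the sketch yields [(3, 2), (0, 1)] for stage 0 and [(3, 0), (2, 1)] for stage 1. In every pair the two destinations differ only by the current group size, which is why the lower numbered group can always take the upper input.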
To summarize, the inputs of a Modified Data Manipulator network may be reordered to allow that network to pass any specified permutation. This reordering of the network inputs can be determined by first reordering the permutation that must be passed and then dividing that permutation into two groups. Source addresses from the two groups are paired together and are assigned in a particular fashion to the network's inputs, thereby specifying the reordering of that network's inputs. Additionally, if this is performed recursively, the resulting interconnection patterns are identical to those that connect the stages of a Modified Data Manipulator network. For these reasons, the Modified Data Manipulator is a particularly well suited BPC MIN for the application of reordering inputs in order to pass an arbitrary permutation.
An example is now presented in which the inputs of a Modified Data Manipulator network are reordered to allow that network to pass an arbitrary permutation. This example also illustrates how the recursive process of dividing the groups composing the reordered permutation into two groups each actually specifies the interconnection patterns which connect the stages of a Modified Data Manipulator network. The objective of this example is to reorder the inputs of the 16 x 16 Modified Data Manipulator network depicted in Fig. 6 so that it can pass the arbitrary permutation given by
Source Address S:          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Destination Address D(S):  9  4  2  5  7  0 13 15  1  8  6  3 14 10 11 12
For the Modified Data Manipulator of Fig. 6,
N = 16
n = 4
O = (1, 2, 3, 0) and
I = (0, 1, 2, 3).
Before reordering the network's inputs so that it can pass the above permutation, it is first necessary to reorder that permutation so that the destination addresses appear in ascending order. The reordered permutation is given by
Destination Address D:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Source Address S(D):    5  8  2 11  1  3 10  4  9  0 13 14 15  6 12  7
This reordered permutation is now divided into two groups as depicted in Fig. 25. Group 1 is composed of source addresses 90 S(0) through S(N/2 - 1) = S(7) and the destination addresses 95 associated with those source addresses. Group 2 is composed of source addresses S(N/2) = S(8) through S(N-1) = S(15) and the destination addresses associated with those source addresses.
As indicated by the arrows in Fig. 25, source address S(0) = 5 from group 1 is paired with source address S(N/2) = S(8) = 9 from group 2 and these two source addresses are assigned to crossbar switching element 0 of stage 0 of the Modified Data Manipulator network depicted in Fig. 6. Similarly, source address S(1) = 8 from group 1 is paired with source address S(N/2 + 1) = S(9) = 0 from group 2 and these two source addresses are assigned to crossbar switching element 1 of stage 0. This process is continued until all of the source addresses in groups 1 and 2 have been assigned to a crossbar switching element input at stage 0 of the Modified Data Manipulator network.
Fig. 29 depicts this assignment of source addresses 40 to the network inputs of the leftmost stage 60 which produces the input reordering scheme 70 that is to be used to pass the desired permutation through the Modified Data Manipulator network.
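The stage 0 assignment for this example can be reproduced with a few lines of Python. The sketch is illustrative only: the variable names are hypothetical, and the convention of listing the group 1 address first (upper input) and the group 2 address second (lower input) for each switching element is assumed.

```python
# Example permutation from the text: dest_of_source[s] is D(s).
dest_of_source = [9, 4, 2, 5, 7, 0, 13, 15, 1, 8, 6, 3, 14, 10, 11, 12]

# Reorder so that destination addresses ascend: source_of_dest[d] is S(d).
source_of_dest = sorted(range(16), key=lambda s: dest_of_source[s])

# Pair the k-th address of group 1 with the k-th address of group 2;
# the position of a pair in the list is its stage 0 switching element number.
stage0_pairs = list(zip(source_of_dest[:8], source_of_dest[8:]))

# Flatten the pairs to obtain the source addresses at the stage 0 inputs,
# listed from top to bottom.
input_order = [s for pair in stage0_pairs for s in pair]
```

Under these assumptions `stage0_pairs` begins with (5, 9) and (8, 0), in agreement with the pairings stated above, and `input_order` is 5, 9, 8, 0, 2, 13, 11, 14, 1, 15, 3, 6, 10, 12, 4, 7.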
It should be noted that for this example, the source addresses from group 1 are always assigned to the upper input (i.e., input 0) of the designated stage 0 crossbar switching element and the source addresses from group 2 are assigned to the lower input (i.e., input 1) of that crossbar switching element. Although the source addresses from group 1 and group 2 may be assigned to either input of the designated crossbar switching element, if they are assigned in the way they have been in this example, the Modified Data Manipulator network will pass the desired permutation if all of the crossbar switching elements in stage 0 are set to state 0 (i.e., the pass-through state).
If the two groups depicted in Fig. 25 are recursively divided into two groups each, it can be demonstrated how the process of recursively dividing permutation groups and assigning the source addresses of those groups to switching element inputs actually specifies the interconnection patterns which connect the stages of a Modified Data Manipulator network. If groups 1 and 2 as depicted in Fig. 25 are each divided into two groups, a total of four groups will result; these groups are denoted as groups 1, 2, 3, and 4 and are depicted in Fig. 26. The source addresses of groups 1 and 2 are then paired together and the source addresses of groups 3 and 4 are paired together as depicted by the arrows in Fig. 26.
In particular, source address S(0) = 5 from group 1 is paired with source address S(N/4) = S(4) = 1 from group 2 and these two source addresses are assigned to crossbar switching element 0 of stage 1. Similarly, the remaining three pairs of source addresses from groups 1 and 2 of Fig. 26 are assigned to crossbar switching elements 1, 2, and 3, respectively, of stage 1. Additionally, the four pairs of source addresses from groups 3 and 4 are assigned to crossbar switching elements 4, 5, 6 and 7, respectively, of stage 1. If the source addresses which appear at the outputs of the stage 0 crossbar switching elements are each assigned to a stage 1 input, this assignment pattern produces an interconnection pattern between stage 0 and stage 1.
More specifically, for any pair of source addresses taken from any two groups, if the source address which originates from the group designated by the lower number (e.g., group 3 is designated by a lower number than group 4) is always assigned to the upper input (i.e., input 0) of the designated switching element and the source address from the higher numbered group is always assigned to the lower input (i.e., input 1) of that crossbar switch, then the interconnection pattern that results between stages 0 and 1 is identical to the interconnection pattern which connects stages 0 and 1 of the 16 x 16 Modified Data Manipulator network of Fig. 6.
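The stage 1 pairing for this example can be written out as a short Python sketch. It is illustrative only; the variable names are hypothetical, and each pair is listed as (upper input, lower input) following the lower numbered group to upper input convention described above.

```python
# Reordered permutation of the example: source_of_dest[d] is S(d).
source_of_dest = [5, 8, 2, 11, 1, 3, 10, 4, 9, 0, 13, 14, 15, 6, 12, 7]

# Divide into four groups of N/4 = 4 source addresses each.
quarter = len(source_of_dest) // 4
groups = [source_of_dest[g * quarter:(g + 1) * quarter] for g in range(4)]

# Groups 1 and 2 feed stage 1 switching elements 0 through 3;
# groups 3 and 4 feed stage 1 switching elements 4 through 7.
stage1_pairs = list(zip(groups[0], groups[1])) + list(zip(groups[2], groups[3]))
```

Here `stage1_pairs` is (5, 1), (8, 3), (2, 10), (11, 4), (9, 15), (0, 6), (13, 12), (14, 7), so switching element 0 of stage 1 receives source addresses 5 and 1 as stated in the text.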
The interconnection patterns between stages 1 and 2 and between stages 2 and 3 can be similarly generated by further dividing the 4 groups depicted in Fig. 26 into a total of 8 and 16 groups, respectively, as depicted in Fig. 27 and Fig. 28, respectively. Source addresses from two of these groups are then paired together as depicted by the arrows in Figs. 27 and 28 and these pairs of source addresses are assigned in an orderly fashion as set forth above to the crossbar switching elements in stages 2 and 3, respectively. If the source addresses from the lower numbered group of the two groups being paired are always assigned to the upper input (i.e., input 0) of the designated crossbar switching element, then the interconnection patterns that result are identical to those that connect stages 1 and 2 and stages 2 and 3, respectively, of the Modified Data Manipulator network depicted in Fig. 6.
The Modified Data Manipulator network whose inputs have been reordered to pass the permutation under consideration is depicted in Fig. 29. The network of Fig. 29 will pass the permutation of this example if all crossbar switching elements of that network are set to state 0 (i.e., the pass-through state).
It should be noted that any BPC MIN can be used to simulate and replace the Modified Data Manipulator network of Fig. 6. In this way, any BPC MIN may be configured to replace the Modified Data Manipulator network of Fig. 29 and may therefore be used to pass any arbitrary permutation.
Possible Implementations of Determining a BPC MIN to Pass a Specified Permutation
Illustrative apparatus for implementing the invention is shown in Figs. 30 and 31. As depicted in Fig. 30, the apparatus comprises a MIN 215 and an input permutation matrix 220. Switching elements 230, inputs 240, outputs 250, stages 260 of MIN 215 and communication links 270 are the same as those of any one of the networks depicted in Figs. 1-6 and bear the same numbers incremented by 200. The input permutation matrix 220 has the capability of connecting any one of its inputs to any one of its outputs. The input permutation matrix implements the reordering of the inputs of the Modified Data Manipulator network to allow that network to pass any specified permutation.
Illustratively, as shown in Fig. 31, the input permutation matrix 220 comprises an array of multiplexers 222, one for each output of the matrix. Thus, for the specific embodiment shown in Figs. 30 and 31, the input permutation matrix comprises sixteen sixteen-to-one multiplexers 222. Each of the sixteen multiplexers has sixteen inputs, one from each of the inputs to the matrix, and a single output to a unique one of the sixteen outputs of the matrix. One of the inputs to each of the multiplexers is selected for connection to the multiplexer output by means of four control lines. The signals on the four control lines to each of the sixteen multiplexers are generated by control logic 223, which computes, for the permutation which is intended to pass through the network, which input is to be connected to which output. The control logic then generates the control signals which cause the multiplexer connected to that output to select the appropriate input. As will be apparent to those skilled in the art, numerous alternatives may be employed in the practice of this invention. Other ways may be found to determine a network capable of passing any specified permutation.
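One way the control logic's computation might be modeled in software is sketched below in Python. This is a hedged illustration rather than a description of control logic 223 itself: the function name `mux_select_words` is hypothetical, and it is assumed that matrix input j carries source address j, that matrix output p feeds network input p, and that each multiplexer's control lines carry the binary index of the input to be selected.

```python
from typing import List, Sequence


def mux_select_words(input_order: Sequence[int]) -> List[str]:
    """Return one binary select word per multiplexer (one per matrix output).

    input_order[p] is the source address that must appear at network
    input p. The select word for multiplexer p is that address in binary,
    padded to the number of control lines (4 for sixteen inputs).
    """
    n = len(input_order)
    if sorted(input_order) != list(range(n)):
        raise ValueError("input_order must be a reordering of 0..N-1")
    width = max(1, (n - 1).bit_length())  # number of control lines
    return [format(src, "0{}b".format(width)) for src in input_order]
```

For a four-input matrix, `mux_select_words([3, 2, 0, 1])` returns `['11', '10', '00', '01']`; with sixteen inputs each word has four bits, matching the four control lines per multiplexer.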

Claims

What is claimed is:
1. In a multi-stage interconnection network having a plurality of inputs, a plurality of outputs, a plurality of stages of switching elements and a plurality of communication links between successive stages as well as between the inputs and a first stage of switching elements and between a last stage of switching elements and the outputs, a method of routing messages in said multi-stage interconnection network comprising the steps of:
determining a first vector comprising the steps of:
numbering the inputs and outputs of the interconnection network sequentially in binary notation;
determining a mapping between the inputs and outputs of said interconnection network in the form of a series of permutations performed successively on the digits of the binary notation identifying the inputs, said permutations specifying interconnection patterns established by the communication links between successive stages of the network and between the inputs and a first stage and between a last stage and the outputs, said permutations shifting each digit of the binary notation identifying the inputs into the least significant bit position at one of the stages of the multi-stage interconnection network, there being n stages of switching elements numbered from a stage 0 to a stage n-1; and
setting the first vector equal to a vector of n elements, the value of each element being equal to the number of the stage at which the bit in that same position in the binary notation identifying the outputs of the multistage network was shifted into the least significant bit position during said mapping;
determining an inverse of the first vector;
calculating routing tags for the messages by permuting the destination addresses of the messages by the inverse of the first vector; and
routing the messages through the interconnection network using said routing tags.
2. A multi-stage interconnection network comprising:
N inputs,
N outputs,
n stages of 2x2 switches where $n = \log_2 N$ and there are N/2 such switches in each stage,
N communication links between each pair of successive stages, between the inputs and a first stage and between a last stage and the outputs, the inputs and outputs of said interconnection network being numbered sequentially in binary notation and the switching elements being numbered from a stage 0 to a stage n-1;
a mapping between the inputs and outputs of each interconnection network being defined by a series of permutations performed successively on the digits of the binary notation identifying the inputs, said permutations specifying interconnection patterns established by the communication links between successive stages of the network and between the inputs and a first stage and between a last stage and the outputs, the permutations shifting each digit of the binary notation identifying the inputs into the least significant bit position at one of the stages of the multistage interconnection network; and
means for calculating routing tags for routing messages through said interconnection network to a destination address comprising:
for each bit position in the binary notation identifying the outputs of the multistage network, means for determining a first vector from the number of the stage at which the bit in that position was shifted into the least significant bit position;
means for determining an inverse of the first vector; and
means for permuting the destination address with the inverse of the first vector.
PCT/US1991/005667 1990-08-10 1991-08-09 Method and apparatus for routing and partitioning a multistage interconnection network and for determining network passability WO1992003792A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US56558490A 1990-08-10 1990-08-10
US565,584 1990-08-10

Publications (1)

Publication Number Publication Date
WO1992003792A1 true WO1992003792A1 (en) 1992-03-05

Family

ID=24259272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/005667 WO1992003792A1 (en) 1990-08-10 1991-08-09 Method and apparatus for routing and partitioning a multistage interconnection network and for determining network passability

Country Status (1)

Country Link
WO (1) WO1992003792A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4523273A (en) * 1982-12-23 1985-06-11 Purdue Research Foundation Extra stage cube
US5050069A (en) * 1987-04-27 1991-09-17 Thinking Machines Corporation Method and apparatus for simulating m-dimension connection networks in and n-dimension network where m is less than n
US4968977A (en) * 1989-02-03 1990-11-06 Digital Equipment Corporation Modular crossbar interconnection metwork for data transactions between system units in a multi-processor system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0676703A2 (en) * 1994-04-04 1995-10-11 International Business Machines Corporation A technique for accomplishing deadlock free routing through a multi-stage cross-point packet switch
EP0676703A3 (en) * 1994-04-04 1996-02-07 Ibm A technique for accomplishing deadlock free routing through a multi-stage cross-point packet switch.

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE

NENP Non-entry into the national phase

Ref country code: CA