WO2017158027A1 - Optical switch architecture - Google Patents

Optical switch architecture

Info

Publication number
WO2017158027A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
switches
array
spine
leaf
Prior art date
Application number
PCT/EP2017/056129
Other languages
French (fr)
Inventor
Thomas Schrans
Cyriel Minkenberg
Nathan Farrington
Andrew Rickman
Original Assignee
Rockley Photonics Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/072,314 (US9706276B2)
Priority claimed from PCT/GB2016/051127 (WO2016170357A1)
Priority claimed from GB1611433.2A (GB2549156B)
Priority claimed from PCT/EP2016/076755 (WO2017077093A2)
Application filed by Rockley Photonics Limited
Priority to GB1816669.4A (GB2564354B)
Publication of WO2017158027A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/0001 Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005 Switch and router aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/0001 Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005 Switch and router aspects
    • H04Q2011/0037 Operation
    • H04Q2011/005 Arbitration and scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/0001 Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005 Switch and router aspects
    • H04Q2011/0052 Interconnection of switches
    • H04Q2011/0056 Clos

Definitions

  • the present invention relates to optoelectronic switches, and in particular to the topology according to which the constituent switching elements are arranged within that switch.
  • Large-scale packet switches can be built in a scalable fashion from smaller switching elements by connecting the switching elements according to the interconnection pattern of a given network topology.
  • network topologies are Folded Clos networks (also called k-ary n-trees), Torus (also called k-ary n-cubes) and "RPFabric topologies" such as those topologies disclosed in PCT/GB2016/051127.
  • an example of a known network topology is a Folded Clos topology.
  • in an L-dimensional Folded Clos topology made up of switching elements having a radix R, the maximum number of endpoints is given by:
  • an RPFabric topology having L dimensions has a maximum number of endpoints given by:
  • embodiments of the present invention provide an optoelectronic switch architecture which provides incremental network scalability while minimizing the number of unused ports on the constituent switching elements.
  • embodiments of the present invention achieve this by utilizing the concept of link bundling (also known as "link aggregation", "parallel linking", or "link trunking").
  • Link bundling is a technique wherein two or more physical ports on a given switching element are treated equivalently in terms of packet forwarding, which allows more generalized topologies, leading to greater efficiency at a finer granularity of switch configurations.
  • the signals transferred from the input to the output device may be either optical or electronic signals, since it is not this feature which is at the heart of embodiments of the invention, rather it is the arrangement of switching elements within the optoelectronic switch which achieves the advantageous technical effects. This is described in greater detail in the remainder of the application.
  • a first aspect of the present invention provides an optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch including:
  • each leaf switch is a member of L sub-arrays, each of the L sub-arrays associated with a different one of the L dimensions, and including:
  • each leaf switch having C client ports for connecting to an input device or an output device, and F fabric ports for connecting to the spine switches,
  • each of the L sub-arrays being connected to a plurality of S_i spine switches, each having R fabric ports for connecting to the fabric ports of the leaf switches; and wherein, in a given sub-array associated with the reduced dimension, the spine switches each have:
  • By providing more than one connection between a given spine switch and leaf switch ("link bundling"), the two connections or links are treated equivalently in terms of e.g. packet forwarding.
  • As discussed in the Background section above, when a reduced-size or underpopulated dimension is used in a switch architecture employing constant-radix switching elements, there are unused ports leading to inefficiency.
  • the same connectivity i.e. in terms of bisection bandwidth and path diversity can be achieved by reducing the number of spine switches and employing link bundling.
  • the number of leaf switches in a given sub-array is therefore greater than the number of spine switches connected to that sub-array.
  • any leaf switch i.e. a source leaf switch
  • any other leaf switch in the array i.e. a final destination leaf switch
  • a hop is the transfer of a signal from one leaf switch in a sub-array to another leaf switch in the same sub-array, the transfer taking place via a spine switch connected to that sub-array.
  • This is possible because the leaf switches are able to act as intermediate switching elements, each of which can forward a signal coming into one of its fabric ports to another of its own fabric ports. This internal forwarding may be performed by an integrated switch inside the leaf switch, e.g.
  • data can perform a hop from one leaf switch to another leaf switch (via a spine switch), and then an internal electronic hop within the leaf switch to another fabric port, and then a second hop, along a different dimension (i.e. in a different sub-array of which the (intermediate) leaf switch is also a member). This process may be repeated up to L times, until the data reaches the final destination leaf switch, wherein it is then transferred to an output device, via a client port on that leaf switch.
  • Optoelectronic switches include a plurality of sub-arrays, and more specifically, the number of sub-arrays associated with each dimension (i.e. the i-th dimension) is given by the product of the sizes of all the dimensions bar the dimension in question, or:
  • the total number of sub-arrays in the whole optoelectronic switch is given by the sum of the number of sub-arrays for each dimension, over all L dimensions:
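The two counts described above can be illustrated with a short sketch. The function names and the example dimension sizes (8, 8, 6) are hypothetical and chosen only for illustration; the sketch simply applies the stated rule (product of all other dimension sizes, then a sum over all L dimensions).

```python
from math import prod


def subarrays_per_dimension(sizes):
    """For each dimension i, count the sub-arrays associated with it.

    sizes[i] is the size R_i of dimension i. The count for dimension i is
    the product of the sizes of every dimension except i.
    """
    return [
        prod(size for j, size in enumerate(sizes) if j != i)
        for i in range(len(sizes))
    ]


def total_subarrays(sizes):
    """Sum the per-dimension sub-array counts over all L dimensions."""
    return sum(subarrays_per_dimension(sizes))


# Hypothetical 3-dimensional array in which the last dimension is reduced.
sizes = [8, 8, 6]
per_dimension = subarrays_per_dimension(sizes)
total = total_subarrays(sizes)
```

For a one-dimensional array the product is taken over an empty set of dimensions, so the sketch reports a single sub-array, which agrees with the one-dimensional example discussed for Fig. 1.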
  • the layout or structure i.e. the interconnectivity between the spine switches and the leaf switches is identical or substantially identical for each sub-array associated with a given dimension.
  • Such embodiments are easier to manufacture, since only one dimension is reduced in size.
  • the layout or structure of all of the sub-arrays associated with the dimensions having equal size may be identical or substantially identical.
  • the control process i.e. to determine the path which a given signal takes when traversing a given sub-array, is simplified as the same process can be applied to a plurality of sub-arrays, and a bespoke control process is not required for switching in different dimensions. Details of the methods by which the switching may be controlled may be found later in the application.
  • the aggregate client port bandwidth per leaf switch is equal to the fabric port bandwidth available per dimension. Thus, if all of the ports on the switching element have the same bandwidth, this means that one fabric port per dimension should be provided for each client port.
  • the switching elements may be oversubscribed, i.e., there may be fewer than one fabric port per dimension for each client port, or the switching elements may be overprovisioned, i.e., there may be more than one fabric port per dimension for each client port.
  • the value of R is a number which is evenly divisible by 2, 3, 4, 5 or 6. In a subset of these embodiments, the value of R is divisible by more than one of 2, 3, 4, 5, and 6. For example, R may be equal to 12, 24, 30, 36, or 60.
  • the number of unused ports is minimized, where "unused" refers to fabric ports on the spine switches which are not connected to any fabric ports on any other spine switches or leaf switches (though spine switches may in any event not be connected to other spine switches). Accordingly, in some embodiments, for a given sub-array associated with the reduced dimension, all of the fabric ports included on the plurality of S_i spine switches connected to the sub-array are connected to a fabric port on a leaf switch in that sub-array.
  • it may not always be possible to arrange for each of the fabric ports included on the plurality of S_i spine switches to be connected to a respective fabric port on a leaf switch in that sub-array.
  • At least one spine switch of the plurality of S_i spine switches connected to the sub-array has a plurality of connections to each of the R_i leaf switches.
  • one of the spine switches may have two or three connections to each of the leaf switches.
  • all of the spine switches connected to the sub-array may have a plurality of connections, e.g. two or three, to each of the leaf switches in the sub-array. The greater the extent to which the reduced dimension is reduced in size relative to the other dimensions, the greater the number of connections which the spine switches may have to each of the leaf switches.
  • At least one spine switch, or alternatively each spine switch, connected to a given sub-array (associated with the reduced dimension) may have the same number of connections to each leaf switch in the array. This is possible when the number of client ports per leaf switch on the sub-array is divisible by the number of spine switches connected to the sub-array, with integer result. Such embodiments have a high degree of topological regularity, and therefore associated advantages in terms of routing and load balancing. In other embodiments, the number of connections may not be uniform across all of the leaf switches. This is the case when the number of client ports per leaf switch on the sub-array is not divisible by the number of spine switches connected to the sub-array. In these cases, each spine switch connected to a given sub-array associated with the reduced dimension may have:
  • the first number is the same for all of the spine switches
  • the second number is the same for all of the spine switches
  • the first number is greater than the second number
  • the first subset of leaf switches is disjoint from the second subset of leaf switches.
  • “disjoint” means that, for a given spine switch, the first subset and the second subset of leaf switches have no members in common.
  • the constituents of the first and second subset of leaf switches for one spine switch may be different from the constituents of the first and second subset of leaf switches for another spine switch, as long as there are the same numbers of leaf switches in each.
  • These groups of connections may be referred to as "bundles" or "link bundles", and may contain one connection.
  • the first number is greater than the second number by one. By having the first number and the second number as close as possible, the degree of topological regularity is maximized for those embodiments in which it is not possible to have equal numbers of connections to each leaf switch.
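One way to obtain bundle sizes that differ by at most one can be sketched as follows. This is only an illustrative round-robin assignment under stated assumptions: the function name and parameters are hypothetical, `fabric_ports_per_leaf` means the fabric ports a leaf devotes to the dimension in question, and the text does not prescribe how the larger bundles are chosen. The sketch does not check the spine radix; the caller is assumed to pick values that fit.

```python
def bundle_sizes(num_leaves, num_spines, fabric_ports_per_leaf):
    """Return links[leaf][spine], the number of parallel links per pair.

    Every leaf gives each spine `base` links; the `extra` leftover links of
    a leaf go to consecutive spines starting at an offset that rotates with
    the leaf index, so that the larger bundles are spread over the spines.
    """
    base, extra = divmod(fabric_ports_per_leaf, num_spines)
    links = [[base] * num_spines for _ in range(num_leaves)]
    for leaf in range(num_leaves):
        for k in range(extra):
            links[leaf][(leaf + k) % num_spines] += 1
    return links


# Hypothetical sub-array: 6 leaves, 3 spines, 4 fabric ports per leaf.
links = bundle_sizes(num_leaves=6, num_spines=3, fabric_ports_per_leaf=4)
ports_used_per_spine = [sum(row[s] for row in links) for s in range(3)]
```

With these example values each spine ends up with one connection to four of the leaf switches and two connections to the remaining two, so the "first number" and "second number" differ by one. When the port count divides evenly, `extra` is zero and every bundle has the same size.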
  • the one connection may be a bidirectional connection, which may be in the form of a single cable or wire containing two bundled optical fibres, in other words a bidirectional connection providing physical media allowing full-duplex communication.
  • the spine switches connected to the sub-array are divided into a first subset and a second subset which is disjoint from the first subset, wherein:
  • each of the spine switches in the first subset of spine switches has:
  • each of the spine switches in the second subset of spine switches has:
  • the first number is the same for all of the spine switches connected to the first subset of spine switches
  • the second number is the same for all of the spine switches connected to the first subset of spine switches
  • the first number is greater than the second number
  • the third number is greater than the fourth number.
  • where the first and second subsets of spine switches are "disjoint", this means that no spine switch is a member of both.
  • by "the second subset of leaf switches is disjoint from the first subset, with respect to each spine switch in the first subset of spine switches", it is meant that for a given spine switch in the first subset of spine switches, the first and second subsets of leaf switches have no members in common.
  • a leaf switch which is in the first subset for a first spine switch in the first subset of spine switches to be in the following:
  • the first number may be greater than the second number by one, and/or the third number may be greater than the fourth number by one.
  • a spine switch may connect to leaf switches in more than one sub-array. More specifically, embodiments of a second aspect of the present invention provide an optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch including:
  • each leaf switch having an associated L-tuple of coordinates (x_1, ..., x_L) giving its location with respect to each of the L dimensions;
  • each leaf switch is a member of L sub-arrays, each being associated with a different one of the L dimensions, and including R leaf switches whose coordinates differ only in respect of the i-th dimension, each sub-array further connected to a spine switch connected to all of the leaf switches in the sub-array, wherein for a given sub-array:
  • each leaf switch in the sub-array has:
  • C client ports each for connecting to an input device or an output device
  • the spine switch has fabric ports for connecting to fabric ports of leaf switches, and
  • the spine switch has connections to:
  • At least one leaf switch in a second sub-array associated with the same dimension as the first sub-array.
  • a single spine switch may connect to all leaf switches in a plurality of sub-arrays, each sub-array associated with the same dimension.
  • additional connectivity since the consolidated spines permit movement along two dimensions in a single hop. This can therefore also shorten the average path length (where the path length is the smallest number of hops that may be used to send a signal from a source leaf switch to its final destination leaf switch).
  • Each "hop" is a transfer of data directly between two switches. For example, if a packet of data is sent from a first leaf switch to a first spine switch, and from there to a second leaf switch, the packet has executed two hops.
  • one spine associated with the i-th dimension can be used to connect up to x sub-arrays along a second dimension j ≠ i.
  • "along a second dimension" does not mean that the sub-arrays are associated with a different dimension, but that a second dimension is traversed in order to connect to the leaf switches in e.g. an adjacent sub-array. This is shown visually later in the application.
  • a spine switch connected to a first sub-array may be connected to a leaf switch (or plurality of leaves) in a second sub-array (in addition to all of the leaf switches in the first sub-array) associated with the same dimension as the first.
  • the spine switch may be connected to all leaf switches having the same co-ordinate in the dimension in question.
  • each sub-array may be connected to a plurality of spine switches, wherein each spine switch connected to a given sub-array associated with the reduced dimension may have a connection to each leaf switch in the sub-array and a plurality of connections to at least one leaf switch in the sub-array. Accordingly, any of the optional features presented above with reference to embodiments of the first aspect of the present invention may also apply to embodiments of the second aspect of the invention, to the extent that they are compatible.
  • the leaf switches may contain a packet processor configured to perform packet fragmentation, wherein packets of data having the same next destination switch module (i.e. those packets which are intended for the same leaf switch after the next hop, whether that leaf switch module be the final destination or just the next intermediate switch module in the journey of that packet of data) are arranged into frames having a predetermined size, and wherein packets of data may be split up into a plurality of packet fragments, which are then arranged in a corresponding plurality of frames.
  • one frame may contain data from more than one packet of data.
  • Each packet fragment may have its own packet fragment header which includes information at least identifying the packet to which that packet fragment originally belonged, so that the packet may be reconstructed when all of its constituent fragments reach their final destination module.
  • a first frame may include the 400B packet, and 600B of the first 800B packet, and then a second frame may include the second 800B packet and the remaining 200B of the first 800B packet. This leads to an efficiency of 100%.
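The fragmentation scheme can be sketched as a greedy packer that fills fixed-size frames and splits a packet whenever it does not fit in the space left. This is a minimal illustration, not the packet processor's actual logic: the function name, the fragment representation and the 1000 B frame size are assumptions (the frame size is not stated above; 1000 B is chosen because it lets one 400 B and two 800 B packets fill two frames exactly).

```python
def pack_into_frames(packet_sizes, frame_size):
    """Greedily fill fixed-size frames, splitting packets across frames.

    Returns a list of frames. Each frame is a list of fragments of the form
    (packet_index, offset, length); packet_index and offset play the role of
    the fragment header that lets the receiver reassemble the packet.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    frames, current, free = [], [], frame_size
    for index, size in enumerate(packet_sizes):
        offset = 0
        while offset < size:
            take = min(free, size - offset)
            current.append((index, offset, take))
            offset += take
            free -= take
            if free == 0:  # frame is full: close it and start a new one
                frames.append(current)
                current, free = [], frame_size
    if current:
        frames.append(current)  # the last frame may be only partly filled
    return frames


frames = pack_into_frames([400, 800, 800], frame_size=1000)
payload = sum(length for frame in frames for _, _, length in frame)
efficiency = payload / (len(frames) * 1000)
```

In this sketch the fragments appear in arrival order inside each frame; the ordering within a frame is a design choice that the text leaves open.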
  • the frames that are constructed by this process represent packets of data in their own right, and so further fragmentation may occur at intermediate switch modules, when the packet undergoes more than one hop (e.g., more than one optical hop) in order to reach the destination switch module.
  • subsequent processing of a frame may not occur until the filling proportion of a frame reaches a set or predetermined threshold, e.g. more than 80%, more than 90%, or when the frame is filled to 100%.
  • the packets may alternatively be sent for subsequent processing after a set or predetermined amount of time has elapsed. In this way, if packets of data for a given switch module cease to arrive at the packet processor, a frame which is still below the threshold filling proportion may still be sent for subsequent processing rather than lying stagnant on the packet processor.
  • the set or predetermined amount of time may be between 50 and 1000ns, or between 50 and 200ns.
  • the time interval is approximately 100ns.
  • the packet processor may include or be associated with a transmission side memory in which to temporarily store incomplete frames during their construction.
  • the set or predetermined amount of time may be varied depending upon traffic demand; typically, the higher the rate of traffic flow, the shorter will be the set or predetermined amount of time and lower rates of traffic flow may lead to an increase in the set or predetermined amount of time.
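The two release conditions for a frame (fill threshold reached, or timer expired) can be combined in a small predicate. This is a sketch under assumptions: the function and parameter names are hypothetical, the defaults of 90% and 100 ns are simply picked from the ranges quoted above, and the threshold comparison is written as "greater than or equal", which is one possible reading.

```python
def should_send_frame(filled_bytes, frame_size, age_ns,
                      fill_threshold=0.9, timeout_ns=100):
    """Decide whether a frame under construction should be released.

    filled_bytes: payload currently in the frame
    age_ns: time since the first fragment was placed in the frame
    """
    if filled_bytes == 0:
        return False  # nothing to send yet
    if filled_bytes / frame_size >= fill_threshold:
        return True   # frame is full enough
    return age_ns >= timeout_ns  # otherwise send only once the timer expires
```

A traffic-dependent timeout, as described above, could be modelled by passing a smaller `timeout_ns` when the arrival rate is high and a larger one when it is low.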
  • the leaf switches may correspondingly include another packet processor, which may be the same as the first packet processor, or may be a different packet processor, which is arranged to recombine the packet fragments upon receiving them, to recreate the original packet of data for subsequent processing and transmission.
  • Leaf switches may be configured to operate in burst mode, in which the leaf switches send data (e.g. in the form of packets, packet fragments or frames as described above) in a series of successive bursts, each burst containing only data having the same next destination leaf switch. Each successive burst may include a frame of data having a different next destination leaf switch. Pairs of sequential bursts may be separated by a predetermined time interval between 50 and 1000ns, or between 50 and 200ns, e.g. 100ns. All of the leaf switches sending signals within a given sub-array may be able to "fire" a burst synchronously.
  • That sub-array may include an arbiter, configured to control the operation of the spine switches connected to that sub-array, based on destination information contained in the data to be transferred.
  • This control allows the provision of a route which can ensure that all data reaches its next destination leaf switch in a non-blocking fashion to minimize bottlenecking.
  • the arbiter may be connected to a packet processor in each of the leaf switches, either directly or via a controller, or the like. When, for example, a packet of data is received by a leaf switch, a request is sent by the packet processor to the arbiter. The request may optionally identify the next destination leaf switch of a given packet of data.
  • the arbiter is configured to establish a scheme which ensures that, to the greatest extent possible, each packet is able to perform its next hop.
  • the arbiter may accordingly be configured to perform a bipartite graph matching algorithm in order to calculate pairings between the inputs and outputs of the spine switches, such that each input is paired with at most one output, and vice versa.
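The text does not name a particular bipartite matching algorithm. As one illustration, the sketch below uses the classic augmenting-path method for maximum bipartite matching; the function name and the request format (a list of candidate outputs per input) are assumptions made for this example only.

```python
def match_requests(requests, num_outputs):
    """Pair spine-switch inputs with outputs, each used at most once.

    requests[i] lists the output ports for which input i has data waiting.
    Returns a dict mapping each granted input to its output.
    """
    owner = [None] * num_outputs  # owner[o] = input currently granted output o

    def try_grant(i, seen):
        # Try to give input i some output, re-homing earlier grants if needed.
        for o in requests[i]:
            if o in seen:
                continue
            seen.add(o)
            if owner[o] is None or try_grant(owner[o], seen):
                owner[o] = i
                return True
        return False

    for i in range(len(requests)):
        try_grant(i, set())
    return {i: o for o, i in enumerate(owner) if i is not None}


# Input 1 can only use output 0, so input 0 is moved to output 1.
grants = match_requests([[0, 1], [0], [1, 2]], num_outputs=3)
```

Inputs that remain unmatched correspond to requests that cannot be met in the current round and would be held back for a later one.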
  • there may be an arbiter associated with each spine switch which is configured to control the routing of signals from the inputs to the outputs of the spine switch.
  • Each spine with its respective arbiter may be able to operate independently of the other spine switches connected to the sub-array. Naturally, in some cases, where e.g. several leaf switches send large amounts of data all of which is intended for the same output of a given spine switch, the request cannot be met.
  • the arbiter may be configured to store information relating to requests that cannot be met, in a request queue. Then, until these requests are met, the associated data is buffered on the corresponding leaf switch, e.g. in the packet processor or in a separate memory. In this way, requests that cannot be met are delayed rather than dropped, e.g. when a local bottleneck occurs at one or more of the spine switches.
  • the arbiter maintains the state of a buffer memory or a virtual output queue (VOQ) on the leaf switches or spine switches; this state can be in the form of counters (counting e.g.
  • the route may be deduced entirely from a comparison between the coordinates of the source leaf switch and the final destination leaf switch. For example, in a process known as dimension ordered routing, the first hop may match the first coordinate of the source and final destination leaf switches, the second hop may match the second coordinate of the source and final destination leaf switches and so on, until all of the coordinates match, i.e.
  • the dimension-ordered route might be: (a, b, c, d) -> (w, b, c, d) -> (w, x, c, d) -> (w, x, y, d) -> (w, x, y, z).
  • the packet processor may compare the coordinates of the source leaf switch against the coordinates of the final destination leaf switch, and determine which coordinates do not yet match. Then it will decide to route along the non-matching directions, e.g. with the lowest index, or the highest index.
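Dimension-ordered routing with the "lowest index first" rule can be sketched as follows. The function name is hypothetical, and coordinates are modelled as plain tuples; the sketch only lists the leaf switches visited and ignores the spine switch traversed on each hop.

```python
def dimension_ordered_route(source, destination):
    """Return the leaf coordinates visited, correcting one dimension per hop.

    Dimensions are corrected in increasing index order; a dimension whose
    coordinate already matches is skipped and costs no hop.
    """
    if len(source) != len(destination):
        raise ValueError("coordinate tuples must have the same length")
    route = [tuple(source)]
    current = list(source)
    for dim, target in enumerate(destination):
        if current[dim] != target:
            current[dim] = target
            route.append(tuple(current))
    return route


route = dimension_ordered_route(("a", "b", "c", "d"), ("w", "x", "y", "z"))
```

Routing the highest index first instead would only require iterating over the dimensions in reverse order.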
  • Fig. 1 shows an example sub-array of a switch in which the dimension shown is fully populated.
  • Fig. 2 shows an example sub-array of the switch of Fig. 1, in which the dimension shown is shortened, and is thus no longer fully populated.
  • Fig. 3 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 4 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 5 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 6 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 7 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 8 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 9 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
  • Fig. 10 shows an example of a configuration of the connections between the leaf switches in, and spine switches connected to, two different sub-arrays associated with the same dimension, according to embodiments of the second aspect of the present invention.
  • Fig. 1 shows an example of a fully-populated sub-array of a 1-dimensional array of leaf switches, though it will be apparent that the same interconnectivity may be achieved in sub-arrays of leaf switches which are part of arrays having higher dimensionality.
  • the "dimension" of an array of leaf switches connected by spine switches is one half of the diameter of the array, where the diameter is defined to be the greatest path length of the path lengths between the pairs of leaf switches in the array.
  • each spine switch has 8 fabric ports, and each of these fabric ports provides a connection to one of the 8 leaf switches. Accordingly, each leaf has 4 fabric ports, and each of these provides a connection to one of the 4 spine switches.
  • there are no unused ports on any of the spine switches because the sub-array is fully populated.
  • An example demonstrating this resulting inefficiency is shown in Fig. 2.
  • each of the leaf switches is connected to each of the spine switches, and each of the spine switches is connected to each of the leaf switches.
  • each spine switch includes a maximum of one connection to each leaf switch.
  • an inefficiency arises.
  • embodiments of the present invention address this problem by rearrangement of the links over fewer spine switches. In this way, fewer spine switches are used, and they are used more efficiently.
  • Figs. 3 to 9 are best understood from a mathematical description of the architecture of embodiments of the present invention.
  • a sub-array (x_L, ..., x_{k+1}, ..., x_1) is defined by a set of leaf switches that differ only in dimension x_k, and the sub-array includes this set of leaf switches and is connected to a set of spine switches each connected to all of the leaf switches in the sub-array.
  • a spine switch is said to be "connected to" a sub-array (and the sub-array is said to be connected to the spine switch) if and only if the spine switch is connected to at least one of the leaf switches in the sub-array.
  • a sub-array when a sub-array is associated with a dimension, the dimension may equivalently be said to be associated with the sub-array.
  • some leaf switches (denoted LS1, LS2 etc.) are connected to one spine switch (denoted SS1, SS2 etc.) and some are connected to two spine switches.
  • the connections which form the "second" connection between a leaf switch and a spine switch are shown in a thicker black line.
  • SS2 has a "second" connection to both LS2 and LS3.
  • the RPFabric which is employed in embodiments of the present invention includes spines and leaves in which the leaves are connected only to clients and spines, and the spines are connected only to leaves.
  • Each leaf switch provides C client ports and F fabric ports, where C + F ≤ R.
  • Each spine switch connected to sub-arrays associated with the i-th dimension provides R fabric ports, where R_i of those ports are used to connect to leaf switches within a given sub-array, and where R_i ≤ R.
  • the number of unused ports per switching element is given by the following expressions:
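The expressions themselves are not reproduced here. Working only from the port definitions given for leaf and spine switches (C client ports plus F fabric ports out of a radix R on a leaf, and R fabric ports on a spine), one plausible reading is sketched below. The function names are hypothetical, and the spine-side count is written in terms of the links a spine actually has to each leaf so that it also covers bundled links.

```python
def unused_leaf_ports(radix, client_ports, fabric_ports):
    """Ports on a leaf switch used neither as client nor as fabric ports."""
    unused = radix - (client_ports + fabric_ports)
    if unused < 0:
        raise ValueError("client and fabric ports exceed the radix")
    return unused


def unused_spine_ports(radix, links_per_leaf):
    """Spine ports not connected to any leaf in the sub-array.

    links_per_leaf[k] is the number of links from this spine to leaf k
    (all ones when no link bundling is used).
    """
    unused = radix - sum(links_per_leaf)
    if unused < 0:
        raise ValueError("links exceed the spine radix")
    return unused


# Radix-8 spine serving a sub-array of 6 leaves, without and with bundling.
without_bundling = unused_spine_ports(8, [1] * 6)
with_bundling = unused_spine_ports(8, [1, 1, 1, 1, 2, 2])
```

With one link per leaf the count reduces to the radix minus the size of the dimension, which is how unused spine ports arise when a dimension is shortened.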
  • The size of a dimension i is denoted by R_i, meaning that R_i leaf switches are arranged along the i-th axis of the grid.
  • the total number of leaf switches equals the product of all R_i.
  • For each R_i, R_i ≤ R holds, meaning that each spine switch can be connected to all leaf switches in a given sub-array.
  • a larger value of C may be used, so that the leaf switches are oversubscribed. This may result in an optoelectronic switch that provides a larger number of client connections, possibly resulting in a reduction in performance at the client ports.
  • a larger value of F may be used, resulting in leaf switches that are overprovisioned.
  • Case 1 In this case, the answer to each of the above two questions is yes.
  • the bundling factor b is an integer, which means that exactly b ports from each leaf switch are connected to each spine connected to the sub-array in question. This case is illustrated in Fig. 3.
  • each spine switch are connected to each of the leaf switches. Accordingly, all 8 of the fabric ports on each spine switch are used, maximizing efficiency, especially as compared to the case shown in Fig. 2.
  • Therefore, the 8 fabric ports on each spine switch are distributed amongst the 6 leaf switches with 1 connection to 4 of the leaf switches and 2 connections to the remaining two leaf switches. The same is true for all of the spine switches, and accordingly each leaf switch has 1 connection to each of 2 of the spine switches, and 2 connections to the third. These connections are distributed evenly so that all 8 of the fabric ports are utilized for each spine switch.
  • the spine switches labelled SS1 and SS2 form the first disjoint set, and the spine switch labelled SS3 forms the second disjoint set.
  • Spine switches SS1 and SS2 each have 7 used ports, and 1 unused port. Of the 7 fabric ports which provide connections to the leaf switches, there are 2 bundles of 2, and 3 bundles of 1.
  • Spine switch SS3 has 6 used ports and 2 unused ports.
  • Table 2 Constituent leaf switches of each subset, for spine switches in the first subset of spine switches of Fig. 6.
  • Table 3 Constituent leaf switches of each subset, for spine switches in the second subset of spine switches of Fig. 6.
  • Figs. 7 to 9 show examples of parts of two-dimensional optoelectronic switches according to embodiments of the present invention. It must be noted that only one sub-array is shown in each of these drawings. In these drawings, the different types of connecting line represent connections from different spine switches connected to the sub-array shown.
  • each of the spine switches has 4 bundles of 2 links per leaf.
  • each leaf switch LS1-4 therefore has two links connected to each of the spine switches SS1-2, or in other words, each spine switch has 5 bundles of 2 links per leaf. Again, this falls into case 3 as described above.
  • Fig. 9 is an example of case 4.
  • Each leaf has 1 link bundle of 2 and 2 "link bundles" of 1.
  • Table 4 Constituent leaf switches of each subset, for spine switches in the first subset of spine switches of Fig. 9.
  • Table 5 Constituent leaf switches of each subset, for spine switches in the second subset of spine switches of Fig. 9.
  • Table 6 Values of various parameters, varying with the size of the reduced dimension and the number of spine switches in sub-arrays associated with that dimension.
  • Fig. 10 shows an embodiment of the second aspect of the present invention.
  • the two rows of leaf switches LS/LS* are different sub-arrays which are associated with the same dimension, which is the "horizontal" direction, when the drawing is viewed with the page oriented in landscape.
  • the array may be two-dimensional as a result of each of the spine switches being configured (e.g., programmed) to forward data along only one of the two dimensions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch including: a plurality of leaf switches, each having a radix R, and arranged in an L-dimensional array, in which the i-th dimension has a size R_i (i = 1, 2, ..., L) and, for a reduced dimension, R_i is less than for all of the other dimensions, each leaf switch having an associated L-tuple of coordinates (x_1, ..., x_L) giving its location with respect to each of the L dimensions; wherein each leaf switch is a member of L sub-arrays, each of the L sub-arrays associated with a different one of the L dimensions, and including: a plurality of R_i leaf switches whose coordinates differ only in respect of the i-th dimension, each leaf switch having C client ports for connecting to an input device or an output device.

Description

OPTICAL SWITCH ARCHITECTURE
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] The present application claims priority to and the benefit of U.S. Provisional Application 62/309,425, filed March 16, 2016, entitled "SWITCH MODULE AND OPTOELECTRONIC SWITCH INCORPORATING THE SAME", and priority to and the benefit of U.S. Provisional Application 62/354,600, filed June 24, 2016, entitled "OPTOELECTRONIC SWITCH", and is a continuation-in-part of U.S. Patent Application 15/072,314, filed March 16, 2016, entitled "OPTOELECTRONIC SWITCH", and is a continuation-in-part of PCT Application PCT/EP2016/076755, filed November 4, 2016, entitled "OPTICAL SWITCH ARCHITECTURES", and is a continuation-in-part of PCT Application PCT/GB2016/051127, filed April 22, 2016, entitled "OPTOELECTRONIC SWITCH ARCHITECTURES", and claims priority to foreign application No. 1611433.2, filed in Great Britain on June 30, 2016, entitled "OPTOELECTRONIC SWITCH", the entire content of each of which is incorporated herein by reference.
FIELD
[0002] The present invention relates to optoelectronic switches, and in particular to the topology according to which the constituent switching elements are arranged within that switch.
BACKGROUND
[0003] Large-scale packet switches can be built in a scalable fashion from smaller switching elements by connecting the switching elements according to the interconnection pattern of a given network topology. Examples of such network topologies are Folded Clos networks (also called k-ary n-trees), Torus networks (also called k-ary n-cubes) and "RPFabric topologies" such as those topologies disclosed in PCT/GB2016/051127. These network topologies are hierarchical in nature, meaning that a given implementation of the topology having L tiers or dimensions can be extended (scaled) by adding another tier or dimension to the topology, in such a fashion that the larger (L+1)-dimensional topology includes a number of identical L-dimensional sub-topologies that are interconnected by the (L+1)th dimension in a recursive fashion.
[0004] For an important class of topologies, the maximum scale (i.e. the maximum number of endpoints or nodes) supported by the topology is determined by the radix (i.e. the number of ports) of the constituent switching elements. This is true for e.g. Folded Clos and RPFabric topologies, but not for Torus topologies. Consequently, for topologies in this class, the factor by which the maximum scale of a given topology increases when adding a dimension is also determined by the radix of the switching elements.
[0005] It may be preferable and moreover economically advantageous to implement a given network topology using switching elements that are identical. This enables important economies of scale, as the individual switching element is generally implemented as an ASIC which is costly to design, manufacture and test. Requiring different ASICs to build a switch fabric would multiply the associated monetary and temporal costs.
[0006] As mentioned above, an example of a known network topology is a Folded Clos topology. In an L-dimensional Folded Clos topology made up of switching elements having a radix R, the maximum number of endpoints is given by:
$$N = 2\left(\frac{R}{2}\right)^{L}$$
[0007] Similarly, an RPFabric topology having L dimensions has a maximum number of endpoints given by:
$$N = \frac{R}{L+1}\,R^{L}$$
[0008] Correspondingly, adding a dimension increases the maximum scales of Folded Clos and RPFabric topologies by a factor of R/2 and of roughly R (exactly R(L+1)/(L+2)), respectively. Depending on the target scale for a specific instantiation, this granularity may be too coarse, in the sense that the maximum scale for L dimensions may be just too small for the target size, whereas the maximum scale for (L+1) dimensions may be much too large. For example, consider a situation where N = 8,000 endpoints are required. First, consider the case of an RPFabric topology with R = 24 and L = 2. The maximum scale in this case is 8 × 24^2 = 4,608 endpoints, which is clearly too small. Adding an extra dimension leads to a network with a maximum scale of 6 × 24^3 = 82,944, which is more than an order of magnitude too large, and therefore highly wasteful.
[0009] This problem can be addressed by not fully populating some of the dimensions. For instance, in the example set out above, it would be possible to populate just two of the 24 possible two-dimensional sub-topologies of the three-dimensional network. This leads to topologies where the sizes or cardinalities of the dimensions may vary, thus providing finer granularity in scaling the size of the network. This principle can be applied to a single dimension, or to multiple dimensions. However, it is preferable to apply this approach to scale down an (L+1)-dimensional network to sizes that are larger than what the same topology can support with L dimensions. Otherwise it would be more economical to use an L-dimensional network instead. This approach can still incur some inefficiency: because the switching elements all have the same radix R, the switching elements belonging to a scaled-down dimension have unconnected ports.
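By way of illustration only, the scaling arithmetic of the RPFabric example above can be checked with a short Python sketch. The function name `rpfabric_max_endpoints` and the divisibility check are assumptions introduced here; only the formula N = R/(L+1) × R^L and the values R = 24, L = 2 or 3 and N = 8,000 come from the text.

```python
def rpfabric_max_endpoints(radix: int, dims: int) -> int:
    """Maximum endpoints of an RPFabric topology: N = R / (L + 1) * R**L."""
    if radix % (dims + 1) != 0:
        # Assumption of this sketch: R splits evenly over the L + 1 port groups.
        raise ValueError("radix must be divisible by dims + 1 in this sketch")
    return (radix // (dims + 1)) * radix ** dims


target = 8_000
two_d = rpfabric_max_endpoints(24, 2)    # 8 * 24**2
three_d = rpfabric_max_endpoints(24, 3)  # 6 * 24**3

print(two_d, three_d)
print("2-D network is too small:", two_d < target)
print("3-D network overshoots the target by a factor of", three_d / target)
```

The gap between the two results is what motivates partially populating one dimension rather than moving to the next full dimension.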
SUMMARY
[0010] Accordingly, at its most general, embodiments of the present invention provide an optoelectronic switch architecture which provides incremental network scalability while minimizing the number of unused ports on the constituent switching elements. Broadly speaking, embodiments of the present invention achieve this by utilizing the concept of link bundling (also known as "link aggregation", "parallel linking", or "link trunking"). Link bundling is a technique wherein two or more physical ports on a given switching element are treated equivalently in terms of packet forwarding, which allows more generalized topologies, leading to greater efficiency at a finer granularity of switch configurations. The signals transferred from the input to the output device may be either optical or electronic signals, since it is not this feature which is at the heart of embodiments of the invention, rather it is the arrangement of switching elements within the optoelectronic switch which achieves the advantageous technical effects. This is described in greater detail in the remainder of the application.
[0011] Throughout this application, the terms "leaf switch" and "leaf" are used interchangeably, as are "spine switch" and "spine".
[0012] Accordingly, a first aspect of the present invention provides an optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch including:
a plurality of leaf switches, each having a radix R, and arranged in an L-dimensional array, in which the i-th dimension has a size R_i (i = 1, 2, ..., L) and, for a reduced dimension, R_i is less than for all of the other dimensions, each leaf switch having an associated L-tuple of coordinates (x_1, ..., x_L) giving its location with respect to each of the L dimensions;
wherein each leaf switch is a member of L sub-arrays, each of the L sub-arrays associated with a different one of the L dimensions, and including:
a plurality of R_i leaf switches whose coordinates differ only in respect of the i-th dimension, each leaf switch having C client ports for connecting to an input device or an output device, and F fabric ports for connecting to the spine switches,
each of the L sub-arrays being connected to a plurality of S_i spine switches, each having R fabric ports for connecting to the fabric ports of the leaf switches; and wherein, in a given sub-array associated with the reduced dimension, the spine switches each have:
a connection to each leaf switch in the sub-array, and
a plurality of connections to at least one leaf switch in the sub-array.
[0013] By providing more than one connection between a given spine switch and leaf switch ("link bundling"), the two connections or links are treated equivalently in terms of e.g. packet forwarding. As discussed in the Background section above, when a reduced-size or underpopulated dimension is used in a switch architecture employing constant-radix switching elements, there are unused ports leading to inefficiency. However, the same connectivity, i.e. in terms of bisection bandwidth and path diversity, can be achieved by reducing the number of spine switches and employing link bundling. In certain embodiments of the present invention, the number of leaf switches in a given sub-array is therefore greater than the number of spine switches connected to that sub-array.
[0014] Switching elements which may be used in embodiments of the present invention are described in US Patent Application No. 15/072,314.
[0015] In arrangements of spine switches and leaf switches according to embodiments of the present invention, it is possible to send data from any leaf switch (i.e. a source leaf switch) to any other leaf switch in the array (i.e. a final destination leaf switch) using a maximum of L hops (here, a "hop" is the transfer of a signal from one leaf switch in a sub-array to another leaf switch in the same sub-array, the transfer taking place via a spine switch connected to that sub-array). This is possible because the leaf switches are able to act as intermediate switching elements, each of which can forward a signal coming into one of its F fabric ports to another of its own fabric ports. This internal forwarding may be performed by an integrated switch inside the leaf switch, e.g. an electronic crossbar switch or an electronic shared-memory switch. Thus during a data transfer operation, data can perform a hop from one leaf switch to another leaf switch (via a spine switch), and then an internal electronic hop within the leaf switch to another fabric port, and then a second hop, along a different dimension (i.e. in a different sub-array of which the (intermediate) leaf switch is also a member). This process may be repeated up to L times, until the data reaches the final destination leaf switch, whereupon it is transferred to an output device via a client port on that leaf switch.
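By way of illustration only, the bound of L hops can be seen in a minimal Python sketch in which each hop corrects one coordinate of the current leaf switch. Correcting the dimensions in a fixed order is an assumption of this sketch (the text does not prescribe an order), and the name `hop_sequence` is hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def hop_sequence(src: Coord, dst: Coord) -> List[Coord]:
    """Return the leaf switches visited when one coordinate is fixed per hop.

    Each step stays inside one sub-array (only one coordinate changes),
    so it corresponds to a leaf -> spine -> leaf transfer.
    """
    if len(src) != len(dst):
        raise ValueError("source and destination must have the same dimensions")
    path = [src]
    current = list(src)
    for dim, wanted in enumerate(dst):
        if current[dim] != wanted:
            current[dim] = wanted
            path.append(tuple(current))
    return path


path = hop_sequence((0, 5, 1), (3, 5, 0))
print(path)                    # intermediate leaf switches along the route
print("hops:", len(path) - 1)  # never more than the number of dimensions
```

Dimensions in which the source and destination already agree cost no hop, which is why L is a maximum rather than a fixed count.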
[0016] Optoelectronic switches according to embodiments of the present invention include a plurality of sub-arrays, and more specifically, the number of sub-arrays associated with each dimension (i.e. the j-th dimension) is given by the product of the sizes of all the dimensions bar the dimension in question, or:
$$T_j = \prod_{i=1,\, i \neq j}^{L} R_i$$
Accordingly, the total number of sub-arrays in the whole optoelectronic switch is given by the sum of the number of sub-arrays for each dimension, over all L dimensions:
$$T_{total} = \sum_{j=1}^{L} T_j$$
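By way of illustration only, the two sub-array counts described above (a product over all other dimensions, then a sum over dimensions) can be evaluated with the following Python sketch. The function name and the dimension sizes (24, 24, 2) are chosen here as an example.

```python
from math import prod
from typing import List, Sequence


def subarrays_per_dimension(sizes: Sequence[int]) -> List[int]:
    """For each dimension j, multiply the sizes of all the other dimensions."""
    return [
        prod(size for i, size in enumerate(sizes) if i != j)
        for j in range(len(sizes))
    ]


sizes = (24, 24, 2)  # two full dimensions and one reduced dimension
per_dimension = subarrays_per_dimension(sizes)
total = sum(per_dimension)

print(per_dimension)  # sub-arrays associated with each dimension
print(total)          # sub-arrays in the whole switch
```

Note that the reduced dimension has the most sub-arrays (24 × 24), each containing only two leaf switches, so savings per sub-array in that dimension are multiplied many times over.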
[0017] In some embodiments, the layout or structure, i.e. the interconnectivity between the spine switches and the leaf switches, is identical or substantially identical for each sub-array associated with a given dimension. There may only be two different sizes R_i of dimension, and in some embodiments, all but one of the dimensions may have a size R_large and the remaining dimension has a size R_small, which is smaller than R_large. For example, in a 3-dimensional optoelectronic switch: R_1 = R_2 = R_large = 24, and R_3 = R_small = 2. Such embodiments are easier to manufacture, since only one dimension is reduced in size. Having just one dimension reduced in size still provides improved granularity.
[0018] In such embodiments, the layout or structure of all of the sub-arrays associated with the dimensions having equal size may be identical or substantially identical. By having identical layouts, the control process, i.e. to determine the path which a given signal takes when traversing a given sub-array, is simplified as the same process can be applied to a plurality of sub-arrays, and a bespoke control process is not required for switching in different dimensions. Details of the methods by which the switching may be controlled may be found later in the application.
[0019] In some embodiments, the aggregate client port bandwidth per leaf switch is equal to the fabric port bandwidth available per dimension. Thus, if all of the ports on the switching element have the same bandwidth, this means that one fabric port per dimension should be provided for each client port. In other embodiments, the switching elements may be oversubscribed, i.e., there may be fewer than one fabric port per dimension for each client port, or the switching elements may be overprovisioned, i.e., there may be more than one fabric port per dimension for each client port. In some embodiments, the value of R is a number which is evenly divisible by 2, 3, 4, 5 or 6. In a subset of these embodiments, the value of R is divisible by more than one of 2, 3, 4, 5, and 6. For example, R may be equal to 12, 24, 30, 36, or 60.
[0020] In some embodiments, the number of unused ports is minimized, where "unused" refers to fabric ports on the spine switches which are not connected to any fabric ports on any other spine switches or leaf switches (though spine switches may in any event not be connected to other spine switches). Accordingly, in some embodiments, for a given sub-array associated with the reduced dimension, all of the fabric ports included on the plurality of S_i spine switches connected to the sub-array are connected to a fabric port on a leaf switch in that sub-array.
[0021] Depending on the number of client ports, the radix of the switches, the number of spine switches connected to the sub-array associated with the reduced dimension, and the number of leaf switches in the sub-array, it may not always be possible to arrange for each of the fabric ports included on the plurality of S_i spine switches to be connected to a respective fabric port on a leaf switch in that sub-array.
[0022] If F_li is the number of fabric ports per leaf switch in dimension i and F_si is the number of fabric ports per spine switch in dimension i, then in dimension i the total number of spine fabric ports is equal to the total number of leaf fabric ports (and unused fabric ports can be avoided) when the following constraint is met:
$$S_i F_{si} = R_i F_{li}$$
[0023] If F_li = C (i.e., the leaf switches are neither oversubscribed nor overprovisioned) and F_si = R, the constraint is met (and it is possible to avoid unused fabric ports) if and only if S_i R = C R_i, in which S_i, R, C and R_i are integer values, having the same meanings as described above. In this way, bisection bandwidth can be maintained (i.e. S_i R ≥ C R_i) while all of the fabric ports are connected to a fabric port on a leaf switch, to provide connectivity therebetween. In some embodiments, in a given sub-array associated with the reduced dimension, at least one spine switch of the plurality of S_i spine switches connected to the sub-array has a plurality of connections to each of the R_i leaf switches. In other words, one of the spine switches may have two or three connections to each of the leaf switches. In some embodiments, all of the spine switches connected to the sub-array may have a plurality, e.g. two or three, connections to each of the leaf switches in the sub-array. The greater the extent to which the reduced dimension is reduced in size relative to the other dimensions, the greater the number of connections which the spine switches may have to each of the leaf switches.
[0024] If C > F_li (i.e., the leaf switches are oversubscribed) or if C < F_li (i.e., the leaf switches are overprovisioned), then the constraint to be met to make it possible to avoid having unused ports may instead be S_i R = F_li R_i. If each spine switch is connected to more than one sub-array (see, e.g., Fig. 10), then F_si < R, and the constraint to be met to make it possible to avoid having unused ports (if the leaf switches are neither oversubscribed nor overprovisioned) may instead be S_i F_si = C R_i. In the remainder of the present disclosure it is assumed, except where otherwise stated (e.g., in the context of Fig. 10), that each spine switch is connected to only one sub-array, and that the leaf switches are neither oversubscribed nor overprovisioned.
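By way of illustration only, the port-balance condition discussed above (the spine fabric ports serving a sub-array equal in number to the leaf fabric ports in that sub-array) can be checked with a short Python sketch. The function names are hypothetical, and the numbers (radix 8, four fabric ports per leaf, sub-arrays of six or five leaves) are assumed example values.

```python
def min_spines(radix: int, leaf_fabric_ports: int, dim_size: int) -> int:
    """Smallest spine count whose fabric ports cover all leaf fabric ports."""
    needed = leaf_fabric_ports * dim_size
    return -(-needed // radix)  # integer ceiling division


def surplus_spine_ports(spines: int, spine_fabric_ports: int,
                        dim_size: int, leaf_fabric_ports: int) -> int:
    """Spine fabric ports minus leaf fabric ports; zero means none are unused."""
    return spines * spine_fabric_ports - dim_size * leaf_fabric_ports


for dim_size in (6, 5):
    spines = min_spines(radix=8, leaf_fabric_ports=4, dim_size=dim_size)
    surplus = surplus_spine_ports(spines, 8, dim_size, 4)
    print(f"{dim_size} leaves: {spines} spines, {surplus} unused spine ports")
```

With six leaves the condition holds exactly; with five leaves the same three spines are still needed to preserve bisection bandwidth, and four spine ports remain unconnected.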
[0025] In some embodiments, at least one spine switch, or alternatively each spine switch, connected to a given sub-array (associated with the reduced dimension) may have the same number of connections to each leaf switch in the array. This is possible when the number of client ports per leaf switch on the sub-array is divisible by the number of spine switches connected to the sub-array, with integer result. Such embodiments have a high degree of topological regularity, and therefore associated advantages in terms of routing and load balancing. In other embodiments, the number of connections may not be uniform across all of the leaf switches. This is the case when the number of client ports per leaf switch on the sub-array is not divisible by the number of spine switches connected to the sub-array. In these cases, each spine switch connected to a given sub-array associated with the reduced dimension may have:
a first number of connections to each of a first subset of leaf switches;
a second number of connection(s) to each of a second subset of leaf switches;
wherein: the first number is the same for all of the spine switches, the second number is the same for all of the spine switches, the first number is greater than the second number, and
for each spine switch, the first subset of leaf switches is disjoint from the second subset of leaf switches.
[0026] It should be noted that it is not necessary that there are a plurality of connections between each spine switch and each leaf switch. In other words, the second number may be exactly one.
[0027] Here, "disjoint" means that, for a given spine switch, the first subset and the second subset of leaf switches have no members in common. However, the constituents of the first and second subset of leaf switches for one spine switch may be different from the constituents of the first and second subset of leaf switches for another spine switch, as long as there are the same numbers of leaf switches in each. For example, for one spine switch, there may be three connections to each of a subset of (i.e. containing) two leaf switches, and one connection to each of a subset of (i.e. containing) four leaf switches. These groups of connections may be referred to as "bundles" or "link bundles", and may contain one connection. In some embodiments, the first number is greater than the second number by one. By having the first number and the second number as close as possible, the degree of topological regularity is maximized for those embodiments in which it is not possible to have equal numbers of connections to each leaf switch.
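By way of illustration only, the sizes of the two subsets of leaf switches seen by one spine switch can be obtained by integer division, as in the following Python sketch. It assumes that every fabric port of the spine is used and that the two bundle sizes differ by one; the function name `bundle_split` is hypothetical.

```python
from typing import Dict


def bundle_split(spine_ports: int, leaves: int) -> Dict[int, int]:
    """Map bundle size -> number of leaf switches receiving that bundle.

    The spine's ports are spread as evenly as possible, so at most two
    bundle sizes occur and they differ by exactly one.
    """
    small, extra = divmod(spine_ports, leaves)
    if small == 0:
        raise ValueError("too few ports to reach every leaf switch")
    split = {small: leaves - extra}
    if extra:
        split[small + 1] = extra
    return split


print(bundle_split(8, 6))  # eight spine ports shared among six leaves
print(bundle_split(8, 4))  # uniform case: one bundle size only
```

For eight ports and six leaves this gives two leaves with a bundle of two and four leaves with a single connection, i.e. a first number of two and a second number of one.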
[0028] Embodiments among those described above that fulfil the criterion wherein S_i R = C R_i may have no unused ports. However, this is not possible with all arrangements. In some embodiments S_i R > C R_i, and accordingly there are U = S_i R − C R_i unused ports. Note that if S_i R is less than C R_i, bisection bandwidth may not be preserved. Even though there are some unused fabric ports in the sub-arrays which are associated with the reduced dimension, the number of unused ports is still reduced relative to configurations in which there is a maximum of one connection between each leaf switch and spine switch. The one connection may be a bidirectional connection, which may be in the form of a single cable or wire containing two bundled optical fibres, in other words a bidirectional connection providing physical media allowing full-duplex communication.
[0029] It may be possible to maintain both efficiency and topological regularity, even with unused ports in a given sub-array associated with the reduced dimension. In particular, when (as above) the number of client ports is exactly divisible by the number of spine switches connected to the sub-array, the connections and the unused ports can be spread evenly across the spine switches. In other words, each of the spine switches connected to a given sub-array associated with the reduced dimension may have the same number of unused ports, given by U/S_i.
[0030] In other cases, e.g., where S_i R > C R_i and C/S_i is not an integer, it is still possible to maximize both the efficiency and topological regularity by adopting a configuration wherein for a given sub-array associated with the reduced dimension:
the spine switches connected to the sub-array are divided into a first subset and a second subset which is disjoint from the first subset, wherein:
each of the spine switches in the first subset of spine switches has:
a first number of connections to each of a first subset of leaf switches in the sub-array;
a second number of connections to each of a second subset of leaf switches in the sub-array, the second subset of leaf switches being disjoint from the first subset, with respect to each spine switch connected to the first subset of spine switches;
each of the spine switches in the second subset of spine switches has:
a third number of connections to each of a third subset of leaf switches in the same array;
a fourth number of connections to each of a fourth subset of leaf switches in the sub-array, the fourth subset of leaf switches being disjoint from the third subset, with respect to each spine switch connected to the second subset of spine switches;
and wherein:
the first number is the same for all of the spine switches connected to the first subset of spine switches;
the second number is the same for all of the spine switches connected to the first subset of spine switches;
the third number is the same for all of the spine switches connected to the second subset of spine switches;
the fourth number is the same for all of the spine switches connected to the second subset of spine switches;
the first number is greater than the second number, and
the third number is greater than the fourth number.
[0031] Here when the first and second subset of spine switches are "disjoint", this means that no spine switch is a member of both. Similarly, when "the second subset of leaf switches is disjoint from the first subset, with respect to each spine switch in the first subset of spine switches", this means that for a given spine switch in the first subset of spine switches, the first and second subset of leaf switches have no members in common.
However, it is possible for a leaf switch which is in the first subset for a first spine switch in the first subset of spine switches to be in the following:
• The first subset for a first spine switch in the second subset of spine switches, or
• The second subset of a second spine switch in the first subset of spine switches, or
• The second subset of a second spine switch in the second subset of spine switches.
[0032] The same definition of "disjoint" applies for the third and fourth subset of leaf switches. This is explained in detail with reference to the drawings later on in the application. For the same reasons as above, the first number may be greater than the second number by one, and/or the third number may be greater than the fourth number by one.
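By way of illustration only, one simple way of producing a wiring with the properties described above is to assign the leaf fabric ports to the spine switches in round-robin order, as in the following Python sketch. The round-robin rule is an assumption of this sketch and is not stated to be the method used in the embodiments. The example parameters (five leaves with four fabric ports each, three spines of radix 8) are likewise assumed, chosen because they reproduce spines with 7, 7 and 6 used ports.

```python
from collections import Counter
from typing import List


def wire_subarray(leaves: int, leaf_ports: int,
                  spines: int, radix: int) -> List[Counter]:
    """Return links[spine][leaf] = number of parallel connections.

    Leaf fabric ports are numbered consecutively and dealt out to the
    spines in round-robin order (an illustrative rule only).
    """
    if leaf_ports < spines:
        raise ValueError("each leaf needs at least one port per spine")
    if leaves * leaf_ports > spines * radix:
        raise ValueError("not enough spine fabric ports")
    links = [Counter() for _ in range(spines)]
    for leaf in range(leaves):
        for port in range(leaf_ports):
            spine = (leaf * leaf_ports + port) % spines
            links[spine][leaf] += 1
    return links


links = wire_subarray(leaves=5, leaf_ports=4, spines=3, radix=8)
for spine, bundles in enumerate(links):
    used = sum(bundles.values())
    print(f"spine {spine}: bundles {sorted(bundles.values())}, "
          f"used {used}, unused {8 - used}")
```

In this example the first two spines form one subset (two bundles of two and three bundles of one) and the third spine forms the other subset (one bundle of two and four bundles of one), while every leaf keeps exactly four connected fabric ports.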
[0033] The principle of consolidating spine switches that would otherwise have unused ports is not limited to parallel spines connected to the same sub-array. Accordingly, at its most general, in embodiments of a second aspect of the present invention, instead of combining parallel spine switches connected to the same sub-array, a spine switch may connect to leaf switches in more than one sub-array. More specifically, embodiments of a second aspect of the present invention provide an optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch including:
a plurality of leaf switches of radix R arranged in an L-dimensional array, the i-th dimension having a size R_i (i = 1, 2, ..., L), each leaf switch having an associated L-tuple of coordinates (x_1, ..., x_L) giving its location with respect to each of the L dimensions;
wherein each leaf switch is a member of L sub-arrays, each being associated with a different one of the L dimensions, and including R_i leaf switches whose coordinates differ only in respect of the i-th dimension, each sub-array further connected to a spine switch connected to all of the leaf switches in the sub-array, wherein for a given sub-array:
each leaf switch in the sub-array has:
C client ports, each for connecting to an input device or an output device;
F fabric ports, for connecting to the spine switch,
the spine switch has fabric ports for connecting to fabric ports of leaf switches, and
the spine switch has connections to:
each leaf switch in a first sub-array, and
at least one leaf switch in a second sub-array, associated with the same dimension as the first sub-array.
[0034] In some embodiments, a single spine switch may connect to all leaf switches in a plurality of sub-arrays, each sub-array associated with the same dimension. In addition to reducing the total number of spine switches, such embodiments of the second aspect of the present invention introduce additional connectivity, since the consolidated spines permit movement along two dimensions in a single hop. This can therefore also shorten the average path length (where the path length is the smallest number of hops that may be used to send a signal from a source leaf switch to its final destination leaf switch). Each "hop" is a transfer of data directly between two switches. For example, if a packet of data is sent from a first leaf switch to a first spine switch, and from there to a second leaf switch, the packet has executed two hops. Given a spine switch having a radix R and a given dimension for which R_i < R (i.e. a reduced dimension), with x = ⌊R/R_i⌋, one spine associated with the i-th dimension can be used to connect up to x sub-arrays along a second dimension j ≠ i. As used herein, "along a second dimension" does not mean that the sub-arrays are associated with a different dimension, but that a second dimension is traversed in order to connect to the sub-arrays in e.g. an adjacent sub-array. This is shown visually later in the application.
[0035] The number of unused ports per spine is given by U = R − x R_i. It is also possible to partially combine sub-arrays. For example, in some embodiments, a spine switch connected to a first sub-array may be connected to a leaf switch (or plurality of leaves) in a second sub-array (in addition to all of the leaf switches in the first sub-array) associated with the same dimension as the first. The spine switch may be connected to all leaf switches having the same co-ordinate in the dimension in question.
It is noted that embodiments of the first and second aspects of the present invention may be combined, and each sub-array may be connected to a plurality of spine switches, wherein each spine switch connected to a given sub-array associated with the reduced dimension may have a connection to each leaf switch in the sub-array and a plurality of connections to at least one leaf switch in the sub-array. Accordingly, any of the optional features presented above with reference to embodiments of the first aspect of the present invention may also apply to embodiments of the second aspect of the invention, to the extent that they are compatible.
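By way of illustration only, the number of sub-arrays that one spine switch can serve, and the ports it leaves unused, can be computed as in the following Python sketch. It assumes that x is the integer part of R divided by the size of the reduced dimension, which is consistent with the unused-port expression U = R − x × (size of the reduced dimension), and that each leaf switch uses one spine port. The function name `consolidate` is hypothetical.

```python
from typing import Tuple


def consolidate(radix: int, dim_size: int) -> Tuple[int, int]:
    """Return (sub-arrays served by one spine, unused spine ports).

    Assumes one spine port per leaf switch in each served sub-array.
    """
    if not 0 < dim_size <= radix:
        raise ValueError("dimension size must be between 1 and the radix")
    served = radix // dim_size           # x: whole sub-arrays that fit
    unused = radix - served * dim_size   # U: ports left over
    return served, unused


print(consolidate(24, 2))  # reduced dimension of size 2
print(consolidate(24, 5))  # size that does not divide the radix
```

When the reduced dimension's size divides the radix, a single spine serves several sub-arrays with no unused ports; otherwise the remainder of the division is left unconnected.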
[0036] The following optional features are compatible with embodiments of both the first and second aspects of the present invention.
[0037] The leaf switches may contain a packet processor configured to perform packet fragmentation, wherein packets of data having the same next destination switch module (i.e. those packets which are intended for the same leaf switch after the next hop, whether that leaf switch module be the final destination or just the next intermediate switch module in the journey of that packet of data) are arranged into frames having a predetermined size, and wherein packets of data may be split up into a plurality of packet fragments, which are then arranged in a corresponding plurality of frames. Optionally, one frame may contain data from more than one packet of data. Each packet fragment may have its own packet fragment header which includes information at least identifying the packet to which that packet fragment originally belonged, so that the packet may be reconstructed when all of its constituent fragments reach their final destination module.
[0038] For example, consider the case where the packet processor is configured so that the frame payload size is 1000B, and three packets of 400B, 800B and 800B are input into the switch module. If each of these were to be sent in separate frames, of one packet each, this would represent an efficiency of (400 + 800 + 800)/3000 = 67%. However, by using packet fragmentation, a first frame may include the 400B packet and 600B of the first 800B packet, and then a second frame may include the remaining 200B of the first 800B packet and the second 800B packet. This leads to an efficiency of 100%. The frames that are constructed by this process represent packets of data in their own right, and so further fragmentation may occur at intermediate switch modules, when the packet undergoes more than one hop (e.g., more than one optical hop) in order to reach the destination switch module.
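By way of illustration only, the efficiency figures in the example above can be reproduced with the following Python sketch, which fills fixed-size frames back to back and splits a packet wherever a frame boundary falls. Fragment and frame header overhead is ignored, which is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Fragment = Tuple[int, int]  # (packet index, bytes carried in this frame)


def pack_with_fragmentation(packets: Sequence[int],
                            payload: int) -> List[List[Fragment]]:
    """Fill frames in arrival order, splitting packets at frame boundaries."""
    frames: List[List[Fragment]] = []
    free = 0  # bytes still free in the frame currently being filled
    for index, size in enumerate(packets):
        remaining = size
        while remaining > 0:
            if free == 0:
                frames.append([])
                free = payload
            chunk = min(remaining, free)
            frames[-1].append((index, chunk))
            remaining -= chunk
            free -= chunk
    return frames


def efficiency(packets: Sequence[int], frame_count: int, payload: int) -> float:
    """Useful bytes divided by total frame payload capacity."""
    return sum(packets) / (frame_count * payload)


packets = [400, 800, 800]
frames = pack_with_fragmentation(packets, payload=1000)

print(round(efficiency(packets, len(packets), 1000), 2))  # one packet per frame
print(efficiency(packets, len(frames), 1000))             # with fragmentation
```

Each fragment is tagged with the index of the packet it came from, standing in for the packet fragment header that allows reassembly at the destination.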
[0039] In order to maximize efficiency, subsequent processing of a frame (e.g. forwarding said frame to be converted into a first plurality of optical signals) may not occur until the filling proportion of a frame reaches a set or predetermined threshold, e.g. more than 80%, more than 90%, or when the frame is filled to 100%. A frame may alternatively be sent for subsequent processing after a set or predetermined amount of time has elapsed. In this way, if packets of data for a given switch module cease to arrive at the packet processor, a frame which is still below the threshold filling proportion may still be sent for subsequent processing rather than lying stagnant on the packet processor. The set or predetermined amount of time may be between 50 and 1000ns, or between 50 and 200ns. In some embodiments, the time interval is approximately 100ns. Accordingly, the packet processor may include or be associated with a transmission side memory in which to temporarily store incomplete frames during their construction. The set or predetermined amount of time may be varied depending upon traffic demand; typically, the higher the rate of traffic flow, the shorter will be the set or predetermined amount of time and lower rates of traffic flow may lead to an increase in the set or predetermined amount of time. The leaf switches may correspondingly include another packet processor, which may be the same as the first packet processor, or may be a different packet processor, which is arranged to recombine the packet fragments upon receiving them, to recreate the original packet of data for subsequent processing and transmission.
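By way of illustration only, the decision of when to release a partially filled frame can be sketched in Python as a fill threshold combined with a timeout. The class name `FramePolicy`, its field names and the default values (1000B payload, 90% threshold, 100ns timeout) are assumptions picked from within the ranges mentioned above.

```python
from dataclasses import dataclass


@dataclass
class FramePolicy:
    payload_bytes: int = 1000     # frame payload size (example value)
    fill_threshold: float = 0.9   # release once the frame is this full
    timeout_ns: int = 100         # release anyway after this long

    def should_send(self, filled_bytes: int, age_ns: int) -> bool:
        """Decide whether an in-progress frame is released for processing."""
        if filled_bytes == 0:
            return False  # nothing to send, however old the frame is
        full_enough = filled_bytes >= self.fill_threshold * self.payload_bytes
        timed_out = age_ns >= self.timeout_ns
        return full_enough or timed_out


policy = FramePolicy()
print(policy.should_send(filled_bytes=950, age_ns=10))   # threshold reached
print(policy.should_send(filled_bytes=300, age_ns=10))   # keep filling
print(policy.should_send(filled_bytes=300, age_ns=100))  # timeout expired
```

A traffic-dependent timeout, as described above, could be modelled by adjusting `timeout_ns` at run time.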
[0040] Leaf switches may be configured to operate in burst mode, in which the leaf switches send data (e.g. in the form of packets, packet fragments or frames as described above) in a series of successive bursts, each burst containing only data having the same next destination leaf switch. Each successive burst may include a frame of data having a different next destination leaf switch. Pairs of sequential bursts may be separated by a predetermined time interval between 50 and 1000ns, or between 50 and 200ns, e.g. 100ns. All of the leaf switches sending signals within a given sub-array may be able to "fire" a burst synchronously.
[0041] To control the switching of data by the spine switches connected to a given sub-array, that sub-array may include an arbiter, configured to control the operation of the spine switches connected to that sub-array, based on destination information contained in the data to be transferred. This control allows the provision of a route which can ensure that all data reaches its next destination leaf switch in a non-blocking fashion to minimize bottlenecking. The arbiter may be connected to a packet processor in each of the leaf switches, either directly or via a controller, or the like. When, for example, a packet of data is received by a leaf switch, a request is sent by the packet processor to the arbiter. The request may optionally identify the next destination leaf switch of a given packet of data. The arbiter is configured to establish a scheme which ensures that, to the greatest extent possible, each packet is able to perform its next hop. The arbiter may accordingly be configured to perform a bipartite graph matching algorithm in order to calculate pairings between the inputs and outputs of the spine switches, such that each input is paired with at most one output, and vice versa. In such embodiments, there may be an arbiter associated with each spine switch, which is configured to control the routing of signals from the inputs to the outputs of the spine switch. Each spine with its respective arbiter may be able to operate independently of the other spine switches connected to the sub-array. Naturally, in some cases, where e.g. several leaf switches send large amounts of data all of which is intended for the same output of a given spine switch, the request cannot be met.
Accordingly, the arbiter may be configured to store information relating to requests that cannot be met, in a request queue. Then, until these requests are met, the associated data is buffered on the corresponding leaf switch, e.g. in the packet processor or in a separate memory. In this way, requests that cannot be met are delayed rather than dropped, e.g. when a local bottleneck occurs at one or more of the spine switches. In other words, the arbiter maintains the state of a buffer memory or a virtual output queue (VOQ) on the leaf switches or spine switches; this state can be in the form of counters (counting e.g. the number of packets or bytes per VOQ), or in the form of FIFOs (first-in, first-out) that store packet descriptors. However, the actual packets themselves remain stored on the leaf switch(es) rather than at the arbiter.
[0042] When it is necessary for a packet to perform more than one hop in order to reach its final destination leaf switch, the route may be deduced entirely from a comparison between the coordinates of the source leaf switch and the final destination leaf switch. For example, in a process known as dimension ordered routing, the first hop may match the first coordinate of the source and final destination leaf switches, the second hop may match the second coordinate of the source and final destination leaf switches and so on, until all of the coordinates match, i.e. until the packet has been transferred to the final destination leaf switch. For example, in a four-dimensional network, if the source leaf switch were to have coordinates (a, b, c, d) and the final destination leaf switch were to have coordinates (w, x, y, z), then the dimension-ordered route might be: (a, b, c, d) -> (w, b, c, d) -> (w, x, c, d) -> (w, x, y, d) -> (w, x, y, z). At any point along the route, the packet processor may compare the coordinates of the source leaf switch against the coordinates of the final destination leaf switch, and determine which coordinates do not yet match. Then it will decide to route along one of the non-matching dimensions, e.g. the one with the lowest index, or the one with the highest index.
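Dimension ordered routing as described above can be sketched in a few lines. The function name `dimension_ordered_route` and the `lowest_index_first` flag are hypothetical; the sketch assumes that one hop corrects exactly one coordinate.

```python
def dimension_ordered_route(source, destination, lowest_index_first=True):
    """Return the list of leaf-switch coordinates visited, source included.

    Each hop replaces one non-matching coordinate with the destination's
    value; dimensions whose coordinates already match need no hop.
    """
    if len(source) != len(destination):
        raise ValueError("source and destination must have the same dimensionality")
    current = list(source)
    route = [tuple(current)]
    axes = list(range(len(current)))
    if not lowest_index_first:
        axes.reverse()                        # route along the highest index first
    for axis in axes:
        if current[axis] != destination[axis]:
            current[axis] = destination[axis]
            route.append(tuple(current))
    return route


for hop in dimension_ordered_route(("a", "b", "c", "d"), ("w", "x", "y", "z")):
    print(hop)
```

Because coordinates that already match are skipped, a packet whose source and destination share a coordinate takes fewer hops than the number of dimensions.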
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] Fig. 1 shows an example sub-array of a switch in which the dimension shown is fully populated.
[0044] Fig. 2 shows an example sub-array of the switch of Fig. 1, in which the dimension shown is shortened, and is thus no longer fully populated.
[0045] Fig. 3 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0046] Fig. 4 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0047] Fig. 5 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0048] Fig. 6 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0049] Fig. 7 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0050] Fig. 8 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0051] Fig. 9 shows a schematic diagram of a sub-array, which may be found in switches according to embodiments of the first aspect of the present invention.
[0052] Fig. 10 shows an example of a configuration of the connections between the leaf switches in, and spine switches connected to, two different sub-arrays associated with the same dimension, according to embodiments of the second aspect of the present invention.
DETAILED DESCRIPTION
[0053] Fig. 1 shows an example of a fully-populated sub-array of a 1-dimensional array of leaf switches, though it will be apparent that the same interconnectivity may be achieved in sub-arrays of leaf switches which are part of arrays having higher dimensionality. In Fig. 1, L = 1, R_1 = R = 8, C = S_1 = 4, and N = 32, as discussed in further detail below. As used herein, the "dimension" of an array of leaf switches connected by spine switches is one half of the diameter of the array, where the diameter is defined to be the greatest path length of the path lengths between the pairs of leaf switches in the array. The sub-array shown in Fig. 1 contains R_1 = 8 leaf switches, each having a radix R = 8. These leaf switches are connected to each other by S_1 = 4 spine switches, again each having a radix R of 8. Each spine switch has 8 fabric ports, and each of these fabric ports provides a connection to one of the 8 leaf switches. Accordingly, each leaf has 4 fabric ports, and each of these provides a connection to one of the 4 spine switches. In this example, there are no unused ports on any of the spine switches, because the sub-array is fully populated. However, as discussed earlier in the application, it is often undesirable to use a fully-populated L-dimensional topology, since it may provide orders of magnitude too many endpoints, which can be wasteful and therefore uneconomical. As also discussed, this problem can be partially solved by under-populating a given sub-array, though this can still lead to inefficiencies in terms of unused ports.
[0054] An example demonstrating this resulting inefficiency is shown in Fig. 2. In Fig. 2, L = 1, R = 8, R_1 = 6, C = S_1 = 4, N = 24, and U = 8, as discussed in further detail below. Here there are R_1 = 6 leaf switches, but still S_1 = 4 spine switches. As with the previous case, each of the leaf switches is connected to each of the spine switches, and each of the spine switches is connected to each of the leaf switches. However, because there are only 6 leaf switches, and each spine switch has R = 8 fabric ports, 2 fabric ports on each of the spine switches remain unused. This is a direct consequence of the fact that each spine switch includes a maximum of one connection to each leaf switch. Thus, an inefficiency arises. As is described above, embodiments of the present invention address this problem by rearrangement of the links over fewer spine switches. In this way, fewer spine switches are used, and they are used more efficiently.
[0055] Figs. 3 to 9 are best understood from a mathematical description of the architecture of embodiments of the present invention. Recall that a sub-array (x_L, ..., x_{k+1}, x_{k-1}, ..., x_1) is defined by a set of leaf switches that differ only in dimension x_k, and the sub-array includes this set of leaf switches and is connected to a set of spine switches each connected to all of the leaf switches in the sub-array. As used herein, a spine switch is said to be "connected to" a sub-array (and the sub-array is said to be connected to the spine switch) if and only if the spine switch is connected to at least one of the leaf switches in the sub-array. As used herein, when a sub-array is associated with a dimension, the dimension may equivalently be said to be associated with the sub-array. In Figs. 3 to 6, some leaf switches (denoted LS1, LS2 etc.) are connected to one spine switch (denoted SS1, SS2 etc.) and some are connected to two spine switches. In order to represent this more clearly, the connections which form the "second" connection between a leaf switch and a spine switch are shown in a thicker black line. For example, it can be seen in Fig. 6 that SS2 has a "second" connection to both LS2 and LS3.
[0056] The RPFabric, which is employed in embodiments of the present invention, includes spines and leaves in which the leaves are connected only to clients and spines, and the spines are connected only to leaves. Each leaf switch provides C client ports and F fabric ports, where C + F ≤ R. Each spine switch connected to sub-arrays associated with the i-th dimension provides R fabric ports, where R_i of those ports are used to connect to leaf switches within a given sub-array, and where R_i ≤ R. The numbers of unused ports per switching element are given by the following expressions:
U_{leaf} = R - F - C
U_{spine} = R - R_i
[0057] The size of a dimension i is denoted by R_i, meaning that R_i leaf switches are arranged along the i-th axis of the grid. The total number of leaf switches equals the product of all R_i. For each R_i, R_i ≤ R holds, meaning that each spine switch can be connected to all leaf switches in a given sub-array.
[0058] In embodiments of the first aspect of the invention, there are multiple spine switches along each dimension, i.e. there are multiple spine switches connected to each sub-array. The number of spine switches for the i-th dimension is denoted by S_i. If the leaf switches are neither overprovisioned nor oversubscribed, then:
C = \left\lfloor \frac{R}{L+1} \right\rfloor
F = LC = L \left\lfloor \frac{R}{L+1} \right\rfloor
In some embodiments, a larger value of C may be used, so that the leaf switches are oversubscribed. This may result in an optoelectronic switch that provides a larger number of client connections, possibly resulting in a reduction in performance at the client ports. In other embodiments, a larger value of F may be used, resulting in leaf switches that are overprovisioned.
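As a numerical check of the port split, the sketch below assumes the balanced case reads C = floor(R / (L + 1)) and F = L * C, which agrees with the values quoted for the figures (R = 8, L = 1 gives C = 4; R = 12, L = 2 gives C = 4). The function name `leaf_port_split` is hypothetical.

```python
def leaf_port_split(radix, dimensions):
    """Return (client_ports, fabric_ports, unused_leaf_ports) for a balanced leaf.

    Assumes C = floor(R / (L + 1)) client ports and F = L * C fabric ports,
    i.e. a leaf that is neither oversubscribed nor overprovisioned.
    """
    client_ports = radix // (dimensions + 1)
    fabric_ports = dimensions * client_ports
    unused = radix - fabric_ports - client_ports   # ports left over on the leaf
    return client_ports, fabric_ports, unused


print(leaf_port_split(8, 1))    # one-dimensional examples, radix 8
print(leaf_port_split(12, 2))   # two-dimensional examples, radix 12
```

When R is not a multiple of L + 1, the floor leaves some leaf ports unused; for instance a radix-8 leaf in a two-dimensional array would have 2 client ports, 4 fabric ports and 2 spare ports under this assumption.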
[0059] Then consider a sub-array including R_i < R leaf switches. There are then U = C(R - R_i) unused ports in total on the set of S_i spine switches connected to that sub-array. Assuming that all sub-arrays associated with that dimension each contain the same number of leaf switches, the number of unused ports is the same for all sub-arrays associated with that dimension. Then, if C(R - R_i) > R, at least one spine switch may be removed without affecting the available bandwidth, as long as the existing connections are distributed over the remaining spines. This is where the concept of link bundling, and therefore the technical effect of embodiments of the present invention, comes into play, and it will become apparent that four distinct cases arise, all falling within the scope of embodiments of the first aspect of the present invention.
[0060] Denoting the actual number of spines for dimension i (i.e., the number of spines connected to each sub-array associated with dimension i, in an embodiment in which the number of spines connected to each such sub-array is the same) by S_i, the total number of ports available on the spines connected to a dimension-i sub-array is given by S_i R. The total number of ports sufficient for full bisection bandwidth is given by C R_i. In order to have zero unused ports, S_i R = C R_i. In some embodiments, however (e.g., along the reduced dimension, which includes fewer than the full number of leaf switches), it may instead be the case that S_i R > C R_i, e.g., it may be the case that S_i = \lceil C R_i / R \rceil, where the ceiling operator is used to select the smallest value of S_i that ensures that every leaf fabric port in dimension i can be used, i.e., connected to a spine fabric port. Thus, there may still be some unused ports, given by U = S_i R - C R_i, across the spine switches connected to the same sub-array. Even if there are unused ports, embodiments of the present invention still provide an arrangement in which this number is minimized and the spine switches are utilized in as efficient a manner as possible.
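The spine count and the resulting number of unused ports can be tabulated with a small sketch. It assumes S_i = ceil(C * R_i / R) and U = S_i * R - C * R_i as read from the passage above; the function names are hypothetical.

```python
def spines_and_unused(radix, client_ports, leaves):
    """Return (S_i, U) for one sub-array when links may be bundled."""
    needed_ports = client_ports * leaves       # C * R_i leaf fabric ports to serve
    spines = -(-needed_ports // radix)         # integer ceiling of C * R_i / R
    unused = spines * radix - needed_ports     # U = S_i * R - C * R_i
    return spines, unused


def unused_without_bundling(radix, client_ports, leaves):
    """Unused spine ports when each of C spines has one link per leaf: C * (R - R_i)."""
    return client_ports * (radix - leaves)


for leaves in (3, 4, 5, 6, 8):
    print(leaves, spines_and_unused(8, 4, leaves), unused_without_bundling(8, 4, leaves))
```

For R = 8, C = 4 and six leaf switches, the unbundled arrangement leaves 8 ports idle across four spines, whereas three spines with bundled links leave none.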
[0061] Four cases may be identified, one corresponding to each of Figs. 3 to 6, each of which includes switching elements (i.e. leaf switches and spine switches) all having a radix R = 8, and with C = 4 client ports, all having the same bandwidth. The following examples all show 1-dimensional cases, but the same principle applies equivalently where the sub-array of leaf switches in question is just one sub-array from an array having higher dimensionality.
[0062] The case to which a given configuration of switching elements belongs can be determined by the following two criteria:
• Does S_i R = C R_i hold, so that it is possible to avoid unused ports?
• Can the link bundles be distributed evenly, i.e. is the bundling factor b an integer?
[0063] Case 1: In this case, the answer to each of the above two questions is yes. The bundling factor b is an integer, which means that exactly b ports from each leaf switch are connected to each spine connected to the sub-array in question. This case is illustrated in Fig. 3. In Fig. 3, L = 1, R = 8, C = 4, R_1 = 4, S_1 = 2, U = 0, N = 16, and b_1 = b_2 = 2, as discussed in further detail below.
[0064] Here there are R_1 = 4 leaf switches in the sub-array, connected using S_1 = 2 spine switches. The bundling factor b = C/S_1 = 2, and therefore, it can be seen that 2 links from each spine switch are connected to each of the leaf switches. Accordingly, all 8 of the fabric ports on each spine switch are used, maximizing efficiency, especially as compared to the case shown in Fig. 2.
[0065] Case 2: In this case, S_i R = C R_i but b = C/S_i is not an integer. Thus, all of the fabric ports on all of the spine switches are used, but unlike in the previous example, the links are not distributed evenly amongst the spine switches. More specifically, there are a_1 bundles having b_1 = \lceil C/S_i \rceil links in them, and a_2 bundles having b_2 = \lfloor C/S_i \rfloor = b_1 - 1 links in them. In these cases, the following is true:
a_1 b_1 + a_2 b_2 = R
a_1 + a_2 = R_i
[0066] It can then be shown that: a_1 = R - b_2 R_i and a_2 = b_1 R_i - R.
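The intermediate steps are not spelled out in the text; one way to obtain the result, assuming only the two constraints above and the fact that the two bundle sizes differ by exactly one link, is the following:

```latex
\begin{aligned}
a_2 &= R_i - a_1 \\
a_1 b_1 + (R_i - a_1)\, b_2 &= R \\
a_1 (b_1 - b_2) &= R - b_2 R_i \\
a_1 &= R - b_2 R_i \qquad \text{(since } b_1 - b_2 = 1\text{)} \\
a_2 &= R_i - a_1 = (b_2 + 1) R_i - R = b_1 R_i - R
\end{aligned}
```

As a check, with R = 8, R_i = 6, b_1 = 2 and b_2 = 1 this gives a_1 = 8 - 6 = 2 and a_2 = 12 - 8 = 4, so each spine uses 2 x 2 + 4 x 1 = 8 ports.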
[0067] This is illustrated in Fig. 4, in which there are S_1 = 3 spine switches providing connectivity between R_1 = 6 leaf switches. Again S_i R = C R_i (in Fig. 4, S_1 R = C R_1 = 24), but in this case b = C/S_1 = 4/3, which is non-integer. Using the expressions defined above, it can be seen that a_1 = 2, b_1 = 2, a_2 = 4, b_2 = 1. In Fig. 4, L = 1, R = 8, C = 4, U = 0, and N = 24.
[0068] Therefore, the 8 fabric ports on each spine switch are distributed amongst the 6 leaf switches with 1 connection to 4 of the leaf switches and 2 connections to the remaining two leaf switches. The same is true for all of the spine switches, and accordingly each leaf switch has 1 connection to each of 2 of the spine switches, and 2 connections to the third. These connections are distributed evenly so that all 8 of the fabric ports are utilized for each spine switch.
[0069] The table below sets out which leaf switches are in the first and second subsets, as described earlier in the application (accordingly, the first number is the first bundling factor, b_1 = 2, and the second number is the second bundling factor, b_2 = 1; a_1 and a_2 represent the number of leaf switches in the first and second subset respectively):
[Table 1 is provided as an image in the original publication.]
Table 1: Constituent leaf switches of each subset, for the spine switches in Fig. 4.
[0070] Case 3: In this case, S_i R > C R_i, but b = C/S_i is an integer value. Thus, there still remain some unused fabric ports on each of the spine switches, but these are evenly distributed among all of the spine switches. Or equivalently, b fabric ports from each leaf switch are connected to each spine connected to the given sub-array. It therefore follows that the number of unused ports per spine is also uniform in this case: U/S_i = R - C R_i / S_i = R - b R_i.
[0071] This example is shown in Fig. 5, in which there are S_1 = 2 spines providing connectivity between R_1 = 3 leaf switches. Thus, there are U = S_i R - C R_i = 4 unused ports, which are distributed evenly across both of the spine switches. Thus, b = C/S_1 = 2 links from each spine switch are connected to each of the leaf switches. This leaves 2 unused ports on each of the spine switches. It can be seen that this arrangement provides the optimum connectivity between the spine switches and leaf switches, and minimizes the number of unused ports. In Fig. 5, L = 1, R = 8, C = 4, and N = 12.
[0072] Case 4: The final case is the most irregular, in which S_i R > C R_i and b = C/S_i is non-integer. In this case, there are some bundles with b_1 = \lceil C/S_i \rceil links in them, and other bundles with b_2 = \lfloor C/S_i \rfloor = b_1 - 1 links in them. Moreover, there are two disjoint sets of spines: the first set includes u_1 spines with v_1 = \lceil U/S_i \rceil unused ports and the second set includes u_2 spines with v_2 = \lfloor U/S_i \rfloor unused ports, wherein u_1 = U - v_2 S_i and u_2 = v_1 S_i - U, such that:
u_1 v_1 + u_2 v_2 = U
u_1 + u_2 = S_i
[0073] Each spine in the first set has a_1 = R - v_1 - b_2 R_i bundles of b_1 links, and a_2 = R_i - a_1 bundles of b_2 links.
[0074] Correspondingly, each spine in the second set has a_3 = R - v_2 - b_2 R_i bundles of b_1 links and a_4 = R_i - a_3 bundles of b_2 links.
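The four cases and the bundle counts can be reproduced with a short sketch. It only evaluates the counting formulas (ceilings and floors of C/S_i and U/S_i, then the number of large and small bundles per spine); it does not decide which particular leaf switch receives which bundle. The function name `bundle_layout` and the shape of its return value are hypothetical.

```python
def bundle_layout(radix, client_ports, leaves, spines):
    """Classify a sub-array into cases 1 to 4 and count the bundles on each spine.

    Returns (case, groups), where each group is a tuple
    (number_of_spines, unused_ports_per_spine, {links_per_bundle: bundle_count}).
    """
    total_unused = spines * radix - client_ports * leaves   # U = S_i*R - C*R_i
    if total_unused < 0:
        raise ValueError("too few spine ports for full bisection bandwidth")
    b2 = client_ports // spines           # floor(C / S_i)
    b1 = -(-client_ports // spines)       # ceil(C / S_i)
    v2 = total_unused // spines           # floor(U / S_i)
    v1 = -(-total_unused // spines)       # ceil(U / S_i)
    case = {(True, True): 1, (True, False): 2,
            (False, True): 3, (False, False): 4}[(total_unused == 0, b1 == b2)]
    if v1 == v2:
        spine_sets = [(spines, v1)]       # every spine has the same number unused
    else:
        spine_sets = [(total_unused - v2 * spines, v1),    # u_1 spines, v_1 unused
                      (v1 * spines - total_unused, v2)]    # u_2 spines, v_2 unused
    groups = []
    for count, unused in spine_sets:
        large = radix - unused - b2 * leaves   # bundles of b_1 links on this spine
        small = leaves - large                 # bundles of b_2 links on this spine
        bundles = {}
        for size, number in ((b1, large), (b2, small)):
            if number:
                bundles[size] = bundles.get(size, 0) + number
        groups.append((count, unused, bundles))
    return case, groups


for leaves, spines in ((4, 2), (6, 3), (3, 2), (5, 3)):
    print(leaves, spines, bundle_layout(8, 4, leaves, spines))
```

For R = 8, C = 4, five leaf switches and three spines, the sketch reports case 4 with one spine having 2 unused ports (one bundle of 2 links and four single links) and two spines having 1 unused port each (two bundles of 2 links and three single links).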
[0075] This example is shown in Fig. 6, in which S_1 = 3 spine switches are used to connect R_1 = 5 leaf switches. This arrangement is less regular than the previous three cases, but still presents a reduction in the number of unused ports, through the use of link bundling. The spine switches labelled SS1 and SS2 form the first disjoint set, and the spine switch labelled SS3 forms the second disjoint set. Spine switches SS1 and SS2 each have 7 used ports, and 1 unused port. Of the 7 fabric ports which provide connections to the leaf switches, there are 2 bundles of 2, and 3 bundles of 1. Spine switch SS3 has 6 used ports and 2 unused ports. Of the 6 fabric ports which provide connections to the leaf switches, there are 4 bundles of 1, and 1 bundle of 2. In Fig. 6, L = 1, R = 8, C = 4, U = 4, v_1 = 2, v_2 = u_1 = 1, u_2 = 2, and N = 20.
[0076] The following tables set out which switches are present in which subsets, to use the terminology used earlier in the application. Accordingly, the first number and the third number are equal to the bundling factor b_1 = 2, and the second number and the fourth number are equal to the bundling factor b_2 = 1; u_1 and u_2 give the number of spines in each of the subsets of spine switches; a_1 and a_2 give the number of leaf switches in the first and second subset of leaf switches respectively, and a_3 and a_4 give the number of leaf switches in the third and fourth subsets respectively.
[Table 2 is provided as an image in the original publication.]
Table 2: Constituent leaf switches of each subset, for spine switches in the first subset of spine switches of Fig. 6.
[Table 3 is provided as an image in the original publication.]
Table 3: Constituent leaf switches of each subset, for spine switches in the second subset of spine switches of Fig. 6.
[0077] Figs. 7 to 9 show examples of parts of two-dimensional optoelectronic switches according to embodiments of the present invention. It must be noted that only one sub-array is shown in each of these drawings. In these drawings, the different types of connecting line represent connections from different spine switches connected to the sub-array shown. Figs. 7 and 8 show embodiments in which L = 2, R = 12, and R_1 (not shown) = 12; in Fig. 7, R_2 (which is shown) = 4. The embodiment of Fig. 7 falls into case 3 above, since S_2 R = 2 x 12 is greater than C R_2 = 4 x 4, but C/S_2 = 4/2 = 2, which is an integer. Accordingly, there are U = 24 - 16 = 8 unused ports across the two spine switches, i.e. 4 on each, and two connections to each spine switch SS1-2 from each leaf switch LS1-4. In other words, each of the spine switches has 4 bundles of 2 links, one bundle per leaf switch.
[0078] In Fig. 8, R_2 = 5, rather than 4, as in the previous case. Therefore, C R_2 = 4 x 5 = 20, and so U = 4, i.e. there are 2 unused ports on each spine switch, but again these are evenly distributed since C/S_2 = 2 still. Each leaf switch LS1-5 therefore has two links connected to each of the spine switches SS1-2, or in other words, each spine switch has 5 bundles of 2 links, one bundle per leaf switch. Again, this falls into case 3 as described above.
[0079] Fig. 9 is an example of case 4. Here, L = 2, R = 12, R_1 (not shown) = 12, and R_2 (shown) = 7. Thus, S_i R = 36, C R_i = 28, giving U = 8. Since C/S_i = 4/3 is not integer valued, these unused ports are not evenly distributed across the spine switches. There is therefore an irregular connection pattern, as compared to the previous examples. The connections are as follows:
• Each leaf has 1 link bundle of 2 and 2 "link bundles" of 1.
• Spine SS1 has 3 link bundles of 2 and 4 of 1 (10 links total, 2 unused)
• Spines SS2 and SS3 have 2 link bundles of 2 and 5 of 1 (2x9 links total, 2x3 unused).
[0080] As above, the following tables summarize which leaf switches fall within which subset, as defined earlier in the application:
[Table 4 is provided as an image in the original publication.]
Table 4: Constituent leaf switches of each subset, for spine switches in the first subset of spine switches of Fig. 9.
[Table 5 is provided as an image in the original publication.]
Table 5: Constituent leaf switches of each subset, for spine switches in the second subset of spine switches of Fig. 9.
[0082] The table below shows examples of which cases various configurations of switching elements fall into, in an optoelectronic switch having 2 dimensions (L = 2), a switching element radix R = 12, and for C = 4 client ports. In particular the values of all of the other parameters described above are shown, when a given dimension is reduced to R_i = 2, 3, ..., 12, for S_i = 1, 2, 3 spine switches.
[Table 6 is provided as an image in the original publication.]
Table 6: Values of various parameters, varying with the size of the reduced dimension and the number of spine switches in sub-arrays associated with that dimension.
[0083] Fig. 10 shows an embodiment of the second aspect of the present invention. Here, the two rows of leaf switches LS/LS* are different sub-arrays which are associated with the same dimension, which is the "horizontal" direction, when the drawing is viewed with the page oriented in landscape. Each of the spine switches here has a radix R = 8, and each of the sub-arrays has a size R_i = 4. Accordingly, to use the notation used earlier in the application, x = R/R_i = 2, and so each of the spine switches SS1-3 is able to connect to all of the leaf switches in the two sub-arrays. Although in Fig. 10 each spine switch is connected to all of the leaf switches, the array may be two-dimensional as a result of each of the spine switches being configured (e.g., programmed) to forward data along only one of the two dimensions.

Claims

WHAT IS CLAIMED IS:
1. An optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch comprising:
an array of leaf switches, the array having L dimensions, each of the leaf switches having a radix R, R and L being integers greater than 1, the size of the array being less, in a reduced dimension of the L dimensions, than in each of the other dimensions of the L dimensions, each of the leaf switches having an associated L-tuple of coordinates giving its location in the array with respect to each of the L dimensions; and
a plurality of spine switches,
wherein the array consists of a plurality of overlapping sub-arrays, each of the sub-arrays being associated with a dimension of the L dimensions and consisting of R_i leaf switches, wherein R_i is the size of the array in the dimension associated with the sub-array, the coordinates of the leaf switches of the sub-array differing only in respect of the dimension associated with the sub-array, each of the sub-arrays being connected to a respective plurality of spine switches of the plurality of spine switches,
wherein each of the leaf switches is a member of L sub-arrays of the sub-arrays, each of the L sub-arrays being associated with a different one of the L dimensions,
wherein each of the leaf switches has:
a plurality of client ports for connecting to an input device or an output device; and
a plurality of fabric ports for connecting to the spine switches,
wherein each spine switch has fabric ports for connecting to fabric ports of the leaf switches of the array, and
wherein each of the spine switches connected to a first sub-array of the plurality of sub-arrays, the first sub-array being associated with the reduced dimension, has:
a connection to each of the leaf switches in the sub-array, and
a plurality of connections to at least one of the leaf switches in the sub-array.
2. The optoelectronic switch of claim 1, wherein all of the fabric ports of each of the plurality of spine switches connected to the first sub-array are connected to a fabric port on a leaf switch in the first sub-array.
3. The optoelectronic switch of claim 1, wherein a spine switch of the plurality of spine switches connected to the first sub-array has a plurality of connections to each of the leaf switches in the first sub-array.
4. The optoelectronic switch of claim 1, wherein a spine switch of the plurality of spine switches connected to the first sub-array has the same number of connections to every leaf switch in the first sub-array.
5. The optoelectronic switch of claim 1, wherein each of the spine switches connected to the first sub-array has the same number of connections to every leaf switch in the first sub-array.
6. The optoelectronic switch of claim 1, wherein each of the spine switches connected to a second sub-array of the plurality of sub-arrays has:
at least one connection to each leaf switch in the second sub-array;
a first number of connections to each of a first subset of the leaf switches in the second sub-array;
a second number of connections to each of a second subset of the leaf switches, disjoint from the first subset, in the second sub-array,
wherein:
the first number is the same for all of the spine switches connected to the second sub-array,
the second number is the same for all of the spine switches connected to the second sub-array, and
the first number is greater than the second number.
7. The optoelectronic switch of claim 6, wherein the first number is one greater than the second number.
8. The optoelectronic switch of claim 1, wherein at least two of the fabric ports of the plurality of spine switches connected to the first sub-array are unused.
9. The optoelectronic switch of claim 8, wherein the number of unused ports is the same on each of the spine switches connected to the first sub-array.
10. The optoelectronic switch of claim 8, wherein:
the plurality of spine switches connected to the first sub-array consists of a first subset of spine switches and a second subset of spine switches, disjoint from the first subset of spine switches;
each of the spine switches in the first subset of spine switches has:
a first number of connections to each of a first subset of leaf switches in the first sub-array; and
a second number of connections to each of a second subset of leaf switches in the first sub-array, the second subset of leaf switches being disjoint from the first subset of leaf switches;
each of the spine switches in the second subset of spine switches has:
a third number of connections to each of a third subset of leaf switches in the first sub-array; and
a fourth number of connections to each of a fourth subset of leaf switches in the first sub-array, the fourth subset of leaf switches being disjoint from the third subset of leaf switches;
and wherein:
the first number is the same for all of the spine switches in the first subset of spine switches;
the second number is the same for all of the spine switches in the first subset of spine switches;
the third number is the same for all of the spine switches in the second subset of spine switches;
the fourth number is the same for all of the spine switches in the second subset of spine switches;
the first number is greater than the second number, and the third number is greater than the fourth number.
11. The optoelectronic switch of claim 10, wherein:
the first number is one greater than the second number; and/or the third number is one greater than the second number.
12. An optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch comprising:
an array of leaf switches, the array having L dimensions, each of the leaf switches having a radix R, R and L being integers greater than 1, the size of the array being less, in a reduced dimension of the L dimensions, than in each of the other dimensions of the L dimensions, each of the leaf switches having an associated L-tuple of coordinates giving its location in the array with respect to each of the L dimensions; and
a plurality of spine switches,
wherein the array consists of a plurality of overlapping sub-arrays, each of the sub-arrays being associated with a dimension of the L dimensions and consisting of R_i leaf switches, wherein R_i is the size of the array in the dimension associated with the sub-array, the coordinates of the leaf switches of the sub-array differing only in respect of the dimension associated with the sub-array, each of the sub-arrays being connected to a respective spine switch of the plurality of spine switches,
wherein each of the leaf switches is a member of L sub-arrays of the sub-arrays, each of the L sub-arrays being associated with a different one of the L dimensions,
wherein each of the leaf switches has:
a plurality of client ports for connecting to an input device or an output device; and
a plurality of fabric ports for connecting to the spine switches,
wherein each spine switch has fabric ports for connecting to fabric ports of the leaf switches of the array, and
wherein a first spine switch is connected to a first sub-array of the plurality of sub-arrays and has fabric ports each connected to a fabric port of a leaf switch of the array, and the spine switch has connections to:
each of the leaf switches in a first sub-array of the plurality of sub-arrays, and
a leaf switch of the leaf switches in a second sub-array of the plurality of sub-arrays, associated with the same dimension as the first sub-array.
13. The optoelectronic switch of claim 12, wherein the first spine switch is connected to a first plurality of leaf switches in a respective first plurality of sub-arrays of the plurality of sub-arrays, each associated with the same dimension, the leaf switches of the first plurality of leaf switches all having the same co-ordinate with respect to the dimension with which each sub-array of the first plurality of sub-arrays is associated.
14. The optoelectronic switch of claim 12, wherein the first spine switch is connected to a leaf switch in every sub-array, of the plurality of sub-arrays, associated with the same dimension as the first sub-array and the second sub-array.
15. The optoelectronic switch of claim 12, wherein each spine switch of a first plurality of spine switches connected to a second sub-array of the plurality of sub-arrays has:
a connection to each leaf switch in the second sub-array, and
a plurality of connections to at least one leaf switch in the second sub-array.
16. The optoelectronic switch of claim 15, wherein a spine switch of the first plurality of spine switches has a connection to each of the leaf switches in a first plurality of sub-arrays of the plurality of sub-arrays, each sub-array of the first plurality of sub-arrays being associated with the same dimension.
17. An optoelectronic switch for transferring a signal from an input device to an output device, the optoelectronic switch comprising:
a one-dimensional array of leaf switches; and
a plurality of spine switches,
wherein each of the spine switches has
a connection to each of the leaf switches, and
a plurality of connections to at least one of the leaf switches.
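As an informal reading aid, and not a legal interpretation of the claim, the connectivity pattern recited in claim 17 can be expressed as a check on a table of link counts. The function name `matches_connectivity_pattern`, the `links` layout and the example values are hypothetical; `links[s][l]` is assumed to hold the number of connections between spine switch `s` and leaf switch `l`.

```python
def matches_connectivity_pattern(links):
    """Return True if there are at least two spine switches and every
    spine switch has at least one connection to each leaf switch and
    two or more connections to at least one leaf switch."""
    if len(links) < 2:  # "a plurality of spine switches"
        return False
    return all(
        all(n >= 1 for n in row) and any(n >= 2 for n in row)
        for row in links
    )

# Hypothetical example: 2 spine switches, 3 leaf switches.
links = [
    [2, 1, 1],  # spine 0: doubled link to leaf 0
    [1, 1, 2],  # spine 1: doubled link to leaf 2
]
ok = matches_connectivity_pattern(links)
```

A table in which some spine switch has exactly one connection to every leaf switch would fail the check, because that spine switch would have no leaf switch with a plurality of connections.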
PCT/EP2017/056129 2015-11-05 2017-03-15 Optical switch architecture WO2017158027A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1816669.4A GB2564354B (en) 2015-11-05 2017-03-15 Optical switch architecture

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
US201662309425P 2016-03-16 2016-03-16
US15/072,314 US9706276B2 (en) 2015-11-05 2016-03-16 Optoelectronic switch
US62/309,425 2016-03-16
US15/072,314 2016-03-16
GBPCT/GB2016/051127 2016-04-22
PCT/GB2016/051127 WO2016170357A1 (en) 2015-04-24 2016-04-22 Optoelectronic switch architectures
US201662354600P 2016-06-24 2016-06-24
US62/354,600 2016-06-24
GB1611433.2A GB2549156B (en) 2015-11-05 2016-06-30 Optoelectronic switch
GB1611433.2 2016-06-30
US201662364233P 2016-07-19 2016-07-19
US62/364,233 2016-07-19
EPPCT/EP2016/076755 2016-11-04
EPPCT/EP2016/076756 2016-11-04
PCT/EP2016/076755 WO2017077093A2 (en) 2015-11-05 2016-11-04 Optoelectronic switch
PCT/EP2016/076756 WO2017077094A1 (en) 2015-11-05 2016-11-04 Multi-dimensional optoelectronic switch

Publications (1)

Publication Number Publication Date
WO2017158027A1 true WO2017158027A1 (en) 2017-09-21

Family

ID=59858216

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/056129 WO2017158027A1 (en) 2015-11-05 2017-03-15 Optical switch architecture

Country Status (1)

Country Link
WO (1) WO2017158027A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100254703A1 (en) * 2009-04-01 2010-10-07 Kirkpatrick Peter E Optical Network for Cluster Computing
US20120250574A1 (en) * 2011-03-31 2012-10-04 Amazon Technologies, Inc. Incremental high radix network scaling

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALEXEY ANDREYEV: "Introducing data center fabric, the next-generation Facebook data center network", 14 November 2014 (2014-11-14), pages 1 - 11, XP055339807, Retrieved from the Internet <URL:https://code.facebook.com/posts/360346274145943/introducing-%C2%AD%E2%80%90data-%C2%AD%E2%80%90center-%C2%AD%E2%80%90fabric-%C2%AD%E2%80%90the-%C2%AD%E2%80%90next-%C2%AD%E2%80%90generation-%C2%AD%E2%80%90facebook-%C2%AD%E2%80%90data-%C2%AD%E2%80%90center-%C2%AD%E2%80%90network/> [retrieved on 20170127] *
NATHAN FARRINGTON ET AL: "Data Center Switch Architecture in the Age of Merchant Silicon", HIGH PERFORMANCE INTERCONNECTS, 2009. HOTI 2009. 17TH IEEE SYMPOSIUM ON, IEEE, PISCATAWAY, NJ, USA, 25 August 2009 (2009-08-25), pages 93 - 102, XP031528533, ISBN: 978-0-7695-3847-1 *
NATHAN FARRINGTON ET AL: "Facebook's data center network architecture", 2013 OPTICAL INTERCONNECTS CONFERENCE, 1 May 2013 (2013-05-01), pages 49 - 50, XP055339437, ISBN: 978-1-4673-5062-4, DOI: 10.1109/OIC.2013.6552917 *
PADMANABHAN K ET AL: "DILATED NETWORKS FOR PHOTONIC SWITCHING", IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, NJ. USA, vol. 35, no. 12, 1 December 1987 (1987-12-01), pages 1357 - 1365, XP000608587, ISSN: 0090-6778, DOI: 10.1109/TCOM.1987.1096722 *

Similar Documents

Publication Publication Date Title
US10028041B2 (en) Optical switch architecture
US8605716B2 (en) Large-scale packet switch
KR100356447B1 (en) Memory interface unit, shared memory switch system and associated method
US8223759B2 (en) High-capacity data switch employing contention-free switch modules
US6876629B2 (en) Rate-controlled multi-class high-capacity packet switch
US20150172218A1 (en) Multiple Petabit-per-second Switching System Employing Latent Switches
SK62193A3 (en) Packet switch
CA2401337A1 (en) Packet switching
AU5908598A (en) A scalable low-latency switch for usage in an interconnect structure
EP1856860A2 (en) Input buffered switch
EP1668928A1 (en) Matching process
US20130201994A1 (en) Packet-Switching Node with Inner Flow Equalization
US20090262744A1 (en) Switching network
US7397796B1 (en) Load balancing algorithms in non-blocking multistage packet switches
AU756112B2 (en) Multi-port RAM based cross-connect system
US11005724B1 (en) Network topology having minimal number of long connections among groups of network elements
US6999453B1 (en) Distributed switch fabric arbitration
CN106886498A (en) Data processing equipment and terminal
US20080031262A1 (en) Load-balanced switch architecture for reducing cell delay time
WO2005086912A2 (en) Scalable network for computing and data storage management
KR20050020583A (en) Multi-dimensional disconnected mesh switching network
WO2017158027A1 (en) Optical switch architecture
US8687628B2 (en) Scalable balanced switches
WO2003094536A2 (en) Distribution stage for enabling efficient expansion of a switching network
US20050190795A1 (en) Method and allocation device for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a packet switching device in successive time slots

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 201816669

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20170315

WWE Wipo information: entry into national phase

Ref document number: 1816669.4

Country of ref document: GB

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17710018

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17710018

Country of ref document: EP

Kind code of ref document: A1