CA2006241A1 - Modular architecture for very large packet switch - Google Patents

Modular architecture for very large packet switch

Info

Publication number
CA2006241A1
CA2006241A1 CA 2006241 CA2006241A CA2006241A1 CA 2006241 A1 CA2006241 A1 CA 2006241A1 CA 2006241 CA2006241 CA 2006241 CA 2006241 A CA2006241 A CA 2006241A CA 2006241 A1 CA2006241 A1 CA 2006241A1
Authority
CA
Canada
Prior art keywords
outputs
packet
switch
modules
packet switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA 2006241
Other languages
French (fr)
Inventor
Tony T. Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iconectiv LLC
Original Assignee
Bell Communications Research Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bell Communications Research Inc filed Critical Bell Communications Research Inc
Publication of CA2006241A1 publication Critical patent/CA2006241A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/15Interconnection of switching modules
    • H04L49/1507Distribute and route fabrics, e.g. sorting-routing or Batcher-Banyan
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/54Store-and-forward switching systems 
    • H04L12/56Packet switching systems
    • H04L12/5601Transfer mode dependent, e.g. ATM
    • H04L2012/5672Multiplexing, e.g. coding, scrambling
    • H04L2012/5674Synchronisation, timing recovery or alignment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/15Interconnection of switching modules
    • H04L49/1515Non-blocking multistage, e.g. Clos
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/20Support for services
    • H04L49/205Quality of Service based
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/35Switches specially adapted for specific applications
    • H04L49/356Switches specially adapted for specific applications for storage area networks
    • H04L49/357Fibre channel switches
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/45Arrangements for providing or supporting expansion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/55Prevention, detection or correction of errors
    • H04L49/557Error correction, e.g. fault recovery or fault tolerance

Abstract

Abstract of the Disclosure A modular architecture for a very large packet switch is disclosed.
Each module comprises a Batcher subnetwork and an expansion routing network, which expansion routing network includes a set of binary tree subnetwork interconnected with a set of banyan subnetworks. A packet may be routed from an input of the switch to any output by means of only one switch module. This means that the switch modules may be synchronized independently of each other and if one module fails the remainder of the modules may continue to route packets.

Description

201)6241 . 1 .

Fleld of ~he /nv~rt~on The pres Dt in~c~ion rdate~ to a paeket switeh and, more particularly, to a modular architectllre for a ver~ lar~e paelset s~iteh.
S ~ac~tground of the ~nv~ntton A pacl~et s~vitchin~ network co~pri~es a~ array ot pacl~et ~itches intercoml~cted by trun~s. Typicaily, a packet ~itch cor~prises a set of input~ which recoi~e ~rriving pae~ets ~la a set o~ incomin~ trl~Dk~l aDd a ~et ot output~ which are eonneeted to a ~ot o~ outgDin~ trun~ Vi~l ~vhieh pwlcets dep~ the switch.
Tho iDpUt~ and outputs ~Ire eooneeted by a ~witeh fabrie.
~lustrati~ely, ~he switrh ~abrie i9 a self-rou~g ~wite~ tabrie. ~ ~el~-routing switeh fabrie typieaily compri~es an ~rray of wde3 orgaDized iDto ~tages. At e~eh nodo routhg deei~iona aro mado ba~ed on an addre~ eoDtained i~ the heador of the paeket beiDg routed. Typie~lly, sueh ~el~-routiDg ne~worlc~ aro ~ynehrono~ o that p~cl~eta arri~e 15 ~imulta~eou~ly at t~e iDpUtJ In inter~al~ oî tir~e l~no~vn u time slots.
The ~ost com~no21y u3ed ~ell-routing s~vite~lDg ~abrie utilizes a Bateher net~arlc and a basy~s networlc. A bsnys~ Detwork i~ a self-routiy networlc whie~ eaD route a paeket from any input to uly output ~ed on an addreu eoDtsined iD
the paeket header. E~o~eYer, the ba~yan networlc ~ ers from interDal pael~es couisioD3.
20 To eliminato iDtern~l colli~io~s iD 11 b~ networ~, thc packets ill any time 310s are preseDtcd to tbe i~putr ot the b~ysn ~ct~orlc h incre~iD~ or docreasing ordor according t9 dcj~Dation ad~reu. Iu sdditioD, the packct~ receh~ed at the iDpUt9 to the baoya~
networ~ ~hould be concontrated, i.c., thoro ~ould bo r~o i~acti~re iDpUt lincs to the banyan network in bc~woe~ licthre hput linal to tho b~nyan ne2~ror~ in aIly particular timo 310t.
25 Thu~, the b~yan notwork is u~ually psecedsâ by ~ BatG~er notworlc ~vhich ~ort~ arrivi~g pa~ot~ ~ocordiD~ to do~tinatio~ ~ddrou. T~o ~orted packot~ aro thal routod to tho d~i ed o~tpu~ ~ hldic~ltedl by thoir addroue~ by the banya~ networl~. Tbu~
co~bille~ B-tchor b~yllD Det~ork pro~ide~ ~ full co~ec~ion, htOrDally colllslooloss ~witch ~lbri¢. To iDau:re that the p~lckcta arri~iDg ~t the b~yan network iD a time slot sre 30 coDccntra~od, ~ conceDtr~tor ~etwosk m~ly be loc~ted betweal the Batchor net~vork snd thc bs~y~ nct~ork.
P`or a ba~yan notwork ~hich ~orms psrt of a Blltcher b~Dyan networlk to bo llo~ blocking (i.o. suf~er no inter~ collisio~ 13G~chcr output~ and thc bsnys~ iDpU~ a~e coDr~ccted with a pattern k~ow~ perfect ~hume. Consider 3 35 3ituation ~vhere tbore Rre 64 intercol~nection~ bot~ocD the B~tcher network aDd the bsDysn networl;. As~umo tho B~tcbcr nutput~ are nu~Dbered 1 to 64 nnd that tho b~yan iDpUt~
sre ai~o numbcred 1 to 64. The fir~ hal~ o~ the B~tchor output~ 32) ar~ nnected to .
: :, .,, , . . .. :
~ ' ' :,: ' ' . . : ',, : , .
;, : ~, ~, :: : ,: ,, : - : : : : ,- :
`, . ~ ::: ~ : , .: . : . :
2~L

tho odd numbcred ba~an iDpUt9 1,3...63. The secoDd haif of the Batcher output~ (33-64) are connected to tbe e~/c~ ~umbcred banynn illputs 2,4...64. Tbl~ i~ llko a per~ect ~huffle of the two hal~e~ ol a dcck ot playi~g card3.
While switch fabric~ ba~ed o~ the Batcher-banyan net~vorlc do not S ~uffer from intcr~al colli~lon~, they do ~uf-er from oYterDal colli~io~. An e~ternai collisioD o~cur~ ~vhen two os morc packct~ dc~tlned for thc samo output ~sre simultaneouYly prc2e~ted at the i~pUt9 0~ the Batcher-baDyall nctwork. ;
Se~rerai algorithms have bcen dc~ised for resol~ing ~uch output port codliet~ ~o as to eli~inate c~ter~al collisio~ in a Batcber-b~yaD nctwork. The3c10 aigorithma aro 1. Rscirculatioo Algorithm: Thi~ al~orithm tecd3 blockod pac~et~ back to the i~put~ for sc-entry at the ~o~t time 910t. (Soe e.~., Hu~g, ~. and K~auor, S., Starlltc: A Widebaud Dlgltal Swltch, Proe~edlrl~ of Globccorn 0~, pp. 121-125).
2. Three Pha~e ~Igorith~: Thla ~orith~ i~ d~o a teedbaclc ~cbeme. l~ach iDpUt lS sesd~ ~ probiDg hcader for arbi'a~o~ iD phase 1. The inpl-ts then receiYc a positiYe or negati~rs ac~Dowlcdge~ent bac~ ~roDI ths output~ jD ph~e 2. Tbc actuR~ ~i3sio~ of tho ~iDDiD8 pwl~os~, i.e., tho~o ~ho ~ecoi~e po3iti~e ackDo~vledgoDIents, ~ake~ place il~ pba~o 3. ~See e.g., Hui, Y.N.J. alld Arthurs, 1~., A Bro~dbaDd Pacl~et S~vitch ~or l[ntegrated TraDspor~, JEEE
Jollrrlal or~ S~lccbd Are~ ;A CO~IJAlCattO~J~ VOI. 5, No. 3, Octobor 1987, pp.
1264-1273; sce al~o T.T.Lce ot d, ~The Arc~itecture for a Multic~st Bro~dbaDd Packet Ss~itch~, Proc2edlll8l ~IAfaco~ 198J, pp.1-8 which descri~c~ rte pha~le al~orit!b~ ia ~hich oaly ~ Batcher network i~ requircd for arbitr~tion).
2S 3. RjD~ }le~c~ation Al~orithlD: lbis }~ ~ lolcc~ pa3~11iDg 3che~e. At the begh~is~ ot ~ timo llot, a cle r to~c~ i~ hsusd by 1~ to~en gsnerator. The tnlcen ha~ Ul N-bit ~ield to iDdic~a~ the a~vdl~bility ot sach o N outputs. Thetokal ir circul-ted around a ring connectin~ ~put port~ ociated vvith each of tho p~ot s~vitch i~puts. Wlbe~ ~ho tol~e~ es at a particulllr iDpUt port, the p~lclcot at tho hcsd ot a queue ~t the p rticular Input port ~vill m~ke a resena~o1l by ~riti~g 1l IQ~IC "1" Into tho ~it position o~ tho toltoD
corro~ dill~ to the de~ired output, i~ ~ho do~sed output h~ ~ot been re~cngd by a preYiou~ iDpUt pon 011 tbo r~. If the i~toDdcd output is suceeu~ully reson~ed, the ~acko~ ia ~aDarl~itted dutin8 tllo ~e~t ti~e 910t. Thcrescn~tioD cycle ~Dd tran~miuis~ 9G1o Ca~l be o~crlapped ~o ~i~imize tbe o~erhe2d. (See, e.~., Birlgh~ B. a~d Busscy H., l~eior~atioll-Ba~ed Coote~tion Rçsolution Me~h~Dis~ for B~l~cher-l~aDyan Packe~ Switche~
E~c~ror~k~ L.ett~r~, 23rd, Vol. 24, No. 13, Juno 1988, pp. 772-773; scc also B.

: .: , , ;. , ~ ' , ' , ~ .::.

:; .. .. ', :~.. ,. . , :, :,: .. : .:
..... . . .

20~)G241 -3 ~

Bln~hal~ et ai, U.S. Patent 4,761,780).
AAY of the three algoritbm~ mentio~cd aboYe hu th~ potontial to re901~r8 output port conflict~ aDd elimillate c~tern~i colli~io~ tor a ~lvitch fabric bucd OD
a Batchor ba~yall nctworX.
S A pac~et swi~cb which scrves a~ a centr~i officc tor a bro~dbandtelecommunicatlons ~et~vork ~vill requiro ~ppso~imatGly 10,000 input~ d outputs. Such v~ry lar~e packet !l~itche~, whe~ implemc~ted ~l~ing very l~r~e Batcher bllayaD j~itch ~abric~, have a ~umber of signific~Qt problelD~. ~r~t, the ~oll-blocicillg propcrty of the Bstcher bany~ ~witch ~abric tequirc~ th~lt tbe ~v~ole sct of i~put paclsota (i.c. the 3et of 10 pacl~ct~ arri~ing ae the input~ of tho ~wltch fabtic duriD~ ooe time ~lot) bc oyDchronized at eYery stage of the ~itcb fabric. For a ~witch ~ith 10,000 input~, this meaoa ~ynchrooiziDg up t~ 10,000 pscket3 ~vithin a network oî about lOn st~e~. A sccond probloD~ witb a lrery large packet s~it~h i~ thc p~y~ic~l limit~tio~s o~ tho ize of the VLSI
chips a~d ~e comple~ity o~ tbo ingerCo~eCtiOD ~isin~ bctvveen tho chip~. A ti ird 15 proble~ wit~ l~ery large pacl~ct switchea rel~to~ to rell~bility tmd maiDtahabili~. It ia clcar that s~allcr s~itch ~brics ~ el~siez to de~elop, to te~t, to ~dnt~D, ~nd to rcplacc.
A nu~bcr of ~ppro~che~ ha~e beon de~elopcd for ~odularizing very large ~lvitches (see e.~., C. Clos, "A Study o~ No~Blwking S~vitchi~g Networl~s", 8 SyJt~3 Tchn1ell~ Jo~unal, Yol. 32, 1953, pp.406-424; D.R. Spoars, "Bro~db~nd ISDN
20 S~vitcbing Capability FroDI A Sen~ice Per~pocth~", IEEE lol rRal of S~l~et~d Ar~a~ in Comm., Vol. SAC-S, No. 8, Oct 1~87, pp. 1222-1230). Convo~tio~al approache~ to modulariziD~ ~esy l~rgo Ba~cher- b~yan ;_itch ~abrics llavo not proYen to be ~ucce~ful iD sOIYill8 the ~bove-ide~tified proble~. Whe~l modulsri~ed i~ the so~ventlonal ~nner, vory large B~tcher-~any~ witch fabri~ are fonnet from reladvely small Batcher-ba~yan 25 ~odullel~ which ue l~roupod in ~ e~. One outp~t of cach Bateber~b~ya~ module in one ~t~ge i~ co~ectod ~co o~lch l!l-tcher~b~yu~ module i~ tbe ~e~t ~ta~ze. This providc~ full iDterco~necti~i~ betweon ~11 t~e input~ aod ~U ~e outputs of ~he qery large pac~et swit~.
~Iowevor, in ordor for pN&et~ arriYing at the s~lvitch ill a par~icular timo ~lot to be routed fro~ p~cul~r iDpU~ ~o particular o~ltpU~, the pac~e~ must p8U throul~h plurality of 30 ~ho rel~tl~oly small B-tcher~b~yu~ ~odule~ in a plur~llity of ~It~430a. When thi~ typo of ~odular archit~uro i~ utilizcd, it is po~ibl~ to indepo~de~tly synchroai~o the i~dividual Bstchor- banyaa modulal oDly i( bnffer~ aro u~llzçd beswoen tho ~ta~e~. Withou~ 3uch Wfors ~ll o~ t~o ~odule~ havc tc bo glob~lly ~ync~ronizod. Furthermore, tho coDvo~tional modulor ~rchitec~ure de~cr{bcd abo~e rcquire~ complos intorconne~ion-35 bot~ve~D tho ~t~go~ of Batcher~banyDD ~odules. 11l additlon, bocauso a pacltot mu~t passthrough more t~u~ ono of tho Batcher-bsDyan modlule~ ~o bo routod from a particular input to a particulas output, i~ ODO of tho 8a~cher-b2ny~ module~ fail~, thc ~lvit,:b a~ a whole no lon~or oyor3tos properly. Still another problom ~rith tho coD~e~tiunal modular : - ~ : : . :
. :,. ,. , , :, ; ~ :
. : : .
., ~ : ~ . : : :
, , ;~0(116241 architccture relate~ to co~flict resolutioo. BecauJe the n~odule9 do ~Dt operatoiudepo~del~tly o~ ~ach other, aDd because eaCIl p&C~et has to pass throueh ~I plural;ty of module~l in a plusaiiq of stages, connict resolution is v~ry dlfficult. I~ particular, coDflic~
resolution hu ts~ be per~ormed at o~rety Batchor-banyan module, wbich co~nlct resolution 5 also rcquires buffors internally b~twee~ the staE~e~. Accordi4gly, sucb a conveDtional modular architocture is aot quitable for implemoDtin~ ~ vory high-spccd p~cl~et avvitcb.
To oYercomo tb~ above-describcd problcm~ 7~hich aro characteristic of very large p~sl~et s~itche~, It i~ aD objeet ot the prc~ent invention to providc a modular archi~çcturo for a l~ory ll~rgo pacl~e~ s~itcb, ~hlc~ rchitecturo ut~lizes ~odules tbat 10 oporate indepcndontly o~ each other. More p-rticularly, it is a~ objcct o~ thc prescnt h~o~tio~ to prs~vide a very lar~e packct switcb construsted ~rD~ a plura.lity o~ modulos wbich call be sy~chroDizcd i~depcndc~ltly and ~qhish ~re of rolathrely simple co4struction tor oaso of mail~tenss~ nd bigb ~çliability.
S~m~ry of ~he lAverlt~o~
lS In accordanGe with the pteseDt i~Ye~don, a pac~et s~itc~ haYiDg Ni~put~ and N output3 ~nay be impleme~ted U3iDg IC (1<IC<N) rel~ ely small switchmodules. I~he set of N iDpUts i~ partitioDed into 1~ 3ub~e~"0 that oach ~ubset h~
M= N/~ inputs. l~seh subset of M iDpUtS i~ coD~ccted to all N output~ usiog onc of thc K
switch modulal. ~oro particularly, each nVjtGh ~odulo ha~ M inputll and N outputs. The 20 ~'~' output ot o~ch module is conDcG~ed to ~ multiple~er, which multiplo~er b connectod to the jth output of t~e paclcot switch Thc switch module is the ba~ic b~;ldiDg bloclc of tbe NxM packet ~itsh accordiDg to t~o prejcDt in~eDtioD. }~aGlb of the twitcb modules is an au~o~omou~, -Do~-blocking, sel~ soutiDg pscket 3witch.
2S Dllugratively, oach 3~itch ~odule comprise3 a Bstcher ~ortiDg ~ubDc~or~ uld ~n esp~n~iou routing Det~or~. Tho o~paD3iol~ routin~ ~et~ork comprises a ~ot ot b~y trso ~ubnet~ork~ intercoDneeted ~vith ~ 3et o~ ba~yan ~ubnetwork~. In CODI~l~t Wi~ lho eo~vçDtional modular pwl~et ~witchc~ de~cribed cbove, a packet i~
ro~ fros~ a pardc~l~r iDpUt to a pllrtiOUIar output, throu~h use of orlly ono ~witch 30 modulo. There r~ no lfiterconnectioll~ betwee~ modulo~ d tho module~ operate i~depende~tly of o~cb other.
Tbe ~aeDco o~ ~uch Intotconnection~ Dpllfie~ t~o apot~ltion iand mainte~ance of t~o eDtiro pacl~et ~itcb. ID p~rticnlar, tho inven~vo pack~t switch architcct~ro allow~ for the indopeDdont ~yncbronization o~ tho modulal wbich ~implifie~
35 timiDg ~ub~ti~dally. Tho rolatively s~nall ~ize of each 11witcb module m~u synchronization ~ithin ei~ch modulc rolasi~c ts2ightfo~w~rd. In the psckct switch of the pre~en~ i~veDtio~ not ~ece~ ry to ~ynchroDlze tho indlYidual 3witch module~ with0110 another. In otber ~ord~, in a 10,000 input ~witcb, it i~ not ne~s~ary to ~yDchronize :: . , .: . :
'. ~ ' ' ,. ' `~,:, ,;. , , . . , , . ~ . . . .
.
.
~, : , :~0 [)~4~

10 000 pacltets ~vbich srrivts in a particular tim~s slot. It is only ~ects~sary to i~dopendently syDchroni~o tho pacl~eta routed withln tsach o~ the indiYidual switch module~.
In addition it is ~ ad~u~tage ot tho packet 3witch ot tho preseot inqention thzt coDtention resolution aigorithm~ de~cloped lor Batchcr-b~yaD ~witche~
S such u the RccirculatioD Algorithm T~ree-Ph0se Algorithm aDd Rlng Resenation Algorithm remain vaiid for tho indiqidual ~itch module~. In particular the l;UngReservation Algorith~ i~ a~ attractivc contentioD resolution algorithm for the 3~vitcb modules becaus~ a separaSc rin~ is used for each s~itch r~odule thereby convertiDg an othct~ise seriPI procedure into a parallel proctsdure.
A futtbor adv~nt~go ot tl~e pscXet switch ~rchitccture o( tho pre3ent in~re~tion is that a~y tault in 8 module will diaturb only tho local traafic carriod by that module 9 hile tho remaiaiD8 switch modulo~ c~ ~till be Dormally operat~d. Pllulttoler~w~s eaD therofore bo accomplis~ed by proYidir,g 1l spllro module Dot ~ duplication of tho entire s~ritah.
It should ~llso b~ noted th~t becswe the pacltet swit~ ot the preaent inqcDtio~ is coD~tructod utilizillg independe~t swltch module~ t~e cspacity o~ the pscket switch csn be di~trikutot over a ~ur~l esch~m~e area to reduee accc~s co~ts. -For the re~son~ stated abo~e the modulsr pacl~et switc~ architccturc of the preSeDt iD~DtioD ropreseDt~ 3igniic Dt ad~ance o~er con~entio~l ~odular paclcet 20 s~itch arc~itcceu~os c~pa~ lly for the impl~e~t~tion of vcry largç pac1cet switc~e~.
21rl~D~scrfptto~ of the D~awin~
FI~. 1 ~chom~d~31ly illu~trate~ a /ery large pac~et switch compri~iDg N i~putil ~d N outpu~.
FI~. 2 scheDI~ically illu~ts~te~ ~ con~entiollal modulsr 25 i~plcmentation o~ t~e ~witeh of PI~. 1.
PI(;. 3 sche~atiallly ill~rtr~c~ ~ ~odul~r i~ple~nenhtinn of the ~itc~ o~ ~n. 1, ~ ~ord~co ~Nith sn Illus~rati~e en~bodime~t of t~e prGyeDt i~eation.
FIC~. 4 schom~ti~lly illu~srate~ ho~ ~ phlr~ o~ ~ritch modules of t~o typ~ ~o~n in ~I~;. 3 arc inte2co~necsod iD tbseo dime~ion~ to ~or~ the nvitch of 30 ~I~. 3.
Fl~. S chcmatic~lly illu~trate~ nn c~plmsion achqotk ~or use in ~o paclcot 3~ritc~ ml~dlulo~ ol ~. 3 ~d PI~. 4.
E71C~. 6 i~ 8 tablo vlhich summ~lFize~ the propertle~ of tho subnetworh co~prisiD~ ~ Jwitch modulo.
FIG. 7 schemaacaily illuDtratea an ~Iternati~o modular p8ckot tch in ~ccordance ~vith ~ ~Iternative cmbodir~ent of the pre~alt inve~tlon.
FIG. 8 illu~r~te~ by way of fiow cbllrt a ring re3en~atioD
coDten~io~ resoludo~ gorithm (or u~e ia connection wi~ the packet ~itch of FIG. 7 in ::- : , .:: - , ;; ~ ,'~ ' ' , ' . .
.~:

~OOti2Al ~6 accord~co ~vith an illustrative esnbodiol~nt of tbc }n~Yention.
D~t~le~ D~cript~o~
Tho detailed dc~crlption ot tbe invention i~ divided into a ~umber of subsection!l. Sub~ction A deuribe3 ~ cooves~tion~l packet ~wi~ch arcbitccture. Sub~cction 5 B pre~ents an oveniew of a modular packet ~witch architecture according to tbe pre~nt invention. Sub~oction C de~cribe~ in detail tho Ipac~ct ~itch modulo3 of tbe prcsent invention. Subsec~on I) dc~cribe~ contention rosolutioD and output space c~tcn~iou lor the packet switch modules o~ present iDve~tiori.

~..ConYe~iQ~LPac~et SDritch~cb~

Tusllin~ to FIG. 1, a verg lar~e pscket ~witch 10 i~ schemath:aily illustrated. Tho switch 10 comprhe- ~ DOn~ blockiDg, tel( routiDg s~vitch tabric 20. Tho ~ritch fabric 20 ~ay bo 1I Batcher~baDy~ witch fabric.
Paclcets ~Irrivo at ;ho pllc~et ~vitch 10 via tho high ~pced tiber optic input truDh 22. E~ach ot the îiber optic input trunlu 22 i~ connocted to a demul~ipleses 15 23. E~lch deD~Iultiple~er 23 de~ultiplo~¢D the 3tream of packets arTiving o~ the corrc3pondin~ fi~er optic trunk 22 iDto a plnra1ity d pscket s~eam~ on the input lines 24 because the electroniG ~witch fabric 20 operate~ at a slo~Gr ~peed than the optical fiber input tmnks 22.
Illu~tr3t;vely, the s~ritch ~bric 20 has N i~put 25 and N outpu~s 26 20 50 that there arc N inpuS liDe3 2~ IcadiDg to the nvitcb fabric 20. Tllere are al50 N output liDc~ 27 lea~h3 tho ~wltch ~bric. TypicDlly, N i~ on ~c ordor of 10,000.
Tho ~itch fabric 20 ~er~s~ to route each packet arn~ing ~ria aD
input lino 24 to ~ p~rtle~l~ output l~no 27 b~scd oa ~D addro~ co~tained in the packet hcadcr. If tllo ~ abrie 20 i9 ~ B~tcbor bsDy~ nct~ork, tholl the s~i~ch fabric 20 i~
25 3y~1ChrC110118 1~ e pwkot~l sro routzd shrough tho s~itcb ~abric i~ ~me ~lots. The p~et~ le~in~ ~o ~ite~ fabric 20 ~ a liDe~ 27 ~re multipls~ed UsiD~ ~hc mul~plc~crs 28 for tran~isd~D Yilil tho ~igh pccd ~i~or op~e output trunk~ 29.
1~ tho s~vitch fabrlc 20 b a ~at~her bnnyan ~ot~ror~, a mcc~ m ;
(not ~ho~vn ID P~G. 1) rQ~ bo pr~idod to rerol~o connlct~ e~ mor~ t~ oDe pscket is 30 addros~ed to the umo output lino 27 in ~ t~o ~lot. Tho coDtoDtlo~ re-olutioD mech~nism s~ay in~olvo u~io of the Rel:ircullltioa, Rlng Re~etvat~ou, or Three Phase algoritbla~
~ne~tloned ~bovo.
PI~. 2 ~cho~atically illu~te~ a conveDtlonal approach for modul~riziDg tho ~witcb fAbric 20 o~ PIG. 1. I~ }. 2, the ;~risch fQbrio 20 i5 formed 35 fro~ ~ pluraiity of Batc~or~baDyan ~oduls~ 30. Tho module- 30 aro or~aDizcd inso ~tages 31-1 and 31-2.

.. ~ . . : . ; - .
~ . , . .
.. , , . .
: . , . ,: , .

., :. : .:

::, ;:

~OV624~

Each module 30 iwludel~ a Bstcher nchvosk aDd ~ b~y~ Dctworlt.
Illus~ra~vely, each 13atchGr banyaD module 30 h~ 2S6 input~l and 2S6 outputJ. One OUtp11t from each o~ tho module~ 30 iD Sta86 3~ CO~Dl:Cted to eaoh o2 the module~ jD
the ~tsge 312. Thu~, the ~witcb f~bric of E~IC;. 2 provides full co~na:tivity, I.e. a p~cl~et 5 arriviDg at ~ p~rticular i~put ot a modulo in ~t~l~e 31-1 can be routed to any output of aDy module in ~tage 31-2.
Howe~er, the ~witch (abric o2 E71G. 2 hss a nnmber ot ~igDificant disadvantage~. Firstiy, for a pscket to be routod ~ro~ an input of tho ~witch fabric to an DUtpUt, it must pa~s tbrough modules l~od in ~ plur~lity o~ ~tagc~. Thus, ~he i~dividual 10 module~ 30 caD be ~yDchro~izct indopendoDtly only i buffers arc loalted between the 3ta8e~. Secondly, thls architecture of E~I(}. 2 rCqUirG- A complel: interconDcction pattern between the E~atcher banysl~ modulea of t~lVo adjace~t 3t~8es. Thirdly, bccau3~ p~clcet~ aro transmitted from particular inpu~ so pllrticulsr output~ vill ~ plur~lity of ~odulea located in ~ plur~lity of ~tages, if ono modulo ~ , tho eDdro ~itch fabric m~y not opetate lS proporly. A further dl~advaDta~o of tho ~c~ltecture of PXG. 2 i~ tbat conteDtion bet~con packet~ eo~tainisg conflicti~g output Addre~e~ C~D~ot bo mlolvcd indop~lldestly 20r o~ch module, t~ereby malciDg overall co~te~o~ re~olut~o~ ~or t~e ~vitch fabri¢ quite co~plicated. For these roasoDs, tllo modular atshitecture o~ E7IG. 2 ha~ a limit~d thro~ghput and i~ not suit~ble for implomcntatioo of a very largo packet ~itch baving on 20 the ord~r o~ 10,000 iaput~ smd ~0,000 outpu~.

A. ~itch fabtic 20' i~ accord~os wit~ ~n illustrati~te embodime~t of the pre~en~ in~e~tioD i~ sho~u i~ . 3. The n ~ h ~slbric 20' h~ N i~puts 41 a~d N
OUtpUtll ~2. ~G N iDpUt~ 41 aro divided iDto ~ ~ubset~ 47 ot M iDpUtl~ e~ch. Thus, the 25 fir~t ~ub~et 47 ~ illpUt~ 41 i~:lude~ i~p~ 1,...,M alad the ~Ib sub~e~ 47 0~ iupub 41 i~udes ~nput~ 1)+1,....,M~ ~heso M~C~N.
Thc ~itch modulo 20~ compri~e~ a plurslity d module~ 40. Each ~ubses ~7 of M i~pUtll 41 ~orm~ the ~ot of inpu~ ~or o~o of the s~i~ch module~ 40. Thut, cl~ch ~vltcj m~dub 40 h~ M iDpUt~ ch modulo 40 has N outpu~ 43. Thc modulo 30 outputu 43 ~itb tho ~ addre~, ooe ko~ each module 40, ro mul~plo~ed togetherUsiD~ tho multiplaEor~ 44 ~nd fed to tho output 42 bellriD~ that sddte~. It l~ a highly de~irable fe~turo o~ ths pre~lo~t iDlrentio~ t eacb moduls 40 i~ autonomous, ~on-blockiDg, self-rout~Dg p~cket ~ritch ~et~or~.
A t~re6-di~onsionsl implem~nhtion of the atchitecture of FIG. 3 i5 35 sbo~n iD E~IC;. 4. Illus~ad7cly, ~ ~ho~n i~ , tho ~witch fnbric 20' comprise~ a plur~lity of modilles ~0, eacb of whkb il implemen~ed by combiDi~g ~ plurality of ,: , - . , .

,, ,., ' ' ;~O~GZ~

~ubnet~qorl~ in three diDlen!lio~s.
Each modulo 40 of Pl~. 4 compsbe~ ~ BAtcher sortlng ~ubnetwork 51 snd an e~paD~ioll nohivork S2. In etoh ~odule 40, the input~ to the Batcher ~ubnet~rork 51 Yorm the module inputs 41 (~ee ~IG. 3). The Batchcr ~ubnet~orl~ 51 S comprise~ an lrr~y o~ nodes 58 which are orgsnizod into st~gc~ 59.
Tho e~paD~iou nct~vorl~ S2 compri~c~ ~I sct o~ binary trce ~ubnet~vork~ 53 and a sct o- bsDy:~ ~ubnetwor!~l S4. (Such binary banyasl net~vorlca ha~c pret~iou~ly bec~ used for ropllc~ting pa~es~ jD a broadca~t packet ~vitch, ~oe c.g., T.T.
Lee, U.5. Patcnt 4,813,038 illsued March 14, 1989). Illu~trati ely, the binasy tree 10 subnetq/ork~ 53 comprise t~e nodes 60 aD d llre ~taclced verticaJly. Tho banya sub~etwor~ S4 compsise tho node3 61 uld are !~taekod horizo~tally. ~ detailcd dl~cuo~iou ol the operstio~ ol the bi~sry tree ~ubnehsosl~s 53 ~d ba~y~ subDetlYork~ S4 i3 pre~entcd belo~ aloDg ~ith a discu~io~ a~ to ho~ the ~ctwor~s 53, 54 ~re interco~nccted to forr~ a~ o~pan~ioo rlet7vork 52. In PI(~. 4, tho multiple~erl~ 44 (3ee PIB.3) ase 15 arnmgcd in set~ of ~our, ~hieh set~ of four multiplo~er~ are stscked rertical~y. In osch module 4û, tho outpu~ olt the b~yan 3ubDetwork~ 54 form tbe ~odule outputs 43 (ace FI~. 3). A.s illdicatcd ~hGYe, thc output~ ~3 /ith the same address, one from each switch ~odul~ 4û, are multiple~ed togGtllcr by me~3 of ~ pleser 44 alld eo~nected ~o the corre~pondin~ outpu~ 42.
Tlle ~odule~ 40 of FIG. 3 ~nd 4 opcrate as follo~s. A~ ~how~ iD
FIG. 3, a set of N p~cl~et arti~ing ~t the i~pul~ 41 in a particular time slot i8 partitiollcd into K sub~et~ o~ M packet~ o~c~ 50 t~at each mod~l6 40 recei~/os a sub~let d up to M
pacl~et~ at its input-. The sub~et of pae~ot~ arri~iu~ ~t the M iDpUt?~ of a module 40 ~re sorted by ~he Bstc~cr ~ub~et~vorlc 51. I~o sorted sub~et i~l theII partitioned again into 5ub-gob~et- by tl~o b~n~ry tre4 3ubDets~orlu S3. Ill cach module 40, tho orderod p,ackct3 of the~o s~b-~ et~ ~o rout~d concurrently to their dosti~tlon~ by the banyul ~ub ~etwor~3 54. F~.llr, the maltiplacer~ 44 collect the p~cke~ from tho baDyaD output~ 43 and toute th~ to t~l0 p~c}~t ~Iwitcll OUtpUtl~ 42.
Thc prim~ ds~tll3es o~ the ~odular arebitechlre of thc preson~
30 im~eDdon C~D be undo~s~d ~itl~ re~ere~sce to ~I~. 3 ~Dd 4. Fisst it should be noted ~hat the module~ 40 ar~ Dot iD~erco~neetcd witlb c~ch otbeY. It is o~ly 2~tcr proce~d~g by the modulet 40, that pllclceta lellviug the n~odulea 40 ~ro multiple~ed to~etber. 1~ other word~, the sDoduler 40 operate independe~tly ol eacb othor. Thi~ meaD~ thst each modulo 40 can bo synchro~izedl indepondently. It ia oDly n~coul~y to ~y~cbror~izs with csch otber 35 tho M packet~l which propa~ate through a ~odule 40 in a gi~Gn ~i~ne 910t ~8thor thaD all N
paclsct3 ~hich r~ay ~ o at ail th0 modulea in d gi~/en time 310t. Thu~, the in~entive paclcet switch arciliteeture of FI~. 3 an~ 4 h~ ~ sigDiflca~t ~d~lDtago ol~er thc conqention~l mot~ll r arcbitccture oî Fl~. 2 e~peci~llly whe~ there i~ on tho ordor of .,~ , , ' .. ; . .. . . .

200~Z41 N~10,000 i~put~ Slnco cach module 40 i~ relati~cly ~mall, synchroDlzlll;oD is relati~ely ~raighdorward ID addido~, jD contra~t to the co~cutior~al modul r archiiçcture o~
FIG 2, no comple~ psttcrD o~ interconncctionll e~i~ts betwetD groupll o~ ModulesA furthor ~dva~tage ot thc inventivc modul~r ~rchltecture ot FIG 3 a~d 4 is thst a p~cl~et which i~ routed ~ro~ a particular packot ~viteh i~put 41 to a p rticular paclcot !Iwitcb output 42, oDly putell tbrougb one switch module 40 Tbus, if ono s~vitcb module 40 ~ail~ to opcr~te, the roDlainder o~ the modules 40 ~ill cootinue to operatc properly Fault toicrance cul ~e acco~pli~hed by pro~iding ~ spare modulc r~ot a 10 duplic~tion of the e~tirc s~vitGh 1~1 contrast, becsu~e tho coD~c~tional modular packet 3~vitch architechlre compri~e~ A co~plc~c srr~emeDt o~ interco~nected 8atchcr bany~
modules, i~ o~e module fail~, thc ~hole !l~vitch may fui to operate proporly Another atVaDtage o~ the i~vontive packet ~witch arcbit&cture i~ that conte~tion resolution can bo carried out for cach raodule 40 ~cparatsly I~ additioD, a~
15 i~dicated abo~,rc, becauso the Dlodules 40 operate i~depead~ntly o~ each other, thc modulo~
40 c~n be ~pread ovor a rural e~chAn~e ~ren to reduce ~ICCe~13 co~t~

A nui~h ~odulc 40 (3ee PIG 3 al~d 4) includi~g a Batcher 20 ~uhlet~ork 51 and a~ espan~io~ not~ror~ 52 l~ sho7vn il~ ~oro detail i~ PIG 5 An e~pamlioD notYvorlc i~ ~ net~vork with ~ore ou~put~ ~n inputs 1~1 ge~eral, all n-~t~ge a~p~sioD notwor3c ~ 2n' inpuh and N-2" o~put~ i~ a combin~tlo~ of 2 sct of M bimlry tree wb~c~rork3, and a ~et of ~8N/P~l b~yu~ ot~or1~9. nlu~tra~ely, Y3~ DI Pla~. 5~ tho module 40 25 cO~Dpri~al ~n e~p~ioll ~e~ork 52 havlD~ n-4 ~t~e~ (laboled 1,2,3,~ h FIG 5) Thc Det~orlc S2 o~ S hu ~-2n'=22~4 i~put~ beled 1,2,3,4 ~n PIG 5) and N~2~2~16 OU~ belled 1, ~16 i~ 5) Thoro ~ro M=4 bi~ry trse net~orl~
53 ~nd X=NtM=16~4'4 ballyaD ~etwork~ S4 It ~hould b~ noted th~t th~ output~ 1,2,3,4 ot sho Batcher 30 ~ubnet~ork 51 of thc ~witcb modub ~0 of PIG S are oon~ec~od to th~ lnputs 1,2,3,4 of tho e~p~io~ ~et~vorlc S2 in a porfoct sbufno pattera Thl~ h topologically equi~alent to po~ect shumo p~ rD of coDnçction~ at ~he input~ to tbc b~y~ ~ubnGtworlcl~ 54 Io aD expa~ioD net~orl~ 52 tor U30 in ~ /itch module 40, eacb biaary ~ec i~ a 1~1~ nctwork ~nd has k-lo~K=logN logM=n-m stage~ Thu~, in FIG 5,35 oa~h ot tho bin~ry troe network~ 53 has one input and IC-4 output~ ch biDary tree networlt 53 o~ S ha~ l~slogK=lo~N lo~l~l=log16-log4-m-n~4-2 ~ta~e~

~ . , . ~ . : , .
: . :
:. ., ~, . :- . --- :

. . .
' 20~Z4~
- 10 ' Slmllarly in an e~pan~loD nctworit ~or u3e in a modulo 40 each banyan 3ubDetwork is an M~M nctwork haYln~ m stage~. Thu~ in E7I~r. 5 each banyan subnetwor~ 54 i~ a 434 net~vorl~ havin~ t~vo ~tage~.
E~ety Dode 60 in a bin~y trGe ~ubnet~vork 53 i8 a 1~2 switch 5 element capable o~ performing a binary routiog 41~orithm ba~ed on an ~-bit destinadGD
sddre~3 in the header ot a paclcet to be routecl. Th~t ia a nodc at ~tagc j (a~ lab~led in E~IGr. 5) send~ ~n arri~ing paclcot out on the uppcr output liDlc (liDk 0) or the lowor output link (liDl~ 1) acco~diD~ to tho J'h bit o~ tha packct header.
Simil~rly every ~od~ 61 ;D a bulyan network S4 i3 a 2~2 3~;tch 10 elemeDt which pcrform~ the ~am~ binary rou~ng algorithna. ThUJ, a pacltet arriving on one of tho two iDpUt lin~ o~ a nod~ 61 In ths stago j (a3 labelod in PIG. S) i~ routed to the upper ou~put liDk (0) or the lolvor output link (lini 1) according to tbo ~' bit of an n-bit pwltet header.
The cross-intorco~ncction of M binary tree ~ubDetw~rkY and ~
15 b4nyal~ ~ubnctwork~ ~o for~ asl e~paDsior~ ~ct~lvork i- ~ow coD3idored. The outpnt~ of a binary tree ~ubnet~or~ cau bo labeled by t~o binAry ~umbers (~ y) = (xl....x""
yl~.y~ ) where xl...~ the top down numbcring o~ thç binary tree ~ub-network alldyl....y~ is the local addrcsa of each ou~put ~ithin its biDary trce. The binary trcl: output~
sre labeled vvith tbe appropriate bin~ nu~ber3 in E~IG. 5. Similarly the input~ o~ the 20 bs~y~ subnetworlss can al!w be identifiod by two bi~ary number~ (a b) (el....a~_m.~
bl....bq~ where ~ n-~ the top do~n I~UIllbOriD~ 0~ the bany~ ~ubnetwork3 and bl...bm i~ the local ~ddress d t~e input ~vithiD it~ banyall ~ubnetworl~ e banysn iDpUt~l are labded with tho ~ppropri~te bin~ u~nbon iR PIG~. 5. Iu aD e~pal~io~ net~vorl~ a~
output r = (~ y) of ~ b~sy trc~ networ~ d An input q = (a b) of a ba~yan networi; are 5 co~ cd if (~: y) ~ (b,~ us, fot ~ample thG biD~Iry tree o~tput 3119 of FIG. 5 i~
ctod to ~e bu~yan input 10,01. l~i~ iDterco~cctior~ patterD is e~laily reali~ed in three di~cn~ion~ ~ ~ho~D in FIG. 3 lvhereiQ tlle bi~y ~ree subnet~vorl~3 53 are stacked Yertiallly a~d the bamy~n subnetwork3 S4 ~e ~tacked hori~ontally.
Tho po~ibilJty of iDtarnall CDIligion~ iD tho above-dcscribcd 30 o~pa~l~ion network i~ now con~idcred. AD iDtCrlll~l colli~ion occur- i~ a r~et~ork ~hen a ~ode atte~pts to route two packet~ ovor the ~amo interD~il liok at tho amo timo. The binary trec net~osl~ are formcd fro~ 1~2 s~vitchiDg olement~ ~hich allow for only one i~put p~ket at ~Dy in~ta~t o~ time. Thusl, paC]Cet~ will Del~er collido i~ the binary tree nct~ork~ but l~ gcneral iDterll~ io~ m~y o~ur in the ~ub~eque~t bs~ya~
35 ~ubnet~7ork~.
A~ lDdiGatod ~boYo if puckot~ with di~ti~ct dc~tination addr~sc3 arri~e at the i~pnt~ of a bany~ D0tworlc i~ a particular time 310t arranged in a~cendiDg or de~ceDding order accotdiDg to de~ atio~ sddre3s thaJ the banyan network i3 intcr~ally ~: :

~o~x~

non biockiDg. In c3ch p~cl~c~ ~witch module 40 o~ 3. 3, 4, and S, a Batcher ~ubne~ork 51 ~ort~ the sub~et ot packGt- Incid~nt on that ~odule in ~ partlcul~r timc ~lot accordlng to dc~tlnation addre~s. The cro~l i.nterconoection patt¢rD bet~eon the bJnary tree and bany~ subDetvvor~s described abovo insuret that packet~ e at ~ho banyan5 3UbDetWork3 54 o7 a s~itch rnodulc 40 ordered accordin2 to dcstinatioo sdd~cul ~o that no . i~ternal collisioD~ take pl~ the bsnya~ ~ub~lehvorl~.
Tbe Don-blocki~ property o~ thc e~pan3ion ~ctworl~ 52 (3ec FIG. 4 and ~IG. 5) may bc ~tate~ ~other way. U thc sct o~ dc~ti~ation addre~ses of input paeket~ to an c~p~Dsion notwork i9 monoto30 Imd soncentntod, tSeD ~o i~ e~ery sub~et of 10 input packets to each ba~yllD subnetwor~ of tho e~p&Dsion network. CoDaequently, a non-blocl~i~g, ~elf routiDg packet ~witch with mors output~ thaD irlput~ msy bo formed by co~bini~ a Elatcbor sortiD~ netwotlc aDd ~D e~pansioD fiet~vork in the m~nner dcscribed ~bovc. In a p~lrticular embodiment of the in~eDtioD, a switch module S2 m~y i~cludo a cDncentr~tor nGtwork (~ot !~ho~D) loccted behvee~ t~e Batcker 1ortiDg ~ubDetwork 51 and 15 tho e~pansion network 52.
Fl~. 6 suD~m~rizc~ tlle propcrtle- of a s~vitch modulo tor~i~g part of as N~N pnclcet ~witch. In par~culsr, the packet ~itch comprise~ K ~witch motule~, ~vitb cach Dwdule ha~i~ MsN/lC inpu~ and N outputs. A~ indic~ted aboYe, such a ~witch ~odule coDlpri~es a Batchcr ~ubnetwork, a set of binary tree l~ubDet~rork~ and a ~et 20 of b~y~n subactwork~. FIG. 6 iadicatl:~ th~ nu~abcr o~ each ~ypo o~ Dctwork in the swiich ~odule, the di~en~ioD~ of eac~ nets~or~ in th6 se~itch r~dul~, the numbor oî
sS~ge~ of each Dct~orlc ;D tbe ~itcb module, and thc nu3ber of nodo~ of each nc~worlc in tbe ~svitch ~odale.
~ co~pari~losl to aD NxN p~l~et ~itc~ implc~eDtcd u~ing an N~N
25 ~atchcr ~et~ror~ a~d ~ N~N bllDyu~ ne~vo~, the mo~ular architcc~ure cut~ do~vn the cor~plo~ity o~ tho Bdtcher ~et~orlc. HOWGYOr, the tot~l nU~bOr of uodc~ incTease~
(i.e. she ~ o~ module~ crea~e~. Thelle eYtra nodos ~re not simply ~Yerhead.
~ oteul9 1YlO~IUIKi~ proves both t~roughput and porfor~n~co. Tbi- i~ becau~e there sre fc~er input p~c~etll competiDg for output iD oach Jnodulo.
Thcre a~ç t~o ~peci~l cuell ~orth ~endoning. Who~ 1 (i.e. a situation ~vhere the ~ comprises oDly o~e ~odulo), th~ ~svitch architocture de~cribed abo~re roduces to a~l ordi~l~ry Bll~cher b~y~ switch I~ additio~ hen K~N, tho s~itch arc~itecturo reduw~ to the woll-~own IcDocltout 3~YitC~ (~ec o.g., Y.S. Yeb et ~1 ~Tho Knoclcout Switcb: A Simplo Modular Architoceuro for Hlgh Performsllce PackcS
3S S~vitching", I~EE Journal on S~lected Atea~ iA Coff~l~lca~lo~l~, Vol. SAC~5, No. 8, October 1987, pp. 1274~1283). I~ the l~uoclcout ~ritc~, each i~put i~l conn~cted to eYery output by a btoadlc~ss bu~. A bus illterf~ t eaGb output pro~ride~ pwkes filtcss for allowi~ p~ckcts addro~ed to thst ouSput to pa~ ~d for bloclcI~g ~11 o~er~. If e2ch , , . ~

. ~ ~. . . .
; . , ; : , .
, ; . , .
-. ~ ~ . .
.~., . , . ~ , . . ..
. . .

l0~i~41 module of tbe pac~ot Jwitcb of tbe prelle~t ~witch bu M~1 ioputa, thea the ~witch ot the pre~ellt invention i5 OqUiYalalt to a k~ocl~out ~lvitch ~ s indicated above, it is a signiflcaDt ad~/antage of the s~ritch arcilitecturo of the present invention, that the iDdiYiduai pac~et 3witch modules are 5 synchro~ized iDdepeDdently Thi- can be under~tood more clearly by lookiDg at the path~
through the ~witch modules In particular, eacb bir,~y tree subnet vork 33 carries at mo~t ono packet durln~ ~ time ~lot Tberofore ~ynehro~zation is unnece~ary ~or the biDary tree ~ehvor~s Th~ multiplesers 44 (~ee FI~. 3 a~d 4), oporating nsynchroDou~ly, are able to collcct pacltets comin~ ~rom difiere~t ba~ya~ ~ubnet~vork- 53 at different instants 10 of time Thu~, esch bulyan ~ubnetwor~ 53 ean be ~ynchroDized i~dependently wi~hout global synchroDizatioD The-e are at most M packet~l to bo ~ynchranized over tbe logM(logM+1~/2 Ytage~ in each Batcher networl~ a~d tbe logM ~tage- o~ each banysn network For tbe foregoiDg rea~lsD~ it i5 ch:ar that ouly loeal clock~ are needed a~d that no globd ~ynchronization of the ~witch module~ required iD accordance with tho prese~t 15 inveDSion This ia au importa~t adY~tage of the packet 3~itch architGcture of tbe preseut invention and enables the packet ~itch architecturo of the pre~ent i~Yentiou to be u~ed to i~plement v~ry large pacltet ~ /itche~ h~ving on the order of N=10,000 iDpUts and output~. . .

A~ indicatod ~bovo, an inherent proble~ iD the de~ign of paclcet ~witches i~ output eoDfllCts (i e e~tornal collbioD~) which occur whe~ n~ultiple pac!set~
simulianeously requs~t the ~ame outpnt pott. ~ proceduro for arbitra~dqg among iDpUt packsts bllvi~g t~e l~e de~ atioll is c~llcd a conteDtioll re~olution algorit~DI Thrce coDtcD~on N~olutloQ ~Igorith~a de~cloped iD coDL~cSion with coDYontioQal Batchc~-25 1b~yan s~itcho~ hs~o boen idendfied abo~e These are the Recirculation ~Igorithm, the Thrce-Phn~o slgor1th~ cDd tb~ Ri~g E3eaenatio~ orit~m. All three of these algorithm~
rendD Y~llid ~or tho indi~idual switch module~ ot the packet switch of t~e pre~e~t In~el~tioD.
In a packet s~itch, o~terD~I colli~ioD probabilitio~ caD be 30 ~ignifica~gly redueed by utili2iDg ~m output addrou e~eeDaion techniquo t3ee e g, Ne~v~aD, P, A Paat Packet Switch fos the Illto~rated Sbrvlcos Backbo~e Net~vork, IEEE .
Jo~rn~l on Seloctod Araa~ Commu~kstlo~w, Vol. 6, No. 9, Dece~nber 1988, pp 14681479; Wu, L T, Asthurs, ~ d Sl~ lcie, W D, A Paclce~ Net~vork tor BISDN
Applica~on~, Proc of 198~ Zs~rlc/l S~mlR~Ir on Dlgl~al Com Rull., Zurich, 1988, pp 191 35 197; Pattavius, A., Muitichall~el BandYvidth AllocatioD iD a Broadband Pacl~et Switch, IEEE lolur~l OA S~lcc~od ~Irea~ in Comml~nlcatioA3, Vol. 6, No. 9, Dcco~bor 1988, pp , ... . ... . .
~. .. . - ., ,. :, :

ZOOG24~L
13 .

1489-1499). I~ ~uch a ~ituation, the individual switch modules u~cd to torm an N~N
pAcl~et ~witch are wt M~N module3 ~ de~cribed abovc but arc iD~tead MxNP modulcs.
Such a ~odule h~s oDly N di~tinct output addro~e~, but ~t each of tb~ N dddre~c~ thcrc arc P outlet~ is~ ~llow up to P paclce~ to bc ~witched concllrrently to th~ sa~ne oDe o~
S the N output addres~c-. In thi~ c~c, the c~pan~ion network u~cd ;D tho module~ ~a~
logPN-logP+logN ~tagc~. Thu~, logP~logN addres~ bits sre rcquired to pcrform thesel~ routing algorithm in tho e~p~sion Det~vor~;. Ihero~ore, a logP=p bit group i~d¢r i5 appc~ded to the logN~bit dc3ti~atio~ addresll ~oDtuned in the pacl~et he~der~. The group IDdex enablal a pacl~ot to be routcd to a spccific output in a gro1lp of output~.
10A~ ~Igoritbm which ~ccompliYhcs both co~t~utio~ resolution aDd output addre3l1 estenSioD i~ de~cribed below iD connecdon with PIG. 7. PIG. 7 illwtrate~
a packet ~witch 100 which utilize~ output addrou e~ten~ior~ a~d ~ rçcirculatio~ ~Igorithm for ContentiOD rosolutio~. The switch 100 of FI(~. 7 h~ N iDpub 102 a~d N outpub 104.
The N input~ 102 aro di~idod iDto K set~ 103 ot M iDpUt eaGh. ThU~I the fir~t set 103 o2 15 iDputs 102 includc~ t~e i~lpUt~ 1...M and the lr'~ set 103 of i~put~ 102 ;DCIUdC8 thC iDpUt~
M(K-l)+ 1...MK.
Each ~et of M iDpUh 102 i5 a~lsOCidted with a pac~et 3witch module 106. As i~dicated in FIG. 7, thero are ~ !Iwitcb ~odules 101S. Illu~tmti~rely, e~ch module 106 compri~e~ a Batcher nctworlc ~d an e~p~sion nçtwork. I~ach modulG 106 ha~ M
20 input~ 1û8 which ~e u30ciated ~ith oDe of tho K set~ 103 of M iDpUtJ 102 dcscribed above. Each module 106 h~ NP output~ 110. More particulnsly, each module 106 has N
di~ti~ct output addreuea l...N aDd P output~ 110 ~ociated with osch output addre~.
The output~ 110 ~it~ ~ho ~llme addreu fro~ ellch ~odub 106 aro Dlultiple~ed togethçr by tho ~ultiplesers 112 a~d fed ec ~:he pac~et s~itch output 104 beari~g that address.
25AD illpUt port 114 i~ suociatod ~itb each o~ tho inputs 108 o~ tbe s~it~h mod~le 106. Dlu~tr~ dy, each ~pYt port 114 il~cl~lde~ a que~e (~ot ~bown) of pa~!cet~ ~UtiDg ~or ~ervicc by th~t psrticular inpue. The input port~ 114 a~wciated witk h modulo 106 sre coDnectod i~to a ring 116. Eacb riDg 116 includel~ a tolcen genentor (T~) 118. The tolce~ go~erator 11~ OD o~ch riDg 116 i~sues a tokon at the 30 begiDn}n~ of oach ti~~ ot. Each toiten i3 p~cd ~Iro~Jnd the ~ppropri~tc ri~ ~equonti~lly from input por~ to input po~t. The tolceDs a~e u~ed by tho input po~ 114 to reson~e output~ of th~ corrcspondi~g switch module 106 ~or tho pac~ott at tho head of thc a3sociated queue~.
The tolton~ ~re u~ed iD coojuoction with ~ Tin8 rosorvation ~llgorit~n 35 e/hich is csecutod at tbe input porb 11~ to re~erve output~ ~r particular pac~ets. The RiDg Re~ervatioD algorlthm nuly be undes~tood ill coDjunstlon with tho flo~r chart of FIG.
8 ~hicb shows ho~ tt~i~ dgorithm ii esecsted st the input port~.
~s show~ ir FIG. 8, 1I pacl~et header 200 lncludes an n-bit -.. ~ . .. . ~ - . . . . .

: . .: :: . :~ ,' :

~on~4l.

de~t~nation addrou D. The desti~ation addre~s D indicatc~ a group ot P output~. Thc pac~et heades also Includes a priority field S. ~ p bit uoup iDdes Is dote~miu~d ~y t~c riDg ro~les~alion algorithm discu!ls~d belo~ so th~t thc packct is toutedl to a 3peci~ic one of tho outputs ha~iD~ tho desti~ation D. A toke~l 202 al30 compri3ct t~o lielda. A group S i~de3 field t~....GI ~d n priosity field T. A~ tbe token arri~/e~ 3t a particular input port, each ~ubfield GJ iDdicstes the nu~ber of iDpUts ~vhlcb proviou31y rcserved the ~et of du~puts j. The reserYatio~ cycle ia di~rided into Q subcycle~, ~rhere Q is thc numbcr of pscket priorit3r cl~ses. DuriDg e~ch r~servatiloD cyclo ~ tolcCD will oirculAtc arouDd its ring ~ro~ i~put port to iDpUt port Q timcs. Dulring tbe ~Ih subcyclo, oDly packet~ in thc i'h 10 priority cla~ cn~ m~o a resenlltion for aD output.
Suppose the n-bit dostiDation addros~ in a pacl~et hcater at a p~ticular input port is D~ (d"....dl)2aS (h)lo, ~horeiD d"....dl i~ the bi~ary ropresc~tation of ~ho n-bit destiaation addro7a D and h is the dccimnl repre3ent3t}0D o~ the addrcsa D
~bo~ 204, ~ . 8). Suppose nlso t~at the ~ubtield ~;~, of th~ toke~ n~ay be repro~entcd a~
15 Gh=,~p+l 8p...~1 (bos 2û4, FIG. 8). U the priority cl~ S of the plwkot headcr is cquai to tho priiority ClaSD T of thc tokcn, and, ii~ the subfleld Gh=85~+l ~p...81 i~ les~ ~an P =S 2P, i~Ddica~ that there i~ unre~er~ed output eapaciq at the output addrea~ D = (f~)lO
(Bos 206, FI(i~. 8), then the lea3t signi~i~ant p bit~ of Gh are attached to the dc~t;natios add~es~ ~"....dl to for~ the routing ~Iddres~ d,...d" 8p - -81 îor ~ vinDinll packet and 20 tho ~ubficld ~h in the token i3 iDcremeDted by 1 ~BoY 207, ~1~;. 8). Tho ~oken i~ theD
paued to thc ~c~t i~put (E~o~ 208, PIS;. 8).
OD eho other hand, if the priority clag3 T of the tol~eD a~d the priority clas~ S o~ the paclcet are Dot cqud or i~ Ihe ~ubfield 0~ equal to P (i.e. all P . ~
ou~put~ for t~le output addrou D-(h)lo haY~ alroady bee~l rcse~ed3 theD the token is ~ i 25 3iDl~`ply p~scd ts~ tho nest input port (Bos 2û8, FIS3. 8). Whell ~uch ~ failed re~en/atioll :
pt takes p]a~, tho losillg pw~et h&3~ to lYdt ~or ~e rlest re3er~ion cyclc, possibly hi~her priori~ claos by DlodlfioAtioD o~ priority ~ield. Whe~ the token retur to it~ point after circ~lating ~rou~d a riD~ I particular ro~enction subcycle, the tolces ge~es&to~ uill iDcsea~c the priority field of the tokoD by 1 to co~nmcnce thc nc~t 30 !lub~yclo.
l~lustrstil~ely, p~c~et~ ~innia8 the contention rcsolutioD proce~s which talce~ placc durin~ a psrticubr dme ~llot sro traD~mittcd throu~b tho swi~ch module to the appsopri~te output~ duri~g tbe ue~t 3ubseque~t ti~e !llot.
Tko ril~ reJcnation ~Igorithm de~cribed abo~o i~ ection with 35 FIG. 8 i~ partlcularly sdY~tagcous for use in with a ~nodular svvitch ~rchitecture 3uch as tbat sho~D in FIG. 8 becau~c the ril~g roYerlratio~ orithm may be carried out indeponde~tly ~or each ~witcb module.

;:- .. : . .... : , .. - . . . .
.: :: ~ :. - . :
.: :: : , :: :
: : :

In hort ~ ~odular arcbitecturc for ~ very l~r~c pscket sv~itch h~
been di~clo~ed. In accordance with thc prescllt invcDtiou, switch modules, thc ~uildi~
blocks of the paclcet swit h, are themjel~re~ dependeDtly operated pwket switche~ of S relatively small size. E~ch module compri~e~ a Batcher lortiDg ~ubDctworlt, a set of binary tree subDetworlu, ~Dd a ~et o~ banyan sub~ehvorks. Becauu each of thc switch module3 oporates i~depe~de~tly, operation aDd mainteDance of the ~hole switch i~signiflc~ntly simpllfiod. For buildi~g ~ery largo packet ~itches, it b particular advantage of the i~rentive architecturo that each ot the switch module~ caD be 10 synchroDi~ed indCpeDdoDtly.
Flndly, tho 3bo~e described cmbodi~cDt~ o~ tho in~rc~tion aro inte3ded to be ill~tra~e only. Numerous alter~ative embodimcnt~ msy bo de~ised by those slcilled in the art without departing from tho spirit ~Dd ~copo o~ ~he pro~cnt DVC~tIOn IIS ~ct forth iD the ~ollowi~ cl~im~.
- ~' . .

,: : ,

Claims (32)

1. A packet switch comprising N inputs, N outputs, K (1<K<N) switch modules each of which has M=N/K inputs and at least N outputs, and N multiplexers, each of said multiplexers multiplexing corresponding outputs of said switch modules with one of said N outputs of said packet switch.
2. The packet switch of claim 1 wherein each of said modules comprises a sorting subnetwork and a routing network having more outputs than inputs.
3. The packet switch of claim 2 wherein said routing network is an expansion network comprising a set of binary tree subnetworks and a set of banyan subnetworks.
4. The packet switch of claim 3 wherein said sorting subnetwork is a Batcher subnetwork.
5. The packet of claim 4 wherein said Batcher subnetwork and said binary tree subnetworks are interconnected using a perfect shuffle interconnection pattern.
6. The packet switch of claim 1 wherein each of said switch modules includes NP outputs (P>1) and wherein each of said multiplexers multiplexes corresponding sets of P outputs of said switch modules with one of said N outputs of said packet switch.
7. The packet switch of claim 1 wherein each of said switch modules is associated with a ring located at the inputs thereof for resolving conflicts among packets containing the same destination address.
8. The packet switch of claim 7 wherein each of said rings includes means for generating a token.
9. The packet switch of claim 1 wherein each of said modules is synchronized independently.
10. The packet switch of claim 1 wherein each of said multiplexers operates asynchronously.
11. A packet switch comprising a plurality of inputs, a plurality of outputs, and a plurality of independent switch modules arranged in parallel and each located between certain of said inputs and certain of said outputs, each of said switch modules comprising a sorting subnetwork and an expansion network including a set of binary tree subnetworks and a set of banyan subnetworks.
12. The packet switch of claim 11 wherein said packet switch has N
inputs, wherein said packet switch has N outputs, and wherein said packet switch includes K(1<K<N) modules each of which has M=N/K inputs and at least N outputs.
13. The packet switch of claim 12 wherein each of said modules includes N sets of P outputs each.
14. The packet switch of claim 11 wherein each of said module is synchronized independently.
15. The packet switch of claim 11 wherein each of said switch modules further includes means for resolving conflicts between packets having the same destination address.
16. The network of claim 11 wherein said packet switch further includes a plurality of multiplexers, each of said multiplexers multiplexing at least one banyan output from each of said modules with one of said plurality of outputs of said switch.
17. A packet switch comprising a plurality of outputs, a plurality of inputs for receiving packets to be routed by said switch to particular ones of said outputs, a plurality of independently synchronized packet switch modules located between said inputs and said outputs, each of said modules being able to provide a path from any of a subset of said inputs to any of said outputs, so that a packet to be routed between a particular input and particular output is transmitted through only one module.
18. The packet switch of claim 17 wherein each of said modules comprises a sorting subnetwork, a set of binary tree subnetworks connected to said sorting networks and a set of banyan subnetwork connected to said binary tree subnetworks.
19. The packet switch of claim 17 wherein said switch further comprises a set of multiplexers, each of said multiplexers multiplexing at least one output from each of said module with one of said outputs of said packet switch.
20. The packet switch of claim 17 wherein said packet switch has N
inputs and N outputs, wherein said switch comprises K(1<K<N) modules, and wherein each of said modules has M=N/K inputs and at least N outputs.
21. The packet switch of claim 20 wherein each of said modules has N
sets of P outputs each.
22. The packet switch of claim 17 wherein each of said modules includes means for resolving conflicts between packets having the same destination.
23. A circuit for switching packets comprising a sorting network for sorting a set of input packets, a set of binary tree networks for receiving packets sorted by said sorting network, and a set of banyan networks connected to said binary tree networks.
24. The circuit of claim 23 wherein said sorting network and said binary tree networks have a perfect shuffle interconnection pattern.
25. The circuit of claim 23 wherein said binary tree networks are stacked in a first direction and said banyan networks are stacked in a second direction orthogonal to said first direction.
26. The circuit 23 wherein said circuit further comprises means for resolving conflicts between packets having the same destination address.
27. The circuit of claim 26 wherein said means comprises a ring including a token generator associated with said sorting network.
28. A method for enabling packets to reserve outputs of a packet switching device having a plurality of inputs connected in a ring and a plurality of sets of outputs, said method comprising the steps of:
during a contention resolution cycle, circulating a token around said ring sequentially from one input to the next, at at least some of said inputs, using a number contained in the token to determine if the number of inputs which previously reserved a set of outputs to which a packet is addressed is less than or equal to the number of outputs in the set, if the number of inputs which previously reserved said set of outputs is equal to the number of outputs in said set, passing said token to the next input on said ring, if the number of inputs which previously reserved said set of outputs is less than the number of outputs in said set, reserving said set of outputs for said packet by adding an index to address information contained in said packet so that said packet is routed through said packet switching device to one of the outputs in said set, incrementing said number contained in said token by one, and passing said token to the next input in said ring.
29. The method of claim 28 wherein all of said sets of outputs have the same number of outputs.
30. the method of claim 28 wherein said index comprises at least a subset of the bits forming said number in said token.
31. The method of claim 28 wherein said contention resolution cycle compr ises a plurality of subcycles, wherein during each of said subcycles said token is circulated sequentially around said ring from one input to the next, and wherein during each subcycle only packets of a particular priority class can reserve one of said output sets.
32. The method of claim 28 wherein said using stop further includes comparing a priority class of said packet with a priority field of said token, said packet reserving said set of outputs when said priority class of said packet equal said priority field of said token and when the number of input which previously reserved said output set is less than the number of outputs in said set.
CA 2006241 1989-05-02 1989-12-20 Modular architecture for very large packet switch Abandoned CA2006241A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34598389A 1989-05-02 1989-05-02
US07/345,983 1989-05-02

Publications (1)

Publication Number Publication Date
CA2006241A1 true CA2006241A1 (en) 1990-11-02

Family

ID=23357421

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2006241 Abandoned CA2006241A1 (en) 1989-05-02 1989-12-20 Modular architecture for very large packet switch

Country Status (2)

Country Link
CA (1) CA2006241A1 (en)
WO (1) WO1990013957A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5377180A (en) * 1991-12-23 1994-12-27 U.S. Philips Corporation Data switching device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4761780A (en) * 1986-12-22 1988-08-02 Bell Communications Research, Inc. Enhanced efficiency Batcher-Banyan packet switch
JPH0683261B2 (en) * 1987-05-26 1994-10-19 富士通株式会社 Header-driven packet switch
EP0312628B1 (en) * 1987-10-20 1993-12-29 International Business Machines Corporation High-speed modular switching apparatus for circuit and packet switched traffic

Also Published As

Publication number Publication date
WO1990013957A1 (en) 1990-11-15

Similar Documents

Publication Publication Date Title
EP0997050B1 (en) Method and apparatus for multipoint-to-point transmission in an atm network
US5892932A (en) Reprogrammable switching apparatus and method
US5157654A (en) Technique for resolving output port contention in a high speed packet switch
EP0735726A2 (en) A packet-based telecommunications network
Chao et al. Design and analysis of a large-scale multicast output buffered ATM switch
AU641735B2 (en) ATM switching network
WO1997019407A9 (en) A method and apparatus for switching
US5317561A (en) Method for switching ATM message cell streams having a high bit rate via a switching equipment having a lower bit rate
EP0375249A3 (en) Self routing packet switching network architecture
CA2074346A1 (en) Telecommunication system
US5148428A (en) Modular architecture for very large packet switch
US5712854A (en) Method of routing cells in an asynchronous time-division multiplex switching network and corresponding network input switch and application
JPH10271110A (en) Switching device for asynchronous transfer mode exchange system and its method
CA2048198A1 (en) Growable switch
CA1324205C (en) Sorting unit for a switching node comprising a plurality of digital switching matrix networks for fast, asynchronous packet switching networks
US5065394A (en) Packet routing switch
CA2006241A1 (en) Modular architecture for very large packet switch
US6563819B1 (en) Augmented ring-banyan network and method for controlling routing therein
WO2000004741A1 (en) A reliable and robust atm-switch
Chen A survey of multistage interconnection networks in fast packet switches
WO1993006675A1 (en) Nonblocking multicast fast packet/circuit switching networks
Wei et al. On the multiple shared memory module approach to ATM switching
US5365522A (en) Switching network
Chao et al. A large-scale multicast output buffered ATM switch
EP1161033B1 (en) Network node for ATM transmission system

Legal Events

Date Code Title Description
EEER Examination request
FZDE Dead