WO2002054269A1 - A scalable data processing system - Google Patents

A scalable data processing system

Info

Publication number
WO2002054269A1
Authority
WO
WIPO (PCT)
Prior art keywords
data processing
optical
processing system
network
scalable
Prior art date
Application number
PCT/AU2001/001654
Other languages
French (fr)
Inventor
Fergus O'Brien
Mark Adam Summerfield
Rodney Stuart Tucker
Original Assignee
Royal Melbourne Institute Of Technology
The University Of Melbourne
Priority date
Filing date
Publication date
Application filed by Royal Melbourne Institute Of Technology, The University Of Melbourne filed Critical Royal Melbourne Institute Of Technology
Publication of WO2002054269A1 publication Critical patent/WO2002054269A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005Switch and router aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/62Wavelength based
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005Switch and router aspects
    • H04Q2011/0007Construction
    • H04Q2011/0032Construction using static wavelength routers (e.g. arrayed waveguide grating router [AWGR] )

Definitions

  • the present invention relates to a scalable data processing system, and in particular to a data processing system comprising a plurality of data processing nodes interconnected by optical fibre.
  • a scalable data processing system comprising a plurality of data processing nodes, said nodes exchanging data by transmitting and receiving optical signals over an optical network, wherein the routing of said optical signals is based on the wavelength of said optical signals.
  • said optical network includes a passive, non-blocking optical switch to route said optical signals between said nodes.
  • said non-blocking optical switch is an arrayed waveguide grating multiplexer.
  • the topology of said network is a star topology, with said non-blocking optical switch at its centre.
  • signals from a plurality of said nodes may be transmitted over said optical network by wavelength division multiplexing.
  • each node includes a plurality of processors.
  • said optical signals are processed at an optical transceiver interface of each node, and the bit rate of said optical signals is matched to the bit rate of a data bus of said node.
  • the present invention also provides a massively scalable data processing system comprising a plurality of scalable data processing systems as described above, the interconnections between said systems characterised by a "small world" topology.
  • the mean connectivity between said systems is approximately 1.6.
  • the present invention also provides an optical transceiver for transmitting and receiving optical signals in a network, said transceiver including: a plurality of light sources for transmitting optical signals to said network, each of said light sources being adapted to transmit light at a respective optical wavelength; and at least one photodetector for receiving said optical signals from said network; wherein said transceiver is adapted to simultaneously transmit optical signals representing a same datagram using two or more of said plurality of light sources.
  • the present invention also provides a wavelength division multiplexing (WDM) network for use in a scalable data processing system, the WDM network including: a plurality of optical transceivers, each transceiver including: a plurality of light sources for transmitting optical signals to said network, each of said light sources being adapted to transmit light at a respective optical wavelength; and at least one photodetector for receiving said optical signals from said network; wherein each said optical transceiver is operably associated with a data processing node of said scalable data processing system; and a passive, non-blocking optical router in communication with said plurality of optical transceivers for routing data encoded in said optical signals between said data processing nodes of said scalable data processing system.
  • WDM wavelength division multiplexing
  • the scalable data processing system may be used as a proxy server for storing and retrieving data in a communications network.
  • the scalable data processing system may be used as a transaction server for storing and retrieving financial transaction data in a communications network.
  • Figure 1 is a diagram of a prior art single node computer system
  • Figure 2 is a diagram of an embodiment of a split-node computer system
  • Figure 3 is a block diagram of an embodiment of a scalable data processing system
  • Figure 4 is a schematic diagram of an embodiment of a passive optical switch for use in the scalable data processing system
  • Figure 5 is a schematic diagram of an optical receiver of an optical transceiver of the scalable data processing system
  • Figure 6 is a schematic diagram illustrating the interface between the optical transceiver and a node of the scalable computer system
  • Figure 7 is a schematic diagram of a scalable transaction system
  • Figure 8 is a schematic diagram illustrating the flow of messages between a purchaser, a merchant and the transaction system
  • Figure 9 is a block diagram of a scalable data processing system of an embodiment of the invention.
  • Figure 10 is a block diagram of a massively scalable data processing system of a further embodiment of the invention.
  • the computer system of Figure 1 is a prior art, single processor system based on the Open Telecom Platform (OTP) distributed applications platform developed by Ericsson, incorporating a number of programming infrastructures, one of which is an interpreter for a functional language named Erlang.
  • the computer system 10 includes hardware 12, an operating system 14, a display 16 and a keyboard 18 and a suite of programs 20 which include application programs 22 (e.g., written in the programming languages Erlang and C), sourced programs 24, run-time programs 26, a library 28, and a database 30.
  • the system may be linked to an external database 32 if required.
  • the single node includes an interpretive Erlang environment used for application development. However, the performance of this system is limited by the interpreter and operating system layers. This system is referred to as the base system.
  • a split-node computer system 34 comprising two closely coupled nodes: a standard computer system 40 and a multi-processor (MP) computer system 50.
  • the MP computer system 50 is a shared memory MP system running software 52, 54 in compiled Erlang on top of an optimised message passing kernel 56 such as QNX.
  • the MP computer system 50 is referred to as an Erlang engine 50 to reflect the much closer coupling to the system hardware than the Erlang interpreter executing in a standard Unix operating system environment, described above.
  • the standard computer system 40 is used for application development and comprises a standard UNIX operating platform 41, application programs (e.g., in C, C++, Java and Erlang) 42 and optionally, one or more disk drives 43, graphics display 44, modem 45 and an Internet interface (TCP/IP) 46.
  • the standard computer system 40 also has a high speed interface, such as a gigabit ethernet connection, 47 for communicating with the Erlang Engine 50.
  • the Erlang Engine 50 also has a high speed interface 58, such as a gigabit ethernet connection, for communicating with the computer system 40.
  • One processor of the MP set can be devoted to monitoring software, with the remainder dedicated to functional data processing, as described in PCT/AU97/00678, the entire disclosure of which is incorporated herein by reference.
  • the Erlang Engine 50 (also termed a "node" of the scalable computer system 300) may be scaled from one processor up to the number of processors that saturate the shared memory of the node. Typically, a maximum of 8 processors would be present.
  • the eight processors, coupled with a performance increase (measured in millions of instructions per second) by a factor of five from moving from interpreted to compiled code, plus a further performance increase by a factor of two from moving from Unix to QNX, give an overall performance capacity eighty times greater than that of the base system of Figure 1.
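The overall factor quoted above follows from simple arithmetic on the three multipliers given in the text:

```python
# Performance multipliers quoted in the description, relative to the
# single-processor interpreted base system of Figure 1.
processors_per_node = 8        # a shared-memory node saturates at ~8 CPUs
compiled_vs_interpreted = 5    # compiled Erlang vs the Erlang interpreter
qnx_vs_unix = 2                # optimised QNX kernel vs standard Unix

node_speedup = processors_per_node * compiled_vs_interpreted * qnx_vs_unix
print(node_speedup)  # 80 -- "eighty times greater than the base system"
```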
  • a number of MP Erlang Engines 50 are networked in a star configuration through a switch 170 to form a scalable computer system 300.
  • Each node of the computer system 300 is an Erlang Engine 50 with multiple processors 152 running on a QNX kernel 156.
  • Each Erlang Engine 50 may include a link to an OTP system 134 in a similar manner to the split-node system of Figure 2, and may also have a local Internet connection 145.
  • Multiprocessor systems are ultimately limited by the bandwidth of the communications channels used to interconnect the nodes, and the speed and flexibility of the switches used to route control and data messages between them.
  • conventional multiprocessor interconnects do not scale gracefully as the number of nodes increases.
  • the scalable computer system 300 overcomes these problems by interconnecting the nodes 50 with optical fibres and an optical switch 170, also called a wavelength router.
  • the high transmission capacity of optical fibre means that exhaustion of the available bandwidth is unlikely.
  • the network which links the nodes of the scalable computer is based on wavelength division multiplexing (WDM) technology.
  • WDM wavelength division multiplexing
  • multiple data channels are multiplexed onto a single fibre by modulating each data channel onto a separate optical signal with a distinct wavelength.
  • individual channels can be selected and routed entirely in the optical domain using simple, passive optical components.
  • the scalable computer system 300 is based on a passive optical switching arrangement as illustrated in Figure 4.
  • Each node is connected to the network via a single fibre pair: data is sent to other nodes on one fibre of the pair, while signals arriving from other nodes in the network are delivered via the second fibre.
  • Figure 4 shows output fibres 411 to 414 on the right, and input fibres 401 to 404 on the left, although in reality the source and receiver of each node are co-located.
  • the topology of the network is thus a simple star.
  • At the hub of the star is the switch or wavelength router 170, which passively routes incoming signals to its output ports 411 to 414 according to (i) which input port they arrive on, and (ii) the signal wavelength. The wavelength router 170 is inherently collision-free: its routing properties are static, and signals arriving at the same wavelength on different input ports are never routed to the same output port. An example is illustrated in Figure 4, in which three wavelengths (λ1, λ2 and λ3) are shown arriving on two input ports 401, 403 and being routed to four output ports 411 to 414.
  • the wavelength router 170 is preferably an Arrayed Waveguide Grating Multiplexer (AWGM).
  • AWGM Arrayed Waveguide Grating Multiplexer
  • other technologies based on all-fibre devices, such as in-fibre Bragg gratings, may alternatively be used as the wavelength router 170.
  • the wavelengths λ1, λ2, ..., λN (for N up to, say, 256) are spread evenly across the available bandwidth.
  • the wavelength routing characteristics of the passive optical router 170 are static, meaning that no active "switching" occurs within the interconnect network. In order to transmit data to a chosen destination node, it is necessary to transmit it on the appropriate wavelength. However, the wavelength required to reach a particular destination node also depends upon the source node. Because the routing is static, the complete set of wavelengths corresponding to all source-destination pairs is known when the interconnect is installed and/or reconfigured. Due to the static routing properties of the wavelength router 170, the "switching" function is actually provided at the source node by selecting the appropriate wavelength for transmission.
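The static routing behaviour described above can be modelled with a simple function. A common convention for an N×N arrayed waveguide grating router is cyclic routing, in which the output port depends on the input port and the wavelength index modulo N; this particular rule is an illustrative assumption, since the text requires only that the mapping be static and collision-free:

```python
def awgr_output_port(input_port: int, wavelength: int, n_ports: int) -> int:
    """Cyclic routing rule of an N x N arrayed waveguide grating router:
    the output port depends only on the input port and the wavelength."""
    return (input_port + wavelength) % n_ports

def wavelength_for(source: int, destination: int, n_ports: int) -> int:
    """Because the routing is static, the source node 'switches' simply
    by choosing the wavelength the router maps to the destination."""
    return (destination - source) % n_ports

# Collision-freedom: signals at the same wavelength on different input
# ports are never routed to the same output port.
n = 4
for wl in range(n):
    outputs = [awgr_output_port(p, wl, n) for p in range(n)]
    assert len(set(outputs)) == n  # all distinct, so no collisions
```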
  • each Erlang Engine 50 includes an optical transceiver 160, including at least one optical receiver 161 and a transmitter comprising one or more optical sources 162.
  • the scalable computer system 300 requires that the interface 160 be capable of receiving and transmitting optical signals of different wavelengths.
  • the transmission of different wavelengths is achieved by including in the transmitter a multi-wavelength light source, comprising a series of lasers, each with a unique wavelength, and a shared optical modulator.
  • a multi-wavelength source may comprise an integrated semiconductor laser array, i.e., a series of tuned single-wavelength lasers with distinct wavelengths and a common modulator.
  • the multiple sources and shared modulator allow each node to multicast or broadcast the same data simultaneously to many nodes.
  • Each source can be independently turned "on" and "off", but all sources share a single modulator, so that if multiple sources are active, they are all modulated with the same data signal.
  • the individual lasers in the array are produced using gratings written into Erbium-doped fibre to produce fibre distributed feedback (DFB) lasers. These sources produce spectrally pure and stable output wavelengths without the active temperature control which is required for semiconductor WDM laser sources. However, the individual lasers in these sources are not independently addressable. This means that any data to be sent is transmitted simultaneously on all wavelengths, requiring additional intelligence at the receivers to enable them to identify incoming data addressed to the local node. However, for systems including up to 256 nodes, only eight bits of addressing information is required.
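Because these fibre-laser sources transmit every datagram on all wavelengths, each receiver must filter incoming frames by destination address, and eight bits suffice for up to 256 nodes. A minimal sketch of such local address decoding follows; the frame layout and broadcast convention are hypothetical:

```python
LOCAL_ADDRESS = 0x2A  # this node's 8-bit address (example value)

def accept_frame(frame: bytes) -> bool:
    """Assume the first byte of each incoming frame carries the 8-bit
    destination address; keep the frame only if it is addressed to the
    local node (0xFF serves as a broadcast address in this sketch)."""
    dest = frame[0]
    return dest == LOCAL_ADDRESS or dest == 0xFF

assert accept_frame(bytes([0x2A, 1, 2, 3]))      # addressed to this node
assert accept_frame(bytes([0xFF, 9]))            # broadcast
assert not accept_frame(bytes([0x07, 1, 2, 3]))  # some other node
```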
  • In addition to transmitting signals with different wavelengths, the transceiver must also be able to receive and decode multiple wavelengths. Therefore, the transceiver includes a receiver 161 with a wavelength demultiplexer 500 and an array of photodetectors 501, as shown in Figure 5.
  • a single tunable receiver is impractical because tunable optical filters are too slow, and it is in any case impossible to arrange for the destination node to tune to the correct wavelength to receive a transmission without employing a common signalling channel. This reintroduces the original problems of scalability, as the signalling channel is a shared resource which can become congested, leading to increases in latency and reduction of throughput.
  • the receiver 161 may be fabricated as an array of photodetectors 501 integrated with an array-waveguide-grating-based demultiplexer on a single semiconductor photonic integrated circuit. Alternatively, if wide-band photodetectors are used, the demultiplexing function can be replaced by a simple splitting function.
  • the transceiver 160 is a bus interface card, in one embodiment a Peripheral Component Interconnect (PCI) card, providing a transmission speed per wavelength matched to the peak bus speed (i.e., in the order of two gigabits per second for a 66-MHz PCI bus using 32-bit data transfers).
  • PCI peripheral component interconnect
  • the PCI bus interface card is built from standard integrated circuits and uses logic devices (e.g., Xilinx Field Programmable Gate Arrays) to implement the interface-specific logic functions.
  • the multiplexing and demultiplexing of wavelengths uses telecommunications devices developed for SONET/SDH systems.
  • the card includes discrete single-wavelength semiconductor lasers and discrete photodetectors, but integrated laser and photodiode arrays may alternatively be used.
  • the passive optical router 170 is preferably an AWGM. Additional WDM devices, such as AWGMs, are preferably used as demultiplexers 500.
  • each node is interfaced to the optical fibre 165 of the WDM network by an interface card 601 including the optical transceiver 160 (shown as WDM interface 605).
  • Data is transferred to and from the node's memory 602 (local to Erlang Engine 50) and the interface card 601 via the node's PCI bus 603, as shown in Figure 6.
  • the transmission rate of each wavelength of the WDM network is matched to the speed of the PCI bus 603, so that data is transferred directly from the bus to the network.
  • the buffering required on the transmit side is thus negligible, and the wavelength router 170 is neither a bottleneck in the system, nor unnecessarily over-dimensioned relative to the PCI bus performance.
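The rate matching described above follows directly from the bus parameters. For a 66 MHz PCI bus performing 32-bit transfers:

```python
bus_clock_hz = 66e6    # 66 MHz PCI clock
bus_width_bits = 32    # 32-bit data transfers

peak_rate_bps = bus_clock_hz * bus_width_bits
# ~2.1 Gbit/s, matching "in the order of two gigabits per second"
print(peak_rate_bps / 1e9)  # 2.112
```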
  • DMA direct memory access
  • a bit mask is generated by the node 50 and forwarded to the interface card 601, along with the data to be transmitted, in order to identify which one or more of the (laser) light sources 162 is to be modulated with the same outbound data.
  • the bit mask effectively turns off the light sources which are not needed for transmission of the data. This arrangement enables multicasting or broadcasting of data over the WDM but does not allow for different datagrams to be transmitted simultaneously from the same interface card 601.
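The bit mask mechanism can be sketched as follows, with one bit per laser, set for every wavelength that must carry the outbound data. The function names are hypothetical, and the cyclic wavelength-to-destination rule is an illustrative assumption:

```python
def laser_bitmask(source: int, destinations: list[int], n: int) -> int:
    """Build a bit mask selecting which of the n lasers to modulate.
    Bit k corresponds to wavelength k; the wavelength reaching a given
    destination is assumed here to follow a cyclic router rule."""
    mask = 0
    for dest in destinations:
        wavelength = (dest - source) % n  # wavelength that routes to dest
        mask |= 1 << wavelength           # enable that laser
    return mask

# Node 0 multicasting the same datagram to nodes 1 and 3 in a
# 4-node system: lasers for wavelengths 1 and 3 are enabled.
mask = laser_bitmask(0, [1, 3], 4)
print(bin(mask))  # 0b1010
```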
  • Logic in the receiver 161, as part of WDM interface 605, identifies incoming messages and initiates transfer to the node's memory 602 via the node's PCI bus 603.
  • Incoming messages are identified by the presence or absence of an optical carrier signal, or, in the embodiment wherein the lasers in the array are always on, by local address decoding, known to the skilled addressee.
  • the receiver 161 does not include a demultiplexer 500, and all wavelengths are simultaneously received. This allows collisions to occur, but may nevertheless result in an acceptably low error rate (e.g., less than 1 in 10¹⁰), depending upon conditions.
  • the WDM network has the following features:
  • a source node may transmit a message to any destination node at any time. While there may be contention amongst processors or peripherals for access to the interconnect via the local bus, there is no contention for access to the transmitters, or within the passive optical interconnect itself.
  • Messages may be unicast, multicast or broadcast. This facility is inherent in the use of multiwavelength sources with a shared modulator.
  • the optical transceiver 160 can transmit and receive up to 32 distinct wavelengths; however, the number of wavelengths available in WDM systems is rapidly increasing. For example, the WaveStar 800G dense WDM system from Lucent Technologies uses 320 distinct wavelengths. A system size of 256 nodes (Erlang Engines 50) is considered to be practical. Since the system gives almost linear scaling on separable problems, 256 nodes can provide a performance speed-up of 20,000 over the basic system.
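The quoted speed-up can be checked against the per-node figure of eighty times the base system given earlier in the text, under the stated assumption of near-linear scaling on separable problems:

```python
node_speedup = 80   # per-node gain over the base system, as stated in the text
n_nodes = 256       # practical system size quoted in the text

# Near-linear scaling across all nodes gives the overall speed-up.
print(n_nodes * node_speedup)  # 20480, i.e. roughly 20,000
```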
  • the computer system 300 of Figure 3 is a fully connected system in the sense that each Erlang Engine 50 has its own dedicated optical fibre link 165 to the central optical switch 170.
  • a larger, composite, massively scalable computer system may be constructed from a large number of such systems 300 in a "small world" configuration, as described in International patent application no. PCT/AU00/00796, the disclosure of which is hereby incorporated herein by reference.
  • one or more of the scalable computer systems 300 described above, each comprising up to 256 multiprocessor nodes (Erlang Engines 50) is considered to constitute a single "neighbourhood" of the overall system.
  • Scalable computer systems 300, each having up to 256 fully connected Erlang Engines 50 with eight processors per Erlang Engine 50, could be linked together via the optical router 170 of each scalable system 300 in a small world network to achieve a total system of over one million processors, with the small world network architecture providing effective connectivity between the nodes with only a relatively small number of cross-links between the neighbourhoods, say of the order of fifty, or between 10% and 20% of the number of available wavelength channels.
  • the total number of individual Erlang processes running on such a massively scalable system may be of the order of 256,000,000 assuming 2000 active processes per Erlang Engine node 50.
  • Figure 9 illustrates the scalable data processing system 300 in a simple form, with the passive wavelength router 170 interconnecting nodes 50 by communicating with an interface card 601 associated with each node 50.
  • Although Figure 9 shows only three nodes 50 connected through the optical router 170, any number of nodes can be connected in this way, depending on the routing capacity of the router 170 and the available transmission bandwidth.
  • FIG 10 is a simplified block diagram of a massively scalable data processing system comprising multiple interconnected scalable data processing systems 300.
  • the data processing systems 300 are interconnected through optical routers 170 via optical fibre, and each includes a number of processing nodes 50.
  • the network connections within and between the scalable data processing systems 300 in the massively scalable data processing system conform to those of a small world topology.
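The small world interconnection pattern described above can be illustrated with a short sketch: each system 300 is fully connected internally, and a sparse set of random cross-links joins the systems. The construction and the parameter values below are illustrative assumptions, not the patent's specification:

```python
import random

def small_world_links(n_systems: int, cross_links_per_system: int, seed: int = 1):
    """Each scalable system 300 is fully connected internally; the
    massively scalable system adds only a sparse set of random
    cross-links between systems, giving small-world connectivity."""
    rng = random.Random(seed)
    links = set()
    for src in range(n_systems):
        for _ in range(cross_links_per_system):
            dst = rng.randrange(n_systems)
            if dst != src:
                links.add(tuple(sorted((src, dst))))  # undirected link
    return links

# A few random cross-links per neighbourhood suffice to keep path
# lengths between neighbourhoods short.
links = small_world_links(n_systems=16, cross_links_per_system=3)
print(len(links))
```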
  • the scalable computer system 300 and the massively scalable computer system, both described above, have a number of important features for applications. For example, large computer systems have to be maintained over significant periods of time. In the case of the scalable computer system 300 described herein, new nodes may be added, old ones decommissioned, and failures may occur, to consider only the hardware. Similarly, software evolves, and needs to be available on an ad hoc basis when peaks of application load appear. This continuous instantiation process implies that the architecture must provide 1:1 connectivity between processes, or as close to 1:1 as possible. This means that it should not matter whether or not two connecting processes are on the same physical node, or on different ones.
  • this perfect connectivity applies up to the scalable computer system 300; with the massively scalable computer system, the mean connectivity becomes 1:1.6.
  • the scalable and massively scalable systems have a multicast capability, from individual node-to-node messaging through to system-wide broadcasts.
  • the requirement for many applications in this category is for rapid and uniform access performance for high rates of independent transactions.
  • the MicroBank application described below is a typical application in this category, aggregating small individual transactions efficiently for a banking environment.
  • Proxy server caches are a mechanism for reducing the number of bytes transferred to a site from a web server. Essentially, a proxy cache examines each request for a web page and either serves a stored copy of the page, so that the request is answered locally, or forwards the request to the remote web server and stores the response so that future requests can be answered locally. Periodically, the cache must be cleared of out-of-date pages to make space for new or more popular pages.
  • caching proxy servers use magnetic disk storage to store requests and memory to store descriptions of the pages available locally.
  • An Ultra Cache is a scalable computer system which stores all cached pages in solid-state memory, providing rapid access to cached pages. Furthermore, the high separability of the caching problem makes the scalable computer system approach to scaling ideal. The task of retrieving pages and clearing out-of-date pages is easily distributed and relatively independent. This means that adding a new node to the Ultra Cache gives almost the full benefit of the node's processing capacity and memory to the task of caching. In conventional systems, adding a processor brings less benefit due to system overheads.
  • Authentication of individual users, to enable granting of services, is crucial for providers to ensure security and network availability.
  • the scalability, and uniform access speeds of the system make it ideal for authentication processes.
  • ISP Internet Service Provider
  • the architecture of the system, a set of highly intelligent nodes linked through a very high speed non-blocking interconnect, permits new possibilities for routing packets and managing network services, including limiting bandwidth and differentially charged services.
  • a key characteristic of the system architecture is the use of processors, within the multi-processor nodes, for continuous system monitoring. This in-built capability permits financial information for billing purposes to be directly acquired at the individual transaction level, without any significant processing overhead.
  • a key issue in the medical and social security area is that of blending rapid access with security and privacy considerations.
  • the Microbank (or μ-Bank) is an application implemented on the scalable computer system allowing financial institutions to handle extremely large numbers of transactions, particularly relating to very small amounts.
  • the μ-Bank performs three roles: it transfers funds to the merchant on presentation of a valid request for payment, acts as the Purchaser's agent when a merchant requests payment, and aggregates charges and collects funds from existing banks or credit providers.
  • Anonymity for the Purchaser is provided, as the Purchaser's identity is known only to the μ-Bank and is never provided even by oblique reference to the Merchant.
  • the Merchant is absolved of retaining records about its clients as the μ-Bank either approves or disapproves a transaction immediately.
  • the μ-Bank transfers funds when a transaction is approved.
  • the overall architecture seen by a Purchaser is shown in Figure 7.
  • the purchase transaction in Figure 8 shows an example of how the μ-Bank operates from the user perspective.
  • a purchaser uses a web browser 801 executing on their computer to access a merchant's web server 803 via the Internet (step 701).
  • the purchaser chooses an item on the merchant's web site, and indicates by pressing a button that he wishes to purchase the item using the μ-Bank (step 702).
  • the merchant's web server 803 generates a unique session identifier and instructs the purchaser's web browser 801 to create a new browser window 802, directed to the μ-Bank server 804 and provided with the unique session identifier (step 703).
  • the new browser window 802 sends a request to the μ-Bank server 804 requesting execution of a Java μ-Bank agent and providing the unique session identifier (step 704). The μ-Bank server 804 then sends the agent to the purchaser's computer at step 705. The purchaser authenticates themselves to the agent using a username/password pair and provides an upper limit for the financial transaction to the agent, which sends it to the μ-Bank server 804 at step 706. Returning to the original browser window, the purchaser now finalises the purchase at step 710. The merchant's web server 803 sends a request for payment to the μ-Bank server 804, using the unique session identifier to identify the transaction (step 707). The μ-Bank server 804 checks that the transaction amount is within the limit specified by the purchaser at step 706, and approves the transaction at step 708. The transaction is completed and the merchant sends the item to the purchaser (step 709).
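The message sequence of Figure 8 can be summarised in a short sketch keyed by the merchant-generated session identifier. The data structures and function names are hypothetical; only the approval rule (steps 706 to 708) follows the description:

```python
# Purchase transactions (Figure 8), keyed by the unique session
# identifier generated by the merchant's web server.
sessions = {}

def merchant_create_session(session_id: str):            # step 703
    sessions[session_id] = {"limit": None, "approved": False}

def purchaser_set_limit(session_id: str, limit: float):  # step 706
    sessions[session_id]["limit"] = limit  # sent via the Java agent

def merchant_request_payment(session_id: str, amount: float) -> bool:
    """Steps 707-708: the bank approves only if the amount is within
    the limit the purchaser registered for this session."""
    s = sessions[session_id]
    s["approved"] = s["limit"] is not None and amount <= s["limit"]
    return s["approved"]

merchant_create_session("abc123")
purchaser_set_limit("abc123", 5.00)
print(merchant_request_payment("abc123", 4.25))  # True  -- within limit
print(merchant_request_payment("abc123", 9.99))  # False -- exceeds limit
```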
  • a significant feature of the μ-Bank is that the Merchant generates the identifier which is used to identify the Purchaser. Furthermore, although the Merchant effectively initiates the Purchaser's session with the μ-Bank, the Merchant does not mediate that session, as the new browser window 802 communicates directly with the μ-Bank server 804.
  • the μ-Bank also has the opportunity to ask the Purchaser if they wish to proceed with a transaction if it falls outside the parameters provided by the user.
  • the initiation of the session with the μ-Bank can be automatic, with potential but manageable problems with "man-in-the-middle" attacks by disreputable merchants, or manual, where the Purchaser either starts a local version of the agent or makes a new window and points their browser at the μ-Bank. "Man-in-the-middle" attacks can be managed through the use of "Security Certificates" for Java Applets.
  • the current μ-Bank design is based on a message passing core implementing the banking functions written in Erlang.
  • the implementation language for the μ-Bank agent is Java.
  • the μ-Bank agent communicates with the μ-Bank server 804 using a μ-Bank Purchaser Protocol (μBPP).
  • the merchant's server 803 requires clients which interface with the μ-Bank Merchant Protocol (μBMP).
  • Accounts are maintained for each user of the μ-Bank. Users may act as either merchants or purchasers.
  • the μBMP is used to designate whether the user is acting as a merchant or a purchaser.
  • Accounts hold consolidated information for each transaction session, consisting of the start and end dates for the transaction session and the merchant's identity. Accounts can be run in either a credit or a debit mode. In credit mode, the consolidated account is settled from some external source of funds, such as a bank account or credit card, either periodically or when the credit limit is reached. In debit mode, an amount is transferred to the μ-Bank and this amount is decremented until either a zero balance is reached and no further transactions are allowed, or until the money is replenished.
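The two account modes can be sketched as a small class. The names are hypothetical, and Python is used for brevity even though the μ-Bank core described above is written in Erlang:

```python
class MicroBankAccount:
    """Sketch of the credit and debit account modes described above."""

    def __init__(self, mode: str, balance: float = 0.0, credit_limit: float = 0.0):
        assert mode in ("credit", "debit")
        self.mode = mode
        self.balance = balance          # prepaid funds (debit mode)
        self.credit_limit = credit_limit
        self.owed = 0.0                 # accumulated charges (credit mode)

    def charge(self, amount: float) -> bool:
        if self.mode == "debit":
            # Decrement the prepaid amount; refuse once it is exhausted.
            if amount > self.balance:
                return False
            self.balance -= amount
            return True
        # Credit mode: accumulate charges; settle from an external
        # source of funds once the credit limit would be exceeded.
        if self.owed + amount > self.credit_limit:
            self.settle()
        self.owed += amount
        return True

    def settle(self):
        # Debit the consolidated account from an external source
        # (bank account or credit card) and reset the amount owed.
        self.owed = 0.0

acct = MicroBankAccount("debit", balance=1.00)
print(acct.charge(0.40), acct.charge(0.70))  # True False -- funds exhausted
```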

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A scalable data processing system including a plurality of data processing nodes, the data processing nodes exchanging data by transmitting and receiving optical signals over an optical network, wherein the routing of the optical signals is based on the wavelength of the optical signals.

Description

A SCALABLE DATA PROCESSING SYSTEM
The present invention relates to a scalable data processing system, and in particular to a data processing system comprising a plurality of data processing nodes interconnected by optical fibre.
The rapid growth of communications networks in recent years has fuelled the need for scalable data processing platforms able to handle the demands of an ever increasing number of concurrent users. Current communications and computing systems for processing data from the Internet are of limited scalability, and must be replaced, placed behind a data filter, or relocated to a smaller network when the volume of data exceeds a certain limit. Information technology growth trends indicate the need for a data processing system that can be scaled by many orders of magnitude to meet increasing demand without the need to change application programs executed by the system. It is desired, therefore, to provide a scalable data processing system, or at least a useful alternative to existing data processing systems.
In accordance with the present invention there is provided a scalable data processing system comprising a plurality of data processing nodes, said nodes exchanging data *by transmitting and receiving optical signals over an optical network, wherein the routing of said optical signals is based on the wavelength of said optical signals.
Preferably, said optical network includes a passive, non-blocking optical switch to route said optical signals between said nodes.
Preferably, said non-blocking optical switch is an arrayed waveguide grating multiplexer.
Preferably, the topology of said network is a star topology, with said non-blocking optical switch at its centre.
Advantageously, signals from a plurality of said nodes may be transmitted over said optical network by wavelength division multiplexing. Preferably, each node includes a plurality of processors.
Preferably, said optical signals are processed at an optical transceiver interface of each node, and the bit rate of said optical signals is matched to the bit rate of a data bus of said node.
The present invention also provides a massively scalable data processing system comprising a plurality of scalable data processing systems as described above, the interconnections between said systems characterised by a "small world" topology. Preferably, the mean connectivity between said systems is approximately 1.6.
The present invention also provides an optical transceiver for transmitting and receiving optical signals in a network, said transceiver including: a plurality of light sources for transmitting optical signals to said network, each of said light sources being adapted to transmit light at a respective optical wavelength; and at least one photodetector for receiving said optical signals from said network; wherein said transceiver is adapted to simultaneously transmit optical signals representing a same datagram using two or more of said plurality of light sources.
The present invention also provides a wavelength division multiplexing (WDM) network for use in a scalable data processing system, the WDM network including: a plurality of optical transceivers, each transceiver including: a plurality of light sources for transmitting optical signals to said network, each of said light sources being adapted to transmit light at a respective optical wavelength; and at least one photodetector for receiving said optical signals from said network; wherein each said optical transceiver is operably associated with a data processing node of said scalable data processing system; and a passive, non-blocking optical router in communication with said plurality of optical transceivers for routing data encoded in said optical signals between said data processing nodes of said scalable data processing system.
Advantageously, the scalable data processing system may be used as a proxy server for storing and retrieving data in a communications network.
Advantageously, the scalable data processing system may be used as a transaction server for storing and retrieving financial transaction data in a communications network.
A preferred embodiment of the present invention is hereinafter described, by way of example only, with reference to the accompanying drawings, wherein:
Figure 1 is a diagram of a prior art single node computer system;
Figure 2 is a diagram of an embodiment of a split-node computer system;
Figure 3 is a block diagram of an embodiment of a scalable data processing system;
Figure 4 is a schematic diagram of an embodiment of a passive optical switch for use in the scalable data processing system;
Figure 5 is a schematic diagram of an optical receiver of an optical transceiver of the scalable data processing system;
Figure 6 is a schematic diagram illustrating the interface between the optical transceiver and a node of the scalable computer system;
Figure 7 is a schematic diagram of a scalable transaction system; and
Figure 8 is a schematic diagram illustrating the flow of messages between a purchaser, a merchant and the transaction system;
Figure 9 is a block diagram of a scalable data processing system of an embodiment of the invention; and
Figure 10 is a block diagram of a massively scalable data processing system of a further embodiment of the invention.
The computer system of Figure 1 is a prior art, single processor system based on the Open Telecom Platform (OTP) distributed applications platform developed by Ericsson, incorporating a number of programming infrastructures, one of which is an interpreter for a functional language named Erlang. The computer system 10 includes hardware 12, an operating system 14, a display 16, a keyboard 18 and a suite of programs 20 which include application programs 22 (e.g., written in the programming languages Erlang and C), sourced programs 24, run-time programs 26, a library 28, and a database 30. The system may be linked to an external database 32 if required. The single node includes an interpretive Erlang environment used for application development. However, the performance of this system is limited by the interpreter and operating system layers. This system is referred to as the base system.
Referring to Figure 2, there is shown a split-node computer system 34 comprising two closely coupled nodes: a standard computer system 40 and a multi-processor (MP) computer system 50. The MP computer system 50 is a shared memory MP system running software 52, 54 in compiled Erlang on top of an optimised message passing kernel 56 such as QNX. The MP computer system 50 is referred to as an Erlang engine 50 to reflect the much closer coupling to the system hardware than the Erlang interpreter executing in a standard Unix operating system environment, described above. The standard computer system 40 is used for application development and comprises a standard UNIX operating platform 41, application programs (e.g., in C, C++, Java and Erlang) 42 and, optionally, one or more disk drives 43, graphics display 44, modem 45 and an Internet interface (TCP/IP) 46. The standard computer system 40 also has a high speed interface 47, such as a gigabit ethernet connection, for communicating with the Erlang Engine 50. The Erlang Engine 50 also has a high speed interface 58, such as a gigabit ethernet connection, for communicating with the computer system 40. One processor of the MP set can be devoted to monitoring software, with the remainder dedicated to functional data processing, as described in PCT/AU97/00678, the entire disclosure of which is incorporated herein by reference.
The Erlang Engine 50 (also termed a "node" of the scalable computer system 300) may be scaled from one processor up to the number of processors that saturate the shared memory of the node. Typically, a maximum of 8 processors would be present. The eight processors, coupled with a performance increase (measured in millions of instructions per second) by a factor of five by moving from interpreted to compiled code, plus a further performance increase by a factor of two by moving from Unix to QNX, gives an overall performance capacity eighty times greater than the performance of the base system of Figure 1.
Referring now to Figure 3, a number of MP Erlang Engines 50 are networked in a star configuration through a switch 170 to form a scalable computer system 300. Each node of the computer system 300 is an Erlang Engine 50 with multiple processors 152 running on a QNX kernel 156. Each Erlang Engine 50 may include a link to an OTP system 134 in a similar manner to the split-node system of Figure 2, and may also have a local Internet connection 145.
Multiprocessor systems are ultimately limited by the bandwidth of the communications channels used to interconnect the nodes, and the speed and flexibility of the switches used to route control and data messages between them. As a result, conventional multiprocessor interconnects do not scale gracefully as the number of nodes increases. Contention for the communications channels, such as a shared bus, or shared memory, and for switching resources, such as switch output ports, results in delays in the exchange of information. This ultimately limits the overall processing capacity of the system as processors in the nodes effectively lie idle waiting for required data to be returned. The scalable computer system 300 overcomes these problems by interconnecting the nodes 50 with optical fibres 165 and a non-blocking optical switch 170 (also called a wavelength router). The high transmission capacity of optical fibre means that exhaustion of the available bandwidth is unlikely.
The network which links the nodes of the scalable computer is based on wavelength division multiplexing (WDM) technology. In WDM networks, multiple data channels are multiplexed onto a single fibre by modulating each data channel onto a separate optical signal with a distinct wavelength. In addition to multiplying the transmission capacity of the fibre, individual channels can be selected and routed entirely in the optical domain using simple, passive optical components.
The scalable computer system 300 is based on a passive optical switching arrangement as illustrated in Figure 4. Each node is connected to the network via a single fibre pair: data is sent to other nodes on one fibre of the pair, while signals arriving from other nodes in the network are delivered via the second fibre. Figure 4 shows output fibres 411 to 414 on the right, and input fibres 401 to 404 on the left, although in reality the source and receiver of each node are co-located. The topology of the network is thus a simple star. At the hub of the star is the switch or wavelength router 170, which passively routes incoming signals to its output ports 411 to 414 according to (i) which input port they arrive on, and (ii) the signal wavelength. The wavelength router 170 is inherently collision-free: its routing properties are static, and signals arriving at the same wavelength on different input ports are never routed to the same output port. An example is illustrated in Figure 4, in which three wavelengths (λ1, λ2, and λ3) are shown arriving on two input ports 401, 403 and being routed to four output ports 411 to 414. The wavelength router 170 is preferably an Arrayed Waveguide Grating Multiplexer (AWGM). However, other technologies based on all-fibre devices such as in-fibre Bragg gratings may alternatively be used as the wavelength router 170. The wavelengths λ1, λ2, ..., λN (for N up to, say, 256) are spread evenly across the available bandwidth.
The wavelength routing characteristics of the passive optical router 170 are static, meaning that no active "switching" occurs within the interconnect network. In order to transmit data to a chosen destination node, it is necessary to transmit it on the appropriate wavelength. However, the wavelength required to reach a particular destination node also depends upon the source node. Because the routing is static, the complete set of wavelengths corresponding to all source-destination pairs is known when the interconnect is installed and/or reconfigured. Due to the static routing properties of the wavelength router 170, the "switching" function is actually provided at the source node by selecting the appropriate wavelength for transmission.
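The static "switching by wavelength selection" described above can be illustrated with a short sketch. The model below assumes the usual cyclic routing property of an N×N arrayed waveguide grating, where input port i reaches output port j on wavelength index (i + j) mod N; the port numbering and this particular rule are illustrative assumptions, not taken from the patent text.

```python
# Illustrative sketch of static wavelength routing in an N x N arrayed
# waveguide grating router with cyclic routing (assumed rule, not the
# patent's): input port i reaches output port j on wavelength (i + j) mod N.

N = 8  # number of ports / wavelength channels in this toy example

def wavelength_for(source_port: int, dest_port: int, n: int = N) -> int:
    """Wavelength index a source must transmit on to reach a destination."""
    return (source_port + dest_port) % n

def routed_output(input_port: int, wavelength: int, n: int = N) -> int:
    """Output port to which the router passively directs this signal."""
    return (wavelength - input_port) % n

# The routing is collision-free: for any fixed wavelength, signals on
# different input ports always leave on different output ports.
for w in range(N):
    assert len({routed_output(i, w) for i in range(N)}) == N
```

Because the mapping is fixed at installation time, "switching" reduces to a table lookup at the source node; no signalling to the router itself is ever required.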
To connect to the network, each Erlang Engine 50 includes an optical transceiver 160, including at least one optical receiver 161 and a transmitter comprising one or more optical sources 162. The scalable computer system 300 requires that the interface 160 be capable of receiving and transmitting optical signals of different wavelengths. The transmission of different wavelengths is achieved by including in the transmitter a multi-wavelength light source, comprising a series of lasers, each with a unique wavelength, and a shared optical modulator. In an alternative embodiment, a multi-wavelength source may comprise an integrated semiconductor laser array, i.e., a series of tuned single-wavelength lasers with distinct wavelengths and a common modulator. The multiple sources and shared modulator allow each node to multicast or broadcast the same data simultaneously to many nodes. The ability to send the same information to a number of different destinations simultaneously is a highly desirable property of a multiprocessor interconnect. Each source can be independently turned "on" and "off", but all sources share a single modulator, so that if multiple sources are active, they are all modulated with the same data signal. In an alternative embodiment, the individual lasers in the array are produced using gratings written into Erbium-doped fibre to produce fibre distributed feedback (DFB) lasers. These sources produce spectrally pure and stable output wavelengths without the active temperature control which is required for semiconductor WDM laser sources. However, the individual lasers in these sources are not independently addressable. This means that any data to be sent is transmitted simultaneously on all wavelengths, requiring additional intelligence at the receivers to enable them to identify incoming data addressed to the local node. However, for systems including up to 256 nodes, only eight bits of addressing information are required.
In addition to transmitting signals with different wavelengths, the transceiver must also be able to receive and decode multiple wavelengths. Therefore, the transceiver includes a receiver 161 with a wavelength demultiplexer 500 and an array of photodetectors 501, as shown in Figure 5. A single tunable receiver is impractical because tunable optical filters are too slow, and it is in any case impossible to arrange for the destination node to tune to the correct wavelength to receive a transmission without employing a common signalling channel. This reintroduces the original problems of scalability, as the signalling channel is a shared resource which can become congested, leading to increases in latency and reduction of throughput. The receiver 161 may be fabricated as an array of photodetectors integrated with an arrayed-waveguide-grating-based demultiplexer on a single semiconductor photonic integrated circuit. Alternatively, if wide-band photodetectors are used, the demultiplexing function can be replaced by a simple splitting function.
The transceiver 160 is a bus interface card, in one embodiment a Peripheral Component Interconnect (PCI) card, providing a transmission speed per wavelength matched to the peak bus speed (i.e., in the order of two gigabits per second for a 66-MHz PCI bus using 32-bit data transfers). The PCI bus interface card is built from standard integrated circuits and uses logic devices (e.g., Xilinx Field Programmable Gate Arrays) to implement the interface-specific logic functions. The multiplexing and demultiplexing of wavelengths uses telecommunications devices developed for SONET/SDH systems. The card includes discrete single-wavelength semiconductor lasers and discrete photodetectors, but integrated laser and photodiode arrays may alternatively be used. The passive optical router 170 is preferably an AWGM. Additional WDM devices, such as AWGMs, are preferably used as demultiplexers 500.
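The quoted per-wavelength rate can be verified with simple arithmetic, using only the figures stated above:

```python
# Back-of-envelope check of the per-wavelength line rate, which the text
# above matches to the peak speed of a 66-MHz, 32-bit PCI bus.

bus_clock_hz = 66e6    # 66 MHz PCI clock
bus_width_bits = 32    # 32-bit data transfers

peak_bits_per_second = bus_clock_hz * bus_width_bits
print(peak_bits_per_second / 1e9)  # → 2.112, i.e. "in the order of two gigabits per second"
```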
Referring now to Figure 6, each node is interfaced to the optical fibre 165 of the WDM network by an interface card 601 including the optical transceiver 160 (shown as WDM interface 605). Data is transferred to and from the node's memory 602 (local to Erlang Engine 50) and the interface card 601 via the node's PCI bus 603, as shown in Figure 6. The transmission rate of each wavelength of the WDM network is matched to the speed of the PCI bus 603, so that data is transferred directly from the bus to the network. The buffering required on the transmit side is thus negligible, and the wavelength router 170 is neither a bottleneck in the system, nor unnecessarily over-dimensioned relative to the PCI bus performance. On the receiving side, dedicated buffering 604 is required, because upon receipt of a message, the card 601 may have to wait to gain access to the system bus 603. The net effect is thus to provide a form of "remote direct memory access (DMA)" from a source node to a destination node. A bit mask is generated by the node 50 and forwarded to the interface card 601, along with the data to be transmitted, in order to identify which one or more of the (laser) light sources 162 are to be modulated with the same outbound data. The bit mask effectively turns off the light sources which are not needed for transmission of the data. This arrangement enables multicasting or broadcasting of data over the WDM network, but does not allow different datagrams to be transmitted simultaneously from the same interface card 601.
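The bit-mask mechanism can be sketched as follows; the function name and one-bit-per-laser layout are illustrative assumptions, not the card's actual register format:

```python
# Illustrative sketch (assumed layout): build the bit mask that selects
# which laser sources the interface card modulates with the outbound
# data, enabling unicast, multicast or broadcast of a single datagram.

def source_mask(dest_wavelengths, num_sources: int) -> int:
    """One bit per laser; a set bit marks a wavelength to be modulated."""
    mask = 0
    for w in dest_wavelengths:
        if not 0 <= w < num_sources:
            raise ValueError(f"no laser for wavelength index {w}")
        mask |= 1 << w
    return mask

unicast = source_mask([5], 32)           # one destination: only laser 5 on
multicast = source_mask([1, 4, 9], 32)   # same data to three destinations
broadcast = source_mask(range(32), 32)   # all 32 lasers modulated together
```

Lasers whose bits are clear stay dark; since all active lasers share one modulator, every set bit carries the same datagram, which is exactly why distinct datagrams cannot leave the card simultaneously.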
Due to the bandwidth limitations of each node's system bus 603, only one incoming message can be received at any given time. Logic in the receiver 161, as part of WDM interface 605, identifies incoming messages and initiates transfer to the node's memory 602 via the node's PCI bus 603. Incoming messages are identified by the presence or absence of an optical carrier signal, or, in the embodiment wherein the lasers in the array are always on, by local address decoding, known to the skilled addressee. In a further embodiment, with individually addressable lasers, the receiver 161 does not include a demultiplexer 500, and all wavelengths are simultaneously received. This allows collisions to occur, but may nevertheless result in an acceptably low error rate (e.g., ~1 in 10^10), depending upon conditions.
The WDM network has the following features:
(i) A source node may transmit a message to any destination node at any time. While there may be contention amongst processors or peripherals for access to the interconnect via the local bus, there is no contention for access to the transmitters, or within the passive optical interconnect itself.
(ii) Messages may be unicast, multicast or broadcast. This facility is inherent in the use of multiwavelength sources with a shared modulator.
(iii) At the receiver, contention may occur, as messages from different sources may arrive on different wavelengths at the same time. While all such messages may be detected, only one can be received and transferred to local memory via the local bus. All other messages will be dropped, and no indication will be returned that this has occurred. Thus processes using the interconnect must expect and handle lost messages.
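Feature (iii) means the interconnect itself is lossy, so the reliability burden falls on application processes. A minimal sketch of the resulting timeout-and-retry discipline, using stand-in transport functions that are not part of the patent's interfaces:

```python
# Sketch of application-level loss handling implied by feature (iii):
# a colliding message is silently dropped at the receiver, so the sender
# resends until it sees an acknowledgement. Transport names are stand-ins.

import random

def unreliable_send(message, drop_probability=0.3):
    """Stand-in for the interconnect: may silently drop a message."""
    if random.random() < drop_probability:
        return None          # dropped at the receiver; no indication given
    return f"ack:{message}"  # delivered; destination acknowledges

def send_with_retry(message, max_attempts=10):
    """Resend until acknowledged, since the interconnect reports no loss."""
    for _ in range(max_attempts):
        ack = unreliable_send(message)
        if ack is not None:
            return ack
    raise TimeoutError("destination unreachable")

random.seed(1)
print(send_with_retry("datagram-42"))  # first attempt dropped, retry succeeds
```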
The optical transceiver 160 can transmit and receive up to 32 distinct wavelengths; however, the number of wavelengths available in WDM systems is rapidly increasing. For example, the WaveStar 800G dense WDM system from Lucent Technologies uses 320 distinct wavelengths. A system size of 256 nodes (Erlang Engines 50) is considered to be practical. Since the system gives almost linear scaling on separable problems, 256 nodes can provide a performance speed-up of 20,000 over the basic system.
The computer system 300 of Figure 3 is a fully connected system in the sense that each Erlang Engine 50 has its own dedicated optical fibre link 165 to the central optical switch 170. However, a larger, composite, massively scalable computer system may be constructed from a large number of such systems 300 in a "small world" configuration, as described in International patent application no. PCT/AU00/00796, the disclosure of which is hereby incorporated herein by reference. In such a system, one or more of the scalable computer systems 300 described above, each comprising up to 256 multiprocessor nodes (Erlang Engines 50), is considered to constitute a single "neighbourhood" of the overall system. Within the scope of the present invention, it is contemplated that in the order of 500 neighbourhoods, each having up to 256 fully connected Erlang Engines 50 with 8 processors per Erlang Engine 50, could be linked together via the optical router 170 of each scalable system 300 in a small world network to achieve a total system of over one million processors. The small world network architecture provides effective connectivity between the nodes with only a relatively small number of cross-links between the neighbourhoods, say in the order of fifty, or between 10% and 20% of the number of available wavelength channels. For example, the total number of individual Erlang processes running on such a massively scalable system may be of the order of 256,000,000, assuming 2000 active processes per Erlang Engine node 50. Such a system could easily support 25.6 million lines of code and associated data, which is of the order of magnitude envisaged for large software systems projects. Figure 9 illustrates the scalable data processing system 300 in a simple form, with the passive wavelength router 170 interconnecting nodes 50 by communicating with an interface card 601 associated with each node 50.
Although Figure 9 shows only three nodes 50 connected through the optical router 170, any number of nodes can be connected in this way, depending on the routing capacity of the router 170 and the available transmission bandwidth.
Figure 10 is a simplified block diagram of a massively scalable data processing system comprising multiple interconnected scalable data processing systems 300. The scalable data processing systems 300 are interconnected through optical routers 170 via optical fibre, and each includes a number of processing nodes 50. The network connections within and between the scalable data processing systems 300 in the massively scalable data processing system conform to those of a small world topology.
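The benefit of a small world topology can be illustrated with a toy model: neighbourhoods arranged on a ring lattice, plus a small number of random cross-links. The construction below is a generic small-world sketch under those assumptions, not the patent's specific interconnection plan:

```python
# Toy small-world model: 500 "neighbourhoods" on a ring, plus random
# cross-links. A few dozen shortcuts sharply reduce the mean hop count,
# illustrating why relatively few cross-links suffice. (Ring base lattice
# and random rewiring are modelling assumptions.)

import random
from collections import deque

def mean_hops(n, shortcuts, seed=0):
    rng = random.Random(seed)
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # ring lattice
    for _ in range(shortcuts):                               # random cross-links
        a, b = rng.sample(range(n), 2)
        adj[a].add(b)
        adj[b].add(a)
    # breadth-first search from node 0 gives hop counts to every node
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (n - 1)

ring_only = mean_hops(500, 0)     # pure ring: mean distance ~125 hops
small_world = mean_hops(500, 50)  # ~50 cross-links: far shorter paths
assert small_world < ring_only
```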
The scalable computer system 300 and the massively scalable computer system, both described above, have a number of important features for applications. For example, large computer systems have to be maintained over significant periods of time. In the case of the scalable computer system 300 described herein, new nodes may be added, old ones decommissioned, and failures may occur, to consider only the hardware. Similarly, software evolves, and needs to be available on an ad hoc basis when peaks of application load appear. This continuous instantiation process implies that the architecture must provide 1:1 connectivity between processes, or as close to 1:1 as possible. This means that it should not matter whether or not two connecting processes are on the same physical node, or on different ones. In the network architecture described above, this perfect connectivity applies up to the scalable computer system 300; with the massively scalable computer system, the mean connectivity becomes 1:1.6. The scalable and massively scalable systems have a multicast capability, from individual node-to-node messaging through to system-wide broadcasts.
To demonstrate how the scalable computer system is used in practice, consider a "Yellow Pages" system based on the above-described massively scalable processing system. The system includes a tightly coupled "Yellow Pages" neighbourhood with a number of nodes containing identical copies of a look-up application. This replication allows parallel handling of queries for performance, reliability and robustness. Furthermore, the software design does not require any index or storage indicating which nodes are holding which process, because a broadcast query can locate qualified nodes (those with the appropriate look-up software) extremely rapidly, with the null responses of unqualified nodes simply ignored.
A similar feature exists for updating software; given a broadcast, followed by a copy, this again is a very rapid process. This mode of software distribution and replication implies that node updates or failure/recovery have no impact on the overall system.
It is envisaged that a massively scalable computer system in accordance with the present invention has widespread application, including but not limited to the following:
E-Commerce enablers
(i) Banking and Credit
The requirement for many applications in this category is for rapid and uniform access performance for high rates of independent transactions. The MicroBank application described below is a typical application in this category, aggregating small individual transactions efficiently for a banking environment.
(ii) Electronic Retailing

The same characteristics occur in the e-retail sector, with many small individual transactions that need to be processed and consolidated.

Internet Infrastructure

(iii) Caching

The very high and uniform access speed to large data sets that characterises the scalable computer system makes it ideal for caching applications within the Internet environment. The scalable computer system may be used as an Internet proxy server in order to accommodate an escalating number of users. Proxy server caches are a mechanism for reducing the number of bytes transferred to a site from a web server. Essentially, a proxy cache examines each request for a web page and, where a stored copy of the page is held, answers the request locally instead of forwarding it to the remote web server. Periodically, the cache must be cleared of out-of-date pages to make space for new or more popular pages. Currently, caching proxy servers use magnetic disk storage to store requests and memory to store descriptions of the pages available locally.

An Ultra Cache is a scalable computer system which stores all cached pages in solid-state memory, providing rapid access to cached pages. Furthermore, the high separability of the caching problem makes the scalable computer system approach to scaling ideal. The task of retrieving pages and clearing out-of-date pages is easily distributed and relatively independent. This means that adding a new node to the Ultra Cache gives almost the full benefit of the node's processing capacity and memory to the task of caching. In conventional systems, adding a processor brings less benefit due to system overheads.

(iv) Authentication

Authentication of individual users, to enable granting of services, is crucial for providers to ensure security and network availability. The scalability and uniform access speeds of the system make it ideal for authentication processes.

(v) Charging

Any Internet Service Provider (ISP) requires an effective and efficient way of aggregating charging for user transactions into consolidated billing. This is a similar activity to that of the MicroBank.

(vi) Routing

The architecture of the system, a set of highly intelligent nodes linked through a very high speed non-blocking interconnect, permits new possibilities for routing packets and managing network services, including limiting bandwidth and differentially charged services.

Traditional Telecommunications Vendors

(vii) Authentication

Under this general heading, security and access have a different orientation to that of the ISPs, but the basic requirements still apply.

(viii) Charging and Billing

A key characteristic of the system architecture is the use of processors, within the multi-processor nodes, for continuous system monitoring. This in-built capability permits financial information for billing purposes to be acquired directly at the individual transaction level, without any significant processing overhead.

Government/Semi-Government

(ix) Taxation

A key debate is currently taking place over tax evasion due to Internet commerce. The ability to handle high volumes of transactions with large data holdings means that the system architecture offers the potential for monitoring, and potentially taxing, Internet commerce.

(x) Medical

A key issue in the medical and social security area is that of blending rapid access with security and privacy considerations. In addition to the scalability of the system that permits the required large-scale data sets, it is a direct matter to design these data sets so that their custodians can control all access.
The Microbank (or μ-Bank) is an application implemented on the scalable computer system allowing financial institutions to handle extremely large numbers of transactions, particularly relating to very small amounts. The μ-Bank performs three roles: it transfers funds to the merchant on presentation of a valid request for payment, acts as the Purchaser's agent when a merchant requests payment, and aggregates charges and collects funds from existing banks or credit providers. Anonymity for the Purchaser is provided, as the Purchaser's identity is known only to the μ-Bank and is never provided, even by oblique reference, to the Merchant. The Merchant is absolved of retaining records about its clients as the μ-Bank either approves or disapproves a transaction immediately. The μ-Bank transfers funds when a transaction is approved.
The overall architecture seen by a Purchaser is shown in Figure 7. The purchase transaction in Figure 8 shows an example of how the μ-Bank operates from the user perspective. A purchaser uses a web browser 801 executing on their computer to access a merchant's web server 803 via the Internet (step 701). The purchaser chooses an item on the merchant's web site, and indicates by pressing a button that they wish to purchase the item using the μ-bank (step 702). The merchant's web server 803 generates a unique session identifier and instructs the purchaser's web browser 801 to create a new browser window 802, directed to the μ-bank server 804 and provided with the unique session identifier (step 703). The new browser window 802 sends a request to the μ-bank server 804 requesting execution of a Java μ-bank agent and providing the unique session identifier (step 704). The μ-bank server 804 then sends the agent to the purchaser's computer at step 705. The purchaser authenticates themselves to the agent using a username/password pair and provides an upper limit for the financial transaction to the agent, which sends it to the μ-Bank server 804 at step 706. Returning to the original browser window, the purchaser now finalises the purchase at step 710. The merchant's web server 803 sends a request for payment to the μ-bank server 804, using the unique session identifier to identify the transaction (step 707). The μ-bank server 804 checks that the transaction amount is within the limit specified by the purchaser at step 706, and approves the transaction at step 708. The transaction is completed and the merchant sends the item to the purchaser (step 709).
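The session-identifier flow of Figure 8 can be sketched in a few lines. The class and method names below are illustrative stand-ins, not the patent's μBPP/μBMP protocols: the merchant generates the session identifier, the purchaser registers a spending limit against it, and the μ-Bank approves a payment request only if the amount is within that limit.

```python
# Simplified sketch (illustrative names) of the Figure 8 flow: the session
# id links the merchant's payment request to the purchaser's approved
# limit without ever revealing the purchaser's identity to the merchant.

import uuid

class MicroBank:
    def __init__(self):
        self.sessions = {}  # session id -> purchaser-approved spending limit

    def register_limit(self, session_id, limit):    # steps 704-706
        self.sessions[session_id] = limit

    def request_payment(self, session_id, amount):  # steps 707-708
        limit = self.sessions.get(session_id)
        return limit is not None and amount <= limit

# The merchant creates the identifier (step 703); it carries no purchaser
# identity, preserving the purchaser's anonymity.
session = str(uuid.uuid4())
bank = MicroBank()
bank.register_limit(session, limit=5.00)        # purchaser authorises up to $5
assert bank.request_payment(session, 1.50)      # within the limit: approved
assert not bank.request_payment(session, 9.99)  # over the limit: refused
```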
A significant feature of the μ-bank is that the Merchant generates the identifier which is used to identify the Purchaser. Furthermore, although the Merchant effectively initiates the Purchaser's session with the μ-Bank, the Merchant does not mediate that session, as the Merchant is unable to determine the identity of the Purchaser. All of the Purchaser's communication with the μ-Bank is direct, so although the Merchant can make suggestions about what the default limits might be, the purchaser has the ability to accept or reject those suggestions without the Merchant being aware of the Purchaser's choices. The μ-Bank also has the opportunity to ask the Purchaser if they wish to proceed with a transaction if it falls outside the parameters provided by the user.
The initiation of the session with the μ-Bank can be automatic, with potential but manageable problems with "man-in-the-middle" attacks by disreputable merchants, or manual, where the Purchaser either starts a local version of the agent or makes a new window and points their browser at the μ-Bank. "Man-in-the-middle" attacks can be managed through the use of "Security Certificates" for Java Applets. The current μ-Bank design is based on a message passing core implementing the banking functions written in Erlang. The implementation language for the μ-Bank agent is Java. The μ-Bank agent communicates with the μ-Bank server 804 using a μ-Bank Purchaser Protocol (μBPP). The merchant's server 803 requires clients which interface with the μ-Bank Merchant Protocol (μBMP).
Accounts are maintained for each user of the μ-Bank. Users may act as either merchants or purchasers. The μBMP is used to designate whether the user is acting as a merchant or a purchaser. Accounts hold consolidated information for each transaction session, consisting of the start and end dates for the transaction session and the merchant's identity. Accounts can be run in either a credit or a debit mode. In the credit mode, the consolidated account is debited, either periodically or when the credit limit is reached, from some external source of funds such as a bank account or credit card. In debit mode, an amount is transferred to the μ-Bank and this amount is decremented until either a zero balance is achieved and no further transactions are allowed, or until the money is replenished. Limits may be imposed by both the μ-Bank and users on the size of transactions and any credit limits.

Many modifications will be apparent to those skilled in the art without departing from the spirit and scope of the present invention as hereinbefore described with reference to the accompanying drawings and as defined by the claims appended hereto.

Claims

1. A scalable data processing system including a plurality of data processing nodes, said data processing nodes exchanging data by transmitting and receiving optical signals over an optical network, wherein the routing of said optical signals is based on the wavelength of said optical signals.
2. The scalable data processing system of claim 1, wherein said optical network includes a passive, non-blocking optical switch to route said optical signals between said data processing nodes.
3. The scalable data processing system of claim 2, wherein said non-blocking optical switch is an arrayed waveguide grating multiplexer.
4. The scalable data processing system of claim 2, wherein the topology of said network is a star topology, with said non-blocking optical switch at its centre.
5. The scalable data processing system of claim 1, wherein signals from a plurality of said data processing nodes may be transmitted over a single optical fibre by wavelength division multiplexing.
6. The scalable data processing system of claim 1, wherein each data processing node includes a plurality of processors.
7. The scalable data processing system of claim 1, wherein said optical signals are processed at an optical transceiver interface of each node, and the bit rate of said optical signals is matched to the bit rate of a data bus of said node.
8. A massively scalable data processing system comprising a plurality of scalable data processing systems according to claim 1, wherein interconnections between said systems are characterised by a small world topology.
9. The massively scalable data processing system of claim 8, wherein the mean connectivity between said data processing systems is approximately 1.6.
10. The scalable data processing system of claim 1, wherein the system operates as a proxy server for storing and retrieving data in a communications network.
11. The scalable data processing system of claim 1, wherein the system operates as a transaction server for storing and retrieving financial transaction data in a communications network.
12. The scalable data processing system of claim 1, wherein the number of data processing nodes is between 2 and 256.
13. The massively scalable data processing system of claim 8, wherein said optical network of each of said scalable data processing systems includes a passive, non-blocking optical switch to route said optical signals between said data processing nodes and each of said optical networks is in communication with each other via respective said optical switches.
14. The massively scalable data processing system of claim 13, wherein about 80% to 90% of the wavelength routing capacity of each said optical switch is used for routing said optical signals between said data processing nodes within the respective scalable data processing system and about 20% to 10% of the wavelength routing capacity of each said optical switch is used for routing optical signals between optical switches of respective scalable data processing systems within the massively scalable data processing system.
15. An optical transceiver for transmitting and receiving optical signals in a network, said transceiver including: a plurality of light sources for transmitting optical signals to said network, each of said light sources being adapted to transmit light at a respective optical wavelength; and at least one photodetector for receiving said optical signals from said network; wherein said transceiver is adapted to simultaneously transmit optical signals representing a same datagram using two or more of said plurality of light sources.
16. The optical transceiver of claim 15, wherein said at least one photodetector discriminates between received optical signals according to the wavelength of said received signals.
17. The optical transceiver of claim 15 or 16, wherein said plurality of light sources are laser light sources.
18. A wavelength division multiplexing (WDM) network for use in a scalable data processing system, the WDM network including: a plurality of optical transceivers, each transceiver including: a plurality of light sources for transmitting optical signals to said network, each of said light sources being adapted to transmit light at a respective optical wavelength; and at least one photodetector for receiving said optical signals from said network; wherein each said optical transceiver is operably associated with a data processing node of said scalable data processing system; and a passive, non-blocking optical router in communication with said plurality of optical transceivers for routing data encoded in said optical signals between said data processing nodes of said scalable data processing system.
PCT/AU2001/001654 2000-12-29 2001-12-21 A scalable data processing system WO2002054269A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR2365A AUPR236500A0 (en) 2000-12-29 2000-12-29 A scalable data processing system
AUPR2365 2000-12-29

Publications (1)

Publication Number Publication Date
WO2002054269A1 true WO2002054269A1 (en) 2002-07-11

Family

ID=3826399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2001/001654 WO2002054269A1 (en) 2000-12-29 2001-12-21 A scalable data processing system

Country Status (2)

Country Link
AU (1) AUPR236500A0 (en)
WO (1) WO2002054269A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5450224A (en) * 1990-12-07 1995-09-12 Telefonaktiebolaget Lm Ericsson Method and arrangement for optical switching
EP1009121A2 (en) * 1998-12-08 2000-06-14 Nippon Telegraph and Telephone Corporation Optical communication network
EP1017242A1 (en) * 1998-12-28 2000-07-05 Italtel s.p.a. Optical cross-connect architecture for WDM telecommunication systems
US6344912B1 (en) * 2000-05-26 2002-02-05 Versatile Optical Networks Hybrid and scalable opto-electronic processing in a wavelength-division multiplexed system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEBB, BRIAN; LOURI, AHMED: "A class of highly scalable optical crossbar-connected interconnection networks (SOCNs) for parallel computing systems", IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, vol. 11, no. 5, May 2000 (2000-05-01), pages 444 - 458 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004021609A1 (en) 2002-08-29 2004-03-11 Bae Systems Information And Electronic Systems Integration Inc. Data processing network having an optical network interface
EP1543637A1 (en) * 2002-08-29 2005-06-22 BAE SYSTEMS Information and Electronic Systems Integration Inc. Data processing network having an optical network interface
EP1543637A4 (en) * 2002-08-29 2008-01-23 Bae Systems Information Data processing network having an optical network interface
EP2328356A3 (en) * 2002-08-29 2011-08-31 Wisterium Development LLC Optical network interface for data processing network and associated method
US8369710B2 (en) 2002-08-29 2013-02-05 Wisterium Development Llc Data processing network having an optical network interface
WO2013115774A1 (en) * 2012-01-30 2013-08-08 Hewlett-Packard Development Company, L.P. Establishing connectivity of modular nodes in a pre-boot environment
US9779037B2 (en) 2012-01-30 2017-10-03 Hewlett Packard Enterprise Development Lp Establishing connectivity of modular nodes in a pre-boot environment

Also Published As

Publication number Publication date
AUPR236500A0 (en) 2001-01-25

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP