WO1999053718A1 - Control on demand in a data network - Google Patents

Control on demand in a data network

Publication number
WO1999053718A1
Authority
WO
WIPO (PCT)
Prior art keywords
control program
flow
packet
control
packets
Prior art date
Application number
PCT/US1999/007578
Other languages
French (fr)
Inventor
Samrat Bhattacharjee
Gisli Hjalmtysson
Original Assignee
At & T Corp.
Application filed by At & T Corp.

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • H04L12/56 Packet switching systems
    • H04L12/5601 Transfer mode dependent, e.g. ATM
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/20 Traffic policing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/25 Routing or path finding in a switch fabric
    • H04L49/253 Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L49/255 Control mechanisms for ATM switching fabrics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • H04L12/56 Packet switching systems
    • H04L12/5601 Transfer mode dependent, e.g. ATM
    • H04L2012/5638 Services, e.g. multimedia, GOS, QOS
    • H04L2012/5646 Cell characteristics, e.g. loss, delay, jitter, sequence integrity
    • H04L2012/5647 Cell loss
    • H04L2012/5648 Packet discarding, e.g. EPD, PTD
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • H04L12/56 Packet switching systems
    • H04L12/5601 Transfer mode dependent, e.g. ATM
    • H04L2012/5678 Traffic aspects, e.g. arbitration, load balancing, smoothing, buffer management
    • H04L2012/5679 Arbitration or scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/34 Signalling channels for network management communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/34 Signalling channels for network management communication
    • H04L41/344 Out-of-band transfers

Definitions

  • the present invention relates to data networks, and more particularly, to a system for providing control on demand in a data network.
  • Asynchronous Transfer Mode (ATM) has grown in popularity over the last few years. The attractiveness of ATM to large network providers comes neither from its quality of service guarantees nor its high performance, but from its potential of becoming the single transport infrastructure.
  • the major network carriers have multiple backbone networks: A circuit switched backbone for traditional telephony services, a frame-relay or ATM backbone for corporate intra-network services, and a router-based Internet backbone for their Internet Service offerings.
  • As communication service providers fight increased competition, managing different backbones can be cumbersome and expensive.
  • a single network infrastructure replacing these different backbones offers economies of scale.
  • The growth of the Internet over the last few years has made the Internet Protocol (IP) another contender to become the only network infrastructure.
  • IP's forte is the flexibility and robustness inherent in datagram forwarding.
  • IPv6 includes a flow label that will allow mechanisms to separate different flows of packets and accommodate different services without performance penalty.
  • A flow is a sequence or group of packets.
  • different types of services have different objectives, and even different service philosophies.
  • immediate transport availability is of primary importance.
  • delay criteria is the most important.
  • the greatest challenge in applying such control diversity is to accommodate the special control needs of some flows without penalizing all flows.
  • the control complexity of one flow must not affect other flows.
  • One way in which a specific control of one flow interferes with other flows is if it interferes with the performance or correctness of their forwarding.
  • the present invention is directed to a method and apparatus for providing control-on-demand in a data network without significantly degrading the forwarding performance of the system.
  • The system includes one or more flow-specific control programs for controlling the individual flows, a meta-controller for installing and managing the control programs, a forwarding engine for forwarding packets, and an interface for providing communication between the meta-controller, the control programs and the forwarding engine.
  • the interface is described as four interfaces.
  • a meta-control interface is used by the meta-controller to install, manage, and control installed control programs.
  • The message exchange interface supports communication or exchanges between applications (e.g., located at end-systems) and meta-controllers, and supports communication between flow-specific control programs at different nodes.
  • Part of the message exchange interface includes an application interface that allows an application to request installation of a specific control program by a meta-controller.
  • The Subscribe/Publish interface allows a control program to subscribe to (request) a predetermined portion of specific packets from the forwarding engine, and allows the forwarding engine to publish (send) the requested portion of the packet to the control program.
  • the facility access interface allows a control program to control the forwarding of the received packets of a specific flow.
  • an application issues a request to the meta-controller to install a control program for controlling a particular flow of packets.
  • The meta-controller obtains and installs a copy of the requested flow-specific control program.
  • The control program can be obtained from one of several places, including from local memory at the node, or from the network based on a network reference provided by the application.
  • the control program requests (or subscribes) to receive a pre-determined portion of each packet in the flow.
  • the requested portion of each packet is provided from the forwarding engine to the control program.
  • the control program analyzes the requested portion (e.g., based on application level semantics) and can issue a message to the forwarding engine to control the packet.
  • the control program can control the packets without being fully in the data path, and thereby avoid degradation of the packet forwarding process.
  • control is applied to packets only to enhance forwarding performance and is provided only on a best effort basis.
  • FIG. 1 illustrates an example of a data network according to an embodiment of the present invention.
  • Fig. 2 illustrates a node system 20 according to an embodiment of the present invention.
  • Fig. 3 is a flow chart illustrating the operation of a system according to an embodiment of the present invention.
  • Fig. 4 is a diagram illustrating how a flow transitions through four states during its life-span according to an embodiment of the present invention.
  • FIG. 1 illustrates an example of a data network according to an embodiment of the present invention.
  • Data network 45 includes two end-systems 50A and 50B.
  • Applications 52A and 52B are executing on end-system 50A and 50B, respectively.
  • End-systems 50 are coupled together via a plurality of nodes.
  • a system 20 is located at each node of the data network. Therefore, data network 45 includes end-systems 50A and 50B and node systems 20 (20A, 20B, etc.).
  • the applications 52A and 52B executing on end-systems 50A and 50B can include a wide variety of applications, such as telephony, MPEG video, an E-mail program, etc.
  • each end-system 50 may be executing more than one application at a time, and there can be many additional end-systems 50 (not shown) connected to data network 45.
  • Each application or service may have its own control needs or requirements.
  • Fig. 2 illustrates a node system 20 according to an embodiment of the present invention.
  • System 20 includes a controller 22 for controlling operation of system 20, and a forwarding engine 24 for forwarding packets input on line 34 to other nodes via output line 36.
  • Forwarding engine 24 includes a processor, memory and other control logic (not shown).
  • Controller 22 includes one or more control programs 25 for controlling individual flows.
  • Controller 22 also includes a meta-controller 23 for installing and managing specific control programs 25. Meta-controller 23 does not control the specific flows, but controls the control programs 25 that control the flows.
  • meta-controller 23 is a controller of the controllers (control programs 25).
  • An application 52 can issue a request to the meta-controller 23 for a specific control program.
  • The meta-controller 23 determines whether a copy of the requested control program 25 is stored or cached locally. A locally stored copy of the requested control program is used if available. If it is not stored locally, the meta-controller 23 can obtain a copy of the control program 25 (e.g., using a URL to locate the control program). Alternatively, the specified control program 25 can be provided in one or more packets and downloaded. The meta-controller 23 then installs the control program, including locating, authenticating, downloading, verifying, (possibly compiling) and running the code implementing the requested control program 25. Once the control program is installed and associated with the requesting flow, the forwarding engine 24 and the flow-specific control program 25 interact directly.
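The installation steps described here can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `MetaController`, the `fetch` hook, and the dictionaries are assumptions introduced for the example, and authentication, verification and compilation are omitted.

```python
from typing import Callable, Dict


class MetaController:
    """Hypothetical meta-controller that installs control programs on request."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch                  # assumed hook that retrieves code by reference
        self.cache: Dict[str, str] = {}     # control program name -> code
        self.assigned: Dict[str, str] = {}  # flow identifier -> control program name
        self.fetch_count = 0                # how often the network was consulted

    def install(self, flow_id: str, name: str, reference: str) -> str:
        """Use the locally cached copy if present, else fetch it; then bind it to the flow."""
        if name not in self.cache:
            self.cache[name] = self.fetch(reference)
            self.fetch_count += 1
        self.assigned[flow_id] = name
        return self.cache[name]
```

Because the cache is keyed by control program name, a second flow requesting the same program reuses the stored copy, which is how the installation cost could be amortized over multiple flows.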
  • System 20 can be implemented in hardware and/or software.
  • Control programs 25 can be provided as software or programmable or alterable hardware.
  • the packets can include Internet Protocol (IP) packets, Asynchronous Transfer Mode (ATM) application level frames (such as ATM Adaptation Layer frames), ATM cells, or the like.
  • Control signals are input to controller 22 via line 38 and may be provided on a separate signaling connection or provided in-band.
  • Forwarding engine 24 also includes a buffer 26 for storing packets input via line 34.
  • System 20 also includes an interface 28 for coupling controller 22 to forwarding engine 24, among other features.
  • a flow is a sequence or group of (usually related) packets.
  • a flow can be one or more packets that satisfy an equivalence relation.
  • a flow can be identified by a common sequence of bits in each packet.
  • a wide variety of bit sequences can be used for flow identification.
  • In a connection-oriented network, such as an ATM network, a flow can be, for example, the group of ATM cells or application level frames corresponding to the ATM connection.
  • flows can be identified by information in the application level frame (such as part of an RTP frame).
  • a flow can be a group of packets or datagrams that are associated with each other.
  • a flow can include all IP packets from a specific IP address, or directed to a specific IP address, or having a predetermined prefix in the IP destination address (e.g., all packets directed to England).
  • a predetermined IP option can be used in a group of packets to identify a flow (e.g., provided as the name or identifier of a flow).
  • A flow could also include, for example, packets providing data of a particular type, such as MPEG-4 video packets or telephony packets.
  • a flow can be a specific sequence of packets for a particular service (e.g., telephony, MPEG video) to or from a specific address.
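One of the flow definitions above, packets sharing a predetermined prefix in the IP destination address, can be sketched as a simple classifier. The function names and the first-match rule are assumptions made for illustration; the text does not prescribe a matching policy.

```python
import ipaddress
from typing import Dict, List, Optional


def classify(dst: str, prefixes: List[str]) -> Optional[str]:
    """Return the first configured prefix that contains dst, or None."""
    addr = ipaddress.ip_address(dst)
    for prefix in prefixes:
        if addr in ipaddress.ip_network(prefix):
            return prefix
    return None


def group_into_flows(dsts: List[str], prefixes: List[str]) -> Dict[str, List[str]]:
    """Group destination addresses into flows keyed by matching prefix."""
    flows: Dict[str, List[str]] = {}
    for dst in dsts:
        key = classify(dst, prefixes)
        if key is not None:
            flows.setdefault(key, []).append(dst)
    return flows
```

Packets that match no configured prefix belong to no controlled flow in this sketch and would simply be forwarded normally.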
  • control on demand is provided for many different flows.
  • a control program 25 can be requested by an application, installed and applied to a specified flow of packets.
  • the control on demand according to an embodiment of the present invention acts both in the control plane and in the data plane, without adding necessary software in the critical forwarding path.
  • an embodiment of the present invention uses enhancement controls applied asynchronously from a flow-specific control program 25 to the forwarding engine 24.
  • the forwarding engine 24 performs basic and multicast forwarding.
  • the enhancement controls can be applied on a best-effort basis.
  • Control program 25 can control the forwarding of received packets only if control signals from control program 25 are received before the packet is forwarded by forwarding engine 24.
  • Control from control program 25 is applied to packets on a best-effort basis because packets are not held or delayed in the buffer 26 while waiting for control signals to arrive from control program 25. If control signals are not received by forwarding engine 24 prior to the normal forwarding of a packet, then the packet is forwarded normally (without influence from control program 25).
  • In this manner, the control-on-demand of the present invention achieves the flexibility and functionality of the in-data-path approaches while achieving the forwarding efficiency of controls that only act in the control plane.
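The best-effort rule can be illustrated with a small simulation. The `Packet` type, the `resolve` function and the use of plain timestamps are assumptions for this sketch; it shows only that a control action takes effect when it arrives before the packet's normal forwarding time, and is otherwise ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    ref: int           # packet reference
    forward_at: float  # time at which the engine would normally forward it


def resolve(packets: List[Packet],
            controls: Dict[int, Tuple[float, str]]) -> Dict[int, str]:
    """Map each packet reference to the action actually applied.

    controls maps a packet reference to (arrival time of the control signal, action).
    A late or missing control signal leaves the packet to be forwarded normally.
    """
    outcome: Dict[int, str] = {}
    for pkt in packets:
        ctl = controls.get(pkt.ref)
        if ctl is not None and ctl[0] < pkt.forward_at:
            outcome[pkt.ref] = ctl[1]
        else:
            outcome[pkt.ref] = "forward"
    return outcome
```

The engine never waits for the control program in this model, so forwarding latency is unaffected by slow or absent control.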
  • control programs 25 are the control logic supplied by the application (during the flow initialization for example) as an active autonomous object (agent).
  • A general-purpose control processing facility, the meta-controller 23, loads and executes the control logic for the duration of the flow, but may destroy the control program upon the flow's completion.
  • control programs 25 may be requested by name (e.g., telephony, or MPEG video), allowing nodes to cache controllers for popular services thereby amortizing the installation cost over multiple flows.
  • an important aspect of the present invention is a strong separation between the forwarding engine 24 and the controller (control program 25).
  • an embodiment of the present invention includes a mechanism for frame peeking or packet peeking - a mechanism that enables a control program 25 to peek at a portion of a packet (e.g., an IP packet, or an ATM frame or an ATM cell) rather than receive the full packet. Enabling the control program to only peek at parts of each packet as opposed to having to be fully in the data path enhances the efficiency by reducing the bandwidth into the controller and by avoiding removing the packet from the buffers of the forwarding engine. More specifically, an ATM switch may support frame peeking without doing reassembly and segmentation of the (full) frame.
  • packet peeking limits the bandwidth from the kernel router to the flow controller executing in user space.
  • an embodiment of the present invention is optimized for enhancement controls, specifically, control programs 25 that are not essential for forwarding correctness, but enhance performance and/or perceived service quality.
  • enhancement control of the present invention is robust to failures, network heterogeneity and non-cooperation (or legacy). In other words, the present invention does not require all nodes to implement or provide control on demand of the present invention. In contrast, control that is essential for correctness requires all nodes in the path to execute (implement) the control.
  • control can advantageously be applied on a best-effort basis. Since the control is for enhancement only it is not essential that it be applied to every packet. Although some applications may require that the control is consistently applied, applications designed for the Internet have less stringent needs, gracefully adapting to changing conditions in transmission and control, but increase in utility when the control is applied more consistently. According to an embodiment of the present invention, the control program 25 for such applications is not essential and is not assumed, so it is sufficient that the control be applied on best effort basis.
  • interface 28 is divided into four interfaces: a facility access interface 30, a message exchange interface 33, a Subscribe/Publish interface 32 and a meta-control interface 31.
  • Meta-control interface 31 is used by the meta-controller 23 to install, manage and control installed control programs 25.
  • message exchange interface 33 supports communication or exchanges between applications (e.g., located at end-systems) and meta-controllers 23, and supports communication between flow-specific control programs 25 at different nodes.
  • Part of the message exchange interface 33 includes an application interface that allows an application to request the meta-controller 23 to install a specific control program.
  • The Subscribe/Publish interface 32 allows a control program 25 to subscribe to (request) a predetermined portion of specific packets from forwarding engine 24, and allows forwarding engine 24 to publish (send) the requested portion of the packet to the control program 25.
  • Facility access interface 30 allows a control program 25 to control the forwarding of the received packets of a specific flow.
  • Although certain functions of the present invention are explained in terms of these interfaces, those skilled in the art will appreciate that there are many different ways to implement the functionality and features of the present invention.
  • the four interfaces of interface 28 are merely exemplary and provide only one possible embodiment of the present invention. These interfaces are described in greater detail below.
  • Facility access interface 30 provides access to the resources of forwarding engine 24. Interface 30 is used by meta-controller 23 to assign controllers to flows, and provides control programs 25 (which are flow-specific) with access to the resources of forwarding engine 24. The only primitive used by meta-controller 23 (Assign) assigns a control program 25 to a flow. Control program assignment can be changed dynamically. At its discretion, the meta-controller 23 can assign a null control program (e.g., no control program) to a flow.
  • The facility access interface 30 reflects (or provides access to) specific capabilities of the forwarding engine 24, such as scheduling, but hides the details of the specific forwarding technology.
  • For example, if the forwarding engine 24 is an ATM switch, the facility access interface 30 does not give access to VC/VP tables of the switch.
  • The Facility Access Interface 30 includes the following primitives:
  • Assign (flow identifier, control program reference): this allows the meta-controller 23 to assign a control program 25 to control a specific flow of packets.
  • Notifications: Topology-change (set of inputs, set of outputs): this allows a control program 25 to communicate to forwarding engine 24 a topology change for a flow.
  • Reservations: Reserve-Buffer (packets, bytes): can be used by a control program 25 to reserve a specified portion (in bytes or packets) of buffer 26 to store packets of this flow. This specifies a maximum number of bytes and packets. Specifying a maximum number of packets allows the control program 25 to efficiently limit the total number of packets buffered at the node, for example, for active retransmission.
  • Reserve-Bandwidth (bandwidth, set of ports): can be used by a control program to reserve a specified bandwidth on the identified set of ports (an input port and an output port).
  • The control program 25 can interact with the scheduler at the node by providing a list of (start byte, rate) pairs (b_j, r_j). After b_j bytes, packets are forwarded at rate r_j.
  • Set-Attribute (list of (attribute, value) pairs): This primitive supports assignment to named attributes. It is used for scheduling property selection, queue priority and more. For example, it can be used to set or assign the bandwidth of the node system (e.g., 1500 Bytes/sec).
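The list of (start byte, rate) pairs handed to the scheduler can be read as a piecewise-constant rate schedule. The sketch below is one possible interpretation under stated assumptions: the pairs are sorted by start byte, a rate takes effect once at least b_j bytes have been sent, and the function name `rate_after` is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def rate_after(pairs: List[Tuple[int, int]], sent_bytes: int) -> Optional[int]:
    """Return the rate in force after sent_bytes bytes, or None before the first pair.

    pairs is a list of (start byte b_j, rate r_j), assumed sorted by b_j.
    """
    starts = [b for b, _ in pairs]
    # Index of the last pair whose start byte is <= sent_bytes.
    i = bisect.bisect_right(starts, sent_bytes) - 1
    return pairs[i][1] if i >= 0 else None
```

For instance, with the schedule `[(0, 1500), (10000, 500)]` the first 10000 bytes would be forwarded at rate 1500 and later bytes at rate 500.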
  • ii) Forwarding Control: I block (subset of input ports): blocks input on the subset of ports specified.
  • O block: blocks output on the subset of ports specified. Blocking an output port effectively removes that port from the topology. Blocking is removed on a port by excluding that port from the subset of a subsequent block.
  • (I block and O block can manipulate virtual circuit tables in a connection-oriented flow.)
  • Delay: schedules arriving packets for forwarding at least a specified number of time units after arrival, on the subset of ports specified.
  • Actions on individual packets (implicit argument: packet reference):
  • Release-at schedules packet for departure on the subset of output ports specified.
  • the primary use of this primitive is to allow a control program 25 to reschedule a retransmission of a packet (currently in buffer 26) to react to downstream losses, and also allows the control program 25 to explicitly schedule a particular packet for transmission.
  • Discard ( ) discards the packet, and removes it from the flow buffer.
  • a control program 25 can control packets via facility access interface 30.
  • a control program 25 can control flows or individual packets.
  • a control program 25 can block packets on input ports or on output ports, or can schedule arriving packets of a flow for a delayed output.
  • a control program 25 can similarly block or delay the output of individual packets using a packet reference to identify each packet to be blocked or delayed.
  • In connection-oriented hardware (e.g., an ATM switch), these primitives would manipulate virtual circuit (VC) tables.
  • a connectionless router e.g., an IP router
  • control program 25 can control the fate of each packet of the flow entering forwarding engine 24 without being fully in the data path.
  • A control program 25 can discard a packet, reschedule (delay) the transmission of a packet, or can do nothing and allow the packet to be forwarded normally by forwarding engine 24. In contrast to an in-data-path solution, this set of primitives supports flow level connectivity management without being fully in the data path.
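The per-packet and per-port primitives can be sketched against a toy flow buffer. Everything here is an illustrative assumption (the `FlowBuffer` class, integer packet references, float timestamps); it models only Discard, Release-at and the rule that an output block is lifted by a later block whose subset excludes the port.

```python
from typing import Dict, Iterable, List, Set


class FlowBuffer:
    """Hypothetical per-flow buffer exposing a few facility access primitives."""

    def __init__(self) -> None:
        self.departure: Dict[int, float] = {}  # packet reference -> departure time
        self.blocked_outputs: Set[int] = set()

    def arrive(self, ref: int, now: float) -> None:
        """By default a packet is eligible for forwarding as soon as it arrives."""
        self.departure[ref] = now

    def release_at(self, ref: int, when: float) -> bool:
        """Reschedule a buffered packet; returns False if it is no longer buffered."""
        if ref not in self.departure:
            return False
        self.departure[ref] = when
        return True

    def discard(self, ref: int) -> bool:
        """Remove the packet from the flow buffer."""
        return self.departure.pop(ref, None) is not None

    def o_block(self, ports: Iterable[int]) -> None:
        """Block exactly the given output ports; ports left out become unblocked."""
        self.blocked_outputs = set(ports)

    def due(self, now: float, port: int) -> List[int]:
        """Packet references eligible to leave on port at time now."""
        if port in self.blocked_outputs:
            return []
        return sorted(r for r, t in self.departure.items() if t <= now)
```

A control program that never calls any of these methods leaves every packet on its default schedule, which matches the "do nothing" option described above.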
  • There is a window of opportunity, namely from the time the packet arrives until it is forwarded, within which the control program 25 must be run or executed in order to control the packet.
  • control is applied to packets on a best effort basis. Therefore, if a control program 25 misses some packets, then the packets will be forwarded under normal operation of forwarding engine 24 and no harm is done.
  • A control program 25 may impose a fixed packet delay to increase the size of this window, thus relaxing real-time constraints on packet scheduling. This reduces the overhead of context switching by making the control program 25 work on multiple packets each time it is run.
  • the Message Exchange Interface 33 supports communication between application(s) 52 and meta-controllers 23 and between (two or more) flow-specific control programs 25 (service specific signaling).
  • the Message Exchange Interface 33 includes the following primitives:
  • Send-to-meta-control (flow identifier, request-type, request-data)
  • Request-type (one of): activate, inform, control
  • Request-data (request-type dependent): control program type name, code reference, data
  • Message exchange interface 33 has three primitives: a) Meta-control, b) Send-flood, and c) Send-next. Whereas the first is the primitive used by applications 52 to interact with meta-controllers 23, the second and third primitives are used by the application 52 and the control programs 25 to perform application level signaling. All of the primitives of interface 33 take a flow identifier as an argument.
  • a meta-control message sent using the first primitive has two parameters, a request type, that is one of activate, inform, or control and request data which is request type specific.
  • a meta-control message is sent to all meta-controllers 23 of the flow unchanged (i.e., intermediate controllers cannot change it).
  • The primary meta-control message is an activate message used by applications 52 to install and activate a specific control program 25 in all meta-controllers 23 over a communications link.
  • the request data for an activate message contains three mandatory parameters: a flow identifier (which may be provided implicitly), a control program type name, and a control program implementation (reference), optionally followed by arbitrary control program-specific parameters.
  • The control program implementation parameter either contains the code (the software for the control program), or is a globally valid network reference, a URL for example, from where the specified control program may be retrieved by meta-controller 23.
  • The meta-controller 23 is responsible for locating, downloading (e.g., retrieving the program from the URL location), installing (e.g., compiling and verifying the compiled code), and activating the control program 25 requested by the application 52.
  • For an inform message, the data contains a flow identifier, followed by a list of attributes whose values are returned. If the attribute list is empty, a list of all attributes defined for the particular flow is returned. Similarly, the control message contains a flow identifier, followed by a list of attribute value pairs.
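The shape of a meta-control message, and of the activate request in particular, can be sketched as an immutable record. The type names, the tuple layout of the request data and the `example.invalid` reference are assumptions for illustration only; the frozen dataclass stands in for the rule that intermediate controllers cannot change the message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class RequestType(Enum):
    ACTIVATE = "activate"
    INFORM = "inform"
    CONTROL = "control"


@dataclass(frozen=True)
class MetaControlMessage:
    flow_id: int
    request_type: RequestType
    request_data: Tuple[str, ...]


def activate(flow_id: int, type_name: str, implementation: str,
             *params: str) -> MetaControlMessage:
    """Build an activate message; type name and implementation are mandatory."""
    if not type_name or not implementation:
        raise ValueError("activate needs a type name and an implementation")
    return MetaControlMessage(flow_id, RequestType.ACTIVATE,
                              (type_name, implementation) + params)
```

The implementation argument may carry either the code itself or a network reference, for example `activate(7, "mpeg", "http://example.invalid/mpeg")`.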
  • the send-flood and send-next primitives are used for service specific (signaling) messages between installed control programs at different node systems 20, and for (signaling) exchanges between meta-controllers 23 at different node systems 20.
  • The meta-control messages are distinguished from the others by setting the flow identifier to zero. Both primitives take two additional parameters, a set of output ports, and service specific data. The message is output on the ports specified, and is either "flooded" in the case of send-flood (to send the message to all flow-specific meta-controllers 23) or sent to the "next" flow-specific controller(s) only in the case of send-next.
  • With send-next, the message may not be sent to the next node system 20, but rather is sent to the next node system 20 executing the same control program 25.
  • Likewise, send-flood does not send the message to all node systems 20, but rather sends the message to all instances of the same control program 25 (e.g., on different systems 20).
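The difference between the two primitives can be sketched over a downstream path. The representation of the path as an ordered list of (node, installed programs) pairs and both function names are assumptions; real delivery would also depend on the set of output ports.

```python
from typing import List, Set, Tuple

Path = List[Tuple[str, Set[str]]]  # (node name, control programs running there)


def send_next(path: Path, program: str) -> List[str]:
    """Deliver to the first downstream node running the same control program."""
    for node, programs in path:
        if program in programs:
            return [node]
    return []


def send_flood(path: Path, program: str) -> List[str]:
    """Deliver to every downstream node running the same control program."""
    return [node for node, programs in path if program in programs]
```

Nodes that do not run the program are skipped in both cases, which is what lets the scheme work across nodes that do not support control on demand.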
  • the Subscribe/Publish Interface 32 allows (flow-specific) control programs 25 to subscribe to (request) events and information to be published (on request) by forwarding engine 24.
  • Subscribe/Publish interface 32 includes three primitives (or commands) that allow control program 25 to subscribe (or request) to receive a copy of received packets of a flow, and a primitive or command that allows the forwarding engine 24 to publish or provide the requested information to control program 25.
  • the Subscribe primitives include: Subscribe-Stats (flow identifier) - requests a subscription to simple flow statistics, such as number of packets and bytes transmitted since last invocation, or the number of bytes (or packets) currently in the buffer 26 using the subscribe-stats primitive.
  • the control program 25 receives nodal statistics about buffer length and packet loss rate.
  • Subscribe-Peek (flow identifier, offset, length)
  • Subscribe-Peek implements packet peeking or frame peeking according to an embodiment of the present invention, allowing a control program 25 to subscribe to receive (peek at) a portion of requested packets. Subscribe-peek does not cancel subscription to statistics.
  • Offset - is the offset where peeking is to begin within the payload of the packet.
  • Length - is the number of bytes to peek at, with 0 indicating all.
  • the Publish interface includes at least one primitive:
  • Publish (flow identifier, packet reference, requested data) - this is used by the forwarding engine 24 to publish the peek event, including the data subscribed or requested by a control program.
  • a publish message (or published peek event) is issued by forwarding engine 24 to control program 25.
  • The published peek event contains a flow identifier identifying the flow for the packet, a packet reference identifying the packet and a copy of the data from the packet (that was earlier requested or subscribed to by control program 25).
  • the packet reference may be used by control program 25 to manipulate the packet through facility access interface 30, as described in greater detail above.
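Packet peeking and the resulting publish event can be sketched with byte slices. The function names and the tuple form of the event are assumptions, as is the reading that a length of 0 means "from the offset to the end of the payload".

```python
from typing import Tuple


def peek(payload: bytes, offset: int, length: int) -> bytes:
    """Return the subscribed portion of the payload; length 0 means everything from offset."""
    end = len(payload) if length == 0 else offset + length
    return payload[offset:end]


def publish(flow_id: int, packet_ref: int, payload: bytes,
            offset: int, length: int) -> Tuple[int, int, bytes]:
    """Build the peek event: flow identifier, packet reference, requested data."""
    return (flow_id, packet_ref, peek(payload, offset, length))
```

Only the slice is copied to the control program; the packet itself stays in the forwarding engine's buffer and can later be addressed through its packet reference.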
  • a node system 20 can include many executing or running control programs 25. Each control program 25 can control one or more flows. To assist in controlling the flows, each control program 25 can subscribe to particular packet data for a specific flow(s). Therefore, in the case where controller 22 includes multiple control programs 25 and each control program 25 can control one or more flows, a flow identifier is necessary in each publish message.
  • the publish primitive can be implemented in a variety of ways. For example, there may be several executing control programs 25 within system 20, wherein each control program 25 only monitors and controls a single flow. In such a case, identification of the flow can be provided implicitly from forwarding engine 24 (rather than explicitly) because there is only one control program 25 for each flow under control.
  • the meta-control interface 31 is used by the meta-controller 23 for management of the dynamically installed control programs 25.
  • Meta-control interface 31 is used by the meta-controller 23 to install, manage and control installed control programs 25.
  • The meta-control interface 31 includes two primitives implemented by the meta-controller 23 that support migration of control programs 25 to different node systems 20.
  • the meta-control interface includes the following primitives (or methods), where each primitive includes at least a control program as an argument (e.g., provided as a name of the control program, a pointer to the control program, or provided implicitly) :
  • the create primitive has one (untyped) argument for initialization data. This is the control program specific data provided in the activate message to the meta-controller 23.
  • Clone(): clones a control program.
  • the clone is identical to the original control program.
  • the initial state of the clone is the current state of the "parent" at the time of cloning.
  • a clone is executed using the Run primitive.
  • resources e.g., allocated memory or buffer space and CPU or processor time.
  • Unwrap a wrapped controller: constructs a control program 25 from a previously wrapped one (using the wrap() method).
  • Go-Flood migrates the control program 25 to all nodes downstream (that can store and run the control program) on the output ports specified.
  • the primitives for the meta-control interface 31 include operations to create a control program 25, clone an existing control program 25, and destroy a control program 25.
  • the create primitive has one (untyped) argument for initialization data. This is the controller specific data provided in the activate message to the meta-controller.
  • a controller may be wrapped (for shipping or storage), and later (re)created (unwrapped). Whether created by create, clone or unwrapping, the execution of a new control program 25 always starts by executing the Run method (primitive).
  • a control program 25 may migrate, either jump one hop (go-next), or be "flooded" (Go-Flood) to all nodes in the flow downstream of the specified set of output ports.
  • the meta-controller 23 implements this method by wrapping the controller, and sending a message to other meta-controllers 23 (at other nodes) using Send-Next or Send-Flood respectively with the flow identifier to which the controller belongs.
  • the receiving meta-controller 23 unwraps the controller, assigns it to the specified flow, and then Runs the control program.
  • Fig. 3 is a flow chart illustrating the operation of a system according to an embodiment of the present invention. As an example, a control-on-demand router is provided at a node.
  • the control-on-demand router can be implemented using all or part of a node system 20 (including a controller 22, an interface 28 and a forwarding engine 24).
  • the router includes a default control program for controlling the forwarding of packets.
  • the router allows for customized or service- specific control programs to be installed and applied on request for several different services (e.g., telephony, E-mail, MPEG-4 video).
  • the requested control program 25 will be installed and used to provide flow-specific (or service-specific) control forwarding of packets in the flow, instead of using the default control program.
  • the default control program will continue to be used by the router to control forwarding for flows that have not requested a particular control program.
  • each service-specific control program 25 can be requested and applied to many different flows (e.g., all the telephony flows may request use of the telephony control program).
  • an MPEG-4 video application 52 running on an end-system 50 (Fig. 1) has particular control needs that have been implemented in an MPEG-4 control program.
  • at step 60, the MPEG-4 video application 52 issues a request to the router to install and run the MPEG-4 control program 25 to control a specific flow of MPEG-4 packets from the MPEG-4 video application 52.
  • the MPEG-4 application issues the request to all routers or nodes in the flow.
  • step 60 can be performed by the MPEG-4 application sending a meta-control message to meta-controller 23 of the router using the Send-to-meta-control primitive (of message exchange interface 33) of type Activate to request that the meta-controller 23 install and run the MPEG-4 control program.
  • the flow to be controlled is identified in the flow field of the message (or can be provided implicitly).
  • an identifier or a name of the requested control program is provided (e.g., "MPEG-4"), and either the software code of the control program or a globally valid network reference (e.g., a URL) is provided in the Code field.
  • the network reference identifies the location where the MPEG-4 control program can be obtained.
  • the meta-controller 23 may first determine whether or not the requested control program (MPEG-4 control program) is stored locally at the router. If it is not locally stored, the meta-controller 23 obtains the control program 25 from one or more packets in the meta-control message or uses the network reference to retrieve or obtain a copy of the MPEG-4 control program. Regardless of where the control program is obtained from (either locally or retrieved from the network), the meta-controller 23 next installs the MPEG-4 control program, which may include, for example, verifying that the control program is correct, compiling and then running the control program. The meta-controller 23 uses the Assign primitive (in the facility access interface 30) to assign the MPEG-4 control program to the identified flow. After being installed and assigned to the specified flow, the MPEG-4 control program 25 will now control the forwarding of the packets of the specified flow.
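The installation sequence described above (use a local copy if one exists, otherwise take the code from the message or retrieve it by network reference, then verify and assign to the flow) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `MetaController`, `activate`, `fetch` and `flow_table` are hypothetical, and verification and compilation are reduced to a placeholder check.

```python
from typing import Callable, Dict, Optional


class MetaController:
    """Hypothetical sketch of a meta-controller handling an Activate request."""

    def __init__(self, fetch: Callable[[str], str]):
        self.cache: Dict[str, str] = {}       # locally stored control programs, by name
        self.flow_table: Dict[str, str] = {}  # flow identifier -> assigned program name
        self.fetch = fetch                    # assumed helper: retrieves code from a network reference

    def activate(self, flow_id: str, name: str,
                 code: Optional[str] = None,
                 reference: Optional[str] = None) -> str:
        """Install the named program for a flow; return where the code came from."""
        if name in self.cache:
            # 1. Prefer a locally stored copy.
            program, source = self.cache[name], "local"
        elif code is not None:
            # 2. Otherwise use code carried in the meta-control message.
            program, source = code, "message"
        elif reference is not None:
            # 3. Otherwise retrieve it using the network reference (e.g., a URL).
            program, source = self.fetch(reference), "network"
        else:
            raise ValueError("no code or network reference supplied")
        # Placeholder for verifying (and possibly compiling) the program.
        if not program:
            raise ValueError("control program failed verification")
        self.cache[name] = program
        # Stand-in for the Assign primitive: bind the program to the flow.
        self.flow_table[flow_id] = name
        return source
```

In this sketch a second flow requesting the same program is served from the local cache, which mirrors the statement that one service-specific control program can be applied to many flows.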
  • the MPEG-4 control program 25 requests (or subscribes) to receive a pre-determined portion of packets corresponding to this particular flow.
  • the MPEG-4 control program 25 can subscribe to receive (peek-at) a portion of the incoming packets of this flow using the Subscribe-Peek primitive or command, or other technique.
  • the flow identifier argument of the Subscribe-Peek command can be used to identify the flow (the group of packets of interest).
  • the flow identifier argument can identify the flow using a variety of different bit sequences in each packet (e.g., IP packets directed to a specified IP address, packets having a specific IPv6 "flow label").
  • the offset argument identifies the number of bytes or bits offset from the beginning of the packet where the peeking shall begin, and the length argument identifies the number of bytes to be provided to the MPEG-4 control program 25.
  • forwarding engine 24 identifies the received packets which are part of the flow. In other words, forwarding engine 24 identifies received packets that are part of the flow to which the peek-subscription applies. This can be performed by analyzing each packet received by forwarding engine 24. For example, forwarding engine 24 can identify the requested packets by comparing a predetermined sequence of bits (e.g., the IP address for the destination, or the IPv6 flow label) in each packet to the flow identifier. A match indicates that the packet is part of the flow which has been subscribed to or requested by the MPEG-4 control program.
  • the subscribed (requested) portion of each packet of the flow is copied and provided with a packet reference from forwarding engine 24 to the MPEG-4 control program 25.
  • This can be done using the publish primitive, described above, or using another technique.
  • forwarding engine 24 uses the offset and length arguments (provided in the Subscribe-Peek message or primitive from controller 22) to identify the beginning and length of the portion of the packet of interest. This portion of the packet is then copied and placed in a Publish message that also includes the flow identifier (optional) and a packet reference.
  • the flow identifier is the same as that provided by control program 25 to the forwarding engine.
  • the packet reference is assigned by forwarding engine 24 and may indicate, for example, a packet number (e.g., packet number 17).
  • the Publish message is then sent from forwarding engine 24 to the MPEG-4 control program 25. Therefore, it can be seen that, rather than routing the entire packet from the buffer 26 to control program 25, only the selected (subscribed) portion of the received packet and a packet reference (identifying the packet) are provided to control program 25. This minimizes the amount of data passing through controller 22 (in contrast to prior art systems in which the entire packets were routed to the controller). In this manner, control program 25 can control the packets without being in the data path, and thereby avoid degradation of the packet forwarding process.
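The subscribe and publish exchange described above might be sketched as follows. This is an illustrative assumption rather than the implementation of the invention: the class and method names are hypothetical, the offset is taken to be in bytes, and a flow is matched on an already-extracted identifier string instead of by comparing bit sequences in each packet.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PublishMessage:
    flow_id: str      # flow the packet belongs to
    packet_ref: int   # reference assigned by the forwarding engine
    data: bytes       # copy of the subscribed portion only


class ForwardingEngine:
    """Hypothetical forwarding engine that publishes peek events."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Tuple[int, int]] = {}  # flow_id -> (offset, length)
        self.published: List[PublishMessage] = []            # messages sent to control programs
        self.next_ref = 0

    def subscribe_peek(self, flow_id: str, offset: int, length: int) -> None:
        # Record which slice of each packet the control program wants to see.
        self.subscriptions[flow_id] = (offset, length)

    def receive(self, flow_id: str, packet: bytes) -> int:
        # Every packet gets a reference; the packet itself stays in the buffer.
        ref = self.next_ref
        self.next_ref += 1
        subscription = self.subscriptions.get(flow_id)
        if subscription is not None:
            offset, length = subscription
            # Copy only the subscribed bytes, never the whole packet.
            portion = packet[offset:offset + length]
            self.published.append(PublishMessage(flow_id, ref, portion))
        return ref
```

Only the short slice and the packet reference cross over to the control program, which is the property the text relies on to keep the control program out of the data path.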
  • control program 25 analyzes the received portion of the packet to determine how the packet should be controlled (and even if the packet should be controlled at all).
  • the MPEG-4 control program implements a selective discard of packets after network congestion is detected.
  • MPEG-4 video data is hierarchically coded using three types of video frames, I-frames, P-frames and B-frames. Whereas the loss of an I-frame can affect all frames until the next I-frame, losing a small number of P and B frames only degrades quality marginally. Therefore, the MPEG-4 control program 25 selectively discards both P and B frames during network congestion.
  • control program 25 subscribes to a portion of each packet of this MPEG-4 video flow that identifies whether the packet contains I, B, or P frame data.
  • This MPEG-4 discussion is provided merely as an example of how the present invention can be implemented. However, the present invention is not limited to MPEG or video applications, or to the specific mechanisms and techniques described in the example.
  • based on analyzing the requested portion of each packet of the flow (step 70), the MPEG-4 control program 25 preferably applies enhancement or non-essential control on the packets of the MPEG-4 flow.
  • the MPEG-4 control program 25 uses the Discard primitive of the facility access interface 30 to selectively discard the packets containing P and B frame data.
  • the Discard message is provided to the forwarding engine 24 and includes a packet reference identifying the packet(s) to be discarded.
  • the Discard message can request the forwarding engine to discard one or more packets.
  • the control program 25 could use the Delay primitive to delay the forwarding of the P and B packets, or perform some other control function. This control on the packets performed by control program 25 is not essential for forwarding correctness.
  • control exercised by the control program 25 only operates to enhance network performance.
  • if the forwarding engine 24 does not receive control information (e.g., a discard or delay message) from the control program 25 before the packet is normally (by default, with no control) scheduled for forwarding, the packet is forwarded and is not delayed while awaiting control information from control program 25.
  • control program 25 only operates to enhance forwarding performance and relieve network congestion, and will not degrade forwarding performance.
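The best effort behavior described above can be illustrated with a small sketch. It assumes a hypothetical model in which each packet has a default forwarding time and each Discard message has an arrival time at the forwarding engine; the function names and the use of plain numbers for time are assumptions made only for illustration.

```python
from typing import Dict, List


def should_discard(frame_type: str, congested: bool) -> bool:
    """Control program policy: drop P and B frames only while congested."""
    return congested and frame_type in ("P", "B")


def forward_best_effort(schedule: Dict[int, float],
                        discards: Dict[int, float]) -> List[int]:
    """Return the packet references that end up being forwarded.

    schedule maps packet reference -> default forwarding time.
    discards maps packet reference -> arrival time of a Discard message.
    A Discard is honored only if it arrives before the packet is due;
    the engine never waits for the control program.
    """
    forwarded = []
    for ref, due in sorted(schedule.items(), key=lambda item: item[1]):
        arrived = discards.get(ref)
        if arrived is not None and arrived < due:
            continue  # control arrived in time: packet is discarded
        forwarded.append(ref)  # no control, or control too late: forward as usual
    return forwarded
```

A late Discard simply has no effect, so a slow or failed control program can reduce the benefit of control but cannot hold up forwarding.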
  • meta-controller 23 deletes or destroys the MPEG-4 control program 25 when the flow is completed. Meta-controller 23 in the router can destroy the control program using the Delete primitive of the meta-control interface 31.
  • IPv6 provides an extension header (instead of options) to allow a variable length header.
  • a new extension header is provided on at least the first packet in a flow - the courtesy copy (cc) extension header according to an embodiment of the present invention.
  • the semantics of the cc-extension are: a) that the full packet containing this extension is to be copied to meta-controller 23; and, b) that the copying to the meta-controller 23 should not delay the forwarding of the packet (i.e., it should be copied after it has been forwarded, or copied from a separate copy of the packet maintained in buffer 26 independently of the copy that will be forwarded).
  • the cc-extension header should be the first extension header.
  • Fig. 4 is a diagram illustrating how a flow transitions through four states during its life-span according to an embodiment of the present invention.
  • the flow is in the Null state 80 (the flow is unknown).
  • a packet (or datagram) arrives with a currently unknown flow identifier.
  • the flow identifier in effect, operates as a request to use the specified control program to perform flow processing.
  • the flow is identified as the beginning of a new flow at the current node.
  • a new flow state is created, and its state set to the Initialize state 82. If a packet contains a cc-extension header, a copy of the packet is forwarded to the meta-controller 23, after the original packet is routed and forwarded according to the IPv6 routing tables.
  • Subsequent packets of the flow are routed based on its flow identifier, effectively pinning the routes for the labeled flow. If the first packet does not contain a cc-extension no other flow processing is requested and the state transits to the Ignore state 86.
  • for flows requesting control-on-demand, meta-controller 23, running in user space, identifies the control program 25 requested based on the extension header. (In this manner, the first packet containing the cc-extension header operates as a meta-control message requesting a specific control program for the flow.) In particular, the meta-controller 23 consults its cache of control programs 25 to see if the requested control program is locally available. If not, using the code reference provided in the first packet, the meta-controller 23 retrieves and installs the requested control program. On successful installation of the on-demand control program 25, the flow state becomes the Active state 84, indicating to the router that the control program 25 is ready to act on arriving datagrams or packets.
  • the controller reference in the state is updated. If the installation fails, the flow state is set to the Ignore state 86. (In principle controlled by the default policy).
  • the meta-controller 23 may change the state of a control program 25 from Active 84 to Ignore 86 at any time, for example if a control program 25 fails for some reason. While the requested control program 25 is being installed, flow datagrams or packets are simply forwarded. In addition, some general (per flow) statistics are compiled. These include the number of bytes served on the flow.
  • the flow specific control program 25, when ready, gains access to this information.
  • the meta-controller 23 removes the flow state and reclaims resources allocated to the flow.
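The four flow states of Fig. 4 and the transitions described above can be summarized as a small state machine. The sketch below is an assumption-laden illustration: the event names are hypothetical labels chosen here, the enum values simply reuse the reference numerals of the figure, and the `flow_ended` event stands in for whatever condition causes the meta-controller to remove the flow state.

```python
from enum import Enum


class FlowState(Enum):
    NULL = 80        # flow is unknown at this node
    INITIALIZE = 82  # first packet seen, control program not yet ready
    ACTIVE = 84      # control program installed and acting on the flow
    IGNORE = 86      # no flow-specific control; default forwarding only


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (FlowState.NULL, "first_packet"): FlowState.INITIALIZE,
    (FlowState.INITIALIZE, "no_cc_extension"): FlowState.IGNORE,
    (FlowState.INITIALIZE, "install_ok"): FlowState.ACTIVE,
    (FlowState.INITIALIZE, "install_failed"): FlowState.IGNORE,
    (FlowState.ACTIVE, "program_failed"): FlowState.IGNORE,
    (FlowState.ACTIVE, "flow_ended"): FlowState.NULL,
    (FlowState.IGNORE, "flow_ended"): FlowState.NULL,
}


def step(state: FlowState, event: str) -> FlowState:
    """Apply one event; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Packets continue to be forwarded in every state; the state only records whether a flow-specific control program is available to act on them.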
  • an application can request installation of a control program for collecting statistics regarding a flow.
  • the control program is installed to observe a flow and collect statistics regarding the flow, and provide messages to the requesting application describing the flow statistics.
  • the application requests installation of a second control program for controlling the forwarding of the packets using frame peeking (e.g., a selective discard technique or other control technique can be employed).
  • the present invention includes many advantages. Three advantages of the present invention include: 1) The control can be advantageously applied (to control forwarding of packets, etc.), but the control program is not essential to forwarding correctness. 2) Control can be applied to a flow either always or on a best effort basis. According to an embodiment of the present invention, control is advantageously applied to a flow on a best effort basis to avoid degrading forwarding performance of the forwarding engine. And, 3) control can be advantageously applied asynchronously to operate on one or more flows, rather than invoking a control program in response to each packet.
  • the present invention can operate on packets in the data path (the data plane) (e.g., operate on data packets to be forwarded) or can operate on control packets (those packets that include control information). Some packets can include data and/or control information.
  • the present invention can be applied in the control plane (the data plane is involved in packet forwarding, while the control plane is involved with functions other than packet forwarding, such as providing control signals or control packets).
  • control-on-demand of the present invention can be advantageously applied.
  • applications that can combine application semantics, network location and knowledge available inside the network are primary candidates for control-on-demand.
  • Other applications can be candidates as well.
  • Stream thinning: Several applications encode information in the data stream that may be interpreted as multiple separable substreams. These substreams may, for example, correspond to different media (audio, video, white-board), may correspond to layered encoding, or may encode relative importance of the packets of the flow.
  • receivers may subscribe to different substreams, for example, a subset of the media offered or different resolution by selecting a subset of the layered encoding.
  • On the other hand, during congestion it is desirable to selectively discard the less important packets to promote the others.
  • the basic problem of stream thinning is to identify different substreams and filter them to accommodate
  • the stream thinning must be performed at the branch-point inside the network where the different branches subscribe to different substreams. Selective discard requires congestion indication at the congested node to be effective.
  • the substreams are encoded using application level framing, so application semantics are necessary to identify the different substreams. This makes stream thinning ideal for control-on-demand of the present invention.
  • control on demand (application defined) layered encoding can be implemented by encoding the layer in all of the packets in the stream.
  • Application defined policies or control programs instituted at branch points of the multicast tree keep track of the layers subscribed to by each downstream branch.
  • frame peeking e.g., Subscribe-peek primitive
  • Block ( ) primitive the application level header and packet blocking
  • the control program 25 monitors its flow queue or buffer 26, using a high watermark crossing as a congestion indication.
  • control program 25 discards less important packets from its queue to make room for the more important ones. Note that in both cases the stream thinning is viable as a best effort control, although in the case of variegated multicast this requires that the receiver be able to discard unwanted packets.
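The congestion-driven form of stream thinning described above (treat a high watermark crossing as the congestion signal, then drop the least important packets first) could look roughly like this. The function name, the integer priority attached to each packet, and the `target` queue length are all assumptions introduced for the sketch.

```python
from typing import List, Tuple

Packet = Tuple[int, str]  # (priority, payload); a higher priority means more important


def thin_queue(queue: List[Packet], high_watermark: int, target: int) -> List[Packet]:
    """Return the queue after thinning.

    Nothing is dropped unless the queue is longer than high_watermark.
    Once it is, only the `target` most important packets are kept, with
    ties broken in favor of earlier arrivals and arrival order preserved.
    """
    if len(queue) <= high_watermark:
        return list(queue)
    # Rank positions by descending priority, then by arrival order.
    ranked = sorted(range(len(queue)), key=lambda i: (-queue[i][0], i))
    keep = sorted(ranked[:target])  # restore arrival order among survivors
    return [queue[i] for i in keep]
```

Because the receiver can tolerate the missing low-priority packets, the thinning remains an enhancement: if it is skipped, the queue simply overflows in the default way.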
  • the retransmission multicast groups are defined as sub-flows of the overall multicast flow, using receiver loss notification (NACK) to join the corresponding retransmission sub-flow.
  • the control programs keep track of downstream "subscriptions" to retransmissions, blocking (filtering) retransmitted packets from being forwarded on those branches without an interested receiver, just as for variegated multicast above. Since receivers would normally have to cope with (ignore) duplicate packets, the blocking can be provided on a best effort basis.
  • the on-demand control programs cannot observe the original packet transmissions caching a small number of packets inside the network. Assuming that lost packets are requested from the leaves and downstream controllers, when receiving a NACK the controller retransmits the packet if it is still available in the cache. Buffering only a small number of packets is sufficient to obtain significant benefit. Moreover, since lost packets are retransmitted only about one hop round-trip after their initial transmission, a short receiver play-out buffer may be sufficient to hide the loss while meeting the real-time constraints. If cache space on the multicast tree is to be conserved, certain nodes might only cache selected packets.
  • control-on-demand is purely an enhancement, and need only be applied on best effort basis.
  • the approach is robust to failures. For example, a node may temporarily lose its cache of packets or datagrams, or may
  • control program 25 is not in the data path, but instead allocates a small amount of extra buffer space, and limits the (fixed) number of packets in the buffer. In this manner the flow queue or buffer 26 is used as a circular array, incurring minimal cost in the data-path.
  • the packets can be retransmitted using the Schedule-at primitive.
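The fixed-size packet cache used for local retransmission can be sketched as a bounded buffer that evicts the oldest entry. The class below is a hypothetical illustration: the sequence-number key, the method names, and the use of an ordered dictionary in place of a circular array in buffer 26 are assumptions, and the actual retransmission (for example through the Schedule-at primitive) is left to the caller.

```python
from collections import OrderedDict
from typing import Optional


class RetransmitCache:
    """Hypothetical bounded cache of recently forwarded packets."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.packets = OrderedDict()  # sequence number -> packet bytes, oldest first

    def observe(self, seq: int, packet: bytes) -> None:
        # Remember the packet; evict the oldest entries beyond the fixed limit.
        self.packets[seq] = packet
        self.packets.move_to_end(seq)
        while len(self.packets) > self.capacity:
            self.packets.popitem(last=False)

    def on_nack(self, seq: int) -> Optional[bytes]:
        # Best effort: return the packet if still cached, otherwise None,
        # in which case the NACK must be served further upstream.
        return self.packets.get(seq)
```

A miss is not an error in this model; it only means the loss is repaired from a node closer to the sender, consistent with retransmission being applied on a best effort basis.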
  • Traffic shaping may be desirable within the network to reduce peak (and effective) bandwidth and enhance traffic characteristics. This is particularly true at network seams, access, egress or peering points, where agreements or capabilities may require the delivered traffic to obey more stringent characteristics than bandwidth alone.
  • Per flow application specific bandwidth shaping and smoothing is easily implemented in a control-on-demand architecture, according to an embodiment of the present invention.
  • using information on buffer availability at the sender, receiver(s) and intermediate nodes, and the shaping objectives, the controller exploits (local) network observations to optimize the shaping.
  • the controller subscribes to flow statistics, but does not have to peek at any packet.
  • when activated, the control program prepares a schedule as a list of <byte, rate> pairs that it then gives to the scheduler using the set-schedule primitive.
  • the control program can be operated on a best effort basis.
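One possible reading of a schedule expressed as a list of (byte, rate) pairs is that each pair gives the cumulative byte position at which a new sending rate takes effect. The source does not define the pair semantics, so that interpretation, the function name, and the units (bytes and bytes per second) are assumptions. Under them, the time needed to send a flow follows directly:

```python
from typing import List, Tuple


def transmit_time(schedule: List[Tuple[int, float]], total_bytes: int) -> float:
    """Seconds needed to send total_bytes under a piecewise-constant rate schedule.

    schedule is a list of (start_byte, rate) pairs sorted by start_byte,
    where rate (bytes per second) applies from start_byte until the next
    pair's start_byte. The first pair must start at byte 0.
    """
    if not schedule or schedule[0][0] != 0:
        raise ValueError("schedule must start at byte 0")
    elapsed = 0.0
    for i, (start, rate) in enumerate(schedule):
        if rate <= 0:
            raise ValueError("rates must be positive")
        if start >= total_bytes:
            break  # remaining segments lie beyond the end of the flow
        next_start = schedule[i + 1][0] if i + 1 < len(schedule) else total_bytes
        end = min(next_start, total_bytes)
        elapsed += (end - start) / rate
    return elapsed
```

For example, with the schedule `[(0, 100.0), (1000, 50.0)]` the first 1000 bytes are sent at 100 bytes per second and the remainder at 50 bytes per second.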
  • the present invention includes a method and apparatus for providing control-on-demand in a data network using frame peeking that operates on data in the data path without significantly degrading the forwarding performance of the system.
  • the system includes one or more flow-specific control programs 25 for controlling the individual flows, a meta-controller 23 for installing and managing the control programs, a forwarding engine 24 for forwarding packets, and an interface 28 for providing communication between the meta-controller, the control programs 25 and the forwarding engine 24.
  • the interface 28 is described as four interfaces.
  • a meta-control interface 31 is used by the meta-controller 23 to install, manage and control installed control programs 25.
  • the message exchange interface 33 supports communication or exchanges between applications (e.g., located at end-systems) and meta-controllers 23, and supports communication between flow-specific control programs 25 at different nodes. Part of the message exchange interface allows an application to request installation of a specific control program 25 by a meta-controller 23.
  • the subscribe/publish interface 32 allows a control program to subscribe to (request) specific packet information.
  • the facility access interface 30 allows a control program 25 to control the forwarding of the received packets of a specific flow.
  • an application issues a request to the meta-controller 23 to install a flow-specific control program 25 for controlling a particular flow of packets.
  • the meta-controller 23 obtains and installs a copy of the requested flow-specific control program 25.
  • a portion of each flow packet is provided to the control program 25.
  • the control program 25 can issue a message to the forwarding engine 24 to control the packet.
  • the control program 25 can control the packets without being in the data path, and thereby avoid degradation of the packet forwarding process.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A system for providing control-on-demand includes one or more flow-specific control programs for controlling specific flows of packets, a meta-controller for installing and managing the control programs, a forwarding engine for forwarding packets, and an interface for providing communication between the meta-controller, the control programs and the forwarding engine. After installing a requested control program, the control program can subscribe to receive a portion of each packet in the flow. The control program can control forwarding of the packet based on the subscribed portion of the packet to provide flow-specific control on-demand without degrading the forwarding performance of the system.

Description

CONTROL ON DEMAND IN A DATA NETWORK
BACKGROUND OF THE INVENTION
The present invention is related to pending application entitled "Method and Apparatus For Frame Peeking," attorney docket number Hjalmtysson-14, 2685-1 12774, filed on April 13, 1998, and incorporated by reference herein in its entirety.
The present invention relates to data networks, and more particularly, to a system for providing control on demand in a data network. Asynchronous Transfer Mode (ATM) has grown in popularity over the last few years. The attractiveness of ATM to large network providers comes neither from its quality of service guarantees nor its high performance, but from its potential of becoming the single transport infrastructure. Currently, the major network carriers have multiple backbone networks: a circuit switched backbone for traditional telephony services, a frame-relay or ATM backbone for corporate intra-network services, and a router-based Internet backbone for their Internet Service offerings. As communication service providers fight increased competition, managing different backbones can be cumbersome and expensive. A single network infrastructure replacing these different backbones offers economies of scale. The growth of the Internet over the last few years has made the Internet Protocol (IP) another contender to become the only network infrastructure. When compared to ATM, IP's forte is the flexibility and robustness inherent in datagram forwarding.
For a single infrastructure to satisfy all types of communication services, it must not only support their different transport needs, but must also support their different control needs. ATM is already flexible and will support diverse services at various levels of quality. IP is inherently flexible. Moreover, the next version of IP (IPv6) includes a flow label that will allow mechanisms to separate different flows of packets and accommodate different services without performance penalty (a flow is a sequence or group of packets). Still, each network system promotes a particular control paradigm at the expense of others. In terms of service and network control, different types of services have different objectives, and even different service philosophies. For short-lived data flows, immediate transport availability is of primary importance, whereas for delay-sensitive flows like interactive audio/video meetings, the delay criterion is the most important. The greatest challenge in applying such control diversity is to accommodate the special control needs of some flows without penalizing all flows. Just as specific transport requirements of one flow must not interfere with service quality of another, the control complexity of one flow must not affect other flows. One way in which a specific control of one flow interferes with other flows is if it interferes with the performance or correctness of their forwarding.
Current research in active and programmable networks may be classified either as putting essential software in the data path, or as being limited to the control plane. The approaches pursuing active networking by acting in the data path range from using capsules where every packet contains a program that must be downloaded and executed at every node, to adaptation of advanced intelligent networking (AIN) of telephony to packet networks wherein an arriving packet contains a list of menu options to be applied to the packet at every node. However, by inserting essential software in the data-path, these approaches achieve their flexibility at the cost of significant degradation of the forwarding performance of all flows. On the other hand, although efficient in terms of forwarding, restricting the control to act only in the control-plane (e.g., without operating in the data path) severely limits the amount of control that can be exercised and thereby relegates the control to connectivity management. Many of the techniques that operate only in the control plane require the installation of the complete control program to be performed either off-line or during an extensive setup phase. Therefore, a need exists for flexible network control which acts both in the control plane and on data in the data path without adding software in the critical forwarding path.
SUMMARY OF THE INVENTION
The present invention is directed to a method and apparatus for providing control-on-demand in a data network without significantly degrading the forwarding performance of the system. According to an embodiment of the present invention, the system includes one or more flow-specific control programs for controlling the individual flows, a meta-controller for installing and managing the control programs, a forwarding engine for forwarding packets, and an interface for providing communication between the meta-controller, the control programs and the forwarding engine.
According to an embodiment of the present invention, the interface is described as four interfaces. A meta-control interface is used by the meta-controller to install, manage, and control installed control programs. The message exchange interface supports communication or exchanges between applications (e.g., located at end-systems) and meta-controllers, and supports communication between flow-specific control programs at different nodes. Part of the message exchange interface includes an application interface that allows an application to request installation of a specific control program by a meta-controller. The Subscribe/Publish interface allows a control program to subscribe to request a predetermined portion of specific packets from the forwarding engine, and allows the forwarding engine to publish or send the requested portion of the packet to the control program. The facility access interface allows a control program to control the forwarding of the received packets of a specific flow.
According to an embodiment of the present invention, an application issues a request to the meta-controller to install a control program for controlling a particular flow of packets. The meta-controller obtains and installs a copy of the requested flow-specific control program. The control program can be obtained from one of several places, including from local memory at the node, or from the network based on a network reference provided from the application. The control program then requests (or subscribes) to receive a pre-determined portion of each packet in the flow. The requested portion of each packet is provided from the forwarding engine to the control program. The control program then analyzes the requested portion (e.g., based on application level semantics) and can issue a message to the forwarding engine to control the packet. By providing only a portion of each packet to the control program (rather than the complete packet), the control program can control the packets without being fully in the data path, and thereby avoid degradation of the packet forwarding process. According to an embodiment of the present invention, control is applied to packets only to enhance forwarding performance and is provided only on a best effort basis.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates an example of a data network according to an embodiment of the present invention.
Fig. 2 illustrates a node system 20 according to an embodiment of the present invention.
Fig. 3 is a flow chart illustrating the operation of a system according to an embodiment of the present invention.
Fig. 4 is a diagram illustrating how a flow transitions through four states during its life-span according to an embodiment of the present invention.
DETAILED DESCRIPTION
Referring to the drawings in detail, wherein like numerals indicate like elements, Fig. 1 illustrates an example of a data network according to an embodiment of the present invention. Data network 45 includes two end-systems 50A and 50B. Applications 52A and 52B are executing on end-systems 50A and 50B, respectively.
End-systems 50 are coupled together via a plurality of nodes. A system 20 is located at each node of the data network. Therefore, data network 45 includes end-systems 50A and 50B and node systems 20 (20A, 20B, etc.). The applications 52A and 52B executing on end-systems 50A and 50B can include a wide variety of applications, such as telephony, MPEG video, an E-mail program, etc. Moreover, each end-system 50 may be executing more than one application at a time, and there can be many additional end-systems 50 (not shown) connected to data network 45. As a result, there are packets being transmitted over data network 45 that are associated with a variety of services or applications (telephony, MPEG video, E-mail, etc.). Each application or service may have its own control needs or requirements.
Fig. 2 illustrates a node system 20 according to an embodiment of the present invention. System 20 includes a controller 22 for controlling operation of system 20, and a forwarding engine 24 for forwarding packets input on line 34 to other nodes via output line 36. Forwarding engine 24 includes a processor, memory and other control logic (not shown). Controller 22 includes one or more control programs 25 for controlling individual flows. Controller 22 also includes a meta-controller 23 for installing and managing specific control programs 25. Meta-controller 23 does not control the specific flows, but controls the control programs 25 that control the flows. Thus, meta-controller 23 is a controller of the controllers (control programs 25). An application 52 can issue a request to the meta-controller 23 for a specific control program. The meta-controller 23 determines whether a copy of the requested control program 25 is stored or cached locally. A locally stored copy of the requested control program is used if available. If it is not stored locally, the meta-controller 23 can obtain a copy of the control program 25 (e.g., using a URL to locate the control program). Alternatively, the specified control program 25 can be provided in one or more packets and downloaded. The meta-controller 23 then installs the control program, including locating, authenticating, downloading, verifying, (possibly compiling) and running the code implementing the requested control program 25. Once the control program 25 is installed and associated with the requesting flow, the forwarding engine 24 and the flow-specific control program 25 interact directly.
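As an illustration only, the cache-then-fetch behavior of meta-controller 23 described above might be sketched in Python as follows. The class name MetaController, the fetch hook, the dictionary-based cache and the example URL are assumptions introduced for this sketch; authentication, verification and compilation of the code are omitted.

```python
from typing import Callable, Dict, Optional


class MetaController:
    """Hypothetical sketch of the install step: use a cached copy if one
    exists, otherwise take code supplied in the request or fetch it."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                     # assumed retrieval hook (e.g., by URL)
        self._cache: Dict[str, str] = {}        # control program name -> code
        self.assignments: Dict[int, str] = {}   # flow identifier -> program name

    def activate(self, flow_id: int, name: str,
                 code: Optional[str] = None,
                 reference: Optional[str] = None) -> str:
        program = self._cache.get(name)
        if program is None:
            if code is not None:
                program = code                  # code carried in the request itself
            elif reference is not None:
                program = self._fetch(reference)  # retrieved via network reference
            else:
                raise LookupError(f"no copy of control program {name!r} available")
            self._cache[name] = program         # cache for later flows
        self.assignments[flow_id] = name        # corresponds to the Assign step
        return program
```

In this sketch a second request for the same named control program is served from the cache, which is one way to amortize the installation cost over multiple flows.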
System 20 can be implemented in hardware and/or software. Control programs 25 can be provided as software or programmable or alterable hardware. The packets can include Internet Protocol (IP) packets, Asynchronous Transfer Mode (ATM) application level frames (such as ATM Adaptation Layer frames), ATM cells, or the like. Control signals are input to controller 22 via line 38 and may be provided on a separate signaling connection or provided in-band. Forwarding engine 24 also includes a buffer 26 for storing packets input via line 34. System 20 also includes an interface 28 for coupling controller 22 to forwarding engine 24, among other features.
According to an embodiment of the present invention, control is provided on a per-flow basis (a flow oriented approach). A flow is a sequence or group of (usually related) packets. According to an embodiment of the present invention, there should be an identification mechanism to group packets into flows. A flow can be one or more packets that satisfy an equivalence relation. Typically, a flow can be identified by a common sequence of bits in each packet. A wide variety of bit sequences can be used for flow identification. For example, in a connection oriented network, such as an ATM network, a flow can be the group of ATM cells or application level frames corresponding to the ATM connection. In ATM, flows can be identified by information in the application level frame (such as part of an RTP frame). In a connectionless network, such as IP, a flow can be a group of packets or datagrams that are associated with each other. For example, a flow can include all IP packets from a specific IP address, or directed to a specific IP address, or having a predetermined prefix in the IP destination address (e.g., all packets directed to England). IP version 6 (IPv6) even provides a "flow label" for identifying flows of packets or IP datagrams. In IP, a predetermined IP option can be used in a group of packets to identify a flow (e.g., provided as the name or identifier of a flow). A flow could also include, for example, packets providing data of a particular type, such as MPEG-4 video packets or telephony packets. In such a case, the type of data (e.g., MPEG-4 video data) carried in the packet payload may be identified in the packet header or other location in the packet. As a further illustrative example, a flow can be a specific sequence of packets for a particular service (e.g., telephony, MPEG video) to or from a specific address. There are several advantages to a flow oriented approach. One is to amortize the "investment" of installing the on-demand control program 25. Another is that many important applications only apply at flow or connectivity level, for example, group management or floor control in a multicast. Lastly, some important data-path applications, for example smoothing or active retransmission, depend on correlating multiple packets of a flow.
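By way of a hedged example, grouping IP packets into flows by a predetermined prefix of the destination address could look like the following Python sketch. The function name classify, the flow names and the address prefixes (taken from documentation address ranges) are assumptions made for illustration; other equivalence relations, such as a source address or an IPv6 flow label, would work in the same way.

```python
import ipaddress
from typing import Dict, Optional


def classify(dst: str, flows: Dict[str, str]) -> Optional[str]:
    """Return the name of the first flow whose destination prefix
    contains the packet's destination address, or None if no flow matches."""
    addr = ipaddress.ip_address(dst)
    for name, prefix in flows.items():
        if addr in ipaddress.ip_network(prefix):
            return name
    return None


# Hypothetical flow table: flow name -> destination prefix.
FLOWS = {"video": "192.0.2.0/24", "voice": "198.51.100.0/24"}
```

Packets for which classify returns None belong to no identified flow in this sketch and would simply be forwarded normally.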
According to an embodiment of the present invention, control on demand is provided for many different flows. A control program 25 can be requested by an application, installed and applied to a specified flow of packets. In contrast to previous work on active and programmable networks, the control on demand according to an embodiment of the present invention acts both in the control plane and in the data plane, without adding necessary software in the critical forwarding path. Instead of applying essential programs on-line to every packet, an embodiment of the present invention uses enhancement controls applied asynchronously from a flow-specific control program 25 to the forwarding engine 24. The forwarding engine 24 performs basic and multicast forwarding. To further reduce potential performance impact, the enhancement controls can be applied on a best-effort basis. By best-effort basis, it is meant that the control program 25 can control the forwarding of received packets only if control signals from control program 25 are received before the packet is forwarded by forwarding engine 24. Control from control program 25 is applied to packets on a best-effort basis because packets are not held or delayed in the buffer 26 while waiting for control signals to arrive from control program 25. If control signals are not received by forwarding engine 24 prior to the normal forwarding of a packet, then the packet is forwarded normally (without influence from control program 25). In this manner, the control-on-demand of the present invention achieves the flexibility and functionality of the in-data-path approaches while achieving the forwarding efficiency of controls that only act in the control-plane.
Rather than the architecture specifying a single control program, the present invention is designed to accommodate customized control programs for each flow. Conceptually the control programs are transient per flow entities. According to an embodiment of the present invention, the control programs 25 are the control logic supplied by the application (during the flow initialization for example) as an active autonomous object (agent). A general purpose control processing facility, a meta-controller 23, loads and executes the control logic for the duration of the flow, but may destroy the control program upon the flow's completion. More practically, control programs 25 may be requested by name (e.g., telephony, or MPEG video), allowing nodes to cache controllers for popular services thereby amortizing the installation cost over multiple flows. In general, the architectural principles discussed for the present invention apply to ATM networks, Internet (IP) networks and others as well.
An important aspect of the present invention is a strong separation between the forwarding engine 24 and the controller (control program 25). To achieve this separation, an embodiment of the present invention includes a mechanism for frame peeking or packet peeking - a mechanism that enables a control program 25 to peek at a portion of a packet (e.g., an IP packet, or an ATM frame or an ATM cell) rather than receive the full packet. Enabling the control program to only peek at parts of each packet as opposed to having to be fully in the data path enhances the efficiency by reducing the bandwidth into the controller and by avoiding removing the packet from the buffers of the forwarding engine. More specifically, an ATM switch may support frame peeking without doing reassembly and segmentation of the (full) frame. In a software IP router, packet peeking limits the bandwidth from the kernel router to the flow controller executing in user space.
Rather than catering to or providing controls that are essential, an embodiment of the present invention is optimized for enhancement controls, specifically, control programs 25 that are not essential for forwarding correctness, but enhance performance and/or perceived service quality. Similar to soft state, enhancement control of the present invention is robust to failures, network heterogeneity and non-cooperation (or legacy). In other words, the present invention does not require all nodes to implement or provide control on demand of the present invention. In contrast, control that is essential for correctness requires all nodes in the path to execute (implement) the control.
To further reduce the on-line (real-time) demands, the control can advantageously be applied on a best-effort basis. Since the control is for enhancement only it is not essential that it be applied to every packet. Although some applications may require that the control is consistently applied, applications designed for the Internet have less stringent needs, gracefully adapting to changing conditions in transmission and control, but increase in utility when the control is applied more consistently. According to an embodiment of the present invention, the control program 25 for such applications is not essential and is not assumed, so it is sufficient that the control be applied on best effort basis.
The specific details of interface 28 (Fig. 2) will now be described, followed by some examples to explain several features of the present invention.
The Interfaces
Referring to Fig. 2, interface 28 is divided into four interfaces: a facility access interface 30, a message exchange interface 33, a Subscribe/Publish interface 32 and a meta-control interface 31. Meta-control interface 31 is used by the meta-controller 23 to install, manage and control installed control programs 25. The message exchange interface 33 supports communication or exchanges between applications (e.g., located at end-systems) and meta-controllers 23, and supports communication between flow-specific control programs 25 at different nodes. Part of the message exchange interface 33 includes an application interface that allows an application to request the meta-controller 23 to install a specific control program. The Subscribe/Publish interface 32 allows a control program 25 to subscribe to (request) a predetermined portion of specific packets from forwarding engine 24, and allows forwarding engine 24 to publish (send) the requested portion of the packet to the control program 25. Facility access interface 30 allows a control program 25 to control the forwarding of the received packets of a specific flow. Although certain functions of the present invention are explained in terms of these interfaces, those skilled in the art will appreciate that there are many different ways to implement the functionality and features of the present invention. The four interfaces of interface 28 are merely exemplary and provide only one possible embodiment of the present invention. These interfaces are described in greater detail below.
The Facility Access Interface 30
Facility access interface 30 provides access to the resources of forwarding engine 24. Interface 30 is used by meta-controller 23 to assign controllers to flows, and by control programs 25 (which are flow-specific) to access the resources of forwarding engine 24. The only primitive used by meta-controller 23 (Assign) assigns a control program 25 to a flow. Control program assignment can be changed dynamically. In its discretion, the meta-controller 23 can assign a null control program (e.g., no control program) to a flow. The facility access interface 30 reflects (or provides access to) specific capabilities of the forwarding engine 24, such as scheduling, but hides the details of the specific forwarding technology. In particular, if the forwarding engine 24 is an ATM switch, the facility access interface 30 does not give access to VC/VP tables of the switch.
Some of the facility access primitives according to an embodiment of the present invention are listed below.
The Facility Access Interface 30:
Meta-Controller Actions:
Assign (flow identifier, control program reference): allows the meta-controller 23 to assign a control program 25 to control a specific flow of packets.
Notifications:
topology-change (set of inputs, set of outputs): allows a control program 25 to communicate to forwarding engine 24 a topology change for a flow.
Actions on flows (implicit argument: flow identifier):
i) Reservations
Reserve-Buffer (packets, bytes): can be used by a control program 25 to reserve a specified portion (in bytes or packets) of buffer 26 to store packets of this flow. This specifies a maximum number of bytes and packets. Specifying a maximum number of packets allows the control program 25 to efficiently limit the total number of packets buffered at the node, for example, for active retransmission.
Reserve-Bandwidth (bandwidth, set of ports): can be used by a control program to reserve a specified bandwidth on the identified set of ports (an input port and an output port).
Set-Schedule (ordered list of {byte number, rate} pairs): the control program 25 can interact with the scheduler at the node by providing a list of {start byte, rate} pairs {b_j, r_j}. After b_j bytes, packets are forwarded at rate r_j. If the control program 25 is not activated for some minimum amount of time, the rate simply remains unchanged. This supports flow level smoothing (reducing variability of traffic) on a best effort basis while minimizing coupling between the control program 25 and the forwarding engine 24.
Set-Attribute (list of {attribute, value} pairs): supports assignment to named attributes. It is used for scheduling property selection, queue priority and more. For example, it can be used to set or assign the bandwidth of the node system (e.g., 1500 Bytes/sec).
ii) Forwarding Control
I block (subset of input ports): blocks input on the subset of ports specified. Arriving packets on these ports are discarded. Blocking is removed on a port by excluding that port from the subset of a subsequent block.
O block (subset of output ports): blocks output on the subset of ports specified. Blocking an output port effectively removes that port from the topology. Blocking is removed on a port by excluding that port from the subset of a subsequent block. (I block and O block can manipulate virtual circuit tables in a connection-oriented flow).
Delay (Δ-time, subset of output ports): schedules arriving packets for forwarding at least Δ-time units after arrival, on the subset of ports specified.
Actions on individual packets (implicit argument: packet reference):
Release-at (time, subset of output ports): schedules packet for departure on the subset of output ports specified. The primary use of this primitive is to allow a control program 25 to reschedule a retransmission of a packet (currently in buffer 26) to react to downstream losses. It also allows the control program 25 to explicitly schedule a particular packet for transmission.
Block (subset of output ports): blocks packet on the subset of output ports specified.
Discard (): discards the packet, and removes it from the flow buffer.
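To make the Set-Schedule primitive above more concrete, the following Python sketch shows one possible way a scheduler could look up the current rate from an ordered list of {start byte, rate} pairs. The function name rate_after, the default_rate parameter and the choice that a rate takes effect exactly at its start byte are assumptions made for this illustration; they are not specified above.

```python
from bisect import bisect_right
from typing import List, Tuple


def rate_after(schedule: List[Tuple[int, float]], bytes_sent: int,
               default_rate: float) -> float:
    """Return the forwarding rate in effect once bytes_sent bytes of the
    flow have been forwarded. schedule is ordered by start byte."""
    starts = [start for start, _ in schedule]
    # Index of the last pair whose start byte is <= bytes_sent.
    idx = bisect_right(starts, bytes_sent) - 1
    if idx < 0:
        return default_rate          # no pair has taken effect yet
    return schedule[idx][1]


# Hypothetical schedule: 100 units from byte 0, 50 from byte 1000, 200 from byte 5000.
SCHEDULE = [(0, 100.0), (1000, 50.0), (5000, 200.0)]
```

Because the whole schedule is handed over in a single call, the scheduler in this sketch can keep applying it even if the control program is not run again for some time.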
A control program 25 can control packets via facility access interface 30. A control program 25 can control flows or individual packets. At the flow level, a control program 25 can block packets on input ports or on output ports, or can schedule arriving packets of a flow for a delayed output. A control program 25 can similarly block or delay the output of individual packets using a packet reference to identify each packet to be blocked or delayed. On connection oriented hardware (e.g., an ATM switch), these primitives would manipulate virtual circuit (VC) tables, whereas in a connectionless router (e.g., an IP router), an output port is blocked. Therefore, control program 25 can control the fate of each packet of the flow entering forwarding engine 24 without being fully in the data path. A control program 25 can discard a packet, reschedule (delay) the transmission of a packet, or can do nothing and allow the packet to be forwarded normally by forwarding engine 24. In contrast to an in-data-path solution, this set of primitives supports flow level connectivity management without being fully in the data-path.
There is a window of opportunity, namely from the time the packet arrives until it is forwarded, within which the control program 25 must be run or executed in order to control the packet. However, according to an embodiment of the present invention, control is applied to packets on a best effort basis. Therefore, if a control program 25 misses some packets, then the packets will be forwarded under normal operation of forwarding engine 24 and no harm is done. A control program 25 may impose a fixed packet delay to increase the size of this window, thus relaxing real-time constraints on packet scheduling. This reduces the overhead of context switching, by making the control program 25 work on multiple packets each time it is run.
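The effect of the window of opportunity, and of enlarging it with a fixed delay, can be illustrated with a small Python sketch. The function name controlled_packets, the timing model (a constant normal forwarding latency per packet) and the example times are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def controlled_packets(arrivals: Sequence[float],
                       control_ready_at: Sequence[float],
                       normal_latency: float,
                       imposed_delay: float = 0.0) -> List[int]:
    """Return the indices of packets for which the control signal is
    available no later than the time the packet is forwarded.
    Packets not listed are forwarded normally (best effort)."""
    controlled = []
    for i, (arrival, ready) in enumerate(zip(arrivals, control_ready_at)):
        forward_time = arrival + normal_latency + imposed_delay
        if ready <= forward_time:
            controlled.append(i)
    return controlled


# Hypothetical timings (arbitrary time units).
ARRIVALS = [0.0, 1.0, 2.0]
CONTROL_READY = [0.4, 1.8, 2.5]
```

With these assumed timings, the second packet is missed when no delay is imposed, but is controlled once a fixed delay of 0.5 time units widens the window.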
The Message Exchange Interface 33
The Message Exchange Interface 33 supports communication between application(s) 52 and meta-controllers 23 and between (two or more) flow-specific control programs 25 (service specific signaling). The Message Exchange Interface 33 includes the following primitives:
Send-to-meta-control (flow identifier, request-type, request-data)
Request-type (one of): activate, inform, control
Request-data (request type dependent):
Activate: control program name (type), code (reference), data (optional)
Inform: list of attributes
Control: list of {attribute, value} pairs
Send-flood (flow identifier, set of output ports, data)
Send-next (flow identifier, set of output ports, data)
Message exchange interface 33 has three primitives: a) Meta-control, b) Send-flood, and c) Send-next. Whereas the first is the primitive used by applications 52 to interact with meta-controllers 23, the second and third primitives are used by the application 52 and the control programs 25 to perform application level signaling. All of the primitives of interface 33 take a flow identifier as an argument. In addition, a meta-control message sent using the first primitive has two parameters: a request type, which is one of activate, inform, or control, and request data, which is request type specific. A meta-control message is sent to all meta-controllers 23 of the flow unchanged (i.e., intermediate controllers cannot change it). The primary meta-control message is an activate message used by applications 52 to install and activate a specific control program 25 in all meta-controllers 23 over a communications link (e.g., all meta-controllers 23 on nodes coupled between applications 52A and 52B, Fig. 1). The request data for an activate message contains three mandatory parameters: a flow identifier (which may be provided implicitly), a control program type name, and a control program implementation (reference), optionally followed by arbitrary control program-specific parameters. The control program implementation parameter either contains the code (the software for the control program), or is a globally valid network reference, a URL for example, from where the specified control program may be retrieved by meta-controller 23. The meta-controller 23 is responsible for locating, downloading (e.g., retrieving the program from the URL location), installing (e.g., compiling and verifying the compiled code), and activating (e.g., running or executing) the control program 25 requested by the application 52.
For an inform message, the data contains a flow identifier, followed by a list of attributes whose values are returned. If the attribute list is empty, a list of all attributes defined for the particular flow is returned. Similarly the control message contains a flow identifier, followed by a list of attribute value pairs.
The send-flood and send-next primitives are used for service specific (signaling) messages between installed control programs at different node systems 20, and for (signaling) exchanges between meta-controllers 23 at different node systems 20. The meta-control messages are distinguished from the others by setting the flow identifier to zero. Both primitives take two additional parameters, a set of output ports, and service specific data. The message is output on the ports specified, and is either "flooded" in the case of send-flood (to send the message to all flow-specific meta-controllers 23) or sent to the "next" flow specific controller(s) only in the case of send-next. In the case of send-next, the message is not necessarily delivered to the immediately following node system 20, but rather is sent to the next node system 20 executing the same control program 25. Similarly, send-flood does not send the message to all node systems 20, but rather sends the message to all instances of the same control program 25 (e.g., on different systems 20).
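The difference between send-next and send-flood can be sketched in Python as follows, under the simplifying assumption that the downstream nodes form a single linear path (a real node would select among a set of output ports). The function names, the dictionary layout of a node and the node names are hypothetical.

```python
from typing import Dict, List


def send_next(path: List[Dict], sender: int, program: str) -> List[str]:
    """Deliver to the first downstream node running the same control
    program, skipping nodes that do not run it."""
    for node in path[sender + 1:]:
        if program in node["programs"]:
            return [node["name"]]
    return []


def send_flood(path: List[Dict], sender: int, program: str) -> List[str]:
    """Deliver to every downstream node running the same control program."""
    return [node["name"] for node in path[sender + 1:]
            if program in node["programs"]]


# Hypothetical path: node B does not run the MPEG-4 control program.
PATH = [
    {"name": "A", "programs": {"MPEG-4"}},
    {"name": "B", "programs": set()},
    {"name": "C", "programs": {"MPEG-4"}},
    {"name": "D", "programs": {"MPEG-4", "telephony"}},
]
```

In this sketch a node that does not run the control program (node B) is passed over by both primitives, which mirrors the point that control on demand need not be present at every node.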
The Subscribe/Publish Interface 32
The Subscribe/Publish Interface 32 allows (flow-specific) control programs 25 to subscribe to (request) events and information to be published (on request) by forwarding engine 24. According to an embodiment of the present invention, Subscribe/Publish interface 32 includes three primitives (or commands) that allow control program 25 to subscribe (or request) to receive a copy of received packets of a flow, and a primitive or command that allows the forwarding engine 24 to publish or provide the requested information to control program 25. The Subscribe primitives include:
Subscribe-Stats (flow identifier) - requests a subscription to simple flow statistics, such as the number of packets and bytes transmitted since the last invocation, or the number of bytes (or packets) currently in the buffer 26. If the flow identifier is set to 0, the control program 25 receives nodal statistics about buffer length and packet loss rate.
Subscribe-Peek (flow identifier, offset, length) - implements packet peeking or frame peeking according to an embodiment of the present invention, allowing a control program 25 to subscribe to receive (peek at) a portion of requested packets. Subscribe-peek does not cancel subscription to statistics. Offset - is the offset where peeking is to begin within the payload of the packet. Length - is the number of bytes to peek at, with 0 indicating all.
Subscribe-Ignore (flow identifier) - cancels all subscriptions.
The Publish interface includes at least one primitive:
Publish (flow identifier, packet reference, requested data) - this is used by the forwarding engine 24 to publish the peek event, including the data subscribed or requested by a control program. A publish message (or published peek event) is issued by forwarding engine 24 to control program 25. The published peek event contains a flow identifier identifying the flow for the packet, a packet reference identifying the packet and a copy of the data from the packet (that was earlier requested or subscribed to by control program 25). The packet reference may be used by control program 25 to manipulate the packet through facility access interface 30, as described in greater detail above.
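A minimal Python sketch of packet peeking and of the published peek event is given below. The names PeekEvent, peek and publish are hypothetical, and the sketch assumes that a length of 0 means all bytes from the offset to the end of the payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeekEvent:
    """Hypothetical published peek event."""
    flow_id: int       # identifies the flow the packet belongs to
    packet_ref: int    # reference usable later via the facility access interface
    data: bytes        # copy of the subscribed portion of the payload


def peek(payload: bytes, offset: int, length: int) -> bytes:
    """Copy length bytes of the payload starting at offset; 0 means all."""
    if length == 0:
        return payload[offset:]
    return payload[offset:offset + length]


def publish(flow_id: int, packet_ref: int, payload: bytes,
            offset: int, length: int) -> PeekEvent:
    """Build the event the forwarding engine would send to the control
    program; the packet itself stays in the forwarding engine's buffer."""
    return PeekEvent(flow_id, packet_ref, peek(payload, offset, length))
```

Only the copied bytes cross into the control program in this sketch, so the bandwidth into the controller is bounded by the subscribed length rather than by the packet size.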
A node system 20 can include many executing or running control programs 25. Each control program 25 can control one or more flows. To assist in controlling the flows, each control program 25 can subscribe to particular packet data for a specific flow(s). Therefore, in the case where controller 22 includes multiple control programs 25 and each control program 25 can control one or more flows, a flow identifier is necessary in each publish message. However, as with all of the primitives described herein, the publish primitive can be implemented in a variety of ways. For example, there may be several executing control programs 25 within system 20, wherein each control program 25 only monitors and controls a single flow. In such a case, identification of the flow can be provided implicitly from forwarding engine 24 (rather than explicitly) because there is only one control program 25 for each flow under control.
The Meta-Control Interface 31
The meta-control interface 31 is used by the meta-controller 23 to install, manage and control the dynamically installed control programs 25. In addition, the meta-control interface 31 includes two primitives implemented by the meta-controller 23 that support migration of control programs 25 to different node systems 20. The meta-control interface includes the following primitives (or methods), where each primitive includes at least a control program as an argument (e.g., provided as a name of the control program, a pointer to the control program, or provided implicitly):
Create (data): constructs a new control program 25, passing data to the instance for initialization. The create primitive has one (untyped) argument for initialization data. This is the control program specific data provided in the activate message to the meta-controller 23.
Clone (): clone a control program. The clone is identical to the original control program. The initial state of the clone is the current state of the "parent" at the time of cloning. A clone is executed using the Run primitive.
Delete (): destroy a control program 25, reclaiming all of its resources (e.g., allocated memory or buffer space and CPU or processor time).
Run (): execute the flow controller. Initial point of execution for all new controllers, including clones and unwrapped control programs.
Wrap (): wraps the control program (and its state) for transmission. Returns a wrapped controller (byte string) (like the serialize function in JAVA).
Unwrap (a wrapped controller): constructs a control program 25 from a previously wrapped one (using the Wrap() method).
Go-Next (set of output ports): migrates the control program 25 one hop, on the output ports specified. This migrates the control program 25 to the next system 20 that can receive control programs (not all nodes will store and run control programs).
Go-Flood (set of output ports): migrates the control program 25 to all nodes downstream (that can store and run the control program) on the output ports specified.
The primitives for the meta-control interface 31 include operations to create a control program 25, clone an existing control program 25, and destroy a control program 25. The create primitive has one (untyped) argument for initialization data. This is the controller specific data provided in the activate message to the meta-controller. A controller may be wrapped (for shipping or storage), and later (re)created (unwrapped). Whether created by create, clone or unwrapping, the execution of a new control program 25 always starts by executing the Run method (primitive).
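The life cycle described above (create, clone, wrap, unwrap, then run) might be sketched in Python as follows. The class name ControlProgram, its state dictionary and the use of JSON as the wrapped byte-string format are assumptions for illustration; the description does not fix a serialization format.

```python
import copy
import json
from typing import Any, Dict, Optional


class ControlProgram:
    """Hypothetical control program holding only JSON-serializable state."""

    def __init__(self, data: Optional[Any] = None) -> None:   # Create (data)
        self.state: Dict[str, Any] = {"init": data, "packets_seen": 0}
        self.running = False

    def run(self) -> None:                                    # Run ()
        self.running = True

    def clone(self) -> "ControlProgram":                      # Clone ()
        twin = ControlProgram()
        twin.state = copy.deepcopy(self.state)   # clone starts from parent's state
        return twin

    def wrap(self) -> bytes:                                  # Wrap ()
        return json.dumps(self.state).encode("utf-8")

    @staticmethod
    def unwrap(blob: bytes) -> "ControlProgram":              # Unwrap (wrapped)
        program = ControlProgram()
        program.state = json.loads(blob.decode("utf-8"))
        return program
```

In this sketch neither a clone nor an unwrapped control program is running until run is called, which matches the rule that execution of every new control program starts at Run.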
A control program 25 may migrate, either jump one hop (Go-Next), or be "flooded" (Go-Flood) to all nodes in the flow downstream of the specified set of output ports. The meta-controller 23 implements this method by wrapping the controller, and sending a message to other meta-controllers 23 (at other nodes) using Send-Next or Send-Flood respectively with the flow identifier to which the controller belongs. The receiving meta-controller 23 unwraps the controller, assigns it to the specified flow, and then Runs the control program. The present invention will now be described using several examples.
Examples
Fig. 3 is a flow chart illustrating the operation of a system according to an embodiment of the present invention. As an example, a control-on-demand router is provided at a node. The control-on-demand router can be implemented using all or part of a node system 20 (including a controller 22, an interface 28 and a forwarding engine 24). The router includes a default control program for controlling the forwarding of packets. In addition, the router allows for customized or service-specific control programs to be installed and applied on request for several different services (e.g., telephony, E-mail, MPEG-4 video). Thus, for specified flows, the requested control program 25 will be installed and used to provide flow-specific (or service-specific) control forwarding of packets in the flow, instead of using the default control program. However, the default control program will continue to be used by the router to control forwarding for flows that have not requested a particular control program. There may be, for example, thousands of flows being processed by the router, and hundreds of different service-specific control programs may be installed to control the different flows. Therefore, each service-specific control program 25 can be requested and applied to many different flows (e.g., all the telephony flows may request use of the telephony control program).
In this example, an MPEG-4 video application 52 running on an end-system 50 (Fig. 1) has particular control needs that have been implemented in an MPEG-4 control program.
At step 60, the MPEG-4 video application 52 issues a request to the router to install and run the MPEG-4 control program 25 to control a specific flow of MPEG-4 packets from the MPEG-4 video application 52. (Actually, the MPEG-4 application issues the request to all routers or nodes in the flow.) According to an embodiment of the present invention, step 60 can be performed by the MPEG-4 application sending a meta-control message of type Activate to meta-controller 23 of the router, using the Send-to-meta-control primitive (of message exchange interface 33), to request that the meta-controller 23 install and run the MPEG-4 control program. The flow to be controlled is identified in the flow field of the message (or can be provided implicitly). Also, an identifier or a name of the requested control program is provided (e.g., "MPEG-4"), and either the software code of the control program or a globally valid network reference (e.g., a URL) is provided in the Code field. The network reference identifies the location where the MPEG-4 control program can be obtained.
At step 62, meta-controller 23 of the router obtains and installs a copy of the MPEG-4 control program 25. According to an embodiment of the present invention, the meta-controller 23 may first determine whether or not the requested control program (MPEG-4 control program) is stored locally at the router. If it is not locally stored, the meta-controller 23 obtains the control program 25 from one or more packets in the meta-control message or uses the network reference to retrieve or obtain a copy of the MPEG-4 control program. Regardless of where the control program is obtained from (either locally or retrieved from the network), the meta-controller 23 next installs the MPEG-4 control program, which may include, for example, verifying that the control program is correct, compiling and then running the control program. The meta-controller 23 uses the Assign primitive (in the facility access interface 30) to assign the MPEG-4 control program to the identified flow. After being installed and assigned to the specified flow, the MPEG-4 control program 25 will now control the forwarding of the packets of the specified flow.
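Steps 60 and 62 can be illustrated with a minimal sketch of how a meta-controller might handle an Activate request: look in the local store first, then fall back to code carried in the message, then to the network reference. The class `MetaController`, the method names, the `fetch` callback and the trivial `verify` check are all assumptions made for illustration; the specification does not define how verification or retrieval is performed.

```python
from typing import Callable, Dict, Optional


class MetaController:
    """Hypothetical stand-in for meta-controller 23."""

    def __init__(self, fetch: Callable[[str], str]):
        self.cache: Dict[str, str] = {}     # locally stored control programs, by name
        self.assigned: Dict[str, str] = {}  # flow identifier -> control program name
        self.fetch = fetch                  # assumed helper: retrieves code from a network reference

    def activate(self, flow_id: str, name: str,
                 code: Optional[str] = None,
                 reference: Optional[str] = None) -> str:
        """Obtain, verify, install and assign the requested control program."""
        source = self.cache.get(name)       # 1) is it stored locally?
        if source is None:
            if code is not None:            # 2) code carried in the message
                source = code
            elif reference is not None:     # 3) retrieve via the network reference
                source = self.fetch(reference)
            else:
                raise LookupError(f"no code or reference for {name!r}")
        if not self.verify(source):
            raise ValueError(f"control program {name!r} failed verification")
        self.cache[name] = source
        self.assigned[flow_id] = name       # plays the role of the Assign primitive
        return name

    @staticmethod
    def verify(source: str) -> bool:
        # Placeholder check only; real verification is outside this sketch.
        return bool(source.strip())
```

In this sketch a second flow requesting the same control program is served from the local store, so the network reference is followed at most once per program.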
At step 64, the MPEG-4 control program 25 requests (or subscribes) to receive a pre-determined portion of packets corresponding to this particular flow. The MPEG-4 control program 25 can subscribe to receive (peek-at) a portion of the incoming packets of this flow using the Subscribe-Peek primitive or command, or other technique. The flow identifier argument of the Subscribe-Peek command can be used to identify the flow (the group of packets of interest). The flow identifier argument can identify the flow using a variety of different bit sequences in each packet (e.g., IP packets directed to a specified IP address, packets having a specific IPv6 "flow label"). The offset argument identifies the number of bytes or bits offset from the beginning of the packet where the peeking shall begin, and the length argument identifies the number of bytes to be provided to the MPEG-4 control program 25.
At step 66, packets are received at forwarding engine 24 and stored in buffer 26. Forwarding engine 24 identifies the received packets which are part of the flow. In other words, forwarding engine 24 identifies received packets that are part of the flow to which the peek-subscription applies. This can be performed by analyzing each packet received by forwarding engine 24. For example, forwarding engine 24 can identify the requested packets by comparing a predetermined sequence of bits (e.g., the IP address for the destination, or the IPv6 flow label) in each packet to the flow identifier. A match indicates that the packet is part of the flow which has been subscribed to or requested by the MPEG-4 control program.
At step 68, the subscribed (requested) portion of each packet of the flow is copied and provided, together with a packet reference, from forwarding engine 24 to the MPEG-4 control program 25. This can be done using the Publish primitive, described above, or using another technique. According to an embodiment of the present invention, forwarding engine 24 uses the offset and length arguments (provided in the Subscribe-Peek message or primitive from controller 22) to identify the beginning and length of the portion of the packet of interest. This portion of the packet is then copied and placed in a Publish message that also includes the flow identifier (optional) and a packet reference. The flow identifier is the same as that provided by control program 25 to the forwarding engine. The packet reference is assigned by forwarding engine 24 and may indicate, for example, a packet number (e.g., packet number 17). The Publish message is then sent from forwarding engine 24 to the MPEG-4 control program 25. Therefore, it can be seen that, rather than routing the entire packet from the buffer 26 to control program 25, only the selected (subscribed) portion of the received packet and a packet reference (identifying the packet) are provided to control program 25. This minimizes the amount of data passing through controller 22 (in contrast to prior art systems in which the entire packets were routed to the controller). In this manner, control program 25 can control the packets without being in the data path, and thereby avoid degradation of the packet forwarding process.
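Steps 64 through 68 can be sketched as follows. The sketch assumes, purely for illustration, that the flow identifier occupies a fixed byte range of each packet and that the packet reference is a running packet number; the names `ForwardingEngine`, `Publish`, `subscribe_peek` and `receive` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Publish:
    """Message sent from the forwarding engine to a control program."""
    flow_id: bytes
    packet_ref: int
    portion: bytes


class ForwardingEngine:
    def __init__(self, flow_offset: int, flow_length: int):
        # Assumed position of the flow-identifying bytes in every packet.
        self.flow_offset = flow_offset
        self.flow_length = flow_length
        self.subscriptions: Dict[bytes, Tuple[int, int, Callable[[Publish], None]]] = {}
        self.buffer: Dict[int, bytes] = {}   # packet reference -> full packet
        self.next_ref = 0

    def subscribe_peek(self, flow_id: bytes, offset: int, length: int,
                       deliver: Callable[[Publish], None]) -> None:
        """Register interest in `length` bytes at `offset` of each flow packet."""
        self.subscriptions[flow_id] = (offset, length, deliver)

    def receive(self, packet: bytes) -> int:
        """Buffer the packet and publish only the subscribed portion, if any."""
        ref = self.next_ref
        self.next_ref += 1
        self.buffer[ref] = packet            # the full packet stays in the buffer
        key = packet[self.flow_offset:self.flow_offset + self.flow_length]
        sub = self.subscriptions.get(key)    # a match means the packet is in the flow
        if sub is not None:
            offset, length, deliver = sub
            deliver(Publish(key, ref, packet[offset:offset + length]))
        return ref
```

Only the slice and the reference reach the control program; the packet itself never leaves the engine's buffer, which is the point of frame peeking in this example.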
At step 70, control program 25 analyzes the received portion of the packet to determine how the packet should be controlled (and even whether the packet should be controlled at all). In this MPEG-4 example, the MPEG-4 control program implements a selective discard of packets after network congestion is detected. MPEG-4 video data is hierarchically coded using three types of video frames: I-frames, P-frames and B-frames. Whereas the loss of an I-frame can affect all frames until the next I-frame, losing a small number of P and B frames degrades quality only marginally. Therefore, the MPEG-4 control program 25 selectively discards both P and B frames during network congestion. To do so, the control program 25 subscribes to a portion of each packet of this MPEG-4 video flow that identifies whether the packet contains I, B, or P frame data. This MPEG-4 discussion is provided merely as an example of how the present invention can be implemented. The present invention is not limited to MPEG or video applications, or to the specific mechanisms and techniques described in the example.
At step 72, based on analyzing the requested portion of each packet of the flow (step 70), the MPEG-4 control program 25 preferably applies enhancement or non-essential control on the packets of the MPEG-4 flow. For this example, during network congestion, the MPEG-4 control program 25 uses the Discard primitive of the facility access interface 30 to selectively discard the packets containing P and B frame data. The Discard message is provided to the forwarding engine 24 and includes a packet reference identifying the packet(s) to be discarded. The Discard message can request the forwarding engine to discard one or more packets. Likewise, the control program 25 could use the Delay primitive to delay the forwarding of the P and B packets, or perform some other control function. This control on the packets performed by control program 25 is not essential for forwarding correctness. The control exercised by the control program 25 only operates to enhance network performance. According to an embodiment of the present invention, if the forwarding engine 24 does not receive control information (e.g., a discard or delay message) from the control program 25 before the packet is normally scheduled for forwarding (the default, with no control), the packet is forwarded and is not delayed while awaiting control information from control program 25. In this manner, control program 25 operates only to improve forwarding performance and the handling of network congestion, and will not degrade forwarding performance.
At step 74, meta-controller 23 deletes or destroys the MPEG-4 control program 25 when the flow is completed. Meta-controller 23 in the router can destroy the control program using the Delete primitive of the meta-control interface 31.
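The best effort behavior of steps 70 and 72 can be sketched as a queue that forwards every packet at its normally scheduled time unless a Discard arrived first. The names `should_discard`, `BestEffortQueue`, `enqueue`, `discard` and `forward_due`, and the use of a numeric timestamp for the schedule, are assumptions made for this illustration only.

```python
def should_discard(frame_type: str, congested: bool) -> bool:
    """Control decision of the example: drop P and B frames during congestion."""
    return congested and frame_type in ("P", "B")


class BestEffortQueue:
    """Forwarding side: never waits for the control program."""

    def __init__(self):
        self.pending = {}   # packet reference -> (scheduled time, packet)

    def enqueue(self, ref: int, packet: bytes, scheduled_time: float) -> None:
        self.pending[ref] = (scheduled_time, packet)

    def discard(self, ref: int) -> bool:
        """Discard request; returns False if the packet was already forwarded."""
        return self.pending.pop(ref, None) is not None

    def forward_due(self, now: float) -> list:
        """Forward everything whose scheduled time has come, controlled or not."""
        due = sorted((t, ref) for ref, (t, _pkt) in self.pending.items() if t <= now)
        return [self.pending.pop(ref)[1] for _t, ref in due]
```

A Discard that arrives after the scheduled time simply has no effect in this sketch, so a slow or failed control program cannot hold packets back.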
IPv6 provides an extension header (instead of options) to allow a variable length header. To implement the message exchange interface efficiently in IPv6, a new extension header is provided on at least the first packet in a flow: the courtesy copy (cc) extension header according to an embodiment of the present invention. The semantics of the cc-extension are: a) that the full packet containing this extension is to be copied to meta-controller 23; and, b) that the copying to the meta-controller 23 should not delay the forwarding of the packet (i.e., it should be copied after it has been forwarded, or copied from a separate copy of the packet maintained in buffer 26 independently of the copy that will be forwarded). Furthermore, the cc-extension header should be the first extension header. These semantics implement a lightweight signaling and ensure that packets carrying the cc-extension headers can be forwarded in a fast path (e.g., without delay).
Fig. 4 is a diagram illustrating how a flow transitions through four states during its life-span according to an embodiment of the present invention. Initially, the flow is in the Null state 80 (the flow is unknown). A packet (or datagram) arrives with a currently unknown flow identifier. The flow identifier, in effect, operates as a request to use the specified control program to perform flow processing. The flow is identified as the beginning of a new flow at the current node. A new flow state is created, and its state is set to the Initialize state 82. If a packet contains a cc-extension header, a copy of the packet is forwarded to the meta-controller 23 after the original packet is routed and forwarded according to the IPv6 routing tables. Subsequent packets of the flow are routed based on the flow identifier, effectively pinning the routes for the labeled flow. If the first packet does not contain a cc-extension, no other flow processing is requested and the state transits to the Ignore state 86.
For flows requesting control-on-demand, meta-controller 23, running in user space, identifies the requested control program 25 based on the extension header. (In this manner, the first packet containing the cc extension header operates as a meta-control message requesting a specific control program for the flow.) In particular, the meta-controller 23 consults its cache of control programs 25 to see if the requested control program is locally available. If not, using the code reference provided in the first packet, the meta-controller 23 retrieves and installs the requested control program. On successful installation of the on-demand control program 25, the flow state becomes the Active state 84, indicating to the router that the control program 25 is ready to act on arriving datagrams or packets. In addition, the controller reference in the state is updated. If the installation fails, the flow state is set to the Ignore state 86. (In that case the flow is, in principle, controlled by the default policy.) The meta-controller 23 may change the state of a control program 25 from Active 84 to Ignore 86 at any time, for example if a control program 25 fails for some reason. While the requested control program 25 is being installed, flow datagrams or packets are simply forwarded. In addition, some general (per flow) statistics are compiled. These include the number of bytes served on the flow. The flow specific control program 25, when ready, gains access to this information. Upon flow termination, determined either by inactivity (time-out) or via an explicit notification from the control program 25, the meta-controller 23 removes the flow state and reclaims resources allocated to the flow.
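The four flow states of Fig. 4 and the transitions described above can be summarized in a small state machine. This is a sketch only: the enumeration and the function names are hypothetical, and each function models one of the events named in the description.

```python
from enum import Enum


class FlowState(Enum):
    NULL = "Null"              # state 80: flow unknown
    INITIALIZE = "Initialize"  # state 82: new flow, control program being installed
    ACTIVE = "Active"          # state 84: control program ready
    IGNORE = "Ignore"          # state 86: no flow-specific control


def on_first_packet(has_cc_extension: bool) -> FlowState:
    """A new flow enters Initialize; without a cc-extension it moves on to Ignore."""
    return FlowState.INITIALIZE if has_cc_extension else FlowState.IGNORE


def on_install_result(state: FlowState, succeeded: bool) -> FlowState:
    """Installation outcome only matters while the flow is initializing."""
    if state is not FlowState.INITIALIZE:
        return state
    return FlowState.ACTIVE if succeeded else FlowState.IGNORE


def on_program_failure(state: FlowState) -> FlowState:
    """The meta-controller may demote an Active flow to Ignore at any time."""
    return FlowState.IGNORE if state is FlowState.ACTIVE else state


def on_termination(_state: FlowState) -> FlowState:
    """Time-out or explicit notification: flow state is removed."""
    return FlowState.NULL
```

Packets keep being forwarded in every state; the state only determines whether a flow-specific control program is consulted.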
According to an embodiment of the present invention, an application can request installation of a control program for collecting statistics regarding a flow. In response to the request, the control program is installed to observe a flow and collect statistics regarding the flow, and provide messages to the requesting application describing the flow statistics. After a predetermined event (e.g., when congestion occurs at the system), the application requests installation of a second control program for controlling the forwarding of the packets using frame peeking (e.g., a selective discard technique or other control technique can be employed).
The present invention provides many advantages, including the following three: 1) The control can be advantageously applied (to control forwarding of packets, etc.), but the control program is not essential to forwarding correctness. 2) Control can be applied to a flow either always or on a best effort basis. According to an embodiment of the present invention, control is advantageously applied to a flow on a best effort basis to avoid degrading forwarding performance of the forwarding engine. And, 3) Control can be applied either synchronously (a control program is invoked or resumed in response to receipt of each packet), or asynchronously (a control program is invoked or resumed independently of the arrival of packets). Because an embodiment of the present invention applies customized control on demand to flows (flow oriented approach), control can be advantageously applied asynchronously to operate on one or more flows, rather than invoking a control program in response to each packet.
In addition, the present invention can operate on packets in the data path (the data plane) (e.g., operate on data packets to be forwarded) or can operate on control packets (those packets that include control information). Some packets can include data and/or control information. The present invention can be applied in the control plane (the data plane is involved in packet forwarding, while the control plane is involved with functions other than packet forwarding, such as providing control signals or control packets).
There are several additional applications in which control-on-demand of the present invention can be advantageously applied. In particular, applications that can combine application semantics, network location and knowledge available inside the network are primary candidates for control-on-demand. Other applications can be candidates as well.
Stream Thinning
Several applications encode information in the data stream that may be interpreted as multiple separable substreams. These substreams may, for example, correspond to different media (audio, video, white-board), may correspond to layered encoding, or may encode relative importance of the packets of the flow. In a variegated multicast, receivers may subscribe to different substreams, for example, a subset of the media offered, or a different resolution by selecting a subset of the layered encoding. On the other hand, during congestion it is desirable to selectively discard the less important packets to promote the others. The basic problem of stream thinning is to identify different substreams and filter them to accommodate these needs. For variegated multicast, the stream thinning must be performed at the branch-point inside the network where the different branches subscribe to different substreams. Selective discard requires congestion indication at the congested node to be effective. In both cases, the substreams are encoded using application level framing, so application semantics are necessary to identify the different substreams. This makes stream thinning ideal for control-on-demand of the present invention.
Variegated Multicast
Using control on demand, (application defined) layered encoding can be implemented by encoding the layer in all of the packets in the stream. Application defined policies or control programs instituted at branch points of the multicast tree keep track of the layers subscribed to by each downstream branch. Using frame peeking (e.g., the Subscribe-Peek primitive) to examine the application level header and packet blocking (the Block( ) primitive) on a packet, the on-demand control programs may then peek at the layer information and block the forwarding of higher layers onto the lower bandwidth links, thus tailoring the multicast to the receiver's needs at each branch point. Similarly, to implement selective discard, the control program 25 monitors its flow queue or buffer 26, using a high watermark crossing as a congestion indication. During congestion, using the same primitives as above, the control program 25 discards less important packets from its queue to make room for the more important ones. Note that in both cases the stream thinning is viable as a best effort control, although in the case of variegated multicast this requires that the receiver be able to discard unwanted packets.
Reliable Multicast
Towsley and others at the University of Massachusetts have proposed a mechanism for reliable multicast, where retransmission of a datagram is limited to only those who have not successfully received it. This is achieved by creating a multicast group per retransmitted datagram, which interested receivers join. Their analysis shows that with high probability only a small number of groups is active at any time. Implementing this scheme using traditional multicast, however, is prohibitively expensive. This solution to reliable multicast can nevertheless be elegantly implemented in the control-on-demand architecture according to an embodiment of the present invention. Using application level framing, the retransmission multicast groups are defined as sub-flows of the overall multicast flow, using receiver loss notification (NACK) to join the corresponding retransmission sub-flow. The control programs keep track of downstream "subscriptions" to retransmissions, blocking (filtering) retransmitted packets from being forwarded on those branches without an interested receiver, just as for variegated multicast above. Since receivers would normally have to cope with (ignore) duplicate packets, the blocking can be provided on a best effort basis.
Active Retransmission for Reliable Multicast with Real-Time Constraints
Exploiting the control-on-demand paradigm of the present invention even further, the on-demand control programs can observe the original packet transmissions, caching a small number of packets inside the network. Assuming that lost packets are requested from the leaves and downstream controllers, when receiving a NACK the controller retransmits the packet if it is still available in the cache. Buffering only a small number of packets is sufficient to obtain significant benefit. Moreover, since lost packets are retransmitted only about one hop round-trip after their initial transmission, a short receiver play-out buffer may be sufficient to hide the loss while meeting the real-time constraints. If cache space on the multicast tree is to be conserved, certain nodes might only cache selected packets. For example, using a simple modulus function on the packet or datagram identifier would spread the datagrams or packets along the caches in the multicast tree. As for the reliable multicast described above, according to an embodiment of the present invention, the control-on-demand is purely an enhancement, and need only be applied on a best effort basis. Moreover, the approach is robust to failures. For example, a node may temporarily lose its cache of packets or datagrams, or may miss a set of acknowledgments, or NACKs. In addition, using interface 28, the control program 25 is not in the data path, but instead allocates a small amount of extra buffer space, and limits the (fixed) number of packets in the buffer. In this manner the flow queue or buffer 26 is used as a circular array, incurring minimal cost in the data-path. The packets can be retransmitted using the Schedule-at primitive.
Per Flow Application-Specified Bandwidth Shaping
Traffic shaping may be desirable within the network to reduce peak (and effective) bandwidth and enhance traffic characteristics. This is particularly true at network seams, access, egress or peering points, where agreements or capabilities may require the delivered traffic to obey more stringent characteristics than bandwidth alone. Per flow application specific bandwidth shaping and smoothing is easily implemented in a control-on-demand architecture, according to an embodiment of the present invention. Using information of buffer availability at sender, receiver(s) and intermediate nodes, and the shaping objectives, the controller exploits (local) network observations to optimize the shaping. The controller subscribes to flow statistics, but does not have to peek at any packet. When activated the control program prepares a schedule as a list of {byte, rate} pairs that it then gives to the scheduler using the set-schedule primitive. The control program can be operated on a best effort basis.
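The schedule handed to the scheduler can be illustrated with a short sketch. The specification only states that the schedule is a list of {byte, rate} pairs; the interpretation used here is an assumption: each pair is read as "from this cumulative byte offset onward, send at this rate in bytes per second", the list is sorted by offset, and it starts at offset 0. The function names `rate_at` and `transmit_time` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Schedule = List[Tuple[int, float]]   # (cumulative byte offset, rate in bytes/second)


def rate_at(schedule: Schedule, bytes_sent: int) -> float:
    """Rate that applies once `bytes_sent` bytes of the flow have been served."""
    starts = [start for start, _rate in schedule]
    i = bisect_right(starts, bytes_sent) - 1
    if i < 0:
        raise ValueError("schedule does not cover this byte offset")
    return schedule[i][1]


def transmit_time(schedule: Schedule, total_bytes: int) -> float:
    """Seconds needed to send `total_bytes` when following the schedule."""
    elapsed = 0.0
    for i, (start, rate) in enumerate(schedule):
        if start >= total_bytes:
            break
        # A segment ends where the next one starts, or at the end of the data.
        nxt = schedule[i + 1][0] if i + 1 < len(schedule) else total_bytes
        end = min(nxt, total_bytes)
        elapsed += (end - start) / rate
    return elapsed
```

For instance, under these assumptions a schedule of `[(0, 1000.0), (5000, 500.0)]` sends the first 5000 bytes at 1000 bytes per second and everything after that at half the rate.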
The present invention includes a method and apparatus for providing control-on-demand in a data network using frame peeking that operates on data in the data path without significantly degrading the forwarding performance of the system. According to an embodiment of the present invention, the system includes one or more flow-specific control programs 25 for controlling the individual flows, a meta-controller 23 for installing and managing the control programs, a forwarding engine 24 for forwarding packets, and an interface 28 for providing communication between the meta-controller, the control programs 25 and the forwarding engine 24.
According to an embodiment of the present invention, the interface 28 is described as four interfaces. A meta-control interface 31 is used by the meta-controller 23 to install, manage and control installed control programs 25. The message exchange interface 33 supports communication or exchanges between applications (e.g., located at end-systems) and meta-controllers 23, and supports communication between flow-specific control programs 25 at different nodes. Part of the message exchange interface allows an application to request installation of a specific control program 25 by a meta-controller 23. The subscribe/publish interface 32 allows a control program to subscribe to (request) specific packet information. The facility access interface 30 allows a control program 25 to control the forwarding of the received packets of a specific flow.
According to an embodiment of the present invention, an application issues a request to the meta-controller 23 to install a flow-specific control program 25 for controlling a particular flow of packets. The meta-controller 23 obtains and installs a copy of the requested flow-specific control program 25. A portion of each flow packet is provided to the control program 25. The control program 25 can issue a message to the forwarding engine 24 to control the packet. By providing only a portion of the packets to the control program 25 (rather than the complete packet), the control program 25 can control the packets without being in the data path, and thereby avoid degradation of the packet forwarding process.

Claims

WHAT IS CLAIMED IS:
1. A method of providing control on demand in a data network comprising the steps of: receiving a request for installation of a control program for controlling the forwarding of a group of packets; obtaining the control program; installing the control program at a node in the data network; receiving a packet of the group of packets; identifying that the received packet is part of the group; controlling the forwarding of the received packet based on the installed control program.
2. The method of claim 1 wherein said step of receiving a request comprises the step of receiving a message from an application to a node system requesting installation of a control program for controlling forwarding of packets received at the node system.
3. The method of claim 2 wherein said message includes a flow identifier that identifies the group of packets to be controlled and an identification of the control program to be installed.
4. The method of claim 3 wherein said message further includes at least one of: code for the control program; and a valid network reference where the control program can be retrieved.
5. The method of claim 1 wherein said step of installing the control program comprises the steps of: running the control program; and assigning the control program to the group of packets.
6. The method of claim 1 wherein said control program is non-essential and applied on a best effort basis.
7. The method of claim 1 wherein said control program is service-specific.
8. The method of claim 3 wherein said group of packets are identified implicitly.
9. The method of claim 1 wherein said group of packets comprise a flow.
10. A method of providing control on demand at a node in a data network comprising the steps of: receiving a request message at the node from an application requesting installation of a control program for controlling a flow of packets, said request message identifying the flow and the control program; obtaining the control program; installing the control program at the node; assigning the control program to the flow; receiving a packet of the flow at the node; identifying that the received packet is part of the flow; providing a portion of the packet of the flow to the control program; the control program controlling the forwarding of the packet based on the portion of the packet.
11. The method of claim 10 wherein said request message further includes a network reference identifying the location where the control program can be obtained, said step of obtaining comprises the steps of: determining whether the control program is locally stored at the node; obtaining the control program from the locally stored location if it is locally stored; otherwise, retrieving the control program from the location identified by the network reference.
12. The method of claim 10 wherein said step of installing comprises the step of running the control program.
13. The method of claim 10 wherein said step of installing comprises the steps of verifying the control program and running the verified control program.
14. The method of claim 10 wherein said received packet includes a flow identifier, and wherein said step of identifying comprises the steps of: comparing the flow identifier in the packet to the identification of the flow contained in the request message, wherein a match indicates that the received packet is part of the flow.
15. The method of claim 10 wherein said step of providing a portion of the packet comprises the steps of: the control program subscribing to receive a portion of the flow packets received at the node; and providing a copy of the subscribed portion of each packet to the control program in response to said step of subscribing.
16. The method of claim 10 wherein said step of the control program controlling the forwarding of the packet comprises the step of the control program performing one of the following functions: discard the packet; reschedule retransmission of the packet; and allow the packet to be forwarded.
17. The method of claim 10 wherein said packet comprises an ATM application level frame.
18. The method of claim 10 wherein said packet comprises an IP packet.
19. The method of claim 14 wherein said flow identifier in the packet is provided in an IPv6 flow label.
20. The method of claim 14 wherein said flow identifier is provided as an address in the packet.
21. The method of claim 14 wherein said flow identifier comprises a sequence of bits in the packet that satisfies an equivalence relation.
22. An apparatus at a node of a data network for providing control on demand comprising: a flow-specific control program for controlling the forwarding of a flow of packets; a meta-controller operable to obtain and install the control program at the node in response to a request; and a forwarding engine receiving a plurality of packets as an input and forwarding at least some of the packets, said forwarding engine forwarding at least some of the packets of the flow under control of the installed control program.
23. The apparatus of claim 22 and further comprising: an interface coupling at least the control program and the forwarding engine, the interface allowing the control program to request a copy from the forwarding engine of a portion of a packet of the flow received by the forwarding engine and allowing the forwarding engine to send the portion of the requested packet from the forwarding engine to the controller.
24. The apparatus of claim 23 wherein said interface further allows the controller to control the forwarding of the requested packet based on the received portion of the requested packet.
25. An apparatus for providing control on demand in a data network comprising: means for requesting installation of a control program for controlling the forwarding of a group of packets; means for obtaining the control program; means for installing the control program at a node in the data network; means for receiving a packet of the group of packets; means for identifying that the received packet is part of the group; means for controlling the forwarding of the received packet based on the installed control program.
26. An apparatus for providing control on demand at a node in a data network comprising: means for receiving a request message at the node from an application requesting installation of a control program for controlling a flow of packets, the request message identifying the flow and the control program; means for obtaining the control program; means for installing the control program at the node; means for assigning the control program to the flow; means for receiving a packet of the flow at the node; means for identifying that the received packet is part of the flow; means for providing a portion of the packet of the flow to the control program; and the control program controlling the forwarding of the packet based on the portion of the packet.
27. A method of providing control on demand in a data network for a plurality of flows comprising the steps of: receiving a request for installation of a first control program for controlling the forwarding of packets of a first flow; receiving a request for installation of a second control program for controlling the forwarding of packets of a second flow; obtaining the first and second control programs; installing the first and second control programs at a node in the data network; controlling the forwarding of packets of the first flow based on the first installed control program; and controlling the forwarding of packets of the second flow based on the second installed control program.
28. A method of providing control on demand in a data network comprising the steps of: receiving a request for installation of a first control program for collecting statistics regarding a flow; obtaining the first control program; installing the first control program at a node in the data network; collecting statistics on the flow based on the first control program; and obtaining and installing a second control program for controlling forwarding of the packets of the flow.
29. A method of providing control on demand in a data network comprising the steps of: receiving a request for installation of a non-essential control program for controlling the forwarding of a group of packets; obtaining the control program; installing the control program at a node in the data network; receiving a packet of the group of packets; identifying that the received packet is part of the group; controlling the forwarding of the received packet based on the installed control program, wherein said step of controlling is not essential for packet forwarding correctness.
30. The method of claim 29 wherein said control program is applied asynchronously to control forwarding of the received packet.
PCT/US1999/007578 1998-04-16 1999-04-07 Control on demand in a data network WO1999053718A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6111098A 1998-04-16 1998-04-16
US09/061,110 1998-04-16

Publications (1)

Publication Number Publication Date
WO1999053718A1 true WO1999053718A1 (en) 1999-10-21

Family

ID=22033650

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/007578 WO1999053718A1 (en) 1998-04-16 1999-04-07 Control on demand in a data network

Country Status (1)

Country Link
WO (1) WO1999053718A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0740441A2 (en) * 1995-04-28 1996-10-30 Sun Microsystems, Inc. A method for distribution of utilization data in ATM networks and a communication network therefor

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BOECKING S: "SOCKETS++: A UNIFORM APPLICATION PROGRAMMING INTERFACE FOR BASIC-LEVEL COMMUNICATION SERVICES", IEEE COMMUNICATIONS MAGAZINE, vol. 34, no. 12, 1 December 1996 (1996-12-01), XP000636462, ISSN: 0163-6804 *
KOIZUMI M ET AL: "DCNP: DATA COMMUNICATION MANAGEMENT SYSTEM FOR NETWORK NODE PROCESSORS IN A DISTRIBUTED PROCESSING ENVIRONMENT", COMMUNICATION FOR GLOBAL USERS, INCLUDING A COMMUNICATIONS THEORY MINI CONFERENCE ORLANDO, DEC. 6 - 9, 1992, vol. 2, 6 December 1992 (1992-12-06), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 1219 - 1223, XP000357744, ISBN: 0-7803-0608-2 *
MENGJOU LIN ET AL: "DISTRIBUTED NETWORK COMPUTING OVER LOCAL ATM NETWORKS", IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, vol. 13, no. 4, 1 May 1995 (1995-05-01), pages 733 - 747, XP000501268, ISSN: 0733-8716 *
NEWMAN I ET AL: "HOT POTATO WORM ROUTING VIA STORE-AND-FORWARD PACKET ROUTING", JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, vol. 30, no. 1, 1 October 1995 (1995-10-01), pages 76 - 84, XP000527355, ISSN: 0743-7315 *
NEWMAN P ET AL: "IP SWITCHING - ATM UNDER IP", IEEE/ACM TRANSACTIONS ON NETWORKING, vol. 6, no. 2, 1 April 1998 (1998-04-01), pages 117 - 129, XP000751625, ISSN: 1063-6692 *

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA MX

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase