CN101040489B - Network device architecture for consolidating input/output and reducing latency - Google Patents

Network device architecture for consolidating input/output and reducing latency

Info

Publication number
CN101040489B
CN101040489B (application CN200580034646A)
Authority
CN
China
Prior art keywords
frame
buffer
virtual channel
rules
received
Prior art date
Application number
CN 200580034646
Other languages
Chinese (zh)
Other versions
CN101040489A (en)
Inventor
Silvano Gai
Thomas Edsall
Davide Bergamasco
Dinesh Dutt
Flavio Bonomi
Original Assignee
Cisco Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US60/621,396
Priority to US11/094,877 (US7830793B2)
Application filed by Cisco Technology, Inc.
Priority to PCT/US2005/037239 (WO2006057730A2)
Publication of CN101040489A
Application granted
Publication of CN101040489B


Abstract

The present invention provides methods and devices for implementing a Low Latency Ethernet ("LLE") solution, also referred to herein as a Data Center Ethernet ("DCE") solution, which simplifies the connectivity of data centers and provides a high-bandwidth, low-latency network for carrying Ethernet and storage traffic. Some aspects of the invention involve transforming FC frames into a format suitable for transport on Ethernet. Some preferred implementations of the invention implement multiple virtual lanes ("VLs") in a single physical connection of a data center or similar network. Some VLs are "drop" VLs, with Ethernet-like behavior, and others are "no-drop" lanes with FC-like behavior. Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VLs. Active buffer management allows both high reliability and low latency while using small frame buffers. Preferably, the rules for active buffer management are different for drop and no-drop VLs.

Description

Network Device Architecture for Consolidating Input/Output and Reducing Latency

[0001] Cross-Reference to Related Applications

[0002] This application claims priority to U.S. Provisional Application No. 60/621,396 (attorney docket No. CISCP404P), entitled "FC Over Ethernet" and filed October 22, 2004, and to U.S. Patent Application No. 11/094,877 (attorney docket No. CISCP417), entitled "Network Device Architecture For Consolidating Input/Output And Reducing Latency" and filed March 30, 2005, the entire contents of which are hereby incorporated by reference.

Background

[0003] FIG. 1 shows a simplified version of the general type of data center that an enterprise requiring high availability and network storage (e.g., a financial institution) might use. Data center 100 includes redundant Ethernet switches with redundant connections for high availability. Data center 100 is connected to clients via network 105 and firewall 115. Network 105 may be, for example, an enterprise intranet, a DMZ and/or the Internet. Ethernet is well suited for TCP/IP traffic between clients (e.g., remote clients 180 and 185) and a data center.

[0004] Within data center 105, there are many network devices. For example, many servers are typically arranged in racks having a standard form factor (e.g., one "rack unit" may be 19" wide and about 1.25" thick). A "rack unit," or "U," is an Electronic Industries Alliance (more commonly "EIA") standard measuring unit for rack-mounted equipment. The term has recently become more popular due to the proliferation of rack-mounted products appearing in a wide range of commercial, industrial and military markets. A "rack unit" is 1.75" in height. To calculate the internal usable space of a rack enclosure, simply multiply the total number of rack units by 1.75". For example, a 44U rack enclosure would have 77" of internal usable space (44 x 1.75). Each rack in a data center may hold, for example, about 40 servers, and a data center may have thousands or even more servers. More recently, some vendors have introduced "blade servers," which allow even higher-density packing of servers (on the order of 60 to 80 servers per rack).

[0005] However, as the number of network devices in a data center grows, connectivity becomes increasingly complex and expensive. At a minimum, the servers, switches, etc. of data center 105 will generally be connected via Ethernet. For high availability, there will be at least two Ethernet connections, as shown in FIG. 1.

[0006] Moreover, it is not desirable for servers to include large storage capacity. For this reason and others, it has become increasingly common for enterprise networks to include connectivity with storage devices such as storage array 150. Historically, storage traffic has been implemented over SCSI (Small Computer System Interface) and/or FC (Fibre Channel).

[0007] In the mid-1990s, SCSI traffic could only travel short distances. The topic of greatest interest at the time was how to get SCSI "out of the box." As always, greater speed was desired. At the time, Ethernet was evolving from 10 Mb/s to 100 Mb/s. Some envisioned a future speed of 1 Gb/s, but many considered this to be near a physical limit. With 10 Mb/s Ethernet, there were the issues of half duplex and collisions. Ethernet was regarded as somewhat unreliable, in part because packets could be lost and because collisions could occur. (Although the terms "packet" and "frame," as typically used by those of skill in the art, may have slightly different meanings, the two terms are used interchangeably herein.)

[0008] FC was considered an attractive and reliable option for storage applications, because under the FC protocol packets are not intentionally dropped and because FC could already run at 1 Gb/s. However, by 2004 both Ethernet and FC had reached speeds of 10 Gb/s. In addition, Ethernet had evolved to be full duplex and collision-free. Accordingly, FC no longer had a speed advantage over Ethernet. Congestion in a switch, however, may cause Ethernet packets to be dropped, which is an undesirable characteristic for storage traffic.

[0009] During the early years of the 21st century, a significant amount of work went into the development of iSCSI in order to implement SCSI over TCP/IP networks. Although these efforts met with some success, iSCSI did not become very popular: iSCSI has about 1%-2% of the storage network market, as compared to approximately 98%-99% for FC.

[0010] One reason is that the iSCSI stack is somewhat complex as compared to the FC stack. Referring to FIG. 7A, it may be seen that iSCSI stack 700 requires 5 layers: Ethernet layer 705, IP layer 710, TCP layer 715, iSCSI layer 720 and SCSI layer 725. TCP layer 715 is a necessary part of the stack because Ethernet layer 705 may lose packets, whereas SCSI layer 725 does not tolerate packet loss. TCP layer 715 provides SCSI layer 725 with reliable packet transmission. However, at speeds of 1 to 10 Gb/s, TCP layer 715 is a difficult protocol to implement. In contrast, because FC does not lose frames, there is no need for a layer such as a TCP layer to compensate for lost frames. Therefore, as shown in FIG. 7B, FC stack 750 is simpler, requiring only FC layer 755, FCP layer 760 and SCSI layer 765.

[0011] Accordingly, the FC protocol is normally used for communication between servers and storage devices (e.g., storage array 150) on a network. Data center 105 therefore includes FC switches 140 and 145, provided in this example by Cisco Systems, Inc., for communication between servers 110 and storage array 150.

[0012] 1RU and blade servers are very popular because they are relatively inexpensive, powerful, standardized and can run any of the most popular operating systems. It is well known that in recent years the cost of a typical server has decreased while its performance level has increased. Because of the relatively low cost of servers and the potential problems that can arise from running more than one type of software application on a single server, each server is typically dedicated to a particular application. The large number of applications running on a typical enterprise network continues to increase the number of servers in the network.

[0013] However, because of the complexity of maintaining various types of connectivity (e.g., Ethernet and FC connectivity) with each server, each type of connectivity preferably being redundant for high availability, the cost of a server's connectivity can exceed the cost of the server itself. For example, the cost of a single FC interface for a server may be as high as the cost of the server itself. A server's connection to an Ethernet is generally made via a network interface card ("NIC"), while its connection to an FC network is made with a host bus adapter ("HBA").

[0014] The roles of devices in FC networks and Ethernet networks are somewhat different with regard to network traffic, mainly because packets are routinely dropped in response to congestion in a TCP/IP network, whereas frames are not intentionally dropped in an FC network. Accordingly, FC will sometimes be referred to herein as an example of a "no-drop" network, whereas Ethernet will be referred to as a manifestation of a "drop" network. When packets are dropped on a TCP/IP network, the system recovers quickly, e.g., within a few hundred microseconds. However, the protocols of an FC network are generally based on the assumption that frames are not dropped. Therefore, when frames are dropped on an FC network, the system does not recover quickly and SCSI may take several minutes to recover.

[0015] Currently, the ports of an Ethernet switch may buffer a packet for up to about 100 milliseconds before dropping it. With the implementation of 10 Gb/s Ethernet, each port of an Ethernet switch would need about 100 MB of RAM in order to buffer a packet for 100 milliseconds. This would be prohibitively expensive.

[0016] For some enterprises, it is desirable to "cluster" more than one server, as indicated by the dashed line around servers S2 and S3 in FIG. 1. Clustering allows a group of servers to be treated as a single server. For clustering, it is desirable to perform remote direct memory access ("RDMA"), wherein the contents of one virtual memory space (which may be scattered over many physical memory spaces) can be copied to another virtual memory space without CPU intervention. RDMA should be performed with very low latency. In some enterprise networks there is a third type of network dedicated to clustering servers, as illustrated by switch 175. This may be, for example, a "Myrinet," "Quadrics" or "Infiniband" network.

[0017] Therefore, clustering of servers adds yet more complexity to a data center network. However, unlike Quadrics and Myrinet, Infiniband allows clustering and offers the possibility of simplifying the data center network. Infiniband network devices are relatively inexpensive, mainly because they use small buffer spaces, copper media and simple forwarding schemes.

[0018] However, Infiniband networks have several drawbacks. For example, there is currently only one source of Infiniband switch components. Moreover, Infiniband has not been proven to work properly in the context of, e.g., a large enterprise's data center. For example, there is no known implementation of Infiniband routers for interconnecting Infiniband subnets. Although there may be gateways between Infiniband and Fibre Channel and between Infiniband and Ethernet, the likelihood of Ethernet being removed from the data center is very small. This also means that hosts would need not only an Infiniband connection but also an Ethernet connection.

[0019] Therefore, even if a large enterprise were willing to ignore the foregoing shortcomings and change to an Infiniband-based system, the enterprise would need to keep a legacy data center network (e.g., as shown in FIG. 1) installed and working while the enterprise tests the Infiniband-based system. Accordingly, the cost of an Infiniband-based system would not be a replacement cost but an additional cost.

[0020] It would be very desirable to simplify data center networks in a manner that allows an evolutionary change relative to existing data center networks. An ideal system would provide an evolutionary approach for consolidating server I/O at low cost while providing low latency and high speed.

Summary of the Invention

[0021] The present invention provides methods and devices for implementing a Low Latency Ethernet ("LLE") solution, also referred to herein as a Data Center Ethernet ("DCE") solution, which simplifies the connectivity of data centers and provides a high-bandwidth, low-latency network for carrying Ethernet and storage traffic. Some aspects of the invention involve transforming FC frames into a format suitable for transport on Ethernet.

[0022] Some preferred implementations of the invention implement multiple virtual lanes ("VLs," also referred to as virtual links) in a single physical connection of a data center or similar network. Some VLs are "drop" VLs, with Ethernet-like behavior, and others are "no-drop" lanes with FC-like behavior. Some implementations provide intermediate behaviors between "drop" and "no-drop." Some such implementations are "delayed drop," wherein frames are not immediately dropped when a buffer is full; instead, there is an upstream "push back" for a limited time (e.g., on the order of milliseconds) before a frame is dropped.

[0023] VLs may be implemented, in part, by tagging frames. Because each VL may have its own credits, each VL may be treated independently. The performance of each VL may even be determined according to the credits assigned to the VL and the replenishment rate. To allow more complex topologies and to allow better management of frames within a switch, TTL information as well as a frame length field may be added to a frame. There may also be encoded information regarding congestion, so that a source may receive an explicit message to slow down.

[0024] Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VLs. Different VLs may be assigned different guaranteed bandwidths that can change over time. Preferably, a VL keeps the same characteristics (e.g., it remains a drop or a no-drop lane), but the bandwidth of a VL may change dynamically depending on the time of day, the tasks to be completed, and so on.

[0025] Active buffer management allows both high reliability and low latency while using small frame buffers, even for 10 Gb/s Ethernet. Preferably, the rules for active buffer management are different for different types of VLs, e.g., for drop and no-drop VLs. Some embodiments of the invention are implemented with copper media rather than fiber optics. Given all of these attributes, I/O consolidation may be achieved in a competitive, relatively inexpensive manner.

[0026] Some aspects of the invention provide a method for processing more than one type of network traffic in a single network device. The method includes the steps of: receiving first and second frames on a physical link that is logically partitioned into a plurality of virtual lanes; partitioning a buffer of the network device into a first buffer space for first frames received on a first virtual lane and a second buffer space for second frames received on a second virtual lane; storing the first frame in the first buffer space; storing the second frame in the second buffer space; applying a first set of rules to the first frame, the first set of rules causing the first frame to be dropped in response to latency; and applying a second set of rules to the second frame, the second set of rules not causing the second frame to be dropped in response to latency. The method may also include the step of assigning a guaranteed minimum amount of buffer space for each virtual lane.
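By way of illustration only, the following C sketch restates the two-rule behavior of paragraph [0026]: one lane partition discards frames when it overflows, while the other never discards for lack of space and instead signals flow control upstream. The structure names, fields and the enqueue routine are hypothetical and are not part of the disclosed format.

```c
#include <stdbool.h>
#include <stddef.h>

enum vl_policy { VL_DROP, VL_NO_DROP };   /* per-lane behavior */

struct vl_buffer {
    enum vl_policy policy;   /* drop (Ethernet-like) or no-drop (FC-like) */
    size_t capacity;         /* bytes of buffer space assigned to this VL */
    size_t occupancy;        /* bytes currently queued on this VL */
};

/* Hypothetical enqueue decision: a drop VL discards the frame when its
 * partition is full; a no-drop VL does not discard for lack of space and
 * instead withholds credits (or asserts PAUSE) toward the sender. */
static bool enqueue_frame(struct vl_buffer *vl, size_t frame_len,
                          bool *send_flow_control)
{
    *send_flow_control = false;
    if (vl->occupancy + frame_len > vl->capacity) {
        if (vl->policy == VL_DROP)
            return false;               /* first set of rules: drop */
        *send_flow_control = true;      /* second set of rules: push back */
        return false;                   /* frame is held upstream, not lost */
    }
    vl->occupancy += frame_len;
    return true;
}
```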

[0027] The first frame may be an Ethernet frame, e.g., an extended Ethernet frame as described herein. In some implementations, the first set of rules causes the first frame to be dropped in response to latency, whereas the second set of rules does not cause the second frame to be dropped in response to latency. However, the second set of rules may cause a frame to be dropped in order to avoid deadlock. The first set of rules and/or the second set of rules may cause an explicit congestion notification to be transmitted from the network device in response to latency. The explicit congestion notification may be sent to a source device or an edge device, and may be transmitted via a data frame or a control frame.

[0028] Flow control for "no-drop" and "delayed drop" VLs may be implemented by any convenient combination of a buffer-to-buffer crediting scheme and/or PAUSE frames. For example, some implementations use a buffer-to-buffer crediting scheme within a network device and use PAUSE frames on the links for flow control. Accordingly, the second set of rules may include implementing a buffer-to-buffer crediting scheme for the second frame. The buffer-to-buffer crediting scheme includes crediting according to frame size, and may be implemented within a network device and/or on a network link. Within a network device, buffer-to-buffer credits may be managed by an arbiter. If a buffer-to-buffer crediting scheme is used both within a network device and on a link, the credits managed within the network device are preferably not the same as the credits managed on the link.

[0029] The partitioning step may include partitioning the buffer according to buffer occupancy, time of day, traffic loads, congestion, guaranteed minimum bandwidth allocation, known tasks requiring greater bandwidth, and maximum bandwidth allocation. The first and second frames may be stored in virtual output queues ("VOQs"). Each VOQ may be associated with a destination port/virtual lane pair. The method may include the step of performing buffer management in response to VOQ length, per-virtual-lane buffer occupancy, overall buffer occupancy and packet age, where packet age is the difference between the current time and the time at which a packet entered the buffer.

[0030] Some embodiments of the invention provide a network device that includes a plurality of ports configured to receive frames on a plurality of physical links and a plurality of line cards. Each port is in communication with one of the plurality of line cards. Each line card is configured to perform the following steps: receive a frame from one of the plurality of ports; identify a first frame received on a first virtual lane and a second frame received on a second virtual lane; partition a buffer of the line card into a first buffer space for the first frame and a second buffer space for the second frame; store the first frame in the first buffer space; store the second frame in the second buffer space; apply a first set of rules to the first frame, the first set of rules causing the first frame to be dropped in response to latency; and apply a second set of rules to the second frame, the second set of rules not causing the second frame to be dropped in response to latency.

[0031] Other implementations of the invention provide a method for carrying more than one class of traffic on a single network device. The method includes the steps of: identifying a first frame received on a first virtual lane and a second frame received on a second virtual lane; dynamically partitioning a buffer of the network device into a first buffer space for the first frame and a second buffer space for the second frame; storing the first frame in a first VOQ of the first buffer space; storing the second frame in a second VOQ of the second buffer space; applying a first set of rules to the first frame, the first set of rules causing the first frame to be dropped in response to latency; and applying a second set of rules to the second frame, the second set of rules not causing the second frame to be dropped in response to latency. The buffer may be dynamically partitioned according to factors such as overall buffer occupancy, per-virtual-lane buffer occupancy, time of day, traffic loads, congestion, guaranteed minimum bandwidth allocation, known tasks requiring greater bandwidth, and maximum bandwidth allocation.

[0032] Other embodiments of the invention provide a network device that includes a plurality of ports configured to receive frames on a plurality of physical links and a plurality of line cards. Each port is in communication with one of the plurality of line cards. Each line card is configured to perform the following steps: identify a first frame received on a first virtual lane and a second frame received on a second virtual lane; dynamically partition a buffer of the network device into a first buffer space for the first frame and a second buffer space for the second frame; store the first frame in a first virtual output queue ("VOQ") of the first buffer space; store the second frame in a second VOQ of the second buffer space; apply a first set of rules to the first frame, the first set of rules causing the first frame to be dropped in response to latency; and apply a second set of rules to the second frame, the second set of rules not causing the second frame to be dropped in response to latency. The buffer may be dynamically partitioned according to factors such as overall buffer occupancy, per-virtual-lane buffer occupancy, time of day, traffic loads, congestion, guaranteed minimum bandwidth allocation, known tasks requiring greater bandwidth, and maximum bandwidth allocation. The methods described herein may be implemented and/or embodied in various ways, including as hardware, software, and the like.

[0033] Brief Description of the Drawings

[0034] The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, which illustrate specific implementations of the invention.

[0035] FIG. 1 is a simplified network diagram illustrating a data center.

[0036] FIG. 2 is a simplified network diagram illustrating a data center according to one embodiment of the invention.

[0037] FIG. 3 is a block diagram illustrating multiple VLs implemented on a single physical link.

[0038] FIG. 4 illustrates one format of an Ethernet frame carrying additional fields for implementing DCE according to some implementations of the invention.

[0039] FIG. 5 illustrates one format of a link management frame according to some implementations of the invention.

[0040] FIG. 6A is a network diagram illustrating a simplified credit-based method of the invention.

[0041] FIG. 6B is a table illustrating a crediting method of the invention.

[0042] FIG. 6C is a flow chart outlining an exemplary method for initializing a link according to the invention.

[0043] FIG. 7A illustrates an iSCSI stack.

[0044] FIG. 7B illustrates a stack for implementing SCSI over FC.

[0045] FIG. 8 illustrates a stack for implementing SCSI over DCE according to some aspects of the invention.

[0046] FIGS. 9A and 9B illustrate a method for implementing FC over Ethernet according to some aspects of the invention.

[0047] FIG. 10 is a simplified network diagram for implementing FC over Ethernet according to some aspects of the invention.

[0048] FIG. 11 is a simplified network diagram for aggregating DCE switches according to some aspects of the invention.

[0049] FIG. 12 illustrates the architecture of a DCE switch according to some embodiments of the invention.

[0050] FIG. 13 is a block diagram illustrating buffer management per VL according to some implementations of the invention.

[0051] FIG. 14 is a network diagram illustrating some types of explicit congestion notification according to the invention.

[0052] FIG. 15 is a block diagram illustrating buffer management per VL according to some implementations of the invention.

[0053] FIG. 16 is a graph illustrating a probabilistic drop function according to some aspects of the invention.

[0054] FIG. 17 is a graph illustrating exemplary occupancy of a VL buffer over time.

[0055] FIG. 18 is a graph illustrating probabilistic drop functions according to other aspects of the invention.

[0056] FIG. 19 illustrates a network device that may be configured to perform some methods of the invention.

Detailed Description

[0057] Reference will now be made in detail to some specific embodiments of the invention, including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Moreover, numerous specific details are set forth below in order to provide a thorough understanding of the present invention. The present invention may be practiced without these specific details. In other instances, well-known process operations have not been described in detail in order not to obscure the present invention.

[0058] The present invention provides methods and devices for simplifying the connectivity of data centers and for providing a high-bandwidth, low-latency network for carrying Ethernet and storage traffic. Some preferred implementations of the invention implement multiple VLs in a single physical connection of a data center or similar network. Preferably, buffer-to-buffer credits are maintained for each VL. Some VLs are "drop" VLs, with Ethernet-like behavior; others are "no-drop" lanes with FC-like behavior.

[0059] Some implementations provide intermediate behaviors between "drop" and "no-drop." Some such implementations are "delayed drop," wherein frames are not immediately dropped when a buffer is full; instead, there is an upstream "push back" for a limited time (e.g., on the order of milliseconds) before a frame is dropped. Delayed drop implementations are useful for managing transient congestion.
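A rough sketch of the "delayed drop" behavior described above, under assumed names: when a lane's buffer fills, the switch first pushes back on the upstream sender (e.g., by withholding credits or asserting PAUSE) and drops the frame only if the condition persists past a bounded interval on the order of milliseconds. The state machine below is illustrative only and is not the patented mechanism itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

struct delayed_drop_state {
    bool    pushing_back;     /* upstream flow control currently asserted */
    usec_t  pushback_start;   /* time at which push-back began */
    usec_t  max_pushback_us;  /* bounded interval, e.g. a few thousand us */
};

/* Returns true only if the frame must finally be dropped. */
static bool delayed_drop(struct delayed_drop_state *s, bool buffer_full,
                         usec_t now)
{
    if (!buffer_full) {              /* congestion cleared: release push-back */
        s->pushing_back = false;
        return false;
    }
    if (!s->pushing_back) {          /* start pushing back instead of dropping */
        s->pushing_back = true;
        s->pushback_start = now;
        return false;
    }
    /* Still full: drop only after the bounded push-back interval expires. */
    return (now - s->pushback_start) > s->max_pushback_us;
}
```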

[0060] Preferably, a congestion control scheme is implemented at layer 2. Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VLs. An alternative to the use of credits is the use of the standard IEEE 802.3 PAUSE frame per VL in order to implement "no-drop" or "delayed drop" VLs. The IEEE 802.3 standard is hereby incorporated by reference for all purposes. For example, Annex 31B of the IEEE 802.3ae-2002 standard, entitled "MAC Control PAUSE Operation," is specifically incorporated by reference. It will also be understood that the invention would still work without VLs, but in that case the entire link would exhibit "drop," "delayed drop" or "no-drop" behavior.

[0061] Preferred implementations support a negotiation mechanism, for example of the kind specified by IEEE 802.1x, which is hereby incorporated by reference. The negotiation mechanism may, for example, determine whether a host device supports LLE and, if so, allow the host to receive VL and credit information, e.g., how many VLs are supported, whether a VL uses credits or PAUSE, how many credits if credits are used, and how each individual VL behaves.

[0062] Active buffer management allows both high reliability and low latency while using small frame buffers. Preferably, the rules for active buffer management are different for drop and no-drop VLs.

[0063] Some implementations of the invention support an efficient RDMA protocol that is particularly useful for clustering implementations. In some implementations of the invention, a network interface card ("NIC") implements RDMA for clustering applications and also implements a reliable transport for RDMA. Some aspects of the invention are implemented via user APIs from the User Direct Access Programming Library ("uDAPL"). The uDAPL defines a set of user APIs for all RDMA-capable transports and is hereby incorporated by reference.

[0064] FIG. 2 is a simplified network diagram illustrating one example of an LLE solution for simplifying the connectivity of data center 200. Data center 200 includes LLE switch 240, with router 260 for connectivity with TCP/IP network 205 and host devices 280 and 285 via firewall 215. The architecture of exemplary LLE switches is set forth in detail herein. Preferably, the LLE switches of the invention can run 10 Gb/s Ethernet and have relatively small frame buffers. Some preferred LLE switches support only layer 2 functionality.

[0065] Although the LLE switches of the invention can be implemented using fiber optics and optical transceivers, some preferred LLE switches are implemented using copper connectivity to reduce costs. Some such implementations follow the proposed IEEE 802.3ak standard, referred to as 10GBASE-CX4, which is hereby incorporated by reference for all purposes. The inventors expect that other implementations will use the emerging standard IEEE P802.3an (10GBASE-T), which is also incorporated by reference for all purposes.

[0066] Servers 210 are also connected with LLE switch 245, which includes FC gateway 270 for communication with disk array 250. FC gateway 270 implements FC over Ethernet (described in detail herein), thereby eliminating the need for separate FC and Ethernet networks within data center 200. Gateway 270 may be a device such as Cisco Systems' MDS 9000 IP Storage Services Module that has been configured with software for performing some methods of the present invention. Ethernet traffic is carried within data center 200 in native format. This is possible because LLE is an extension of Ethernet that can carry FC over Ethernet and RDMA in addition to native Ethernet.

[0067] FIG. 3 illustrates two switches 305 and 310 connected by physical link 315. The behavior of switches 305 and 310 is generally governed by IEEE 802.1, and the behavior of physical link 315 is generally governed by IEEE 802.3. Broadly speaking, the invention provides two general behaviors of LLE switches, plus a range of intermediate behaviors. The first general behavior is "drop" behavior, which is similar to that of Ethernet. The second general behavior is "no-drop" behavior, which is similar to that of FC. The invention also provides intermediate behaviors between "drop" and "no-drop," including but not limited to the "delayed drop" behavior described elsewhere herein.

[0068] In order to implement both behaviors on the same physical link 315, the invention provides methods and devices for implementing VLs. A VL is a way to carve a physical link into multiple logical entities such that traffic in one VL is unaffected by the traffic on other VLs. This is done by maintaining separate buffers (or separate portions of a physical buffer) for each VL. For example, one VL may be used to transmit control plane traffic and some other high-priority traffic without that traffic being blocked by low-priority bulk traffic on another VL. VLANs may be grouped into different VLs such that traffic in one set of VLANs can proceed unimpeded by traffic on other VLANs.

[0069] In the example shown in FIG. 3, switches 305 and 310 effectively provide 4 VLs on physical link 315. Here, VLs 320 and 325 are drop VLs and VLs 330 and 335 are no-drop VLs. In order to simultaneously implement both "drop" behavior and "no-drop" behavior, at least one VL must be assigned to each type of behavior, for a total of two. (In theory, there could be a single VL that is temporarily assigned to each type of behavior, but such an implementation is not preferred.) To support legacy devices and/or other devices lacking LLE functionality, preferred implementations of the invention support a link with no VLs and map all traffic of that link to a single VL at the first LLE port. From a network management perspective, it is preferable to have between 2 and 16 VLs, although more could be implemented.

[0070] Preferably, a link is dynamically partitioned into VLs, because static partitioning is less flexible. In some preferred implementations of the invention, dynamic partitioning is accomplished on a packet-by-packet basis (or frame-by-frame basis), e.g., by adding an extension header. The invention encompasses a wide variety of formats for such a header. In some implementations of the invention, there are two types of frames sent on a DCE link: data frames and link management frames.

[0071] Although FIGS. 4 and 5 illustrate formats of an Ethernet data frame and of a link management frame, respectively, for implementing some aspects of the invention, other implementations of the invention provide frames with more or fewer fields, with a different ordering, or with other variations. Fields 405 and 410 of FIG. 4 are the standard Ethernet fields for the destination address and the source address of the frame, respectively. Similarly, protocol type field 430, payload 435 and CRC field 440 may be those of a standard Ethernet frame.

[0072] However, protocol type field 420 indicates that the following fields are those of DCE header 425. If present, the DCE header is preferably placed as near to the beginning of the frame as possible, because this makes it easy to parse in hardware. The DCE header may be carried in Ethernet data frames, as shown in FIG. 4, as well as in link management frames (see FIG. 5 and the corresponding description). The header is preferably stripped by the MAC and does not need to be stored in the frame buffers. In some implementations of the invention, a continuous stream of link management frames is generated when there is no data traffic or when regular frames cannot be sent due to lack of credits.

[0073] Most of the information carried in the DCE header pertains to the Ethernet frame that contains the DCE header. However, some fields are buffer credit fields used to replenish credits for traffic in the opposite direction. In this example, the buffer credit fields are carried only by frames having a long DCE header. If a solution uses PAUSE frames instead of credits, the credit fields may not be needed.

[0074] TTL field 445 indicates a time to live, which is a number decremented each time frame 400 is forwarded. Normally, a layer 2 network does not need a TTL field. Ethernet uses a spanning tree topology, which is very conservative. A spanning tree imposes constraints on the active topology and allows only one path for a packet from one switch to another.

[0075] In preferred implementations of the invention, this restriction on the active topology is not followed. Instead, multiple paths are preferably active at the same time, e.g., via a link state protocol such as OSPF (Open Shortest Path First) or IS-IS (Intermediate System to Intermediate System). However, link state protocols are known to cause transient loops during topology reconfiguration. Using a TTL or a similar feature ensures that transient loops do not become a major problem. Therefore, in preferred implementations of the invention, a TTL is encoded in the frame in order to effectively implement a link state protocol at layer 2. Instead of using a link state protocol, some implementations of the invention use multiple spanning trees rooted at different LLE switches and obtain similar behavior.

[0076] Field 450 identifies the VL of frame 400. Identification of the VL according to field 450 allows devices to assign a frame to the proper VL and to apply different rules to different VLs. As described in detail elsewhere herein, the rules will differ according to various criteria, e.g., whether a VL is a drop or a no-drop VL, whether the VL has guaranteed bandwidth, whether there is currently congestion on the VL, and other factors.

[0077] ECN (explicit congestion notification) field 455 is used to indicate that a buffer (or the portion of a buffer allocated to this VL) is filling up and that, for the indicated VL, the source should slow down its transmission rate. In preferred implementations of the invention, at least some host devices of the network understand the ECN information and will apply a shaper, i.e., a rate limiter, for the indicated VL. Explicit congestion notification can take place in at least two general ways. In one method, a packet is sent for the express purpose of carrying the ECN. In another method, the notification is "piggy-backed" on a packet that would have been transmitted anyway.

[0078] As described elsewhere herein, explicit congestion notifications may be sent to a source or to an edge device. An ECN may originate in various devices of the DCE network, including end devices and core devices. As discussed in more detail below in the switch architecture section, congestion notification and the responses to it are an important part of controlling congestion while maintaining small buffer sizes.

[0079] Some implementations of the invention allow an ECN to be sent upstream from the originating device and/or allow an ECN to be sent downstream and then back upstream. For example, ECN field 455 may include a forward ECN portion ("FECN") and a backward ECN portion ("BECN"). When a switch port experiences congestion, it may set a bit in the FECN portion and forward the frame normally. Upon receiving a frame with the FECN bit set, an end station sets the BECN bit and the frame is sent back to the source. The source receives the frame, detects that the BECN bit has been set, and reduces the traffic it injects into the network, at least for the indicated VL.

[0080] Frame credit field 465 is used to indicate the number of credits that should be allocated for frame 400. There are many possible ways of implementing such a system within the scope of the invention. The simplest solution is to credit per individual packet or frame. From a buffer management perspective this may not be the best solution: if a buffer is reserved for a single credit and one credit is applied per packet, an entire buffer is reserved for a single packet. Even if the buffer size is only equal to the expected standard frame size, such a crediting scheme often results in low utilization of each buffer, because many frames will be smaller than the maximum size. For example, if the standard frame size is 9 KB and all buffers are 9 KB, but the average frame size is 1500 bytes, typically only about 1/6 of each buffer is used.
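As an illustration of the FECN/BECN exchange described in paragraph [0079] above, the following hypothetical C sketch shows the sequence: a congested switch marks the frame and forwards it, the end station reflects the marking back toward the source, and the source rate-limits the indicated VL. The frame representation and helper functions are assumptions made for illustration, not the disclosed encoding.

```c
#include <stdbool.h>
#include <stdint.h>

struct ecn_bits { bool fecn; bool becn; };

struct dce_frame {
    uint8_t         vl;    /* virtual lane identifier */
    struct ecn_bits ecn;   /* forward and backward congestion bits */
    /* ... remaining header and payload fields ... */
};

/* Congested switch port: mark the frame and forward it normally. */
static void switch_mark_congestion(struct dce_frame *f) { f->ecn.fecn = true; }

/* End station: reflect a received FECN back to the source as BECN. */
static bool end_station_reflect(const struct dce_frame *rx, struct dce_frame *tx)
{
    if (!rx->ecn.fecn)
        return false;
    tx->vl = rx->vl;
    tx->ecn.becn = true;   /* frame sent back toward the source */
    return true;
}

/* Source: on BECN, apply a shaper (rate limiter) to the indicated VL. */
static void source_handle_becn(const struct dce_frame *rx,
                               uint32_t rate_limit_bps[], uint32_t reduced_bps)
{
    if (rx->ecn.becn)
        rate_limit_bps[rx->vl] = reduced_bps;  /* slow down traffic on that VL */
}
```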

[0081] A better solution is to credit according to frame size. Although a credit could be applied to, e.g., a single byte, in practice it is better to use larger units, e.g., 64B, 128B, 256B, 512B, 1024B, and so on. For example, if a credit corresponds to a 512B unit, the aforementioned average 1500-byte frame would require 3 credits. If such a frame were transmitted according to one such implementation of the invention, frame credit field 465 would indicate that the frame requires 3 credits.
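As a worked illustration of crediting by frame size, the short C sketch below computes how many credits a frame consumes when one credit covers a fixed unit such as 512 bytes; it reproduces the 1500-byte/3-credit example from the text. The function name and the rounding convention are assumptions for illustration.

```c
#include <stdint.h>
#include <assert.h>

/* Credits consumed by a frame when one credit covers 'unit' bytes
 * (rounded up, since a partially used buffer unit is still reserved). */
static uint32_t credits_for_frame(uint32_t frame_len, uint32_t unit)
{
    return (frame_len + unit - 1) / unit;
}

int main(void)
{
    assert(credits_for_frame(1500, 512) == 3);  /* example from the text */
    assert(credits_for_frame(9000, 512) == 18); /* a 9 KB jumbo frame */
    return 0;
}
```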

[0082] Crediting according to frame size allows a more efficient use of buffer space. Knowing the size of a packet not only indicates how much buffer space will be needed, but also indicates when the packet can be removed from the buffer. This is particularly important, for example, if the internal transmission speed of the switch differs from the rate at which data arrive at the switch port.

[0083] This example provides a longer and a shorter version of the DCE header. Long header field 460 indicates whether the DCE header is the long or the short version. In this implementation, all data frames contain at least the short header, which includes the TTL, VL, ECN and frame credit information in fields 445, 450, 455 and 465, respectively. A data frame may contain the long header if, in addition to the information present in the short header, it needs to carry the credit information associated with each VL. In this example there are 8 VLs and 8 corresponding fields indicating the buffer credits for each VL. The use of both a short DCE header and a long DCE header reduces the overhead of carrying credit information in every frame.
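For illustration only, the following C structures show one possible way to hold the short and long header variants described above: the short form carries the TTL, VL, ECN and frame-credit fields, and the long form adds one buffer-credit counter per VL. The field widths and names are assumptions and are not the encoding mandated by the patent.

```c
#include <stdint.h>

#define NUM_VLS 8

/* Short DCE header: carried by every data frame (assumed field widths). */
struct dce_short_hdr {
    uint8_t ttl;            /* decremented at each hop (field 445) */
    uint8_t vl;             /* virtual lane identifier (field 450) */
    uint8_t ecn;            /* explicit congestion notification (field 455) */
    uint8_t long_hdr;       /* nonzero if the long form follows (field 460) */
    uint8_t frame_credit;   /* credits consumed by this frame (field 465) */
};

/* Long DCE header: short header plus per-VL buffer-credit replenishment. */
struct dce_long_hdr {
    struct dce_short_hdr base;
    uint8_t buffer_credit[NUM_VLS];  /* credits returned for each active VL */
};
```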

[0084] When there are no data frames to send, some embodiments of the invention cause link management frames ("LMFs") to be sent in order to announce credit information. An LMF may also be used to carry buffer credits from the receiver or to carry transmitted frame credits from the sender. An LMF should be sent without consuming credits (Frame Credit = 0), because it is preferably consumed by the port rather than forwarded. LMFs may be sent periodically and/or in response to predetermined conditions, for example after every 10 MB of payload has been transmitted by data frames.

[0085] FIG. 5 illustrates an example of an LMF format according to some implementations of the invention. LMF 500 begins with standard 6-byte Ethernet fields 510 and 520 for the destination address and the source address of the frame, respectively. Protocol type header 530 indicates that DCE header 540 follows; in this example the DCE header is a short DCE header (e.g., long header field = 0). The VL, TTL, ECN and frame credit fields of DCE header 540 are set to zero by the sender and ignored by the receiver. Accordingly, an LMF may be identified by the following characteristics: Protocol Type = DCE Header, Long Header = 0 and Frame Credit = 0.

[0086] Field 550 indicates the receiver buffer credits for the active VLs. In this example there are 8 active VLs, so fields 551 through 558 indicate the buffer credits for each active VL. Similarly, field 560 indicates the credits of the sending device, so fields 561 through 568 indicate the frame credits for each active VL.

[0087] LMF 500 does not contain any payload. If necessary, as in this example, LMF 500 is padded to 64 bytes by pad field 570 in order to create a legal minimum-size Ethernet frame. LMF 500 terminates with standard Ethernet CRC field 580.
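For illustration, a hypothetical C layout of the link management frame described in paragraphs [0085]-[0087]: standard Ethernet addressing, a short DCE header whose VL/TTL/ECN/frame-credit fields are zeroed, eight receiver buffer-credit fields, eight sender frame-credit fields, and padding up to the minimum Ethernet frame size (the CRC follows separately). The exact field widths, including the 5-byte short header, are assumptions.

```c
#include <stdint.h>

#define NUM_VLS 8

/* Hypothetical link management frame (LMF) layout, before the CRC. */
struct lmf {
    uint8_t  dst_mac[6];                  /* field 510 */
    uint8_t  src_mac[6];                  /* field 520 */
    uint16_t protocol_type;               /* field 530: DCE header follows */
    uint8_t  dce_short_hdr[5];            /* field 540: VL/TTL/ECN/credit = 0 */
    uint8_t  rx_buffer_credit[NUM_VLS];   /* fields 551-558 */
    uint8_t  tx_frame_credit[NUM_VLS];    /* fields 561-568 */
    /* field 570: pad to the 64-byte minimum frame (4-byte CRC not shown) */
    uint8_t  padding[60 - 6 - 6 - 2 - 5 - NUM_VLS - NUM_VLS];
};
```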

[0088] In general, the buffer-to-buffer crediting scheme of the invention is implemented according to the following two rules: (1) the sender sends a frame when the number of credits received from the receiver is greater than or equal to the number of credits required by the frame to be sent; and (2) the receiver sends credits to the sender when it is able to accept additional frames. As described above, credits may be replenished using either data frames or LMFs. A port is allowed to send a frame on a particular VL only when credits at least equal in number to the frame length (excluding the length of the DCE header) are available.
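
By way of illustration only, the following minimal sketch shows one way rules (1) and (2) could be realized for a single VL; the credit unit size, class names and method names are assumptions introduced here and are not specified in this description.

```python
# Minimal sketch of the sender-side crediting rules for one VL.
CREDIT_UNIT = 512  # assumed: one credit covers 512 bytes of frame (excluding the DCE header)

def credits_needed(frame_len_excl_dce_header: int) -> int:
    """Credits required by a frame, based on its length excluding the DCE header."""
    return -(-frame_len_excl_dce_header // CREDIT_UNIT)  # ceiling division

class VlSenderState:
    def __init__(self):
        self.available_credits = 0   # replenished by data frames or LMFs from the receiver

    def can_send(self, frame_len_excl_dce_header: int) -> bool:
        # Rule (1): send only if available credits >= credits needed by the frame.
        return self.available_credits >= credits_needed(frame_len_excl_dce_header)

    def on_send(self, frame_len_excl_dce_header: int) -> None:
        self.available_credits -= credits_needed(frame_len_excl_dce_header)

    def on_credit_grant(self, credits: int) -> None:
        # Rule (2): the receiver grants credits when it can accept additional frames.
        self.available_credits += credits
```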

[0089] If pause frames are used instead of credits, similar rules apply. The sender sends a frame when transmission has not been paused by the receiver. The receiver sends a pause frame to the sender when it cannot accept additional frames.

[0090] The following is a simplified example of data transfer and credit replenishment. Figure 6A shows a data frame 605, having a short DCE header, sent from switch B to switch A. After packet 605 arrives at switch A, it is stored in memory space 608 of buffer 610. Because some of the memory of buffer 610 has been consumed, the credits available to switch B are reduced accordingly. Similarly, when data frame 615 (also having a short DCE header) is sent from switch A to switch B, it consumes memory space 618 of buffer 620, so the credits available to switch A are reduced accordingly.

[0091] However, after frames 605 and 615 have been forwarded, the corresponding memory space in those buffers becomes available again. At some point, for example periodically or as needed, the fact that this buffer space is available again should be communicated to the device at the other end of the link. Data frames with long DCE headers and LMFs are used to replenish credits. If credits are not being replenished, the short DCE header may be used. Although some implementations use the longer DCE header for all transmissions, such implementations are less efficient, because excess bandwidth is consumed by packets that contain no credit replenishment information.

[0092] Figure 6B shows an example of the credit signaling method of the invention. A conventional credit signaling scheme 650 advertises the new credits that the receiver wishes to return. For example, at time t4 the receiver wishes to return 5 credits, so the value 5 is carried in the frame. At time t5 the receiver has no credits to return, so the value 0 is carried in the frame. If the frame at time t4 is lost, five credits are lost.

[0093] The DCE scheme 660 advertises a cumulative credit value. In other words, each advertisement adds the new credits to be returned to the total number of credits previously returned, modulo m (for 8 bits, m is 256). For example, at time t3 the total number of credits returned since link initialization is 3; at time t4, 5 credits need to be returned, so 5 is added to 3 and 8 is sent in the frame. At time t5 no credits need to be returned, so 8 is sent again. If the frame at time t4 is lost, no credits are lost, because the frame at time t5 contains the same information.
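
The cumulative advertisement can be sketched as follows, assuming an 8-bit credit field (modulo 256); the class and variable names are illustrative and do not appear in the description.

```python
# Sketch of cumulative credit signaling (scheme 660) and its loss tolerance.
MOD = 256  # 8-bit credit field

class CreditReceiver:
    """Receiver side: advertises the running total of credits returned, modulo 256."""
    def __init__(self):
        self.total_returned = 0

    def advertise(self, new_credits: int) -> int:
        self.total_returned = (self.total_returned + new_credits) % MOD
        return self.total_returned  # value carried in the frame's credit field

class CreditSender:
    """Sender side: recovers the newly granted credits from the cumulative value."""
    def __init__(self):
        self.last_seen = 0

    def on_advertisement(self, advertised_total: int) -> int:
        granted = (advertised_total - self.last_seen) % MOD
        self.last_seen = advertised_total
        return granted

# If the frame carrying "8" at time t4 is lost, the identical "8" at time t5
# still yields the missing 5 credits, so a single frame loss loses no credits.
```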

[0094] According to one exemplary implementation of the invention, the receiving DCE switch maintains the following information (where the VL notation indicates information maintained for each virtual channel):

[0095] · BufCrd[VL] - a modulo counter incremented by the number of credits that can be sent;

[0096] · BytesFromLastLongDCE - the number of bytes sent since the last long DCE header;

[0097] · BytesFromLastLMF - the number of bytes sent since the last LMF;

[0098] · MaxIntBetLongDCE - the maximum interval between transmissions of long DCE headers;

[0099] · MaxIntBetLMF - the maximum interval between transmissions of LMFs; and

[0100] · FrameRx - a modulo counter incremented by the FrameCredit field of received frames.

[0101] The sending DCE switch port maintains the following information:

[0102] · LastBufCrd[VL] - the latest estimate of the receiver's BufCrd[VL] variable; and

[0103] · FrameCrd[VL] - a modulo counter incremented by the number of credits used to send frames.
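
For illustration, the per-port state listed above could be represented as follows, assuming 8 virtual channels and 16-bit internal counters (the 16-bit width is described below for the recovery mechanism); the dataclass layout itself is an assumption.

```python
# Illustrative representation of the receiver- and sender-side per-port state.
from dataclasses import dataclass, field

NUM_VL = 8
COUNTER_MOD = 1 << 16  # internal counters; only the low 8 bits travel in long DCE headers

@dataclass
class DceReceiverPortState:
    BufCrd: list = field(default_factory=lambda: [0] * NUM_VL)   # credits announceable, per VL
    BytesFromLastLongDCE: int = 0
    BytesFromLastLMF: int = 0
    MaxIntBetLongDCE: int = 0   # configured maximum interval between long DCE headers
    MaxIntBetLMF: int = 0       # configured maximum interval between LMFs
    FrameRx: int = 0            # incremented by the FrameCredit field of received frames

@dataclass
class DceSenderPortState:
    LastBufCrd: list = field(default_factory=lambda: [0] * NUM_VL)  # estimate of receiver BufCrd[VL]
    FrameCrd: list = field(default_factory=lambda: [0] * NUM_VL)    # credits used to send frames, per VL
```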

[0104] When a link comes up, the network devices at each end of the link negotiate the presence of the DCE header. If the header is not present, the network devices will, for example, simply enable the link for standard Ethernet operation. If the header is present, the network devices will enable the features of a DCE link according to some aspects of the invention.

[0105] Figure 6C is a flow chart showing how a DCE link is initialized according to some implementations of the invention. Those of skill in the art will appreciate that the steps of method 680 (like those of the other methods described herein) need not be performed in the order indicated, and in some cases are not performed in the order indicated. Moreover, some implementations of these methods include more or fewer steps than indicated.

[0106] In step 661, the physical link between two switch ports is established, and in step 663 a first packet is received. In step 665 it is determined (by the receiving port) whether the packet has a DCE header. If not, the link is enabled to carry standard Ethernet traffic. If the packet has a DCE header, the ports perform steps to configure the link as a DCE link. In step 671, the receiver and the sender zero all arrays related to traffic on the link. In step 673 the value of MaxIntBetLongDCE is initialized to a configured value, and in step 675 MaxIntBetLMF is initialized to a configured value.

[0107] In step 677, the two DCE ports exchange available credit information for each VL, preferably by sending LMFs. If a VL is not used, its available credit is announced as 0. In step 679 the link is enabled for DCE, and normal DCE traffic, including data frames, may be sent on the link according to the methods described herein.
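
A compact sketch of this bring-up sequence is given below; the port object, its method names and the fixed count of 8 VLs are assumptions made purely for illustration.

```python
# Sketch of the DCE link initialization of Figure 6C (steps 661-679).
def initialize_link(port, first_packet, configured_long_dce_interval, configured_lmf_interval):
    # Step 665: does the first received packet carry a DCE header?
    if not first_packet.has_dce_header():
        port.enable_standard_ethernet()          # non-DCE peer: plain Ethernet operation
        return

    # Steps 671-675: configure the link as a DCE link.
    port.clear_traffic_arrays()                  # zero all per-link traffic state
    port.MaxIntBetLongDCE = configured_long_dce_interval
    port.MaxIntBetLMF = configured_lmf_interval

    # Step 677: announce per-VL available credits via an LMF (0 for unused VLs).
    port.send_lmf(credits_per_vl=[port.available_credits(vl) if port.vl_in_use(vl) else 0
                                  for vl in range(8)])

    # Step 679: the link may now carry normal DCE traffic, including data frames.
    port.enable_dce()
```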

[0108] To work correctly in the presence of a single frame loss, the DCE self-recovery mechanism of preferred implementations requires that the maximum number of credits advertised in a frame be less than 1/2 of the maximum advertisable value. In some implementations of the short DCE header, each credit field is 8 bits, i.e. a value of 256. Thus, up to 127 additional credits can be advertised in a single frame. A maximum of 127 credits is reasonable, because the worst case is represented by a long run of minimum-size frames in one direction and a single jumbo frame in the opposite direction. During the transmission of a 9 KB jumbo frame, the maximum number of minimum-size frames is about 9220 B / 84 B = 110 credits (assuming a maximum transmission unit of 9200 bytes plus 20 bytes of IPG and preamble).

[0109] If multiple consecutive frames are lost, an LMF recovery method can "repair" the link. One such LMF recovery method is based on the observation that, in some implementations, the internal counters maintained by the ports of a DCE switch are 16 bits, but to save bandwidth only the lower 8 bits are sent in the long DCE header. This works well if no consecutive frames are lost, as described above. When the link experiences multiple consecutive errors, the long DCE header may no longer be able to synchronize the counters, but this is accomplished through the LMF, which contains all 16 bits of every counter. The 8 additional bits allow recovering from 256 times more errors, i.e. a total of 512 consecutive errors. Preferably, before this situation is encountered, the link is declared inoperative and reset.
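
The two-level synchronization can be sketched as below; the unwrapping logic is an assumption about how a receiver could reconstruct the full 16-bit value, and is not quoted from the description.

```python
# Sketch of counter resynchronization: long DCE headers carry only the low 8 bits
# of a 16-bit internal counter, while an LMF carries the full 16 bits.

def sync_from_long_dce_header(local_estimate_16: int, advertised_low8: int) -> int:
    """Advance the 16-bit estimate so its low 8 bits match the advertised byte."""
    delta = (advertised_low8 - (local_estimate_16 & 0xFF)) & 0xFF
    return (local_estimate_16 + delta) & 0xFFFF   # tolerates a short burst of frame loss

def sync_from_lmf(advertised_full16: int) -> int:
    """An LMF carries all 16 counter bits, so it resynchronizes the link exactly."""
    return advertised_full16 & 0xFFFF
```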

[0110] To implement a low latency Ethernet system, at least three general types of traffic must be considered. These types are IP network traffic, storage traffic and cluster traffic. As described in detail above, LLE provides "no-drop" VLs with FC-like characteristics suitable, for example, for storage traffic. A "no-drop" VL does not lose packets/frames and can be provided according to a simple stack such as that shown in Figure 8. Only a thin "shim" of FC over LLE 810 sits between the LLE layer 805 and FC Layer 2 (815). Layers 815, 820 and 825 are the same as those of the FC stack 750. Accordingly, storage applications that previously ran over FC can now run over LLE.

[0111] The mapping of an FC frame to an FC over Ethernet frame according to one exemplary implementation of the FC over LLE layer 810 will now be described with reference to Figures 9A, 9B and 10. Figure 9A is a simplified version of an FC frame. FC frame 900 includes an SOF 905 and an EOF 910, which are ordered sets of symbols used not only to delimit the boundaries of frame 900, but also to convey information such as the class of the frame, whether the frame is the start or the end of a sequence (a group of FC frames), and whether the frame is normal or abnormal. At least some of these symbols are illegal "code violation" symbols. FC frame 900 also includes a 24-bit source FC ID field 915, a 24-bit destination FC ID field 920 and a payload 925.

[0112] One goal of the invention is to convey on Ethernet the storage information contained in FC frames such as FC frame 900. Figure 10 shows an implementation of the invention for an LLE capable of conveying such storage traffic. Network 1000 includes an LLE cloud 1005, to which devices 1010, 1015 and 1020 are attached. LLE cloud 1005 includes a plurality of LLE switches 1030, exemplary architectures of which are discussed elsewhere herein. Devices 1010, 1015 and 1020 may be host devices, servers, switches, and so on. A storage gateway 1050 connects LLE cloud 1005 with storage devices 1075. For the purpose of moving storage traffic, network 1000 may be configured to act as an FC network. Accordingly, the ports of devices 1010, 1015 and 1020 each have their own FC ID, and the ports of storage devices 1075 have FC IDs.

[0113] In order to move storage traffic (including frame 900) efficiently between devices 1010, 1015 and 1020 and storage devices 1075, some preferred implementations of the invention map information from the fields of FC frame 900 to corresponding fields of an LLE packet 950. LLE packet 950 includes an SOF 955, an organization ID field 965 and a device ID field 970 of the destination MAC field, an organization ID field 975 and a device ID field 980 of the source MAC field, a protocol type field 985, a field 990 and a payload 995.

[0114] Preferably, fields 965, 970, 975 and 980 are all 24-bit fields, consistent with conventional Ethernet protocols. Accordingly, in some implementations of the invention the contents of the destination FC ID field of FC frame 900 are mapped to one of fields 965 or 970, preferably to field 970. Similarly, the contents of the source FC ID field of FC frame 900 are mapped to one of fields 975 or 980, preferably to field 980. Preferably, the contents of destination FC ID field 915 and source FC ID field 920 of FC frame 900 are mapped to fields 970 and 980 of LLE packet 950, respectively, because by convention the IEEE assigns many device codes to a single organization code. This mapping function may be performed, for example, by storage gateway 1050.

[0115] Thus, the mapping of FC frames to LLE packets may be accomplished in part by purchasing from the IEEE an Organizationally Unique Identifier ("OUI") code corresponding to a set of device codes. In one such example, the present assignee, Cisco Systems, pays the registration fee for an OUI and assigns the OUI to "FC over Ethernet." A storage gateway configured according to this aspect of the invention (e.g., storage gateway 1050) places the OUI in fields 965 and 975, copies the 24-bit contents of destination FC ID field 915 into 24-bit field 970, and copies the 24-bit contents of source FC ID field 920 into 24-bit field 980. The storage gateway inserts a code indicating FC over Ethernet in protocol type field 985 and copies the contents of payload 925 into payload field 995.
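
The per-frame encapsulation could be sketched as follows. The OUI constant, the EtherType value and the dictionary-based frame representation are placeholders introduced for illustration only; no actual assigned code values are given in this description.

```python
# Sketch of the storage-gateway mapping of an FC frame onto an LLE packet.
FCOE_OUI = 0x0EFC00          # hypothetical OUI standing in for the "FC over Ethernet" code
FCOE_ETHERTYPE = 0xFFFF      # placeholder protocol type code indicating FC over Ethernet

def encapsulate_fc_frame(fc_frame: dict) -> dict:
    """Map an FC frame (fields 905-925) onto an LLE packet (fields 955-995)."""
    return {
        "dst_mac": (FCOE_OUI << 24) | (fc_frame["dest_fc_id"] & 0xFFFFFF),    # fields 965 + 970
        "src_mac": (FCOE_OUI << 24) | (fc_frame["src_fc_id"] & 0xFFFFFF),     # fields 975 + 980
        "ethertype": FCOE_ETHERTYPE,                                          # field 985
        "sof_eof_info": convert_delimiters(fc_frame["sof"], fc_frame["eof"]), # field 990
        "payload": fc_frame["payload"],                                       # field 995
    }

def convert_delimiters(sof, eof):
    # The SOF/EOF ordered sets contain code-violation symbols, so their information
    # is carried as legal symbols inside the frame; the concrete translation table
    # is not specified here, so this placeholder simply preserves the values.
    return (sof, eof)
```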

[0116] Because of the above mapping, MAC addresses do not need to be explicitly assigned on the storage network. Nonetheless, as a result of the mapping, algorithmically derived versions of the destination and source FC IDs are encoded in the portions of the LLE frame that, in a conventional Ethernet packet, would be assigned to the destination and source MAC addresses. By using the contents of these fields as if they were MAC address fields, storage traffic can be routed over the LLE network.

[0117] SOF field 905 and EOF field 910 contain ordered sets of symbols, some of which (e.g., those used to indicate the start and end of an FC frame) are reserved symbols, sometimes referred to as "illegal" or "code violation" symbols. If one of these symbols were copied into a field inside LLE packet 950 (e.g., field 990), the symbol would cause an error, for example by indicating that LLE packet 950 should terminate at that symbol. However, the information conveyed by these symbols must be preserved, because it indicates the class of the FC frame, whether the frame is the start or the end of a sequence, and other important information.

[0118] Accordingly, preferred implementations of the invention provide another mapping function that converts the illegal symbols into legal symbols. These legal symbols can then be inserted into an interior portion of LLE packet 950. In one such implementation, the converted symbols are placed in field 990. Field 990 need not be large; in some implementations it is only 1 or 2 bytes long.

[0119] To allow cut-through switching to be implemented, field 990 may be split into two separate fields. For example, one field may be located at the beginning of the frame and the other at the other end of the frame.

[0120] The foregoing method is only one example of various techniques for encapsulating an FC frame within an extended Ethernet frame. Other methods include any convenient mapping, for example deriving the triplet {VLAN, Dst MAC Addr, Src MAC Addr} from the triplet {VSAN, D_ID, S_ID}.

[0121] The mapping and symbol conversion processes described above produce LLE packets, such as LLE packet 950, that allow storage traffic to or from FC-based storage devices 1075 to be forwarded across LLE cloud 1005 to end node devices 1010, 1015 and 1020. The mapping and symbol conversion processes may be performed, for example, by storage gateway 1050 on a frame-by-frame basis.

[0122] Thus, the invention provides exemplary methods for encapsulating FC frames within extended Ethernet frames at the ingress edge of an FC-Ethernet cloud. Similar methods of the invention provide the reverse process, performed at the egress edge of the Ethernet-FC cloud. An FC frame can be decapsulated from an extended Ethernet frame and then transmitted on an FC network.

[0123] Some such methods include the steps of: receiving an Ethernet frame (e.g., encapsulated in the manner described herein); mapping destination contents of a first portion of the destination MAC field of the Ethernet frame to a destination FC ID field of an FC frame; mapping source contents of a second portion of the source MAC field of the Ethernet frame to a source FC ID field of the FC frame; converting legal symbols of the Ethernet frame into illegal symbols; inserting the illegal symbols into selected fields of the FC frame; mapping payload contents of a payload field of the Ethernet frame to a payload field of the FC frame; and transmitting the FC frame on an FC network.
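
For illustration, the egress-edge steps listed above could be sketched as the inverse of the earlier encapsulation sketch, reusing its assumed packet representation; field widths and the restore_delimiters() helper are assumptions.

```python
# Sketch of decapsulating an FC frame from an extended Ethernet frame.
def decapsulate_lle_packet(lle_packet: dict) -> dict:
    """Recover an FC frame for transmission on an FC network."""
    sof, eof = restore_delimiters(lle_packet["sof_eof_info"])  # legal symbols back to ordered sets
    return {
        "sof": sof,
        "dest_fc_id": lle_packet["dst_mac"] & 0xFFFFFF,   # low 24 bits of the destination MAC field
        "src_fc_id": lle_packet["src_mac"] & 0xFFFFFF,    # low 24 bits of the source MAC field
        "payload": lle_packet["payload"],
        "eof": eof,
    }

def restore_delimiters(sof_eof_info):
    # Inverse of the symbol conversion: rebuild the SOF/EOF ordered sets (including
    # their code-violation symbols) from the legal symbols carried inside the packet.
    return sof_eof_info
```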

[0124] No state information about the frames needs to be retained. Frames can therefore be processed rapidly, for example at a rate of 40 Gb/s. End nodes can run storage applications based on SCSI, because the storage applications see the SCSI layer 825 of the LLE stack 800 shown in Figure 8. Instead of forwarding storage traffic via switches dedicated to FC traffic (e.g., FC switches 140 and 145 shown in Figure 1), such FC switches can be replaced by LLE switches 1030.

[0125] Moreover, the functionality of LLE switches allows great management flexibility. Referring to Figure 11, in one management scheme each of the LLE switches 1130 of LLE cloud 1105 may be treated as a separate FC switch. Alternatively, some or all of LLE switches 1130 may be aggregated and treated as an FC switch for management purposes. For example, for network management purposes a virtual FC switch 1140 is formed by treating all of the LLE switches in LLE cloud 1105 as a single FC switch. All of the ports of the individual LLE switches 1130, for example, can be treated as ports of virtual FC switch 1140. Alternatively, smaller numbers of LLE switches 1130 may be aggregated. For example, three LLE switches are aggregated to form virtual FC switch 1160 and four LLE switches are aggregated to form virtual FC switch 1165. A network manager can decide how many switches to aggregate by considering, for example, how many ports the individual LLE switches have. By treating each LLE switch as an FC switch, or by aggregating multiple LLE switches into one virtual FC switch, FC control plane functions such as zoning, DNS, FSPF and other functions can be implemented.

[0126] Furthermore, the same LLE cloud 1105 can support many virtual networks. Virtual local area networks ("VLANs") are known in the art for providing virtual Ethernet-based networks. U.S. Patent No. 5,742,604, entitled "Interswitch Link Mechanism for Connecting High-Performance Network Switches," describes relevant systems and is hereby incorporated by reference. Various patent applications of the present assignee, including U.S. Patent Application No. 10/034,160, entitled "Methods And Apparatus For Encapsulating A Frame For Transmission In A Storage Area Network" and filed on December 26, 2001, provide methods and devices for implementing virtual storage area networks ("VSANs") for FC-based networks. That application is hereby incorporated by reference in its entirety. Because an LLE network can support both Ethernet traffic and FC traffic, some implementations of the invention provide for forming virtual networks on the same physical LLE cloud for both FC and Ethernet traffic.

[0127] Figure 12 is a schematic diagram illustrating a simplified architecture of a DCE switch 1200 according to one embodiment of the invention. DCE switch 1200 includes N line cards, each characterized by an ingress side (or input) 1205 and an egress side (or output) 1225. The line card ingress sides 1205 are connected to the line card egress sides 1225 via a switching fabric 1250, which in this example includes a crossbar switch.

[0128] In this implementation, buffering is performed on both the input and the output sides. Other architectures are also possible, for example those having input buffers, output buffers and shared memory. Accordingly, each of the input line cards 1205 includes at least one buffer 1210 and each of the output line cards 1225 includes at least one buffer 1230, which may be any convenient type of buffer known in the art, for example an external DRAM-based buffer or an on-chip SRAM-based buffer. Buffers 1210 are used, for example, for input buffering, to temporarily hold packets while waiting for sufficient buffer space to become available at the output line card for storing the packets to be sent via switching fabric 1250. Buffers 1230 are used, for example, for output buffering, to temporarily hold packets received from one or more of the input line cards 1205 while waiting for sufficient credits to become available for the packets to be sent to another DCE switch.

[0129] It is worth noting that, although credits may be used both internally and externally to the switch, there is not necessarily a one-to-one mapping between internal and external credits. Moreover, pause frames may be used internally or externally. For example, any of the four possible combinations PAUSE-PAUSE, PAUSE-CREDITS, CREDITS-PAUSE and CREDITS-CREDITS can produce a viable solution.

[0130] DCE switch 1200 includes some form of credit mechanism for applying flow control. This flow control mechanism can apply back pressure on buffers 1210 when an output queue of one of the buffers 1230 reaches its maximum capacity. For example, before sending a frame from input queue 1215 to output queue 1235, one of the input line cards 1205 may request credits from an arbiter 1240 (which may be, for example, a separate chip located at a central location, or a set of chips distributed over the output line cards). Preferably, the request indicates the size of the frame, for example according to the frame credit field of the DCE header. Arbiter 1240 determines whether output queue 1235 can accept the frame (i.e., whether output buffer 1230 has enough space to accommodate the frame). If so, the credit request is granted and arbiter 1240 sends a credit grant to input queue 1215. If, however, output queue 1235 is too full, the request is denied and no credits are sent to input queue 1215.
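
A minimal sketch of this request/grant exchange is given below, assuming a simple byte-count view of output buffer occupancy; the Arbiter class and its bookkeeping are illustrative assumptions.

```python
# Sketch of the internal credit arbitration performed by arbiter 1240.
class Arbiter:
    def __init__(self, output_buffer_capacity_bytes: int):
        self.capacity = output_buffer_capacity_bytes
        self.reserved = 0   # bytes already promised to granted requests

    def request_credit(self, frame_size_bytes: int) -> bool:
        """Input line card asks before moving a frame across the switching fabric."""
        if self.reserved + frame_size_bytes <= self.capacity:
            self.reserved += frame_size_bytes      # grant: the output queue can accept the frame
            return True
        return False                               # deny: output queue too full, no credit sent

    def frame_departed(self, frame_size_bytes: int) -> None:
        """Called when the output queue drains a frame, freeing space for new grants."""
        self.reserved -= frame_size_bytes
```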

[0131] As discussed elsewhere herein, DCE switch 1200 needs to be able to support the "drop," "no-drop" and intermediate behaviors required by the virtual channels. The "no-drop" functionality is enabled in part by applying internally to the DCE switch some type of credit mechanism similar to that described above. Externally, the "no-drop" functionality can be implemented according to the buffer-to-buffer credit mechanism or the pause frames described previously. For example, if one of the input line cards 1205 is experiencing back pressure from one or more of the output line cards 1225 through the internal credit mechanism, the line card can propagate that back pressure externally in the upstream direction via an FC-like buffer-to-buffer credit system.

[0132] Preferably, the same chip (e.g., the same ASIC) that provides the "no-drop" and intermediate functionality will also provide "drop" functionality like that of a classical Ethernet switch. Although these tasks could be allocated among different chips, providing the drop, no-drop and intermediate functionality on the same chip allows a DCE switch to be provided at a much lower cost.

[0133] Each DCE packet will contain information indicating the virtual channel to which it belongs, for example in a DCE header as described elsewhere herein. DCE switch 1200 will handle each DCE packet according to whether the VL to which the packet is assigned is a drop VL or a no-drop VL.

[0134] Figure 13 shows an example of partitioning a buffer among VLs. In this example, 4 VLs are assigned. VL 1305 and VL 1310 are drop VLs. VL 1315 and VL 1320 are no-drop VLs. In this example, input buffer 1300 has a specific area assigned to each VL: VL 1305 is assigned to buffer space 1325, VL 1310 is assigned to buffer space 1330, VL 1315 is assigned to buffer space 1335 and VL 1320 is assigned to buffer space 1340. Traffic on VL 1305 and VL 1310 is managed much like conventional Ethernet traffic, in part according to the operation of buffer spaces 1325 and 1330. Similarly, the no-drop characteristic of VLs 1315 and 1320 is implemented in part according to a buffer-to-buffer credit flow control scheme that is enabled only for buffer spaces 1335 and 1340.

[0135] In some implementations, the amount of buffer space assigned to a VL can be assigned dynamically according to criteria such as buffer occupancy, time of day, traffic load/congestion, guaranteed minimum bandwidth allocations, known tasks requiring more bandwidth, maximum bandwidth allocations, and so on. Preferably, fairness principles are applied to prevent one VL from obtaining an excessive amount of buffer space.

[0136] Within each buffer space, data is organized in data structures that are logical queues (virtual output queues, or VOQs) associated with destinations. ("A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches," Adisak Mekkittikul and Nick McKeown, Computer Systems Laboratory, Stanford University (InfoCom 1998), and the references cited therein, describe relevant methods for implementing VOQs and are hereby incorporated by reference.) The destination is preferably a destination port/virtual channel pair. Using a VOQ scheme avoids the head-of-line blocking at the input line cards that would otherwise be caused when an output port is blocked and/or when another virtual channel of the destination output port is blocked.

[0137] In some implementations, VOQs are not shared among VLs. In other implementations, a VOQ may be shared among drop VLs or among no-drop VLs. However, a VOQ should not be shared between a no-drop VL and a drop VL. In some embodiments a VOQ is associated with a single buffer, but in other embodiments a VOQ may be implemented by more than one buffer.
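
One way to picture the VOQ organization is sketched below, with queues keyed by (destination port, VL) pairs as suggested above; the deque-based structure is an assumption for illustration.

```python
# Sketch of virtual output queues keyed by destination port / virtual channel pairs.
from collections import deque

class VoqBufferSpace:
    """VOQs within one per-VL buffer space; queues are not shared across drop and no-drop VLs."""
    def __init__(self):
        self.voqs = {}   # (dest_port, vl) -> deque of frames

    def enqueue(self, frame, dest_port: int, vl: int) -> None:
        self.voqs.setdefault((dest_port, vl), deque()).append(frame)

    def dequeue(self, dest_port: int, vl: int):
        q = self.voqs.get((dest_port, vl))
        # A blocked destination only stalls its own queue; other destinations keep flowing.
        return q.popleft() if q else None
```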

[0138] The buffers of a DCE switch may implement various types of active queue management. Some preferred embodiments of DCE switch buffers provide at least four basic types of active queue management: flow control; dropping (for drop VLs) or marking (for no-drop VLs) for congestion avoidance purposes; dropping to avoid deadlock in no-drop VLs; and dropping for latency control.

[0139] Flow control in a DCE network has at least two basic manifestations, one implemented inside the DCE switch and the other implemented on the links of the network. One flow control manifestation is buffer-to-buffer credit-based flow control, which is used primarily to implement "no-drop" or delayed-drop VLs. As noted above, pause frames and the like may also be used to implement flow control for "no-drop" or "delayed-drop" VLs. Any convenient combination of credits and pause frames, whether inside a DCE switch or on a link, may be used to implement flow control. It is important to note that, in preferred embodiments, the credits managed inside a DCE switch are different from the credits managed on the links. Some preferred embodiments use pause frames on the links and credits inside the DCE switch.

[0140] Another flow control manifestation of some preferred implementations includes explicit upstream congestion notification to other devices in the network. This explicit upstream congestion notification may be implemented, for example, by the explicit congestion notification ("ECN") field of the DCE header, as described elsewhere herein.

[0141] Figure 14 shows a DCE network 1405 that includes edge DCE switches 1410, 1415, 1425 and 1430 and a core DCE switch 1420. In this case, buffer 1450 of core DCE switch 1420 implements three types of flow control. One is a buffer-to-buffer flow control indication 1451, which is conveyed by the granting (or not) of buffer-to-buffer credits between buffer 1450 and buffer 1460 of edge DCE switch 1410.

[0142] Buffer 1450 also sends two ECNs 1451 and 1452, both implemented via the ECN field of the DCE header of DCE packets. ECN 1451 may be considered a core-to-edge notification, because it is sent by core device 1420 and received by buffer 1460 of edge DCE switch 1410. ECN 1452 may be considered a core-to-end notification, because it is sent by core device 1420 and received by NIC card 1465 of end node 1440.

[0143] In some implementations of the invention, an ECN is generated by sampling packets stored in a buffer that is experiencing congestion. The ECN is sent to the source of the sampled packet by setting its destination address equal to the source address of the sampled packet. The edge device will know whether the source supports DCE ECN, as end node 1440 does, or does not, as is the case for end node 1435. In the latter case, edge device 1410 terminates the ECN and implements the appropriate action.

[0144] Active queue management (AQM) is performed in response to various criteria, including but not limited to buffer occupancy (e.g., per VL), the queue length of each VOQ, and the age of packets in a VOQ. For simplicity, the discussion of AQM generally assumes that VOQs are not shared among VLs.

[0145] Some examples of AQM according to the invention will now be described with reference to Figure 15, which shows buffer usage at a particular time. At this time, portion 1505 of physical buffer 1500 has been allocated to drop VLs and portion 1510 has been allocated to no-drop VLs. As described elsewhere herein, the amount of buffer 1500 allocated to drop VLs or to no-drop VLs may change over time. Within portion 1505, allocated to drop VLs, portion 1520 is currently in use and portion 1515 is currently unused.

[0146] Within portions 1505 and 1510 there are many VOQs, including VOQs 1525, 1530 and 1535. In this example a threshold VOQ length L is established. The lengths of VOQs 1525 and 1535 are greater than L, while the length of VOQ 1530 is less than L. A long VOQ indicates downstream congestion. Active queue management preferably prevents any VOQ from becoming too large, because otherwise downstream congestion affecting one VOQ would adversely affect traffic destined for other destinations.

[0147] The age of packets in a VOQ is another criterion for AQM. In preferred implementations, a packet is time-stamped when it enters the buffer and is queued into the appropriate VOQ. Thus, packet 1540 receives a time stamp 1545 when it arrives at buffer 1500 and is placed in a VOQ according to its destination and its VL designation. As described elsewhere, the VL designation will indicate whether drop or no-drop behavior applies. In this example, the header of packet 1540 indicates that packet 1540 is to be transmitted on a drop VL and has a destination corresponding to VOQ 1525, so packet 1540 is placed in VOQ 1525.

[0148] The age of packet 1540 can be determined at a later time by comparing the time of time stamp 1545 with the current time. In this context, "age" refers only to the time the packet has spent in the switch, not to time spent in some other part of the network. Nonetheless, the condition of other parts of the network can be inferred from a packet's age. For example, if a packet's age becomes relatively large, this condition indicates that the path to the packet's destination is experiencing congestion.

[0149] In preferred implementations, packets whose age exceeds a predetermined age are dropped. If several packets in a VOQ are found to exceed the predetermined age threshold when ages are determined, multiple drops may be performed.

[0150] In some preferred implementations, there are separate age limits for latency control (TL) and for deadlock avoidance (TD). The action to be taken when a packet's age reaches TL preferably depends on whether the packet is being transmitted on a drop VL or a no-drop VL. For traffic on no-drop channels, data integrity is more important than latency. Therefore, in some implementations of the invention, when the age of a packet in a no-drop VL exceeds TL the packet is not dropped; instead, another action is taken. For example, in some such implementations the packet may be marked and/or an upstream congestion notification may be triggered. For packets in a drop VL, latency control is relatively more important, so it is appropriate to take more aggressive action when the packet's age exceeds TL. For example, a probabilistic drop function may be applied to the packet.

[0151] Graph 1600 of Figure 16 provides some examples of probabilistic drop functions. According to drop functions 1605, 1610 and 1615, once a packet's age exceeds a latency cutoff threshold, the probability that it will be intentionally dropped increases from 0% to 100% as the packet's age approaches TL, depending on the function. Drop function 1620 is a step function whose probability of intentional drop is 0% until TL is reached. When a packet's age reaches TL, drop functions 1605, 1610, 1615 and 1620 all reach a 100% probability of intentional drop. Although the cutoff threshold, TL and TD may be any convenient times, in some implementations of the invention the cutoff threshold is on the order of tens of microseconds, TL is on the order of a few milliseconds to tens of milliseconds, and TD is on the order of hundreds of milliseconds, e.g., 500 milliseconds.

[0152] If the age of a packet in either a drop VL or a no-drop VL exceeds TD, the packet is dropped. In preferred implementations, TD for a no-drop VL is larger than TD for a drop VL. In some implementations, TL and/or TD may also depend in part on the bandwidth of the VL on which the packet is transmitted and on the number of VOQs simultaneously sending packets to that VL.
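
The age-based checks can be sketched as follows, using a linear probability between an assumed cutoff threshold and TL (one of the shapes in graph 1600); all threshold values, the helper name and the random draw are assumptions introduced for illustration.

```python
# Sketch of age-based active queue management for drop and no-drop VLs.
import random, time

T_CUTOFF = 50e-6   # latency cutoff threshold, assumed to be tens of microseconds
T_L      = 10e-3   # latency-control limit, assumed milliseconds range
T_D      = 500e-3  # deadlock-avoidance limit, e.g. 500 ms

def age_based_action(timestamp: float, no_drop_vl: bool) -> str:
    """timestamp is the monotonic time recorded when the packet was queued."""
    age = time.monotonic() - timestamp
    if age > T_D:
        return "drop"                       # deadlock avoidance applies to both VL types
    if age > T_CUTOFF:
        p = min(1.0, (age - T_CUTOFF) / (T_L - T_CUTOFF))
        if no_drop_vl:
            # Data integrity first: mark / trigger an upstream ECN instead of dropping.
            return "mark_or_ecn" if random.random() < p else "forward"
        return "drop" if random.random() < p else "forward"
    return "forward"
```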

[0153] For a no-drop VL, a probability function similar to that shown in Figure 16 may be used to trigger upstream congestion notifications, or to set the congestion experienced (CE) bit in the header of TCP packets belonging to connections that support TCP ECN.

[0154] In some implementations, whether a packet is dropped, whether an upstream congestion notification is sent and whether the CE bit of a TCP packet is marked depend not only on the age of the packet but also on the length of the VOQ in which the packet is placed. If this length is above a threshold Lmax, the AQM action is taken; otherwise the AQM action is performed on the first packet dequeued from a VOQ whose length exceeds the threshold Lmax.

[0155] Use of Per-VL Buffer Occupancy

[0156] As shown in Figure 15, the buffer is partitioned into VLs. For the portion of the buffer allocated to drop VLs (e.g., portion 1505 of buffer 1500), packets are dropped if the occupancy of a VL is greater than a predetermined maximum at any given time. In some implementations, an average occupancy of the VL is computed and maintained. Based on this average occupancy, AQM actions may be taken. For example, for portion 1510, which is associated with no-drop VLs, a DCE ECN is triggered instead of dropping packets as in the case of portion 1505, which is associated with drop VLs.

[0157] Figure 17 shows a graph 1700 of VL occupancy B(VL) (vertical axis) over time (horizontal axis). Here a threshold Bt is indicated. In some implementations of the invention, some packets in the VL are dropped when B(VL) is determined to have reached Bt. The actual value of B(VL) over time is shown by curve 1750, but B(VL) is determined only at times t1 through tN. In this example packets would be dropped at points 1705, 1710 and 1715, which correspond to times t2, t3 and t6. Packets may be dropped according to their age (e.g., oldest first), their size, the QoS of the packet's virtual network, randomly, according to a drop function, or otherwise.

[0158] Additionally (or alternatively), active queue management actions may be taken when an average, weighted average, etc. of B(VL) reaches or exceeds Bt. Such an average may be computed according to various methods, for example by adding up the determined B(VL) values and dividing by the number of determinations. Some implementations apply a weighting function, for example by giving more weight to more recent samples. Any type of weighting function known in the art may be applied.

[0159] The active queue management action taken may be, for example, sending an ECN and/or applying a probabilistic drop function, such as a function similar to one of those shown in Figure 18. In this example, the horizontal axis of graph 1880 is the average value of B(VL). When the average is below a first value 1805, the probability of intentionally dropping a packet is 0%. When the average reaches or exceeds a second value 1810, the probability of intentionally dropping a packet is 100%. Any convenient function may be applied to intermediate values, whether similar to functions 1815, 1820 or 1825 or otherwise.
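
One possible realization is sketched below: an exponentially weighted average of B(VL) (one weighting that favors recent samples) combined with a linear probability between the two values of graph 1880; the weight of 0.25 and the linear shape are assumptions for illustration.

```python
# Sketch of averaged per-VL occupancy driving a probabilistic AQM action.
import random

class VlOccupancyAqm:
    def __init__(self, low_threshold: float, high_threshold: float, weight: float = 0.25):
        self.low = low_threshold      # value 1805: below this, the action probability is 0%
        self.high = high_threshold    # value 1810: at or above this, the probability is 100%
        self.weight = weight
        self.avg_b_vl = 0.0

    def sample(self, b_vl: float) -> None:
        # Weighted average that gives more weight to recent samples.
        self.avg_b_vl = (1 - self.weight) * self.avg_b_vl + self.weight * b_vl

    def should_act(self) -> bool:
        """True if an AQM action (drop for a drop VL, ECN/mark for a no-drop VL) should occur."""
        if self.avg_b_vl < self.low:
            return False
        if self.avg_b_vl >= self.high:
            return True
        p = (self.avg_b_vl - self.low) / (self.high - self.low)
        return random.random() < p
```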

[0160] Referring to Figure 15, it is apparent that the lengths of VOQs 1525 and 1535 exceed the predetermined length L. In some implementations of the invention, this condition triggers an active queue management response, for example sending one or more ECNs. Preferably, the packets contained in buffer 1500 indicate whether their source can respond to an ECN. If the sender of a packet cannot respond to an ECN, the condition may trigger a probabilistic drop function or simply a drop. VOQ 1535 is not only longer than predetermined length L1 but also longer than predetermined length L2. According to some implementations of the invention, this condition triggers the dropping of packets. Some implementations of the invention use the average VOQ length as a criterion for triggering active queue management responses, but this is not preferred because it requires a large amount of computation.

[0161] It is desirable to have multiple criteria for triggering AQM actions. For example, although responding to VOQ length is useful, this measure alone may be insufficient for a DCE switch having about 1 to 2 MB of buffer space per port. For a given buffer there may be thousands of active VOQs, yet there may only be enough memory space for on the order of 10^3 packets, or possibly fewer. A situation can therefore arise in which no individual VOQ has enough packets to trigger any AQM response, but a VL has nonetheless run out of space.

[0162] Queue Management for No-Drop VLs

[0163] In preferred implementations of the invention, the main difference between active queue management for drop VLs and for no-drop VLs is that the criterion or criteria that would trigger a packet drop for a drop VL instead cause a DCE ECN to be sent or the TCP CE bit to be marked for a no-drop VL. For example, a condition that would trigger a probabilistic packet drop for a drop VL will generally result in a probabilistic ECN to an upstream edge device or end (host) device. A credit-based scheme is not based on where a packet is going, but on where it came from. Upstream congestion notification therefore helps to provide fairness in buffer usage and avoids the deadlocks that could result if the only flow control method for no-drop VLs were credit-based flow control.

[0164] For example, when per-VL buffer occupancy is used as a criterion, packets are preferably not dropped merely because the per-VL buffer occupancy has reached or exceeded a threshold. Instead, for example, packets are marked or ECNs are sent. Similarly, some type of average per-VL buffer occupancy may still be computed and a probability function applied, but the basic actions to be taken are marking and/or sending ECNs. Packets are not dropped.

[0165] However, even for a no-drop VL, packets are still dropped in response to a blocking or deadlock condition indicated, for example, by a packet age exceeding a threshold as described elsewhere herein. Some implementations of the invention also allow packets of a no-drop VL to be dropped in response to latency conditions. This will depend on how much importance is placed on latency for that particular no-drop VL. Some such implementations apply a probabilistic drop algorithm. For example, some cluster applications may place a higher value on the latency factor than storage applications do. Data integrity is still important for cluster applications, but it may be advantageous to reduce latency by giving up some degree of data integrity. In some implementations, a larger value of TL (i.e., the latency control threshold) may be used for no-drop channels than the corresponding value used for drop channels.

[0166] Figure 19 shows an example of a network device that may be configured to implement some methods of the invention. Network device 1960 includes a master central processing unit (CPU) 1962, interfaces 1968 and a bus 1967 (e.g., a PCI bus). Generally, interfaces 1968 include ports 1969 appropriate for communication with the appropriate media. In some embodiments, one or more of interfaces 1968 includes at least one independent processor 1974 and, in some instances, volatile RAM. The independent processors 1974 may be, for example, ASICs or any other appropriate processors. According to some such embodiments, these independent processors 1974 perform at least some of the functions of the logic described herein. In some embodiments, one or more of interfaces 1968 control communications-intensive tasks such as media control and management. By providing separate processors for the communications-intensive tasks, interfaces 1968 allow the master microprocessor 1962 to efficiently perform other functions such as routing computations, network diagnostics, security functions, and the like.

[0167] Interfaces 1968 are generally provided as interface cards (sometimes referred to as "line cards"). Generally, interfaces 1968 control the sending and receiving of data packets over the network and sometimes support other peripherals used with network device 1960. Among the interfaces that may be provided are Fibre Channel ("FC") interfaces, Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided, such as Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces, ASI interfaces, DHEI interfaces, and the like.

[0168] When acting under the control of appropriate software or firmware, in some implementations of the invention CPU 1962 may be responsible for implementing specific functions associated with the functions of a desired network device. According to some embodiments, CPU 1962 accomplishes all these functions under the control of software including an operating system (e.g., Linux, VxWorks, etc.) and any appropriate application software.

[0169] CPU 1962 may include one or more processors 1963, such as processors from the Motorola family of microprocessors or the MIPS family of microprocessors. In an alternative embodiment, processor 1963 is specially designed hardware for controlling the operations of network device 1960. In a specific embodiment, a memory 1961 (such as non-volatile RAM and/or ROM) also forms part of CPU 1962. However, there are many different ways in which memory could be coupled to the system. Memory block 1961 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, and the like.

[0170] Regardless of the network device's configuration, it may employ one or more memories or memory modules (such as, for example, memory block 1965) configured to store data, program instructions for general-purpose network operations, and/or other information relating to the functionality of the techniques described herein. The program instructions may control the operation of an operating system and/or of one or more applications, for example.

[0171] Because such information and program instructions may be employed to implement the systems/methods described herein, the present invention relates to machine-readable media that include program instructions, state information, and the like for performing various operations described herein. Examples of machine-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The invention may also be embodied in a carrier wave traveling over an appropriate medium such as radio waves, optical lines, or electric lines. Examples of program instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.

[0172] Although the system shown in Figure 19 illustrates one specific network device of the present invention, it is by no means the only network device on which the present invention can be implemented. For example, an architecture having a single processor that handles communications as well as routing computations and the like is often used. Further, other types of interfaces and media may also be used with the network device. The communication path between interfaces/line cards may be bus based (as shown in Figure 19) or switch fabric based (for example, a crossbar switch).

[0173] Although the invention has been shown and described in detail with reference to specific embodiments, those skilled in the art will appreciate that changes in form and detail may be made to the disclosed embodiments without departing from the spirit and scope of the invention. For example, some implementations of the invention allow a VL to be changed from a drop VL to a delayed-drop or no-drop VL. Accordingly, the examples described herein are not intended to limit the invention. Rather, it is intended that the appended claims be construed to include all variations, equivalents, changes, and modifications that fall within the true spirit and scope of the invention.

Claims (22)

1. A method for handling more than one class of network traffic in a single network device, the method comprising: partitioning a buffer of the network device into a first buffer space and a second buffer space, the first buffer space for storing frames received on a first virtual channel of a physical link of the network device and the second buffer space for storing frames received on a second virtual channel of that physical link; receiving a plurality of frames on the physical link of the network device, wherein each frame indicates either the first virtual channel or the second virtual channel based on information, contained in the frame's header, that identifies the virtual channel to which the frame belongs; and, for each received frame, applying either a first set of rules or a second set of rules to the received frame depending on whether the frame specifies the first virtual channel or the second virtual channel, wherein the first set of rules causes the received frame to be dropped or stored in the first buffer space depending on whether the first buffer space has been filled to a predetermined level, and wherein the second set of rules prohibits dropping the received frame in response to latency and causes the received frame to be stored in the second buffer space.
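As an illustration of the per-virtual-channel dispatch recited in claim 1, the following Python sketch stores or drops frames according to which VL the frame header indicates. It is a minimal model only: the class names, the dictionary-based frame representation, and the 90% fill threshold are illustrative assumptions, not the claimed implementation.

```python
from collections import deque

class VirtualChannelBuffer:
    """Toy model of one partition of the shared buffer (one per VL)."""
    def __init__(self, capacity_frames, no_drop):
        self.queue = deque()
        self.capacity = capacity_frames
        self.no_drop = no_drop  # True => FC-like "no drop" behavior

    def occupancy(self):
        return len(self.queue)

def handle_frame(frame, buffers, drop_threshold=0.9):
    """Apply the drop-VL rules or the no-drop-VL rules based on the
    virtual channel carried in the frame header."""
    vl = frame["vl"]          # VL identifier taken from the frame header
    buf = buffers[vl]

    if buf.no_drop:
        # Second set of rules: never drop in response to latency or congestion;
        # upstream flow control (credits / pause) is assumed to prevent overflow.
        buf.queue.append(frame)
        return "stored (no-drop VL)"

    # First set of rules: drop once the partition is filled to a
    # predetermined level, otherwise store the frame.
    if buf.occupancy() >= drop_threshold * buf.capacity:
        return "dropped (drop VL over threshold)"
    buf.queue.append(frame)
    return "stored (drop VL)"

if __name__ == "__main__":
    buffers = {
        1: VirtualChannelBuffer(capacity_frames=100, no_drop=False),  # Ethernet-like VL
        2: VirtualChannelBuffer(capacity_frames=100, no_drop=True),   # FC-like VL
    }
    print(handle_frame({"vl": 1, "payload": b"..."}, buffers))
    print(handle_frame({"vl": 2, "payload": b"..."}, buffers))
```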
2. The method of claim 1, further comprising the step of assigning a guaranteed minimum amount of buffer space to each virtual channel.
3. The method of claim 1, wherein the received frames comprise Ethernet frames specifying the first virtual channel and Fibre Channel frames specifying the second virtual channel.
4. The method of claim 1, wherein the first set of rules causes an explicit congestion notification to be sent from the network device in response to latency of the received frames.
5. The method of claim 1, wherein the second set of rules causes an explicit congestion notification to be sent from the network device in response to latency of the received frames.
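A possible reading of the latency-triggered notification in claims 4 and 5 is sketched below in Python: when the oldest frame on a VL queue has waited longer than a limit, a congestion notification is emitted toward the source or edge device without dropping anything. The queue layout, the 1 ms limit, and the notification format are assumptions for illustration only.

```python
import time

def maybe_send_ecn(queue, send_notification, delay_limit_s=0.001):
    """If the oldest frame in the queue has waited longer than the limit,
    emit an explicit congestion notification toward the traffic source
    (or an edge device). No frame is discarded here, so the same check
    can be used on both drop and no-drop VLs."""
    if not queue:
        return False
    oldest = queue[0]
    waited = time.monotonic() - oldest["enqueue_time"]
    if waited > delay_limit_s:
        send_notification({"type": "ECN", "vl": oldest["vl"]})
        return True
    return False

if __name__ == "__main__":
    q = [{"vl": 2, "enqueue_time": time.monotonic() - 0.01}]
    maybe_send_ecn(q, send_notification=print)   # oldest frame waited ~10 ms
```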
6. The method of claim 1, wherein the partitioning step comprises partitioning the buffer according to one or more factors selected from the group consisting of: buffer occupancy, time of day, traffic load, congestion, guaranteed minimum bandwidth allocation, known tasks requiring larger bandwidth, and maximum bandwidth allocation.
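One way such factor-based partitioning could look is sketched below in Python; the weights, the particular subset of factors, and the clamping to per-VL minimum and maximum fractions are all invented for illustration and are not drawn from the claims.

```python
def repartition(total_buffer_bytes, vls):
    """Divide the shared buffer among VLs in proportion to a simple score
    built from a few of the factors listed in claim 6 (occupancy, load,
    guaranteed minimum and maximum bandwidth allocations)."""
    scores = {}
    for vl, info in vls.items():
        score = (0.5 * info["recent_load"] +
                 0.3 * info["occupancy_fraction"] +
                 0.2 * info["guaranteed_min_fraction"])
        scores[vl] = max(score, 1e-6)
    total = sum(scores.values())
    shares = {}
    for vl, score in scores.items():
        share = int(total_buffer_bytes * score / total)
        # Clamp to the VL's guaranteed minimum and configured maximum;
        # a real implementation would renormalize after clamping.
        share = max(share, int(total_buffer_bytes * vls[vl]["guaranteed_min_fraction"]))
        share = min(share, int(total_buffer_bytes * vls[vl]["max_fraction"]))
        shares[vl] = share
    return shares

if __name__ == "__main__":
    vls = {
        1: {"recent_load": 0.7, "occupancy_fraction": 0.6,
            "guaranteed_min_fraction": 0.2, "max_fraction": 0.8},
        2: {"recent_load": 0.2, "occupancy_fraction": 0.1,
            "guaranteed_min_fraction": 0.2, "max_fraction": 0.8},
    }
    print(repartition(1_000_000, vls))
```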
7. The method of claim 1, wherein the second set of rules causes frames to be dropped in order to avoid deadlock.
8. The method of claim 1, wherein the first set of rules applies a probabilistic drop function in response to latency.
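A probabilistic drop function of this kind can be sketched with a RED-style rule, shown below in Python; RED is used here only as a familiar stand-in, and the thresholds and maximum probability are assumed values rather than parameters taken from the claims.

```python
import random

def probabilistic_drop(avg_occupancy_fraction, min_th=0.5, max_th=0.9, max_p=0.1):
    """RED-style decision for a drop VL: below min_th never drop, above
    max_th always drop, and in between drop with a probability that rises
    linearly toward max_p."""
    if avg_occupancy_fraction < min_th:
        return False
    if avg_occupancy_fraction >= max_th:
        return True
    p = max_p * (avg_occupancy_fraction - min_th) / (max_th - min_th)
    return random.random() < p

if __name__ == "__main__":
    drops = sum(probabilistic_drop(0.7) for _ in range(10_000))
    print(f"dropped {drops} of 10000 frames at 70% average occupancy")
```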
9. The method of claim 1, wherein the second set of rules comprises implementing a buffer-to-buffer crediting scheme for the second frames.
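The Python sketch below illustrates buffer-to-buffer credit accounting on a no-drop VL, including crediting by frame size as in claim 14. The 512-byte credit unit and the class interface are assumptions made for the example.

```python
class CreditedVL:
    """Per-VL buffer-to-buffer credit accounting. The receiver advertises
    credits in units of buffer space; the sender transmits only while it
    holds enough credit, so the no-drop VL never overruns the buffer."""
    def __init__(self, buffer_bytes, credit_unit=512):
        self.credit_unit = credit_unit
        self.credits = buffer_bytes // credit_unit  # credits held by the sender

    def _needed(self, frame_len):
        return -(-frame_len // self.credit_unit)    # ceiling division

    def can_send(self, frame_len):
        return self.credits >= self._needed(frame_len)

    def on_send(self, frame_len):
        assert self.can_send(frame_len), "sender must wait for credits"
        self.credits -= self._needed(frame_len)

    def on_credit_return(self, frame_len):
        # The receiver returns credits once the frame leaves its buffer.
        self.credits += self._needed(frame_len)

if __name__ == "__main__":
    vl = CreditedVL(buffer_bytes=8192)
    print(vl.can_send(2000))   # True: 16 credits held, 4 needed
    vl.on_send(2000)
    vl.on_credit_return(2000)
```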
10. The method of claim 1, further comprising the step of storing the first frames and the second frames in virtual output queues, wherein each virtual output queue is associated with a destination port/virtual channel pair.
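A minimal Python model of virtual output queues keyed by a (destination port, virtual channel) pair follows; the class and method names are illustrative assumptions. Keeping a separate queue per pair is what prevents a congested output from head-of-line blocking traffic bound for other outputs.

```python
from collections import defaultdict, deque

class VOQBuffer:
    """Frames are kept in virtual output queues, one per
    (destination port, virtual channel) pair."""
    def __init__(self):
        self.voqs = defaultdict(deque)

    def enqueue(self, frame, dest_port):
        self.voqs[(dest_port, frame["vl"])].append(frame)

    def dequeue(self, dest_port, vl):
        q = self.voqs[(dest_port, vl)]
        return q.popleft() if q else None

if __name__ == "__main__":
    b = VOQBuffer()
    b.enqueue({"vl": 2, "payload": b"scsi"}, dest_port=7)
    print(b.dequeue(7, 2))
```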
11. The method of claim 1, further comprising the step of performing buffer management in response to at least one of per-virtual-channel buffer occupancy and packet age, wherein the packet age is the difference between the time at which a packet entered the buffer and the current time.
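The age-based part of such buffer management could look like the Python sketch below, which computes each packet's age exactly as defined in claim 11 and then, as an assumed policy for this example, discards stale packets on a drop VL while only signaling congestion on a no-drop VL. The 2 ms limit is likewise an assumption.

```python
import time
from collections import deque

def age_scan(queue, no_drop, max_age_s=0.002, notify=print):
    """Walk a VL queue and act on stale packets. Age is the difference
    between the time the packet entered the buffer and the current time."""
    now = time.monotonic()
    kept = deque()
    for pkt in queue:
        age = now - pkt["enqueue_time"]
        if age <= max_age_s:
            kept.append(pkt)
        elif no_drop:
            notify(f"congestion: packet aged {age * 1e3:.2f} ms on no-drop VL")
            kept.append(pkt)   # never dropped because of latency
        # else: stale packet on a drop VL is silently discarded
    return kept

if __name__ == "__main__":
    now = time.monotonic()
    q = deque([{"enqueue_time": now - 0.005}, {"enqueue_time": now}])
    print(len(age_scan(q, no_drop=False)))   # stale packet discarded -> 1
```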
12. The method of claim 4, wherein the explicit congestion notification is sent to one of a source device or an edge device.
13. The method of claim 4, wherein the explicit congestion notification is sent via one of a data frame or a control frame.
14. The method of claim 9, wherein the buffer-to-buffer crediting scheme comprises crediting according to frame size.
15. The method of claim 9, wherein the buffer-to-buffer credits are managed by an arbiter.
16. The method of claim 9, wherein the second set of rules comprises implementing the buffer-to-buffer crediting scheme within the network device and using pause frames on the second virtual channel for flow control.
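The pause-frame side of claim 16 can be sketched as a per-VL watermark controller, shown below in Python. The high and low watermarks, the control-frame dictionary, and the class interface are assumptions for illustration; the headroom left above the high watermark is meant to absorb frames already in flight.

```python
class PauseController:
    """Per-VL PAUSE control for a no-drop virtual channel: ask the link
    partner to stop sending when the partition crosses a high watermark
    and to resume once it drains below a low watermark."""
    def __init__(self, capacity_bytes, send_ctrl_frame, high=0.8, low=0.5):
        self.high = high * capacity_bytes
        self.low = low * capacity_bytes
        self.send = send_ctrl_frame
        self.paused = False

    def on_occupancy_change(self, occupied_bytes, vl):
        if not self.paused and occupied_bytes >= self.high:
            self.send({"type": "PAUSE", "vl": vl, "quanta": 0xFFFF})
            self.paused = True
        elif self.paused and occupied_bytes <= self.low:
            self.send({"type": "PAUSE", "vl": vl, "quanta": 0})
            self.paused = False

if __name__ == "__main__":
    pc = PauseController(capacity_bytes=64 * 1024, send_ctrl_frame=print)
    pc.on_occupancy_change(60 * 1024, vl=2)   # crosses high watermark -> pause
    pc.on_occupancy_change(20 * 1024, vl=2)   # drains below low watermark -> resume
```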
17. The method of claim 10, further comprising the step of performing buffer management in response to at least one of virtual output queue length, per-virtual-channel buffer occupancy, overall buffer occupancy, and packet age, wherein the packet age is the difference between the time at which a packet entered the buffer and the current time.
18. A network device, comprising: means for partitioning a buffer of the network device into a first buffer space and a second buffer space, the first buffer space for storing frames received on a first virtual channel of a physical link of the network device and the second buffer space for storing frames received on a second virtual channel of that physical link; means for receiving a plurality of frames on the physical link of the network device, wherein each frame indicates either the first virtual channel or the second virtual channel based on information, contained in the frame's header, that identifies the virtual channel to which the frame belongs; and means for applying, to each received frame, either a first set of rules or a second set of rules depending on whether the frame specifies the first virtual channel or the second virtual channel, wherein the first set of rules causes the received frame to be dropped or stored in the first buffer space depending on whether the first buffer space has been filled to a predetermined level, and wherein the second set of rules prohibits dropping the received frame in response to latency and causes the received frame to be stored in the second buffer space.
19. A network device, comprising: a plurality of ports configured to receive frames over a plurality of physical links; and a plurality of line cards, each line card in communication with one of the plurality of ports and configured to perform the following steps: receiving frames from a first physical link of the physical links of the plurality of ports, wherein each frame indicates either a first virtual channel or a second virtual channel based on information, contained in the frame's header, that identifies the virtual channel to which the frame belongs; identifying each received frame as either a first frame received on the first virtual channel of the first physical link or a second frame received on the second virtual channel of the first physical link; partitioning a buffer of the line card into a first buffer space for storing the identified first frames and a second buffer space for storing the identified second frames; and, for each received frame, applying either a first set of rules or a second set of rules depending on whether the frame specifies the first virtual channel or the second virtual channel, wherein the first set of rules causes the received frame to be dropped or stored in the first buffer space depending on whether the first buffer space has been filled to a predetermined level, and wherein the second set of rules prohibits dropping the received frame in response to latency and causes the received frame to be stored in the second buffer space.
20. A method for carrying more than one class of traffic on a single network device, the method comprising: identifying a received frame as either a first frame received on a first virtual channel or a second frame received on a second virtual channel, based on information, contained in the received frame's header, that identifies the virtual channel to which the received frame belongs; dynamically partitioning a buffer of the network device into a first buffer space and a second buffer space, the first buffer space having a first virtual output queue (VOQ) for storing the first frames and the second buffer space having a second virtual output queue (VOQ) for storing the second frames, wherein the buffer is dynamically partitioned according to one or more factors selected from the group consisting of: overall buffer occupancy, per-virtual-channel buffer occupancy, time of day, traffic load, congestion, guaranteed minimum bandwidth allocation, known tasks requiring larger bandwidth, and maximum bandwidth allocation; and, for each received frame, applying either a first set of rules or a second set of rules depending on whether the frame has been identified as received on the first virtual channel or on the second virtual channel, wherein the first set of rules causes the received frame to be dropped or stored in the first VOQ depending on whether the first buffer space has been filled to a predetermined level, and wherein the second set of rules prohibits dropping the received frame in response to latency and causes the received frame to be stored in the second VOQ.
21. A network device, comprising: means for identifying a received frame as either a first frame received on a first virtual channel or a second frame received on a second virtual channel, based on information, contained in the received frame's header, that identifies the virtual channel to which the received frame belongs; means for dynamically partitioning a buffer of the network device into a first buffer space and a second buffer space, the first buffer space having a first virtual output queue (VOQ) for storing the first frames and the second buffer space having a second virtual output queue (VOQ) for storing the second frames, wherein the buffer is dynamically partitioned according to one or more factors selected from the group consisting of: overall buffer occupancy, per-virtual-channel buffer occupancy, time of day, traffic load, congestion, guaranteed minimum bandwidth allocation, known tasks requiring larger bandwidth, and maximum bandwidth allocation; and means for applying, to each received frame, either a first set of rules or a second set of rules depending on whether the frame has been identified as received on the first virtual channel or on the second virtual channel, wherein the first set of rules causes the received frame to be dropped or stored in the first VOQ depending on whether the first buffer space has been filled to a predetermined level, and wherein the second set of rules prohibits dropping the received frame in response to latency and causes the received frame to be stored in the second VOQ.
22. A network device, comprising: a plurality of ports configured to receive frames over a plurality of physical links; and a plurality of line cards, each line card in communication with one of the plurality of ports and configured to perform the following steps: identifying a received frame as either a first frame received on a first virtual channel or a second frame received on a second virtual channel, based on information, contained in the received frame's header, that identifies the virtual channel to which the received frame belongs; dynamically partitioning a buffer of the network device into a first buffer space and a second buffer space, the first buffer space having a first virtual output queue (VOQ) for storing the first frames and the second buffer space having a second virtual output queue (VOQ) for storing the second frames, wherein the buffer is dynamically partitioned according to one or more factors selected from the group consisting of: overall buffer occupancy, per-virtual-channel buffer occupancy, time of day, traffic load, congestion, guaranteed minimum bandwidth allocation, known tasks requiring larger bandwidth, and maximum bandwidth allocation; and, for each received frame, applying either a first set of rules or a second set of rules depending on whether the frame has been identified as received on the first virtual channel or on the second virtual channel, wherein the first set of rules causes the received frame to be dropped or stored in the first VOQ depending on whether the first buffer space has been filled to a predetermined level, and wherein the second set of rules prohibits dropping the received frame in response to latency and causes the received frame to be stored in the second VOQ.
CN 200580034646 2004-10-22 2005-10-13 Network device architecture for consolidating input/output and reducing latency CN101040489B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US62139604P 2004-10-22 2004-10-22
US60/621,396 2004-10-22
US11/094,877 US7830793B2 (en) 2004-10-22 2005-03-30 Network device architecture for consolidating input/output and reducing latency
US11/094,877 2005-03-30
PCT/US2005/037239 WO2006057730A2 (en) 2004-10-22 2005-10-13 Network device architecture for consolidating input/output and reducing latency

Publications (2)

Publication Number Publication Date
CN101040489A CN101040489A (en) 2007-09-19
CN101040489B true CN101040489B (en) 2012-12-05

Family

ID=38809008

Family Applications (4)

Application Number Title Priority Date Filing Date
CN 200580034646 CN101040489B (en) 2004-10-22 2005-10-13 Network device architecture for consolidating input/output and reducing latency
CN 200580034647 CN101040471B (en) 2004-10-22 2005-10-14 Ethernet extension for the data center
CN 200580035946 CN100555969C (en) 2004-10-22 2005-10-17 Fibre channel over ethernet
CN 200580034955 CN101129027B (en) 2004-10-22 2005-10-18 Forwarding table reduction and multipath network forwarding

Family Applications After (3)

Application Number Title Priority Date Filing Date
CN 200580034647 CN101040471B (en) 2004-10-22 2005-10-14 Ethernet extension for the data center
CN 200580035946 CN100555969C (en) 2004-10-22 2005-10-17 Fibre channel over ethernet
CN 200580034955 CN101129027B (en) 2004-10-22 2005-10-18 Forwarding table reduction and multipath network forwarding

Country Status (1)

Country Link
CN (4) CN101040489B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7821939B2 (en) 2007-09-26 2010-10-26 International Business Machines Corporation Method, system, and computer program product for adaptive congestion control on virtual lanes for data center ethernet architecture
CN101184098B (en) * 2007-12-11 2011-11-02 华为技术有限公司 Data transmission method and transmission apparatus
US8355345B2 (en) * 2009-08-04 2013-01-15 International Business Machines Corporation Apparatus, system, and method for establishing point to point connections in FCOE
CN101656721B (en) 2009-08-27 2012-08-08 杭州华三通信技术有限公司 Method for controlling virtual link discovering and Ethernet bearing fiber channel protocol system
CN102045248B (en) * 2009-10-19 2012-05-23 杭州华三通信技术有限公司 Virtual link discovery control method and Ethernet fiber channel protocol system
CN102577331B (en) 2010-05-28 2015-08-05 华为技术有限公司 Virtual Layer 2 mechanism and make it scalable
EP2569905A2 (en) 2010-06-29 2013-03-20 Huawei Technologies Co. Ltd. Layer two over multiple sites
CN102377661A (en) * 2010-08-24 2012-03-14 鸿富锦精密工业(深圳)有限公司 Blade server and method for building shortest blade transmission path in blade server
US8917722B2 (en) * 2011-06-02 2014-12-23 International Business Machines Corporation Fibre channel forwarder fabric login sequence
CN102347955A (en) * 2011-11-01 2012-02-08 杭州依赛通信有限公司 Reliable data transmission protocol based on virtual channels
US20140153443A1 (en) * 2012-11-30 2014-06-05 International Business Machines Corporation Per-Address Spanning Tree Networks
US9160678B2 (en) 2013-04-15 2015-10-13 International Business Machines Corporation Flow control credits for priority in lossless ethernet
US9479457B2 (en) 2014-03-31 2016-10-25 Juniper Networks, Inc. High-performance, scalable and drop-free data center switch fabric
US9703743B2 (en) * 2014-03-31 2017-07-11 Juniper Networks, Inc. PCIe-based host network accelerators (HNAS) for data center overlay network
CN104301229B (en) * 2014-09-26 2016-05-04 深圳市腾讯计算机系统有限公司 Method of packet forwarding, the routing table generating method and apparatus
CN104767606B (en) * 2015-03-19 2018-10-19 华为技术有限公司 Data synchronization unit and method
US10243840B2 (en) 2017-03-01 2019-03-26 Juniper Networks, Inc. Network interface card switching for virtual networks

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020141427A1 (en) 2001-03-29 2002-10-03 Mcalpine Gary L. Method and apparatus for a traffic optimizing multi-stage switch fabric network
US20030037127A1 (en) 2001-02-13 2003-02-20 Confluence Networks, Inc. Silicon-based storage virtualization
US20030061379A1 (en) 2001-09-27 2003-03-27 International Business Machines Corporation End node partitioning using virtualization
US20030169690A1 (en) 2002-03-05 2003-09-11 James A. Mott System and method for separating communication traffic
US20030195983A1 (en) 1999-05-24 2003-10-16 Krause Michael R. Network congestion management using aggressive timers
US20040100980A1 (en) 2002-11-26 2004-05-27 Jacobs Mick R. Apparatus and method for distributing buffer status information in a switching fabric
US20040120332A1 (en) 2002-12-24 2004-06-24 Ariel Hendel System and method for sharing a resource among multiple queues

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5920566A (en) * 1997-06-30 1999-07-06 Sun Microsystems, Inc. Routing in a multi-layer distributed network element
US5974467A (en) * 1997-08-29 1999-10-26 Extreme Networks Protocol for communicating data between packet forwarding devices via an intermediate network interconnect device
KR100309748B1 (en) 1997-12-26 2001-09-11 윤종용 Bidirectional trunk amplifier for cable hybrid fiber coaxial network by using upstream signals and cable modem of hybrid fiber coaxial network
US6684031B1 (en) 1998-06-18 2004-01-27 Lucent Technologies Inc. Ethernet fiber access communications system
US6556541B1 (en) * 1999-01-11 2003-04-29 Hewlett-Packard Development Company, L.P. MAC address learning and propagation in load balancing switch protocols
CN1104800C (en) * 1999-10-27 2003-04-02 华为技术有限公司 Dual-table controlled data frame forwarding method
US7782784B2 (en) 2003-01-10 2010-08-24 Cisco Technology, Inc. Port analyzer adapter

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030195983A1 (en) 1999-05-24 2003-10-16 Krause Michael R. Network congestion management using aggressive timers
US20030037127A1 (en) 2001-02-13 2003-02-20 Confluence Networks, Inc. Silicon-based storage virtualization
US20020141427A1 (en) 2001-03-29 2002-10-03 Mcalpine Gary L. Method and apparatus for a traffic optimizing multi-stage switch fabric network
US20030061379A1 (en) 2001-09-27 2003-03-27 International Business Machines Corporation End node partitioning using virtualization
US20030169690A1 (en) 2002-03-05 2003-09-11 James A. Mott System and method for separating communication traffic
US20040100980A1 (en) 2002-11-26 2004-05-27 Jacobs Mick R. Apparatus and method for distributing buffer status information in a switching fabric
US20040120332A1 (en) 2002-12-24 2004-06-24 Ariel Hendel System and method for sharing a resource among multiple queues

Also Published As

Publication number Publication date
CN100555969C (en) 2009-10-28
CN101129027A (en) 2008-02-20
CN101129027B (en) 2011-09-14
CN101040471A (en) 2007-09-19
CN101040489A (en) 2007-09-19
CN101044717A (en) 2007-09-26
CN101040471B (en) 2012-01-11

Similar Documents

Publication Publication Date Title
US7245627B2 (en) Sharing a network interface card among multiple hosts
CN1633786B (en) A method and apparatus for priority based flow control in an Ethernet architecture
US7233570B2 (en) Long distance repeater for digital information
US6563790B1 (en) Apparatus and method for modifying a limit of a retry counter in a network switch port in response to exerting backpressure
US8774215B2 (en) Fibre channel over Ethernet
CA2469803C (en) Methods and apparatus for network congestion control
JP2528626B2 (en) De - data communication method and apparatus
US7145914B2 (en) System and method for controlling data paths of a network processor subsystem
US6973031B1 (en) Method and apparatus for preserving frame ordering across aggregated links supporting a plurality of transmission rates
US6577600B1 (en) Cost calculation in load balancing switch protocols
US6778495B1 (en) Combining multilink and IP per-destination load balancing over a multilink bundle
CN100405344C (en) Apparatus and method for distributing buffer status information in a switching fabric
US7200144B2 (en) Router and methods using network addresses for virtualization
US7209445B2 (en) Method and system for extending the reach of a data communication channel using a flow control interception device
US7215680B2 (en) Method and apparatus for scheduling packet flow on a fibre channel arbitrated loop
US6456597B1 (en) Discovery of unknown MAC addresses using load balancing switch protocols
US6907042B1 (en) Packet processing device
US6493318B1 (en) Cost propagation switch protocols
EP1142213B1 (en) Dynamic assignment of traffic classes to a priority queue in a packet forwarding device
US7813278B1 (en) Systems and methods for selectively performing explicit congestion notification
US20030016624A1 (en) Path recovery on failure in load balancing switch protocols
US20040258062A1 (en) Method and device for the classification and redirection of data packets in a heterogeneous network
US20080008202A1 (en) Router with routing processors and methods for virtualization
JP3829165B2 (en) Strengthening of 802.3 media access control and associated signaling scheme for full-duplex Ethernet
US20080002663A1 (en) Virtual network interface card loopback fastpath

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted