CN101044717A - Fibre channel over ethernet - Google Patents



Publication number: CN101044717A
Application number: CN 200580035946
Other languages: Chinese (zh)
Other versions: CN100555969C (en)
Applicant: 思科技术公司 (Cisco Technology, Inc.)
Priority: US 60/621,396; US 11/078,992
Granted publication: CN100555969C



The present invention provides methods and devices for implementing a Low Latency Ethernet ("LLE") solution, also referred to herein as a Data Center Ethernet ("DCE") solution, which simplifies the connectivity of data centers and provides a high-bandwidth, low-latency network for carrying Ethernet and storage traffic. Some aspects of the invention involve transforming FC frames into a format suitable for transport on an Ethernet. Some preferred implementations of the invention implement multiple virtual lanes ("VLs") in a single physical connection of a data center or similar network. Some VLs are "drop" VLs, with Ethernet-like behavior, and others are "no-drop" lanes with FC-like behavior. Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VL. Active buffer management allows for both high reliability and low latency while using small frame buffers. Preferably, the rules for active buffer management are different for drop and no-drop VLs.


Fibre Channel over Ethernet

Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Application No. 60/621,396 (Attorney Docket No. CISCP404P), entitled "FC Over Ethernet" and filed on October 22, 2004, and to U.S. Patent Application No. 11/078,992 (Attorney Docket No. CISCP409), entitled "Fibre Channel Over Ethernet" and filed on March 30, 2005. Both applications are hereby incorporated by reference in their entirety.

Background

FIG. 1 depicts a simplified version of the general type of data center that an enterprise requiring high availability and network storage (e.g., a financial institution) might use. Data center 100 includes redundant Ethernet switches with redundant connections for high availability. Data center 100 is connected to clients via network 105 through firewall 115. Network 105 may be, for example, an enterprise intranet, a DMZ and/or the Internet. Ethernet is well suited for TCP/IP traffic between clients (e.g., remote clients 180 and 185) and a data center.

Within data center 100, there are many network devices. For example, many servers are typically disposed on racks having a standard form factor (e.g., one "rack unit" may be 19" wide and about 1.75" thick). A "rack unit" or "U" is an Electronic Industries Alliance (more commonly "EIA") standard unit of measure for rack-mounted equipment. The term has become more prevalent recently because of the proliferation of rack-mount products in a wide range of commercial, industrial and military markets. A rack unit is 1.75" high. To calculate the internal usable space of a rack enclosure, simply multiply the total number of rack units by 1.75". For example, a 44U rack enclosure has 77" of internal usable space (44 × 1.75). Each rack in a data center may hold, for example, about 40 servers. A data center may have thousands of servers, or even more. Recently, some vendors have announced "blade servers," which allow even higher-density packing of servers (on the order of 60 to 80 servers per rack).
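The rack-space arithmetic in the preceding paragraph can be checked with a short sketch. The function name is illustrative only and does not come from the patent.

```python
RACK_UNIT_INCHES = 1.75  # height of one rack unit (1U), as stated in the text


def usable_height_inches(rack_units: int) -> float:
    """Return the internal usable height of an enclosure of `rack_units` U."""
    return rack_units * RACK_UNIT_INCHES


# The 44U example from the text: 44 x 1.75 = 77 inches.
print(usable_height_inches(44))  # 77.0
```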

However, as the number of network devices within a data center increases, connectivity becomes increasingly complex and expensive. At a minimum, the servers, switches, etc., of data center 100 will typically be connected via Ethernet. For high availability, there will be at least two Ethernet connections, as shown in FIG. 1.

Moreover, it is not desirable for servers to include significant storage capability. For this reason and others, it has become increasingly common for enterprise networks to include connectivity with storage devices such as storage array 150. Historically, storage traffic has been carried over SCSI (Small Computer System Interface) and/or FC (Fibre Channel).

In the mid-1990s, SCSI traffic could travel only short distances. A topic of key interest at the time was how to make SCSI go "outside the box." Greater speed, as always, was desired. At the time, Ethernet was moving from 10 Mb/s to 100 Mb/s. Some envisioned a future speed of up to 1 Gb/s, but many considered this to be nearing a physical limit. With 10 Mb/s Ethernet, there were the issues of half duplex and of collisions. Ethernet was considered somewhat unreliable, in part because packets could be lost and because there could be collisions. (Although the terms "packet" and "frame" have somewhat different meanings as normally used by those of skill in the art, they are used interchangeably here.)

FC was considered an attractive and reliable option for storage applications because, under the FC protocol, packets are not intentionally dropped and because FC could already run at 1 Gb/s. However, during 2004, both Ethernet and FC reached speeds of 10 Gb/s. Moreover, Ethernet had evolved to the point that it was full duplex and did not have collisions. Accordingly, FC no longer had a speed advantage over Ethernet. However, congestion in a switch may cause Ethernet packets to be dropped, which is an undesirable feature for storage traffic.

During the first few years of the 21st century, a significant amount of work went into developing iSCSI in order to implement SCSI over a TCP/IP network. Although these efforts met with some success, iSCSI has not become very popular: iSCSI has about 1%-2% of the storage network market, as compared to approximately 98%-99% for FC.

One reason is that the iSCSI stack is somewhat complex as compared to the FC stack. Referring to FIG. 7A, it may be seen that iSCSI stack 700 requires five layers: Ethernet layer 705, IP layer 710, TCP layer 715, iSCSI layer 720 and SCSI layer 725. TCP layer 715 is a necessary part of the stack because Ethernet layer 705 may lose packets, whereas SCSI layer 725 does not tolerate packet loss. TCP layer 715 provides SCSI layer 725 with reliable packet delivery. However, TCP layer 715 is a difficult protocol to implement at speeds of 1 to 10 Gb/s. In contrast, because FC does not lose frames, there is no need for a layer such as TCP to compensate for lost frames. Therefore, as shown in FIG. 7B, FC stack 750 is simpler, requiring only FC layer 755, FCP layer 760 and SCSI layer 765.

Accordingly, the FC protocol is normally used for communication between servers on a network and storage devices such as storage array 150. Data center 100 therefore includes FC switches 140 and 145, provided by Cisco Systems, Inc. in this example, for communication between servers 110 and storage array 150.

1RU and blade servers are very popular because they are relatively inexpensive, powerful, standardized and can run any of the most popular operating systems. It is well known that in recent years the cost of a typical server has decreased and its performance level has increased. Because of the relatively low cost of servers and the potential problems that can arise from running more than one type of software application on one server, each server is typically dedicated to a particular application. The large number of applications run on a typical enterprise network continues to increase the number of servers in the network.

However, because of the complexity of maintaining various types of connectivity (e.g., Ethernet and FC connectivity) with each server, each type of connectivity preferably being redundant for high availability, the cost of connectivity for a server is becoming higher than the cost of the server itself. For example, a single FC interface for a server may cost as much as the server itself. A server's connection with an Ethernet is typically made via a network interface card ("NIC"), and its connection with an FC network is made with a host bus adaptor ("HBA").

The roles of devices in an FC network and in an Ethernet network are somewhat different with regard to network traffic, mainly because packets are routinely dropped in response to congestion in a TCP/IP network, whereas frames are not intentionally dropped in an FC network. Accordingly, FC will sometimes be referred to herein as one example of a "no-drop" network, whereas Ethernet will be referred to as one manifestation of a "drop" network. When packets are dropped on a TCP/IP network, the system recovers quickly, e.g., within a few hundred microseconds. However, the protocols for an FC network are generally based on the assumption that frames will not be dropped. Therefore, when frames are dropped on an FC network, the system does not recover quickly, and SCSI may take minutes to recover.

Currently, a port of an Ethernet switch may buffer a packet for up to about 100 milliseconds before dropping it. With 10 Gb/s Ethernet, each port of an Ethernet switch would need about 100 MB of RAM in order to buffer packets for 100 milliseconds. This would be prohibitively expensive.
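The buffer estimate above follows from multiplying line rate by hold time. The sketch below is a back-of-the-envelope check; the function name is illustrative, and the result (125 MB) is of the same order as the "about 100 MB" figure quoted in the text.

```python
def buffer_bytes(link_bits_per_second: float, hold_seconds: float) -> float:
    """Bytes needed to hold everything a saturated link delivers in `hold_seconds`."""
    return link_bits_per_second * hold_seconds / 8  # 8 bits per byte


# 10 Gb/s held for 100 ms: 1e9 bits, or 125e6 bytes, per port.
megabytes = buffer_bytes(10e9, 0.100) / 1e6
print(f"{megabytes:.0f} MB per port")
```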

Some enterprises want to "cluster" more than one server, as indicated by the dashed line around servers S2 and S3 in FIG. 1. Clustering causes an even number of servers to be seen as a single server. For clustering, it is desirable to perform remote direct memory access ("RDMA"), wherein the contents of one virtual memory space (which may be scattered among many physical memory spaces) can be copied to another virtual memory space without CPU intervention. RDMA should be performed with very low latency. In some enterprise networks, there is a third type of network that is dedicated to clustering servers, as indicated by switch 175. This may be, for example, a "Myrinet," a "Quadrics" or an "Infiniband" network.

Therefore, clustering of servers adds still more complexity to data center networks. However, unlike Quadrics and Myrinet, Infiniband allows for clustering and provides the possibility of simplifying a data center network. Infiniband network devices are relatively inexpensive, mainly because they use small buffer spaces, copper media and simple forwarding schemes.

However, Infiniband has a number of drawbacks. For example, there is currently only one source of components for Infiniband switches. Moreover, Infiniband has not been proven to work properly in the context of, e.g., a large enterprise's data center. For example, there are no known implementations of Infiniband routers to interconnect Infiniband subnets. While gateways are possible between Infiniband and Fibre Channel and between Infiniband and Ethernet, it is very improbable that Ethernet will be removed from the data center. This also means that hosts would need not only an Infiniband connection but also an Ethernet connection.

Accordingly, even if a large enterprise wished to ignore the foregoing shortcomings and change to an Infiniband-based system, the enterprise would need to keep a legacy data center network (e.g., as shown in FIG. 1) installed and functioning while the enterprise tested an Infiniband-based system. Therefore, the cost of an Infiniband-based system would not be an alternative cost, but an additional cost.

It would be very desirable to simplify data center networks in a manner that would allow an evolutionary change from existing data center networks. An ideal system would provide an evolutionary path for consolidating server I/O and would provide low latency and high speed at a low cost.


The present invention provides methods and devices for implementing a Low Latency Ethernet ("LLE") solution, also referred to herein as a Data Center Ethernet ("DCE") solution, which simplifies the connectivity of data centers and provides a high-bandwidth, low-latency network for carrying Ethernet and storage traffic. Some aspects of the invention involve transforming FC frames into a format suitable for transport on an Ethernet. Some preferred implementations of the invention implement multiple virtual lanes ("VLs"), also referred to as virtual links, in a single physical connection of a data center or similar network. Some VLs are "drop" VLs, with Ethernet-like behavior, and others are "no-drop" lanes with FC-like behavior.

A VL may be implemented, in part, by tagging a frame. Because each VL may have its own credits, each VL may be treated independently of the other VLs. The performance of each VL can even be determined by the credits assigned to it, in terms of the replenishment rate. To allow more complex topologies and better management of a frame inside a switch, TTL information may be added to a frame, as well as a frame length field. There may also be encoded information regarding congestion, so that a source may receive an explicit message to slow down.
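The idea that each VL carries its own credits, and is therefore handled independently of its neighbors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's mechanism: the class and method names are hypothetical, one credit is assumed to admit one frame, and gating a "drop" VL on credits is a simplification chosen to show the contrast with a "no-drop" VL.

```python
from dataclasses import dataclass


@dataclass
class VirtualLane:
    """One virtual lane on a physical link (hypothetical model)."""
    vl_id: int
    no_drop: bool   # True: FC-like behavior; False: Ethernet-like behavior
    credits: int    # frames the receiver is currently willing to accept

    def try_send(self) -> str:
        """Spend one credit if available; otherwise hold or drop by lane type."""
        if self.credits > 0:
            self.credits -= 1
            return "sent"
        # Out of credits: a no-drop lane waits, a drop lane discards the frame.
        return "held" if self.no_drop else "dropped"

    def replenish(self, count: int) -> None:
        """Credits returned by the receiver as it frees buffer space."""
        self.credits += count


storage = VirtualLane(vl_id=1, no_drop=True, credits=1)
lan = VirtualLane(vl_id=0, no_drop=False, credits=0)
print(storage.try_send(), storage.try_send(), lan.try_send())
```

Because each lane holds its own counter, exhausting credits on the storage lane has no effect on the LAN lane, which is the independence the paragraph describes.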

Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VL. Different VLs may be assigned different guaranteed bandwidths that can change over time. Preferably, a VL will keep the same character (e.g., it will remain a drop or a no-drop lane), but its bandwidth may be changed dynamically depending on the time of day, the tasks to be completed, etc.

Active buffer management allows for both high reliability and low latency while using small frame buffers, even with 10 Gb/s Ethernet. Preferably, the rules for active buffer management are different for different types of VLs, such as drop and no-drop VLs. Some embodiments of the invention are implemented with copper media instead of fiber optics. Given all these attributes, I/O consolidation may be achieved in a competitive, relatively inexpensive fashion.

Some aspects of the invention provide a method for transforming a Fibre Channel ("FC") frame for transport on an Ethernet. The method includes the following steps: receiving an FC frame; mapping destination contents of a destination FC ID field of the FC frame to a first portion of a destination MAC field of an Ethernet frame; mapping source contents of a source FC ID field of the FC frame to a second portion of a source MAC field of the Ethernet frame; converting illegal symbols of the FC frame to legal symbols; inserting the legal symbols into selected fields of the Ethernet frame; mapping payload contents of an FC frame payload to a payload field of the Ethernet frame; and transmitting the Ethernet frame on the Ethernet.

The first portion may be a device ID field of the destination MAC field, and the second portion may be a device ID field of the source MAC field. The illegal symbols may be symbols in the SOF field and the EOF field of the FC frame. The inserting step may involve inserting the legal symbols into at least one interior field of the Ethernet frame. The method also includes the steps of assigning an Organization Unique Identifier ("OUI") code to FC frames prepared for transport on the Ethernet and inserting the OUI code in organization ID fields of the source MAC field and the destination MAC field of the Ethernet frame.
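The address-mapping steps above can be illustrated with a short sketch. It assumes that the 24-bit FC ID fills the 24-bit device ID half of a 48-bit MAC address and that the assigned OUI fills the organization ID half. The OUI value below is a placeholder (a locally administered prefix), not one named in the patent, and the function names are hypothetical. SOF/EOF symbol conversion is not shown because this passage does not give the encoding.

```python
HYPOTHETICAL_OUI = 0x02ABCD  # placeholder 24-bit OUI; not from the patent


def fc_id_to_mac(fc_id: int, oui: int = HYPOTHETICAL_OUI) -> bytes:
    """Build a 6-byte MAC: OUI in the organization ID field, FC ID in the device ID field."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("FC ID must fit in 24 bits")
    if not 0 <= oui <= 0xFFFFFF:
        raise ValueError("OUI must fit in 24 bits")
    return oui.to_bytes(3, "big") + fc_id.to_bytes(3, "big")


def format_mac(mac: bytes) -> str:
    """Render a MAC address in the usual colon-separated hex form."""
    return ":".join(f"{octet:02x}" for octet in mac)


# Destination and source FC IDs are mapped the same way.
print(format_mac(fc_id_to_mac(0x010203)))  # 02:ab:cd:01:02:03
```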

Some embodiments of the invention provide a network device that includes a plurality of FC ports configured for communication with an FC network and a plurality of Ethernet ports configured for communication with an Ethernet. The network device also includes at least one logic device configured to perform the following steps: receiving an FC frame from one of the plurality of FC ports; mapping destination contents of a destination FC ID field of the FC frame to a first portion of a destination MAC field of an Ethernet frame; mapping source contents of a source FC ID field of the FC frame to a second portion of a source MAC field of the Ethernet frame; converting illegal symbols of the FC frame to legal symbols; inserting the legal symbols into selected fields of the Ethernet frame; mapping payload contents of an FC frame payload to a payload field of the Ethernet frame; and forwarding the Ethernet frame to one of the plurality of Ethernet ports for transmission on the Ethernet. The network device may be a storage gateway.

The first portion may be a device ID field of the destination MAC field, and the second portion may be a device ID field of the source MAC field. The illegal symbols may be symbols in the SOF field and the EOF field of the FC frame. The logic device may be configured to insert the legal symbols into at least one interior field of the Ethernet frame. The logic device may also be configured to assign an Organization Unique Identifier ("OUI") code to FC frames prepared for transport on the Ethernet and to insert the OUI code in organization ID fields of the source MAC field and the destination MAC field of the Ethernet frame.

Alternative aspects of the invention provide methods for transforming an Ethernet frame for transport on a Fibre Channel ("FC") network. Some such methods include these steps: receiving an Ethernet frame; mapping destination contents of a first portion of a destination MAC field of the Ethernet frame to a destination FC ID field of an FC frame; mapping source contents of a second portion of a source MAC field of the Ethernet frame to a source FC ID field of the FC frame; converting legal symbols of the Ethernet frame to illegal symbols; inserting the illegal symbols into selected fields of the FC frame; mapping payload contents of a payload field of the Ethernet frame to an FC frame payload field; and transmitting the FC frame on the FC network.
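The reverse address mapping can be sketched in the same spirit. The example assumes the FC ID occupies the low 24 bits (the device ID portion) of the MAC address and that the high 24 bits carry the OUI assigned to FC traffic; the OUI value and the function name are hypothetical.

```python
HYPOTHETICAL_OUI = 0x02ABCD  # placeholder 24-bit OUI; not from the patent


def mac_to_fc_id(mac: bytes, expected_oui: int = HYPOTHETICAL_OUI) -> int:
    """Recover the 24-bit FC ID from the device ID portion of a 6-byte MAC."""
    if len(mac) != 6:
        raise ValueError("a MAC address is exactly 6 bytes")
    if int.from_bytes(mac[:3], "big") != expected_oui:
        raise ValueError("frame does not carry the OUI assigned to FC traffic")
    return int.from_bytes(mac[3:], "big")


print(hex(mac_to_fc_id(bytes.fromhex("02abcd010203"))))  # 0x10203
```

Checking the OUI before extracting the device ID is one way a gateway could tell encapsulated FC traffic apart from ordinary Ethernet frames; the patent passage itself does not prescribe this check.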

The methods described herein may be implemented and/or embodied in various ways, including as hardware, software and the like.


The invention will be understood by reference to the following description taken in conjunction with the accompanying drawings, which illustrate specific implementations of the invention.

FIG. 1 is a simplified network diagram that depicts a data center.

FIG. 2 is a simplified network diagram that depicts a data center according to one embodiment of the invention.

FIG. 3 is a block diagram that depicts multiple VLs implemented across a single physical link.

FIG. 4 illustrates one format of an Ethernet frame that carries additional fields for implementing DCE according to some implementations of the invention.

FIG. 5 illustrates one format of a link management frame according to some implementations of the invention.

FIG. 6A is a network diagram that illustrates a simplified credit-based method of the invention.

FIG. 6B is a table that depicts a crediting method of the invention.

FIG. 6C is a flow chart that outlines one exemplary method for initializing a link according to the invention.

FIG. 7A depicts an iSCSI stack.

FIG. 7B depicts a stack for implementing SCSI over FC.

FIG. 8 depicts a stack for implementing SCSI over DCE according to some aspects of the invention.

FIGS. 9A and 9B depict a method for implementing FC over Ethernet according to some aspects of the invention.

FIG. 10 is a simplified network diagram for implementing FC over Ethernet according to some aspects of the invention.

FIG. 11 is a simplified network diagram for aggregating DCE switches according to some aspects of the invention.

FIG. 12 depicts the architecture of a DCE switch according to some embodiments of the invention.

FIG. 13 is a block diagram that illustrates buffer management per VL according to some implementations of the invention.

FIG. 14 is a network diagram that illustrates some types of explicit congestion notification according to the invention.

FIG. 15 is a block diagram that illustrates buffer management per VL according to some implementations of the invention.

FIG. 16 is a graph that illustrates probabilistic drop functions according to some aspects of the invention.

FIG. 17 is a graph that illustrates an exemplary occupancy of a VL buffer over time.

FIG. 18 is a graph that illustrates probabilistic drop functions according to alternative aspects of the invention.

FIG. 19 illustrates a network device that may be configured to perform some methods of the invention.

Detailed Description

Reference will now be made in detail to some specific embodiments of the invention, including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. Moreover, numerous specific details are set forth below in order to provide a thorough understanding of the invention. The invention may be practiced without these specific details. In other instances, well-known process operations have not been described in detail in order not to obscure the invention.

The present invention provides methods and devices for simplifying the connectivity of data centers and providing a high-bandwidth, low-latency network for carrying Ethernet and storage traffic. Some preferred implementations of the invention implement multiple VLs in a single physical connection of a data center or similar network. Buffer-to-buffer credits are preferably maintained for each VL. Some VLs are "drop" VLs, with Ethernet-like behavior, and others are "no-drop" lanes with FC-like behavior.

Some implementations provide intermediate behaviors between "drop" and "no-drop." Some such implementations are "delayed drop," wherein frames are not immediately dropped when a buffer is full; instead, there is an upstream "push back" for a limited time (e.g., on the order of milliseconds) before a frame is dropped. Delayed drop implementations are useful for managing transient congestion.
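A minimal sketch of delayed-drop behavior follows, assuming a fixed-capacity buffer that pushes back on the sender for a bounded interval after it first fills and drops only once that interval has elapsed. The class name, the string results and the explicit `now` timestamps are illustrative choices, not part of the patent.

```python
from typing import List, Optional


class DelayedDropBuffer:
    """Frame buffer that pushes back for a limited time before dropping."""

    def __init__(self, capacity: int, max_pushback_s: float) -> None:
        self.capacity = capacity
        self.max_pushback_s = max_pushback_s
        self.frames: List[bytes] = []
        self.full_since: Optional[float] = None  # when push back began

    def offer(self, frame: bytes, now: float) -> str:
        """Try to enqueue `frame` at time `now` (seconds)."""
        if len(self.frames) < self.capacity:
            self.frames.append(frame)
            self.full_since = None
            return "accepted"
        if self.full_since is None:
            self.full_since = now  # buffer just became a bottleneck
        if now - self.full_since < self.max_pushback_s:
            return "pushback"      # sender should hold the frame and retry
        return "dropped"           # congestion outlasted the grace period

    def drain(self) -> Optional[bytes]:
        """Remove the oldest frame, ending any push-back episode."""
        if not self.frames:
            return None
        self.full_since = None
        return self.frames.pop(0)


buf = DelayedDropBuffer(capacity=1, max_pushback_s=0.005)
print(buf.offer(b"a", 0.000), buf.offer(b"b", 0.001), buf.offer(b"b", 0.007))
```

With a push-back window of zero the buffer behaves like a plain "drop" lane, and with an unbounded window it approaches "no-drop," which is why the text places delayed drop between the two.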

Preferably, a congestion control scheme is implemented at layer 2. Some preferred implementations of the invention provide guaranteed bandwidth based on credits and VL. An alternative to the use of credits is the use of the standard IEEE 802.3 PAUSE frame per VL to implement the "no-drop" or "delayed drop" VLs. The IEEE 802.3 standard is hereby incorporated by reference for all purposes. For example, Annex 31B of the 802.3ae-2002 standard, entitled "MAC Control PAUSE Operation," is specifically incorporated by reference. It is also understood that the invention will work in the absence of VLs, but in that case the overall link will exhibit "drop," "delayed drop" or "no-drop" behavior.
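For orientation, the sketch below builds a standard, link-wide IEEE 802.3 PAUSE frame. The reserved destination address 01-80-C2-00-00-01, the MAC Control EtherType 0x8808, the PAUSE opcode 0x0001 and the 16-bit pause time (in quanta of 512 bit times) are taken from IEEE 802.3 Annex 31B. How a per-VL variant would identify the lane is not specified in this passage, so it is not modeled here; the function name is illustrative.

```python
import struct

PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_WITHOUT_FCS = 60  # 64-byte minimum frame minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame (without FCS) asking the peer to stop sending.

    `pause_quanta` is in units of 512 bit times; 0 resumes transmission.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    control = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, pause_quanta)
    frame = PAUSE_DEST_MAC + src_mac + control
    return frame.ljust(MIN_FRAME_WITHOUT_FCS, b"\x00")  # zero-pad to minimum size


frame = build_pause_frame(bytes.fromhex("02abcd000001"), pause_quanta=0xFFFF)
print(len(frame), frame[12:18].hex())
```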

Preferred implementations support a negotiation mechanism, for example one such as is specified by IEEE 802.1x, which is hereby incorporated by reference. The negotiation mechanism can, for example, determine whether a host device supports LLE and, if so, allow the host to receive VL and credit information, e.g., how many VLs are supported, whether a VL uses credits or PAUSE, how many credits there are if credits are used, and what the behavior of each individual VL is.
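The information a host receives through negotiation can be pictured as a small record per VL. The sketch below is a hypothetical data model only; it does not reproduce any IEEE 802.1x message format, and every name in it is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Behavior(Enum):
    DROP = "drop"
    NO_DROP = "no-drop"
    DELAYED_DROP = "delayed drop"


class FlowControl(Enum):
    CREDITS = "credits"
    PAUSE = "pause"


@dataclass(frozen=True)
class VLInfo:
    """What the host learns about one virtual lane."""
    vl_id: int
    behavior: Behavior
    flow_control: FlowControl
    credits: Optional[int] = None  # only meaningful for credit-based lanes


def negotiate(host_supports_lle: bool, switch_vls: List[VLInfo]) -> List[VLInfo]:
    """Return the VL description handed to the host, or nothing without LLE support."""
    if not host_supports_lle:
        return []  # the host falls back to ordinary Ethernet behavior
    for vl in switch_vls:
        if vl.flow_control is FlowControl.CREDITS and vl.credits is None:
            raise ValueError(f"VL {vl.vl_id} uses credits but has no credit count")
    return list(switch_vls)


lanes = [
    VLInfo(0, Behavior.DROP, FlowControl.PAUSE),
    VLInfo(1, Behavior.NO_DROP, FlowControl.CREDITS, credits=8),
]
print(len(negotiate(True, lanes)), len(negotiate(False, lanes)))
```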

Active buffer management allows for both high reliability and low latency while using small frame buffers. Preferably, the rules for active buffer management are different for drop and no-drop VLs.

Some implementations of the invention support an efficient RDMA protocol that is particularly useful for clustering implementations. In some implementations of the invention, network interface cards ("NICs") implement RDMA for clustering applications and also implement a reliable transport for RDMA. Some aspects of the invention are implemented via user APIs from the User Direct Access Programming Library ("uDAPL"). The uDAPL defines a set of user APIs for all RDMA-capable transports and is hereby incorporated by reference.

FIG. 2 is a simplified network diagram that illustrates one example of an LLE solution for simplifying the connectivity of data center 200. Data center 200 includes LLE switch 240, having router 260 for connectivity with TCP/IP network 205 and host devices 280 and 285 via firewall 215. The architecture of exemplary LLE switches is set forth in detail herein. Preferably, the LLE switches of the present invention can run 10 Gb/s Ethernet and have relatively small frame buffers. Some preferred LLE switches support only layer 2 functionality.

Although LLE switches of the present invention can be implemented using fiber optics and optical transceivers, some preferred LLE switches are implemented using copper connectivity to reduce costs. Some such implementations conform to the proposed IEEE 802.3ak standard, called 10GBASE-CX4, which is hereby incorporated by reference for all purposes. The inventors expect that other implementations will use the emerging standard IEEE P802.3an (10GBASE-T), which is also hereby incorporated by reference for all purposes.

Servers 210 are also connected with LLE switch 245, which includes FC gateway 270 for communication with disk arrays 250. FC gateway 270 implements FC over Ethernet, which is described in detail herein, thereby eliminating the need for separate FC and Ethernet networks within data center 200. Gateway 270 could be a device such as Cisco Systems' MDS 9000 IP Storage Services Module that has been configured with software for performing some methods of the present invention. Ethernet traffic is carried within data center 200 in its native format. This is possible because LLE is an extension to Ethernet that can carry FC over Ethernet and RDMA in addition to native Ethernet.

FIG. 3 illustrates two switches 305 and 310 connected by physical link 315. The behavior of switches 305 and 310 is generally governed by IEEE 802.1, and the behavior of physical link 315 is generally governed by IEEE 802.3. In general, the present invention provides for two general behaviors of LLE switches, plus a range of intermediate behaviors. The first general behavior is "drop" behavior, which is similar to that of Ethernet. The second general behavior is "no-drop" behavior, which is similar to that of FC. The invention also provides intermediate behaviors between "drop" and "no-drop," including but not limited to the "delayed drop" behavior described elsewhere herein.

To implement both behaviors on the same physical link 315, the present invention provides methods and devices for implementing VLs. VLs are a way to carve a physical link into multiple logical entities such that traffic in one VL is unaffected by traffic on other VLs. This is done by maintaining separate buffers (or separate portions of a physical buffer) for each VL. For example, one VL can carry control plane traffic and some other high-priority traffic without being blocked by low-priority bulk traffic on another VL. VLANs may be grouped into different VLs so that traffic in one set of VLANs can proceed unimpeded by traffic on other VLANs.
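
By way of illustration only, the per-VL isolation described above may be sketched as one bounded queue per VL, so that filling one lane's buffer never blocks another lane. The class name, the capacity measured in frames, and the lane numbers are assumptions made for this sketch and do not come from the specification.

```python
from collections import deque


class VirtualLaneBuffers:
    """One bounded frame queue per VL on a single physical link (illustrative)."""

    def __init__(self, num_vls, frames_per_vl):
        self.capacity = frames_per_vl
        self.queues = [deque() for _ in range(num_vls)]

    def enqueue(self, vl, frame):
        """Accept a frame on `vl` only if that lane's own buffer has room."""
        queue = self.queues[vl]
        if len(queue) >= self.capacity:
            return False  # only this VL is full; other lanes are unaffected
        queue.append(frame)
        return True

    def dequeue(self, vl):
        """Remove and return the oldest frame on `vl`, or None if it is empty."""
        queue = self.queues[vl]
        return queue.popleft() if queue else None
```

In this sketch, bulk traffic that fills VL 1 leaves VL 0 free to accept control plane frames.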

In the example illustrated by FIG. 3, switches 305 and 310 effectively provide 4 VLs over physical link 315. Here, VLs 320 and 325 are drop VLs, and VLs 330 and 335 are no-drop VLs. To implement both "drop" and "no-drop" behavior simultaneously, at least one VL must be assigned to each type of behavior, for a total of 2. (It is theoretically possible to have only one VL that is temporarily assigned to each type of behavior, but such an implementation is not preferred.) To support legacy devices and/or other devices lacking LLE functionality, preferred implementations of the invention support a link with no VLs and map all of that link's traffic into a single VL at the first LLE port. From a network management perspective, between 2 and 16 VLs are preferred, though more could be implemented.

It is preferable to partition the link into VLs dynamically, because static partitioning is less flexible. In some preferred implementations of the invention, dynamic partitioning is accomplished on a packet-by-packet basis (or a frame-by-frame basis), e.g., by adding an extension header. The present invention encompasses a wide variety of formats for such a header. In some implementations of the invention, two types of frames are sent on a DCE link: data frames and link management frames.

Although FIGS. 4 and 5 illustrate formats for an Ethernet data frame and a link management frame, respectively, for implementing some aspects of the invention, other implementations of the invention provide frames with more or fewer fields, fields in a different order, or other variations. Fields 405 and 410 of FIG. 4 are the standard Ethernet fields for the frame's destination address and source address, respectively. Similarly, protocol type field 430, payload 435, and CRC field 440 may be those of a standard Ethernet frame.

However, protocol type field 420 indicates that the following fields are those of DCE header 425. If present, the DCE header is preferably as close as possible to the beginning of the frame, because this enables easy parsing in hardware. The DCE header may be carried in Ethernet data frames, as shown in FIG. 4, as well as in link management frames (see FIG. 5 and the corresponding discussion). This header is preferably stripped by the MAC and does not need to be stored in a frame buffer. In some implementations of the invention, a continuous flow of link management frames is generated when there is no data traffic present or when regular frames cannot be sent due to lack of credits.

Most information carried in the DCE header relates to the Ethernet frame in which the DCE header is contained. However, some fields are buffer credit fields that are used to replenish credits for traffic in the opposite direction. In this example, buffer credit fields are carried only by frames having a long DCE header. The credit fields may not be needed if the solution uses pause frames instead of credits.

TTL field 445 indicates a time to live, which is a number decremented each time frame 400 is forwarded. Normally, a layer 2 network does not require a TTL field. Ethernet uses a spanning tree topology, which is very conservative. A spanning tree puts constraints on the active topology and allows only one path for a packet from one switch to another.

In preferred implementations of the invention, this limitation on the active topology is not followed. Instead, multiple paths are preferably active at the same time, e.g., via a link state protocol such as OSPF (Open Shortest Path First) or IS-IS (Intermediate System to Intermediate System). However, link state protocols are known to cause transient loops during topology reconfiguration. Using a TTL or similar feature ensures that transient loops do not become a major problem. Therefore, in preferred implementations of the invention, a TTL is encoded in the frame in order to effectively implement a link state protocol at layer 2. Instead of using a link state protocol, some implementations of the invention use multiple spanning trees rooted in different LLE switches and obtain similar behavior.
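
The way a TTL bounds the lifetime of a frame caught in a transient loop may be sketched as follows. The text states only that the TTL is decremented on each forwarding; the rule that a frame is discarded once its TTL would reach zero, and both function names, are assumptions made for this sketch.

```python
def forward_once(ttl):
    """Decrement the TTL for one forwarding hop.

    Returns the new TTL, or None if the frame should be discarded.
    Assumption: a frame whose TTL would reach zero is dropped.
    """
    new_ttl = ttl - 1
    return new_ttl if new_ttl > 0 else None


def hops_survived(initial_ttl):
    """Count the forwarding hops a frame survives while circulating in a loop."""
    hops, ttl = 0, initial_ttl
    while True:
        ttl = forward_once(ttl)
        if ttl is None:
            return hops
        hops += 1
```

Under these assumptions, a frame that enters a loop with a TTL of 4 is forwarded three times and then discarded, so the loop cannot carry it indefinitely.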

Field 450 identifies the VL of frame 400. Identifying the VL in field 450 allows devices to assign a frame to the proper VL and to apply different rules to different VLs. As described in detail elsewhere herein, the rules differ according to various criteria, e.g., whether a VL is a drop or a no-drop VL, whether the VL has guaranteed bandwidth, whether there is currently congestion on the VL, and other factors.

ECN (explicit congestion notification) field 455 is used to indicate that a buffer (or the portion of a buffer allocated to this VL) is being filled and that the source should slow down its transmission rate for the indicated VL. In preferred implementations of the invention, at least some host devices of the network can understand the ECN information and will apply a shaper, also known as a rate limiter, to the indicated VL. Explicit congestion notification can occur in at least two general ways. In one method, a packet is sent for the express purpose of carrying an ECN. In another method, the notification is "piggy-backed" on a packet that would have been transmitted anyway.

As noted elsewhere, the ECN could be sent to the source or to an edge device. The ECN may originate in various devices of the DCE network, including end devices and core devices. As discussed in more detail in the switch architecture section below, congestion notification and responses thereto are important parts of controlling congestion while maintaining small buffer sizes.

Some implementations of the invention allow the ECN to be sent upstream from the originating device and/or allow the ECN to be sent downstream and then back upstream. For example, ECN field 455 may include a forward ECN portion ("FECN") and a backward ECN portion ("BECN"). When a switch port experiences congestion, it can set a bit in the FECN portion and forward the frame normally. Upon receiving a frame with the FECN bit set, an end station sets the BECN bit and the frame is sent back to the source. The source receives the frame, detects that the BECN bit has been set, and decreases the traffic it injects into the network, at least for the indicated VL.
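
The FECN/BECN round trip described above may be sketched as three small steps. The `Frame` type, the function names, and the choice to halve the rate on a BECN are assumptions made for this sketch; the text says only that the source decreases traffic for the indicated VL.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    vl: int
    fecn: bool = False
    becn: bool = False


def switch_forward(frame, congested):
    """Congested switch port: set FECN and forward the frame normally."""
    if congested:
        frame.fecn = True
    return frame


def end_station_reflect(received):
    """End station: send a frame back to the source, with BECN set if FECN was seen."""
    return Frame(vl=received.vl, becn=received.fecn)


def source_handle(frame, rates, factor=0.5):
    """Source: reduce the rate of the indicated VL when BECN is set.

    `factor` is a hypothetical back-off multiplier.
    """
    if frame.becn:
        rates[frame.vl] *= factor
    return rates
```

Only the VL named in the returned frame is slowed; the rates of the other VLs are left untouched.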

Frame credit field 465 is used to indicate the number of credits that should be allocated for frame 400. There are many possible ways to implement such a system within the scope of the present invention. The simplest solution is to credit individual packets or frames. This may not be the best solution from a buffer management perspective: if a buffer is reserved for a single credit and one credit applies to each packet, an entire buffer is reserved for a single packet. Even if the buffer is only the size of the expected full-sized frame, this crediting scheme will often result in low utilization of each buffer, because many frames will be smaller than the maximum size. For example, if a full-sized frame is 9 KB and all buffers are 9 KB, but the average frame size is 1500 bytes, only about 1/6 of each buffer is normally in use.

A better solution is to credit according to frame size. Although one could grant a credit for, e.g., a single byte, in practice it is preferable to use larger units, such as 64 B, 128 B, 256 B, 512 B, 1024 B, etc. For example, if a credit is for a unit of 512 B, the aforementioned average 1500-byte frame would require 3 credits. If such a frame were transmitted according to one such implementation of the present invention, frame credit field 465 would indicate that the frame requires 3 credits.
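
The credit count for a frame under size-based crediting may be computed as a rounded-up division, as in the following sketch. The function name and the default unit are assumptions made for this sketch; the 512 B unit and the 1500-byte frame are the values used in the example above.

```python
def credits_needed(frame_bytes, unit_bytes=512):
    """Credits for a frame when one credit covers `unit_bytes`, rounded up."""
    if frame_bytes <= 0 or unit_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-frame_bytes // unit_bytes)  # ceiling division on integers


print(credits_needed(1500))  # prints 3
```

Rounding up matters because a partially filled unit of buffer space is still occupied: a 1500-byte frame covers two full 512 B units and part of a third.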

Crediting according to frame size allows for a more efficient use of buffer space. Knowing the size of a packet indicates not only how much buffer space will be needed, but also when the packet may be moved from the buffer. This may be particularly important, for example, if the internal transmission speed of a switch differs from the rate at which data arrive at a switch port.

This example provides a longer version and a shorter version of the DCE header. Long header field 460 indicates whether the DCE header is the long or the short version. In this implementation, all data frames contain at least the short header, which includes TTL, VL, ECN, and frame credit information in fields 445, 450, 455, and 465, respectively. A data frame may contain the long header if it needs to carry the credit information associated with each VL along with the information present in the short header. In this example, there are 8 VLs and 8 corresponding fields for indicating buffer credits for each VL. Using both short and long DCE headers reduces the overhead of carrying credit information in every frame.

When there is no data frame to be sent, some embodiments of the invention cause a link management frame ("LMF") to be sent to announce credit information. An LMF may also be used to carry buffer credits from a receiver or to carry transmitted frame credits from a sender. An LMF should be sent uncredited (frame credit = 0) because it is preferably consumed by the port and not forwarded. An LMF may be sent on a periodic basis and/or in response to predetermined conditions, for example, after every 10 MB of payload has been transmitted by data frames.

FIG. 5 illustrates an example of an LMF format according to some implementations of the invention. LMF 500 begins with standard 6-byte Ethernet fields 510 and 520 for the frame's destination address and source address, respectively. Protocol type header 530 indicates that DCE header 540 follows, which in this example is a short DCE header (e.g., long header field = 0). The VL, TTL, ECN, and frame credit fields of DCE header 540 are set to zero by the sender and ignored by the receiver. Accordingly, an LMF may be identified by the following characteristics: Protocol_Type = DCE_Header and Long_Header = 0 and Frame_Credit = 0.
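
The three-part test for recognizing an LMF may be expressed as a single predicate, as sketched below. The `ParsedFrame` type is hypothetical, and `DCE_HEADER` is a symbolic placeholder because the text gives no numeric protocol type value.

```python
from dataclasses import dataclass

# Symbolic placeholder: the specification gives no numeric protocol type.
DCE_HEADER = "DCE_Header"


@dataclass
class ParsedFrame:
    protocol_type: str
    long_header: int
    frame_credit: int


def is_lmf(frame):
    """True when Protocol_Type = DCE_Header, Long_Header = 0 and Frame_Credit = 0."""
    return (
        frame.protocol_type == DCE_HEADER
        and frame.long_header == 0
        and frame.frame_credit == 0
    )
```

A data frame fails this test because, under size-based crediting, it always carries a nonzero frame credit.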

Field 550 indicates the receiver buffer credits for active VLs. In this example, there are 8 active VLs, so the buffer credits for each active VL are indicated by fields 551 through 558. Similarly, field 560 indicates the frame credits for the sending device, so the frame credits for each active VL are indicated by fields 561 through 568.

LMF 500 does not contain any payload. If necessary, as in this example, LMF 500 is padded by pad field 570 to 64 bytes in order to create a legal minimum-size Ethernet frame. LMF 500 terminates with a standard Ethernet CRC field 580.

In general, the buffer-to-buffer crediting scheme of the present invention is implemented according to the following two rules: (1) a sender transmits a frame when the number of credits it has from the receiver is greater than or equal to the number of credits required for the frame to be sent; and (2) a receiver sends credits to the sender when it can accept additional frames. As noted above, credits can be replenished using either data frames or LMFs. A port is allowed to transmit a frame for a specific VL only if there are at least as many credits as the frame length (excluding the length of the DCE header).
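
The two crediting rules above may be sketched from the sender's point of view as follows. The class and method names are assumptions made for this sketch, and the sketch keeps a plain per-VL balance rather than the modulus counters described later in this section.

```python
class CreditedSender:
    """Per-VL credit balance held by a sending port (illustrative)."""

    def __init__(self, num_vls):
        self.credits = [0] * num_vls

    def replenish(self, vl, granted):
        """Rule 2: the receiver returns credits once it can accept more frames."""
        self.credits[vl] += granted

    def try_send(self, vl, frame_credits):
        """Rule 1: send only if the VL holds at least the credits the frame needs."""
        if self.credits[vl] < frame_credits:
            return False  # hold the frame until credits are replenished
        self.credits[vl] -= frame_credits
        return True
```

With 4 credits on a VL, one 3-credit frame may be sent; a second must wait until the receiver returns at least 2 more credits.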

Similar rules apply if pause frames are used instead of credits. A sender transmits a frame when it has not been paused by the receiver. A receiver sends a pause frame to the sender when it cannot accept additional frames.

The following is a simplified example of data transfer and credit replenishment. FIG. 6A illustrates data frame 605, having a short DCE header, which is sent from switch B to switch A. After packet 605 arrives at switch A, it is kept in memory space 608 of buffer 610. Because some of the memory of buffer 610 is consumed, there is a corresponding decrease in the credits available to switch B. Similarly, when data frame 615 (also having a short DCE header) is sent from switch A to switch B, data frame 615 consumes memory space 618 of buffer 620, and there is a corresponding reduction in the credits available to switch A.

However, after frames 605 and 615 have been forwarded, the corresponding memory space becomes available in the buffers of the sending switches. At some point, e.g., periodically or on demand, the fact that this buffer space is available again should be communicated to the device at the other end of the link. Data frames having a long DCE header and LMFs are used to replenish credits. If no credits are being replenished, the short DCE header may be used. Although some implementations use the longer DCE header for all transmissions, such implementations are less efficient because extra bandwidth is consumed by packets that contain no credit replenishment information.

FIG. 6B illustrates one example of a credit signaling method of the present invention. Conventional credit signaling scheme 650 advertises the new credits that the receiver wants to return. For example, at time t4 the receiver wants to return 5 credits, so the value 5 is carried in the frame. At time t5 the receiver has no credits to return, so the value 0 is carried in the frame. If the frame at time t4 is lost, five credits are lost.

DCE scheme 660 advertises the cumulative credit value. In other words, each advertisement adds the new credits to be returned to the total number of credits previously returned, modulo m (for 8 bits, m is 256). For example, at time t3 the total number of credits returned since link initialization is 3; at time t4, because 5 credits need to be returned, 5 is added to 3 and 8 is sent in the frame. At time t5 no credits need to be returned, so 8 is sent again. If the frame at time t4 is lost, no credits are lost, because the frame at time t5 contains the same information.
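
The loss tolerance of the cumulative scheme may be illustrated with the t3 to t5 example above. The receiver-side function follows the text; the sender-side function, which recovers the newly returned credits as a modular difference from the last value actually received, is an assumption made for this sketch, as are both function names.

```python
MODULUS = 256  # m for an 8-bit credit field


def advertise(total_returned, new_credits):
    """Receiver side: add newly returned credits to the running total, modulo m."""
    return (total_returned + new_credits) % MODULUS


def credits_gained(last_seen, advertised):
    """Sender side: credits returned since the last advertisement that arrived."""
    return (advertised - last_seen) % MODULUS


t3 = 3                  # total returned since link initialization
t4 = advertise(t3, 5)   # 8 is sent, but suppose this frame is lost
t5 = advertise(t4, 0)   # 8 is sent again
print(credits_gained(t3, t5))  # prints 5
```

Because the sender compares against the last value it actually received (3), the 5 credits carried by the lost t4 frame are still recovered at t5. The modular difference also handles counter wrap-around, e.g., from 250 to 4.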

According to one exemplary implementation of the invention, a receiving DCE switch maintains the following information (wherein VL indicates that the information is maintained per virtual lane):
· BufCrd[VL] - a modulus counter that is incremented by the number of credits that can be sent;
· BytesFromLastLongDCE - the number of bytes sent since the last long DCE header;
· BytesFromLastLMF - the number of bytes sent since the last LMF;
· MaxIntBetLongDCE - the maximum interval between sending long DCE headers;
· MaxIntBetLMF - the maximum interval between sending LMFs; and
· FrameRx - a modulus counter that is incremented by the FrameCredit field of each received frame.

A sending DCE switch port maintains the following information:
· LastBufCrd[VL] - the last estimated value of the receiver's BufCrd[VL] variable; and
· FrameCrd[VL] - a modulus counter that is incremented by the number of credits used to transmit a frame.

When a link comes up, the network devices at each end of the link negotiate the presence of a DCE header. If the header is not present, the network devices will, for example, simply enable the link for standard Ethernet operation. If the header is present, the network devices will enable the features of a DCE link according to some aspects of the invention.

FIG. 6C is a flow chart that illustrates how a DCE link is initialized according to some implementations of the invention. Those of skill in the art will appreciate that the steps of method 680 (like those of other methods described herein) need not be, and in some cases are not, performed in the order indicated. Moreover, some implementations of these methods include more or fewer steps than are indicated.

In step 661, the physical link comes up between two switch ports, and in step 663 a first packet is received. In step 665, it is determined (by the receiving port) whether the packet has a DCE header. If not, the link is enabled for standard Ethernet traffic. If the packet has a DCE header, the ports perform steps to configure the link as a DCE link. In step 671, the receiver and sender zero out all arrays relating to traffic on the link. In step 673, the value of MaxIntBetLongDCE is initialized to a configured value, and in step 675, MaxIntBetLMF is initialized to a configured value.

In step 677, the two DCE ports exchange the available credit information for each VL, preferably by sending an LMF. If a VL is not used, its available credit is announced as 0. In step 679, the link is enabled for DCE, and normal DCE traffic, including data frames, may be sent on the link according to the methods described herein.
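
The initialization flow of FIG. 6C may be sketched as a single function that returns the resulting link state. The function signature, the dictionary layout, and the `used_vls` argument are assumptions made for this sketch; the step numbers and variable names are those used in the text.

```python
def initialize_link(first_packet_has_dce_header, local_credits, used_vls,
                    max_int_bet_long_dce, max_int_bet_lmf):
    """Return the link state after the steps of method 680 (simplified)."""
    # Step 665: without a DCE header the link carries standard Ethernet only.
    if not first_packet_has_dce_header:
        return {"mode": "ethernet"}

    num_vls = len(local_credits)
    return {
        "mode": "dce",  # step 679
        # Step 671: zero out all arrays relating to traffic on the link.
        "BufCrd": [0] * num_vls,
        "FrameCrd": [0] * num_vls,
        # Steps 673 and 675: load the configured intervals.
        "MaxIntBetLongDCE": max_int_bet_long_dce,
        "MaxIntBetLMF": max_int_bet_lmf,
        # Step 677: credits to announce in the first LMF; unused VLs announce 0.
        "announced": [credit if vl in used_vls else 0
                      for vl, credit in enumerate(local_credits)],
    }
```

For example, with local credits of 4, 6, and 8 and only VLs 0 and 2 in use, the first LMF in this sketch would announce 4, 0, and 8.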

To work properly in the presence of a single frame loss, the DCE self-recovering mechanism of preferred implementations requires that the maximum number of credits advertised in a frame be less than 1/2 of the maximum advertisable value. In some implementations of a short DCE header, each credit field is 8 bits, i.e., 256 possible values. Thus, up to 127 additional credits can be advertised in a single frame. The maximum of 127 credits is reasonable, because the worst case is represented by a long sequence of minimum-size frames in one direction and a single jumbo frame in the opposite direction. During the transmission of a 9 KB jumbo frame, the maximum number of minimum-size frames is approximately 9220 B / 84 B = 110 credits (assuming a 9200-byte maximum transmission unit and 20 bytes of IPG and preamble).
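
The worst-case arithmetic above may be checked directly. The constant and function names are assumptions made for this sketch; the 84-byte figure is read here as a 64-byte minimum Ethernet frame plus the same 20 bytes of IPG and preamble.

```python
MTU_BYTES = 9200          # maximum transmission unit assumed in the text
IPG_AND_PREAMBLE = 20     # inter-packet gap plus preamble, in bytes
MIN_FRAME_ON_WIRE = 64 + IPG_AND_PREAMBLE  # 84 bytes per minimum-size frame

CREDIT_FIELD_BITS = 8
# Fewer than half of the 256 advertisable values.
MAX_ADVERTISED = 2 ** CREDIT_FIELD_BITS // 2 - 1


def min_frames_during_jumbo():
    """Minimum-size frames that fit in the time taken to send one jumbo frame."""
    return round((MTU_BYTES + IPG_AND_PREAMBLE) / MIN_FRAME_ON_WIRE)


print(min_frames_during_jumbo(), MAX_ADVERTISED)  # prints 110 127
```

The worst-case count of about 110 credits stays below the 127-credit limit, which is why an 8-bit field suffices in this example.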

If multiple consecutive frames are lost, an LMF recovery method can "heal" the link. One such LMF recovery method is based on the observation that, in some implementations, the internal counters maintained by the ports of DCE switches are 16 bits, but only the lower 8 bits are transmitted in the long DCE header in order to conserve bandwidth. This works well if there are no consecutive frame losses, as explained before. When the link experiences multiple consecutive errors, the long DCE header may no longer be able to synchronize the counters, but this is achieved through LMFs, which contain the full 16 bits of all the counters. The 8 additional bits allow recovery from 256 times more errors, i.e., a total of 512 consecutive errors. Preferably, before this situation is encountered, the link is declared inoperative and reset.

To implement a low latency Ethernet system, at least 3 general types of traffic must be considered: IP network traffic, storage traffic, and cluster traffic. As described in detail above, LLE provides "no-drop" VLs with FC-like characteristics that are suitable for, e.g., storage traffic. A "no-drop" VL does not lose packets/frames and may be provided according to a simple stack such as that shown in FIG. 8. Only a small "shim" of FC over LLE 810 lies between LLE layer 805 and FC layer 2 (815). Layers 815, 820, and 825 are the same as those of FC stack 750. Therefore, storage applications that previously ran over FC can now run over LLE.

The mapping of FC frames to FC over Ethernet frames according to one exemplary implementation of FC over LLE layer 810 will now be described with reference to FIGS. 9A, 9B, and 10. FIG. 9A is a simplified version of an FC frame. FC frame 900 includes SOF 905 and EOF 910, which are ordered sets of symbols used not only to delimit the boundaries of frame 900 but also to convey information such as the class of the frame, whether the frame is the start or the end of a sequence (a group of FC frames), and whether the frame is normal or abnormal. At least some of these symbols are illegal "code violation" symbols. FC frame 900 also includes 24-bit source FC ID field 915, 24-bit destination FC ID field 920, and payload 925.

One goal of the present invention is to convey storage information contained in an FC frame, such as FC frame 900, over Ethernet. FIG. 10 illustrates one implementation of the invention for an LLE that can convey such storage traffic. Network 1000 includes LLE cloud 1005, to which devices 1010, 1015, and 1020 are attached. LLE cloud 1005 includes a plurality of LLE switches 1030, exemplary architecture for which is discussed elsewhere herein. Devices 1010, 1015, and 1020 may be host devices, servers, switches, etc. Storage gateway 1050 connects LLE cloud 1005 with storage devices 1075. For the purposes of moving storage traffic, network 1000 may be configured to function as an FC network. Accordingly, the ports of devices 1010, 1015, and 1020 each have their own FC ID, and the ports of storage devices 1075 have FC IDs.

To move storage traffic, including frame 900, efficiently between devices 1010, 1015, and 1020 and storage devices 1075, some preferred implementations of the invention map information from fields of FC frame 900 to corresponding fields of LLE packet 950. LLE packet 950 includes SOF 955, organization ID field 965 and device ID field 970 of the destination MAC field, organization ID field 975 and device ID field 980 of the source MAC field, protocol type field 985, field 990, and payload 995.

Preferably, fields 965, 970, 975, and 980 are all 24-bit fields, in conformance with normal Ethernet protocol. Accordingly, in some implementations of the invention, the contents of the destination FC ID field of FC frame 900 are mapped to one of fields 965 or 970, preferably to field 970. Similarly, the contents of the source FC ID field of FC frame 900 are mapped to one of fields 975 or 980, preferably to field 980. It is preferable to map the contents of destination FC ID field 915 and source FC ID field 920 of FC frame 900 to fields 970 and 980, respectively, of LLE packet 950 because, by convention, the IEEE assigns many device codes for a single organization code. This mapping function may be performed, for example, by storage gateway 1050.

Therefore, the mapping of FC frames to LLE packets may be accomplished in part by purchasing from the IEEE an Organizationally Unique Identifier ("OUI") code that corresponds to a group of device codes. In one such example, the current assignee, Cisco Systems, pays the registration fee for an OUI and assigns the OUI to "FC over Ethernet." A storage gateway configured according to this aspect of the invention (e.g., storage gateway 1050) places the OUI in fields 965 and 975, copies the 24-bit contents of destination FC ID field 915 to 24-bit field 970, and copies the 24-bit contents of source FC ID field 920 to 24-bit field 980. The storage gateway inserts a code indicating FC over Ethernet in protocol type field 985 and copies the contents of payload 925 to payload field 995.
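The address mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the gateway's actual implementation: the OUI value is a made-up placeholder rather than a registered code, and the function names are hypothetical.

```python
# Hypothetical placeholder OUI; a real deployment would use a registered value.
FC_OVER_ETHERNET_OUI = 0x0A0B0C


def fc_id_to_mac(fc_id: int, oui: int = FC_OVER_ETHERNET_OUI) -> bytes:
    """Build a 48-bit address: 24-bit OUI (fields 965/975) + 24-bit FC ID (fields 970/980)."""
    if not 0 <= fc_id < (1 << 24):
        raise ValueError("FC ID must fit in 24 bits")
    if not 0 <= oui < (1 << 24):
        raise ValueError("OUI must fit in 24 bits")
    return ((oui << 24) | fc_id).to_bytes(6, "big")


def mac_to_fc_id(mac: bytes) -> int:
    """Recover the FC ID from the low 24 bits (the device ID part) of the address."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return int.from_bytes(mac[3:], "big")
```

Because the FC ID is copied unchanged into the device ID half of the address, the reverse mapping is a simple slice and needs no lookup table.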

Because of the foregoing mapping, no MAC addresses need to be explicitly assigned on the storage network. Nonetheless, as a result of the mapping, an algorithmically derived version of the destination and source FC IDs is encoded in the portions of the LLE frame that would be assigned to the destination and source MAC addresses in a conventional Ethernet packet. Storage traffic may be routed on the LLE network by using the contents of these fields as if they were MAC address fields.

SOF field 905 and EOF field 910 contain ordered sets of symbols, some of which (e.g., those used to indicate the start and end of an FC frame) are reserved symbols that are sometimes referred to as "illegal" or "code violation" symbols. If one of these symbols were copied to a field within LLE packet 950 (for example, to field 990), the symbol would cause an error, e.g., by indicating that LLE packet 950 should terminate at that symbol. However, the information conveyed by these symbols must be retained, because it indicates the class of the FC frame, whether the frame is the start or the end of a sequence, and other important information.

Accordingly, preferred implementations of the invention provide another mapping function that converts illegal symbols to legal symbols. These legal symbols may then be inserted in an interior portion of LLE packet 950. In one such implementation, the converted symbols are placed in field 990. Field 990 does not need to be large; in some implementations, it is only 1 or 2 bytes long.

To allow the implementation of cut-through switching, field 990 may be split into two separate fields. For example, one field may be at the beginning of the frame and the other at the other end of the frame.

The foregoing method is only one example of various techniques for encapsulating an FC frame inside an extended Ethernet frame. Alternative methods include any convenient mapping, for example, one that derives the tuple {VLAN, DST MAC Addr, Src MAC Addr} from the tuple {VSAN, D_ID, S_ID}.

The mapping and symbol conversion processes described above produce an LLE packet, such as LLE packet 950, that allows storage traffic to and from FC-based storage devices 1075 to be forwarded across LLE cloud 1005 to end node devices 1010, 1015 and 1020. The mapping and symbol conversion processes can be run, e.g., by storage gateway 1050, on a frame-by-frame basis.

Accordingly, the present invention provides exemplary methods for encapsulating an FC frame inside an extended Ethernet frame at the ingress edge of an FC-Ethernet cloud. Analogous methods of the invention provide for the inverse process, which is performed at the egress edge of an Ethernet-FC cloud. An FC frame may be decapsulated from an extended Ethernet frame and then transmitted on an FC network.

Some such methods include these steps: receiving an Ethernet frame (encapsulated, for example, as described herein); mapping destination contents of a first portion of a destination MAC field of the Ethernet frame to a destination FC ID field of an FC frame; mapping source contents of a second portion of a source MAC field of the Ethernet frame to a source FC ID field of the FC frame; converting legal symbols of the Ethernet frame to illegal symbols; inserting the illegal symbols into selected fields of the FC frame; mapping payload contents of a payload field of the Ethernet frame to an FC frame payload field; and transmitting the FC frame on the FC network.
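The decapsulation steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the class names, the two-byte layout of the delimiter field and the one-byte delimiter codes are all hypothetical, and the sketch restores delimiter names rather than emitting the reserved line symbols themselves.

```python
from dataclasses import dataclass

# Hypothetical one-byte legal codes standing in for FC delimiter ordered sets.
SOF_BY_CODE = {0x01: "SOFi3", 0x02: "SOFn3"}
EOF_BY_CODE = {0x01: "EOFn", 0x02: "EOFt"}


@dataclass
class EthernetFrame:
    dst_mac: bytes      # OUI (3 bytes) + destination FC ID (3 bytes)
    src_mac: bytes      # OUI (3 bytes) + source FC ID (3 bytes)
    delimiters: bytes   # converted SOF and EOF codes (field 990 in the text)
    payload: bytes


@dataclass
class FcFrame:
    sof: str
    d_id: int
    s_id: int
    payload: bytes
    eof: str


def decapsulate(frame: EthernetFrame) -> FcFrame:
    """Rebuild an FC frame from an encapsulating Ethernet frame, one frame at a time."""
    sof_code, eof_code = frame.delimiters  # exactly two code bytes are assumed
    return FcFrame(
        sof=SOF_BY_CODE[sof_code],                      # legal code -> FC delimiter
        d_id=int.from_bytes(frame.dst_mac[3:], "big"),  # low 24 bits of destination MAC
        s_id=int.from_bytes(frame.src_mac[3:], "big"),  # low 24 bits of source MAC
        payload=frame.payload,                          # payload copied unchanged
        eof=EOF_BY_CODE[eof_code],
    )
```

The function reads only the frame it is given, which matches the point that no per-frame state has to be kept.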

No state information about the frames needs to be retained. Accordingly, the frames can be processed quickly, for example at a rate of 40 Gb/s. The end nodes can run storage applications based on SCSI, because the storage applications see SCSI layer 825 of LLE stack 800, depicted in FIG. 8. Instead of forwarding storage traffic across switches dedicated to FC traffic, such as FC switches 140 and 145 shown in FIG. 1, such FC switches can be replaced by LLE switches 1030.

Moreover, the functionality of LLE switches allows a great deal of management flexibility. Referring to FIG. 11, in one management scheme, each of the LLE switches 1130 of LLE cloud 1105 may be treated as a separate FC switch. Alternatively, some or all of LLE switches 1130 may be aggregated and treated, for management purposes, as FC switches. For example, virtual FC switch 1140 is formed by treating all LLE switches in LLE cloud 1105 as a single FC switch for network management purposes. All of the ports of the individual LLE switches 1130 would then be treated, for example, as ports of virtual FC switch 1140. Alternatively, smaller numbers of LLE switches 1130 may be aggregated. For example, 3 LLE switches are aggregated to form virtual FC switch 1160 and 4 LLE switches are aggregated to form virtual FC switch 1165. A network manager may decide how many switches to aggregate by considering, among other things, how many ports the individual LLE switches have. The control plane functions of FC, such as zoning, DNS, FSPF and other functions, may be implemented by treating each LLE switch as an FC switch or by aggregating multiple LLE switches into one virtual FC switch.

Moreover, the same LLE cloud 1105 may support numerous virtual networks. Virtual local area networks ("VLANs") are known in the art for providing virtual Ethernet-based networks. U.S. Patent No. 5,742,604, entitled "Interswitch Link Mechanism for Connecting High-Performance Network Switches," describes relevant systems and is hereby incorporated by reference. Various patent applications of the present assignee, including U.S. Patent Application No. 10/034,160, entitled "Methods And Apparatus For Encapsulating A Frame For Transmission In A Storage Area Network" and filed on December 26, 2001, provide methods and devices for implementing virtual storage area networks ("VSANs") for FC-based networks. That application is hereby incorporated by reference in its entirety. Because LLE networks can support both Ethernet traffic and FC traffic, some implementations of the invention provide for the formation of virtual networks on the same physical LLE cloud for both FC and Ethernet traffic.

FIG. 12 is a schematic diagram that illustrates a simplified architecture of DCE switch 1200 according to one embodiment of the invention. DCE switch 1200 includes N line cards, each of which is characterized by an ingress side (or input) 1205 and an egress side (or output) 1225. Line card ingress sides 1205 are connected to line card egress sides 1225 via switching fabric 1250, which includes a crossbar in this example.

In this implementation, buffering is performed on both the input and output sides. Other architectures are possible, e.g., those having input buffers, output buffers and shared memory. Accordingly, each of input line cards 1205 includes at least one buffer 1210 and each of output line cards 1225 includes at least one buffer 1230, which may be any convenient type of buffer known in the art, e.g., an external DRAM-based buffer or an on-chip SRAM-based buffer. Buffers 1210 are used for input buffering, e.g., to temporarily hold packets while awaiting sufficient buffer space at the output line card to store the packets to be sent across switching fabric 1250. Buffers 1230 are used for output buffering, e.g., to temporarily hold packets received from one or more of the input line cards 1205 while awaiting sufficient credits for the packets to be transmitted to another DCE switch.

It is worth noting that while credits may be used both inside and outside a switch, there is not necessarily a one-to-one mapping between internal and external credits. Moreover, PAUSE frames may be used either internally or externally. For example, any of the four possible combinations PAUSE-PAUSE, PAUSE-CREDITS, CREDITS-PAUSE and CREDITS-CREDITS may produce a viable solution.

DCE switch 1200 includes some form of credit mechanism for exerting flow control. This flow control mechanism can exert back pressure on buffers 1210 when an output queue of one of buffers 1230 has reached its maximum capacity. For example, before sending a frame from input queue 1215 to output queue 1235, one of the input line cards 1205 may request a credit from arbiter 1240 (which may be, e.g., a separate chip at a central location or a set of chips distributed across the output line cards). Preferably, the request indicates the size of the frame, e.g., according to the frame credit field of the DCE header. Arbiter 1240 will determine whether output queue 1235 can accept the frame (i.e., whether output buffer 1230 has enough space to accommodate the frame). If so, the credit request will be granted and arbiter 1240 will send a credit grant to input queue 1215. However, if output queue 1235 is too full, the request will be denied and no credits will be sent to input queue 1215.
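The request and grant exchange with the arbiter can be sketched as below. This is a simplified model under stated assumptions: the class and method names are hypothetical, credits are counted in bytes, and the `release` step (returning space once a frame leaves the output queue) is an assumption added so the model is complete.

```python
class Arbiter:
    """Toy model of an arbiter that grants credits only when the output queue has room."""

    def __init__(self, free_bytes_per_output_queue: dict):
        # Remaining space, in bytes, for each output queue (keyed by any identifier).
        self.free = dict(free_bytes_per_output_queue)

    def request_credit(self, output_queue, frame_size: int) -> bool:
        """Grant the request (True) if the frame fits; otherwise deny it (False)."""
        if frame_size <= self.free[output_queue]:
            self.free[output_queue] -= frame_size  # reserve space for the frame
            return True
        return False  # no credit is sent, so the frame stays in the input queue

    def release(self, output_queue, frame_size: int) -> None:
        """Return space once the frame has left the output queue (assumed step)."""
        self.free[output_queue] += frame_size
```

A denied request is what produces back pressure: the frame remains in the input buffer until a later request succeeds.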

As discussed elsewhere herein, DCE switch 1200 needs to be able to support the "drop," "no-drop" and intermediate behavior required for virtual lanes. The "no-drop" functionality is enabled, in part, by applying internally to the DCE switch some type of credit mechanism like the one described above. Externally, the "no-drop" functionality can be implemented in accordance with the buffer-to-buffer credit mechanism described earlier or with PAUSE frames. For example, if one of input line cards 1205 is experiencing back pressure from one or more output line cards 1225 through the internal credit mechanism, the line card can propagate that back pressure externally in an upstream direction via a buffer-to-buffer credit system like that of FC.

Preferably, the same chip (e.g., the same ASIC) that provides the "no-drop" and intermediate functionality will also provide "drop" functionality like that of a classical Ethernet switch. Although these tasks could be apportioned among different chips, providing drop, no-drop and intermediate functionality on the same chip allows DCE switches to be provided at a substantially lower cost.

Each DCE packet will contain information, e.g., in the DCE header described elsewhere herein, indicating the virtual lane to which the DCE packet belongs. DCE switch 1200 will handle each DCE packet according to whether the VL to which the DCE packet is assigned is a drop VL or a no-drop VL.

FIG. 13 illustrates an example of partitioning a buffer for VLs. In this example, 4 VLs are assigned. VL 1305 and VL 1310 are drop VLs. VL 1315 and VL 1320 are no-drop VLs. In this example, input buffer 1300 has specific areas assigned for each VL: VL 1305 is assigned to buffer space 1325, VL 1310 is assigned to buffer space 1330, VL 1315 is assigned to buffer space 1335 and VL 1320 is assigned to buffer space 1340. Traffic on VL 1305 and VL 1310 is managed much like normal Ethernet traffic, in part according to the operations of buffer spaces 1325 and 1330. Similarly, the no-drop feature of VLs 1315 and 1320 is implemented, in part, according to a buffer-to-buffer credit flow control scheme that is enabled only for buffer spaces 1335 and 1340.

In some implementations, the amount of buffer space assigned to a VL can be dynamically assigned according to criteria such as buffer occupancy, time of day, traffic loads/congestion, guaranteed minimum bandwidth allocation, known tasks requiring greater bandwidth, maximum bandwidth allocation, etc. Preferably, principles of fairness will apply to prevent one VL from obtaining an inordinate amount of buffer space.

Within each buffer space, data are organized in data structures that are logical queues (virtual output queues or "VOQs") associated with destinations. ("A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches," by Adisak Mekkittikul and Nick McKeown, Computer Systems Laboratory, Stanford University (InfoCom 1998), and the references cited therein describe relevant methods for implementing VOQs and are hereby incorporated by reference.) The destinations are preferably destination port/virtual lane pairs. Using a VOQ scheme avoids head-of-line blocking at the input line card caused when an output port is blocked and/or when another virtual lane of the destination output port is blocked.
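A minimal sketch of virtual output queues keyed by destination port/virtual lane pairs is shown below. The class and method names are hypothetical, and the sketch ignores buffer space limits; it only illustrates why a blocked destination does not hold up packets bound for other destinations.

```python
from collections import defaultdict, deque


class InputBuffer:
    """Toy input buffer holding one FIFO per (destination port, virtual lane) pair."""

    def __init__(self):
        self.voqs = defaultdict(deque)

    def enqueue(self, dest_port: int, vl: int, packet) -> None:
        self.voqs[(dest_port, vl)].append(packet)

    def dequeue(self, dest_port: int, vl: int):
        """Pop the oldest packet for one destination, or return None if its VOQ is empty."""
        queue = self.voqs.get((dest_port, vl))
        return queue.popleft() if queue else None
```

With a single shared FIFO, a packet for a blocked port at the head of the queue would stall everything behind it; here each destination is drained independently.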

In some implementations, VOQs are not shared between VLs. In other implementations, a VOQ can be shared between drop VLs or between no-drop VLs. However, a VOQ should not be shared between no-drop VLs and drop VLs. In some embodiments, a VOQ is associated with a single buffer. In other embodiments, however, a VOQ may be implemented by more than one buffer.

The buffers of DCE switches can implement various types of active queue management. Some preferred embodiments of DCE switch buffers provide at least 4 basic types of active queue management: flow control; dropping for drop VLs or marking for no-drop VLs for congestion avoidance purposes; dropping to avoid deadlocks in no-drop VLs; and dropping for latency control.

Preferably, flow control for a DCE network has at least two basic manifestations. One flow control manifestation is buffer-to-buffer, credit-based flow control that is used primarily to implement the "no-drop" VLs. Another flow control manifestation of some preferred implementations involves explicit upstream congestion notification. This explicit upstream congestion notification may be implemented, for example, by the explicit congestion notification ("ECN") field of the DCE header, as described elsewhere herein.

FIG. 14 illustrates DCE network 1405, including edge DCE switches 1410, 1415, 1425 and 1430 and core DCE switch 1420. In this instance, buffer 1450 of core DCE switch 1420 is implementing 3 types of flow control. One is buffer-to-buffer flow control indication 1451, which is communicated by the granting (or not) of buffer-to-buffer credits between buffer 1450 and buffer 1460 of edge DCE switch 1410.

Buffer 1450 is also transmitting 2 ECNs, 1451 and 1452, both of which are implemented via the ECN field of the DCE headers of DCE packets. ECN 1451 may be considered a core-to-edge notification, because it is sent by core device 1420 and received by buffer 1460 of edge DCE switch 1410. ECN 1452 may be considered a core-to-end notification, because it is sent by core device 1420 and received by NIC card 1465 of end node 1440.

In some implementations of the invention, ECNs are generated by sampling packets that are stored into a buffer subject to congestion. The ECN is sent to the source of the sampled packet by setting its destination address equal to the source address of the sampled packet. The edge device will know whether the source supports DCE ECN, as end node 1440 does, or does not, as in the case of end node 1435. In the latter case, edge device 1410 will terminate the ECN and implement the appropriate action.
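The sampling scheme for generating ECNs can be sketched as follows. The text does not say how packets are sampled, so sampling every Nth packet stored under congestion is an assumption made here for determinism; the class names, address strings and the `sample_every` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src: str
    dst: str


@dataclass
class Ecn:
    src: str
    dst: str


class EcnSampler:
    """Emit an ECN toward the source of every Nth packet stored while congested."""

    def __init__(self, switch_addr: str, sample_every: int):
        self.switch_addr = switch_addr
        self.sample_every = sample_every  # assumed sampling period
        self.count = 0

    def on_store(self, packet: Packet, congested: bool) -> Optional[Ecn]:
        if not congested:
            return None
        self.count += 1
        if self.count % self.sample_every != 0:
            return None
        # The ECN's destination is the sampled packet's source address.
        return Ecn(src=self.switch_addr, dst=packet.src)
```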

Active queue management (AQM) will be performed in response to various criteria, including but not limited to buffer occupancy (e.g., per VL), queue length per VOQ and the age of a packet in a VOQ. For the sake of simplicity, this discussion of AQM will generally assume that a VOQ is not shared between VLs.

Some examples of AQM according to the present invention will now be described with reference to FIG. 15. FIG. 15 depicts buffer usage at a particular moment. At that moment, part 1505 of physical buffer 1500 has been allocated to a drop VL and part 1510 has been allocated to a no-drop VL. As noted elsewhere herein, the amount of buffer 1500 that is allocated to drop VLs or no-drop VLs can change over time. Of the part 1505 allocated to a drop VL, part 1520 is currently in use and part 1515 is not currently in use.

Within parts 1505 and 1510, there are numerous VOQs, including VOQs 1525, 1530 and 1535. In this example, a threshold VOQ length L has been established. VOQs 1525 and 1535 have lengths greater than L, and VOQ 1530 has a length less than L. A long VOQ indicates downstream congestion. Active queue management preferably prevents any VOQ from becoming too large, because otherwise downstream congestion affecting one VOQ would adversely affect traffic for other destinations.

The age of a packet in a VOQ is another criterion used for AQM. In preferred implementations, a packet is time-stamped when it comes into a buffer and is queued into the proper VOQ. Accordingly, packet 1540 receives time stamp 1545 upon its arrival in buffer 1500 and is placed in a VOQ according to its destination and VL designation. As noted elsewhere, the VL designation will indicate whether to apply drop or no-drop behavior. In this example, the header of packet 1540 indicates that packet 1540 is being transmitted on a drop VL and has a destination corresponding to that of VOQ 1525, so packet 1540 is placed in VOQ 1525.

By comparing the time of time stamp 1545 with the current time, the age of packet 1540 may be determined at subsequent times. In this context, "age" refers only to the time that the packet has spent in the switch, not the time spent in some other part of the network. Nonetheless, conditions in other parts of the network may be inferred from the age of a packet. For example, if the age of a packet becomes relatively large, this condition indicates that the path toward the destination of the packet is subject to congestion.

In preferred implementations, a packet having an age that exceeds a predetermined age will be dropped. Multiple drops are possible if, at the time of age determination, a number of packets in a VOQ are found to exceed the predetermined age threshold.

In some preferred implementations, there are separate age limits for latency control (TL) and for avoiding deadlocks (TD). The action to be taken when a packet reaches TL preferably depends on whether the packet is being transmitted on a drop VL or a no-drop VL. For traffic on a no-drop lane, data integrity is more important than latency. Therefore, in some implementations of the invention, when the age of a packet in a no-drop VL exceeds TL, the packet is not dropped but another action is taken. For example, in some such implementations, the packet may be marked and/or an upstream congestion notification may be triggered. For packets in a drop VL, latency control is relatively more important, so a more aggressive action is appropriate when the age of a packet exceeds TL. For example, a probabilistic drop function may be applied to the packet.

Graph 1600 of FIG. 16 provides some examples of probabilistic drop functions. According to drop functions 1605, 1610 and 1615, once the age of a packet exceeds TCO, the latency cut-off threshold, the probability that the packet will be intentionally dropped increases from 0% to 100% as its age increases up to TL, depending on the function. Drop function 1620 is a step function, with a 0% probability of intentional dropping until TL is reached. All of drop functions 1605, 1610, 1615 and 1620 reach a 100% chance of intentional drop when the age of the packet reaches TL. Although TCO, TL and TD may be any convenient times, in some implementations of the invention TCO is on the order of tens of microseconds, TL is on the order of a few to tens of milliseconds and TD is on the order of hundreds of milliseconds, e.g., 500 milliseconds.
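Two of the drop-function shapes described above can be sketched as follows. The exact curves of functions 1605, 1610 and 1615 are not given in the text, so the linear ramp is an assumption; the function names and the injectable random source are hypothetical. Ages and thresholds may be in any consistent time unit.

```python
import random


def ramp_drop_probability(age: float, t_co: float, t_l: float) -> float:
    """Assumed linear ramp: 0 up to TCO, rising to 1 at TL and staying there."""
    if age <= t_co:
        return 0.0
    if age >= t_l:
        return 1.0
    return (age - t_co) / (t_l - t_co)


def step_drop_probability(age: float, t_l: float) -> float:
    """Step function in the manner of function 1620: 0 until TL, then 1."""
    return 1.0 if age >= t_l else 0.0


def should_drop(age: float, t_co: float, t_l: float, rng=random.random) -> bool:
    """Drop with the ramp probability; rng is injectable so the choice can be tested."""
    return rng() < ramp_drop_probability(age, t_co, t_l)
```

For example, with TCO = 2 and TL = 10 in the same units, a packet of age 6 sits halfway up the ramp and would be dropped about half the time.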

If the age of a packet in a drop or no-drop VL exceeds TD, the packet will be dropped. In preferred implementations, TD is larger for no-drop VLs than for drop VLs. In some implementations, TL and/or TD may also depend, in part, on the bandwidth of the VL on which the packet is being transmitted and on the number of VOQs simultaneously transmitting packets to that VL.
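The way the two age limits interact with the VL type can be summarized in a small decision function. This is a sketch under stated assumptions: the enum, its member names and the function name are hypothetical, and each VL is assumed to carry its own TL and TD values supplied by the caller.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK_OR_NOTIFY = "mark packet and/or send upstream congestion notification"
    PROBABILISTIC_DROP = "apply probabilistic drop function"
    DROP = "drop"


def age_action(age: float, is_no_drop_vl: bool, t_l: float, t_d: float) -> Action:
    """Choose an AQM action from packet age, VL type and that VL's TL and TD limits."""
    if age > t_d:
        return Action.DROP  # deadlock avoidance applies to both VL types
    if age > t_l:
        # Past the latency limit: no-drop VLs favor data integrity over latency.
        return Action.MARK_OR_NOTIFY if is_no_drop_vl else Action.PROBABILISTIC_DROP
    return Action.FORWARD
```

Passing a larger TD for no-drop VLs than for drop VLs reproduces the preference stated above.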

For no-drop VLs, probability functions similar to those shown in FIG. 16 may be used to trigger upstream congestion notifications or to set the Congestion Experienced (CE) bit in the headers of TCP packets belonging to connections capable of supporting TCP ECN.

In some implementations, whether a packet is dropped, an upstream congestion notification is sent, or the CE bit of a TCP packet is marked depends not only on the age of the packet but also on the length of the VOQ in which the packet is placed. If that length is above a threshold Lmax, the AQM action is taken; otherwise, the AQM action will be performed on the first packet dequeued from a VOQ whose length exceeds the Lmax threshold.

Use of Buffer Occupancy Per VL

As shown in FIG. 15, a buffer is apportioned to VLs. For parts of the buffer apportioned to drop VLs (such as part 1505 of buffer 1500), a packet will be dropped if the occupancy of a VL, at any given time, is greater than a predetermined maximum value. In some implementations, an average occupancy of a VL is computed and maintained. An AQM action may be taken based on such average occupancy. For example, for part 1510, which is associated with a no-drop VL, DCE ECNs will be triggered instead of the packet drops that would occur for part 1505, which is associated with a drop VL.

FIG. 17 depicts graph 1700 of VL occupancy B(VL) (the vertical axis) over time (the horizontal axis). Here, BT is a threshold value of B(VL). In some implementations of the invention, some packets in a VL will be dropped at times when it is determined that B(VL) has reached BT. The actual value of B(VL) over time is shown by curve 1750, but B(VL) is only determined at times t1 through tN. In this example, packets would be dropped at points 1705, 1710 and 1715, which correspond to times t2, t3 and t6. The packets may be dropped according to their age (e.g., oldest first), their size, the QoS of the virtual network of the packets, randomly, according to a drop function, or otherwise.

In addition (or alternatively), an active queue management action may be taken when an average value, a weighted average value, etc., of B(VL) reaches or exceeds BT. Such averages may be computed according to various methods, for example, by summing the determined values of B(VL) and dividing by the number of determinations. Some implementations apply a weighting function, e.g., by giving more weight to more recent samples. Any type of weighting function known in the art may be applied.
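The two averaging approaches can be sketched as below. The plain mean follows the description directly; the exponentially weighted moving average is only one possible weighting function, chosen here as an assumption because it favors recent samples. The function names and the `alpha` parameter are hypothetical.

```python
def mean_occupancy(samples):
    """Plain average: sum of the determined B(VL) values divided by their count."""
    return sum(samples) / len(samples)


def weighted_occupancy(samples, alpha=0.5):
    """Exponentially weighted average (assumed weighting); larger alpha favors recent samples."""
    average = samples[0]
    for sample in samples[1:]:
        average = alpha * sample + (1 - alpha) * average
    return average


def aqm_triggered(average, b_t):
    """An AQM action is taken when the average reaches or exceeds the threshold BT."""
    return average >= b_t
```

The weighted form needs only the previous average and the newest sample, so it can be maintained per VL without storing the sample history.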

The active queue management action taken may be, for example, sending an ECN and/or applying a probabilistic drop function, e.g., one similar to those illustrated in FIG. 18. In this example, the horizontal axis of graph 1800 is the average value of B(VL). When the average value is below first value 1805, there is a 0% chance of intentionally dropping the packet. When the average value reaches or exceeds second value 1810, there is a 100% chance of intentionally dropping the packet. Any convenient function may be applied to the intervening values, whether a function similar to 1815, 1820 or 1825 or another function.

Referring again to FIG. 15, it is apparent that VOQs 1525 and 1535 have lengths exceeding predetermined length L. In some implementations of the invention, this condition triggers an active queue management response, e.g., the sending of one or more ECNs. Preferably, packets contained in buffer 1500 will indicate whether the source is capable of responding to an ECN. If the sender of a packet cannot respond to an ECN, this condition may trigger a probabilistic drop function or simply a drop. VOQ 1535 is longer not only than predetermined length L1 but also than predetermined length L2. According to some implementations of the invention, this condition triggers the dropping of a packet. Some implementations of the invention use average VOQ lengths as criteria for triggering active queue management responses, but this is not preferred because of the large amount of computation it requires.
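The two-threshold response to VOQ length can be sketched as a small decision function. This is an illustration only: the function name and the returned labels are hypothetical, and the choice of a probabilistic drop (rather than a plain drop) for sources that cannot respond to ECNs is one of the two options the text allows.

```python
def voq_length_action(length: int, l1: int, l2: int, source_supports_ecn: bool) -> str:
    """Pick a response from VOQ length and the two thresholds L1 < L2."""
    if length > l2:
        return "drop"  # queue is far too long: drop a packet
    if length > l1:
        # Queue is long: notify the source if it can react, otherwise fall back to drops.
        return "send_ecn" if source_supports_ecn else "probabilistic_drop"
    return "none"
```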

It is desirable to have multiple criteria for triggering AQM actions. For example, while it is useful to respond to VOQ length, such a measure may not be sufficient for DCE switches having approximately 1 to 2 MB of buffer space per port. For a given buffer, there may be thousands of active VOQs. However, there may be enough storage space for only on the order of 10^3 packets, or possibly fewer. Therefore, it may be the case that no individual VOQ has enough packets to trigger any AQM response, yet a VL is running out of space.
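The situation described above, where no single VOQ is long enough to trigger a response but the VL as a whole is nearly full, can be illustrated with a small sketch. The function name `aqm_criteria`, the threshold values and the choice to count occupancy in packets are assumptions made for illustration only.

```python
def aqm_criteria(voq_lengths, voq_threshold, vl_threshold):
    """Evaluate two independent AQM criteria for one VL.

    voq_lengths: number of packets queued in each active VOQ of the VL.
    Returns which criteria fire: per-VOQ length and per-VL occupancy.
    """
    vl_occupancy = sum(voq_lengths)
    return {
        "voq_length": any(length > voq_threshold for length in voq_lengths),
        "vl_occupancy": vl_occupancy >= vl_threshold,
    }
```

For example, 500 active VOQs holding 2 packets each never exceed a per-VOQ threshold of 8, yet together they hold 1000 packets, so a per-VL threshold of 900 fires while the per-VOQ criterion does not.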

Queue Management for No-Drop VLs

In preferred implementations of the invention, the main difference between active queue management for drop VLs and for no-drop VLs is that the criterion (or criteria) that would trigger a packet drop for a drop VL will instead cause a DCE ECN to be sent or a TCP CE bit to be marked for a no-drop VL. For example, a condition that would trigger probabilistic packet drops for a drop VL would generally result in probabilistic ECNs to an upstream edge device or an end (host) device. Credit-based schemes are not based on where a packet is going, but on where it is coming from. Therefore, upstream congestion notifications help to provide fairness of buffer use and to avoid the deadlock that might arise if credit-based flow control were the sole method of flow control for no-drop VLs.

For example, when per-VL buffer occupancy is used as a criterion, packets are preferably not dropped merely because the per-VL buffer occupancy has reached or exceeded a threshold value. Instead, for example, a packet would be marked or an ECN would be sent. Similarly, some type of average per-VL buffer occupancy may still be computed and a probabilistic function applied, but the underlying action to be taken would be marking and/or sending an ECN. The packet would not be dropped.
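The difference between drop and no-drop VLs for the per-VL occupancy criterion can be sketched as below. The function name `congestion_action` and the string labels are hypothetical; the sketch covers only this one criterion and deliberately ignores the blocking, deadlock and latency cases in which packets on a no-drop VL may still be dropped.

```python
def congestion_action(vl_is_no_drop, vl_occupancy, threshold):
    """Action for a packet when per-VL buffer occupancy is the criterion."""
    if vl_occupancy < threshold:
        return "forward"
    # The same condition fires for both VL types; only the action differs.
    if vl_is_no_drop:
        return "mark_or_send_ecn"  # the packet itself is not dropped
    return "drop"
```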

However, even for a no-drop VL, packets will still be dropped in response to blocking or deadlock conditions, for example as indicated by the age of a packet exceeding a threshold, as described elsewhere herein. Some implementations of the invention also allow packets of a no-drop VL to be dropped in response to latency conditions. This would depend on the degree of importance placed on latency for that particular no-drop VL. Some such implementations apply a probabilistic dropping algorithm. For example, some cluster applications may place a higher value on latency considerations than storage applications do. Data integrity is still important to cluster applications, but it may be advantageous to reduce latency by forgoing some degree of data integrity. In some implementations, larger values of TL (i.e., the latency control threshold) may be used for no-drop lanes than the corresponding values used for drop lanes.

FIG. 19 illustrates an example of a network device that may be configured to implement some methods of the present invention. Network device 1960 includes a master central processing unit (CPU) 1962, interfaces 1968, and a bus 1967 (e.g., a PCI bus). Generally, interfaces 1968 include ports 1969 appropriate for communication with the appropriate media. In some embodiments, one or more of interfaces 1968 includes at least one independent processor 1974 and, in some instances, volatile RAM. Independent processors 1974 may be, for example, ASICs or any other appropriate processors. According to some such embodiments, these independent processors 1974 perform at least some of the functions of the logic described herein. In some embodiments, one or more of interfaces 1968 control such communications-intensive tasks as media control and management. By providing separate processors for the communications-intensive tasks, interfaces 1968 allow the master microprocessor 1962 to efficiently perform other functions such as routing computations, network diagnostics and security functions.

The interfaces 1968 are typically provided as interface cards (sometimes referred to as "line cards"). Generally, interfaces 1968 control the sending and receiving of data packets over the network and sometimes support other peripherals used with the network device 1960. Among the interfaces that may be provided are Fibre Channel ("FC") interfaces, Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided, such as Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces, ASI interfaces, DHEI interfaces, and the like.

When acting under the control of appropriate software or firmware, in some implementations of the invention CPU 1962 may be responsible for implementing specific functions associated with the functions of a desired network device. According to some embodiments, CPU 1962 accomplishes all these functions under the control of software including an operating system (e.g., Linux, VxWorks, etc.) and any appropriate applications software.

CPU 1962 may include one or more processors 1963, such as a processor from the Motorola family of microprocessors or the MIPS family of microprocessors. In an alternative embodiment, processor 1963 is specially designed hardware for controlling the operations of network device 1960. In a specific embodiment, a memory 1961 (such as non-volatile RAM and/or ROM) also forms part of CPU 1962. However, there are many different ways in which memory could be coupled to the system. Memory block 1961 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, etc.

Regardless of the network device's configuration, it may employ one or more memories or memory modules (such as, for example, memory block 1965) configured to store data, program instructions for the general-purpose network operations and/or other information relating to the functionality of the techniques described herein. The program instructions may control the operation of an operating system and/or one or more applications, for example.

Because such information and program instructions may be employed to implement the systems/methods described herein, the present invention relates to machine-readable media that include program instructions, state information, etc. for performing various operations described herein. Examples of machine-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The invention may also be embodied in a carrier wave traveling over an appropriate medium such as radio waves, optical lines, electric lines, etc. Examples of program instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.

Although the system shown in FIG. 19 illustrates one specific network device of the present invention, it is by no means the only network device architecture on which the present invention can be implemented. For example, an architecture having a single processor that handles communications as well as routing computations, etc. is often used. Further, other types of interfaces and media could also be used with the network device. The communication path between interfaces/line cards may be bus based (as shown in FIG. 19) or switch fabric based (such as a crossbar switch).

While the invention has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention. For example, some implementations of the invention allow a VL to change from being a drop VL to a no-drop VL. Thus, the examples described herein are not intended to be limiting of the present invention. It is therefore intended that the appended claims be interpreted to include all variations, equivalents, changes and modifications that fall within the true spirit and scope of the present invention.

Claims (15)

1. A method for transforming Fibre Channel ("FC") frames for transport on an Ethernet, the method comprising: receiving an FC frame; mapping destination contents of a destination FC ID field of the FC frame to a first portion of a destination MAC field of an Ethernet frame; mapping source contents of a source FC ID field of the FC frame to a second portion of a source MAC field of the Ethernet frame; converting illegal symbols of the FC frame to legal symbols; inserting the legal symbols into selected fields of the Ethernet frame; mapping payload contents of an FC frame payload to a payload field of the Ethernet frame; and transmitting the Ethernet frame on the Ethernet.
2. The method of claim 1, wherein the first portion is a device ID field of the destination MAC field.
3. The method of claim 1, wherein the second portion is a device ID field of the source MAC field.
4. The method of claim 1, wherein the illegal symbols are symbols in an SOF field and an EOF field of the FC frame.
5. The method of claim 1, wherein the inserting step comprises inserting the legal symbols into at least one inner field of the Ethernet frame.
6. The method of claim 1, further comprising: assigning an Organization Unique Identifier ("OUI") code to FC frames prepared for transport on the Ethernet; and inserting the OUI code into organization ID fields of the source MAC field and the destination MAC field of the Ethernet frame.
7. A network device, comprising: a plurality of Fibre Channel ("FC") ports configured for communication with an FC network; a plurality of Ethernet ports configured for communication with an Ethernet; and at least one logic device configured to perform the following steps: receiving an FC frame from one of the plurality of FC ports; mapping destination contents of a destination FC ID field of the FC frame to a first portion of a destination MAC field of an Ethernet frame; mapping source contents of a source FC ID field of the FC frame to a second portion of a source MAC field of the Ethernet frame; converting illegal symbols of the FC frame to legal symbols; inserting the legal symbols into selected fields of the Ethernet frame; mapping payload contents of an FC frame payload to a payload field of the Ethernet frame; and forwarding the Ethernet frame to one of the plurality of Ethernet ports for transmission on the Ethernet.
8. The network device of claim 7, wherein the first portion is a device ID field of the destination MAC field.
9. The network device of claim 7, wherein the second portion is a device ID field of the source MAC field.
10. The network device of claim 7, wherein the illegal symbols are symbols in an SOF field and an EOF field of the FC frame.
11. The network device of claim 7, wherein the at least one logic device is configured to insert the legal symbols into at least one inner field of the Ethernet frame.
12. The network device of claim 7, wherein the at least one logic device is further configured to: assign an Organization Unique Identifier ("OUI") code to FC frames prepared for transport on the Ethernet; and insert the OUI code into organization ID fields of the source MAC field and the destination MAC field of the Ethernet frame.
13. The network device of claim 7, wherein the network device comprises a storage gateway.
14. An apparatus for transforming Fibre Channel ("FC") frames for transport on an Ethernet, the apparatus comprising: means for receiving an FC frame; means for mapping destination contents of a destination FC ID field of the FC frame to a first portion of a destination MAC field of an Ethernet frame; means for mapping source contents of a source FC ID field of the FC frame to a second portion of a source MAC field of the Ethernet frame; means for converting illegal symbols of the FC frame to legal symbols; means for inserting the legal symbols into selected fields of the Ethernet frame; means for mapping payload contents of an FC frame payload to a payload field of the Ethernet frame; and means for transmitting the Ethernet frame on the Ethernet.
15. A method for transforming Ethernet frames for transport on a Fibre Channel ("FC") network, the method comprising: receiving an extended Ethernet frame; mapping destination contents of a first portion of a destination MAC field of the Ethernet frame to a destination FC ID field of an FC frame; mapping source contents of a second portion of a source MAC field of the Ethernet frame to a source FC ID field of the FC frame; converting legal symbols of the Ethernet frame to illegal symbols; inserting the illegal symbols into selected fields of the FC frame; mapping payload contents of a payload field of the Ethernet frame to an FC frame payload field; and transmitting the FC frame on the FC network.
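
As an illustration only, and not part of the claims, the address mapping recited in claims 1 to 3, 6 and 15 can be sketched as follows. The sketch assumes a 48-bit MAC address made up of a 24-bit organization ID (OUI) followed by a 24-bit device ID, and a 24-bit FC ID; the function names, the big-endian byte order and the example OUI value are hypothetical.

```python
def fc_id_to_mac(fc_id: int, oui: int) -> bytes:
    """Build a 6-byte MAC: OUI in the organization ID field, FC ID in the device ID field."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("FC ID must fit in 24 bits")
    if not 0 <= oui <= 0xFFFFFF:
        raise ValueError("OUI must fit in 24 bits")
    return oui.to_bytes(3, "big") + fc_id.to_bytes(3, "big")


def mac_to_fc_id(mac: bytes) -> int:
    """Reverse direction: recover the FC ID from the device ID field of a MAC."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return int.from_bytes(mac[3:], "big")


# Hypothetical OUI value, used only for this example.
EXAMPLE_OUI = 0x123456
mac = fc_id_to_mac(0x010203, EXAMPLE_OUI)
```

The same helper would be applied once to the destination FC ID and once to the source FC ID; conversion of the SOF and EOF symbols and the payload mapping are not shown.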
CN 200580035946 2004-10-22 2005-10-17 Fibre channel over ethernet CN100555969C (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US62139604P 2004-10-22 2004-10-22
US60/621,396 2004-10-22
US11/078,992 2005-03-10

Publications (2)

Publication Number Publication Date
CN101044717A CN101044717A (en) 2007-09-26
CN100555969C CN100555969C (en) 2009-10-28



Family Applications (4)

Application Number Title Priority Date Filing Date
CN 200580034646 CN101040489B (en) 2004-10-22 2005-10-13 Network device architecture for consolidating input/output and reducing latency
CN 200580034647 CN101040471B (en) 2004-10-22 2005-10-14 Ethernet extension for the data center
CN 200580035946 CN100555969C (en) 2004-10-22 2005-10-17 Fibre channel over ethernet
CN 200580034955 CN101129027B (en) 2004-10-22 2005-10-18 Forwarding table reduction and multipath network forwarding

Country Status (1)

Country Link
CN (4) CN101040489B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009086774A1 (en) * 2007-12-11 2009-07-16 Huawei Technologies Co., Ltd. A method, an apparatus and a system for transmitting data
WO2011047612A1 (en) * 2009-10-19 2011-04-28 杭州华三通信技术有限公司 Method for virtual link discovery control and system for fibre channel over ethernet protocol
CN102474464A (en) * 2009-08-04 2012-05-23 国际商业机器公司 Apparatus, system, and method for establishing point to point connections in FCOE
CN101656721B (en) 2009-08-27 2012-08-08 杭州华三通信技术有限公司 Method for controlling virtual link discovering and Ethernet bearing fiber channel protocol system
CN103597790A (en) * 2011-06-02 2014-02-19 国际商业机器公司 Fibre channel forwarder fabric login sequence
CN104767606A (en) * 2015-03-19 2015-07-08 华为技术有限公司 Data synchronization device and method

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7821939B2 (en) 2007-09-26 2010-10-26 International Business Machines Corporation Method, system, and computer program product for adaptive congestion control on virtual lanes for data center ethernet architecture
EP2489172A1 (en) 2010-05-28 2012-08-22 Huawei Technologies Co., Ltd. Virtual layer 2 and mechanism to make it scalable
US9014054B2 (en) 2010-06-29 2015-04-21 Futurewei Technologies, Inc. Layer two over multiple sites
CN102377661A (en) * 2010-08-24 2012-03-14 鸿富锦精密工业(深圳)有限公司 Blade server and method for building shortest blade transmission path in blade server
CN102347955A (en) * 2011-11-01 2012-02-08 杭州依赛通信有限公司 Reliable data transmission protocol based on virtual channels
US20140153443A1 (en) * 2012-11-30 2014-06-05 International Business Machines Corporation Per-Address Spanning Tree Networks
US9160678B2 (en) 2013-04-15 2015-10-13 International Business Machines Corporation Flow control credits for priority in lossless ethernet
US9703743B2 (en) * 2014-03-31 2017-07-11 Juniper Networks, Inc. PCIe-based host network accelerators (HNAS) for data center overlay network
US9479457B2 (en) 2014-03-31 2016-10-25 Juniper Networks, Inc. High-performance, scalable and drop-free data center switch fabric
CN104301229B (en) * 2014-09-26 2016-05-04 深圳市腾讯计算机系统有限公司 Data packet forwarding method, route table generating method and device
US10243840B2 (en) 2017-03-01 2019-03-26 Juniper Networks, Inc. Network interface card switching for virtual networks

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5920566A (en) * 1997-06-30 1999-07-06 Sun Microsystems, Inc. Routing in a multi-layer distributed network element
US5974467A (en) * 1997-08-29 1999-10-26 Extreme Networks Protocol for communicating data between packet forwarding devices via an intermediate network interconnect device
KR100309748B1 (en) 1997-12-26 2001-09-11 윤종용 Bidirectional trunk amplifier for cable hybrid fiber coaxial network by using upstream signals and cable modem of hybrid fiber coaxial network
US6684031B1 (en) 1998-06-18 2004-01-27 Lucent Technologies Inc. Ethernet fiber access communications system
US6556541B1 (en) * 1999-01-11 2003-04-29 Hewlett-Packard Development Company, L.P. MAC address learning and propagation in load balancing switch protocols
US20030195983A1 (en) 1999-05-24 2003-10-16 Krause Michael R. Network congestion management using aggressive timers
CN1104800C (en) * 1999-10-27 2003-04-02 华为技术有限公司 Dual-table controlled data frame forwarding method
EP1370947A4 (en) 2001-02-13 2009-05-27 Candera Inc Silicon-based storage virtualization server
US20020141427A1 (en) 2001-03-29 2002-10-03 Mcalpine Gary L. Method and apparatus for a traffic optimizing multi-stage switch fabric network
US7093024B2 (en) 2001-09-27 2006-08-15 International Business Machines Corporation End node partitioning using virtualization
US20030169690A1 (en) 2002-03-05 2003-09-11 James A. Mott System and method for separating communication traffic
US7349416B2 (en) 2002-11-26 2008-03-25 Cisco Technology, Inc. Apparatus and method for distributing buffer status information in a switching fabric
US7295565B2 (en) 2002-12-24 2007-11-13 Sun Microsystems, Inc. System and method for sharing a resource among multiple queues
US7782784B2 (en) 2003-01-10 2010-08-24 Cisco Technology, Inc. Port analyzer adapter

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009086774A1 (en) * 2007-12-11 2009-07-16 Huawei Technologies Co., Ltd. A method, an apparatus and a system for transmitting data
CN102474464A (en) * 2009-08-04 2012-05-23 国际商业机器公司 Apparatus, system, and method for establishing point to point connections in FCOE
CN102474464B (en) 2009-08-04 2014-07-16 国际商业机器公司 Apparatus, system, and method for establishing point to point connections in FCOE
CN101656721B (en) 2009-08-27 2012-08-08 杭州华三通信技术有限公司 Method for controlling virtual link discovering and Ethernet bearing fiber channel protocol system
WO2011047612A1 (en) * 2009-10-19 2011-04-28 杭州华三通信技术有限公司 Method for virtual link discovery control and system for fibre channel over ethernet protocol
US8929368B2 (en) 2009-10-19 2015-01-06 Hangzhou H3C Technologies Co., Ltd. Control method of virtual link discovery and system for fibre channel over ethernet protocol
CN103597790A (en) * 2011-06-02 2014-02-19 国际商业机器公司 Fibre channel forwarder fabric login sequence
CN103597790B (en) * 2011-06-02 2016-06-08 国际商业机器公司 Methods, systems and devices in fibre channel media environment
CN104767606A (en) * 2015-03-19 2015-07-08 华为技术有限公司 Data synchronization device and method
CN104767606B (en) * 2015-03-19 2018-10-19 华为技术有限公司 Data synchronization unit and method

Also Published As

Publication number Publication date
CN101040471B (en) 2012-01-11
CN101040471A (en) 2007-09-19
CN101040489B (en) 2012-12-05
CN100555969C (en) 2009-10-28
CN101129027A (en) 2008-02-20
CN101129027B (en) 2011-09-14
CN101040489A (en) 2007-09-19

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted