WO2003088594A1 - A method for providing redundancy for channel adapter failure - Google Patents
A method for providing redundancy for channel adapter failure
- Publication number
- WO2003088594A1 (application PCT/EP2003/003530)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel adapter
- ports
- control information
- providing
- host channel
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/22—Alternate routing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/28—Routing or path finding of packets in data switching networks using route fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/58—Association of routers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/35—Switches specially adapted for specific applications
- H04L49/356—Switches specially adapted for specific applications for storage area networks
- H04L49/358—Infiniband Switches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/55—Prevention, detection or correction of errors
- H04L49/557—Error correction, e.g. fault recovery or fault tolerance
Definitions
- the present invention relates generally to digital network communication, and specifically to providing improved reliability of a computer system or any other node attached to an InfiniBand subnet or fabric.
- I/O interconnect architectures in which computing hosts and peripherals are linked by a switching network, commonly referred to as a switching fabric.
- IB InfiniBand
- the IB architecture is described in detail in the InfiniBand Architecture Specification, Release 1.0.a, which is available from the InfiniBand Trade Association at www.infinibandta.org and is incorporated herein by reference.
- HCAs Host Channel Adapters
- TCAs Target Channel Adapters
- the HCAs tend to be located near the servers' CPUs and memory, while the TCAs tend to be located near the systems' disk storage and other peripherals.
- Switches or Routers may be located between HCAs and TCAs, directing data packets to the correct TCA destination based on information that is contained in the data packets themselves.
- The interconnect between HCAs and TCAs is either an InfiniBand point-to-point link or a switch or router; a switch or router makes it possible to create a uniform InfiniBand subnet or fabric environment, respectively.
- One of the key points of this switch is that it allows packets of information (or data) to be managed based on variables such as the service level (SL) and a destination identifier (DLID/DGID).
- SL service level
- DLID/DGID destination identifier
- the InfiniBand architecture is developed with a serial, switched fabric approach. This switched nature allows for the low-latency, high-bandwidth characteristics of the InfiniBand architecture. Clustered systems and networks require a connectivity standard that allows for fault-tolerant interconnects.
- InfiniBand architecture which incorporates advanced fault detection and correction mechanisms.
- IBM PCI-X to InfiniBand Host Channel Adapter which allows connectivity between a host's PCI-X bus and an InfiniBand network.
- the dual InfiniBand ports provide the capability to support Automatic Path Migration and single or multiple subnet connections with a single HCA device.
- APM Automatic Path Migration
- APM provides a redundancy mechanism in case of a port failure of an HCA or TCA, or of a link, switch, or router failure in a subnet or fabric.
- InfiniBand defines a redundancy mechanism only for the case in which one or more ports of an HCA fail, not for the case in which the entire HCA fails.
- Summary of the invention
- the invention provides a redundancy mechanism for a Channel Adapter (CA), such as a Host Channel Adapter (HCA) or a Target Channel Adapter (TCA), in case of a complete Channel Adapter failure. It is a particular advantage of the invention that the redundancy mechanism fits seamlessly into the InfiniBand architecture and relies on the fault detection and correction methods which are specified in the InfiniBand architecture.
- CA Channel Adapter
- At least two physical Host Channel Adapters are provided.
- the two physical Host Channel Adapters are registered as one logical Host Channel Adapter in terms of the InfiniBand architecture.
- Both Host Channel Adapters have dedicated caching means which cooperate with the system memory for storing Queue Pair (QP) control information in the form of Queue Pair Control Blocks (QPCBs).
- QP Queue Pair
- QPCBs Queue Pair Control Blocks
- write-through caches are utilized.
- the QPCBs stored in system memory are an exact copy of the contents of the dedicated caches of each physical Host Channel Adapter.
- write-back caches are used for Host Channel Adapters.
- the system memory is synchronized with the caches at certain times and does not always reflect the actual contents of the caches at any given point in time.
- This copy may contain stale data.
- the fault detection and correction mechanisms provided by the InfiniBand architecture are utilized.
- CAs Channel Adapters
- Figure 1 shows a block diagram illustrating the operation of a single Host Channel Adapter with a dedicated cache memory
- Figure 2 shows a block diagram of a computer system having a redundant logical Host Channel Adapter for the case of a write-through cache
- Figure 3 shows the block diagram of figure 2 after the replacement of the failing Host Channel Adapter by the redundancy mechanism
- Figure 4 illustrates the discrepancy which can occur between the state of a cache and system memory for a write-back cache
- Figures 5 to 7 illustrate the use of the fault detection and correction methods provided by the InfiniBand architecture for implementing the redundancy mechanism of the invention when write-back caches are used.
- Figure 1 shows a computer system having a Host Channel Adapter 1 comprising a cache 2 and a cache directory 3. Further the computer system has system memory 4.
- By means of cache directory 3 and cache 2, the address space for the Queue Pair Control Blocks (QPCBs) is virtualized.
- All Queue Pair Control Blocks reside in system memory 4. They are loaded into the Host Channel Adapter cache 2 when used and unloaded when no longer used. A failure of the Host Channel Adapter 1 does not prevent a physically different Host Channel Adapter from accessing this data.
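The load/unload behaviour described above can be sketched roughly as follows. This is an illustrative model only, not the patent's implementation: the class names `SystemMemory` and `AdapterCache`, the `next_psn` field, and the dictionary layout are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SystemMemory:
    """Hypothetical home location of every QPCB, keyed by Queue Pair number."""
    qpcbs: dict = field(default_factory=dict)


@dataclass
class AdapterCache:
    """Hypothetical model of one adapter's cache together with its cache directory."""
    memory: SystemMemory
    lines: dict = field(default_factory=dict)  # directory: QP number -> cached QPCB

    def load(self, qp_number):
        # Bring the QPCB into the cache the first time it is used.
        if qp_number not in self.lines:
            self.lines[qp_number] = dict(self.memory.qpcbs[qp_number])
        return self.lines[qp_number]

    def unload(self, qp_number):
        # Return the QPCB to system memory and free the cache entry.
        self.memory.qpcbs[qp_number] = self.lines.pop(qp_number)


memory = SystemMemory({2: {"next_psn": 100}})

hca_a = AdapterCache(memory)
hca_a.load(2)["next_psn"] = 101   # work on the cached copy
hca_a.unload(2)                   # no longer used: back to system memory

# A physically different adapter reaches the same QPCB through system memory.
hca_b = AdapterCache(memory)
qpcb_seen_by_b = hca_b.load(2)
```

Because system memory is the home location of each QPCB, the second adapter needs nothing from the first one in order to pick up the Queue Pair state.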
- Figure 2 shows a block diagram of a preferred embodiment of the invention which illustrates the redundancy mechanism. Like elements of the computer system of figure 2 and the computer system of figure 1 are designated by the same reference numerals.
- the computer system has a physical Host Channel Adapter 1 with one or more ports 6 and a physical Host Channel Adapter 7 with one or more ports 8.
- the ports 6 and 8 are connected to an InfiniBand subnet or fabric 9.
- the two physical Host Channel Adapters 1 and 7 are recognized as one single Host Channel Adapter according to the InfiniBand architecture. Thereby a logical Host Channel Adapter 10 is constituted.
- the logical Host Channel Adapter 10 has the ports 6 and 8 of the physical Host Channel Adapters 1 and 7.
- the physical Host Channel Adapter 1 has the cache 2 and the physical Host Channel Adapter 7 has the cache 11. Both caches 2 and 11 are organized as write-through caches.
- the computer system has system memory 4 for storage of Queue Pair control block data for the physical Host Channel Adapters 1 and 7.
- the Queue Pair numbers of the different physical Host Channel Adapters 1 and 7 are disjoint.
- There is no further restriction on the Queue Pair numbers.
- the physical Host Channel Adapter 1 has a block 12 of Queue Pair Control Blocks QPCB_2 to QPCB_m, and the physical Host Channel Adapter 7 has a block 13 of Queue Pair Control Blocks QPCB_m+1 to QPCB_n.
- QPCB_0 and QPCB_1 are used for subnet management purposes and are not further considered here.
- the QPCB data in the system memory 4 is identical to the data in the caches 2 and 11.
- Figure 3 illustrates the redundancy mechanism for dealing with a complete failure of the physical Host Channel Adapter 1 of figure 2.
- a QPCB in block 12 is copied into the cache 11 as needed.
- Because block 12 contains an exact copy of the contents of cache 2, no further recovery mechanisms are required.
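The write-through failover of figures 2 and 3 might be modelled as in the sketch below. All names (`PhysicalHCA`, `LogicalHCA`, `fail_over`, the `next_psn` field) are hypothetical, and the Queue Pair numbers are arbitrary; the only properties taken from the text are the disjoint Queue Pair numbers, the write-through caches, and the on-demand copy into the surviving adapter's cache.

```python
class PhysicalHCA:
    """Hypothetical physical adapter with a write-through QPCB cache."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory   # shared system memory: QP number -> QPCB
        self.cache = {}

    def update(self, qp, **fields):
        # Load the QPCB on first use, then write through to system memory.
        entry = self.cache.setdefault(qp, dict(self.memory[qp]))
        entry.update(fields)
        self.memory[qp] = dict(entry)


class LogicalHCA:
    """Hypothetical logical adapter that presents both physical adapters as one."""

    def __init__(self, owners):
        self.owners = owners   # QP number -> responsible PhysicalHCA (disjoint sets)

    def fail_over(self, failed, survivor):
        # Hand every Queue Pair of the failed adapter to the survivor.
        for qp, owner in self.owners.items():
            if owner is failed:
                self.owners[qp] = survivor

    def update(self, qp, **fields):
        self.owners[qp].update(qp, **fields)


memory = {2: {"next_psn": 0}, 3: {"next_psn": 0}, 4: {"next_psn": 0}}
hca1 = PhysicalHCA("HCA 1", memory)
hca7 = PhysicalHCA("HCA 7", memory)
logical = LogicalHCA({2: hca1, 3: hca1, 4: hca7})

logical.update(2, next_psn=41)    # handled by HCA 1, written through
logical.fail_over(hca1, hca7)     # complete failure of HCA 1
logical.update(2, next_psn=42)    # HCA 7 loads QPCB 2 from memory on demand
```

Only the QPCBs that are actually touched after the failure are copied into the surviving cache; QPCB 3 stays in system memory until it is needed.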
- Figure 4 illustrates the situation in the case of write-back caches. If a write-back cache 14 is used rather than a write-through cache, the QPCBs stored in system memory 4 do not always reflect the up-to-date state of the QPCB data in cache 14. For this reason, when a write-back cache is used, an additional fault detection and correction method of the InfiniBand architecture needs to be invoked.
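A small sketch of why the system-memory copy can lag behind a write-back cache. The class `WriteBackCache` and its `sync` method are assumptions made for illustration; the text does not say when or how synchronization happens.

```python
class WriteBackCache:
    """Hypothetical write-back cache: memory is updated only on sync()."""

    def __init__(self, memory):
        self.memory = memory   # system memory: QP number -> packet sequence number
        self.lines = {}
        self.dirty = set()

    def write(self, qp, psn):
        # Update the cache only and remember that the line is dirty.
        self.lines[qp] = psn
        self.dirty.add(qp)

    def sync(self):
        # Flush all dirty lines to system memory.
        for qp in self.dirty:
            self.memory[qp] = self.lines[qp]
        self.dirty.clear()


memory = {2: 10}
cache = WriteBackCache(memory)
cache.write(2, 11)
cache.write(2, 12)

stale_before_sync = memory[2]   # what a surviving adapter would find after a failure
cache.sync()
fresh_after_sync = memory[2]
```

If the adapter fails between two synchronization points, the surviving adapter starts from the stale value, which is the situation the following scenarios deal with.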
- Figure 5 illustrates the situation before failover of one of the physical Host Channel Adapters.
- PSNs packet sequence numbers
- sequence 16 of outstanding PSNs is stored in the local cache memory, which is a write-back cache.
- This sequence 16 represents the up-to-date sequence of transmitted packets.
- sequence number Sn is up-to-date in this sequence
- At the receiver's side there is a sequence 17 of PSNs.
- the next packet expected by the receiver is the packet with the sequence number Rn.
- After failover of one of the physical Host Channel Adapters, the sequence 15 remains unaffected as it is stored in system memory 4.
- a copy of the sequence 15 is provided to the remaining, still operating physical Host Channel Adapter. In this way the sequence 16 in the cache of the failing Host Channel Adapter is replaced by the sequence 15 in the cache of the remaining, still operating physical Host Channel Adapter.
- the receiver returns an acknowledgement (ACK) to the sending Host Channel Adapter and discards the packet.
- ACK acknowledgement
- the Host Channel Adapter sends the next packet identified in the sequence 15. In this way the sequence 15 is processed until it reaches the original state of the sequence 16 before failover. After this state is reached, system operation continues normally.
- the receiver sends an acknowledgement for having received the packet with the sequence number Sn to the logical Host Channel Adapter.
- the logical Host Channel Adapter, i.e. the remaining, still operating physical Host Channel Adapter, interprets this acknowledgement as a ghost acknowledgement and ignores it.
- the sender sends the packet with the sequence number Sm of the sequence 15 as in the scenario shown in figure 5.
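The sender-side recovery described for figures 5 and 6 can be sketched as below. This is a simplified model under stated assumptions: the `Sender` and `Receiver` classes, the `replay` helper and the payload dictionary are hypothetical, PSN wrap-around is ignored, and the receiver behaviour (acknowledge and discard anything older than the expected PSN) is reduced to a single comparison.

```python
class Receiver:
    """Hypothetical receiver that tracks the next expected PSN (Rn in the text)."""

    def __init__(self, expected_psn):
        self.expected_psn = expected_psn
        self.delivered = []

    def receive(self, psn, payload):
        if psn < self.expected_psn:
            return ("ACK", psn)               # duplicate: acknowledge and discard
        if psn == self.expected_psn:
            self.delivered.append(payload)
            self.expected_psn += 1
            return ("ACK", psn)
        return ("NAK", self.expected_psn)     # gap: ask for the expected PSN


class Sender:
    """Hypothetical surviving adapter restarted from the stale PSN (Sm in the text)."""

    def __init__(self, next_psn):
        self.next_psn = next_psn
        self.ignored_acks = []

    def on_ack(self, psn):
        # An ACK for a packet this adapter has no record of sending is a ghost ACK.
        if psn >= self.next_psn:
            self.ignored_acks.append(psn)


def replay(sender, receiver, payloads, last_psn):
    """Resend from the stale PSN until the pre-failover state is reached again."""
    while sender.next_psn <= last_psn:
        psn = sender.next_psn
        sender.next_psn += 1
        kind, acked = receiver.receive(psn, payloads[psn])
        if kind == "ACK":
            sender.on_ack(acked)


payloads = {3: "c", 4: "d", 5: "e"}
receiver = Receiver(expected_psn=5)   # packets 3 and 4 arrived before the failure
sender = Sender(next_psn=3)           # system memory still says 3 is next

sender.on_ack(4)                      # late ACK for a packet sent by the failed adapter
replay(sender, receiver, payloads, last_psn=5)
```

Packets 3 and 4 are acknowledged and discarded as duplicates, packet 5 is delivered exactly once, and the late acknowledgement for packet 4 is ignored.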
- Figure 7 shows a scenario where the Host Channel Adapter acts as a receiver.
- a sequence 18 of PSNs is stored in the system memory and an up-to-date sequence 19 in cache memory. Further there is a sequence 20 of outstanding PSNs to be sent by the sender. This is the situation before failover.
- sequence 19 is replaced by the sequence 18, i.e. a copy of the sequence 18 is provided from system memory to the cache of the remaining, still operating physical Host Channel Adapter that is part of the logical Host Channel Adapter.
- sequence 20 remains unchanged.
- NAK negative acknowledgement
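For the receiver-side scenario of figure 7, the excerpt names only the sequences involved and the NAK. One plausible reading, stated here purely as an assumption, is that the receiver restarts from a stale expected PSN, answers the sender's newer packet with a NAK, and the sender then goes back to the PSN the receiver asked for. The sketch further assumes that the sender still holds the older packets and that PSNs do not wrap; the function names are hypothetical.

```python
def receive(expected_psn, psn):
    """Return (response, new_expected_psn) for one incoming packet."""
    if psn == expected_psn:
        return ("ACK", psn), expected_psn + 1
    if psn < expected_psn:
        return ("ACK", psn), expected_psn        # duplicate: acknowledge only
    return ("NAK", expected_psn), expected_psn   # gap: request the expected PSN


def send_with_retransmit(first_psn, last_psn, expected_psn):
    """Send first_psn..last_psn, going back whenever the receiver returns a NAK."""
    log = []
    psn = first_psn
    while psn <= last_psn:
        response, expected_psn = receive(expected_psn, psn)
        log.append((psn, response[0]))
        if response[0] == "NAK":
            psn = response[1]   # resume at the PSN the receiver asked for
        else:
            psn += 1
    return log, expected_psn


# The receiver's system-memory copy says 7 is expected; the sender continues with 9.
log, final_expected = send_with_retransmit(first_psn=9, last_psn=10, expected_psn=7)
```

In this model a single NAK is enough to bring sender and receiver back into step, after which transfer continues in order.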
- Reference numerals: HCA 1; ports 6; physical Host Channel Adapter 2 7; InfiniBand fabric 9; logical Host Channel Adapter 10
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003585378A JP2005527898A (ja) | 2002-04-18 | 2003-04-04 | Method for providing redundancy against channel adapter failure |
AU2003226784A AU2003226784A1 (en) | 2002-04-18 | 2003-04-04 | A method for providing redundancy for channel adapter failure |
KR10-2004-7014653A KR20050002865A (ko) | 2002-04-18 | 2003-04-04 | Method and computer system for providing redundancy for InfiniBand channel adapter failure |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02008692.2 | 2002-04-18 | ||
EP02008692 | 2002-04-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003088594A1 | 2003-10-23 |
Family
ID=29225590
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2003/003530 WO2003088594A1 (fr) | A method for providing redundancy for channel adapter failure | 2003-04-04 |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP2005527898A (fr) |
KR (1) | KR20050002865A (fr) |
CN (1) | CN1647466A (fr) |
AU (1) | AU2003226784A1 (fr) |
WO (1) | WO2003088594A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100403248C (zh) * | 2005-06-07 | 2008-07-16 | 富士通株式会社 | Library apparatus |
CN102566944A (zh) * | 2011-12-31 | 2012-07-11 | 曙光信息产业股份有限公司 | Storage path redundancy method |
CN107451092A (zh) * | 2017-08-09 | 2017-12-08 | 郑州云海信息技术有限公司 | A data transmission system based on an IB network |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7756012B2 (en) * | 2007-05-18 | 2010-07-13 | Nvidia Corporation | Intelligent failover in a load-balanced network environment |
CN101510142B (zh) * | 2008-02-15 | 2011-12-21 | 环旭电子股份有限公司 | Multi-input/output interface system and communication method for storage devices |
US10051054B2 (en) | 2013-03-15 | 2018-08-14 | Oracle International Corporation | System and method for efficient virtualization in lossless interconnection networks |
US9990221B2 (en) * | 2013-03-15 | 2018-06-05 | Oracle International Corporation | System and method for providing an infiniband SR-IOV vSwitch architecture for a high performance cloud computing environment |
CN103312564B (zh) * | 2013-06-24 | 2016-07-06 | 曙光信息产业(北京)有限公司 | InfiniBand network detection method |
US10397105B2 (en) | 2014-03-26 | 2019-08-27 | Oracle International Corporation | System and method for scalable multi-homed routing for vSwitch based HCA virtualization |
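CN107547260B (zh) * | 2017-07-24 | 2020-12-22 | 杭州沃趣科技股份有限公司 | A method for detection, switchover and repair of long-distance InfiniBand links |
CN107592361B (zh) * | 2017-09-20 | 2020-05-29 | 郑州云海信息技术有限公司 | A data transmission method, apparatus and device based on dual IB networks |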
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5835696A (en) * | 1995-11-22 | 1998-11-10 | Lucent Technologies Inc. | Data router backup feature |
US5963540A (en) * | 1997-12-19 | 1999-10-05 | Holontech Corporation | Router pooling in a network flowswitch |
US6195705B1 (en) * | 1998-06-30 | 2001-02-27 | Cisco Technology, Inc. | Mobile IP mobility agent standby protocol |
US6295276B1 (en) * | 1999-12-31 | 2001-09-25 | Ragula Systems | Combining routers to increase concurrency and redundancy in external network access |
EP1158725A2 (fr) * | 2000-05-24 | 2001-11-28 | Alcatel Internetworking (PE), Inc. | Method and apparatus for multi-protocol redundant router protocol support |
-
2003
- 2003-04-04 WO PCT/EP2003/003530 patent/WO2003088594A1/fr not_active Application Discontinuation
- 2003-04-04 JP JP2003585378A patent/JP2005527898A/ja not_active Withdrawn
- 2003-04-04 KR KR10-2004-7014653A patent/KR20050002865A/ko not_active Application Discontinuation
- 2003-04-04 CN CNA03808578XA patent/CN1647466A/zh active Pending
- 2003-04-04 AU AU2003226784A patent/AU2003226784A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
KNIGHT S ET AL: "Virtual Router Redundancy Protocol", IETF - RFC, April 1998 (1998-04-01), XP002135272, Retrieved from the Internet <URL:ftp://ftp.isi.edu/in-notes/rfc2338.txt> [retrieved on 20000410] * |
Also Published As
Publication number | Publication date |
---|---|
CN1647466A (zh) | 2005-07-27 |
JP2005527898A (ja) | 2005-09-15 |
KR20050002865A (ko) | 2005-01-10 |
AU2003226784A1 (en) | 2003-10-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6545981B1 (en) | System and method for implementing error detection and recovery in a system area network | |
EP1543422B1 (fr) | Remote direct memory access enabled network interface controller switchover and switchback support | |
US6493343B1 (en) | System and method for implementing multi-pathing data transfers in a system area network | |
US6925578B2 (en) | Fault-tolerant switch architecture | |
US7145837B2 (en) | Global recovery for time of day synchronization | |
US7668923B2 (en) | Master-slave adapter | |
CA2483197C (fr) | System, method, and product for managing data transfers in a network | |
US6970972B2 (en) | High-availability disk control device and failure processing method thereof and high-availability disk subsystem | |
US6760859B1 (en) | Fault tolerant local area network connectivity | |
US7509419B2 (en) | Method for providing remote access redirect capability in a channel adapter of a system area network | |
US7844730B2 (en) | Computer system and method of communication between modules within computer system | |
US20050081080A1 (en) | Error recovery for data processing systems transferring message packets through communications adapters | |
US20050091383A1 (en) | Efficient zero copy transfer of messages between nodes in a data processing system | |
JP2004032224A (ja) | Server takeover system and method | |
WO2000072421A1 (fr) | Reliable multicast | |
US20050080869A1 (en) | Transferring message packets from a first node to a plurality of nodes in broadcast fashion via direct memory to memory transfer | |
WO2003105417A1 (fr) | Methods and devices for error correction of data packet streams | |
US20070230469A1 (en) | Transmission apparatus | |
US20050080920A1 (en) | Interpartition control facility for processing commands that effectuate direct memory to memory information transfer | |
WO2003088594A1 (fr) | A method for providing redundancy for channel adapter failure | |
US20050080945A1 (en) | Transferring message packets from data continued in disparate areas of source memory via preloading | |
US20050078708A1 (en) | Formatting packet headers in a communications adapter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: 1020047014653 Country of ref document: KR |
WWE | Wipo information: entry into national phase | Ref document number: 2003808578X Country of ref document: CN; Ref document number: 2003585378 Country of ref document: JP |
WWP | Wipo information: published in national office | Ref document number: 1020047014653 Country of ref document: KR |
122 | Ep: pct application non-entry in european phase | |
WWR | Wipo information: refused in national office | Ref document number: 1020047014653 Country of ref document: KR |