WO2009123629A1 - Converting resets in shared i/o system - Google Patents

Publication number
WO2009123629A1
WO2009123629A1 PCT/US2008/059066 US2008059066W WO2009123629A1 WO 2009123629 A1 WO2009123629 A1 WO 2009123629A1 US 2008059066 W US2008059066 W US 2008059066W WO 2009123629 A1 WO2009123629 A1 WO 2009123629A1
Authority
WO
WIPO (PCT)
Prior art keywords
reset
pci
function
flr
host
Application number
PCT/US2008/059066
Other languages
French (fr)
Inventor
David Matthews
Hubert Brinkmann
Paul Brownell
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P.
Priority to US12/935,541 (US8423698B2)
Priority to KR1020107024661A (KR101445436B1)
Priority to JP2011502918A (JP5182771B2)
Priority to PCT/US2008/059066 (WO2009123629A1)
Priority to EP08744891A (EP2260364B1)
Priority to CN200880128466.2A (CN101983365B)
Publication of WO2009123629A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/24 Resetting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4027 Coupling between buses using bus bridges
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0058 Bus-related hardware virtualisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Multi Processors (AREA)
  • Hardware Redundancy (AREA)
  • Stored Programmes (AREA)
  • Small-Scale Networks (AREA)

Abstract

Embodiments include methods, apparatus, and systems for converting resets in a shared I/O system. One embodiment includes a method that propagates a first type of reset from a host computer to a multi-function device that shares I/O operations with other hosts. The first type of reset is converted to a second type of reset to prevent the host from resetting functions bound to the other hosts at the multi-function device.

Description

CONVERTING RESETS IN SHARED I/O SYSTEM
BACKGROUND
[001] The Peripheral Component Interconnect or PCI Standard defines a computer bus for attaching peripheral devices to a motherboard. The PCI specification describes the physical attributes of the bus, electrical characteristics, bus timing, communication protocols, and more. A PCI Special Interest Group (PCI-SIG) maintains and governs the specifications for various PCI architectures.
[002] In a PCI environment, a host can reset a peripheral device by transmitting a reset command to the device. The reset command is propagated downstream through the PCI hierarchy to reset the device. This procedure works well in environments in which the host does not share the peripheral device with other hosts.
[003] In a shared I/O environment, multiple different hosts share one or more functions of the I/O devices. When a host transmits a reset to a shared I/O device, the reset propagates down the shared PCI link. As a result, the host resets functions of the shared device that it does not own. In other words, the host inadvertently resets functions that are bound to other hosts.
BRIEF DESCRIPTION OF THE DRAWINGS
[004] Figure 1 is a block diagram of a computer system using a shared I/O architecture in accordance with an exemplary embodiment.
[005] Figure 2 is a block diagram of a computer system using a shared I/O architecture and showing a view of a single host computer in accordance with an exemplary embodiment.
[006] Figure 3 is another block diagram of a computer system using a shared I/O architecture in accordance with an exemplary embodiment.
[007] Figure 4 is a flow diagram for converting hot resets to function level resets in a computer system using a shared I/O architecture in accordance with an exemplary embodiment.
[008] Figure 5 is a block diagram showing portions of an exemplary computer system using a shared I/O architecture in accordance with an exemplary embodiment.
DETAILED DESCRIPTION
[009] Exemplary embodiments are directed to methods, systems, and apparatus for converting resets in shared input/output (I/O) architecture. One embodiment converts a hot reset PCI command to a functional level reset while the hot reset is in transit to a peripheral device or endpoint (such as an I/O device). This conversion enables a host to reset only the function bound to the host and not also reset other functions bound to other hosts.
[0010] Exemplary embodiments are applied in shared I/O environments that use, for example, PCI architecture. The conversion of an in-band hot reset to a functional level reset allows hosts to only reset a particular shared function instead of resetting the link on which that function resides. This prevents a host from resetting functions it does not own but still allows a host to seamlessly reset its virtual device that the host believes is directly attached to a virtual peer-to-peer (P2P) downstream port.
[0011] By way of illustration, in a PCI-Express system the in-band hot reset mechanism is used to propagate a reset through a PCI-Express hierarchy from top down. Hot resets in PCI-Express are only propagated downstream from an upstream port.
[0012] In one embodiment, the I/O devices are physically disaggregated from a blade server. These I/O devices, however, are still seen as being directly attached by the host residing on each blade server. The host sees these virtual devices through a P2P bridge device. This device is also seen as an endpoint on the other side and is attached to a PCI-Express switch. The host does not see this switch or other physical devices and links between the virtual P2P bridge and the virtual end device that the host believes is directly behind the P2P bridge.
[0013] In one embodiment, the upstream port attached to the virtual P2P bridge issues an in-band hot reset and expects this reset to propagate through the bridge and onto the link, thus resetting the end device of the host. Physically, this link does not exist. Instead, a PCI-Express switch exists and functions as an upstream device on this link. Since in PCI-Express a hot reset can only be propagated downstream, it cannot be sent on this physical link. Also, since the switch links are transparent to the host and cannot be seen by the host, they should not be reset by the host. In order to get this hot reset to the shared end device, exemplary embodiments convert or transform this hot reset protocol to another PCI-Express protocol called a functional level reset.
[0014] A functional level reset resets only a particular function of a device and not all functions, as a hot reset does. The virtual P2P device knows which resource is due to receive the hot reset and uses the functional level reset protocol to propagate this reset to the shared function on the other end of the shared I/O network.
[0015] Figure 1 is a block diagram of a computer system 100 using a shared I/O architecture in accordance with an exemplary embodiment. For illustration, the computer system is shown using PCI Express architecture, but exemplary embodiments are not limited to any particular type of PCI architecture.
[0016] Figure 1 shows a hierarchy that includes multiple root nodes or host computers 110 (shown as Root Node/Host-1 to Root Node/Host-N) connected to an I/O fabric 120, multiple I/O adapters 125 (shown as I/O Adapter-1 to I/O Adapter-N), and a management node 130. As shown in Figures 2 and 3, the root nodes connect to various devices (such as endpoints or endnodes, bridges, switches, etc.) through multiple PCI Express buses or links 160.
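The difference in scope between the two resets described in paragraph [0014] can be illustrated with a minimal sketch. The class `SharedDevice`, its methods, and the choice of five functions numbered 0 to 4 are hypothetical names chosen for illustration and are not part of the PCI specification or of the described embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedDevice:
    """Hypothetical model of a multi-function device shared by several hosts."""

    # Maps a function number to the number of times that function was reset.
    reset_counts: Dict[int, int] = field(default_factory=dict)

    def hot_reset(self) -> None:
        # A hot reset acts on the link, so every function behind it is reset.
        for number in self.reset_counts:
            self.reset_counts[number] += 1

    def function_level_reset(self, number: int) -> None:
        # A function level reset targets exactly one function.
        self.reset_counts[number] += 1


device = SharedDevice({n: 0 for n in range(5)})  # five functions, 0 to 4
device.function_level_reset(1)
after_flr = dict(device.reset_counts)  # only function 1 has been reset
device.hot_reset()
after_hot = dict(device.reset_counts)  # every function has been reset
```

In this sketch, only the entry for function 1 changes after the function level reset, whereas the hot reset increments the count of every function, including those that would be bound to other hosts.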
[0017] The root nodes 110 (also known as compute nodes) include a CPU 140, memory 145, and root complex 150 coupled through a host bus 155. The root complexes 150 connect to I/O adapters 125 and management node 130 through the I/O fabric 120. By way of example, the I/O fabric 120 includes one or more of ports, bridges, switches, etc.
[0018] The root complexes (RC) 150 denote the root of an I/O hierarchy that connects the CPU/memory subsystem to the I/O devices. A root complex can support one or more ports.
[0019] Each interface defines a separate hierarchy domain, and each hierarchy domain includes a single endpoint or a sub-hierarchy containing one or more switch components and endpoints. The capability to route peer-to-peer (P2P) transactions between hierarchy domains through a root complex is optional and implementation dependent. For example, an implementation can include a real or virtual switch internally within the root complex to enable full peer-to-peer (P2P) support in a software transparent way.
[0020] A root complex 150 can support one or more of the following: generation of configuration requests as a requester, generation of I/O requests as a requester, and generation of locked requests as a requester.
[0021] In one exemplary embodiment, the hosts 110 share a pool of resources through the I/O fabric 120 (which includes various devices conforming to the PCI Express specification). In this configuration, multiple different hosts can share I/O adapters 125 which can be single or multi-function adapters and ultimately end points (shown in more detail in Figures 2 and 3). Further, the hosts can be connected together (for example, to form a Symmetric Multi-Processing (SMP) system) or can be independent nodes.
[0022] The management node 130 configures shared resources and assigns resources to the hosts 110. The management node 130 can be attached to the I/O fabric 120 or included in one of the hosts.
[0023] Figure 2 is a block diagram of a computer system using a shared I/O architecture and showing a view as seen by a single host computer (i.e., a view of the network from the perspective of the host) in accordance with an exemplary embodiment. The host 110 connects to a switch 230 and PCI/Express to PCI/PCI-X Bridge 265 through multiple PCI Express buses or links 260. The switch, in turn, connects to multiple endpoints or endnodes which include PCI Express endpoints 220. In one embodiment, the PCI Express endpoints 220 are disaggregated from the switch 230. In other words, the endpoints are not physically connected to the ports 270B.
[0024] Endpoints (shown in Figures 2 and 3) include both virtual endpoints and actual or physical endpoints. A physical or actual endpoint is a device or collection of devices that can be a requester or completer of a PCI transaction either on its own behalf or on behalf of a distinct non-PCI device (other than a PCI device or host CPU), e.g., a PCI Express attached graphics controller, a PCI Express-USB host controller, etc. or other I/O device (such as a disk drive). By contrast, virtual endpoints represent devices that are not actually and physically present and/or connected to the computer system. Thus, a host 110 detects or believes that physical devices are connected to slots/ports in the computer system, but in reality no physical device actually exists.
[0025] As shown, the switch 230 includes a plurality of ports 270 and plurality of virtual PCI-PCI bridges 275. For illustration, switch 230 is shown with one upstream port 270A and three downstream ports 270B. More upstream and downstream ports can be provided to accommodate connections with the multiple hosts (shown in Figure 1). The switch connects or communicates with one or more physical endpoints 220 through PCI links 260.
[0026] The switch follows one or more of the following rules: switches appear to configuration software as two or more logical PCI-to-PCI Bridges, a switch forwards transactions using PCI bridge mechanisms (such as address based routing), and a switch forwards various types of transaction layer packets between sets of ports.
[0027] In one embodiment, each PCI Express link 260 is mapped through a virtual PCI-to-PCI bridge structure and has a logical PCI bus associated with it. The virtual PCI- to-PCI Bridge structure can be part of a PCI Express root complex port, a switch upstream port, or a switch downstream port. A root port is a virtual PCI-to-PCI bridge structure that originates a PCI Express hierarchy domain from a PCI Express root complex. Devices are mapped into configuration space such that each will respond to a particular device number.
[0028] Figure 3 is a block diagram of a computer system 300 using a shared I/O architecture and showing connection of multiple host computers to multiple shared endpoints or I/O platforms. The computer system 300 includes a plurality of compute nodes 310 connected to a management node 320 and to a plurality of endpoints or I/O platforms 330 through a switch platform 345.
[0029] Each compute node 310 includes a bridge or Cnode 340 having a network configuration (shown as box "Network Config") and one or more upstream P2P ports (shown as box "Upstream P2P") and downstream P2P ports (shown as box "Downstream P2P"). The Cnode 340 connects to a downstream port (shown as box "Downstream P2P") in switch platform 345.
[0030] The switch platform 345 includes one or more upstream P2P ports (shown as box "Upstream P2P") and downstream P2P ports (shown as box "Downstream P2P"). These ports couple the compute nodes 310, management node 320, and I/O platforms 330 together.
[0031] Each I/O platform 330 includes an Enode 350 and an end device 360. Further, the Enodes 350 include a virtual root (shown as box "Virtual Root") and a network configuration (shown as box "Network Config"). The end devices 360 are multifunctional and include a first function (shown as box "(funct 0)") and a second function (shown as box "(funct 1)").
[0032] Figure 4 is a flow diagram for converting hot resets to function level resets in a computer system using a shared I/O architecture in accordance with an exemplary embodiment. The method of Figure 4 can be implemented in the computer systems shown in Figures 1 - 3.
[0033] For illustration, Figure 4 is discussed in connection with Figure 5 which shows portions of an exemplary computer system 500 using a shared I/O architecture. The computer system 500 includes plural hosts 510 that connect to plural multi-functional devices 520 (one device being shown for illustration) through plural Cnodes 530, a PCI-Express switch 540, and an Enode 550. Also shown is a middle manager 560 coupled to the PCI-Express switch 540.
[0034] According to block 400, functions are bound to hosts. By way of example, multi-functional device 520 is shown with five different functions (shown as boxes F0 to F4). For illustration, host A is bound to one function (F1), and host B is bound to another function (F3). By way of further example, the multi-function device 520 can be an Ethernet device with each function (F0 to F4) being a shared I/O device.
[0035] According to block 410, a host wants to reset a function and propagates a hot reset. Resets can occur for various reasons. For example, a host can receive errors from a device and desire to reset it.
[0036] Host A is shown to issue a hot reset for one function (F1), and host B is shown to issue a hot reset for another function (F3).
[0037] Host A only sees or detects a single function device and hence is unaware of other functions (namely, F0, F2, F3, and F4). From the perspective of host A, multi-function device 520 is actually a single function device with one function (i.e., function F1). If the hot reset issued by host A were not converted to a function level reset (FLR), then host A would inadvertently reset all functions at the multi-function device 520. In other words, host A would reset functions (namely, F0, F2, F3, and F4) not bound to host A.
[0038] Likewise, host B only sees or detects a single function device and hence is unaware of other functions (namely, F0, F1, F2, and F4). From the perspective of host B, multi-function device 520 is actually a single function device with one function (i.e., function F3). If the hot reset issued by host B were not converted to a function level reset (FLR), then host B would inadvertently reset all functions at the multi-function device 520. In other words, host B would reset functions (namely, F0, F1, F2, and F4) not bound to host B.
[0039] According to block 420, a Cnode (or bridge) receives the hot reset. The hot reset from host A propagates to the virtual bridge (shown in box "Virtual Bridge") of Cnode A. Likewise, the hot reset from host B propagates to the virtual bridge (shown in box "Virtual Bridge") of Cnode B. The Cnodes are virtual bridges that are seen as being bridges with endpoints directly behind them. The Cnodes are seen as endnodes or endpoints to the middle manager 560. In other words, the hosts do not see the PCI-Express switch 540 or Enode 550.
[0040] According to block 430, the Cnode determines the destination I/O device for the received hot reset. Thus, for host A, the Cnode A determines that the hot reset is destined for one function (F1). For host B, the Cnode B determines that the hot reset is destined for another function (F3).
[0041] According to block 440, the Cnode transforms the hot reset into a function level reset (FLR) and routes the FLR to the destination. In other words, the initial hot reset is converted into a FLR and then propagated as a FLR. As shown in Figure 5, Cnode A receives the hot reset, converts it to a FLR A, and propagates the FLR A to PCI-Express switch 540. Likewise, Cnode B receives the hot reset, converts it to a FLR B, and propagates the FLR B to PCI-Express switch 540.
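Blocks 420 through 440 can be sketched in software as follows. This is only an illustrative model under stated assumptions: the classes `HotReset`, `FunctionLevelReset`, and `Cnode`, and the way the binding is stored, are hypothetical and do not describe how an actual Cnode is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotReset:
    """Hypothetical record of a hot reset issued by a host."""
    host: str


@dataclass(frozen=True)
class FunctionLevelReset:
    """Hypothetical record of an FLR aimed at one function."""
    host: str
    function: int


class Cnode:
    """Hypothetical virtual bridge that knows which function its host owns."""

    def __init__(self, host: str, bound_function: int) -> None:
        self.host = host
        self.bound_function = bound_function  # binding made in block 400

    def receive(self, reset: HotReset) -> FunctionLevelReset:
        # Block 420: the Cnode receives the hot reset from its own host.
        if reset.host != self.host:
            raise ValueError("hot reset came from a host not attached to this Cnode")
        # Blocks 430 and 440: determine the destination function and emit
        # an FLR for it instead of forwarding the hot reset.
        return FunctionLevelReset(host=self.host, function=self.bound_function)


cnode_a = Cnode("A", 1)  # host A is bound to F1
cnode_b = Cnode("B", 3)  # host B is bound to F3
flr_a = cnode_a.receive(HotReset("A"))
flr_b = cnode_b.receive(HotReset("B"))
```

The hot reset itself never leaves the Cnode in this sketch; only the function level reset that replaces it is passed on toward the switch.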
[0042] In one exemplary embodiment, the Cnode builds a configuration cycle to perform the function level reset. The Cnode encapsulates the configuration cycle into a header of a message or packet for routing to the PCI-Express switch 540. The payload of the message or packet contains the configuration cycle.
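A possible shape for the configuration cycle and its encapsulation is sketched below. In the PCI Express Base Specification, a function level reset is initiated by setting bit 15 (Initiate Function Level Reset) of the Device Control register, which sits at offset 08h within the PCI Express Capability structure. Everything else in the sketch is an assumption made for illustration: the message layout (`HEADER`, `CONFIG_WRITE`), the Enode identifier, the capability base address, and the current register value are hypothetical and are not taken from the described embodiments.

```python
import struct

# Defined by the PCI Express Base Specification.
PCIE_CAP_DEVICE_CONTROL = 0x08   # Device Control offset inside the PCIe capability
DEVCTL_INITIATE_FLR = 1 << 15    # Initiate Function Level Reset bit

# Hypothetical message layout, big-endian:
# header  = Enode id (1 byte), function number (1 byte), payload length (2 bytes)
# payload = register offset (2 bytes), 16-bit value to write (2 bytes)
HEADER = struct.Struct(">BBH")
CONFIG_WRITE = struct.Struct(">HH")


def build_flr_config_write(cap_base: int, current_devctl: int) -> bytes:
    """Build a configuration write that sets the Initiate FLR bit."""
    offset = cap_base + PCIE_CAP_DEVICE_CONTROL
    return CONFIG_WRITE.pack(offset, current_devctl | DEVCTL_INITIATE_FLR)


def encapsulate(enode_id: int, function: int, payload: bytes) -> bytes:
    """Prefix the configuration cycle with a routing header."""
    return HEADER.pack(enode_id, function, len(payload)) + payload


# Example values (all illustrative): capability at 0x40, Device Control = 0x2810.
message = encapsulate(7, 1, build_flr_config_write(0x40, 0x2810))
```

The capability base is passed in as a parameter because its location in configuration space differs between devices and would have to be discovered by walking the capability list.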
[0043] According to block 450, the switch receives the function level reset and routes it to the Enode. As shown in Figure 5, PCI-Express switch 540 receives the FLR A from Cnode A and FLR B from Cnode B. The switch propagates these FLRs to Enode 550.
[0044] According to block 460, the Enode receives the function level reset and determines the function to receive the FLR. In one embodiment, the Enode decodes the header, retrieves the payload, and determines which function in the multi-function device will receive the function level reset.
[0045] As shown in Figure 5, Enode 550 receives the FLR A and FLR B. Each of these FLRs is decoded and routed to the correct function. FLR A is routed to one function (F1) since this function is bound to host A. FLR B is routed to another function (F3) since this function is bound to host B.
[0046] According to block 470, the function receives the function level reset and resets the appropriate function. For Figure 5, function F1 is reset according to FLR A, and function F3 is reset according to FLR B.
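The decoding and routing performed by the Enode in block 460 might look like the following sketch. The header layout (Enode id, function number, payload length) is a hypothetical format chosen for illustration, and the function `decode_and_route` and the per-function inbox lists are likewise assumptions rather than a description of an actual Enode.

```python
import struct
from typing import Dict, List

# Hypothetical header: Enode id (1 byte), function number (1 byte),
# payload length (2 bytes), big-endian.
HEADER = struct.Struct(">BBH")


def decode_and_route(message: bytes, inboxes: Dict[int, List[bytes]]) -> int:
    """Decode the header, retrieve the payload, and deliver it to one function."""
    _enode_id, function, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    if function not in inboxes:
        raise KeyError(f"no function {function} on this device")
    inboxes[function].append(payload)  # only the targeted function sees the FLR
    return function


inboxes = {n: [] for n in range(5)}          # F0 to F4
flr_payload = b"\x00\x48\xa8\x10"            # illustrative configuration cycle
flr_b = HEADER.pack(7, 3, len(flr_payload)) + flr_payload
routed_to = decode_and_route(flr_b, inboxes)
```

Here the message addressed to F3 reaches only the inbox of F3, and the inboxes of the remaining functions stay empty.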
[0047] In general, resets provide a hardware mechanism for returning port states to an initial or specified condition. Resets can be provided as a signal from one device to another device, such as a component or adapter card. A function level reset (FLR) is a specific type of reset that enables software to quiesce (i.e., temporarily disable or make inactive) and reset endpoint hardware with function-level granularity. The following three examples illustrate a FLR.
[0048] As one example, in some systems, it is possible that the software entity that controls a function will cease to operate normally. To prevent data corruption, it may be necessary to stop all PCI Express and external I/O (not PCI Express) operations being performed by the function. Other defined reset operations do not guarantee that external I/O operations will be stopped. As another example, in a partitioned environment where hardware is migrated from one partition to another, it may be necessary to ensure that no residual "knowledge" of the prior partition be retained by hardware, for example, a user's secret information entrusted to the first partition but not to the second. Further, due to the wide range of functions, it may be necessary that this be done in a function independent way. As a third example, when system software is taking down the software stack for a function and then rebuilding that stack, it is sometimes necessary to return the state to an uninitialized state before rebuilding the function's software stack.
[0049] FLR applies on a per function basis, and only the targeted function is affected by the FLR operation. Furthermore, the link state is not affected by the FLR (unlike a hot reset which does affect the link state). FLRs modify the function state of the device since registers and function-specific state machines are set to their initialization values. FLRs are quiescent on the link, and port state machines associated with link functionality are not reset by the FLR. Further, FLRs can be initiated to a multi-function device for resetting a specific function and not the entire multi-function device. Further information on FLRs and hot resets is found in PCI Express Base Specification Revision 2.0 (edition of December 20, 2006), which is incorporated herein by reference.
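The state effects described in paragraph [0049] can be modeled in a short sketch. The classes `Link` and `Device`, the register name `command`, and the register values are hypothetical and serve only to show which state each reset touches.

```python
from dataclasses import dataclass
from typing import Dict

# Illustrative initialization values for one function's registers.
INITIAL_REGISTERS = {"command": 0x0000}


@dataclass
class Link:
    resets: int = 0  # number of times the link state has been reset


@dataclass
class Device:
    link: Link
    registers: Dict[int, Dict[str, int]]  # function number -> register values

    def function_level_reset(self, function: int) -> None:
        # Only the targeted function returns to its initialization values;
        # the link state is left untouched.
        self.registers[function] = dict(INITIAL_REGISTERS)

    def hot_reset(self) -> None:
        # A hot reset affects the link state and every function behind the link.
        self.link.resets += 1
        for function in self.registers:
            self.registers[function] = dict(INITIAL_REGISTERS)


device = Device(Link(), {n: {"command": 0x0006} for n in range(5)})
device.function_level_reset(3)
link_resets_after_flr = device.link.resets
registers_after_flr = {n: regs["command"] for n, regs in device.registers.items()}
device.hot_reset()
link_resets_after_hot = device.link.resets
```

After the function level reset in this model, the link reset count is still zero and only function 3 holds its initialization values; the subsequent hot reset changes both the link count and every function.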
[0050] Definitions: As used herein and in the claims, the following words and terms are defined as follows:
[0051] The word "bridge" means a device that connects two local area networks (LANs) or segments of a LAN using a same protocol (for example, Ethernet or token ring). For example, a bridge is a function that virtually or actually connects a PCI/PCI-X segment or PCI Express port with an internal component interconnect or with another PCI/PCI-X bus segment or PCI Express port.
[0052] The term "configuration space" means address spaces within the PCI architecture. Packets with a configuration space address are used to configure a function (i.e., an address entity) within a device.
[0053] The word "downstream" means a relative position of an interconnect/system element (port/component) that is farther from the root complex. For example, the ports on a switch that are not the upstream port are downstream ports. All ports on a root complex are downstream ports. Thus, downstream also includes a direction of information flow where the information is flowing away from the root complex.
[0054] The word "endpoint" or "endnode" means a device (i.e., an addressable electronic entity) or collection of devices that operate according to distinct sets of rules.
[0055] The word "function" means an addressable entity in configuration space. Function can also refer to one function of a single function device or multi-functional device.
[0056] The terms "function level reset" or "FLR" mean a mechanism for resetting a specific endpoint function (i.e., a specific function of a device).
[0057] The word "hot-plug" or "hot swap" or the like means the ability to remove and replace an electronic component of a machine or system while the machine or system continues to operate. For example, hot swapping enables one or more devices (for example, hard drives) to be exchanged or serviced without impacting operation of an overall blade or enclosure in which the device is located. For instance, in the event of a failure, the individual hard drive is removed from the blade and replaced with a new or different hard drive. The new hard drive is connected to the blade without disrupting continuous operation of the blade while it remains in the enclosure.
[0058] The term "hot reset" means a reset propagated in-band across a link using a physical layer mechanism (i.e., a layer that directly interacts with the communication medium between two components).
[0059] The word "link" means a collection of two ports and their interconnecting lanes. In PCI-Express architecture, a link is a dual simplex communications path between two components.
[0060] The acronym "PCI" means Peripheral Component Interconnect. The PCI specification describes the physical attributes of the bus, electrical characteristics, bus timing, communication protocols, and more. A PCI Special Interest Group (PCI-SIG) maintains and governs the specifications for various PCI architectures.
[0061] The word "port" logically means an interface between a component and a link (i.e., a communication path between two devices), and physically means a group of transmitters and receivers located on a chip that define a link.
[0062] The term "root complex" means a device or collection of devices that include a host bridge and one or more ports. For example, a host computer has a PCI to host bridging function that is a root complex. The root complex provides a bridge between a CPU bus (such as hyper-transport) and PCI bus.
[0063] The term "root node" means a host computer, computer system, or server.
[0064] The word "switch" means a device or collection of devices that connects two or more ports to allow packets to be routed from one port to another. To configuration software, a switch appears as a collection of virtual PCI-to-PCI bridges.
[0065] The word "virtual" means not real and distinguishes something (for example, a device) that is merely conceptual from something that has physical reality. As one example, a host can see or detect a virtual endpoint as being a physical endpoint when in fact a physical endpoint is not actually connected to the bus (the device being imaginary but detected or believed to exist by the host). The opposite of virtual is real or physical.
[0066] The word "upstream" means a relative position of an interconnect/system element (port/component) that is closer to the root complex. For example, the ports on a switch that are closest topologically to the root complex are upstream ports. As another example, the port on a component that contains only an endpoint is an upstream port. Upstream also includes a direction of information flow where the information is flowing toward the root complex.
[0067] In one exemplary embodiment, one or more blocks or steps discussed herein are automated. In other words, apparatus, systems, and methods occur automatically. As used herein, the terms "automated" or "automatically" (and like variations thereof) mean controlled operation of an apparatus, system, and/or process using computers and/or mechanical/electrical devices without the necessity of human intervention, observation, effort and/or decision.
[0068] The methods in accordance with exemplary embodiments of the present invention are provided as examples and should not be construed to limit other embodiments within the scope of the invention. For instance, blocks in diagrams or numbers (such as (1), (2), etc.) should not be construed as steps that must proceed in a particular order. Additional blocks/steps may be added, some blocks/steps removed, or the order of the blocks/steps altered and still be within the scope of the invention. Further, methods or steps discussed within different figures can be added to or exchanged with methods or steps in other figures. Further yet, specific numerical data values (such as specific quantities, numbers, categories, etc.) or other specific information should be interpreted as illustrative for discussing exemplary embodiments. Such specific information is not provided to limit the invention.
[0069] In the various embodiments in accordance with the present invention, embodiments are implemented as a method, system, and/or apparatus. As one example, exemplary embodiments and steps associated therewith are implemented as one or more computer software programs to implement the methods described herein. The software is implemented as one or more modules (also referred to as code subroutines, or "objects" in object-oriented programming). The location of the software will differ for the various alternative embodiments. The software programming code, for example, is accessed by a processor or processors of the computer or server from long-term storage media of some type, such as a CD-ROM drive or hard drive. The software programming code is embodied or stored on any of a variety of known media for use with a data processing system or in any memory device such as semiconductor, magnetic and optical devices, including a disk, hard drive, CD-ROM, ROM, etc. The code is distributed on such media, or is distributed to users from the memory or storage of one computer system over a network of some type to other computer systems for use by users of such other systems. Alternatively, the programming code is embodied in the memory and accessed by the processor using the bus. The techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein.
[0070] The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Claims

What is claimed is:
1) A method, comprising: propagating a first type of reset from a host computer to a multi-function device that shares Input/Output (I/O) operations with other hosts; and converting the first type of reset to a second type of reset while the first type of reset is in transit to the multi-function device to prevent the host from resetting functions bound to the other hosts at the multi-function device.
2) The method of claim 1, wherein the first type of reset is a hot reset according to a Peripheral Component Interconnect (PCI) specification, and the second type of reset is a function level reset (FLR) according to the PCI specification.
3) The method of claim 1, wherein the first type of reset resets a link on which a function resides at the multi-function device, and the second type of reset only resets a single function at the multi-function device.
4) The method of claim 1, wherein the first type of reset is converted to the second type of reset so the host only resets a function bound to the host at the multi-function device.
5) The method of claim 1, wherein the multi-function device is seen by the host as being directly attached to the host and residing on a blade server.
6) The method of claim 1 further comprising, converting the first type of reset to the second type at a virtual Peripheral Component Interconnect (PCI) bridge.
7) The method of claim 1 further comprising: propagating the first type of reset to a Peripheral Component Interconnect (PCI) bridge; converting the first type of reset to the second type of reset at the PCI bridge; propagating the second type of reset from the PCI bridge to a PCI switch and then to the multi-function device.
8) A tangible computer readable storage medium having instructions for causing a computer to execute a method, comprising: propagating a Peripheral Component Interconnect (PCI) hot reset to a peripheral device; and converting the PCI hot reset into a function level reset (FLR) while the PCI hot reset is in transit to the peripheral device.
9) The tangible computer readable storage medium of claim 8 further comprising: generating the PCI hot reset at a root node; converting the PCI hot reset to the FLR at a bridge between the root node and peripheral device.
10) The tangible computer readable storage medium of claim 8 further comprising, using the FLR to reset only a function bound to a root node that generated the PCI hot reset.
11) The tangible computer readable storage medium of claim 8, wherein the peripheral device includes multiple functions that are shared among plural separate host computers in a computer system.
12) The tangible computer readable storage medium of claim 8 further comprising, propagating the FLR through switches that are transparent to a host computer that generated the PCI hot reset.
13) The tangible computer readable storage medium of claim 8, wherein the PCI hot reset resets multiple functions bound to different hosts, and the FLR resets only a single function bound to one host.
14) The tangible computer readable storage medium of claim 8 further comprising: generating the PCI hot reset at a host computer; preventing the host computer from resetting functions at the peripheral device that are not bound to the host computer by converting the PCI hot reset to the FLR.
15) The tangible computer readable storage medium of claim 8 further comprising: building a configuration cycle to perform the FLR; encapsulating the configuration cycle into a message; routing the message to the peripheral device.
16) The tangible computer readable storage medium of claim 8 further comprising: receiving the FLR at a node before the peripheral device; retrieving a payload in the FLR at the node to determine which function in the peripheral device to route the FLR; routing the FLR to the function.
17) A computer system, comprising: a memory that stores an algorithm; and a processor that executes the algorithm to: propagate a Peripheral Component Interconnect (PCI) hot reset to a peripheral device having resources that are shared among plural host computers in the computer system; and transform the PCI hot reset into a function level reset (FLR) while the PCI hot reset is in transit to the peripheral device.
18) The computer system of claim 17, wherein the peripheral device is an Ethernet device with multiple functions bound to multiple host computers.
19) The computer system of claim 17 further comprising, a PCI switch that receives the FLR and forwards the FLR to the peripheral device.
20) The computer system of claim 17 further comprising, a virtual bridge that receives the PCI hot reset and forwards the FLR to the peripheral device.
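
By way of illustration only, the conversion recited in the claims above can be sketched in software. The following Python sketch is not taken from the disclosure and is not the claimed implementation. It models a virtual bridge that receives a hot reset from a host, builds a configuration cycle that performs an FLR on the single function bound to that host, encapsulates the cycle in a message, and delivers it to a multi-function device. The names `ConfigWrite`, `Message`, `VirtualBridge`, `MultiFunctionDevice`, `demo`, and the host names are hypothetical. The sketch assumes the FLR is triggered by setting the Initiate Function Level Reset bit (bit 15) of the PCI Express Device Control register, located at offset 0x08 within the PCI Express Capability structure.

```python
from dataclasses import dataclass

# Assumption: offset is relative to the PCI Express Capability structure,
# and bit 15 of Device Control is "Initiate Function Level Reset".
DEVICE_CONTROL_OFFSET = 0x08
INITIATE_FLR = 1 << 15


@dataclass(frozen=True)
class ConfigWrite:
    """Hypothetical stand-in for a configuration cycle that performs an FLR."""
    function: int  # function number on the shared multi-function device
    offset: int    # register offset within the PCI Express Capability
    value: int     # value written to that register


@dataclass(frozen=True)
class Message:
    """Hypothetical envelope used to route the cycle toward the device."""
    source_host: str
    payload: ConfigWrite


class VirtualBridge:
    """Converts a host's hot reset into an FLR for that host's function only."""

    def __init__(self, bindings: dict[str, int]):
        self.bindings = bindings  # host name -> function number bound to it

    def convert_hot_reset(self, host: str) -> Message:
        # Look up the one function bound to the host that issued the reset.
        function = self.bindings[host]
        # Build the configuration cycle, then encapsulate it in a message.
        cycle = ConfigWrite(function, DEVICE_CONTROL_OFFSET, INITIATE_FLR)
        return Message(source_host=host, payload=cycle)


class MultiFunctionDevice:
    """Counts resets per function so the effect of each FLR is visible."""

    def __init__(self, function_count: int):
        self.reset_counts = [0] * function_count

    def deliver(self, message: Message) -> None:
        # Read the payload to decide which function the FLR is routed to.
        cycle = message.payload
        if cycle.offset == DEVICE_CONTROL_OFFSET and cycle.value & INITIATE_FLR:
            self.reset_counts[cycle.function] += 1


def demo() -> list[int]:
    bridge = VirtualBridge({"host_a": 0, "host_b": 1})
    device = MultiFunctionDevice(function_count=2)
    # host_a issues a hot reset; only the function bound to host_a is reset.
    device.deliver(bridge.convert_hot_reset("host_a"))
    return device.reset_counts
```

In this sketch, calling `demo()` leaves the function bound to `host_b` untouched, which corresponds to the stated purpose of preventing one host from resetting functions bound to other hosts. A real bridge would operate on link-level reset signaling and transaction layer packets rather than Python objects.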
PCT/US2008/059066 2008-04-02 2008-04-02 Converting resets in shared i/o system WO2009123629A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US12/935,541 US8423698B2 (en) 2008-04-02 2008-04-02 Conversion of resets sent to a shared device
KR1020107024661A KR101445436B1 (en) 2008-04-02 2008-04-02 Converting resets in shared i/o system
JP2011502918A JP5182771B2 (en) 2008-04-02 2008-04-02 Reset conversion in shared I/O
PCT/US2008/059066 WO2009123629A1 (en) 2008-04-02 2008-04-02 Converting resets in shared i/o system
EP08744891A EP2260364B1 (en) 2008-04-02 2008-04-02 Converting resets in shared i/o system
CN200880128466.2A CN101983365B (en) 2008-04-02 2008-04-02 Converting resets in shared i/o system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2008/059066 WO2009123629A1 (en) 2008-04-02 2008-04-02 Converting resets in shared i/o system

Publications (1)

Publication Number Publication Date
WO2009123629A1 true WO2009123629A1 (en) 2009-10-08

Family

ID=41135866

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/059066 WO2009123629A1 (en) 2008-04-02 2008-04-02 Converting resets in shared i/o system

Country Status (6)

Country Link
US (1) US8423698B2 (en)
EP (1) EP2260364B1 (en)
JP (1) JP5182771B2 (en)
KR (1) KR101445436B1 (en)
CN (1) CN101983365B (en)
WO (1) WO2009123629A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819447A (en) * 2012-05-29 2012-12-12 中国科学院计算技术研究所 Direct I/O virtualization method and device used for multi-root sharing system

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5180729B2 (en) * 2008-08-05 2013-04-10 株式会社日立製作所 Computer system and bus allocation method
US7873068B2 (en) * 2009-03-31 2011-01-18 Intel Corporation Flexibly integrating endpoint logic into varied platforms
US8527745B2 (en) * 2009-12-07 2013-09-03 Oracle America, Inc. Input/output device including a host interface for processing function level reset requests and updating a timer value corresponding to a time until application hardware registers associated with the function level reset requests are available
CN102707991B (en) * 2012-05-17 2016-03-30 中国科学院计算技术研究所 The many virtual shared method and systems of I/O
US8843688B2 (en) * 2012-09-11 2014-09-23 International Business Machines Corporation Concurrent repair of PCIE switch units in a tightly-coupled, multi-switch, multi-adapter, multi-host distributed system
US9218310B2 (en) * 2013-03-15 2015-12-22 Google Inc. Shared input/output (I/O) unit
JP6090017B2 (en) * 2013-07-10 2017-03-08 富士ゼロックス株式会社 Computer system
JP6228793B2 (en) * 2013-09-24 2017-11-08 株式会社日立製作所 Computer system, computer system control method, and connection module
US11283734B2 (en) * 2014-04-30 2022-03-22 Intel Corporation Minimizing on-die memory in pull mode switches
CN105183533B (en) * 2014-05-26 2018-09-28 华为技术有限公司 A kind of method, apparatus and system of bus virtualization
CN105335227B (en) * 2014-06-19 2019-01-08 华为技术有限公司 Data processing method, device and system in a kind of node
WO2016111677A1 (en) * 2015-01-06 2016-07-14 Hewlett-Packard Development Company, L.P. Adapter to concatenate connectors
US10102169B2 (en) 2015-08-10 2018-10-16 Microsemi Solutions (U.S.), Inc. System and method for port migration in a PCIE switch
US10817447B2 (en) * 2016-11-14 2020-10-27 Intel Corporation Input/output translation lookaside buffer (IOTLB) quality of service (QoS)
US10908998B2 (en) * 2017-08-08 2021-02-02 Toshiba Memory Corporation Managing function level reset in an IO virtualization-enabled storage device
KR102049251B1 (en) * 2018-05-29 2019-11-28 (주)넥스챌 Microgrid gateway of collecting data and control method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5829048A (en) * 1996-06-03 1998-10-27 Emc Corporation Reset propagation for a multi-port storage controller
US6370644B1 (en) * 1998-04-07 2002-04-09 Micron Technology, Inc. Device for blocking bus transactions during reset
US20040158668A1 (en) * 2003-02-11 2004-08-12 Richard Golasky System and method for managing target resets
US20070156934A1 (en) * 2006-01-02 2007-07-05 Kuan-Jui Ho High-speed PCI Interface System and A Reset Method Thereof

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100445973C (en) 2002-04-17 2008-12-24 威盛电子股份有限公司 Method for arbitrating bus control right and its arbitrator
US7496045B2 (en) * 2005-07-28 2009-02-24 International Business Machines Corporation Broadcast of shared I/O fabric error messages in a multi-host environment to all affected root nodes
JP2007109045A (en) 2005-10-14 2007-04-26 Matsushita Electric Ind Co Ltd Reset controller and reset control method for signal processing circuit and reset control circuit insertion method for signal processing circuit
US20070240018A1 (en) * 2005-12-29 2007-10-11 Intel Corporation Functional level reset on a per device/function basis
US7813366B2 (en) * 2006-12-19 2010-10-12 International Business Machines Corporation Migration of a virtual endpoint from one virtual plane to another
US7979592B1 (en) * 2007-02-09 2011-07-12 Emulex Design And Manufacturing Corporation Virtualization bridge device
US8464260B2 (en) * 2007-10-31 2013-06-11 Hewlett-Packard Development Company, L.P. Configuration and association of a supervisory virtual device function to a privileged entity
US8176304B2 (en) * 2008-10-22 2012-05-08 Oracle America, Inc. Mechanism for performing function level reset in an I/O device
US8527745B2 (en) * 2009-12-07 2013-09-03 Oracle America, Inc. Input/output device including a host interface for processing function level reset requests and updating a timer value corresponding to a time until application hardware registers associated with the function level reset requests are available

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2260364A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819447A (en) * 2012-05-29 2012-12-12 中国科学院计算技术研究所 Direct I/O virtualization method and device used for multi-root sharing system
CN102819447B (en) * 2012-05-29 2015-06-03 中国科学院计算技术研究所 Direct I/O virtualization method and device used for multi-root sharing system

Also Published As

Publication number Publication date
CN101983365A (en) 2011-03-02
EP2260364B1 (en) 2013-03-20
EP2260364A1 (en) 2010-12-15
US8423698B2 (en) 2013-04-16
EP2260364A4 (en) 2012-02-01
KR20110000748A (en) 2011-01-05
KR101445436B1 (en) 2014-09-26
JP2011521310A (en) 2011-07-21
JP5182771B2 (en) 2013-04-17
US20110029710A1 (en) 2011-02-03
CN101983365B (en) 2013-03-27

Similar Documents

Publication Publication Date Title
EP2260364B1 (en) Converting resets in shared i/o system
US11194753B2 (en) Platform interface layer and protocol for accelerators
US8346997B2 (en) Use of peripheral component interconnect input/output virtualization devices to create redundant configurations
US8223745B2 (en) Adding packet routing information without ECRC recalculation
US8225005B2 (en) Use of peripheral component interconnect input/output virtualization devices to create high-speed, low-latency interconnect
US7934033B2 (en) PCI-express function proxy
US7506094B2 (en) Method using a master node to control I/O fabric configuration in a multi-host environment
US7610431B1 (en) Configuration space compaction
US7631050B2 (en) Method for confirming identity of a master node selected to control I/O fabric configuration in a multi-host environment
US10210120B2 (en) Method, apparatus and system to implement secondary bus functionality via a reconfigurable virtual switch
US7549003B2 (en) Creation and management of destination ID routing structures in multi-host PCI topologies
US20110029693A1 (en) Reserving pci memory space for pci devices
US20090276551A1 (en) Native and Non-Native I/O Virtualization in a Single Adapter
US20070136458A1 (en) Creation and management of ATPT in switches of multi-host PCI topologies
US11372787B2 (en) Unified address space for multiple links
WO2006115752A2 (en) Virtualization for device sharing
Tu et al. Seamless fail-over for PCIe switched networks
US20080301350A1 (en) Method for Reassigning Root Complex Resources in a Multi-Root PCI-Express System
Hanawa et al. Pearl: Power-aware, dependable, and high-performance communication link using pci express
CN115203110A (en) PCIe function and method of operating the same
Kong Using PCI Express® as the Primary System Interconnect in Multiroot Compute, Storage, Communications and Embedded Systems
KR20230152394A (en) Peripheral component interconnect express device and operating method thereof
Hanawa et al. Power-aware, dependable, and high-performance communication link using PCI Express: PEARL

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase Ref document number: 200880128466.2; Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 08744891; Country of ref document: EP; Kind code of ref document: A1
WWE Wipo information: entry into national phase Ref document number: 5943/CHENP/2010; Country of ref document: IN
WWE Wipo information: entry into national phase Ref document number: 12935541; Country of ref document: US; Ref document number: 2008744891; Country of ref document: EP
WWE Wipo information: entry into national phase Ref document number: 2011502918; Country of ref document: JP
NENP Non-entry into the national phase Ref country code: DE
ENP Entry into the national phase Ref document number: 20107024661; Country of ref document: KR; Kind code of ref document: A