WO2015035574A1 - Fault handling method, computer system, and apparatus - Google Patents

Fault handling method, computer system, and apparatus

Info

Publication number
WO2015035574A1
Authority
WO
WIPO (PCT)
Prior art keywords
endpoint device
memory address
status
endpoint
access request
Prior art date
Application number
PCT/CN2013/083325
Other languages
English (en)
French (fr)
Inventor
林沐晖
王俊捷
王瑞玲
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2013/083325 priority Critical patent/WO2015035574A1/zh
Priority to CN201380001454.4A priority patent/CN104756081B/zh
Priority to ES13882632.6T priority patent/ES2656464T3/es
Priority to EP13882632.6A priority patent/EP2869201B1/en
Priority to US14/549,395 priority patent/US9678826B2/en
Publication of WO2015035574A1 publication Critical patent/WO2015035574A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in an input/output transactions management context
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0772 Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • G06F11/0793 Remedial or corrective actions
    • G06F11/0796 Safety measures, i.e. ensuring safe condition in the event of error, e.g. for controlling element
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3027 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a bus
    • G06F11/3041 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is an input/output interface
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3485 Performance evaluation by tracing or monitoring for I/O devices
    • G06F11/349 Performance evaluation by tracing or monitoring for interfaces, buses
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Definitions

  • Embodiments of the present invention relate to computer technology, and in particular, to a method, computer system, and apparatus for fault handling.
  • PCIe Peripheral Component Interconnect Express
  • RAS Reliability, Availability, Serviceability
  • an error message may be generated and routed from the faulty device to the root complex. After the root complex receives the error message, it generates a system interrupt that reports the error message to the operating system, and the operating system performs error handling according to the error message.
  • the CPU or other PCIe endpoint devices and the faulty device can continue to access each other, so the faulty device is not effectively isolated. This may cause the fault to spread and reduce the reliability of the system.
  • the embodiment of the invention provides a fault processing method, a computer system and a device, which can isolate the faulty device, prevent the spread of the fault, and improve the reliability of the system.
  • an embodiment of the present invention provides a fault isolation method for a PCIe-interconnected computer system. The computer system includes a primary domain and an extended domain, where the primary domain includes a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain includes the root complex endpoint device and a second endpoint device. The method includes:
  • the device status record includes a correspondence between the identifier information of the second endpoint device and the state of the second endpoint device; receiving an access request, where the access request is an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
  • the access request is discarded to block communication between the second endpoint device and the primary domain.
  • monitoring the status of the second endpoint device includes: receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device exists; and determining, according to the error message or the device probe response message, the status of the second endpoint device.
  • the identifier information of the second endpoint device includes a first memory address of the second endpoint device, where the first memory address is the memory address of the second endpoint device in the primary domain. Establishing the device state record according to the state of the second endpoint device then includes: obtaining the bus/device/function (BDF) identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message, where the second memory address is the memory address of the second endpoint device in the extended domain; obtaining the first memory address of the second endpoint device according to the BDF identifier or the second memory address; and recording, in the device state record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device.
  • obtaining the first memory address of the second endpoint device according to the BDF identifier or the second memory address includes: converting the second memory address into the first memory address of the second endpoint device according to a saved mapping relationship between the second memory address and the first memory address of the second endpoint device; or obtaining the second memory address of the second endpoint device according to a saved mapping relationship between the BDF identifier and the second memory address of the second endpoint device, and then converting the second memory address into the first memory address of the second endpoint device according to the saved mapping relationship between the second memory address and the first memory address.
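  • The two lookup paths described above can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the table names `BDF_TO_SECOND` and `SECOND_TO_FIRST`, the function `first_address`, and every BDF and address value are hypothetical.

```python
from typing import Optional

# Hypothetical saved mapping tables; all values are made up for illustration.
BDF_TO_SECOND = {"01:00.0": 0x2000_0000}      # BDF identifier -> second memory address
SECOND_TO_FIRST = {0x2000_0000: 0x1000_0000}  # second memory address -> first memory address


def first_address(bdf: Optional[str] = None, second: Optional[int] = None) -> int:
    """Resolve the first (primary-domain) memory address of a second endpoint device.

    Either the second memory address is given directly, or it is first
    looked up from the BDF identifier.
    """
    if second is None:
        if bdf is None:
            raise ValueError("need a BDF identifier or a second memory address")
        second = BDF_TO_SECOND[bdf]    # path 2, step 1: BDF -> second memory address
    return SECOND_TO_FIRST[second]     # final step: second -> first memory address
```

  • Both paths end in the same second-to-first conversion, so the BDF path only adds one lookup in front of it.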
  • the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for message signaled interrupt (MSI) access, a first memory address for memory-mapped input/output (MMIO) access, and a first memory address for direct memory access (DMA) access. In that case, recording the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device in the device status record includes: recording, in the device state record, the correspondence between each first memory address of the second endpoint device and the state of the second endpoint device.
  • an embodiment of the present invention provides a fault isolation apparatus for a PCIe-interconnected computer system. The computer system includes a primary domain and an extended domain, where the primary domain includes a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain includes the root complex endpoint device and a second endpoint device. The apparatus includes:
  • a monitoring unit configured to monitor a status of the second endpoint device of the extended domain;
  • a recording unit configured to establish a device status record according to the state of the second endpoint device, where the device status record includes a correspondence between the identifier information of the second endpoint device and a state of the second endpoint device;
  • a receiving unit configured to receive an access request, where the access request includes an access request of the second endpoint device to the primary domain or an access request of the primary domain to the second endpoint device;
  • a determining unit configured to query the device status record according to the identifier information of the second endpoint device in the access request, and determine a status of the second endpoint device;
  • the processing unit is configured to discard the access request to prevent communication between the second endpoint device and the primary domain when the state of the second endpoint device is a fault state.
  • the monitoring unit is specifically configured to: receive an error message sent by the second endpoint device, or receive a device probe response message indicating whether the second endpoint device exists; and determine the status of the second endpoint device according to the error message or the device probe response message.
  • the identifier information of the second endpoint device includes a first memory address of the second endpoint device, where the first memory address is the memory address of the second endpoint device in the primary domain;
  • the recording unit specifically includes:
  • an address translation module subunit, configured to: when the monitoring unit determines the fault state of the second endpoint device, obtain the BDF identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message, and obtain the first memory address of the second endpoint device according to the BDF identifier or the second memory address, where the second memory address is the memory address of the second endpoint device in the extended domain;
  • a status recording subunit configured to record, in the device status record, a correspondence between a first memory address of the second endpoint device and a state of the second endpoint device.
  • the address translation module subunit is further configured to save the mapping relationship between the second memory address and the first memory address of the second endpoint device, and the mapping relationship between the BDF identifier and the second memory address of the second endpoint device. The address translation module subunit is specifically configured to: convert the second memory address into the first memory address of the second endpoint device according to the saved mapping relationship between the second memory address and the first memory address; or obtain the second memory address of the second endpoint device according to the saved mapping relationship between the BDF identifier and the second memory address, and then convert the second memory address into the first memory address of the second endpoint device according to the saved mapping relationship between the second memory address and the first memory address.
  • the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for MSI access, a first memory address for MMIO access, and a first memory address for DMA access; the address translation module subunit is specifically configured to record the correspondence between each first memory address of the second endpoint device and the state of the second endpoint device.
  • a device state record is established according to the state of the second endpoint device. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identifier information of the second endpoint device in the access request to determine the status of the second endpoint device. If the second endpoint device is in a fault state, the access request is discarded, which blocks communication between the faulty second endpoint device and the primary domain, prevents the fault from spreading to the primary domain, and ensures system reliability.
  • FIG. 1 is a system block diagram of a computer system including a PCIe primary domain and an extended domain according to an embodiment of the present invention
  • Figure 2 (a) is a memory address allocation diagram of the endpoint device of the primary domain and the extended domain of the computer system shown in Figure 1;
  • Figure 2 (b) is a mapping relationship between the memory address of the primary domain of the computer system shown in Figure 1 and the memory address of the extended domain;
  • FIG. 3 is a flowchart of a fault isolation method according to Embodiment 1 of the present invention
  • FIG. 4 is a flowchart of a fault isolation method according to Embodiment 2 of the present invention
  • FIG. 5 is a flowchart of a fault isolation method according to Embodiment 3 of the present invention.
  • FIG. 6 is a flowchart of a fault isolation method according to Embodiment 4 of the present invention.
  • FIG. 7 is a structural diagram of a fault isolation device according to an embodiment of the present invention.
  • FIG. 8 is a structural diagram of a fault isolation device according to an embodiment of the present invention.
  • FIG. 9 is a structural diagram of a fault isolation system according to an embodiment of the present invention.
  • FIG. 10 is a structural diagram of a fault isolation device according to an embodiment of the present invention.
  • Detailed description
  • the embodiments of the present invention provide a fault isolation method, computer system, and apparatus for a computer system that includes a PCIe primary domain and an extended domain, where the root complex endpoint device of the extended domain is an endpoint device of the primary domain. When an endpoint device of the extended domain is faulty, the embodiments can block mutual access between the primary domain and that endpoint device, which prevents the fault from spreading and preserves the availability of the system.
  • the primary domain 100 includes a root complex (RC) 102, a switch (Switch) 104, and at least one PCIe endpoint device 107.
  • the root complex 102 is connected through the root port 103 to the upstream port 104A of the switch 104, and the downstream port 104B of the switch 104 is connected to the PCIe endpoint device 107, so that the root complex 102 is connected to the PCIe endpoint device 107 through the switch 104.
  • the root complex 102 can be integrated on the main CPU 101.
  • the primary domain 100 is described here with one switch as an example. In other embodiments, the primary domain 100 may include multiple switches, each of which can be connected to one or more PCIe endpoint devices.
  • the root complex 102 is configured to process and forward requests between the primary CPU 101 and the PCIe endpoint device 107. The switch 104 is configured to route requests downstream to the PCIe endpoint devices connected to the downstream port 104B and to route requests upstream from each independent downstream port to the single root complex; it can also route requests from one downstream port to another downstream port. The PCIe endpoint device 107 can initiate requests and complete PCIe transaction processing, and may be a storage device, a network card, a sound card, or the like.
  • the PCIe endpoint devices 107 in the primary domain 100 include the RCEP 106. The RCEP 106 can not only initiate requests and complete PCIe transaction processing; with added hardware modules and device drivers that provide the same functions as a root complex, it can also connect the extended domain 118 to the primary domain 100 and manage and forward requests between the extended domain 118 and the primary domain 100. As shown in FIG.
  • the extended domain 118 includes: the RCEP 106, which acts as the root complex of the extended domain, a switch 112, and second endpoint devices 114 and 116 (there may be one or more second endpoint devices). The second endpoint devices 114 and 116 are each connected to the RCEP 106 through the switch 112, and may be storage devices, network cards, sound cards, or the like. The extended domain 118 may also have multiple root ports and multiple switches, and multiple devices may be connected under each switch.
  • FIG. 2 is a memory address allocation diagram of the endpoint devices of the primary domain and the extended domain in the computer system shown in FIG. 1. The 64-bit physical address 202 of the primary CPU 101 (specifically, a memory-mapped input/output (Memory Mapped Input/Output, MMIO) address) can be divided into a memory address 203 of the primary domain and a memory address 204 of the extended domain.
  • a memory address is assigned to each endpoint device of the primary domain, for example, to the RCEP 106 and the first endpoint device 108. A portion of the MMIO address 202 is assigned to the RCEP 106 as the memory address 205 of the RCEP 106, and another portion of the MMIO address 202 is assigned to the first endpoint device 108 as the memory address 210 of the first endpoint device 108. Because the RCEP 106 and the first endpoint device 108 are both endpoint devices of the primary domain, the memory address 205 and the memory address 210 together make up the memory address 203 of the primary domain.
  • when the system loads the driver of the RCEP 106, it detects the second endpoint devices 114 and 116 of the extended domain, which triggers a scan of all second endpoint devices in the extended domain and the allocation of a memory address to each of them. Specifically, part of the MMIO address 202 is allocated to the second endpoint devices of the extended domain; that is, each second endpoint device is assigned a second memory address (the memory address of the second endpoint device in the extended domain), which represents the second endpoint device in the extended domain, such as the second memory addresses 206 and 207 of the second endpoint devices 114 and 116 shown in FIG.
  • two second endpoint devices are used here as an example, with the second memory addresses 206 and 207 allocated to them.
  • the second endpoint device also needs to be allocated a first memory address (the memory address of the second endpoint device in the primary domain), which is used to represent the second endpoint device in the primary domain.
  • the first memory address of the second endpoint device is all or part of the memory address of the RCEP: the memory address 205 of the RCEP is segmented according to the number of second endpoint devices in the extended domain. If there are n second endpoint devices, the memory address 205 of the RCEP is divided into n parts, and each part corresponds to one second endpoint device. As shown in FIG. 2(a), the memory address 205 of the RCEP is divided into two parts, 208 and 209: 208 corresponds to the memory address 206 of the second endpoint device 114 and is the first memory address of the second endpoint device 114, and 209 corresponds to the memory address 207 of the second endpoint device 116 and is the first memory address of the second endpoint device 116.
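  • The segmentation of the RCEP memory address into one part per second endpoint device can be sketched as follows. The text does not say how large each part is, so equal-sized parts are an assumption here, and the function name `split_rcep_range` and the example values are hypothetical.

```python
from typing import List, Tuple


def split_rcep_range(base: int, size: int, n: int) -> List[Tuple[int, int]]:
    """Split the range [base, base + size) into n equal (start, length) parts.

    Part i would serve as the first memory address of the i-th second
    endpoint device. Equal sizing is an assumption of this sketch.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if size % n != 0:
        raise ValueError("size must divide evenly in this sketch")
    part = size // n
    return [(base + i * part, part) for i in range(n)]
```

  • With two second endpoint devices, the two returned parts play the roles of 208 and 209 in FIG. 2(a).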
  • when the second endpoint devices 114 and 116 of the extended domain exchange messages with the primary domain, four access modes may be used: configuration space access, message signaled interrupt (Message Signaled Interrupts, MSI) access, memory-mapped input/output (MMIO) access, and direct memory access (DMA) access. The second memory address 206 and the first memory address 208 that the system allocates to the second endpoint device may therefore each consist of four types of memory address. For example, the second memory address 206 of the second endpoint device 114 can be divided into 206a, 206b, 206c, and 206d, which are used for configuration space access, MSI access, MMIO access, and DMA access to the second endpoint device 114, respectively.
  • after the system allocates the four types of memory address to the second endpoint device 114, it likewise divides the first memory address 208, the part of the RCEP memory address that corresponds to the second endpoint device 114, into four parts: the configuration space address 208a, the MSI address 208b, the MMIO address 208c, and the DMA address 208d. The first memory addresses 208a, 208b, 208c, and 208d of the second endpoint device 114 have a mapping relationship with the second memory addresses 206a, 206b, 206c, and 206d of the second endpoint device 114, respectively.
  • the mapping relationship may be represented by an address offset relationship. For example, a first address offset relationship exists between 208a and 206a, a second address offset relationship exists between 208b and 206b, a third address offset relationship exists between 208c and 206c, and a fourth address offset relationship exists between 208d and 206d.
  • the mapping relationship between the first memory address of the second endpoint device and the second memory address of the second endpoint device may be stored in the RCEP 106, for example, in an address translation module of the RCEP 106. The address translation module saves the address offset relationship and may perform address translation according to the saved mapping relationship between the second memory address and the first memory address.
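  • Translation by address offset can be sketched as below. This is an illustrative model only: the `Window` class, the functions `to_first` and `to_second`, and all address values are hypothetical, and each window stands for one pair such as 206a and 208a.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Window:
    """One mapped pair of ranges, e.g. 206a (second) and 208a (first)."""
    second_base: int  # start of the range in the extended domain
    first_base: int   # start of the range in the primary domain
    size: int

    @property
    def offset(self) -> int:
        # The saved address offset relationship for this pair.
        return self.first_base - self.second_base


def to_first(windows: Iterable[Window], second_addr: int) -> Optional[int]:
    """Translate a second memory address to a first memory address."""
    for w in windows:
        if w.second_base <= second_addr < w.second_base + w.size:
            return second_addr + w.offset
    return None  # address is not covered by any saved mapping


def to_second(windows: Iterable[Window], first_addr: int) -> Optional[int]:
    """Translate a first memory address back to a second memory address."""
    for w in windows:
        if w.first_base <= first_addr < w.first_base + w.size:
            return first_addr - w.offset
    return None
```

  • Storing only a base pair and a size per window keeps the table small: four windows per second endpoint device cover all four access types.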
  • in addition to allocating a second memory address to each second endpoint device of the extended domain, the system allocates a bus/device/function (BDF) identifier to each second endpoint device of the extended domain. A mapping relationship exists between the BDF identifier of a second endpoint device and its second memory address, and the RCEP can save this mapping relationship. For example, the address translation module of the RCEP 106 stores the mapping relationship between the BDF identifier of the second endpoint device 114 and its second memory address, so that the RCEP 106 can convert between the BDF identifier and the second memory address of the second endpoint device 114 according to the saved mapping relationship.
  • when the second endpoint device 114 fails, there is a time window between the second endpoint device 114 generating an interrupt message and the operating system processing that interrupt message. During this window, the faulty second endpoint device 114 of the extended domain and other endpoint devices may still access each other: for example, the second endpoint device 114 may communicate with endpoint devices in the primary domain or, through the CPU of the primary domain, with other devices, and the CPU or other endpoint devices of the primary domain may access the second endpoint device 114. However, access or communication involving the failed second endpoint device 114 may cause other devices to fail (for example, the first endpoint device 108) or cause the CPU to perform unnecessary repeated error message processing, which degrades system performance and seriously affects system reliability.
  • the embodiment of the present invention provides a fault isolation method, which is used to prevent mutual access between the primary domain and the extended domain endpoint device when the endpoint device of the extended domain fails, and prevent the fault from spreading to the primary domain.
  • a flowchart of a fault isolation method for a PCIe-interconnected computer system is provided, where the computer system includes a primary domain and an extended domain, the primary domain is formed by a root complex, a first endpoint device, and the RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The method includes:
  • the status of the second endpoint device may be a fault state, which indicates that the second endpoint device is faulty and cannot operate normally, or a non-fault state, which indicates that the second endpoint device of the extended domain is working normally.
  • to monitor the status of the second endpoint device in the extended domain, the RCEP may receive an error message sent by the second endpoint device, or receive a device probe response message indicating whether the second endpoint device exists, and determine the status of the second endpoint device according to the error message or the device probe response message.
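  • The monitoring decision can be sketched as follows. The text does not spell out the decision rule, so this sketch assumes that any received error message, or a probe response indicating that the device does not exist, marks the device as faulty. The names `State` and `determine_state` and the message shapes are hypothetical.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    NORMAL = "normal"
    FAULT = "fault"


def determine_state(error_message: Optional[dict] = None,
                    probe_present: Optional[bool] = None) -> State:
    """Derive a device state from an error message or a probe response.

    Assumed rule (not stated in the source): an error message, or a probe
    response saying the device is absent, means the device is faulty.
    """
    if error_message is not None:
        return State.FAULT
    if probe_present is False:
        return State.FAULT
    return State.NORMAL
```

  • A real monitor would likely distinguish error severities before declaring a fault; that refinement is left out of this sketch.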
  • the device status record includes a correspondence between the identifier information of the second endpoint device and a state of the second endpoint device.
  • the RCEP may establish a device status record according to the status of the second endpoint device.
  • the device status record includes a correspondence between the identifier information of the second endpoint device and the state of the second endpoint device, so that the RCEP may determine the second endpoint device according to the identifier information of the second endpoint device. status.
  • when the second endpoint device of the extended domain accesses the primary domain by using an access request, or when the primary domain accesses the second endpoint device by using an access request, the access request is routed to the RCEP, and the RCEP receives the access request.
  • 104: Query the device status record according to the identifier information of the second endpoint device in the access request, and determine the status of the second endpoint device.
  • the access request carries the identifier information of the second endpoint device. The RCEP may look up the correspondence between that identifier information and the state of the second endpoint device in the device status record, and thereby determine the status of the second endpoint device.
  • in this embodiment, the state of the second endpoint device of the extended domain is monitored, and a device status record is established according to that state; the device status record includes the correspondence between the identifier information of the second endpoint device and the state of the second endpoint device. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identifier information of the second endpoint device in the access request to determine the status of the second endpoint device, and the access request is discarded if the status is a fault state. This blocks communication between the faulty second endpoint device and the primary domain and prevents the fault from spreading to the primary domain, thereby ensuring system reliability.
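  • the monitor, record, receive, query, and discard steps described above can be illustrated with a minimal sketch. It is not part of the disclosed embodiment: the `Rcep` class, the `DeviceState` enum, the dictionary used as the device status record, and the request layout are hypothetical names chosen for illustration, and treating a device with no entry as non-faulty is an assumption.

```python
from enum import Enum


class DeviceState(Enum):
    NON_FAULT = "non-fault"
    FAULT = "fault"


class Rcep:
    """Hypothetical software model of the RCEP isolation logic."""

    def __init__(self):
        # Device status record: identifier information -> state.
        self.status_record = {}

    def record_state(self, device_id, state):
        """Establish or update the record from a monitored state."""
        self.status_record[device_id] = state

    def handle_access_request(self, request):
        """Return the request if it may be forwarded, or None if it is discarded."""
        # Assumption: a device with no entry is treated as non-faulty.
        state = self.status_record.get(request["device_id"], DeviceState.NON_FAULT)
        if state is DeviceState.FAULT:
            return None  # discard: the request is not forwarded by the RCEP
        return request


rcep = Rcep()
rcep.record_state("ep2", DeviceState.FAULT)
rcep.record_state("ep3", DeviceState.NON_FAULT)
blocked = rcep.handle_access_request({"device_id": "ep2", "op": "read"})
passed = rcep.handle_access_request({"device_id": "ep3", "op": "read"})
```

  • in this sketch the request addressed to the faulty device is dropped while the request addressed to the healthy device is returned unchanged for forwarding.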
  • a flowchart of a fault isolation method is provided. The method is used for a PCIe-interconnected computer system that includes a primary domain and an extended domain. The primary domain is formed by a root complex, a first endpoint device, and an RCEP; the extended domain is formed by the RCEP and a second endpoint device; and the second endpoint device communicates with the root complex or the first endpoint device in the primary domain through the RCEP. The method may include:
  • the state of the second endpoint device includes a fault state, indicating that the second endpoint device has failed and cannot operate normally, and a non-fault state, indicating that the second endpoint device of the extended domain can work normally.
  • monitoring, by the RCEP, the status of the second endpoint device of the extended domain includes: receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device exists; and determining the status of the second endpoint device according to the error message or the device probe response message.
  • the RCEP may send a device probe message to a configuration space register of the second endpoint device and obtain the device probe response message returned by the second endpoint device. If the device probe response message indicates that the second endpoint device does not exist, the second endpoint device has failed and cannot be detected, and the RCEP determines that the state of the second endpoint device is a fault state; otherwise, the RCEP determines that the state of the second endpoint device is a non-fault state; or
  • the RCEP may further determine whether the error message is a repeatedly sent error message. If it is, the second endpoint device has already sent an error message to the primary domain for the corresponding error processing, and the RCEP discards the error message, which avoids unnecessary repeated error message processing by the CPU and ensures the reliability of the system. If it is not, the error message is being sent by the second endpoint device for the first time, and the RCEP sends the error message to the CPU so that the CPU performs error processing on the second endpoint device.
  • determining whether the error message is a repeatedly sent error message specifically includes:
  • if it is determined that the state of the second endpoint device is a fault state, determining that the error message is a repeatedly sent error message; and if it is determined that the state of the second endpoint device is a non-fault state, determining that the error message is not a repeatedly sent error message.
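  • the repeated-message check can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: `filter_error_message`, the dictionary record, and the string states are hypothetical, and the sketch assumes the caller has already decided that the incoming error message indicates a fault.

```python
def filter_error_message(status_record, device_id):
    """Return True if the error message should go to the CPU, False if it is a repeat.

    Assumption: the caller has already classified the message as indicating a fault.
    """
    if status_record.get(device_id) == "fault":
        # State already recorded as faulty: this is a repeatedly sent message.
        return False
    # First report: record the fault, then let the message through to the CPU.
    status_record[device_id] = "fault"
    return True


record = {"ep2": "non-fault"}
first = filter_error_message(record, "ep2")
second = filter_error_message(record, "ep2")
```

  • the first message is passed on and flips the recorded state to faulty; the second message for the same device is then recognised as a repeat and dropped.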
  • the first memory address is the memory address of the second endpoint device in the primary domain and represents the second endpoint device in the primary domain; the second memory address is the memory address of the second endpoint device in the extended domain and represents the second endpoint device in the extended domain.
  • the device status record includes a correspondence between a first memory address of the second endpoint device and a status of the second endpoint device.
  • the RCEP establishes the device status record according to the status of the second endpoint device. The device status record includes the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device, so that the RCEP can determine the state of the second endpoint device according to the first memory address of the second endpoint device.
  • establishing the device status record according to the status of the second endpoint device may include: acquiring the BDF identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message;
  • acquiring the first memory address of the second endpoint device, which may be done as follows:
  • the RCEP converts the second memory address into the first memory address of the second endpoint device according to the saved mapping between the second memory address and the first memory address of the second endpoint device; or the RCEP first acquires the second memory address of the second endpoint device according to the saved mapping between the BDF identifier and the second memory address of the second endpoint device, and then converts the second memory address into the first memory address according to the saved mapping between the second memory address and the first memory address of the second endpoint device;
  • recording, in the device status record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device. The correspondence between the BDF identifier of the second endpoint device and the state of the second endpoint device may also be recorded, so that the RCEP may further determine the state of the second endpoint device according to either the first memory address or the BDF identifier of the second endpoint device.
  • the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for MSI access, a first memory address for MMIO access, and a first memory address for DMA access.
  • the second memory address of the second endpoint device likewise includes a second memory address for configuration space access, a second memory address for MSI access, a second memory address for MMIO access, and a second memory address for DMA access. According to the saved mapping between each second memory address and each first memory address of the second endpoint device, the RCEP can obtain the first memory address for configuration space access, the first memory address for MSI access, the first memory address for MMIO access, and the first memory address for DMA access of the second endpoint device;
  • alternatively, the RCEP may first acquire, according to the saved mapping between the BDF identifier of the second endpoint device and each second memory address, the second memory address for configuration space access, the second memory address for MSI access, the second memory address for MMIO access, and the second memory address for DMA access of the second endpoint device, and then obtain, according to the saved mapping between each second memory address and each first memory address of the second endpoint device, the first memory address for configuration space access, the first memory address for MSI access, the first memory address for MMIO access, and the first memory address for DMA access of the second endpoint device;
  • recording the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device in the device status record specifically means recording the correspondence between each first memory address of the second endpoint device and the state of the second endpoint device; and recording the correspondence between the second memory address of the second endpoint device and the state of the second endpoint device in the device status record specifically means recording the correspondence between each second memory address of the second endpoint device and the state of the second endpoint device.
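  • the two-stage conversion (BDF identifier to second memory addresses, then second memory addresses to first memory addresses) and the per-address recording can be sketched as below. The table names `BDF_TO_SECOND` and `SECOND_TO_FIRST`, the BDF string, and every address value are hypothetical and chosen only for illustration; the text does not specify how the saved mappings are stored.

```python
ACCESS_TYPES = ("config", "msi", "mmio", "dma")

# Hypothetical saved mappings for one second endpoint device; all values are made up.
BDF_TO_SECOND = {
    "02:00.0": {"config": 0x1000, "msi": 0x2000, "mmio": 0x3000, "dma": 0x4000},
}
SECOND_TO_FIRST = {0x1000: 0xA1000, 0x2000: 0xA2000, 0x3000: 0xA3000, 0x4000: 0xA4000}


def first_addresses(bdf):
    """BDF identifier -> second memory addresses -> first memory addresses."""
    second = BDF_TO_SECOND[bdf]
    return {kind: SECOND_TO_FIRST[second[kind]] for kind in ACCESS_TYPES}


def record_device_state(status_record, bdf, state):
    """Record the state against every first memory address of the device."""
    for address in first_addresses(bdf).values():
        status_record[address] = state
    return status_record


status_record = record_device_state({}, "02:00.0", "fault")
```

  • recording the state under each of the four first memory addresses means that a later request can be checked regardless of which access type it uses.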
  • the access request may be an access request by which the second endpoint device of the extended domain accesses the primary domain, or an access request by which the root complex or the first endpoint device of the primary domain accesses the second endpoint device. When the access request is from the primary domain, it carries the first memory address of the second endpoint device; when the access request is from the extended domain, it carries the second memory address or the BDF identifier of the second endpoint device.
  • 204: Query the device status record according to the identifier information of the second endpoint device in the access request, and determine the status of the second endpoint device.
  • the identifier information of the second endpoint device includes one or a combination of the following: a first memory address of the second endpoint device, and a second memory address of the second endpoint device.
  • when the access request is from the primary domain, the RCEP queries, according to the first memory address of the second endpoint device in the access request, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device recorded in the device status record, and determines the state of the second endpoint device. For example, when the MMIO access mode is used, the access request carries the first memory address for MMIO access of the second endpoint device; the device status record records the correspondence between each first memory address of the second endpoint device and the state of the second endpoint device; and the RCEP may use the first memory address for MMIO access in the access request to query the device status record and determine the state of the second endpoint device.
  • when the access request is from the extended domain, the RCEP queries the device status record according to the second memory address or the BDF identifier of the second endpoint device in the access request. If the device status record does not record the correspondence between the second memory address or the BDF identifier of the second endpoint device and the state of the second endpoint device, the RCEP obtains the first memory address of the second endpoint device according to the second memory address or the BDF identifier, and determines the state of the second endpoint device by querying the correspondence between the first memory address and the state recorded in the device status record. If the device status record does record the correspondence between the second memory address or the BDF identifier of the second endpoint device and the state of the second endpoint device, the RCEP queries that correspondence directly to determine the state of the second endpoint device. This avoids converting the second memory address or the BDF identifier of the second endpoint device into the first memory address and speeds up the process of determining the state of the device.
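  • the direct query and the conversion fallback can be sketched as one lookup function. The function name `lookup_state`, the dictionaries, and the address values are hypothetical; returning a non-fault state for an identifier that is not recorded at all is an assumption that the text does not address.

```python
def lookup_state(status_record, key, to_first_address):
    """Determine the device state from an identifier carried in an access request.

    key may be a first memory address, a second memory address, or a BDF identifier.
    to_first_address maps second memory addresses and BDF identifiers to a
    first memory address.
    """
    if key in status_record:
        # Fast path: the identifier itself is recorded, so no conversion is needed.
        return status_record[key]
    # Slow path: convert to the first memory address, then query the record.
    first = to_first_address.get(key)
    # Assumption: anything that is not recorded is treated as non-faulty.
    return status_record.get(first, "non-fault")


record = {0xA3000: "fault", "02:00.0": "fault"}  # keyed by first address and by BDF
convert = {0x3000: 0xA3000, "02:00.0": 0xA3000}  # hypothetical saved mappings

from_primary = lookup_state(record, 0xA3000, convert)        # first memory address
from_extended_bdf = lookup_state(record, "02:00.0", convert)  # BDF recorded directly
from_extended_addr = lookup_state(record, 0x3000, convert)    # needs conversion
unknown = lookup_state(record, 0x9999, convert)
```

  • the BDF lookup takes the fast path because the BDF identifier is itself a key in the record, while the second memory address has to be converted first; both reach the same state.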
  • 205: If the status of the second endpoint device is a fault state, discard the access request to block communication between the second endpoint device and the primary domain.
  • if it is determined that the state of the second endpoint device is a fault state, the RCEP discards the access request instead of forwarding it, thereby blocking communication between the second endpoint device and the primary domain.
  • the method may further include:
  • the fault isolation message is used to indicate that the CPU in the primary domain stops accessing the second endpoint device of the extended domain, and the fault isolation message carries the first memory address of the second endpoint device.
  • the RCEP may send a fault isolation message to the CPU of the primary domain so that the CPU stops accessing the second endpoint device of the extended domain. For example, the CPU may unload the driver of the failed second endpoint device, or isolate the I/O path used to access the failed second endpoint device.
  • the method may further include:
  • if the access request is an access request sent by the primary domain, returning a simulated response message for the access request to the primary domain.
  • if the access request by which the primary domain accesses the second endpoint device is a Non-Posted access request, a response message must be returned for the access request; otherwise the primary domain may generate a response timeout error that causes the computer system to restart. After the second endpoint device fails, however, the access request may not reach the second endpoint device, or it may reach the second endpoint device without the second endpoint device being able to return a response message. The RCEP may therefore generate a simulated response message for the access request and return it to the primary domain, which avoids the response timeout error and the resulting restart of the computer system. The simulated response message can be an Unsupported Request (UR) message or a Completer Abort (CA) message.
  • steps 206 and 207 are both optional, and the two steps do not have to be performed together.
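  • the simulated response for a discarded request can be sketched as follows. The function name, the request fields (`source`, `posted`, `requester`), and the response layout are hypothetical; defaulting to a UR status rather than CA is an arbitrary choice made for the example.

```python
def simulated_response(request, status="UR"):
    """Build a simulated response for a discarded request, or return None if none is needed."""
    if status not in ("UR", "CA"):
        raise ValueError("status must be 'UR' or 'CA'")
    if request["source"] != "primary":
        return None  # only requests sent by the primary domain get a simulated response
    if request["posted"]:
        return None  # posted requests expect no response
    return {"kind": "response", "status": status, "to": request["requester"]}


read_req = {"source": "primary", "posted": False, "requester": "ep1"}
write_req = {"source": "primary", "posted": True, "requester": "ep1"}
reply = simulated_response(read_req)
no_reply = simulated_response(write_req)
```

  • only the Non-Posted request from the primary domain receives a reply, which is what keeps the requester from waiting until a timeout.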
  • in this embodiment, the state of the second endpoint device of the extended domain is monitored, and a device status record is established according to that state; the device status record includes the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device. After an access request between the second endpoint device and the primary domain is received, the first memory address of the second endpoint device is acquired from the access request, or is obtained according to the BDF identifier or the second memory address of the second endpoint device in the access request. The correspondence between the first memory address and the state recorded in the device status record is then queried to determine the state of the second endpoint device, and the access request is discarded if the state is a fault state, thereby blocking communication between the failed second endpoint device and the primary domain. A fault isolation message may also be sent to the CPU to instruct the CPU to stop accessing the second endpoint device of the extended domain.
  • the device status record may further record the correspondence between the BDF identifier or the second memory address of the second endpoint device and the state of the second endpoint device, so that the device status record can be queried directly with the BDF identifier, the second memory address, or the first memory address of the second endpoint device carried in the access request to determine the state of the second endpoint device. This avoids converting the second memory address or the BDF identifier of the second endpoint device into the first memory address and speeds up the process of determining the state of the second endpoint device.
  • when the state of the second endpoint device is monitored and an error message sent by the second endpoint device is received, after the state of the second endpoint device is determined to be a fault state according to the type of the error message, it may further be determined whether the error message is a repeatedly sent error message. If it is, the error message is discarded so that it does not reach the primary domain, which prevents the error from spreading, avoids unnecessary repeated error message processing by the CPU, and ensures the reliability of the system.
  • the fault isolation method provided by the embodiment of the present invention is as shown in FIG. 5.
  • in this embodiment, the second endpoint device 116 of the extended domain is the faulty device, and the DMA access mode is used. The first endpoint device 108 sends a Non-Posted access request to access the failed second endpoint device 116, and the access request is first routed to the RCEP 106. When the second endpoint device fails, the access request may already have crossed the boundary of the RCEP 106, that is, it may already have been forwarded by the RCEP 106; or it may not yet have crossed the boundary of the RCEP 106, that is, it has not yet been forwarded by the RCEP 106. The method may specifically include:
  • 301: The RCEP 106 monitors the states of all second endpoint devices of the extended domain.
  • the state of a second endpoint device includes a fault state and a non-fault state. Monitoring, by the RCEP 106, the states of the second endpoint device 114 and the second endpoint device 116 of the extended domain specifically includes: receiving an error message sent by the second endpoint device 114 or 116, or receiving a device probe response message indicating whether the second endpoint device 114 or 116 exists; and determining the status of the second endpoint device 114 or 116 according to the error message or the device probe response message.
  • the error message sent by the second endpoint device 116 includes the BDF identifier of the second endpoint device 116. The RCEP 106 acquires the BDF identifier of the second endpoint device 116; obtains, according to the saved mapping between the BDF identifier of the second endpoint device 116 and each second memory address, the second memory address for configuration space access, the second memory address for MSI access, the second memory address for MMIO access, and the second memory address for DMA access of the second endpoint device 116; and then obtains, according to the saved mapping between each second memory address and each first memory address of the second endpoint device 116, the first memory address for configuration space access, the first memory address for MSI access, the first memory address for MMIO access, and the first memory address for DMA access of the second endpoint device 116. The correspondence between each first memory address of the second endpoint device 116 and the state of the second endpoint device 116 is recorded in the device status record; for example, each first memory address of the second endpoint device 116 is marked as faulty.
  • each first memory address of the second endpoint device 114 is likewise recorded in the device status record together with the state of the second endpoint device 114.
  • the access request is sent to the RCEP 106 by address routing, and the RCEP acquires the first memory address for DMA access of the second endpoint device 116 carried in the access request.
  • when the second endpoint device 116 fails, if the access request has not yet crossed the boundary of the RCEP, the state of the second endpoint device recorded in the device status record is already a fault state; querying the correspondence, recorded in the device status record, between the first memory address for DMA access of the second endpoint device 116 and the state of the second endpoint device 116 therefore determines that the state of the second endpoint device 116 is a fault state. When the second endpoint device 116 fails, if the access request has already crossed the boundary of the RCEP, the state of the second endpoint device recorded in the device status record was still a non-fault state when the query was made; querying the same correspondence therefore determined that the state of the second endpoint device 116 was a non-fault state.
  • 305: If the state of the second endpoint device 116 is a fault state, discard the access request to block the first endpoint device 108 from accessing the second endpoint device 116, and then perform step 306.
  • after receiving the access request, the RCEP determines that the accessed second endpoint device 116 is in a fault state and discards the access request, which blocks the first endpoint device 108 from accessing the second endpoint device 116 and prevents the fault from spreading to the primary domain.
  • because the access request is a Non-Posted access request, a simulated response message is generated for the access request and returned to the first endpoint device 108, which prevents the CPU of the primary domain from generating a response timeout error that causes the computer system to restart.
  • in this embodiment, the RCEP 106 monitors the status of all second endpoint devices in the extended domain and establishes a device status record according to those states. When the first endpoint device 108 accesses the second endpoint device 116, the RCEP receives the access request, queries the device status record according to the first memory address for DMA access in the access request, and determines the status of the second endpoint device. If the access request has not yet crossed the boundary of the RCEP 106 when the second endpoint device 116 fails, step 304 determines that the state of the second endpoint device 116 is a fault state, and the RCEP discards the access request to block the first endpoint device 108 from accessing the second endpoint device 116, thereby preventing the fault from spreading to the primary domain. The RCEP may also return a simulated response message for the access request to the first endpoint device 108.
  • if the access request has already crossed the boundary of the RCEP 106 when the second endpoint device 116 fails, step 304 determines that the state of the second endpoint device 116 is a non-fault state, and the RCEP 106 sends the access request to the second endpoint device 116 according to the normal workflow.
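  • the two orderings discussed above (the fault is recorded before or after the access request is handled at the RCEP) can be replayed with a small sketch. The event names, the `run_scenario` function, and the outcome strings are hypothetical; the sketch models only the order of events, not PCIe timing.

```python
def run_scenario(events):
    """Replay events at the RCEP in order and return the fate of the access request."""
    status_record = {"ep116": "non-fault"}  # state recorded before any event
    outcome = None
    for event in events:
        if event == "fault_recorded":
            status_record["ep116"] = "fault"
        elif event == "request_at_rcep":
            if status_record["ep116"] == "fault":
                outcome = "discarded, simulated response returned"
            else:
                outcome = "forwarded to device"
    return outcome


# Request has not yet crossed the RCEP when the fault is recorded.
not_yet_crossed = run_scenario(["fault_recorded", "request_at_rcep"])
# Request has already crossed the RCEP when the fault is recorded.
already_crossed = run_scenario(["request_at_rcep", "fault_recorded"])
```

  • only the first ordering is caught by the status query; the second ordering lets the request through to the failed device.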
  • the failed second endpoint device 116 may be triggered by the access request to send an error message to the RCEP 106, or may actively send an error message to the RCEP 106 to report the failure. The RCEP receives the error message from the second endpoint device while monitoring the status of the second endpoint device. If the type of the error message is the uncorrectable error (Uncorrectable error) type, the RCEP determines that the state of the second endpoint device is a fault state. The RCEP may further query the device status record to determine whether the error message is a repeatedly sent error message; if it is, the RCEP discards it, which avoids unnecessary repeated error message processing by the CPU and ensures the reliability of the system.
  • the fault isolation method provided by the embodiment of the present invention is as shown in FIG. 6.
  • in this embodiment, the second endpoint device 116 of the extended domain is the faulty device, and the MMIO access mode is used. The second endpoint device 116 sends a Non-Posted access request to access the main CPU 101 of the primary domain, and the access request is first routed to the RCEP 106. When the second endpoint device fails, the access request may already have crossed the boundary of the RCEP 106, that is, it may already have been forwarded by the RCEP 106; or it may not yet have crossed the boundary of the RCEP 106, that is, it has not yet been forwarded by the RCEP 106. The method may specifically include:
  • 401: The RCEP 106 monitors the states of all second endpoint devices of the extended domain.
  • the state of a second endpoint device includes a fault state and a non-fault state. Monitoring, by the RCEP 106, the states of the second endpoint device 114 and the second endpoint device 116 of the extended domain specifically includes: receiving an error message sent by the second endpoint device 114 or 116, or receiving a device probe response message indicating whether the second endpoint device 114 or 116 exists; and determining the status of the second endpoint device 114 or 116 according to the error message or the device probe response message.
  • 402: Establish a device status record according to the states of all second endpoint devices of the extended domain, where the device status record includes the correspondence between the first memory address of each second endpoint device of the extended domain and the state of that device.
  • the error message sent by the second endpoint device 116 includes the BDF identifier of the second endpoint device 116. The RCEP 106 acquires the BDF identifier of the second endpoint device 116; obtains, according to the saved mapping between the BDF identifier of the second endpoint device 116 and the second memory addresses, the second memory address for configuration space access, the second memory address for MSI access, the second memory address for MMIO access, and the second memory address for DMA access of the second endpoint device 116; and then obtains, according to the saved mapping between the second memory addresses and the first memory addresses of the second endpoint device 116, the first memory address for configuration space access, the first memory address for MSI access, the first memory address for MMIO access, and the first memory address for DMA access of the second endpoint device 116.
  • the correspondence between each first memory address of the second endpoint device 114 and the state of the second endpoint device 114 is recorded in the device status record, as is the correspondence between the BDF identifier of the second endpoint device 114 and the state of the second endpoint device 114.
  • the access request is sent to the RCEP 106 by address routing, and the RCEP acquires the second memory address for MMIO access of the second endpoint device 116 carried in the access request.
  • when the second endpoint device 116 fails, if the access request has not yet crossed the boundary of the RCEP, querying the correspondence, recorded in the device status record, between the first memory address for DMA access of the second endpoint device 116 and the state of the second endpoint device 116 determines that the state of the second endpoint device 116 is a fault state. When the second endpoint device 116 fails, if the access request has already crossed the boundary of the RCEP, querying the same correspondence at that time determines that the state of the second endpoint device 116 is a non-fault state.
  • after receiving the access request, the RCEP determines that the state of the second endpoint device 116 is a fault state and discards the access request to block the second endpoint device 116 from accessing the main CPU 101, thereby preventing the fault from spreading to the primary domain.
  • in this embodiment, the RCEP 106 monitors the status of all second endpoint devices in the extended domain and establishes a device status record according to those states. When the second endpoint device 116 accesses the main CPU 101 of the primary domain, the RCEP receives the access request, obtains the first memory address according to the second memory address for MMIO access in the access request, queries the device status record, and determines the status of the second endpoint device. If the access request has not yet crossed the boundary of the RCEP 106 when the second endpoint device 116 fails, step 404 determines that the state of the second endpoint device 116 is a fault state, and the RCEP discards the access request to block the second endpoint device 116 from accessing the main CPU 101, thereby preventing the fault from spreading to the primary domain.
  • if the access request has already crossed the boundary of the RCEP 106 when the second endpoint device 116 fails, step 404 determines that the state of the second endpoint device 116 is a non-fault state, and the RCEP 106 sends the access request to the main CPU 101 according to the normal workflow. After receiving the access request, the main CPU 101 returns a response message for it, and the returned response message first reaches the RCEP. Because the second endpoint device has failed by then, forwarding the returned response message to the failed second endpoint device 116 is meaningless and may trigger the failed second endpoint device 116 to send error messages repeatedly, so the RCEP can discard the returned response message.
  • the faulty second endpoint device 116 may actively send an error message to the RCEP to report the fault. The RCEP receives the error message from the second endpoint device while monitoring the state of the second endpoint device. If the type of the error message is the uncorrectable error (Uncorrectable error) type, the RCEP determines that the state of the second endpoint device is a fault state. The RCEP may further query the device status record to determine whether the error message is a repeatedly sent error message; if it is, the RCEP discards it, thereby preventing the fault from spreading.
  • the embodiment of the invention provides a fault isolation device, which is used to prevent mutual access between the primary domain and the extended domain endpoint device when the endpoint device of the extended domain fails, and prevent the fault from spreading to the primary domain.
  • a composition diagram of a fault isolation apparatus is provided. The apparatus is used for a PCIe-interconnected computer system that includes a primary domain and an extended domain. The primary domain is formed by a root complex, a first endpoint device, and an RCEP; the extended domain is formed by the RCEP and a second endpoint device. The apparatus includes:
  • A monitoring unit 701, configured to monitor the state of the second endpoint device of the extended domain.
  • A recording unit 702, configured to establish a device status record according to the state of the second endpoint device, where the device status record includes a correspondence between the identifier information of the second endpoint device and the state of the second endpoint device.
  • A receiving unit 703, configured to receive an access request, where the access request includes an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device.
  • A determining unit 704, configured to query the device status record according to the identifier information of the second endpoint device in the access request and determine the state of the second endpoint device.
  • A processing unit 705, configured to discard the access request, when the state of the second endpoint device is the fault state, to block communication between the second endpoint device and the primary domain.
  • The state of the second endpoint device includes a fault state and a non-fault state.
  • The monitoring unit 701 may receive an error message sent by the second endpoint device, or a device probe response message indicating whether the second endpoint device exists, and determine the state of the second endpoint device according to the error message or the device probe response message.
  • In this embodiment, the recording unit 702 establishes the device status record according to the state determined by the monitoring unit 701, where the device status record includes a correspondence between the identifier information of the second endpoint device and the state of the second endpoint device. After the receiving unit 703 receives an access request between the second endpoint device and the primary domain, the determining unit 704 queries the device status record according to the identifier information of the second endpoint device in the access request and determines the state of the second endpoint device. When the determining unit 704 determines that the state is the fault state, the processing unit 705 discards the access request, which prevents communication between the faulty second endpoint device and the primary domain.
  • As shown in FIG. 8, a fault isolation apparatus is used in a PCIe-interconnected computer system. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and an RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The second endpoint device communicates with the root complex or the first endpoint device in the primary domain through the RCEP. The apparatus may include a monitoring unit 801, a recording unit 802, a receiving unit 803, a determining unit 804, and a processing unit 805. The fault isolation apparatus may be the RCEP.
  • The monitoring unit 801 is configured to monitor the state of the second endpoint device of the extended domain. The state includes a fault state and a non-fault state: the fault state indicates that the second endpoint device has failed and cannot work normally, and the non-fault state indicates that the second endpoint device of the extended domain can work normally.
  • Monitoring the state of the second endpoint device includes: receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device exists, and determining the state of the second endpoint device according to the error message or the device probe response message. Specifically, the monitoring unit 801 may send a device probe message to the configuration space register of the second endpoint device and obtain the device probe response message returned for it. If the device probe response message indicates that the second endpoint device does not exist, the second endpoint device is so faulty that it cannot be detected, and its state is determined to be the fault state; otherwise its state is determined to be the non-fault state.
  • Alternatively, the monitoring unit 801 receives an error message from the second endpoint device and determines the type of the error message. If the type is uncorrectable error, the state of the second endpoint device is determined to be the fault state; otherwise the state of the second endpoint device is determined to be the non-fault state.
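  • The two monitoring paths described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `DeviceState`, `state_from_probe`, and `state_from_error`, and the string used for the error type, are assumptions introduced here.

```python
from enum import Enum


class DeviceState(Enum):
    FAULT = "fault"
    NON_FAULT = "non-fault"


def state_from_probe(device_present: bool) -> DeviceState:
    """Probe path: a device that cannot be detected is treated as faulty."""
    return DeviceState.NON_FAULT if device_present else DeviceState.FAULT


def state_from_error(error_type: str) -> DeviceState:
    """Error-message path: only an uncorrectable error marks the device faulty."""
    # The "uncorrectable" label is a hypothetical encoding of the message type.
    if error_type == "uncorrectable":
        return DeviceState.FAULT
    return DeviceState.NON_FAULT
```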
  • The recording unit 802 is configured to establish a device status record according to the state of the second endpoint device, where the device status record includes a correspondence between a first memory address of the second endpoint device and the state of the second endpoint device. The first memory address is the memory address of the second endpoint device in the primary domain and represents the second endpoint device in the primary domain.
  • The recording unit 802 specifically includes an address translation module subunit 802a and a status recording subunit 802b. The address translation module subunit 802a is configured to obtain the BDF identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message, and to obtain the first memory address of the second endpoint device according to the BDF identifier or the second memory address.
  • The address translation module subunit 802a stores the mapping between the second memory address and the first memory address of the second endpoint device, and the mapping between the BDF identifier and the second memory address of the second endpoint device. It either converts the second memory address into the first memory address according to the stored mapping between the second memory address and the first memory address; or it first obtains the second memory address according to the stored mapping between the BDF identifier and the second memory address, and then converts that second memory address into the first memory address according to the stored mapping between the second memory address and the first memory address.
  • The status recording subunit 802b is configured to record, in the device status record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device, so that the RCEP can determine the state of the second endpoint device according to its first memory address. The second memory address of the second endpoint device is its memory address in the extended domain and represents the second endpoint device in the extended domain.
  • The status recording subunit 802b may be further configured to record, in the device status record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device, or the correspondence between the BDF identifier of the second endpoint device and the state of the second endpoint device, so that the determining unit 804 can determine the state of the second endpoint device according to either the first memory address or the BDF identifier.
  • The first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for MSI access, a first memory address for MMIO access, and a first memory address for DMA access.
  • The second memory address of the second endpoint device includes a second memory address for configuration space access, a second memory address for MSI access, a second memory address for MMIO access, and a second memory address for DMA access. The address translation module subunit 802a is specifically configured to store the mapping between each second memory address and the corresponding first memory address of the second endpoint device, and the mapping between the BDF identifier of the second endpoint device and each second memory address.
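  • The two stored mappings can be pictured as a pair of lookup tables. The sketch below is illustrative only: the class name `AddressTranslator`, keying the BDF table by access type, and the example BDF string and addresses in the usage note are assumptions, not details given in the text.

```python
from typing import Dict, Tuple

ACCESS_TYPES = ("config", "msi", "mmio", "dma")


class AddressTranslator:
    """Hypothetical stand-in for the address translation module subunit."""

    def __init__(self) -> None:
        # second memory address -> first memory address
        self._first_by_second: Dict[int, int] = {}
        # (BDF identifier, access type) -> second memory address
        self._second_by_bdf: Dict[Tuple[str, str], int] = {}

    def add_mapping(self, bdf: str, access: str, second: int, first: int) -> None:
        if access not in ACCESS_TYPES:
            raise ValueError(f"unknown access type: {access}")
        self._second_by_bdf[(bdf, access)] = second
        self._first_by_second[second] = first

    def first_from_second(self, second: int) -> int:
        return self._first_by_second[second]

    def first_from_bdf(self, bdf: str, access: str) -> int:
        # Two-step lookup: BDF -> second memory address -> first memory address.
        return self.first_from_second(self._second_by_bdf[(bdf, access)])
```

  • For instance, after `add_mapping("01:00.0", "mmio", 0x9000, 0x1000)`, both `first_from_second(0x9000)` and `first_from_bdf("01:00.0", "mmio")` resolve to the same first memory address.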
  • The receiving unit 803 is configured to receive an access request, where the access request includes an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device. When the access request is from the primary domain, it carries the first memory address of the second endpoint device; when the access request is from the extended domain, it carries the second memory address or the BDF identifier of the second endpoint device.
  • The determining unit 804 is configured to query the device status record according to the identifier information of the second endpoint device in the access request and determine the state of the second endpoint device. The identifier information includes one or a combination of the following: the first memory address of the second endpoint device, the second memory address of the second endpoint device, and the BDF identifier of the second endpoint device.
  • Specifically, when the access request is from the primary domain, the determining unit 804 queries the correspondence recorded in the device status record according to the first memory address of the second endpoint device in the access request and determines the state of the second endpoint device. For example, with MMIO access, the access request carries the first memory address for MMIO access of the second endpoint device, and the device status record holds the correspondence between each first memory address of the second endpoint device and its state, so the determining unit 804 can query the device status record with that first memory address.
  • When the access request is from the extended domain, the determining unit 804 queries the device status record according to the second memory address or the BDF identifier of the second endpoint device in the access request. If the device status record holds no correspondence between the second memory address or the BDF identifier and the state of the second endpoint device, the determining unit 804 obtains the first memory address of the second endpoint device according to the second memory address or the BDF identifier, and then queries the device status record with that first memory address.
  • The processing unit 805 is configured to discard the access request, when the determining unit 804 determines that the state of the second endpoint device is the fault state, to block communication between the second endpoint device and the primary domain.
  • The processing unit 805 is further configured to send a fault isolation message to the CPU when the determining unit 804 determines that the state of the second endpoint device is the fault state, so that the CPU in the primary domain stops accessing the second endpoint device of the extended domain, for example by unloading the driver of the second endpoint device or by blocking the I/O path to the faulty second endpoint device. The fault isolation message carries the first memory address of the second endpoint device.
  • The processing unit 805 is further configured to return a simulated response message for the access request to the primary domain when the discarded access request was sent by the primary domain. Specifically, when the access request from the primary domain to the second endpoint device is a non-posted request, a response message must be returned for it; otherwise the primary domain may raise a completion timeout error and cause the computer system to restart. After the second endpoint device fails, however, the access request may not reach it, or may reach it without the faulty device being able to generate a normal response. In this case the processing unit 805 may generate a simulated response message for the access request and return it to the primary domain, which avoids the timeout error and the resulting restart. The simulated response message may be an Unsupported Request (UR) message or a Completer Abort (CA) message.
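  • The decision of when to synthesize a response can be sketched as below. This is a hedged illustration: `AccessRequest`, its fields, and `simulated_response` are hypothetical names, and a real RCEP would build a PCIe completion packet rather than return a string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRequest:
    from_primary_domain: bool
    posted: bool  # posted requests expect no response


def simulated_response(request: AccessRequest, status: str = "UR") -> Optional[str]:
    """Pick the status to send back for a discarded request, or None if nothing is returned."""
    if status not in ("UR", "CA"):
        raise ValueError("status must be 'UR' or 'CA'")
    # Only non-posted requests from the primary domain wait for a response,
    # so only they need one to avoid the timeout described above.
    if request.from_primary_domain and not request.posted:
        return status
    return None
```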
  • After the monitoring unit 801 determines from a received error message that the state of the second endpoint device is the fault state, it is further configured to determine whether the error message is a repeated error message. If it is, the second endpoint device has already sent an error message to the primary domain for error handling, so the error message is discarded; this keeps the CPU from processing the same error again and protects system reliability. If it is not, the error message is the first one sent by the second endpoint device, and the RCEP sends it to the CPU so that the CPU performs error handling for the second endpoint device.
  • Determining whether the error message is a repeated error message includes: obtaining the BDF identifier or the second memory address of the second endpoint device carried in the error message, and querying the device status record for the state already recorded for that second endpoint device. If the recorded state is the fault state, the error message is a repeated error message; if it is the non-fault state, it is not.
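  • The repeated-message check amounts to looking at the state already recorded before updating it. A minimal sketch, assuming the device status record is a plain dictionary keyed by a device identifier (the function name and the return strings are hypothetical):

```python
from typing import Dict


def on_uncorrectable_error(record: Dict[str, str], device_id: str) -> str:
    """Decide what to do with an uncorrectable error message from device_id."""
    if record.get(device_id) == "fault":
        # The fault was already recorded, so this message is a repeat.
        return "discard"
    # First report: record the fault, then let the CPU handle the error.
    record[device_id] = "fault"
    return "forward_to_cpu"
```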
  • In this embodiment, the monitoring unit 801 monitors the state of the second endpoint device in the extended domain, and the recording unit 802 establishes the device status record according to the state determined by the monitoring unit 801, where the device status record includes a correspondence between the first memory address of the second endpoint device and the state of the second endpoint device. After an access request between the second endpoint device and the primary domain is received, the determining unit 804 takes the first memory address of the second endpoint device from the access request, or derives it from the BDF identifier or the second memory address of the second endpoint device in the access request, and queries the correspondence between the first memory address and the state in the device status record to determine the state of the second endpoint device. When the determining unit 804 determines that the state is the fault state, the processing unit 805 discards the access request, which prevents communication between the faulty second endpoint device and the primary domain.
  • The recording unit 802 may further record, in the device status record, the correspondence between the BDF identifier or the second memory address of the second endpoint device and the state of the second endpoint device, so that the determining unit 804 can query the device status record directly with the BDF identifier, the second memory address, or the first memory address of the second endpoint device in the access request. This avoids converting the second memory address or the BDF identifier into the first memory address and speeds up determining the state of the second endpoint device.
  • After the monitoring unit 801 determines that the state of the second endpoint device is the fault state, it may further determine whether the error message sent by the second endpoint device is a repeated error message. If it is, the error message is discarded so that it does not reach the primary domain, which keeps the CPU from processing the same error again and protects system reliability.
  • FIG. 9 shows a fault isolation system 900 according to an embodiment of the present invention. The system 900 includes a PCIe primary domain 910 and an extended domain 920. The primary domain 910 includes a root complex 911, a first endpoint device 912, and a root complex endpoint device 913; the extended domain 920 includes the root complex endpoint device 913 and a second endpoint device 921.
  • The root complex endpoint device 913 is configured to: monitor the state of the second endpoint device 921 of the extended domain; establish a device status record according to the state of the second endpoint device 921, where the device status record includes a correspondence between the identifier information of the second endpoint device 921 and the state of the second endpoint device 921; receive an access request, where the access request includes an access request from the second endpoint device 921 to the primary domain 910 or an access request from the primary domain 910 to the second endpoint device 921; query the device status record according to the identifier information of the second endpoint device 921 in the access request and determine the state of the second endpoint device 921; and, if the state of the second endpoint device 921 is the fault state, discard the access request to block communication between the primary domain 910 and the second endpoint device 921.
  • In this embodiment, the state of the second endpoint device of the extended domain is monitored, and the device status record is established according to that state. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identifier information of the second endpoint device in the access request to determine the state of the second endpoint device. If the state is the fault state, the access request is discarded, which prevents communication between the faulty second endpoint device and the primary domain, keeps the fault from spreading to the primary domain, and protects system reliability.
  • FIG. 10 is a schematic structural diagram of a fault isolation device according to an embodiment of the present invention.
  • The fault isolation device provided by this embodiment is used in a PCIe-interconnected computer system. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain is formed by the root complex endpoint device and a second endpoint device.
  • The fault isolation device may include:
  • A processor 1001, a memory 1002, a system bus 1004, and a communication interface 1005. The processor 1001, the memory 1002, and the communication interface 1005 are connected through the system bus 1004 and communicate with one another.
  • Processor 1001 may be a single core or multi-core central processing unit, or a particular integrated circuit, or one or more integrated circuits configured to implement embodiments of the present invention.
  • The memory 1002 may be a high-speed RAM or a non-volatile memory, for example at least one disk memory.
  • The memory 1002 is configured to store computer-executable instructions 1003. Specifically, the computer-executable instructions 1003 may include program code.
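  • The processor 1001 runs the computer-executable instructions 1003 and may perform the following method:
  • monitoring the state of the second endpoint device of the extended domain;
  • establishing a device status record according to the state of the second endpoint device, where the device status record includes a correspondence between the identifier information of the second endpoint device and the state of the second endpoint device;
  • receiving an access request, where the access request includes an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
  • querying the device status record according to the identifier information of the second endpoint device in the access request, and determining the state of the second endpoint device;
  • if the state of the second endpoint device is the fault state, discarding the access request to block communication between the second endpoint device and the primary domain.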
  • Specifically, the method includes the following:
  • receiving an access request, where the access request includes an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
  • if the state of the second endpoint device is the fault state, discarding the access request to block communication between the second endpoint device and the primary domain;
  • sending a fault isolation message to the CPU, where the fault isolation message carries the first memory address of the second endpoint device; and, if the access request was sent by the primary domain, returning a simulated response message for the access request to the primary domain.
  • Aspects of the present invention, or possible implementations of those aspects, can be embodied as a system, a method, or a computer program product.
  • Accordingly, aspects of the invention, or possible implementations of those aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware, all of which are collectively referred to herein as "circuits", "modules", or "systems".
  • Furthermore, aspects of the invention, or possible implementations of those aspects, may take the form of a computer program product, that is, computer-readable program code stored in a computer-readable medium.
  • The computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium.
  • The computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, and portable compact disc read-only memory (CD-ROM).
  • A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowchart, and produce an apparatus that implements the functional actions specified in each block, or combination of blocks, of the block diagram.
  • The computer-readable program code can execute entirely on the user's computer, partly on the user's computer, as a separate software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In some alternative implementations, the functions noted in the steps of the flowchart or in the blocks of the block diagrams may not occur in the order noted. For example, two steps or two blocks shown in succession may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

Abstract

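Embodiments of the present invention relate to a fault isolation method, computer system, and apparatus that monitor the state of a second endpoint device of an extended domain and establish a device status record according to that state. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identifier information of the second endpoint device in the access request to determine the state of the second endpoint device. If the state of the second endpoint device is the fault state, the access request is discarded, which prevents communication between the faulty second endpoint device and the primary domain, keeps the fault from spreading to the primary domain, and protects system reliability.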

Description

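A fault handling method, computer system, and apparatus
Technical Field
Embodiments of the present invention relate to computer technologies, and in particular to a fault handling method, a computer system, and an apparatus.
Background
Peripheral Component Interconnect Express (PCIe) is a high-performance bus technology for interconnecting a CPU with peripheral devices. As a new-generation bus and interface standard, PCIe uses serial interconnection and transfers data point to point, which greatly increases the transfer rate and leaves room for higher frequencies. It is widely used in industrial servers, PCs, embedded computing and communication, and workstations, and has gradually replaced buses such as PCI and AGP. At present, PCIe device faults account for a large share of all system faults. Monitoring the system in real time, identifying errors as they occur, and detecting and handling system faults can effectively avoid a complete interruption of system operation; this is a RAS (Reliability, Availability, Serviceability) feature that keeps the system continuously available.
In the prior art, a PCIe device may generate an error message when it fails. The error message is routed from the faulty device to the root complex; after obtaining the error message, the root complex raises a system interrupt and reports the error message to the operating system, which performs error handling according to it. There is a time window between the faulty device generating the error message and the operating system handling it. Within this window, the CPU or other PCIe endpoint devices can still exchange accesses with the faulty device, so the faulty device is not effectively isolated; the fault may spread, and system reliability suffers.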
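Summary
Embodiments of the present invention provide a fault handling method, computer system, and apparatus that can isolate a faulty device, prevent the fault from spreading, and improve system reliability.
In a first aspect, an embodiment of the present invention provides a fault isolation method for a PCIe-interconnected computer system. The computer system includes a primary domain and an extended domain; the primary domain includes a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain includes the root complex endpoint device and a second endpoint device. The method includes:
monitoring the state of the second endpoint device of the extended domain;
establishing a device status record according to the state of the second endpoint device, where the device status record includes a correspondence between identifier information of the second endpoint device and the state of the second endpoint device;
receiving an access request, where the access request includes an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
querying the device status record according to the identifier information of the second endpoint device in the access request, and determining the state of the second endpoint device;
if the state of the second endpoint device is the fault state, discarding the access request to block communication between the second endpoint device and the primary domain.
With reference to the first aspect, in a first possible implementation, monitoring the state of the second endpoint device of the extended domain includes: receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device exists; and determining the state of the second endpoint device according to the error message or the device probe response message.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the identifier information of the second endpoint device includes a first memory address of the second endpoint device, where the first memory address is the memory address of the second endpoint device in the primary domain. Establishing the device status record according to the state of the second endpoint device, where the device status record includes the correspondence between the identifier information of the second endpoint device and the state of the second endpoint device, includes: obtaining the bus/device/function (BDF) identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message, where the second memory address is the memory address of the second endpoint device in the extended domain; obtaining the first memory address of the second endpoint device according to the BDF identifier or the second memory address; and recording, in the device status record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device.
With reference to the second possible implementation of the first aspect, in a third possible implementation, obtaining the first memory address of the second endpoint device according to the BDF identifier or the second memory address includes: converting the second memory address into the first memory address of the second endpoint device according to a stored mapping between the second memory address and the first memory address of the second endpoint device; or obtaining the second memory address of the second endpoint device according to a stored mapping between the BDF identifier and the second memory address of the second endpoint device, and then converting the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address of the second endpoint device.
With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation, the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for message signaled interrupt access, a first memory address for memory mapped input/output access, and a first memory address for direct memory access. In this case, recording, in the device status record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device includes: recording, in the device status record, the correspondence between each type of first memory address of the second endpoint device and the state of the second endpoint device.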
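In a second aspect, an embodiment of the present invention provides a fault isolation apparatus for a PCIe-interconnected computer system. The computer system includes a primary domain and an extended domain; the primary domain includes a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain includes the root complex endpoint device and a second endpoint device. The apparatus includes:
a monitoring unit, configured to monitor the state of the second endpoint device of the extended domain;
a recording unit, configured to establish a device status record according to the state of the second endpoint device, where the device status record includes a correspondence between identifier information of the second endpoint device and the state of the second endpoint device;
a receiving unit, configured to receive an access request, where the access request includes an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
a determining unit, configured to query the device status record according to the identifier information of the second endpoint device in the access request and determine the state of the second endpoint device;
a processing unit, configured to discard the access request, when the state of the second endpoint device is the fault state, to block communication between the second endpoint device and the primary domain.
With reference to the second aspect, in a first possible implementation, the monitoring unit is specifically configured to: receive an error message sent by the second endpoint device, or receive a device probe response message indicating whether the second endpoint device exists, and determine the state of the second endpoint device according to the error message or the device probe response message.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the identifier information of the second endpoint device includes a first memory address of the second endpoint device, where the first memory address is the memory address of the second endpoint device in the primary domain;
the recording unit specifically includes:
an address translation module subunit, configured to, when the monitoring unit determines the fault state of the second endpoint device, obtain the BDF identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message, and obtain the first memory address of the second endpoint device according to the BDF identifier or the second memory address, where the second memory address is the memory address of the second endpoint device in the extended domain;
a status recording subunit, configured to record, in the device status record, the correspondence between the first memory address of the second endpoint device and the state of the second endpoint device.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the address translation module subunit is further configured to store the mapping between the second memory address and the first memory address of the second endpoint device, and the mapping between the BDF identifier and the second memory address of the second endpoint device. The address translation module subunit is specifically configured to convert the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address; or to obtain the second memory address of the second endpoint device according to the stored mapping between the BDF identifier and the second memory address, and then convert the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address.
With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation, the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for message signaled interrupt access, a first memory address for memory mapped input/output access, and a first memory address for DMA access. The address translation module subunit is specifically configured to record the correspondence between each type of first memory address of the second endpoint device and the second endpoint device.
In the embodiments of the present invention, the state of the second endpoint device of the extended domain is monitored, and a device status record is established according to that state. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identifier information of the second endpoint device in the access request to determine the state of the second endpoint device. If the state is the fault state, the access request is discarded, which prevents communication between the faulty second endpoint device and the primary domain, keeps the fault from spreading to the primary domain, and protects system reliability.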
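Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings needed for describing the prior art or the embodiments. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a system block diagram of a computer system that includes a PCIe primary domain and an extended domain, according to an embodiment of the present invention;
FIG. 2(a) is a memory address allocation diagram for the endpoint devices of the primary domain and the extended domain of the computer system shown in FIG. 1;
FIG. 2(b) shows the mapping between the memory addresses of the primary domain and the memory addresses of the extended domain of the computer system shown in FIG. 1;
FIG. 3 is a flowchart of a fault isolation method according to Embodiment 1 of the present invention;
FIG. 4 is a flowchart of a fault isolation method according to Embodiment 2 of the present invention;
FIG. 5 is a flowchart of a fault isolation method according to Embodiment 3 of the present invention;
FIG. 6 is a flowchart of a fault isolation method according to Embodiment 4 of the present invention;
FIG. 7 is a composition diagram of a fault isolation apparatus according to an embodiment of the present invention;
FIG. 8 is a composition diagram of a fault isolation apparatus according to an embodiment of the present invention;
FIG. 9 is a composition diagram of a fault isolation system according to an embodiment of the present invention;
FIG. 10 is a composition diagram of a fault isolation apparatus according to an embodiment of the present invention.
Detailed Description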
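Embodiments of the present invention provide a fault isolation method, computer system, and apparatus for a computer system that includes a PCIe primary domain and an extended domain, where the root complex endpoint device of the extended domain is an endpoint device of the primary domain. When an endpoint device of the extended domain fails, the embodiments can block mutual access between the primary domain and that endpoint device, which keeps the fault from spreading and protects system availability.
FIG. 1 is a system block diagram of a computer system that includes a PCIe primary domain and an extended domain. The computer system includes a primary domain 100 and an extended domain 118, and extends the PCIe domain through a root complex endpoint device (Root Complex Endpoint, RCEP) 106. The primary domain 100 includes a root complex (Root Complex, RC) 102, a switch 104, and at least one PCIe endpoint device 107. The root complex 102 is connected through a root port 103 to an upstream port 104A of the switch 104, and a downstream port 104B of the switch 104 is connected to the PCIe endpoint device 107, so that the root complex 102 is connected to the PCIe endpoint device 107 through the switch 104. The root complex 102 may be integrated on the main CPU 101. FIG. 1 shows the primary domain 100 with one switch as an example; in other embodiments, the primary domain 100 may include multiple switches, each of which may be connected to one or more PCIe endpoint devices.
The root complex 102 processes and forwards requests between the main CPU 101 and the PCIe endpoint device 107. The switch 104 routes requests downstream to the PCIe endpoint devices connected to the downstream port 104B, routes requests upstream from each independent downstream port to the single root complex, and can also route requests from one downstream port to another. The PCIe endpoint device 107 can initiate requests and complete PCIe transactions; it may be a storage device, a network interface card, a sound card, or the like.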
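The PCIe endpoint devices 107 in the primary domain 100 include the RCEP 106. The RCEP 106 can not only initiate requests and complete PCIe transactions; because it also carries hardware modules and a device driver with the same functions as a root complex, it can connect the extended domain 118 to the primary domain 100 and manage and forward requests between them. As shown in FIG. 1, the extended domain 118 includes the RCEP 106 acting as the root complex of the extended domain, a switch 112, and second endpoint devices 114 and 116 (there may be one or more second endpoint devices; this embodiment uses two as an example). The second endpoint devices 114 and 116 are each connected to the RCEP 106 through the switch 112 and may be storage devices, network interface cards, sound cards, or the like. The extended domain 118 may also have multiple root ports and multiple switches, and multiple endpoint devices may be connected under each switch.
FIG. 2 is a memory address allocation diagram for the endpoint devices of the primary domain and the extended domain in the computer system shown in FIG. 1. The 64-bit physical address 202 of the main CPU 101 (specifically, it may be a Memory Mapped Input/Output (MMIO) address) can be divided into the memory address 203 of the primary domain and the memory address 204 of the extended domain.
In the computer system shown in FIG. 1, when the system loads the drivers of the endpoint devices of the primary domain, it allocates a memory address to each endpoint device of the primary domain, for example to the RCEP 106 and the first endpoint device 108. As shown in FIG. 2(a), one part of the MMIO address 202 is allocated to the RCEP 106 as the memory address 205 of the RCEP 106, and another part is allocated to the first endpoint device 108 as the memory address 210 of the first endpoint device 108. Because the RCEP 106 and the first endpoint device 108 are both endpoint devices of the primary domain, the memory address 205 and the memory address 210 together make up the memory address 203 of the primary domain.
When the system loads the driver of the RCEP 106, it detects the drivers of all second endpoint devices 114 and 116 of the extended domain, which triggers a scan of all second endpoint devices in the extended domain and the allocation of a memory address to each of them. Specifically, part of the MMIO address 202 is allocated to the second endpoint devices of the extended domain; that is, each second endpoint device is allocated a second memory address (the second memory address is the memory address of the second endpoint device in the extended domain and represents the second endpoint device in the extended domain), for example the second memory addresses 206 and 207 of the second endpoint devices 114 and 116 shown in FIG. 2(a). There are as many second memory addresses as there are second endpoint devices; this embodiment uses two second endpoint devices as an example, so the second memory addresses 206 and 207 are allocated. In addition to a second memory address, each second endpoint device must also be allocated a first memory address (the first memory address is the memory address of the second endpoint device in the primary domain and represents the second endpoint device in the primary domain). Specifically, the first memory address of a second endpoint device is all or part of the memory address of the RCEP: the memory address 205 of the RCEP is divided according to the number of second endpoint devices in the extended domain. If there are n second endpoint devices, the memory address 205 of the RCEP is divided into n parts, each corresponding to one second endpoint device. As shown in FIG. 2(a), the memory address 205 of the RCEP is divided into two parts, 208 and 209: 208 corresponds to the memory address 206 of the second endpoint device 114 and is the first memory address of the second endpoint device 114, and 209 corresponds to the memory address 207 of the second endpoint device 116 and is the first memory address of the second endpoint device 116.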
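The division of the RCEP memory address into n parts can be illustrated with a short sketch. The text does not say how the parts are sized, so the equal split below is an assumption, as are the function name `split_window` and the base and size values used in the usage note.

```python
from typing import List, Tuple


def split_window(base: int, size: int, n: int) -> List[Tuple[int, int]]:
    """Split the window [base, base + size) into n equal (start, length) parts."""
    if n <= 0:
        raise ValueError("n must be positive")
    if size % n != 0:
        # Equal-sized parts are an assumption of this sketch.
        raise ValueError("size must be divisible by n")
    part = size // n
    return [(base + i * part, part) for i in range(n)]
```

With two second endpoint devices, `split_window(0x1000, 0x2000, 2)` yields two parts that play the roles of 208 and 209 in FIG. 2(a).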
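The second endpoint devices 114 and 116 of the extended domain can exchange messages with the primary domain through four access modes: configuration space access, Message Signaled Interrupts (MSI) access, memory mapped input/output (MMIO) access, and Direct Memory Access (DMA) access. The second memory address 206 and the first memory address 208 that the system allocates to a second endpoint device can therefore each consist of four kinds of memory address. As shown by the mapping between the first memory addresses of the primary domain and the second memory addresses of the extended domain in FIG. 2(b), the second memory address 206 of the second endpoint device 114 can be divided into 206a, 206b, 206c, and 206d, which are used for configuration space access, MSI access, MMIO access, and DMA access to the second endpoint device 114 respectively. When the system allocates these four kinds of memory address to the second endpoint device 114, it correspondingly divides the first memory address 208, the part of the RCEP memory address that corresponds to the second endpoint device 114, into four parts: a configuration space address 208a, an MSI address 208b, an MMIO address 208c, and a DMA address 208d. The first memory addresses 208a, 208b, 208c, and 208d are mapped to the second memory addresses 206a, 206b, 206c, and 206d of the second endpoint device 114 respectively. Specifically, the mapping can be expressed as address offsets: a first address offset between 208a and 206a, a second address offset between 208b and 206b, a third address offset between 208c and 206c, and a fourth address offset between 208d and 206d. The mapping between the first memory addresses and the second memory addresses of the second endpoint device can be stored in the RCEP 106, for example in an address translation module of the RCEP 106. The address translation module stores the address offsets and can translate addresses according to the stored mapping between second memory addresses and first memory addresses.
While scanning the second endpoint devices of the whole extended domain, the system allocates to each second endpoint device not only a second memory address but also a bus/device/function (Bus/Device/Function, BDF) identifier. The BDF identifier of a second endpoint device is mapped to the second memory address of that device, and the RCEP can store this mapping. For example, the address translation module of the RCEP 106 stores the mapping between the BDF identifier and the second memory address of the second endpoint device 114, so that the RCEP 106 can convert between the BDF identifier and the second memory address of the second endpoint device 114 according to the stored mapping.
If the second endpoint device 114 fails, there is a time window between the second endpoint device 114 generating an interrupt message and the operating system handling it. Within this window, the faulty second endpoint device 114 of the extended domain may still exchange accesses with other endpoint devices: the second endpoint device 114 may communicate with an endpoint device in the primary domain, or with other devices through the CPU of the primary domain, and the CPU or other endpoint devices of the primary domain may also access the second endpoint device 114. Such accesses or communication involving the faulty second endpoint device 114 may cause other devices to fail, for example the first endpoint device 108, or may cause the CPU to process the same error message repeatedly and unnecessarily, which degrades system performance and seriously harms system reliability.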
一种故障隔离的方法
本发明实施例一
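This embodiment of the present invention provides a fault isolation method that, when an endpoint device of the extended domain fails, blocks mutual access between the primary domain and that endpoint device, preventing the fault from spreading to the primary domain.
Figure 3 is a flowchart of the fault isolation method provided by this embodiment. The method is used in a computer system interconnected by PCIe. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and an RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The method includes:
101: Monitor the status of the second endpoint device of the extended domain.
The status of the second endpoint device may be a faulty state or a non-faulty state. The faulty state indicates that the second endpoint device has failed and cannot work normally; the non-faulty state indicates that it can work normally. To monitor the status, the RCEP may receive an error message sent by the second endpoint device, or a device probe response message indicating whether the second endpoint device is present, and determine the status of the second endpoint device from the error message or the device probe response message.
102: Establish a device status record according to the status of the second endpoint device. The device status record includes the correspondence between the identification information of the second endpoint device and its status.
The RCEP may establish the device status record according to the status of the second endpoint device, so that the RCEP can determine the status of the second endpoint device from its identification information.
103: Receive an access request. The access request is either a request from the second endpoint device to access the primary domain or a request from the primary domain to access the second endpoint device.
When the second endpoint device accesses the primary domain through the access request, or the primary domain accesses the second endpoint device through the access request, the access request is routed to the RCEP, and the RCEP receives it.
104: Query the device status record according to the identification information of the second endpoint device in the access request, and determine the status of the second endpoint device.
The access request carries the identification information of the second endpoint device. The RCEP can look up the correspondence between that identification information and the device status in the device status record to determine the status of the second endpoint device.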
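105: If the status of the second endpoint device is the faulty state, discard the access request to block communication between the second endpoint device and the primary domain.
If the second endpoint device is determined to be in the faulty state, the access request is discarded so that the RCEP does not forward it, which blocks communication between the second endpoint device and the primary domain.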
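Steps 101 to 105 can be sketched in a few lines. This is a hedged illustration only: the class `FaultIsolator`, the `forward` callback, and the string device identifiers are assumptions made for the example, and treating a device with no record entry as non-faulty is also an assumption, since the text does not say how unknown devices are handled.

```python
from enum import Enum


class Status(Enum):
    NON_FAULTY = "non-faulty"
    FAULTY = "faulty"


class FaultIsolator:
    """Toy model of the RCEP's role in steps 101-105 (hypothetical API)."""

    def __init__(self):
        # Step 102: identification information -> status.
        self.status_record = {}

    def report(self, device_id, status):
        """Steps 101-102: record the status observed by monitoring."""
        self.status_record[device_id] = status

    def handle(self, device_id, request, forward):
        """Steps 103-105: look up the device and drop or forward the request."""
        # Assumption: a device that was never reported counts as non-faulty.
        status = self.status_record.get(device_id, Status.NON_FAULTY)
        if status is Status.FAULTY:
            return None  # request discarded, nothing is forwarded
        return forward(request)
```

A caller would invoke `report` from the monitoring path and `handle` from the request path; returning `None` stands in for silently dropping the request.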
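In this embodiment, the status of the second endpoint device of the extended domain is monitored, and a device status record is established according to that status. The device status record includes the correspondence between the identification information of the second endpoint device and its status. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identification information of the second endpoint device in the access request to determine the device's status. If the status is the faulty state, the access request is discarded. This blocks communication between the faulty second endpoint device and the primary domain, prevents the fault from spreading to the primary domain, and safeguards system reliability.
Embodiment 2 of the present invention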
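Figure 4 is a flowchart of the fault isolation method provided by this embodiment. The method is used in a computer system interconnected by PCIe. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and an RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The second endpoint device communicates with the root complex or the first endpoint device of the primary domain through the RCEP. The method may include: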
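201: Monitor the status of the second endpoint device of the extended domain.
The status of the second endpoint device is either a faulty state or a non-faulty state. The faulty state indicates that the second endpoint device has failed and cannot work normally; the non-faulty state indicates that it can work normally. Monitoring by the RCEP includes: receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device is present; and determining the status of the second endpoint device from the error message or the device probe response message.
Specifically, the RCEP may send a device probe message to the configuration space register of the second endpoint device and obtain the device probe response message returned by the device. If the response indicates that the second endpoint device is not present, the device has failed so badly that it cannot be detected, and its status is determined to be the faulty state; otherwise its status is determined to be the non-faulty state. Alternatively,
the RCEP receives an error message from the second endpoint device and determines the type of the error message. If the type is uncorrectable error, the status of the second endpoint device is determined to be the faulty state; otherwise it is determined to be the non-faulty state.
Preferably, after receiving an error message from the second endpoint device and determining that the device is in the faulty state, the RCEP may further determine whether the error message is a repeated one. If it is repeated, the second endpoint device has already sent an error message to the primary domain for error handling, so the RCEP discards the message. This spares the CPU redundant error handling and safeguards system reliability. If it is not repeated, it is the first error message sent by the second endpoint device, and the RCEP sends it to the CPU so that the CPU performs error handling for the second endpoint device.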
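Determining whether the error message is a repeated error message specifically includes:
obtaining the BDF identifier or the second memory address of the second endpoint device carried in the error message;
querying the correspondence, recorded in the device status record, between the second memory address of the second endpoint device and its status to determine the status; or querying the recorded correspondence between the BDF identifier of the second endpoint device and its status to determine the status; or determining the first memory address of the second endpoint device from the second memory address or BDF identifier carried in the error message, and then querying the recorded correspondence between the first memory address and the status to determine the status;
if the status of the second endpoint device is determined to be the faulty state, determining that the error message is a repeated error message; if the status is determined to be the non-faulty state, determining that the error message is not a repeated error message.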
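The duplicate check above amounts to asking whether the device was already marked faulty before the current message arrived. A minimal sketch under stated assumptions: the function name, the string return values, and the use of a plain dictionary as the device status record are all hypothetical, and what happens to messages that are not of the uncorrectable type is not specified in the text, so the sketch only reports that no fault was recorded.

```python
def classify_error(status_record, key, uncorrectable):
    """Decide what to do with an error message from a second endpoint device.

    status_record: dict mapping a device key (BDF identifier or memory
                   address) to "faulty" or "non-faulty".
    key:           the identifier carried in the error message.
    uncorrectable: True if the message is of the uncorrectable-error type.
    """
    if not uncorrectable:
        # The device stays non-faulty; further handling is outside this sketch.
        return "no_fault"
    if status_record.get(key) == "faulty":
        # Already recorded as faulty, so this message is a repeat: drop it.
        return "drop_duplicate"
    # First uncorrectable error: record the fault and pass it to the CPU.
    status_record[key] = "faulty"
    return "forward_to_cpu"
```

The order matters: the record is queried before it is updated, otherwise the first message would also look like a repeat.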
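Here, the first memory address is the memory address of the second endpoint device in the primary domain and represents the second endpoint device in the primary domain; the second memory address is the memory address of the second endpoint device in the extended domain and represents the second endpoint device in the extended domain.
202: Establish a device status record according to the status of the second endpoint device. The device status record includes the correspondence between the first memory address of the second endpoint device and its status.
The RCEP establishes the device status record according to the status of the second endpoint device, so that the RCEP can determine the status of the second endpoint device from its first memory address.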
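Establishing the device status record according to the status of the second endpoint device may specifically include:
obtaining the BDF identifier or the second memory address of the second endpoint device carried in the error message or the device probe response message;
obtaining the first memory address of the second endpoint device from the BDF identifier or the second memory address, which may be done as follows:
the RCEP converts the second memory address into the first memory address of the second endpoint device according to the stored mapping between second and first memory addresses; or the RCEP first obtains the second memory address of the second endpoint device according to the stored mapping between BDF identifiers and second memory addresses, and then converts the second memory address into the first memory address according to the stored mapping between second and first memory addresses;
recording in the device status record the correspondence between the first memory address of the second endpoint device and its status, so that the RCEP can determine the status of the second endpoint device from its first memory address.
Further, the device status record may also record the correspondence between the second memory address of the second endpoint device and its status, or between the BDF identifier of the second endpoint device and its status, so that the RCEP can also determine the status of the second endpoint device from its second memory address or BDF identifier.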
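Specifically, the first memory address of the second endpoint device includes a first memory address for configuration space access, one for MSI access, one for MMIO access, and one for DMA access, and the second memory address includes a second memory address for each of the same four access modes. The RCEP can therefore use the stored mapping between each kind of second memory address and each kind of first memory address to obtain the first memory addresses of the second endpoint device for configuration space access, MSI access, MMIO access, and DMA access.
Alternatively, the RCEP can use the mapping between the BDF identifier of the second endpoint device and each kind of second memory address to obtain the second memory addresses for configuration space access, MSI access, MMIO access, and DMA access, and then use the stored mapping between each kind of second memory address and each kind of first memory address to obtain the four corresponding first memory addresses.
Accordingly, recording in the device status record the correspondence between the first memory address of the second endpoint device and its status specifically means recording the correspondence between each kind of first memory address and the status; and recording the correspondence between the second memory address and the status specifically means recording the correspondence between each kind of second memory address and the status.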
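203: Receive an access request. The access request is either a request from the second endpoint device to access the primary domain or a request from the primary domain to access the second endpoint device.
When the second endpoint device and the primary domain exchange messages through the access request, the access request may be one in which the second endpoint device of the extended domain accesses the primary domain, or one in which the root complex or the first endpoint device of the primary domain accesses the second endpoint device. When the access request comes from the primary domain, it carries the first memory address of the second endpoint device. When it comes from the extended domain, it carries the second memory address or the BDF identifier of the second endpoint device.
204: Query the device status record according to the identification information of the second endpoint device in the access request, and determine the status of the second endpoint device.
The identification information of the second endpoint device includes one or a combination of the following: the first memory address of the second endpoint device, the second memory address of the second endpoint device, and the BDF identifier of the second endpoint device.
When the access request comes from the primary domain, the correspondence recorded in the device status record between the first memory address of the second endpoint device and its status is queried according to the first memory address in the access request, and the status is determined. For example, with the MMIO access mode, the access request carries the first memory address for MMIO access of the second endpoint device. Because the device status record holds the correspondence between each kind of first memory address and the status, the RCEP can query the record with the MMIO first memory address from the access request and determine the status of the second endpoint device.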
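When the access request comes from the extended domain, the RCEP queries the device status record according to the second memory address or BDF identifier of the second endpoint device in the access request. If the device status record does not hold a correspondence between the second memory address or BDF identifier and the status, the RCEP obtains the first memory address of the second endpoint device from the second memory address or BDF identifier and determines the status by querying the recorded correspondence between the first memory address and the status. If the device status record does hold a correspondence between the second memory address or BDF identifier and the status, the RCEP queries that correspondence directly to determine the status. This avoids converting the second memory address or BDF identifier into the first memory address and speeds up determining the device status.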
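The two lookup paths for a request from the extended domain can be sketched as a direct hit followed by a translate-then-look-up fallback. The function name `lookup_status`, the `to_first` callback, and the default of "non-faulty" for a device with no entry are assumptions made for this illustration; the text does not name them.

```python
def lookup_status(record, key, to_first=None):
    """Return the recorded status for a request from the extended domain.

    record:   dict mapping first memory addresses, and optionally second
              memory addresses or BDF identifiers, to a status string.
    key:      second memory address or BDF identifier from the request.
    to_first: optional callable that translates `key` into the device's
              first memory address (hypothetical helper).
    """
    # Fast path: the record was also keyed by second address or BDF.
    if key in record:
        return record[key]
    # Slow path: translate to the first memory address, then look up.
    if to_first is not None:
        first_addr = to_first(key)
        if first_addr in record:
            return record[first_addr]
    # Assumption: no entry means no fault has been observed.
    return "non-faulty"
```

Recording all identifier kinds costs more table entries per device but removes the translation step from the request path, which is the speed-up the paragraph above describes.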
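205: If the status of the second endpoint device is the faulty state, discard the access request to block communication between the second endpoint device and the primary domain.
If the second endpoint device is determined to be in the faulty state, the access request is discarded so that the RCEP does not forward it, which blocks communication between the second endpoint device and the primary domain.
Further, the method may also include:
206: If the status of the second endpoint device is the faulty state, send a fault isolation message to the CPU. The fault isolation message instructs the CPU of the primary domain to stop accessing the second endpoint device of the extended domain, and it carries the first memory address of the second endpoint device.
If the second endpoint device is in the faulty state, the RCEP may send the fault isolation message to the CPU of the primary domain so that the CPU stops accessing the second endpoint device, for example by unloading the driver of the faulty second endpoint device or by isolating the I/O path used to access it.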
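Further, if the access request is a request from the primary domain to access the second endpoint device, the method may also include:
206': If the access request was sent by the primary domain, return a simulated response packet for the access request to the primary domain.
Specifically, when the access request from the primary domain to the second endpoint device is a non-posted request, a response must be returned for it; otherwise the primary domain may raise a completion timeout error that causes the computer system to restart. Once the second endpoint device has failed, however, the access request may never reach it, or may reach it without the device being able to produce a normal response. The RCEP can generate a simulated response packet for the access request and return it to the primary domain, which avoids the completion timeout error and the resulting restart. The simulated response packet may be an Unsupported Request (UR) packet or a Completer Abort (CA) packet.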
Steps 206 and 206' are both optional, and they do not have to be performed together.
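The reason for step 206' is easiest to see in code: a non-posted request always expects a completion, so dropping it silently would leave the requester waiting until it times out. The sketch below is illustrative only; the function name, the `forward` callback, and the choice of "UR" over "CA" as the synthesized completion status are assumptions.

```python
def handle_primary_request(device_status, non_posted, forward):
    """Handle a request from the primary domain to a second endpoint device.

    device_status: "faulty" or "non-faulty", as found in the status record.
    non_posted:    True if the request expects a completion.
    forward:       zero-argument callable that forwards the request and
                   returns the device's real response (hypothetical).
    """
    if device_status != "faulty":
        return forward()
    # Step 205: the request is discarded and never reaches the device.
    if non_posted:
        # Step 206': answer on the device's behalf so that the requester
        # does not hit a completion timeout. "UR" stands for an
        # Unsupported Request completion; "CA" would be equally valid.
        return "UR"
    # Posted requests expect no completion, so nothing is returned.
    return None
```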
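In this embodiment, the status of the second endpoint device of the extended domain is monitored, and a device status record is established according to that status. The device status record includes the correspondence between the first memory address of the second endpoint device and its status. After an access request between the second endpoint device and the primary domain is received, the first memory address of the second endpoint device is read from the access request, or is obtained from the BDF identifier or second memory address in the access request. The correspondence between the first memory address and the status is then queried in the device status record to determine the status of the second endpoint device. If the status is the faulty state, the access request is discarded, which blocks communication between the faulty second endpoint device and the primary domain. A fault isolation message may also be sent to the CPU, instructing it to stop accessing the second endpoint device of the extended domain, which prevents the fault from spreading to the primary domain.
Further, in this embodiment the device status record may also record the correspondence between the BDF identifier or second memory address of the second endpoint device and its status. The status can then be determined by querying the device status record directly with the BDF identifier, second memory address, or first memory address carried in the access request. This avoids converting the second memory address or BDF identifier into the first memory address and speeds up determining the status of the second endpoint device.
In addition, in this embodiment, when the status of the second endpoint device is monitored and an error message from the device has led, through its type, to the conclusion that the device is in the faulty state, it can further be determined whether that error message is a repeated one. If it is, the error message is discarded so that it does not reach the primary domain. This prevents the error from spreading, spares the CPU redundant error handling, and safeguards system reliability.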
Embodiment 3 of the present invention
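With reference to the computer system shown in Figure 1, the fault isolation method provided by this embodiment is shown in Figure 5. Second endpoint device 116 of the extended domain is the faulty device. Using the DMA access mode, first endpoint device 108 of the primary domain sends a non-posted access request to access the faulty second endpoint device 116, and the access request is first routed to RCEP 106. At the moment the second endpoint device fails, the access request may already have crossed the boundary of RCEP 106, that is, it may already have been forwarded by RCEP 106, or it may not yet have crossed that boundary, that is, it has not yet been forwarded by RCEP 106. The method may specifically include:
301: RCEP 106 monitors the status of all second endpoint devices of the extended domain.
The status of a device is either a faulty state or a non-faulty state. RCEP 106 monitors the status of second endpoint device 114 and second endpoint device 116 of the extended domain. This specifically includes: receiving an error message sent by second endpoint device 114 or 116, or receiving a device probe response message indicating whether second endpoint device 114 or 116 is present; and determining the status of second endpoint device 114 or 116 from the error message or the device probe response message.
302: Establish a device status record according to the status of all second endpoint devices of the extended domain. The device status record includes the correspondence between the first memory address of each second endpoint device of the extended domain and the status of that device.
For example, the error message sent by second endpoint device 116 includes the BDF identifier of second endpoint device 116, and RCEP 106 obtains that BDF identifier. Using the mapping between the BDF identifier of second endpoint device 116 and each kind of second memory address, it obtains the second memory addresses of second endpoint device 116 for configuration space access, MSI access, MMIO access, and DMA access. Using the stored mapping between each kind of second memory address and each kind of first memory address of second endpoint device 116, it obtains the first memory addresses for configuration space access, MSI access, MMIO access, and DMA access. It then records in the device status record the correspondence between each kind of first memory address of second endpoint device 116 and the status of second endpoint device 116, for example by marking every kind of first memory address of second endpoint device 116 as faulty.
Similarly, if second endpoint device 114 fails, the correspondence between each kind of first memory address of second endpoint device 114 and the status of second endpoint device 114 is recorded in the device status record, for example by marking every kind of first memory address of second endpoint device 114 as faulty.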
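303: Receive the access request in which first endpoint device 108 accesses second endpoint device 116. The access request carries the first memory address for DMA access of second endpoint device 116.
When first endpoint device 108 performs DMA access to second endpoint device 116, the access request is sent to RCEP 106 by address routing, and the RCEP obtains the DMA first memory address of second endpoint device 116 carried in the access request.
304: Determine the status of second endpoint device 116 according to the DMA first memory address of second endpoint device 116 in the access request.
According to the DMA first memory address of second endpoint device 116 in the access request, the correspondence recorded in the device status record between the first memory address of the second endpoint device and its status is queried, and the status of second endpoint device 116 is determined.
Specifically, when second endpoint device 116 fails and the access request has not yet crossed the boundary of the RCEP, the status recorded for the second endpoint device in the device status record is the faulty state, so querying the recorded correspondence between the DMA first memory address of second endpoint device 116 and its status yields the faulty state. When second endpoint device 116 fails and the access request has already crossed the boundary of the RCEP, the status recorded at that moment is still the non-faulty state, so the same query yields the non-faulty state.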
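305: If the status of second endpoint device 116 is the faulty state, discard the access request to block access by first endpoint device 108 to second endpoint device 116, and then perform step 306.
When second endpoint device 116 fails and the access request has not yet crossed the boundary of RCEP 106, the RCEP, after receiving the access request, determines that the accessed second endpoint device 116 is in the faulty state and discards the access request. This blocks access by first endpoint device 108 to second endpoint device 116 and keeps the fault from spreading to the primary domain.
306: Return a simulated response packet for the access request to first endpoint device 108.
Because the access request is a non-posted request, a simulated response packet is generated for it and returned to first endpoint device 108. This prevents the CPU of the primary domain from raising a completion timeout error that would cause the computer system to restart.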
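In this embodiment, RCEP 106 monitors the status of all second endpoint devices of the extended domain and establishes a device status record according to their status. When first endpoint device 108 of the primary domain sends an access request to access second endpoint device 116 of the extended domain, the RCEP receives the access request, queries the device status record according to the DMA first memory address in the access request, and determines the status of the second endpoint device. If the access request has not yet crossed the boundary of RCEP 106 when second endpoint device 116 fails, step 304 determines that second endpoint device 116 is in the faulty state. The RCEP then discards the access request, which blocks access by first endpoint device 108 to second endpoint device 116 and keeps the fault from spreading to the primary domain. The RCEP can also return a simulated response packet for the access request to first endpoint device 108, which prevents the CPU of the primary domain from raising a completion timeout error that would cause the computer system to restart.
Further, if the access request has already crossed the boundary of RCEP 106 when second endpoint device 116 fails, step 304 determines that second endpoint device 116 is in the non-faulty state, and RCEP 106 follows the normal workflow and sends the access request to second endpoint device 116. After receiving the access request, the faulty second endpoint device 116 may be triggered by it to send an error message to RCEP 106; the faulty device may also send an error message to the RCEP on its own initiative to report the fault. The RCEP receives the error message from the second endpoint device as part of monitoring its status. If the type of the error message is uncorrectable error, the status of the second endpoint device is determined to be the faulty state. The RCEP can further query the device status record to determine whether the error message is a repeated one, and if so, discard it. This spares the CPU redundant error handling and safeguards system reliability.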
Embodiment 4 of the present invention
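With reference to the computer system shown in Figure 1, the fault isolation method provided by this embodiment is shown in Figure 6. Second endpoint device 116 of the extended domain is the faulty device. Using the MMIO access mode, the faulty second endpoint device 116 sends a non-posted access request to access main CPU 101 of the primary domain, and the access request is first routed to RCEP 106. At the moment the second endpoint device fails, the access request may already have crossed the boundary of RCEP 106, that is, it may already have been forwarded by RCEP 106, or it may not yet have crossed that boundary, that is, it has not yet been forwarded by RCEP 106. The method may specifically include:
401: RCEP 106 monitors the status of all second endpoint devices of the extended domain.
The status of a device is either a faulty state or a non-faulty state. RCEP 106 monitors the status of second endpoint device 114 and second endpoint device 116 of the extended domain. This specifically includes: receiving an error message sent by second endpoint device 114 or 116, or receiving a device probe response message indicating whether second endpoint device 114 or 116 is present; and determining the status of second endpoint device 114 or 116 from the error message or the device probe response message.
402: Establish a device status record according to the status of all second endpoint devices of the extended domain. The device status record includes the correspondence between the first memory address of each second endpoint device of the extended domain and the status of that device.
For example, the error message sent by second endpoint device 116 includes the BDF identifier of second endpoint device 116, and RCEP 106 obtains that BDF identifier. Using the mapping between the BDF identifier of second endpoint device 116 and the second memory address, it obtains the second memory addresses of second endpoint device 116 for configuration space access, MSI access, MMIO access, and DMA access. Using the stored mapping between the second memory address and the first memory address of second endpoint device 116, it obtains the first memory addresses for configuration space access, MSI access, MMIO access, and DMA access. It then records in the device status record the correspondence between each kind of first memory address of second endpoint device 116 and the status of second endpoint device 116, for example by marking every kind of first memory address of second endpoint device 116 as faulty.
Similarly, if second endpoint device 114 fails, the device status record stores the correspondence between each kind of first memory address of second endpoint device 114 and the status of second endpoint device 114, as well as the correspondence between the BDF identifier of second endpoint device 114 and the status of second endpoint device 114.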
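403: Receive the access request in which second endpoint device 116 accesses main CPU 101. The access request carries the second memory address for MMIO access of second endpoint device 116.
When second endpoint device 116 performs MMIO access to main CPU 101, the access request is sent to RCEP 106 by address routing, and the RCEP obtains the MMIO second memory address of second endpoint device 116 carried in the access request.
404: Determine the status of second endpoint device 116 according to the MMIO second memory address of second endpoint device 116 in the access request.
According to the MMIO second memory address of second endpoint device 116 in the access request, and using the stored mapping between each kind of first memory address and each kind of second memory address of the second endpoint device, the MMIO first memory address of second endpoint device 116 is obtained. The correspondence recorded in the device status record between the first memory address of the second endpoint device and its status is then queried, and the status of second endpoint device 116 is determined.
Specifically, when second endpoint device 116 fails and the access request has not yet crossed the boundary of the RCEP, querying the recorded correspondence between the MMIO first memory address of second endpoint device 116 and its status yields the faulty state. When second endpoint device 116 fails and the access request has already crossed the boundary of the RCEP, the same query at that moment yields the non-faulty state.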
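405: If the status of second endpoint device 116 is the faulty state, discard the access request to block access by second endpoint device 116 to main CPU 101.
When second endpoint device 116 fails and the access request has not yet crossed the boundary of RCEP 106, the RCEP, after receiving the access request, determines that second endpoint device 116 is in the faulty state and discards the access request. This blocks access by second endpoint device 116 to main CPU 101 and keeps the fault from spreading to the primary domain.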
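In this embodiment, RCEP 106 monitors the status of all second endpoint devices of the extended domain and establishes a device status record according to their status. When second endpoint device 116 of the extended domain sends an access request to access main CPU 101 of the primary domain, the RCEP receives the access request, obtains the MMIO first memory address of the second endpoint device from the MMIO second memory address in the access request, queries the device status record, and determines the status of the second endpoint device. If the access request has not yet crossed the boundary of RCEP 106 when second endpoint device 116 fails, step 404 determines that second endpoint device 116 is in the faulty state. The RCEP then discards the access request, which blocks access by second endpoint device 116 to main CPU 101 and keeps the fault from spreading to the primary domain.
Further, if the access request has already crossed the boundary of RCEP 106 when second endpoint device 116 fails, step 404 determines that second endpoint device 116 is in the non-faulty state, and RCEP 106 follows the normal workflow and sends the access request to main CPU 101. After receiving the access request, main CPU 101 returns a response packet for it, and the response packet first arrives at the RCEP. Because the second endpoint device has already failed, sending the response packet to the faulty second endpoint device 116 no longer serves any purpose and may trigger the faulty device to send repeated error messages. The RCEP can therefore discard the returned response packet.
In addition, the faulty second endpoint device 116 may send an error message to the RCEP on its own initiative to report the fault. The RCEP receives the error message from the second endpoint device as part of monitoring its status. If the type of the error message is uncorrectable error, the status of the second endpoint device is determined to be the faulty state. The RCEP can further query the device status record to determine whether the error message is a repeated one, and if so, discard it, which prevents the fault from spreading.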
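Apparatus of the embodiments of the present invention
An embodiment of the present invention provides a fault isolation apparatus that, when an endpoint device of the extended domain fails, blocks mutual access between the primary domain and that endpoint device, preventing the fault from spreading to the primary domain.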
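Figure 7 is a structural diagram of the fault isolation apparatus provided by this embodiment. The apparatus is used in a computer system interconnected by PCIe. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and an RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The apparatus includes:
Monitoring unit 701, configured to monitor the status of the second endpoint device of the extended domain.
Recording unit 702, configured to establish a device status record according to the status of the second endpoint device. The device status record includes the correspondence between the identification information of the second endpoint device and its status.
Receiving unit 703, configured to receive an access request. The access request is either a request from the second endpoint device to access the primary domain or a request from the primary domain to access the second endpoint device.
Determining unit 704, configured to query the device status record according to the identification information of the second endpoint device in the access request and determine the status of the second endpoint device.
Processing unit 705, configured to discard the access request when the status of the second endpoint device is the faulty state, so as to block communication between the second endpoint device and the primary domain.
Specifically, the status of the second endpoint device is either a faulty state or a non-faulty state. Monitoring unit 701 may receive an error message sent by the second endpoint device, or a device probe response message indicating whether the second endpoint device is present, and determine the status of the second endpoint device from that message. Recording unit 702 establishes the device status record according to the status determined by monitoring unit 701; the record includes the correspondence between the identification information of the second endpoint device and its status. After receiving unit 703 receives an access request between the second endpoint device and the primary domain, determining unit 704 queries the device status record according to the identification information of the second endpoint device in the access request and determines the status. When determining unit 704 determines that the second endpoint device is in the faulty state, processing unit 705 discards the access request. This blocks communication between the faulty second endpoint device and the primary domain, prevents the fault from spreading to the primary domain, and safeguards system reliability.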
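Figure 8 is a structural diagram of the fault isolation apparatus provided by this embodiment. The apparatus is used in a computer system interconnected by PCIe. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and an RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The second endpoint device communicates with the root complex or the first endpoint device of the primary domain through the RCEP. The apparatus may include monitoring unit 801, recording unit 802, receiving unit 803, determining unit 804, and processing unit 805. The fault isolation apparatus may be the RCEP.
Monitoring unit 801 is configured to monitor the status of the second endpoint device of the extended domain. The status is either a faulty state or a non-faulty state. The faulty state indicates that the second endpoint device has failed and cannot work normally; the non-faulty state indicates that it can work normally. Monitoring by monitoring unit 801 includes: receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device is present; and determining the status of the second endpoint device from the error message or the device probe response message. Specifically, monitoring unit 801 may send a device probe message to the configuration space register of the second endpoint device and obtain the device probe response message returned by the device. If the response indicates that the second endpoint device is not present, the device has failed so badly that it cannot be detected, and its status is determined to be the faulty state; otherwise its status is determined to be the non-faulty state. Alternatively, monitoring unit 801 receives an error message from the second endpoint device and determines the type of the error message. If the type is uncorrectable error, the status of the second endpoint device is determined to be the faulty state; otherwise it is determined to be the non-faulty state.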
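Recording unit 802 is configured to establish a device status record according to the status of the second endpoint device. The device status record includes the correspondence between the first memory address of the second endpoint device and its status, where the first memory address is the memory address of the second endpoint device in the primary domain and represents the second endpoint device in the primary domain.
Recording unit 802 specifically includes address translation module subunit 802a and status recording subunit 802b. Address translation module subunit 802a is configured to obtain the BDF identifier or second memory address of the second endpoint device carried in the error message or the device probe response message, and to obtain the first memory address of the second endpoint device from the BDF identifier or second memory address. Address translation module subunit 802a stores the mapping between the second memory address and the first memory address of the second endpoint device, and the mapping between the BDF identifier and the second memory address of the second endpoint device. It either converts the second memory address into the first memory address according to the stored mapping between second and first memory addresses, or first obtains the second memory address according to the stored mapping between BDF identifiers and second memory addresses and then converts it into the first memory address according to the stored mapping between second and first memory addresses. Status recording subunit 802b is configured to record in the device status record the correspondence between the first memory address of the second endpoint device and its status, so that the RCEP can determine the status of the second endpoint device from its first memory address. The second memory address is the memory address of the second endpoint device in the extended domain and represents the second endpoint device in the extended domain.
Further, status recording subunit 802b may also be configured to record in the device status record the correspondence between the second memory address of the second endpoint device and its status, or between the BDF identifier of the second endpoint device and its status, so that determining unit 804 can also determine the status of the second endpoint device from its second memory address or BDF identifier.
Specifically, the first memory address of the second endpoint device includes a first memory address for configuration space access, one for MSI access, one for MMIO access, and one for DMA access, and the second memory address includes a second memory address for each of the same four access modes. Address translation module subunit 802a is specifically configured to store the mapping between each kind of second memory address and each kind of first memory address of the second endpoint device, and the mapping between the BDF identifier of the second endpoint device and each kind of second memory address. Using the stored mapping between each kind of second memory address and each kind of first memory address, it obtains the first memory addresses of the second endpoint device for configuration space access, MSI access, MMIO access, and DMA access. Alternatively, it first uses the mapping between the BDF identifier and each kind of second memory address to obtain the four second memory addresses, and then uses the stored mapping between each kind of second memory address and each kind of first memory address to obtain the four first memory addresses. Status recording subunit 802b is specifically configured to record the correspondence between each kind of first memory address of the second endpoint device and its status; it may also record the correspondence between each kind of second memory address and the status, or the correspondence between the BDF identifier and the status.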
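Receiving unit 803 is configured to receive an access request. The access request is either a request from the second endpoint device to access the primary domain or a request from the primary domain to access the second endpoint device. When the access request comes from the primary domain, it carries the first memory address of the second endpoint device. When it comes from the extended domain, it carries the second memory address or the BDF identifier of the second endpoint device.
Determining unit 804 is configured to query the device status record according to the identification information of the second endpoint device in the access request and determine the status of the second endpoint device. The identification information includes one or a combination of the following: the first memory address of the second endpoint device, the second memory address of the second endpoint device, and the BDF identifier of the second endpoint device. Specifically, when the access request comes from the primary domain, determining unit 804 queries the recorded correspondence between the first memory address of the second endpoint device and its status according to the first memory address in the access request, and determines the status. For example, with the MMIO access mode, the access request carries the MMIO first memory address of the second endpoint device. Because the device status record holds the correspondence between each kind of first memory address and the status, determining unit 804 can query the record with the MMIO first memory address from the access request and determine the status. When the access request comes from the extended domain, determining unit 804 queries the device status record according to the second memory address or BDF identifier of the second endpoint device in the access request. If the record does not hold a correspondence between the second memory address or BDF identifier and the status, determining unit 804 obtains the first memory address of the second endpoint device from the second memory address or BDF identifier and determines the status by querying the recorded correspondence between the first memory address and the status. If the record does hold a correspondence between the second memory address or BDF identifier and the status, determining unit 804 queries that correspondence directly. This avoids converting the second memory address or BDF identifier into the first memory address and speeds up determining the device status.
When determining unit 804 determines that the second endpoint device is in the faulty state, processing unit 805 is configured to discard the access request to block communication between the second endpoint device and the primary domain.
When determining unit 804 determines that the second endpoint device is in the faulty state, processing unit 805 is further configured to send a fault isolation message to the CPU so that the CPU of the primary domain stops accessing the second endpoint device of the extended domain, for example by unloading the driver of the faulty second endpoint device or by isolating the I/O path used to access it. The fault isolation message carries the first memory address of the second endpoint device.
When the access request was sent by the primary domain, processing unit 805 is further configured to return a simulated response packet for the access request to the primary domain. Specifically, when the access request from the primary domain to the second endpoint device is a non-posted request, a response must be returned for it; otherwise the primary domain may raise a completion timeout error that causes the computer system to restart. Once the second endpoint device has failed, however, the access request may never reach it, or may reach it without the device being able to produce a normal response. Processing unit 805 can generate a simulated response packet for the access request and return it to the primary domain, which avoids the completion timeout error and the resulting restart. The simulated response packet may be an Unsupported Request (UR) packet or a Completer Abort (CA) packet.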
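Preferably, after determining from a received error message that the second endpoint device is in the faulty state, monitoring unit 801 is further configured to determine whether the error message is a repeated one. If it is repeated, the second endpoint device has already sent an error message to the primary domain for error handling, so the message is discarded. This spares the CPU redundant error handling and safeguards system reliability. If it is not repeated, it is the first error message sent by the second endpoint device, and the RCEP sends it to the CPU so that the CPU performs error handling for the second endpoint device. Determining whether the error message is a repeated error message specifically includes:
obtaining the BDF identifier or second memory address of the second endpoint device carried in the error message; querying the correspondence recorded in the device status record between the second memory address of the second endpoint device and its status to determine the status, or querying the recorded correspondence between the BDF identifier and the status to determine the status, or determining the first memory address of the second endpoint device from the second memory address or BDF identifier carried in the error message and then querying the recorded correspondence between the first memory address and the status to determine the status; if the status of the second endpoint device is determined to be the faulty state, determining that the error message is a repeated error message; if the status is determined to be the non-faulty state, determining that the error message is not a repeated error message.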
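In this embodiment, monitoring unit 801 monitors the status of the second endpoint device of the extended domain, and recording unit 802 establishes a device status record according to the status determined by monitoring unit 801. The device status record includes the correspondence between the first memory address of the second endpoint device and its status. After receiving unit 803 receives an access request between the second endpoint device and the primary domain, determining unit 804 reads the first memory address of the second endpoint device from the access request, or obtains it from the BDF identifier or second memory address in the access request, queries the correspondence between the first memory address and the status in the device status record, and determines the status of the second endpoint device. When determining unit 804 determines that the second endpoint device is in the faulty state, processing unit 805 discards the access request, which blocks communication between the faulty second endpoint device and the primary domain. Processing unit 805 may also send a fault isolation message to the CPU, instructing it to stop accessing the second endpoint device of the extended domain, which prevents the fault from spreading to the primary domain.
Further, in this embodiment recording unit 802 may also record in the device status record the correspondence between the BDF identifier or second memory address of the second endpoint device and its status. Determining unit 804 can then determine the status by querying the device status record directly with the BDF identifier, second memory address, or first memory address carried in the access request. This avoids converting the second memory address or BDF identifier into the first memory address and speeds up determining the status of the second endpoint device.
In addition, in this embodiment, when monitoring unit 801 monitors the status of the second endpoint device and an error message from the device has led, through its type, to the conclusion that the device is in the faulty state, monitoring unit 801 can further determine whether that error message is a repeated one. If it is, the error message is discarded so that it does not reach the primary domain. This spares the CPU redundant error handling and safeguards system reliability.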
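Figure 9 shows fault isolation system 900 provided by an embodiment of the present invention. System 900 includes PCIe primary domain 910 and extended domain 920. The primary domain includes root complex 911, first endpoint device 912, and root complex endpoint device 913; extended domain 920 includes root complex endpoint device 913 and second endpoint device 921. Root complex endpoint device 913 is configured to: monitor the status of second endpoint device 921 of the extended domain; establish a device status record according to the status of second endpoint device 921, where the device status record includes the correspondence between the identification information of second endpoint device 921 and its status; receive an access request, which is either a request from second endpoint device 921 to access primary domain 910 or a request from primary domain 910 to access second endpoint device 921; query the device status record according to the identification information of second endpoint device 921 in the access request and determine the status of second endpoint device 921; and, if the status of second endpoint device 921 is the faulty state, discard the access request to block communication between second endpoint device 921 and primary domain 910.
In this embodiment, the status of the second endpoint device of the extended domain is monitored, and a device status record is established according to that status. After an access request between the second endpoint device and the primary domain is received, the device status record is queried according to the identification information of the second endpoint device in the access request to determine the device's status. If the status is the faulty state, the access request is discarded. This blocks communication between the faulty second endpoint device and the primary domain, prevents the fault from spreading to the primary domain, and safeguards system reliability.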
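Figure 10 is a schematic structural diagram of the fault isolation apparatus provided by an embodiment of the present invention. The apparatus is used in a computer system interconnected by PCIe. The computer system includes a primary domain and an extended domain; the primary domain is formed by a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain is formed by the root complex endpoint device and a second endpoint device.
The fault isolation apparatus may include:
processor 1001, memory 1002, system bus 1004, and communication interface 1005. Processor 1001, memory 1002, and communication interface 1005 are connected by system bus 1004 and communicate with one another through it.
Processor 1001 may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention.
Memory 1002 may be high-speed RAM or non-volatile memory, for example at least one disk storage device.
Memory 1002 is used to store computer-executable instructions 1003. Specifically, computer-executable instructions 1003 may include program code.
When the fault isolation apparatus runs, processor 1001 runs computer-executable instructions 1003 and can perform the following method:
monitoring the status of the second endpoint device of the extended domain;
establishing a device status record according to the status of the second endpoint device, where the device status record includes the correspondence between the identification information of the second endpoint device and its status;
receiving an access request, where the access request is either a request from the second endpoint device to access the primary domain or a request from the primary domain to access the second endpoint device;
querying the device status record according to the identification information of the second endpoint device in the access request, and determining the status of the second endpoint device;
if the status of the second endpoint device is the faulty state, discarding the access request to block communication between the second endpoint device and the primary domain.
The method may specifically also include:
monitoring the status of the second endpoint device of the extended domain;
establishing a device status record according to the status of the second endpoint device, where the device status record includes the correspondence between the first memory address of the second endpoint device and its status;
receiving an access request, where the access request is either a request from the second endpoint device to access the primary domain or a request from the primary domain to access the second endpoint device;
querying the device status record according to the identification information of the second endpoint device in the access request, and determining the status of the second endpoint device;
if the status of the second endpoint device is the faulty state, discarding the access request to block communication between the second endpoint device and the primary domain;
if the status of the second endpoint device is the faulty state, sending a fault isolation message to the CPU, where the fault isolation message instructs the CPU of the primary domain to stop accessing the second endpoint device of the extended domain and carries the first memory address of the second endpoint device;
if the access request was sent by the primary domain, returning a simulated response packet for the access request to the primary domain.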
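A person of ordinary skill in the art will understand that each aspect of the present invention, or a possible implementation of each aspect, may be embodied as a system, a method, or a computer program product. Accordingly, each aspect of the present invention, or a possible implementation of each aspect, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware aspects, all of which are referred to here as a "circuit", "module", or "system". In addition, each aspect of the present invention, or a possible implementation of each aspect, may take the form of a computer program product, which means computer-readable program code stored in a computer-readable medium.
The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or apparatus, or any suitable combination of these, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, and portable read-only memory (CD-ROM).
A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowcharts, and produce an apparatus that implements the functional actions specified in each block, or combination of blocks, of the block diagrams.
The computer-readable program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that in some alternative implementations the functions noted in the steps of the flowcharts or the blocks of the block diagrams may occur out of the order noted in the figures. For example, depending on the functions involved, two steps or two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.
A person of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in the embodiments disclosed here can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the present invention.
The foregoing is only a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.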

Claims

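1. A fault isolation method, characterized in that it is used in a computer system interconnected by Peripheral Component Interconnect Express (PCIe), the computer system including a primary domain and an extended domain, the primary domain including a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain including the root complex endpoint device and a second endpoint device, the method including:
monitoring the status of the second endpoint device of the extended domain;
establishing a device status record according to the status of the second endpoint device, the device status record including the correspondence between identification information of the second endpoint device and the status of the second endpoint device;
receiving an access request, the access request including an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
querying the device status record according to the identification information of the second endpoint device in the access request, and determining the status of the second endpoint device;
if the status of the second endpoint device is a faulty state, discarding the access request to block communication between the second endpoint device and the primary domain.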
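2. The method according to claim 1, characterized in that monitoring the status of the second endpoint device of the extended domain includes:
receiving an error message sent by the second endpoint device, or receiving a device probe response message indicating whether the second endpoint device is present;
determining the status of the second endpoint device according to the error message or the device probe response message.
3. The method according to claim 2, characterized in that the identification information of the second endpoint device includes a first memory address of the second endpoint device, the first memory address being the memory address of the second endpoint device in the primary domain;
establishing the device status record according to the status of the second endpoint device, the device status record including the correspondence between the identification information of the second endpoint device and the status of the second endpoint device, includes:
obtaining the bus/device/function (BDF) identifier or a second memory address of the second endpoint device carried in the error message or the device probe response message, where the second memory address is the memory address of the second endpoint device in the extended domain;
obtaining the first memory address of the second endpoint device according to the BDF identifier or the second memory address;
recording in the device status record the correspondence between the first memory address of the second endpoint device and the status of the second endpoint device.
4. The method according to claim 3, characterized in that the identification information of the second endpoint device further includes the second memory address of the second endpoint device;
after recording in the device status record the correspondence between the first memory address of the second endpoint device and the status of the second endpoint device, the method further includes: recording in the device status record the correspondence between the second memory address of the second endpoint device and the status of the second endpoint device.
5. The method according to claim 3 or 4, characterized in that the identification information of the second endpoint device further includes the BDF identifier of the second endpoint device;
after recording in the device status record the correspondence between the first memory address of the second endpoint device and the status of the second endpoint device, the method further includes: recording in the device status record the correspondence between the BDF identifier of the second endpoint device and the status of the second endpoint device.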
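6. The method according to claim 3, characterized in that obtaining the first memory address of the second endpoint device according to the BDF identifier or the second memory address includes:
converting the second memory address into the first memory address of the second endpoint device according to a stored mapping between the second memory address and the first memory address of the second endpoint device; or
obtaining the second memory address of the second endpoint device according to a stored mapping between the BDF identifier and the second memory address of the second endpoint device, and converting the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address of the second endpoint device.
7. The method according to claim 3 or 6, characterized in that the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for message signaled interrupt access, a first memory address for memory-mapped I/O access, and a first memory address for DMA access;
and recording in the device status record the correspondence between the first memory address of the second endpoint device and the status of the second endpoint device includes:
recording in the device status record the correspondence between each kind of first memory address of the second endpoint device and the status of the second endpoint device.
8. The method according to claim 7, characterized in that the second memory address of the second endpoint device includes a second memory address for configuration space access, a second memory address for message signaled interrupt access, a second memory address for memory-mapped I/O access, and a second memory address for DMA access;
the mapping between the second memory address and the first memory address of the second endpoint device includes the mapping between each kind of second memory address and each kind of first memory address of the second endpoint device;
converting the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address of the second endpoint device includes: obtaining, according to the stored mapping between each kind of second memory address and each kind of first memory address of the second endpoint device, the first memory address for configuration space access, the first memory address for message signaled interrupt access, the first memory address for memory-mapped I/O access, and the first memory address for DMA access of the second endpoint device.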
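9. The method according to any one of claims 3 to 8, the method further including:
sending a fault isolation message to a CPU, the fault isolation message instructing the CPU of the primary domain to stop accessing the second endpoint device of the extended domain, and the fault isolation message carrying the first memory address of the second endpoint device.
10. The method according to any one of claims 3 to 8, where, if the access request is an access request from the primary domain to the second endpoint device, the method further includes:
returning a simulated response packet for the access request to the primary domain.
11. The method according to claim 2, further including, after receiving the error message sent by the second endpoint device:
determining whether the error message is a repeated error message, and if it is, discarding the error message.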
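12. A fault isolation apparatus, characterized in that it is used in a computer system interconnected by Peripheral Component Interconnect Express (PCIe), the computer system including a primary domain and an extended domain, the primary domain including a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain including the root complex endpoint device and a second endpoint device, the apparatus including:
a monitoring unit, configured to monitor the status of the second endpoint device of the extended domain;
a recording unit, configured to establish a device status record according to the status of the second endpoint device, the device status record including the correspondence between identification information of the second endpoint device and the status of the second endpoint device;
a receiving unit, configured to receive an access request, the access request including an access request from the second endpoint device to the primary domain or an access request from the primary domain to the second endpoint device;
a determining unit, configured to query the device status record according to the identification information of the second endpoint device in the access request and determine the status of the second endpoint device;
a processing unit, configured to discard the access request when the status of the second endpoint device is a faulty state, so as to block communication between the second endpoint device and the primary domain.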
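13. The apparatus according to claim 12, characterized in that the monitoring unit is specifically configured to:
receive an error message sent by the second endpoint device, or receive a device probe response message indicating whether the second endpoint device is present, and determine the status of the second endpoint device according to the error message or the device probe response message.
14. The apparatus according to claim 13, characterized in that the identification information of the second endpoint device includes a first memory address of the second endpoint device, the first memory address being the memory address of the second endpoint device in the primary domain;
the recording unit specifically includes:
an address translation module subunit, configured to, when the monitoring unit determines the faulty state of the second endpoint device, obtain the BDF identifier or a second memory address of the second endpoint device carried in the error message or the device probe response message, and obtain the first memory address of the second endpoint device according to the BDF identifier or the second memory address, where the second memory address is the memory address of the second endpoint device in the extended domain;
a status recording subunit, configured to record in the device status record the correspondence between the first memory address of the second endpoint device and the status of the second endpoint device.
15. The apparatus according to claim 14, characterized in that the identification information of the second endpoint device further includes the second memory address of the second endpoint device;
the status recording subunit is further configured to record in the device status record the correspondence between the second memory address of the second endpoint device and the status of the second endpoint device.
16. The apparatus according to claim 14 or 15, characterized in that the identification information of the second endpoint device further includes the BDF identifier of the second endpoint device;
the status recording subunit is further configured to record in the device status record the correspondence between the BDF identifier of the second endpoint device and the status of the second endpoint device.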
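17. The apparatus according to claim 14, characterized in that the address translation module subunit is further configured to store the mapping between the second memory address and the first memory address of the second endpoint device, and to store the mapping between the BDF identifier and the second memory address of the second endpoint device;
the address translation module subunit is specifically configured to convert the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address of the second endpoint device; or
to obtain the second memory address of the second endpoint device according to the stored mapping between the BDF identifier and the second memory address of the second endpoint device, and convert the second memory address into the first memory address of the second endpoint device according to the stored mapping between the second memory address and the first memory address of the second endpoint device.
18. The apparatus according to claim 14 or 17, characterized in that the first memory address of the second endpoint device includes a first memory address for configuration space access, a first memory address for message signaled interrupt access, a first memory address for memory-mapped I/O access, and a first memory address for DMA access;
the address translation module subunit is specifically configured to record the correspondence between each kind of first memory address of the second endpoint device and the status of the second endpoint device.
19. The apparatus according to claim 18, characterized in that the second memory address of the second endpoint device includes a second memory address for configuration space access, a second memory address for message signaled interrupt access, a second memory address for memory-mapped I/O access, and a second memory address for DMA access;
the address translation module subunit is specifically configured to store the mapping between each kind of second memory address and each kind of first memory address of the second endpoint device, and to obtain, according to the stored mapping, the first memory address for configuration space access, the first memory address for message signaled interrupt access, the first memory address for memory-mapped I/O access, and the first memory address for DMA access of the second endpoint device.
20. The apparatus according to any one of claims 14 to 19, where the processing unit is further configured to send a fault isolation message to a CPU, the fault isolation message instructing the CPU of the primary domain to stop accessing the second endpoint device of the extended domain, and the fault isolation message carrying the first memory address of the second endpoint device.
21. The apparatus according to any one of claims 14 to 19, characterized in that the processing unit is further configured to return a simulated response packet for the access request to the primary domain when the access request is an access request from the primary domain to the second endpoint device.
22. The apparatus according to claim 13, characterized in that the monitoring unit is further configured to, after receiving the error message sent by the second endpoint device, determine whether the error message is a repeated error message, and if it is, discard the error message.
23. The apparatus according to any one of claims 12 to 22, characterized in that the fault isolation apparatus is the root complex endpoint device.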
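24. A fault isolation system, the system including a Peripheral Component Interconnect Express (PCIe) primary domain and an extended domain, the primary domain including a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain including the root complex endpoint device and a second endpoint device, the root complex endpoint device being configured to:
monitor the status of the second endpoint device of the extended domain; establish a device status record according to the status of the second endpoint device, the device status record including the correspondence between identification information of the second endpoint device and the status of the second endpoint device; receive an access request sent by the second endpoint device, or an access request from the primary domain to the second endpoint device; query the device status record according to the identification information of the second endpoint device in the access request, and determine the status of the second endpoint device; and, if the status of the second endpoint device is a faulty state, discard the access request to block communication between the second endpoint device and the primary domain.
25. A fault isolation apparatus, characterized in that it is used in a computer system interconnected by Peripheral Component Interconnect Express (PCIe), the computer system including a primary domain and an extended domain, the primary domain including a root complex, a first endpoint device, and a root complex endpoint device, and the extended domain including the root complex endpoint device and a second endpoint device;
the apparatus includes a processor, a memory, a bus, and a communication interface;
the memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus; when the fault isolation apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the fault isolation apparatus performs the fault isolation method according to any one of claims 1 to 11.
26. A computer-readable medium, characterized in that it includes computer-executable instructions, so that when a processor of a computer executes the computer-executable instructions, the computer performs the fault isolation method according to any one of claims 1 to 11.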
Priority Applications (5)

Application number; Publication; Priority date; Filing date; Title
PCT/CN2013/083325; WO2015035574A1 (zh); 2013-09-11; 2013-09-11; Fault handling method, computer system, and apparatus
CN201380001454.4A; CN104756081B (zh); 2013-09-11; 2013-09-11; Fault handling method, computer system, and apparatus
ES13882632.6T; ES2656464T3 (es); 2013-09-11; 2013-09-11; Failure processing method, computer system, and apparatus
EP13882632.6A; EP2869201B1 (en); 2013-09-11; 2013-09-11; Failure processing method, computer system, and apparatus
US14/549,395; US9678826B2 (en); 2013-09-11; 2014-11-20; Fault isolation method, computer system, and apparatus

Related child application: US14/549,395 (continuation), US9678826B2 (en), priority date 2013-09-11, filed 2014-11-20, Fault isolation method, computer system, and apparatus
Publication: WO2015035574A1 (zh), published 2015-03-19
Family ID: 52664929
Country Status (5)

Country Link
US (1) US9678826B2 (zh)
EP (1) EP2869201B1 (zh)
CN (1) CN104756081B (zh)
ES (1) ES2656464T3 (zh)
WO (1) WO2015035574A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762962A (zh) * 2018-05-18 2018-11-06 网易宝有限公司 防止应用异常的方法和装置、存储介质及电子设备

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3264280B1 (en) * 2013-12-31 2019-09-18 Huawei Technologies Co., Ltd. Method and apparatus for extending pcie domain
US9804942B2 (en) * 2014-06-10 2017-10-31 Analog Devices, Inc. Safety node in interconnect data buses
US10114688B2 (en) * 2015-02-16 2018-10-30 Dell Products L.P. System and method for peripheral bus device failure management
ES2726302T3 (es) * 2015-09-21 2019-10-03 Huawei Tech Co Ltd Sistema informático y procedimiento para acceder a un dispositivo de punto extremo del mismo
US20170091013A1 (en) * 2015-09-28 2017-03-30 Netapp, Inc. Pcie error reporting and throttling
US9354967B1 (en) 2015-11-30 2016-05-31 International Business Machines Corporation I/O operation-level error-handling
US9384086B1 (en) 2015-11-30 2016-07-05 International Business Machines Corporation I/O operation-level error checking
CN105824622B (zh) * 2016-03-11 2020-04-24 联想(北京)有限公司 数据处理方法及电子设备
CN108259212B (zh) * 2017-05-25 2019-09-17 新华三技术有限公司 报文处理方法及装置
CN108228374B (zh) * 2017-12-28 2021-08-20 华为技术有限公司 一种设备的故障处理方法、装置及系统
US11614986B2 (en) * 2018-08-07 2023-03-28 Marvell Asia Pte Ltd Non-volatile memory switch with host isolation
US11544000B2 (en) 2018-08-08 2023-01-03 Marvell Asia Pte Ltd. Managed switching between one or more hosts and solid state drives (SSDs) based on the NVMe protocol to provide host storage services
CN109815043B (zh) * 2019-01-25 2022-04-05 华为云计算技术有限公司 故障处理方法、相关设备及计算机存储介质
WO2020236972A2 (en) 2019-05-20 2020-11-26 The Broad Institute, Inc. Non-class i multi-component nucleic acid targeting systems
CN112306913B (zh) * 2019-07-30 2023-09-22 华为技术有限公司 一种端点设备的管理方法、装置及系统
CN111767242B (zh) * 2020-05-28 2022-04-15 西安广和通无线软件有限公司 Pcie设备控制方法、装置、计算机设备和存储介质
WO2023196818A1 (en) 2022-04-04 2023-10-12 The Regents Of The University Of California Genetic complementation compositions and methods

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662788A (zh) * 2012-04-28 2012-09-12 浪潮电子信息产业股份有限公司 一种计算机系统故障诊断决策及处理方法
CN102906707A (zh) * 2010-06-23 2013-01-30 国际商业机器公司 管理与硬件事件关联的处理

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6557121B1 (en) 1997-03-31 2003-04-29 International Business Machines Corporation Method and system for fault isolation for PCI bus errors
US7177971B2 (en) * 2001-08-24 2007-02-13 Intel Corporation General input/output architecture, protocol and related methods to provide isochronous channels
US7134052B2 (en) * 2003-05-15 2006-11-07 International Business Machines Corporation Autonomic recovery from hardware errors in an input/output fabric
JP2006195821A (ja) * 2005-01-14 2006-07-27 Fujitsu Ltd 情報処理システムの制御方法、情報処理システム、ダイレクトメモリアクセス制御装置、プログラム
US7660917B2 (en) * 2006-03-02 2010-02-09 International Business Machines Corporation System and method of implementing multiple internal virtual channels based on a single external virtual channel
US20090063894A1 (en) * 2007-08-29 2009-03-05 Billau Ronald L Autonomic PCI Express Hardware Detection and Failover Mechanism
US7752346B2 (en) * 2007-12-21 2010-07-06 Aprius, Inc. Universal routing in PCI-Express fabrics
US7929919B2 (en) 2008-05-15 2011-04-19 Hewlett-Packard Development Company, L.P. Systems and methods for a PLL-adjusted reference clock
US7992058B2 (en) 2008-12-16 2011-08-02 Hewlett-Packard Development Company, L.P. Method and apparatus for loopback self testing
US7873068B2 (en) * 2009-03-31 2011-01-18 Intel Corporation Flexibly integrating endpoint logic into varied platforms
US8645767B2 (en) * 2010-06-23 2014-02-04 International Business Machines Corporation Scalable I/O adapter function level error detection, isolation, and reporting
US8930609B2 (en) * 2010-08-18 2015-01-06 Intel Corporation Method, apparatus, and system for manageability and secure routing and endpoint access
US8751713B2 (en) * 2011-05-06 2014-06-10 International Business Machines Corporation Executing virtual functions using memory-based data in a PCI express SR-IOV and MR-IOV environment
CN103248737B (zh) * 2012-02-10 2016-08-17 联想(北京)有限公司 Interface display control method, apparatus, and communication terminal
US8806098B1 (en) * 2013-03-15 2014-08-12 Avalanche Technology, Inc. Multi root shared peripheral component interconnect express (PCIe) end point
US9286258B2 (en) * 2013-06-14 2016-03-15 National Instruments Corporation Opaque bridge for peripheral component interconnect express bus systems
US9135200B2 (en) * 2013-06-28 2015-09-15 Futurewei Technologies, Inc. System and method for extended peripheral component interconnect express fabrics

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102906707A (zh) * 2010-06-23 2013-01-30 国际商业机器公司 Managing processing associated with hardware events
CN102662788A (zh) * 2012-04-28 2012-09-12 浪潮电子信息产业股份有限公司 Computer system fault diagnosis, decision-making, and handling method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2869201A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762962A (zh) * 2018-05-18 2018-11-06 网易宝有限公司 Method and apparatus for preventing application anomalies, storage medium, and electronic device

Also Published As

Publication number Publication date
US9678826B2 (en) 2017-06-13
CN104756081A (zh) 2015-07-01
EP2869201A1 (en) 2015-05-06
EP2869201A4 (en) 2015-11-18
US20150082080A1 (en) 2015-03-19
CN104756081B (zh) 2016-08-17
ES2656464T3 (es) 2018-02-27
EP2869201B1 (en) 2017-12-06

Similar Documents

Publication Publication Date Title
WO2015035574A1 (zh) Fault handling method, computer system, and apparatus
JP7118922B2 (ja) Switching device, Peripheral Component Interconnect Express system, and initialization method thereof
US9760455B2 (en) PCIe network system with fail-over capability and operation method thereof
US8443237B2 (en) Storage apparatus and method for controlling the same using loopback diagnosis to detect failure
JP6003350B2 (ja) Monitoring device, information processing device, and monitoring method
US7793139B2 (en) Partial link-down status for virtual Ethernet adapters
US20070260910A1 (en) Method and apparatus for propagating physical device link status to virtual devices
US8924779B2 (en) Proxy responder for handling anomalies in a hardware system
US20080273456A1 (en) Port Trunking Between Switches
WO2012119369A1 (zh) CC-NUMA-based packet processing method, apparatus, and system
US20070174723A1 (en) Sub-second, zero-packet loss adapter failover
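WO2017049433A1 (zh) Computer system and method for accessing an endpoint device in a computer system
CN113300917B (zh) Traffic monitoring method and apparatus for OpenStack tenant networks
TWI773152B (zh) Server and control method applied to the server
JP6773974B2 (ja) Storage control device and storage device
WO2015139327A1 (zh) Failover method, apparatus, and system
JP6777848B2 (ja) Control device and storage device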
US8880957B2 (en) Facilitating processing in a communications environment using stop signaling
WO2016101177A1 (zh) Method for detecting memory of a computer device, and computer device
US9454452B2 (en) Information processing apparatus and method for monitoring device by use of first and second communication protocols
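WO2020244067A1 (zh) Fault detection method and related device
TWI766594B (zh) Server and control method applied to the server
TWI701594B (zh) Remote hardware diagnosis system and diagnosis method
TW202134901A (zh) Server and related control method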

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase Ref document number: 2013882632; Country of ref document: EP
121 EP: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 13882632; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE