CN111694685A - PCIE equipment fault positioning method and device - Google Patents


Info

Publication number
CN111694685A
Authority
CN
China
Prior art keywords
pcie
pcie slot
supply voltage
power supply
slot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010386363.9A
Other languages
Chinese (zh)
Inventor
孙建鑫
王琳慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010386363.9A
Publication of CN111694685A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A PCIE equipment fault positioning method and device are disclosed, wherein a PCIE slot state and a PCIE slot power supply voltage are obtained through a BMC, and the failure of the PCIE equipment is positioned according to the acquired PCIE slot state and PCIE slot power supply voltage. According to the method, the PCIE slot state and the PCIE slot power supply voltage serve as the basis for fault location, and the PCIE slot information is obtained automatically through the BMC, so that no comparison verification is needed. The method is convenient and quick, saves a large amount of manpower and material resources, improves testing efficiency, and achieves high testing accuracy.

Description

PCIE equipment fault positioning method and device
Technical Field
The invention relates to the field of PCIE equipment fault positioning, in particular to a PCIE equipment fault positioning method and a PCIE equipment fault positioning device.
Background
When a PCIE device that supports the PCIE protocol and is still under development is installed on a server that is also under development, the compatibility, stability, and performance of the PCIE device need to be verified. For a PCIE device with indicator lights, a green indicator light is on or flashes at high frequency under normal operating conditions; if the PCIE device cannot operate normally, a red indicator light is on or no light is on. In the research and development stage, judging that the PCIE device is faulty or damaged solely because the red light is on or no light is on is not accurate enough.
Taking a PCIE network card as an example, possible reasons when the indicator light is not on include: network card failure, poor compatibility of the network card and a CPU, network cable failure or the three conditions exist simultaneously. The prior art generally determines the ultimate failure by comparison verification:
(1) using a CPU with good compatibility and a network cable without faults, installing the network card whose indicator light is not on, and observing the indicator light; if the light is still not on, the network card is faulty; if the indicator light is green and normally on, a network card fault is ruled out, and a network cable fault or poor compatibility between the network card and the CPU needs further verification;
(2) on the premise of (1), using a CPU with good compatibility and the network cable that was used with the unlit network card, and observing the indicator light of the network card; if the light is still not on, the network cable is faulty; if the indicator light is green and normally on, a network cable fault is ruled out, and the only remaining possibility is poor compatibility between the network card and the CPU.
On the one hand, the prior-art verification method involves a heavy workload and requires ample material resources and manpower, which is no small cost for server research and development. On the other hand, it is a white box verification method: the cause of the unlit indicator is not sought from the PCIE device and the CPU themselves, and only causes suggested by empirical analysis are considered, so the possibilities may be considered incompletely or the investigation may head in the wrong direction.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and an apparatus for locating a failure of a PCIE device, where PCIE slot information is acquired through a BMC, a failure type of the PCIE device is automatically determined, manpower and material resources are saved, and a location result is accurate.
The technical scheme of the invention is as follows: a PCIE equipment fault positioning method comprises the following steps:
acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
Further, the BMC is configured with a PCIE slot state sensor;
the specific steps for acquiring the PCIE slot state by the BMC are as follows:
the BIOS collects PCIE slot states and reports the PCIE slot states to the BMC;
and the BMC assigns the PCIE slot state sensor according to the PCIE slot state reported by the BIOS.
Further, the BMC is configured with a PCIE slot power supply voltage sensor;
the specific step of acquiring the power supply voltage of the PCIE slot by the BMC is as follows:
the BMC reads power supply voltage of the PCIE slot from the server mainboard;
and assigning a value to the PCIE slot power supply voltage sensor according to the read PCIE slot power supply voltage.
Furthermore, the mainboard is connected with the BMC through an LPC bus.
Further, the PCIE device fault is located according to the obtained PCIE slot state and the PCIE slot power supply voltage, which specifically includes:
judging whether the PCIE slot state is normal or not;
if not, judging that the PCIE slot has a fault; if the state is normal, detecting the power supply voltage of the PCIE slot;
if the power supply voltage of the PCIE slot is an invalid value, detecting identification information of the PCIE equipment under the system;
if the PCIE equipment in the system cannot be identified, the PCIE equipment is judged to be in fault;
if the power supply voltage of the PCIE slot is 0, detecting the identification information of the PCIE equipment under the system;
if the PCIE equipment in the system is normally identified, the PCIE equipment is judged to have no fault, but the PCIE equipment is not compatible with the CPU.
Further, when the PCIE device is a PCIE network card,
if the PCIE slot state is judged to be normal and the PCIE slot power supply voltage is detected to be neither 0 nor an invalid value, the network cable fault is judged.
Further, the PCIE slot state is judged by reading the assignment of the PCIE slot state sensor.
Further, the power supply voltage of the PCIE slot is detected by reading the assignment of the power supply voltage sensor of the PCIE slot.
The technical scheme of the invention also comprises a PCIE equipment fault positioning device, which comprises,
an information acquisition module: acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
a fault positioning module: and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
Further, the fault location module includes,
a PCIE slot state judging unit: judging whether the PCIE slot state is normal or not;
PCIE slot supply voltage detection unit: detecting a PCIE slot power supply voltage;
PCIE equipment identification information detection unit: detecting PCIE equipment identification information under a system;
a fault location unit: and positioning the failure of the PCIE equipment according to the judgment result of the PCIE slot state judgment unit, the detection result of the PCIE slot power supply voltage detection unit and the detection result of the PCIE equipment identification information detection unit.
According to the method and the device for locating the fault of the PCIE equipment, the PCIE slot state and the PCIE slot power supply voltage are obtained to serve as the fault locating judgment basis, the PCIE slot information is automatically obtained through the BMC, the comparison and verification are not needed, convenience and rapidness are achieved, a large amount of manpower and material resources are saved, the testing efficiency is improved, and the testing accuracy is high.
Drawings
FIG. 1 is a schematic flowchart of the method according to the first embodiment of the present invention;
FIG. 2 is a flowchart of a specific implementation of locating a PCIE device failure according to the first embodiment of the present invention;
FIG. 3 is a schematic block diagram of the structure according to the second embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings by way of specific examples. The examples are illustrative of the present invention, and the present invention is not limited to the following embodiments.
Example one
As shown in fig. 1, this embodiment provides a method for locating a failure of a PCIE device, including the following steps:
S1, acquiring the PCIE slot state and the PCIE slot power supply voltage through the BMC;
S2, locating the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
In the field of servers, devices using the PCIE (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) protocol are pluggable and are not integrated on the motherboard, so the Baseboard Management Controller (BMC) cannot be connected directly to a PCIE device in hardware, still less can specific information about the PCIE device be obtained through the BMC. To locate a PCIE device failure automatically, in this embodiment the BMC acquires PCIE slot information, which includes the PCIE slot state and the PCIE slot power supply voltage, and the PCIE device failure is determined indirectly from that information. The failure can thus be located automatically while the accuracy of the location is maintained.
In this embodiment, the BMC is configured with a PCIE slot state sensor, and after acquiring the PCIE slot state, the BMC assigns a value to the PCIE slot state sensor to provide a basis for subsequently locating a fault. The steps by which the BMC acquires the PCIE slot state are as follows:
the BIOS collects PCIE slot states and reports the PCIE slot states to the BMC;
and the BMC assigns the PCIE slot state sensor according to the PCIE slot state reported by the BIOS.
It should be noted that the assignment of the PCIE slot state sensor may take two alarm values: 00h-Fault Status asserted and 01h-Identify Status asserted. 00h-Fault Status asserted indicates that the state of the PCIE slot is abnormal, and 01h-Identify Status asserted indicates that the state of the PCIE slot is normal.
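The two sensor values described above can be illustrated with a minimal Python sketch. The names `SlotStatus` and `slot_is_normal` are hypothetical and do not come from the patent or from any BMC firmware; only the codes 00h and 01h and their meanings are taken from the text.

```python
from enum import IntEnum


class SlotStatus(IntEnum):
    """Assignments of the PCIE slot state sensor (names are illustrative)."""
    FAULT_STATUS_ASSERTED = 0x00     # slot state is abnormal
    IDENTIFY_STATUS_ASSERTED = 0x01  # slot state is normal


def slot_is_normal(sensor_value: int) -> bool:
    """Return True when the sensor reports 01h-Identify Status asserted.

    Any value other than 00h or 01h raises ValueError, since the text
    defines only these two assignments.
    """
    return SlotStatus(sensor_value) is SlotStatus.IDENTIFY_STATUS_ASSERTED
```

Rejecting undefined values, rather than treating them as abnormal, is a design choice of this sketch and is not stated in the source.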
In this embodiment, the BMC is further configured with a PCIE slot power supply voltage sensor, and after acquiring the PCIE slot power supply voltage, the BMC assigns a value to the PCIE slot power supply voltage sensor to provide a basis for subsequently locating a fault. The steps by which the BMC acquires the PCIE slot power supply voltage are as follows:
the BMC reads power supply voltage of the PCIE slot from the server mainboard;
and assigning a value to the PCIE slot power supply voltage sensor according to the read PCIE slot power supply voltage.
It should be noted that the server motherboard supplies the PCIE slot with 12 V and 3.3 V, and that the BMC in this embodiment acquires this supply voltage from the server motherboard, which is connected to the BMC through the LPC bus and sends information to the BMC. If the total supply voltage X is in the range 0-15.3, null is sent to the BMC, and the BMC assigns the PCIE slot power supply voltage sensor the value N/A, which indicates that the PCIE device is not powered on. If X is in the range 15.3-16.3, 0 is sent to the BMC, and the BMC assigns the sensor the value 0, which indicates that the PCIE device is powered on. If X is greater than 16.3, Y (Y = X - 16.3) is sent to the BMC, and the BMC assigns the sensor the value Y. When an external PCIE device is present in a PCIE slot, the supply voltage of the slot may increase; if that device cannot operate normally, however, the additional supply is insufficient and does not exceed 1 V. For this reason a total value X in the range 15.3-16.3 is reported to the BMC as 0, and a value above 16.3 is reported as Y (Y = X - 16.3).
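The mapping from the total supply voltage X to the sensor assignment can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `encode_supply_voltage` is hypothetical, N/A is modelled as `None`, and because the text lists overlapping range boundaries, the sketch assumes that exactly 15.3 and exactly 16.3 both fall in the "0" range.

```python
from typing import Optional

BASELINE_V = 15.3   # 12 V + 3.3 V supplied by the motherboard
UPPER_V = 16.3      # baseline plus the 1 V margin described in the text


def encode_supply_voltage(total_v: float) -> Optional[float]:
    """Map the total supply voltage X to the value assigned to the sensor.

    Returns None for N/A (device not powered on), 0.0 when X lies between
    the baseline and the upper bound, and Y = X - 16.3 above that.
    Boundary handling is an assumption of this sketch.
    """
    if total_v < BASELINE_V:
        return None
    if total_v <= UPPER_V:
        return 0.0
    return total_v - UPPER_V
```

For example, a reading of 10.0 maps to N/A, 15.8 maps to 0, and 17.3 maps to roughly 1.0 (subject to floating-point rounding).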
As shown in fig. 2, in this embodiment, the method for locating a failure of a PCIE device according to the obtained PCIE slot state and the PCIE slot power supply voltage specifically includes the following steps:
S201, judging whether the PCIE slot state is normal or not;
S202, if the PCIE slot state is abnormal, judging that the PCIE slot is in fault; if the state is normal, detecting the power supply voltage of the PCIE slot;
S203, if the power supply voltage of the PCIE slot is an invalid value, detecting the identification information of the PCIE equipment in the system;
S204, if the PCIE equipment in the system cannot be identified, the PCIE equipment is judged to be in fault;
S205, if the power supply voltage of the PCIE slot is 0, detecting the identification information of the PCIE equipment in the system;
S206, if the PCIE equipment in the system is normally identified, the PCIE equipment is judged to have no fault, but the PCIE equipment is incompatible with the CPU.
In addition, when the PCIE device is a PCIE network card, if the PCIE slot state is determined to be normal and the PCIE slot power supply voltage is detected to be neither 0 nor an invalid value, it is determined that the network cable is faulty.
It should be noted that, in this embodiment, the PCIE slot state is determined by reading the assignment of the PCIE slot state sensor, and the PCIE slot power supply voltage is detected by reading the assignment of the PCIE slot power supply voltage sensor.
Specifically, if the PCIE slot state sensor is 00h-Fault Status asserted, the PCIE slot has failed; if it is 01h-Identify Status asserted, the PCIE slot has no failure.
When the PCIE slot state sensor is 01h-Identify Status asserted, the assignment of the PCIE slot power supply voltage sensor is read. If the value is N/A and the PCIE device cannot be identified in the system, the PCIE device itself is determined to be faulty. If the value is 0 and the PCIE device is identified normally in the system, the PCIE device is determined to have no fault but to be incompatible with the CPU, so that it can only be identified. When the PCIE device is a PCIE network card, if the value of the PCIE slot power supply voltage sensor is neither 0 nor N/A, it is determined that the network card has no fault, the network card is compatible with the CPU, and the network cable is faulty.
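The decision procedure of steps S201 to S206, together with the network card case, can be sketched as a single function. The names `Diagnosis` and `locate_fault` are hypothetical. The text does not say what to conclude for combinations it does not list (for example, a value of N/A while the device is still identified), so the sketch returns `UNDETERMINED` for those; that outcome is an assumption, not part of the described method.

```python
from enum import Enum
from typing import Optional


class Diagnosis(Enum):
    SLOT_FAULT = "PCIE slot fault"
    DEVICE_FAULT = "PCIE device fault"
    CPU_INCOMPATIBLE = "device not faulty, incompatible with CPU"
    NETWORK_CABLE_FAULT = "network cable fault"
    UNDETERMINED = "not covered by the described rules"  # assumption


def locate_fault(slot_normal: bool,
                 voltage: Optional[float],
                 device_identified: bool,
                 is_network_card: bool = False) -> Diagnosis:
    """Apply S201-S206; voltage None stands for the sensor value N/A."""
    if not slot_normal:                       # S201/S202
        return Diagnosis.SLOT_FAULT
    if voltage is None:                       # S203/S204
        if not device_identified:
            return Diagnosis.DEVICE_FAULT
        return Diagnosis.UNDETERMINED
    if voltage == 0:                          # S205/S206
        if device_identified:
            return Diagnosis.CPU_INCOMPATIBLE
        return Diagnosis.UNDETERMINED
    if is_network_card:                       # neither 0 nor N/A
        return Diagnosis.NETWORK_CABLE_FAULT
    return Diagnosis.UNDETERMINED
```

A call such as `locate_fault(True, 0, True)` yields `Diagnosis.CPU_INCOMPATIBLE`, matching the case in which the device is identified but the sensor reads 0.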
Example two
As shown in fig. 3, the present embodiment provides a PCIE device failure location apparatus, which includes the following functional modules.
The information acquisition module 11: acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
the fault location module 22: and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
The device acquires the PCIE slot information through the BMC, wherein the PCIE slot information comprises the PCIE slot state and the PCIE slot power supply voltage, so that the failure of the PCIE device is indirectly judged, the failure can be automatically positioned, and the positioning accuracy is ensured.
The fault location module 22 includes the following functional units to implement fault location.
A PCIE slot state judging unit: judging whether the PCIE slot state is normal or not;
PCIE slot supply voltage detection unit: detecting a PCIE slot power supply voltage;
PCIE equipment identification information detection unit: detecting PCIE equipment identification information under a system;
a fault location unit: and positioning the failure of the PCIE equipment according to the judgment result of the PCIE slot state judgment unit, the detection result of the PCIE slot power supply voltage detection unit and the detection result of the PCIE equipment identification information detection unit.
Specifically, when the PCIE slot state determination unit determines that the PCIE slot is abnormal according to the PCIE slot state obtained by the BMC, the failure location unit determines that the PCIE slot has a failure.
When the PCIE slot state judging unit judges that the PCIE slot is normal according to the PCIE slot state obtained by the BMC: (1) if the PCIE slot power supply voltage detection unit detects that the PCIE slot power supply voltage is an invalid value, and the PCIE equipment identification information detection unit detects that the PCIE equipment cannot be identified in the system, the fault location unit judges that the PCIE equipment is in fault; (2) if the PCIE slot power supply voltage detection unit detects that the PCIE slot power supply voltage is 0, and the PCIE equipment identification information detection unit detects that the PCIE equipment is normally identified in the system, the fault location unit judges that the PCIE equipment has no fault but is not compatible with the CPU; (3) in the case where the PCIE equipment is a network card, if the PCIE slot power supply voltage detection unit detects that the PCIE slot power supply voltage is neither 0 nor an invalid value, a network cable fault is determined.
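The module structure of this embodiment can be sketched as two cooperating classes. This is only one possible arrangement under stated assumptions: the class and field names are hypothetical, the sensor reader and the device identification check are injected as plain callables because the text does not specify how they are obtained, and the "undetermined" result for unlisted combinations is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlotInfo:
    slot_normal: bool          # from the PCIE slot state sensor
    voltage: Optional[float]   # from the supply voltage sensor; None = N/A


class InformationAcquisitionModule:
    """Obtains slot state and supply voltage (reader is a stand-in for the BMC)."""

    def __init__(self, read_sensors: Callable[[], SlotInfo]) -> None:
        self._read_sensors = read_sensors

    def acquire(self) -> SlotInfo:
        return self._read_sensors()


class FaultLocationModule:
    """Combines the judging, detection, and fault location units."""

    def __init__(self, device_identified: Callable[[], bool],
                 is_network_card: bool = False) -> None:
        self._device_identified = device_identified
        self._is_network_card = is_network_card

    def locate(self, info: SlotInfo) -> str:
        if not info.slot_normal:
            return "PCIE slot fault"
        identified = self._device_identified()
        if info.voltage is None and not identified:
            return "PCIE device fault"
        if info.voltage == 0 and identified:
            return "device incompatible with CPU"
        if self._is_network_card and info.voltage not in (None, 0):
            return "network cable fault"
        return "undetermined"  # combination not covered by the text
```

Injecting the readers as callables keeps the sketch testable without real hardware; an actual device would replace them with BMC sensor reads and an operating-system device query.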
The above disclosure is only for the preferred embodiments of the present invention, but the present invention is not limited thereto, and any non-inventive changes that can be made by those skilled in the art and several modifications and amendments made without departing from the principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A PCIE equipment fault positioning method is characterized by comprising the following steps:
acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
2. The PCIE device fault locating method of claim 1, wherein the BMC is configured with a PCIE slot status sensor;
the specific steps for acquiring the PCIE slot state by the BMC are as follows:
the BIOS collects PCIE slot states and reports the PCIE slot states to the BMC;
and the BMC assigns the PCIE slot state sensor according to the PCIE slot state reported by the BIOS.
3. The PCIE device fault location method of claim 2, wherein the BMC is configured with a PCIE slot supply voltage sensor;
the specific step of acquiring the power supply voltage of the PCIE slot by the BMC is as follows:
the BMC reads power supply voltage of the PCIE slot from the server mainboard;
and assigning a value to the PCIE slot power supply voltage sensor according to the read PCIE slot power supply voltage.
4. The PCIE device fault locating method of claim 3, wherein a motherboard is connected with the BMC through an LPC bus.
5. The method according to claim 3 or 4, wherein the step of locating the PCIE device fault according to the obtained PCIE slot state and the PCIE slot supply voltage specifically comprises:
judging whether the PCIE slot state is normal or not;
if not, judging that the PCIE slot has a fault; if the state is normal, detecting the power supply voltage of the PCIE slot;
if the power supply voltage of the PCIE slot is an invalid value, detecting identification information of the PCIE equipment under the system;
if the PCIE equipment in the system cannot be identified, the PCIE equipment is judged to be in fault;
if the power supply voltage of the PCIE slot is 0, detecting the identification information of the PCIE equipment under the system;
if the PCIE equipment in the system is normally identified, the PCIE equipment is judged to have no fault, but the PCIE equipment is not compatible with the CPU.
6. The method of claim 5, wherein when the PCIE device is a PCIE network card,
if the PCIE slot state is judged to be normal and the PCIE slot power supply voltage is detected to be neither 0 nor an invalid value, the network cable fault is judged.
7. The method of claim 5, wherein the PCIE device status is determined by reading assignments of PCIE slot status sensors.
8. The method of claim 5, wherein the PCIE device power supply voltage is detected by reading an assignment of a PCIE slot power supply voltage sensor.
9. A PCIE equipment fault locating device is characterized in that it includes,
an information acquisition module: acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
a fault positioning module: and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
10. The PCIE device fault location apparatus of claim 9, wherein the fault location module includes,
a PCIE slot state judging unit: judging whether the PCIE slot state is normal or not;
PCIE slot supply voltage detection unit: detecting a PCIE slot power supply voltage;
PCIE equipment identification information detection unit: detecting PCIE equipment identification information under a system;
a fault location unit: and positioning the failure of the PCIE equipment according to the judgment result of the PCIE slot state judgment unit, the detection result of the PCIE slot power supply voltage detection unit and the detection result of the PCIE equipment identification information detection unit.
CN202010386363.9A 2020-05-09 2020-05-09 PCIE equipment fault positioning method and device Withdrawn CN111694685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010386363.9A CN111694685A (en) 2020-05-09 2020-05-09 PCIE equipment fault positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010386363.9A CN111694685A (en) 2020-05-09 2020-05-09 PCIE equipment fault positioning method and device

Publications (1)

Publication Number Publication Date
CN111694685A true CN111694685A (en) 2020-09-22

Family

ID=72477488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010386363.9A Withdrawn CN111694685A (en) 2020-05-09 2020-05-09 PCIE equipment fault positioning method and device

Country Status (1)

Country Link
CN (1) CN111694685A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356644A (en) * 2022-03-18 2022-04-15 阿里巴巴(中国)有限公司 PCIE equipment fault processing method and device
CN117369906A (en) * 2023-12-07 2024-01-09 成都市楠菲微电子有限公司 Pcie verification platform, method and device, storage medium and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356644A (en) * 2022-03-18 2022-04-15 阿里巴巴(中国)有限公司 PCIE equipment fault processing method and device
CN117369906A (en) * 2023-12-07 2024-01-09 成都市楠菲微电子有限公司 Pcie verification platform, method and device, storage medium and electronic equipment
CN117369906B (en) * 2023-12-07 2024-02-09 成都市楠菲微电子有限公司 Pcie verification platform, method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
US7356431B2 (en) Method for testing an input/output functional board
CN112286709B (en) Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults
CN104850485A (en) BMC based method and system for remote diagnosis of server startup failure
CN106055438A (en) Method and system for rapidly locating anomaly of memory banks on mainboard
CN111694685A (en) PCIE equipment fault positioning method and device
US9274174B2 (en) Processor TAP support for remote services
CN106407059A (en) Server node testing system and method
CN106681878A (en) Method for testing PCIE channel bandwidth
CN110618909B (en) Fault positioning method, device, equipment and storage medium based on I2C communication
CN211505789U (en) PCIE board card testing arrangement
CN112000535A (en) SAS Expander card-based hard disk abnormity identification method and processing method
US9158646B2 (en) Abnormal information output system for a computer system
US8391162B2 (en) Apparatus and method for testing SMNP cards
CN113868058A (en) Peripheral component high-speed interconnection equipment fault detection method and device and server
CN210666750U (en) Memory fault alarm device
CN115934446A (en) Self-checking method, server, equipment and storage medium
CN104571098B (en) Long-range self-diagnosing method based on Atom platforms
CN114924998B (en) Memory information reading device and method, computing device motherboard, device and medium
CN111487487A (en) Device ID identification method and system based on ADC sampling and electronic device
CN101464828A (en) Evaluation method for main unit and its status
CN112579366A (en) Hard disk in-place detection system
CN203573310U (en) System for detecting installation fault of memory bank
CN101452417A (en) Monitor method and monitor device thereof
CN102053888A (en) Self-checking method and system for arithmetic device
CN213241134U (en) Production detection equipment for solid state disk

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200922
