CN111694685A - PCIE equipment fault positioning method and device - Google Patents
- Publication number
- CN111694685A
- Authority
- CN
- China
- Prior art keywords
- pcie
- pcie slot
- supply voltage
- power supply
- slot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
A PCIE device fault locating method and device are disclosed. A PCIE slot state and a PCIE slot power supply voltage are obtained through a BMC, and the PCIE device fault is located according to the obtained PCIE slot state and PCIE slot power supply voltage. The method uses the PCIE slot state and the PCIE slot power supply voltage as the basis for fault location and obtains the PCIE slot information automatically through the BMC, so no comparison verification is needed. This is convenient and fast, saves a large amount of manpower and material resources, improves testing efficiency, and gives high testing accuracy.
Description
Technical Field
The invention relates to the field of PCIE equipment fault positioning, in particular to a PCIE equipment fault positioning method and a PCIE equipment fault positioning device.
Background
When a PCIE device (a device supporting the PCIE protocol) that is still in development is installed on a server that is also in development, the compatibility, stability, and performance of the PCIE device need to be verified. For PCIE devices with indicator lights, a green indicator is steadily lit or flashes at high frequency under normal operating conditions; if the PCIE device cannot operate normally, a red indicator lights up or no indicator lights at all. In the research and development stage, judging that the PCIE device is faulty or damaged solely because the red light is on or no light is on is not accurate enough.
Taking a PCIE network card as an example, possible reasons for the indicator light not being on include: a network card fault, poor compatibility between the network card and the CPU, a network cable fault, or any combination of these. The prior art generally determines the actual fault by comparison verification:
(1) using a CPU known to be compatible and a network cable known to be fault-free, observe the indicator light of the network card that did not light up. If it still does not light, the network card is faulty; if the indicator is steady green, a network card fault is ruled out, and a network cable fault or poor compatibility between the network card and the CPU must be verified further;
(2) on the premise of (1), using a CPU known to be compatible and the network cable that was in use when the indicator did not light, observe the indicator light of the network card. If it still does not light, the network cable is faulty; if the indicator is steady green, a network cable fault is ruled out, leaving only the last possibility: poor compatibility between the network card and the CPU.
On the one hand, this prior-art verification method involves a heavy workload and requires ample material resources and manpower, which is no small cost for server research and development. On the other hand, it is a white box verification method: the cause of the indicator not lighting is not sought from the PCIE device and the CPU themselves, and only causes suggested by empirical analysis are considered, so a possible cause may be overlooked or the investigation may head in the wrong direction.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and an apparatus for locating a failure of a PCIE device, where PCIE slot information is acquired through a BMC, a failure type of the PCIE device is automatically determined, manpower and material resources are saved, and a location result is accurate.
The technical scheme of the invention is as follows: a PCIE equipment fault positioning method comprises the following steps:
acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
Further, the BMC is configured with a PCIE slot state sensor;
the specific steps for acquiring the PCIE slot state by the BMC are as follows:
the BIOS collects PCIE slot states and reports the PCIE slot states to the BMC;
and the BMC assigns the PCIE slot state sensor according to the PCIE slot state reported by the BIOS.
Further, the BMC is configured with a PCIE slot power supply voltage sensor;
the specific step of acquiring the power supply voltage of the PCIE slot by the BMC is as follows:
the BMC reads power supply voltage of the PCIE slot from the server mainboard;
and assigning a value to the PCIE slot power supply voltage sensor according to the read PCIE slot power supply voltage.
Furthermore, the mainboard is connected with the BMC through an LPC bus.
Further, the PCIE device fault is located according to the obtained PCIE slot state and the PCIE slot power supply voltage, which specifically includes:
judging whether the PCIE slot state is normal or not;
if not, judging that the PCIE slot has a fault; if the state is normal, detecting the power supply voltage of the PCIE slot;
if the power supply voltage of the PCIE slot is an invalid value, detecting identification information of the PCIE equipment under the system;
if the PCIE equipment in the system cannot be identified, the PCIE equipment is judged to be in fault;
if the power supply voltage of the PCIE slot is 0, detecting the identification information of the PCIE equipment under the system;
if the PCIE equipment in the system is normally identified, the PCIE equipment is judged to have no fault, but the PCIE equipment is not compatible with the CPU.
Further, when the PCIE device is a PCIE network card,
if the PCIE slot state is judged to be normal and the PCIE slot power supply voltage is detected to be neither 0 nor an invalid value, the network cable fault is judged.
Further, the PCIE slot state is judged by reading the assignment of the PCIE slot state sensor.
Further, the power supply voltage of the PCIE slot is detected by reading the assignment of the power supply voltage sensor of the PCIE slot.
The technical scheme of the invention also comprises a PCIE equipment fault positioning device, which comprises,
an information acquisition module: acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
a fault positioning module: and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
Further, the fault location module includes,
a PCIE slot state judging unit: judging whether the PCIE slot state is normal or not;
PCIE slot supply voltage detection unit: detecting a PCIE slot power supply voltage;
PCIE equipment identification information detection unit: detecting PCIE equipment identification information under a system;
a fault location unit: and positioning the failure of the PCIE equipment according to the judgment result of the PCIE slot state judgment unit, the detection result of the PCIE slot power supply voltage detection unit and the detection result of the PCIE equipment identification information detection unit.
According to the method and the device for locating the fault of the PCIE equipment, the PCIE slot state and the PCIE slot power supply voltage are obtained to serve as the fault locating judgment basis, the PCIE slot information is automatically obtained through the BMC, the comparison and verification are not needed, convenience and rapidness are achieved, a large amount of manpower and material resources are saved, the testing efficiency is improved, and the testing accuracy is high.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a specific implementation of PCIE device fault locating according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of the structure of the second embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings by way of specific examples. The following embodiments are illustrative, and the present invention is not limited to them.
Example one
As shown in fig. 1, this embodiment provides a method for locating a failure of a PCIE device, including the following steps:
S1, acquiring the PCIE slot state and the PCIE slot power supply voltage through the BMC;
S2, locating the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
In the server field, devices using the PCIE (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) protocol are pluggable and are not integrated on the motherboard, so the Baseboard Management Controller (BMC) cannot be connected directly to a PCIE device in hardware, let alone obtain specific information about the PCIE device. To locate PCIE device faults automatically, in this embodiment the BMC acquires PCIE slot information, comprising the PCIE slot state and the PCIE slot power supply voltage, and uses it to judge the PCIE device fault indirectly. The fault can thus be located automatically while the accuracy of the location is maintained.
In this embodiment, the BMC is configured with a PCIE slot state sensor. After acquiring the PCIE slot state, the BMC assigns a value to the PCIE slot state sensor, which provides a basis for subsequently locating a fault. The BMC acquires the PCIE slot state as follows:
the BIOS collects PCIE slot states and reports the PCIE slot states to the BMC;
and the BMC assigns the PCIE slot state sensor according to the PCIE slot state reported by the BIOS.
It should be noted that the assignment of the PCIE slot state sensor may take two alarm values: 00h-Fault Status asserted and 01h-Identify Status asserted. 00h-Fault Status asserted indicates that the state of the PCIE slot is abnormal, and 01h-Identify Status asserted indicates that the state of the PCIE slot is normal.
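The reporting path and the two alarm values can be sketched in Python. This is a minimal illustration, not the BMC firmware of the embodiment: the class `BmcSlotSensors` and its method names are hypothetical, and only the two values 00h and 01h come from the text.

```python
from enum import IntEnum


class SlotStateSensor(IntEnum):
    """Alarm values of the PCIE slot state sensor, as listed in the text."""
    FAULT_STATUS_ASSERTED = 0x00     # slot state is abnormal
    IDENTIFY_STATUS_ASSERTED = 0x01  # slot state is normal


class BmcSlotSensors:
    """Hypothetical in-memory stand-in for the BMC's slot state sensors."""

    def __init__(self) -> None:
        self._state = {}  # slot number -> SlotStateSensor

    def report_from_bios(self, slot: int, slot_ok: bool) -> None:
        # The BIOS reports the collected slot state; the BMC assigns the sensor.
        self._state[slot] = (
            SlotStateSensor.IDENTIFY_STATUS_ASSERTED
            if slot_ok
            else SlotStateSensor.FAULT_STATUS_ASSERTED
        )

    def slot_is_normal(self, slot: int) -> bool:
        # Raises KeyError if the BIOS never reported this slot.
        return self._state[slot] is SlotStateSensor.IDENTIFY_STATUS_ASSERTED
```

Keeping the raw sensor value in an enum lets later fault-location code compare against named states rather than bare hexadecimal numbers.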
In this embodiment, the BMC is further configured with a PCIE slot power supply voltage sensor. After acquiring the PCIE slot power supply voltage, the BMC assigns a value to the PCIE slot power supply voltage sensor, which provides a basis for subsequently locating a fault. The BMC acquires the PCIE slot power supply voltage as follows:
the BMC reads power supply voltage of the PCIE slot from the server mainboard;
and assigning a value to the PCIE slot power supply voltage sensor according to the read PCIE slot power supply voltage.
It should be noted that the server motherboard supplies the PCIE slot with 12 V and 3.3 V, and the BMC in this embodiment acquires this supply voltage. The server motherboard is connected to the BMC through the LPC bus and sends information to the BMC over it. If the total value X of the power supply voltage is in the range of 0-15.3, null is sent to the BMC, and the BMC assigns the PCIE slot power supply voltage sensor the value N/A, which indicates that the PCIE device is not powered on. If the total value X is in the range of 15.3-16.3, 0 is sent to the BMC, and the BMC assigns the sensor the value 0, which indicates that the PCIE device is powered on. If the total value X is greater than 16.3, Y (Y = X-16.3) is sent to the BMC, and the BMC assigns the sensor the value Y. When a PCIE device is plugged into a PCIE slot, the power supply voltage of the slot may increase; if the device nevertheless cannot operate normally, the additional supply is insufficient, with the increase not exceeding 1 V. This is why 0 is sent to the BMC when the total value X is in the range of 15.3-16.3, and Y (Y = X-16.3) is sent when X is above 16.3.
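The mapping from the total supply voltage X to the sensor assignment can be written as a small Python function. This is a sketch under stated assumptions: the function name is hypothetical, voltages are held as integer millivolts to avoid floating-point rounding at the thresholds, and because the two stated ranges share the endpoint 15.3, the sketch assumes that exactly 15.3 V maps to N/A.

```python
from typing import Optional

NOMINAL_MV = 12_000 + 3_300    # 12 V + 3.3 V rails = 15.3 V total
ZERO_BAND_UPPER_MV = 16_300    # upper edge of the band reported as 0


def encode_slot_voltage(total_mv: int) -> Optional[int]:
    """Map the total supply voltage X (in mV) to the sensor assignment.

    Returns None for N/A, 0 for the 15.3-16.3 V band, else X - 16.3 V in mV.
    """
    if total_mv < 0:
        raise ValueError("voltage must be non-negative")
    if total_mv <= NOMINAL_MV:          # assumption: exactly 15.3 V counts as N/A
        return None
    if total_mv <= ZERO_BAND_UPPER_MV:  # increase of at most 1 V
        return 0
    return total_mv - ZERO_BAND_UPPER_MV
```

For example, a total of 17.3 V lies 1 V above the 16.3 V edge, so the function returns 1000 (mV), matching Y = X-16.3.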
As shown in fig. 2, in this embodiment, the method for locating a failure of a PCIE device according to the obtained PCIE slot state and the PCIE slot power supply voltage specifically includes the following steps:
S201, judging whether the PCIE slot state is normal or not;
S202, if the PCIE slot is abnormal, judging that the PCIE slot is in fault; if the state is normal, detecting the power supply voltage of the PCIE slot;
S203, if the power supply voltage of the PCIE slot is an invalid value, detecting the identification information of the PCIE equipment in the system;
S204, if the PCIE equipment in the system can not be identified, the PCIE equipment is judged to be in fault;
S205, if the power supply voltage of the PCIE slot is 0, detecting the identification information of the PCIE equipment in the system;
S206, if the PCIE equipment in the system is normally identified, the PCIE equipment is judged to have no fault, but the PCIE equipment is incompatible with the CPU.
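Steps S203 to S206 depend on whether the PCIE device is identified under the system, but the text does not say how that check is made. One possibility on a Linux host, offered purely as an assumption, is to look for the device's entry in sysfs. The function name and the example address are hypothetical; the `/sys/bus/pci/devices/<address>/vendor` file is part of the standard Linux sysfs layout.

```python
from pathlib import Path


def device_identified(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Return True if the PCI function at `bdf` (e.g. "0000:3b:00.0") is enumerated."""
    vendor_file = Path(sysfs_root) / bdf / "vendor"
    try:
        # The file holds the vendor ID as hex text such as "0x8086".
        vendor = int(vendor_file.read_text().strip(), 16)
    except (OSError, ValueError):
        return False          # no sysfs entry, or an unreadable ID
    return vendor != 0xFFFF   # 0xFFFF is the PCI "no device present" vendor ID
```

The `sysfs_root` parameter exists only so that the check can be exercised against a temporary directory instead of the live system.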
In addition, when the PCIE device is a PCIE network card, if the PCIE slot state is determined to be normal and the PCIE slot power supply voltage is detected to be neither 0 nor an invalid value, it is determined that the network cable is faulty.
It should be noted that, in this embodiment, the PCIE slot state is determined by reading the assignment of the PCIE slot state sensor, and the PCIE slot power supply voltage is detected by reading the assignment of the PCIE slot power supply voltage sensor.
Specifically, if the PCIE slot state sensor is 00h-Fault Status asserted, the PCIE slot has a fault; if it is 01h-Identify Status asserted, the PCIE slot has no fault.
When the PCIE slot state sensor is 01h-Identify Status asserted, the assignment of the PCIE slot power supply voltage sensor is read. If the value is N/A and the PCIE device cannot be identified under the system, the PCIE device itself is determined to be faulty. If the value is 0 and the PCIE device is identified normally under the system, the PCIE device is determined to have no fault but to be incompatible with the CPU, so that it can only be identified. When the PCIE device is a PCIE network card, if the value of the PCIE slot power supply voltage sensor is neither 0 nor N/A, the network card is determined to have no fault and to be compatible with the CPU, and the network cable is determined to be faulty.
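The decision procedure of this embodiment can be summarized as a Python function. It is a sketch rather than the claimed implementation: the names are hypothetical, `None` stands for the N/A sensor value, and combinations the text does not address (for example, N/A with the device still identified) are returned as `UNDETERMINED` instead of being guessed.

```python
from enum import Enum
from typing import Optional


class Diagnosis(Enum):
    SLOT_FAULT = "PCIE slot fault"
    DEVICE_FAULT = "PCIE device fault"
    CPU_INCOMPATIBLE = "device not faulty, but incompatible with the CPU"
    CABLE_FAULT = "network cable fault"
    UNDETERMINED = "combination not covered by the described method"


def locate_fault(
    slot_normal: bool,
    voltage: Optional[float],   # None represents the N/A sensor value
    device_identified: bool,
    is_network_card: bool = False,
) -> Diagnosis:
    if not slot_normal:         # 00h-Fault Status asserted
        return Diagnosis.SLOT_FAULT
    if voltage is None:         # sensor reads N/A
        if not device_identified:
            return Diagnosis.DEVICE_FAULT
        return Diagnosis.UNDETERMINED
    if voltage == 0:            # sensor reads 0
        if device_identified:
            return Diagnosis.CPU_INCOMPATIBLE
        return Diagnosis.UNDETERMINED
    # Sensor value is neither 0 nor N/A.
    if is_network_card:
        return Diagnosis.CABLE_FAULT
    return Diagnosis.UNDETERMINED
```

Calling `locate_fault(True, 0, True)` yields `Diagnosis.CPU_INCOMPATIBLE`, while `locate_fault(True, 0.7, True, is_network_card=True)` yields `Diagnosis.CABLE_FAULT`.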
Example two
As shown in fig. 3, the present embodiment provides a PCIE device failure location apparatus, which includes the following functional modules.
The information acquisition module 11: acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
the fault location module 22: and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
The device acquires the PCIE slot information through the BMC, wherein the PCIE slot information comprises the PCIE slot state and the PCIE slot power supply voltage, so that the failure of the PCIE device is indirectly judged, the failure can be automatically positioned, and the positioning accuracy is ensured.
The fault location module 22 includes the following functional units to implement fault location.
A PCIE slot state judging unit: judging whether the PCIE slot state is normal or not;
PCIE slot supply voltage detection unit: detecting a PCIE slot power supply voltage;
PCIE equipment identification information detection unit: detecting PCIE equipment identification information under a system;
a fault location unit: and positioning the failure of the PCIE equipment according to the judgment result of the PCIE slot state judgment unit, the detection result of the PCIE slot power supply voltage detection unit and the detection result of the PCIE equipment identification information detection unit.
Specifically, when the PCIE slot state determination unit determines that the PCIE slot is abnormal according to the PCIE slot state obtained by the BMC, the failure location unit determines that the PCIE slot has a failure.
When the PCIE slot state judging unit judges that the PCIE slot is normal according to the PCIE slot state obtained by the BMC: (1) if the PCIE slot power supply voltage detection unit detects that the PCIE slot power supply voltage is an invalid value, and the PCIE equipment identification information detection unit detects that the PCIE device cannot be identified under the system, the fault location unit judges that the PCIE device is faulty; (2) if the PCIE slot power supply voltage detection unit detects that the PCIE slot power supply voltage is 0, and the PCIE equipment identification information detection unit detects that the PCIE device is identified normally under the system, the fault location unit judges that the PCIE device has no fault but is not compatible with the CPU; (3) when the PCIE device is a network card, if the PCIE slot power supply voltage detection unit detects that the PCIE slot power supply voltage is neither 0 nor an invalid value, a network cable fault is determined.
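The module structure of this embodiment can be illustrated with two small Python classes. All names are hypothetical, and the BMC access and the system-side identification check are injected as plain callables because the text does not specify their interfaces; `None` stands for the invalid voltage value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SlotInfo:
    state_normal: bool         # from the PCIE slot state sensor
    voltage: Optional[float]   # None stands for the invalid value (N/A)


class InformationAcquisitionModule:
    def __init__(self, read_bmc: Callable[[int], SlotInfo]) -> None:
        self._read_bmc = read_bmc   # hypothetical BMC accessor

    def acquire(self, slot: int) -> SlotInfo:
        return self._read_bmc(slot)


class FaultLocationModule:
    def __init__(self, is_identified: Callable[[int], bool]) -> None:
        self._is_identified = is_identified   # hypothetical system-side check

    def locate(self, slot: int, info: SlotInfo, network_card: bool = False) -> str:
        if not info.state_normal:                    # slot state judging unit
            return "slot fault"
        identified = self._is_identified(slot)       # identification detection unit
        if info.voltage is None and not identified:  # supply voltage detection unit
            return "device fault"
        if info.voltage == 0 and identified:
            return "cpu incompatibility"
        if network_card and info.voltage not in (None, 0):
            return "network cable fault"
        return "undetermined"   # combination not covered by the text
```

Injecting the two accessors keeps the fault location module independent of any particular BMC interface, which also makes it straightforward to exercise with recorded sensor values.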
The above disclosure is only for the preferred embodiments of the present invention, but the present invention is not limited thereto, and any non-inventive changes that can be made by those skilled in the art and several modifications and amendments made without departing from the principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A PCIE equipment fault positioning method is characterized by comprising the following steps:
acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
2. The PCIE device fault locating method of claim 1, wherein the BMC is configured with a PCIE slot status sensor;
the specific steps for acquiring the PCIE slot state by the BMC are as follows:
the BIOS collects PCIE slot states and reports the PCIE slot states to the BMC;
and the BMC assigns the PCIE slot state sensor according to the PCIE slot state reported by the BIOS.
3. The PCIE device fault location method of claim 2, wherein the BMC is configured with a PCIE slot supply voltage sensor;
the specific step of acquiring the power supply voltage of the PCIE slot by the BMC is as follows:
the BMC reads power supply voltage of the PCIE slot from the server mainboard;
and assigning a value to the PCIE slot power supply voltage sensor according to the read PCIE slot power supply voltage.
4. The PCIE device fault locating method of claim 3, wherein a motherboard is connected with the BMC through an LPC bus.
5. The method according to claim 3 or 4, wherein the step of locating the PCIE device fault according to the obtained PCIE slot state and the PCIE slot supply voltage specifically comprises:
judging whether the PCIE slot state is normal or not;
if not, judging that the PCIE slot has a fault; if the state is normal, detecting the power supply voltage of the PCIE slot;
if the power supply voltage of the PCIE slot is an invalid value, detecting identification information of the PCIE equipment under the system;
if the PCIE equipment in the system cannot be identified, the PCIE equipment is judged to be in fault;
if the power supply voltage of the PCIE slot is 0, detecting the identification information of the PCIE equipment under the system;
if the PCIE equipment in the system is normally identified, the PCIE equipment is judged to have no fault, but the PCIE equipment is not compatible with the CPU.
6. The method of claim 5, wherein when the PCIE device is a PCIE network card,
if the PCIE slot state is judged to be normal and the PCIE slot power supply voltage is detected to be neither 0 nor an invalid value, the network cable fault is judged.
7. The method of claim 5, wherein the PCIE device status is determined by reading assignments of PCIE slot status sensors.
8. The method of claim 5, wherein the PCIE device power supply voltage is detected by reading an assignment of a PCIE slot power supply voltage sensor.
9. A PCIE equipment fault locating device is characterized in that it includes,
an information acquisition module: acquiring a PCIE slot state and a PCIE slot power supply voltage through a BMC;
a fault positioning module: and positioning the failure of the PCIE equipment according to the acquired PCIE slot state and the power supply voltage of the PCIE slot.
10. The PCIE device fault location apparatus of claim 9, wherein the fault location module includes,
a PCIE slot state judging unit: judging whether the PCIE slot state is normal or not;
PCIE slot supply voltage detection unit: detecting a PCIE slot power supply voltage;
PCIE equipment identification information detection unit: detecting PCIE equipment identification information under a system;
a fault location unit: and positioning the failure of the PCIE equipment according to the judgment result of the PCIE slot state judgment unit, the detection result of the PCIE slot power supply voltage detection unit and the detection result of the PCIE equipment identification information detection unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010386363.9A CN111694685A (en) | 2020-05-09 | 2020-05-09 | PCIE equipment fault positioning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010386363.9A CN111694685A (en) | 2020-05-09 | 2020-05-09 | PCIE equipment fault positioning method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111694685A true CN111694685A (en) | 2020-09-22 |
Family
ID=72477488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010386363.9A Withdrawn CN111694685A (en) | 2020-05-09 | 2020-05-09 | PCIE equipment fault positioning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111694685A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114356644A (en) * | 2022-03-18 | 2022-04-15 | 阿里巴巴(中国)有限公司 | PCIE equipment fault processing method and device |
CN117369906A (en) * | 2023-12-07 | 2024-01-09 | 成都市楠菲微电子有限公司 | Pcie verification platform, method and device, storage medium and electronic equipment |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114356644A (en) * | 2022-03-18 | 2022-04-15 | 阿里巴巴(中国)有限公司 | PCIE equipment fault processing method and device |
CN117369906A (en) * | 2023-12-07 | 2024-01-09 | 成都市楠菲微电子有限公司 | Pcie verification platform, method and device, storage medium and electronic equipment |
CN117369906B (en) * | 2023-12-07 | 2024-02-09 | 成都市楠菲微电子有限公司 | Pcie verification platform, method and device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7356431B2 (en) | Method for testing an input/output functional board | |
CN112286709B (en) | Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults | |
CN104850485A (en) | BMC based method and system for remote diagnosis of server startup failure | |
CN106055438A (en) | Method and system for rapidly locating anomaly of memory banks on mainboard | |
CN111694685A (en) | PCIE equipment fault positioning method and device | |
US9274174B2 (en) | Processor TAP support for remote services | |
CN106407059A (en) | Server node testing system and method | |
CN106681878A (en) | Method for testing PCIE channel bandwidth | |
CN110618909B (en) | Fault positioning method, device, equipment and storage medium based on I2C communication | |
CN211505789U (en) | PCIE board card testing arrangement | |
CN112000535A (en) | SAS Expander card-based hard disk abnormity identification method and processing method | |
US9158646B2 (en) | Abnormal information output system for a computer system | |
US8391162B2 (en) | Apparatus and method for testing SMNP cards | |
CN113868058A (en) | Peripheral component high-speed interconnection equipment fault detection method and device and server | |
CN210666750U (en) | Memory fault alarm device | |
CN115934446A (en) | Self-checking method, server, equipment and storage medium | |
CN104571098B (en) | Long-range self-diagnosing method based on Atom platforms | |
CN114924998B (en) | Memory information reading device and method, computing device motherboard, device and medium | |
CN111487487A (en) | Device ID identification method and system based on ADC sampling and electronic device | |
CN101464828A (en) | Evaluation method for main unit and its status | |
CN112579366A (en) | Hard disk in-place detection system | |
CN203573310U (en) | System for detecting installation fault of memory bank | |
CN101452417A (en) | Monitor method and monitor device thereof | |
CN102053888A (en) | Self-checking method and system for arithmetic device | |
CN213241134U (en) | Production detection equipment for solid state disk |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20200922 |