CN115878430A - PCIE equipment fault monitoring method and device, communication equipment and storage medium - Google Patents


Info

Publication number
CN115878430A
Authority
CN
China
Legal status
Pending
Application number
CN202211685851.5A
Other languages
Chinese (zh)
Inventor
邱豪
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211685851.5A
Publication of CN115878430A
Current legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention provide a PCIE device fault monitoring method and apparatus, a communication device and a storage medium, applied to a Baseboard Management Controller (BMC). The method includes: when it is detected that a fault report sent by a Basic Input Output System (BIOS) has been received, acquiring target serial number information corresponding to the fault report; determining, in a PCIE device mapping relationship table, a target identifier according to the target serial number information; and acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier. Because the PCIE device information corresponding to the fault report is obtained from the serial number information together with the identifier information, the state of specific PCIE devices in a server based on a PCIE Switch architecture can be monitored effectively and in real time, providing a sound means of fault monitoring and diagnosis for operation and maintenance after the server goes online and enters mass production.

Description

PCIE equipment fault monitoring method and device, communication equipment and storage medium
Technical Field
The present invention relates to the technical field of device fault detection, and in particular, to a method and an apparatus for monitoring a fault of a PCIE device, a communication device, and a storage medium.
Background
PCIE is one of the most common peripheral interfaces in servers. Many components, including network cards, RAID cards, FPGA cards, GPU cards and NVMe drives, are attached to the server system as peripheral devices through the PCIE interface, so a failure of a PCIE device can easily disrupt normal operation of the server.
At present, PCIE device faults are monitored by personnel reviewing the fault information recorded in the black box log. Troubleshooting through the black box log, however, cannot monitor PCIE devices in real time and easily lets information be missed, so there remains a risk that a PCIE device fault is not investigated in time and the server is affected.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a PCIE device fault monitoring method and apparatus, a communication device and a storage medium, so as to solve the problem that, because PCIE devices cannot be monitored in real time, PCIE device faults cannot be investigated in time and the server is affected. The specific technical solutions are as follows:
In a first aspect of the present invention, a PCIE device fault monitoring method applied to a BMC is provided. The method includes:
when it is detected that a fault report sent by a Basic Input Output System (BIOS) has been received, acquiring target serial number information corresponding to the fault report;
determining, in a PCIE device mapping relationship table, a target identifier according to the target serial number information;
and acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
Optionally, before the step of acquiring, when it is detected that a fault report sent by the BIOS has been received, target serial number information corresponding to the fault report, the method further includes:
acquiring serial number information of the PCIE devices in a server;
and matching the serial number information with preset identifiers to generate a PCIE device mapping relationship table.
Optionally, after the step of acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method further includes:
when it is detected that register information is abnormal, performing adaptive correction on the PCIE device corresponding to the register information according to a preset PCIE specification standard, where the PCIE specification standard is generated based on historical fault information of the PCIE device.
Optionally, acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier includes:
generating an IPMI command according to the target serial number information and the target identifier;
and acquiring, according to the IPMI command, register information and a fault location of the PCIE device corresponding to the fault report.
Optionally, after the step of acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method further includes:
visually displaying the PCIE device information on a preset management interface of the BMC.
Optionally, the method further comprises:
when it is detected that a fault report sent by the BIOS has been received, analyzing the fault type corresponding to the fault report to generate an analysis result;
and sending the analysis result to a Complex Programmable Logic Device (CPLD), so that the CPLD indicates early warning information according to the fault type.
Optionally, after the step of acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method further includes:
generating a black box log according to the register information in the PCIE device information, and storing the black box log.
In a second aspect of the present invention, a PCIE device fault monitoring apparatus applied to a BMC is further provided, including:
a first obtaining module, configured to acquire target serial number information corresponding to a fault report when it is detected that the fault report sent by the Basic Input Output System (BIOS) has been received;
a determining module, configured to determine, in a PCIE device mapping relationship table, a target identifier according to the target serial number information;
and a second obtaining module, configured to acquire, according to the target serial number information and the target identifier, PCIE device information corresponding to the fault report.
Optionally, the apparatus further comprises:
the first module is used for acquiring serial number information of PCIE equipment in the server;
and the second module is used for matching the serial number information with a preset identifier to generate a PCIE equipment mapping relation table.
Optionally, the apparatus further comprises:
and a third module, configured to, when it is detected that the register information is abnormal, perform adaptive correction on a PCIE device corresponding to the register information according to a preset PCIE specification standard, where the PCIE specification standard is generated based on historical failure information of the PCIE device.
Optionally, the second obtaining module includes:
the first submodule is used for generating an IPMI command according to the target serial number information and the target identifier;
and the second sub-module is used for acquiring the register information of the PCIE equipment corresponding to the fault report and the fault position according to the IPMI command.
Optionally, the apparatus further comprises:
and the fourth module is used for visually displaying the PCIE equipment information on a preset management interface of the BMC.
Optionally, the apparatus further comprises:
a fifth module, configured to, when it is detected that a fault report sent by the BIOS is received, analyze a fault type corresponding to the fault report, and generate an analysis result;
and the sixth module is used for sending the analysis result to a Complex Programmable Logic Device (CPLD) so that the CPLD can indicate early warning information according to the fault type.
Optionally, the apparatus further comprises:
and the seventh module is configured to generate a black box log according to the register information in the PCIE device information, and store the black box log.
In a third aspect of the present invention, there is also provided a communication device, including: a transceiver, a memory, a processor, and a program stored on the memory and executable on the processor;
the processor is configured to read a program in the memory to implement any one of the PCIE device failure monitoring methods described above.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute any one of the PCIE device fault monitoring methods described above.
With the PCIE device fault monitoring method provided in the embodiment of the present invention, when it is detected that a fault report sent by the Basic Input Output System (BIOS) has been received, target serial number information corresponding to the fault report is acquired; a target identifier is determined in a PCIE device mapping relationship table according to the target serial number information; and PCIE device information corresponding to the fault report is acquired according to the target serial number information and the target identifier. That is, the target serial number information is obtained from the received fault report, the identifier is looked up in the pre-generated mapping relationship table, and the PCIE device information corresponding to the fault report is obtained from the serial number information and the identifier information. The state of specific PCIE devices in a server based on the PCIE Switch architecture is thereby monitored effectively and in real time, providing a sound means of fault monitoring and diagnosis for operation and maintenance after the server goes online and even enters mass production.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a first step flowchart of a PCIE device fault monitoring method according to an embodiment of the present invention;
Fig. 2 is a second step flowchart of a PCIE device fault monitoring method according to an embodiment of the present invention;
Fig. 3 is a third step flowchart of a PCIE device fault monitoring method according to an embodiment of the present invention;
Fig. 4 is a first block diagram of a PCIE device fault monitoring apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a communication device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that many technical details are set forth in the embodiments to help the reader better understand the present application; however, the technical solutions claimed in the present application can be implemented even without these details, and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description only and does not limit the specific implementation of the present invention; the embodiments may be combined with and refer to one another where they do not conflict.
Referring to Fig. 1, which shows the first step flowchart of a PCIE device fault monitoring method provided in an embodiment of the present invention, the method is applied to a BMC and may include:
Step 101: when it is detected that a fault report sent by the Basic Input Output System (BIOS) has been received, acquiring target serial number information corresponding to the fault report.
In this embodiment, the Basic Input Output System (BIOS) is a set of programs stored in a ROM chip on the computer's motherboard. It holds the computer's most important basic input/output routines, the system setup information, the power-on self-test program and the system bootstrap program. The BIOS can receive Advanced Error Reporting (AER) information; AER is an advanced, optional capability of PCIE used to report PCIE error information.
Note that in this embodiment the PCIE device must be able to report faults, that is, the PCIE Switch firmware must support AER. Errors of the Switch itself and errors of PCI endpoints below the Switch are both standard PCI errors.
Further, a PCIE Switch provides expansion or aggregation capability, allowing more devices to be connected to one PCIE port. It acts as a packet router, identifying which path a given packet should take based on its address or other routing information, and is in effect a PCIE-to-PCIE bridge.
Thus, when a PCIE ERROR, that is, a PCIE fault, occurs, it is reported to the BIOS, and the BIOS sends the fault report to the Baseboard Management Controller (BMC). The BMC hardware uses an SoC built around an ARM9 core, and its software is developed from base code provided by AMI or Avocent. The BMC mainly monitors the server's temperatures, voltages, fans and buses, and provides a management interface through which users can control the server remotely.
In this embodiment, the fault report is an AER report and the target serial number information is the serial number carried in that report. The serial number is the unique identifier of a PCIE function and can be expressed as a BDF (Bus, Device, Function). A PCIE device may implement only one function (function 0) or up to 8 functions, in which case it is a multi-function device. However many functions a device has, each function has its own independent configuration space and its own unique identifier on the PCIE bus; this identifier is the BDF. PCIE configuration software must therefore be able to identify the topology of the whole PCIE bus system, including every bus, device and function in it.
In addition to the serial number information, each PCIE device is also marked with an identifier. This identifier is preset for each PCIE device before the BMC allocates a BDF to it, and helps operation and maintenance personnel find the physical location of the PCIE hardware more quickly. The identifier may be expressed as a BOARD ID, and a correspondence must be established between the BDF and the BOARD ID; how this correspondence is established is described in the following embodiments.
Accordingly, when the BMC detects that a fault report sent by the BIOS has been received, it acquires the target serial number information corresponding to the fault report, that is, the serial number information contained in the fault report.
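As an illustration of how a switch chooses a path for ID-routed packets (configuration requests and completions), the sketch below picks the downstream port whose bus range, from its secondary to its subordinate bus number, contains the target bus. It is a simplified Python model, not switch firmware: address routing and the switch's internal bus are ignored, and the port names and bus numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DownstreamPort:
    name: str
    secondary_bus: int    # bus number directly below this port's bridge
    subordinate_bus: int  # highest bus number reachable below this port


def route_by_id(ports: List[DownstreamPort], target_bus: int) -> Optional[str]:
    """Return the downstream port whose [secondary, subordinate] range contains target_bus.

    None means no downstream port claims the bus; depending on the ingress port,
    a real switch would then forward the packet upstream or reject it.
    """
    for port in ports:
        if port.secondary_bus <= target_bus <= port.subordinate_bus:
            return port.name
    return None
```

For example, with ports covering buses 0x3c and 0x3d to 0x40, a packet for bus 0x3e leaves through the second port.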
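To make the BDF concrete, the sketch below packs and parses it in its usual 16-bit form: 8 bits of bus, 5 bits of device and 3 bits of function, which is why a device has at most 8 functions. The class name and the `bb:dd.f` hex text form (the notation printed by tools such as lspci) are choices made for this example, not something the patent specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BDF:
    """Bus/Device/Function identifier of one PCIE function."""
    bus: int       # 8 bits: 0-255
    device: int    # 5 bits: 0-31
    function: int  # 3 bits: 0-7, so at most 8 functions per device

    def __post_init__(self) -> None:
        if not (0 <= self.bus <= 0xFF and 0 <= self.device <= 0x1F
                and 0 <= self.function <= 0x7):
            raise ValueError(f"BDF field out of range: {self!r}")

    def to_id(self) -> int:
        """Pack into 16 bits: bus[15:8] | device[7:3] | function[2:0]."""
        return (self.bus << 8) | (self.device << 3) | self.function

    @classmethod
    def from_id(cls, value: int) -> "BDF":
        return cls((value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7)

    @classmethod
    def parse(cls, text: str) -> "BDF":
        """Parse hex 'bb:dd.f', e.g. '3b:00.1'; an optional 'dddd:' domain prefix is ignored."""
        bus_dev, func = text.rsplit(".", 1)
        bus, dev = bus_dev.split(":")[-2:]
        return cls(int(bus, 16), int(dev, 16), int(func, 16))

    def __str__(self) -> str:
        return f"{self.bus:02x}:{self.device:02x}.{self.function:x}"
```

For instance, `BDF.parse("3b:00.1").to_id()` yields 0x3B01, and `BDF.from_id(0x3B01)` recovers the same function.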
Step 102: determining, in a PCIE device mapping relationship table, a target identifier according to the target serial number information.
Once the target serial number information has been obtained in step 101, it can be looked up in the PCIE device mapping relationship table. The table stores a one-to-one correspondence between BDFs and BOARD IDs, so the target identifier corresponding to the target serial number information can be found there.
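The lookup in step 102 amounts to a dictionary access keyed by BDF. The sketch below assumes the table is held as a Python dict of BDF strings to BOARD ID strings; the patent does not specify the table format, and the entries shown are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical PCIE device mapping relationship table (BDF -> BOARD ID).
# The real table format is not specified; entries are illustrative only.
PCIE_DEVICE_MAP: Dict[str, str] = {
    "3b:00.0": "BOARD_NIC_SLOT1",
    "5e:00.0": "BOARD_GPU_SLOT3",
    "86:00.0": "BOARD_NVME_BAY0",
}


def find_target_identifier(target_bdf: str,
                           table: Dict[str, str] = PCIE_DEVICE_MAP) -> Optional[str]:
    """Step 102: return the BOARD ID mapped to the BDF from the fault report, or None if unmapped."""
    return table.get(target_bdf.strip().lower())
```

Returning None rather than raising lets the caller decide how to handle a fault on a device that was never entered into the table.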
Step 103: acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
In this embodiment, once the target serial number information and the target identifier have been determined, the PCIE device corresponding to the fault report can be identified and its PCIE device information acquired.
Further, acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier includes: generating an IPMI command according to the target serial number information and the target identifier; and acquiring, according to the IPMI command, register information and a fault location of the PCIE device corresponding to the fault report.
In this embodiment, the PCIE device information includes register information and a fault location, so the target serial number information and the target identifier are written into an IPMI command. IPMI (Intelligent Platform Management Interface) is the standard management interface used to communicate with the BMC, and operation and maintenance personnel can write scripts that issue IPMI commands to view PCIE device information. Accordingly, an IPMI command is generated from the target serial number information and the target identifier, and the register information and fault location of the PCIE device corresponding to the fault report are viewed through that command.
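A minimal sketch of generating such a command, assuming it is sent as an OEM "raw" request through the ipmitool CLI. IPMI reserves network function codes 0x30 to 0x3F for OEM use, but the specific NetFn, the command code 0x90 and the payload layout below are hypothetical: the patent does not disclose the actual command.

```python
from typing import List

# IPMI reserves NetFn 0x30-0x3F for controller-specific OEM commands.
# Both values below are hypothetical; the actual command is not disclosed.
OEM_NETFN = 0x30
CMD_GET_PCIE_DEVICE_INFO = 0x90


def build_ipmi_raw_command(bus: int, device: int, function: int, board_id: int) -> List[str]:
    """Assemble `ipmitool raw <netfn> <cmd> <data...>` arguments carrying the BDF and BOARD ID."""
    devfn = (device << 3) | function                   # device[7:3] | function[2:0]
    data = [bus, devfn, board_id & 0xFF, (board_id >> 8) & 0xFF]  # IPMI sends LS byte first
    args = [OEM_NETFN, CMD_GET_PCIE_DEVICE_INFO] + data
    return ["ipmitool", "raw"] + [f"0x{b:02x}" for b in args]
```

On a host with ipmitool installed, the list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`; for a remote BMC, ipmitool's `-I lanplus -H <host> -U <user> -P <password>` options would go before `raw`. The layout of the response bytes would likewise be vendor-defined.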
With the PCIE device fault monitoring method provided in the embodiment of the present invention, when it is detected that a fault report sent by the BIOS has been received, target serial number information corresponding to the fault report is acquired; a target identifier is determined in a PCIE device mapping relationship table according to the target serial number information; and PCIE device information corresponding to the fault report is acquired according to the target serial number information and the target identifier. That is, the target serial number information is obtained from the received fault report, the identifier is looked up in the pre-generated mapping relationship table, and the PCIE device information corresponding to the fault report is obtained from the serial number information and the identifier information. The state of specific PCIE devices in a server based on the PCIE Switch architecture is thereby monitored effectively and in real time, providing a sound means of fault monitoring and diagnosis for operation and maintenance after the server goes online and even enters mass production.
Referring to Fig. 2, which shows the second step flowchart of the PCIE device fault monitoring method provided in an embodiment of the present invention, the method is applied to the baseboard management controller BMC and may include:
Step 001: acquiring serial number information of the PCIE devices in the server.
Step 002: matching the serial number information with preset identifiers to generate a PCIE device mapping relationship table.
In steps 001 and 002, the BMC may first match the BOARD IDs with the BDFs of the PCIE Switch over a UART serial port. The serial number is the unique identifier of a PCIE function and can be expressed as a BDF (Bus, Device, Function). A PCIE device may implement only one function (function 0) or up to 8 functions, in which case it is a multi-function device. However many functions a device has, each function has its own independent configuration space and its own unique BDF on the PCIE bus, so PCIE configuration software must be able to identify the topology of the whole PCIE bus system, including every bus, device and function in it. Each PCIE device is also marked with an identifier, preset before the BMC allocates a BDF to it, which helps operation and maintenance personnel locate the PCIE hardware more quickly; this identifier may be expressed as a BOARD ID.
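The patent does not say how the "matching" in step 002 is keyed. The sketch below assumes, purely for illustration, that both the enumerated BDFs and the preset BOARD IDs are indexed by the PCIE Switch downstream port number, and joins them on that port while checking that the result stays one-to-one.

```python
from typing import Dict


def build_mapping_table(enumerated: Dict[int, str], preset_ids: Dict[int, str]) -> Dict[str, str]:
    """Steps 001-002: join enumerated BDFs with preset BOARD IDs on the switch port number.

    enumerated: downstream port number -> BDF found behind that port (hypothetical input)
    preset_ids: downstream port number -> BOARD ID preset for that slot (hypothetical input)
    Returns the mapping relationship table BDF -> BOARD ID, kept one-to-one.
    """
    table: Dict[str, str] = {}
    for port, bdf in enumerated.items():
        if port not in preset_ids:
            raise KeyError(f"no preset BOARD ID for port {port}")
        key = bdf.lower()
        if key in table:
            raise ValueError(f"BDF {key} reported on more than one port")
        table[key] = preset_ids[port]
    if len(set(table.values())) != len(table):
        raise ValueError("each BOARD ID must map to exactly one BDF")
    return table
```

Preset BOARD IDs for empty slots are simply left out of the table, so only populated slots can be looked up in step 102.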
Step 101: when it is detected that a fault report sent by the Basic Input Output System (BIOS) has been received, acquiring target serial number information corresponding to the fault report.
Step 102: determining, in a PCIE device mapping relationship table, a target identifier according to the target serial number information.
Step 103: acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
Steps 101 to 103 are as described above and are not repeated here.
Further, after the step of acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method may further include: when an abnormality in the PCIE device information is detected, performing adaptive correction on the PCIE device according to a preset PCIE specification standard, where the PCIE specification standard is generated based on historical fault information of the PCIE device.
In this embodiment, after the PCIE device information is acquired, the PCIE device can be adaptively corrected: specifically, when a register value in the PCIE device information is abnormal, the PCIE device is adaptively corrected according to the preset PCIE specification standard.
Adaptive correction here means applying neural network learning to the historical fault information of PCIE devices, so that when the same or a similar fault occurs again, the PCIE device can be adjusted automatically based on how it was handled before, preventing the fault from disrupting normal operation of the server.
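The patent describes neural network learning; the sketch below deliberately swaps that for a much simpler technique, a lookup of corrections that worked before, keyed by register and abnormal value, falling back to the specification value. The register name `target_link_speed`, its expected value, and the `write` callback standing in for the real configuration-space write path are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class AdaptiveCorrector:
    # Expected register values from the preset specification standard (hypothetical).
    expected: Dict[str, int]
    # Corrections that worked before: (register, abnormal value) -> value written back.
    history: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def correct(self, register: str, value: int,
                write: Callable[[str, int], None]) -> Optional[int]:
        """Write back a corrected value if `value` is abnormal; return it, or None if normal."""
        if register not in self.expected or self.expected[register] == value:
            return None  # unknown register or value within specification
        # Prefer a fix recorded for this exact fault, else fall back to the specification value.
        fix = self.history.get((register, value), self.expected[register])
        write(register, fix)
        self.history[(register, value)] = fix  # remember for the next identical fault
        return fix
```

A learned model could replace the `history` lookup without changing the surrounding flow of detect, correct, and record.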
Further, after the step of acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method may further include: visually displaying the PCIE device information on a preset management interface of the BMC.
In this embodiment, the PCIE device information may also be displayed visually on a preset management interface of the BMC, such as the BMC WEB page or another preset web page. Rendering the PCIE device information there makes it easy for operation and maintenance personnel to identify the PCIE device, locate the fault information and see the fault presented on the interface.
Further, after the step of acquiring the PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method may further include: generating a black box log according to the register information in the PCIE device information, and storing the black box log.
In this embodiment, the PCIE device register information is collected into a black box log, which makes data analysis convenient for PCIE engineers. One-click log collection can also be performed; specifically, the BIOS can collect log information and send it to the BMC.
The black box log records register values, which can be analyzed later. If the PCIE fault is an uncorrectable fatal error, it can have a serious impact on the system and is usually accompanied by a crash, reboot or kernel panic; in that case, besides recording the error, the BMC can drive the error indicator light to notify operation and maintenance personnel that a serious fault needs urgent repair. If the analyzed error is an uncorrectable non-fatal error or a correctable error, the BMC records the register values and the fault location in the black box log.
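The severity split above follows the three AER error classes. The sketch below classifies an error from the AER Uncorrectable Error Status, Uncorrectable Error Severity and Correctable Error Status register values (a set severity bit marks that uncorrectable error as fatal), then records it and lights the indicator only for fatal errors. The log-entry layout and the `set_error_led` callback are assumptions for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List


class AerSeverity(Enum):
    NONE = "none"
    CORRECTABLE = "correctable"
    UNCORRECTABLE_NON_FATAL = "uncorrectable non-fatal"
    UNCORRECTABLE_FATAL = "uncorrectable fatal"


def classify_aer(uncor_status: int, uncor_severity: int, cor_status: int) -> AerSeverity:
    """Classify from AER register values; a bit set in the severity register means fatal."""
    if uncor_status & uncor_severity:
        return AerSeverity.UNCORRECTABLE_FATAL
    if uncor_status & ~uncor_severity:
        return AerSeverity.UNCORRECTABLE_NON_FATAL
    if cor_status:
        return AerSeverity.CORRECTABLE
    return AerSeverity.NONE


def handle_fault(severity: AerSeverity, registers: Dict[str, int], location: str,
                 black_box: List[dict], set_error_led: Callable[[bool], None]) -> None:
    """Record register values and fault location; light the error indicator only for fatal errors."""
    if severity is AerSeverity.NONE:
        return
    black_box.append({"severity": severity.value, "location": location,
                      "registers": dict(registers)})
    if severity is AerSeverity.UNCORRECTABLE_FATAL:
        set_error_led(True)  # serious fault: notify O&M staff that urgent repair is needed
```

For example, a Surprise Down error (bit 5 of the uncorrectable registers, fatal by default) is classified as fatal, while a Completion Timeout (bit 14) whose severity bit is clear is non-fatal.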
With the PCIE device fault monitoring method provided in the embodiment of the present invention, when it is detected that a fault report sent by the BIOS has been received, target serial number information corresponding to the fault report is acquired; a target identifier is determined in a PCIE device mapping relationship table according to the target serial number information; and PCIE device information corresponding to the fault report is acquired according to the target serial number information and the target identifier. That is, the target serial number information is obtained from the received fault report, the identifier is looked up in the pre-generated mapping relationship table, and the PCIE device information corresponding to the fault report is obtained from the serial number information and the identifier information. The state of specific PCIE devices in a server based on the PCIE Switch architecture is thereby monitored effectively and in real time, providing a sound means of fault monitoring and diagnosis for operation and maintenance after the server goes online and even enters mass production.
In addition, the embodiment of the present invention can perform adaptive correction of faulty PCIE devices, visually display PCIE device information on a preset management interface of the BMC, and record black box logs. The BMC can monitor the state of PCIE devices in real time through IPMI commands or web pages, and when an abnormality is found it handles the related device, for example by adaptive correction, alarming, or triggering one-click log collection. The method is thus realized through interaction between the BMC and the PCIE Switch, and real-time health monitoring is ultimately presented through the BMC web front-end page or IPMI commands. When a register value is abnormal, adaptive correction is performed according to the PCIE specification, an operation log is recorded, and the current PCIE device configuration space data is extracted. The information exchanged is the BOARD ID and BDF of the PCIE device, and the PCIE device configuration space corresponds to the register data. The method can be applied to servers using a PCIE Switch architecture so that the BMC performs comprehensive monitoring and management of PCIE devices, such as link state, DPC (Downstream Port Containment), hot plug and PCIE link training.
Referring to Fig. 3, which shows the third step flowchart of the PCIE device fault monitoring method provided in an embodiment of the present invention, the method is applied to the baseboard management controller BMC and may include:
Step 201: when it is detected that a fault report sent by the BIOS has been received, analyzing the fault type corresponding to the fault report to generate an analysis result.
In this embodiment, when a PCIE device fails there may be many possible causes and locations. The fault information is therefore analyzed and classified by fault type and, combined with an early warning mechanism, can be fed back quickly to operation and maintenance personnel.
For example, operation and maintenance personnel can tell which type of fault has occurred from the number of times an indicator light flashes; the invention places no particular limit on this, and it can be configured according to actual requirements.
Step 202: sending the analysis result to a Complex Programmable Logic Device (CPLD), so that the CPLD indicates early warning information according to the fault type.
In this embodiment, the analysis result is sent to the CPLD. A CPLD is built with programming technologies such as CMOS EPROM, EEPROM, flash memory and SRAM, forming a high-density, high-speed, low-power programmable logic device, so it can be programmed to give early warning indications according to the fault type.
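The blink-count idea from step 201 and the hand-off in step 202 can be sketched as packing the analysis result into a single byte that CPLD logic would decode. The fault categories, the use of the category value as the blink count, and the register layout (slot number in the high nibble, blink count in the low nibble) are all hypothetical; the patent leaves them open, and the bus used to reach the CPLD is not modeled.

```python
from enum import IntEnum
from typing import Tuple


class FaultType(IntEnum):
    """Hypothetical fault categories; the value doubles as the LED blink count."""
    LINK_DOWN = 1
    LINK_DEGRADED = 2
    UNCORRECTABLE_ERROR = 3
    CORRECTABLE_ERROR = 4


def encode_cpld_alert(fault_type: FaultType, slot: int) -> int:
    """Pack an analysis result into one CPLD register byte (hypothetical layout):
    bits 7-4 = slot number (0-15), bits 3-0 = number of indicator blinks."""
    if not 0 <= slot <= 0xF:
        raise ValueError("slot number must fit in 4 bits")
    return (slot << 4) | int(fault_type)


def decode_cpld_alert(byte: int) -> Tuple[FaultType, int]:
    """Inverse of encode_cpld_alert, as the CPLD logic would interpret the byte."""
    return FaultType(byte & 0xF), (byte >> 4) & 0xF
```

Keeping the encoding to one byte means the CPLD only needs a small decoder and a counter to drive the indicator.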
With the PCIE device fault monitoring method provided by this embodiment of the invention, when receipt of a fault report sent by the Basic Input Output System (BIOS) is detected, the target serial number information corresponding to the fault report is obtained; a target identifier is determined in the PCIE device mapping relationship table according to the target serial number information; and the PCIE device information corresponding to the fault report is obtained according to the target serial number information and the target identifier. In other words, the BMC takes the target serial number from the received fault report, looks up the identifier in the pre-generated mapping relationship table, and uses the serial number and identifier together to retrieve the information of the faulty PCIE device. This enables effective real-time monitoring of the state of specific PCIE devices in servers based on a PCIE Switch architecture, and provides a practical fault monitoring and diagnosis tool for operation and maintenance once the server goes online and into mass production.
In addition, by analyzing the fault type and working with the CPLD, the embodiment can indicate early warning information according to the fault type, helping operation and maintenance personnel locate faults more quickly.
Referring to fig. 4, a structural block diagram of a PCIE device fault monitoring apparatus provided in an embodiment of the present invention is shown. The apparatus is applied to a BMC and may include:
a first obtaining module 301, configured to obtain, upon detecting receipt of a fault report sent by the BIOS, target serial number information corresponding to the fault report;
a determining module 302, configured to determine a target identifier in a PCIE device mapping relationship table according to the target serial number information;
a second obtaining module 303, configured to obtain PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
With the PCIE device fault monitoring apparatus provided by this embodiment of the invention, when receipt of a fault report sent by the BIOS is detected, the target serial number information corresponding to the fault report is obtained; a target identifier is determined in the PCIE device mapping relationship table according to the target serial number information; and the PCIE device information corresponding to the fault report is obtained according to the target serial number information and the target identifier. Because the identifier is looked up in a pre-generated mapping relationship table and combined with the serial number to retrieve the device information, the state of specific PCIE devices in a server based on a PCIE Switch architecture can be monitored effectively in real time, providing a practical fault monitoring and diagnosis tool for operation and maintenance once the server goes online and into mass production.
Optionally, the apparatus further comprises:
a first module, configured to acquire serial number information of the PCIE devices in the server;
a second module, configured to match the serial number information with preset identifiers to generate the PCIE device mapping relationship table.
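A minimal sketch of generating the mapping relationship table, assuming the preset identifiers are keyed by a physical position (here a Linux-style PCI bus/device/function string) and that scanning yields (serial number, position) pairs. The keying scheme, `build_mapping_table`, and the sample values are assumptions; the source only says serial numbers are matched with preset identifiers.

```python
from typing import Dict, Iterable, Tuple


def build_mapping_table(
    discovered: Iterable[Tuple[str, str]],
    preset_identifiers: Dict[str, str],
) -> Dict[str, str]:
    """Match each discovered serial number with a preset identifier.

    discovered: (serial_number, position) pairs found while scanning the server.
    preset_identifiers: position -> identifier, configured in advance (assumed keying).
    """
    table: Dict[str, str] = {}
    for serial, position in discovered:
        identifier = preset_identifiers.get(position)
        if identifier is None:
            continue  # no preset identifier for this position; leave it unmapped
        if serial in table:
            raise ValueError(f"duplicate serial number: {serial}")
        table[serial] = identifier
    return table


if __name__ == "__main__":
    scanned = [("SN-0001", "0000:3b:00.0"), ("SN-0002", "0000:5e:00.0")]
    presets = {"0000:3b:00.0": "SW0-P3", "0000:5e:00.0": "SW1-P0"}
    print(build_mapping_table(scanned, presets))
```

Rejecting duplicate serial numbers matters because the later lookup assumes each serial number resolves to exactly one identifier.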
Optionally, the apparatus further comprises:
a third module, configured to, when it is detected that the register information is abnormal, perform adaptive correction on a PCIE device corresponding to the register information according to a preset PCIE specification standard, where the PCIE specification standard is generated based on historical failure information of the PCIE device.
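The source does not say which registers are checked or what the correction consists of. As one plausible sketch, the snippet below decodes the standard PCIe Link Status register (Current Link Speed in bits 3:0, Negotiated Link Width in bits 9:4) and compares it with a baseline assumed to stand in for the "PCIE specification standard". `LinkBaseline`, `plan_correction`, and the "retrain link" actions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# PCIe Link Status register fields:
LNKSTA_SPEED_MASK = 0x000F  # bits 3:0, Current Link Speed (1 = 2.5 GT/s ... 5 = 32 GT/s)
LNKSTA_WIDTH_MASK = 0x03F0  # bits 9:4, Negotiated Link Width (lane count)


@dataclass
class LinkBaseline:
    """Hypothetical 'PCIE specification standard' entry, e.g. distilled from past failures."""
    min_speed: int
    width: int


def decode_link_status(lnksta: int) -> Tuple[int, int]:
    """Return (speed code, lane count) from a Link Status register value."""
    return lnksta & LNKSTA_SPEED_MASK, (lnksta & LNKSTA_WIDTH_MASK) >> 4


def plan_correction(lnksta: int, baseline: LinkBaseline) -> List[str]:
    """Return hypothetical corrective actions when the link deviates from the baseline."""
    speed, width = decode_link_status(lnksta)
    actions: List[str] = []
    if speed < baseline.min_speed:
        actions.append(f"retrain link: speed code {speed} below {baseline.min_speed}")
    if width != baseline.width:
        actions.append(f"retrain link: width x{width}, expected x{baseline.width}")
    return actions


if __name__ == "__main__":
    gen4_x16 = LinkBaseline(min_speed=4, width=16)
    print(plan_correction(0x0104, gen4_x16))  # Gen4 x16 link: no action
    print(plan_correction(0x0083, gen4_x16))  # Gen3 x8 link: two actions
```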
Optionally, the second obtaining module includes:
a first submodule, configured to generate an IPMI command according to the target serial number information and the target identifier;
a second submodule, configured to acquire, according to the IPMI command, the register information and the fault location of the PCIE device corresponding to the fault report.
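A minimal sketch of generating the IPMI command, assuming the serial number fits in 16 bits and the identifier in 8 bits. IPMI reserves network functions 0x30 to 0x3F for controller-specific OEM commands, but the actual NetFn and command number used here (`OEM_NETFN`, `CMD_GET_PCIE_DEVICE_INFO`) are hypothetical; the output is rendered in the argument form of `ipmitool raw <netfn> <cmd> [data]`.

```python
from typing import List

# Hypothetical values: the real NetFn/command pair is not given in the source.
OEM_NETFN = 0x30
CMD_GET_PCIE_DEVICE_INFO = 0x5A


def build_ipmi_request(serial_no: int, identifier: int) -> List[int]:
    """Build [netfn, cmd, data...] with a 16-bit serial number and an 8-bit identifier."""
    if not 0 <= serial_no <= 0xFFFF:
        raise ValueError("serial number must fit in 16 bits")
    if not 0 <= identifier <= 0xFF:
        raise ValueError("identifier must fit in 8 bits")
    # IPMI encodes multi-byte fields least-significant byte first.
    return [OEM_NETFN, CMD_GET_PCIE_DEVICE_INFO, serial_no & 0xFF, serial_no >> 8, identifier]


def to_ipmitool_args(request: List[int]) -> List[str]:
    """Render the request as an `ipmitool raw <netfn> <cmd> [data]` argument list."""
    return ["ipmitool", "raw"] + [f"0x{b:02x}" for b in request]


if __name__ == "__main__":
    print(" ".join(to_ipmitool_args(build_ipmi_request(0x0102, 3))))
```

For serial number 0x0102 and identifier 3 this yields `ipmitool raw 0x30 0x5a 0x02 0x01 0x03`; the response layout (register values, fault location) would be defined by the BMC firmware.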
Optionally, the apparatus further comprises:
a fourth module, configured to visually display the PCIE device information on a preset management interface of the BMC.
Optionally, the apparatus further comprises:
a fifth module, configured to, upon detecting receipt of a fault report sent by the BIOS, analyze the fault type corresponding to the fault report and generate an analysis result;
a sixth module, configured to send the analysis result to a complex programmable logic device (CPLD), so that the CPLD indicates early warning information according to the fault type.
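The source does not define how the fault type is analyzed. One common basis, assumed here, is the PCIe Advanced Error Reporting (AER) capability: the Uncorrectable Error Status register says which errors occurred, and a set bit in the Uncorrectable Error Severity register marks that error as fatal. The sketch below classifies a fault from snapshots of those registers; `analyze_fault` and the result format are illustrative.

```python
from typing import Dict, List

# Bit positions in the PCIe AER Uncorrectable Error Status register.
UNCORRECTABLE_ERRORS: Dict[int, str] = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def analyze_fault(uncor_status: int, uncor_severity: int, cor_status: int) -> Dict[str, object]:
    """Classify a fault as fatal / non-fatal / correctable from AER register snapshots."""
    errors: List[str] = [
        name for bit, name in UNCORRECTABLE_ERRORS.items() if (uncor_status >> bit) & 1
    ]
    # A set Severity bit marks the matching uncorrectable error as fatal.
    if uncor_status & uncor_severity:
        fault_type = "fatal"
    elif uncor_status:
        fault_type = "non-fatal"
    elif cor_status:
        fault_type = "correctable"
    else:
        fault_type = "none"
    return {"fault_type": fault_type, "errors": errors}
```

For example, `analyze_fault(1 << 14, 0, 0)` reports a non-fatal Completion Timeout; the resulting `fault_type` is the kind of analysis result that could then be handed to the CPLD.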
Optionally, the apparatus further comprises:
a seventh module, configured to generate a black box log from the register information in the PCIE device information and store the black box log.
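A minimal sketch of a black box log, assuming a fixed capacity where the oldest entries are overwritten and a JSON Lines on-disk format. The class name, record fields, and storage format are assumptions; the source only says a black box log is generated from register information and stored.

```python
import json
import time
from collections import deque
from typing import Deque, Dict, Optional


class BlackBoxLog:
    """Fixed-capacity log of PCIE register snapshots; oldest entries are dropped first."""

    def __init__(self, capacity: int = 64) -> None:
        self._records: Deque[dict] = deque(maxlen=capacity)

    def record(self, serial: str, identifier: str, registers: Dict[str, int],
               timestamp: Optional[float] = None) -> None:
        self._records.append({
            "ts": time.time() if timestamp is None else timestamp,
            "serial": serial,
            "identifier": identifier,
            # Store register values as fixed-width hex strings for readability.
            "registers": {name: f"0x{value:08x}" for name, value in registers.items()},
        })

    def __len__(self) -> int:
        return len(self._records)

    def to_jsonl(self) -> str:
        """Serialize all records as JSON Lines, oldest first."""
        return "".join(json.dumps(rec) + "\n" for rec in self._records)

    def save(self, path: str) -> None:
        """Persist the log (assumption: a plain file; a BMC might use a reserved flash area)."""
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.to_jsonl())
```

Bounding the log with `deque(maxlen=...)` keeps storage use predictable on a BMC while still preserving the most recent faults.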
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404:
a memory 403 for storing a computer program;
the processor 401, when executing the program stored in the memory 403, implements the following steps:
upon detecting receipt of a fault report sent by the Basic Input Output System (BIOS), acquiring target serial number information corresponding to the fault report; determining a target identifier in a PCIE device mapping relationship table according to the target serial number information; and acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The memory may include a random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the computer is caused to execute the PCIE device failure monitoring method described in any one of the foregoing embodiments.
In another embodiment of the present invention, a computer program product containing instructions is further provided, which when run on a computer, causes the computer to execute the PCIE device failure monitoring method described in any of the foregoing embodiments.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A PCIE device fault monitoring method, characterized in that the method is applied to a baseboard management controller (BMC) and comprises:
upon detecting receipt of a fault report sent by a Basic Input Output System (BIOS), acquiring target serial number information corresponding to the fault report;
determining a target identifier in a PCIE device mapping relationship table according to the target serial number information;
and acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
2. The method according to claim 1, wherein before the step of acquiring target serial number information corresponding to the fault report upon detecting receipt of the fault report sent by the BIOS, the method further comprises:
acquiring serial number information of PCIE devices in a server;
and matching the serial number information with preset identifiers to generate the PCIE device mapping relationship table.
3. The method according to claim 1, wherein after the step of acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method further comprises:
when the PCIE device information is detected to be abnormal, performing adaptive correction on the PCIE device according to a preset PCIE specification standard, wherein the PCIE specification standard is generated based on historical fault information of the PCIE device.
4. The method according to claim 1, wherein acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier comprises:
generating an IPMI command according to the target serial number information and the target identifier;
and acquiring, according to the IPMI command, register information and a fault location of the PCIE device corresponding to the fault report.
5. The method according to claim 1, wherein after the step of acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method further comprises:
visually displaying the PCIE device information on a preset management interface of the BMC.
6. The method according to claim 1, further comprising:
upon detecting receipt of a fault report sent by the BIOS, analyzing the fault type corresponding to the fault report to generate an analysis result;
and sending the analysis result to a complex programmable logic device (CPLD), so that the CPLD indicates early warning information according to the fault type.
7. The method according to claim 1, wherein after the step of acquiring PCIE device information corresponding to the fault report according to the target serial number information and the target identifier, the method further comprises:
generating a black box log according to the register information in the PCIE device information, and storing the black box log.
8. A PCIE device fault monitoring apparatus, characterized in that the apparatus is applied to a baseboard management controller (BMC) and comprises:
a first obtaining module, configured to obtain, upon detecting receipt of a fault report sent by a Basic Input Output System (BIOS), target serial number information corresponding to the fault report;
a determining module, configured to determine a target identifier in a PCIE device mapping relationship table according to the target serial number information;
and a second obtaining module, configured to obtain PCIE device information corresponding to the fault report according to the target serial number information and the target identifier.
9. A communication device, comprising: a transceiver, a memory, a processor, and a program stored on the memory and executable on the processor;
the processor is configured to read a program in the memory to implement the steps in the PCIE device failure monitoring method according to any one of claims 1 to 7.
10. A readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps in the PCIE device failure monitoring method according to any one of claims 1 to 7.
CN202211685851.5A 2022-12-27 2022-12-27 PCIE equipment fault monitoring method and device, communication equipment and storage medium Pending CN115878430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211685851.5A CN115878430A (en) 2022-12-27 2022-12-27 PCIE equipment fault monitoring method and device, communication equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211685851.5A CN115878430A (en) 2022-12-27 2022-12-27 PCIE equipment fault monitoring method and device, communication equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115878430A true CN115878430A (en) 2023-03-31

Family

ID=85754766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211685851.5A Pending CN115878430A (en) 2022-12-27 2022-12-27 PCIE equipment fault monitoring method and device, communication equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115878430A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination