CN114281618A - PCIE link training state monitoring device and server - Google Patents


Info

Publication number
CN114281618A
Authority
CN
China
Prior art keywords
pcie
management controller
link training
training
baseboard management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111444803.2A
Other languages
Chinese (zh)
Inventor
施世磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111444803.2A priority Critical patent/CN114281618A/en
Publication of CN114281618A publication Critical patent/CN114281618A/en
Withdrawn legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of computers, and in particular to a PCIE link training state monitoring device and a server. The device comprises: an integrated south bridge that stores training state information of a PCIE device port; a baseboard management controller that is connected to the integrated south bridge and is configured to read and analyze the training state information from the integrated south bridge based on a preset rule so as to generate a link training diagnosis result for the PCIE port; and a display terminal that is connected to the baseboard management controller and is configured to display the link training diagnosis result. The scheme automatically diagnoses whether link training is normal, monitors the PCIE training state more flexibly and intuitively, provides a basis for analyzing and locating problems when a PCIE device fails, and helps improve the efficiency of PCIE device testing.

Description

PCIE link training state monitoring device and server
Technical Field
The invention relates to the technical field of computers, in particular to a PCIE link training state monitoring device and a server.
Background
As server performance improves, high-speed signal rates keep rising, and so do customers' requirements for the stability of equipment in the machine room. Almost every server uses a large number of PCIE devices (such as smart network cards, NVMe drives, and graphics cards). In normal operation, a PCIE device is first powered on and then performs PCIE link training (i.e., Train link) with a PCIE port of a Central Processing Unit (CPU). PCIE link training generates training state information, which is typically stored in registers in binary form. In practice, because host-side implementations differ between manufacturers, various problems may arise during PCIE link training and cause it to fail; however, the training state information does not directly indicate whether link training succeeded or failed.
For example, suppose a server supports multiple graphics cards and network cards, say 16 graphics cards, each designed for a Gen3 rate and an X16 bandwidth, so that all 16 cards should run at the same bandwidth and rate. If 2 of the cards train at a reduced rate or reduced bandwidth during testing, the tester cannot see the problem directly. When checking the logs, the tester sees only the state after the reduction and is not clearly told the root cause, so finding and solving the problem is inefficient. Moreover, if the development tester does not check the logs at all, the problem is missed, which seriously disrupts the development schedule, lengthens the development cycle, and affects product quality.
Disclosure of Invention
In view of the above, it is desirable to provide a PCIE link training status monitoring apparatus and a server.
According to a first aspect of the present invention, a PCIE link training state monitoring apparatus is provided, where the apparatus includes:
the integrated south bridge stores training state information of a PCIE equipment port;
the baseboard management controller is connected with the integrated south bridge and is configured to read and analyze the training state information from the integrated south bridge based on a preset rule so as to generate a link training diagnosis result of the PCIE port;
and the display terminal is connected with the baseboard management controller and is configured to display the link training diagnosis result.
In some embodiments, the baseboard management controller is connected to the integrated south bridge via a PCIE bus, and the baseboard management controller is further configured to:
reading training state information stored in an integrated south bridge through the PCIE bus;
analyzing the training state information to determine the actual bandwidth and the actual rate of each channel corresponding to the PCIE equipment port;
comparing the actual bandwidth and the actual rate with a preset bandwidth and a preset rate respectively to determine the channel states corresponding to the ports of the PCIE equipment, wherein the channel states comprise normal and abnormal;
and taking the state of each channel corresponding to the PCIE equipment port as the link training diagnosis result.
In some embodiments, the baseboard management controller includes a built-in first EEPROM;
the baseboard management controller is further configured to store the link training diagnostic result in the first EEPROM.
In some embodiments, the apparatus further comprises: a second EEPROM, which is arranged outside the baseboard management controller and is connected with the baseboard management controller; and
the baseboard management controller is further configured to store the link training diagnostic result in the second EEPROM.
In some embodiments, the apparatus further comprises a switch coupled to the baseboard management controller; and
the baseboard management controller is further configured to read the stored link training diagnosis result according to the trigger state of the switch, and send the read link training diagnosis result to the display terminal for displaying.
In some embodiments, the switch is a point-contact switch.
In some embodiments, the display terminal is connected to the baseboard management controller through a VGA interface, and the display terminal is configured to display the link training diagnosis result through a WEB interface.
In some embodiments, the apparatus further comprises: a complex programmable logic device, one end of which is connected with the PCIE device and the other end of which is connected with the baseboard management controller, the complex programmable logic device being configured to detect the power-on state of the PCIE device;
the baseboard management controller is further configured to obtain a power-on state of the PCIE device from the complex programmable logic device before reading the training state information, and in response to the power-on state of the PCIE device being normal, read and analyze the training state information from the integrated south bridge based on a preset rule to generate a link training diagnosis result of the PCIE port.
In some embodiments, the preset rule is to read data at preset times, and/or to read data periodically.
According to a second aspect of the present invention, there is provided a server, including the PCIE link training state monitoring apparatus described in the above.
In the PCIE link training state monitoring device, the baseboard management controller reads the training state information from the integrated south bridge and analyzes it to obtain a link training diagnosis result, which is then shown on the display terminal. The device thus automatically diagnoses whether link training is normal, monitors the PCIE training state flexibly and intuitively, provides a basis for analyzing and locating problems when a PCIE device fails, and helps improve the efficiency of PCIE device testing.
In addition, the invention also provides a server, which can also realize the technical effects and is not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a device for monitoring a training state of a PCIE link according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of another PCIE link training state monitoring apparatus according to another embodiment of the present invention.
[ description of reference ]
1: an integrated south bridge; 2: a baseboard management controller; 3: a display terminal; 4: a first EEPROM; 5: a second EEPROM; 6: a switch; 7: a complex programmable logic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
In an embodiment, referring to fig. 1, the present invention provides a device for monitoring a PCIE link training state, where the device includes:
an integrated south bridge 1, wherein training state information of a PCIE device port (not shown in the figure) is stored in the integrated south bridge 1;
a baseboard management controller 2, which is connected with the integrated south bridge 1 and is configured to read and analyze the training state information from the integrated south bridge 1 based on a preset rule to generate a link training diagnosis result of the PCIE port;
and the display terminal 3 is connected with the baseboard management controller 2 and is configured to display the link training diagnosis result.
In the PCIE link training state monitoring device, the baseboard management controller reads the training state information from the integrated south bridge and analyzes it to obtain a link training diagnosis result, which is then shown on the display terminal. The device thus automatically diagnoses whether link training is normal, monitors the PCIE training state flexibly and intuitively, provides a basis for analyzing and locating problems when a PCIE device fails, and helps improve the efficiency of PCIE device testing.
In some embodiments, the baseboard management controller 2 is connected to the integrated south bridge 1 via a PCIE bus, and the baseboard management controller 2 is further configured to:
reading training state information stored in an integrated south bridge through the PCIE bus;
analyzing the training state information to determine the actual bandwidth and the actual rate of each channel corresponding to the PCIE equipment port;
comparing the actual bandwidth and the actual rate with a preset bandwidth and a preset rate respectively to determine the channel states corresponding to the ports of the PCIE equipment, wherein the channel states comprise normal and abnormal;
and taking the state of each channel corresponding to the PCIE equipment port as the link training diagnosis result.
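The parse-and-compare steps above can be sketched as follows. This is only an illustration: the patent does not specify how the training state information is laid out, so the sketch assumes it resembles the standard PCIe Link Status register (current link speed in bits 3:0, negotiated link width in bits 9:4). The names `LinkState`, `decode_link_status`, and `diagnose` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the training state word follows the PCIe Link Status register
# layout (bits 3:0 = current link speed, bits 9:4 = negotiated link width).
# The patent itself does not define the register format.
SPEED_MASK = 0x000F
WIDTH_MASK = 0x03F0
WIDTH_SHIFT = 4


@dataclass(frozen=True)
class LinkState:
    speed_gen: int  # 1 = Gen1, 3 = Gen3, 4 = Gen4, ...
    width: int      # number of lanes, e.g. 16 for X16


def decode_link_status(raw: int) -> LinkState:
    """Split a 16-bit status word into link rate and link width."""
    return LinkState(
        speed_gen=raw & SPEED_MASK,
        width=(raw & WIDTH_MASK) >> WIDTH_SHIFT,
    )


def diagnose(actual: LinkState, expected: LinkState) -> str:
    """Return "normal" only if both rate and bandwidth match the preset values."""
    if actual.speed_gen == expected.speed_gen and actual.width == expected.width:
        return "normal"
    return "abnormal"
```

Under these assumptions, a port expected to train at Gen3 X16 that reports the word `0x0083` (Gen3, X8) would be flagged as abnormal because its bandwidth was reduced.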
In some embodiments, referring to fig. 2, which is a schematic structural diagram of another PCIE link training state monitoring apparatus, in addition to the components shown in fig. 1, the baseboard management controller 2 includes a built-in first EEPROM 4;
the baseboard management controller 2 is further configured to store the link training diagnostic result in the first EEPROM 4.
In this embodiment, after the link training diagnosis result is generated, the result is stored by using the storage space of the baseboard management controller, so that operation and maintenance personnel or testing personnel can check the result at any time.
In another embodiment, still referring to fig. 2, the apparatus further includes: a second EEPROM 5, the second EEPROM 5 being provided outside the baseboard management controller 2 and connected to the baseboard management controller 2; and
the baseboard management controller 2 is further configured to store the link training diagnostic result in the second EEPROM 5.
In this embodiment, a storage medium is mounted separately outside the baseboard management controller, and the generated result is automatically stored in it. Unlike storage inside the baseboard management controller, the external storage does not lose its data when the system is powered off and on again, so the training state information does not have to be read and analyzed repeatedly; the result can be retrieved conveniently and checked at any time, which saves time.
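One way to picture how a diagnosis result might be laid out for EEPROM storage is a small fixed-size record. The layout below is purely hypothetical (the patent does not describe a storage format): one byte each for the CPU and port index, a 16-bit bitmap of abnormal lanes, and a CRC32 so that a corrupted record can be detected on read-back.

```python
import struct
import zlib

# Hypothetical record layout, not taken from the patent:
#   cpu index (1 byte), port index (1 byte),
#   failed-lane bitmap (2 bytes, bit n set = lane n abnormal),
#   CRC32 over the preceding 4 bytes (4 bytes). Little-endian, no padding.
RECORD = struct.Struct("<BBHI")


def pack_record(cpu: int, port: int, failed_lanes: set[int]) -> bytes:
    """Serialize one port's diagnosis result into an 8-byte record."""
    bitmap = 0
    for lane in failed_lanes:
        bitmap |= 1 << lane
    body = struct.pack("<BBH", cpu, port, bitmap)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_record(blob: bytes) -> tuple[int, int, set[int]]:
    """Parse a record and verify its checksum before trusting it."""
    cpu, port, bitmap, crc = RECORD.unpack(blob)
    if zlib.crc32(blob[:4]) != crc:
        raise ValueError("corrupted diagnostic record")
    return cpu, port, {n for n in range(16) if bitmap & (1 << n)}
```

A fixed-size record keeps addressing simple on a byte-addressable EEPROM: record `i` would sit at offset `i * RECORD.size`.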
In another embodiment, still referring to fig. 2, the apparatus further includes a switch 6, wherein the switch 6 is connected to the baseboard management controller 2; and
the baseboard management controller 2 is further configured to read the stored link training diagnosis result according to the trigger state of the switch, and send the read link training diagnosis result to the display terminal 3 for display.
Preferably, the switch 6 is a point-contact switch.
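The switch-triggered display can be sketched as simple edge detection on the sampled switch state. The toggle behavior (each press flips between the normal output and the diagnosis view) is an assumption made for illustration, and `DisplayToggle` is a hypothetical name; the patent only states that the stored result is read and displayed according to the trigger state of the switch.

```python
class DisplayToggle:
    """Flip between normal output and the diagnosis view on each press."""

    def __init__(self) -> None:
        self.show_diagnosis = False
        self._was_pressed = False

    def sample(self, pressed: bool) -> bool:
        """Feed one sampled switch state; return whether to show the diagnosis."""
        # React only to the released -> pressed edge, so holding the
        # switch down does not flip the view on every sample.
        if pressed and not self._was_pressed:
            self.show_diagnosis = not self.show_diagnosis
        self._was_pressed = pressed
        return self.show_diagnosis
```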
In some embodiments, the display terminal is connected to the baseboard management controller through a VGA interface, and the display terminal is configured to display the link training diagnosis result through a WEB interface.
In some embodiments, still referring to fig. 2, the apparatus further includes: a complex programmable logic device 7, one end of which is connected with the PCIE device and the other end of which is connected with the baseboard management controller 2, the complex programmable logic device 7 being configured to detect a power-on state of the PCIE device;
the baseboard management controller 2 is further configured to obtain the power-on state of the PCIE device from the complex programmable logic device 7 before reading the training state information, and in response to that the power-on state of the PCIE device is normal, read and analyze the training state information from the integrated south bridge based on a preset rule to generate a link training diagnosis result of the PCIE port.
In this embodiment, a complex programmable logic device is added to detect the power-on state of the PCIE device, and when the power-on state is normal, the training state information is read and analyzed, so that the influence of the power-on abnormality of the PCIE device on the link training is eliminated, and the accuracy of the training state analysis is improved.
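The power-on gating described in this embodiment amounts to a guard in front of the read-and-analyze step. In the minimal sketch below, both callables are hypothetical stand-ins: `read_power_ok` for querying the complex programmable logic device and `read_and_diagnose` for the south-bridge read and analysis.

```python
from typing import Callable, Optional


def diagnose_if_powered(
    read_power_ok: Callable[[], bool],
    read_and_diagnose: Callable[[], str],
) -> Optional[str]:
    """Run the link diagnosis only when the power-on state is normal."""
    if not read_power_ok():
        # Skip the diagnosis: a result obtained now would reflect the
        # power fault rather than the link training itself.
        return None
    return read_and_diagnose()
```

Returning `None` instead of "abnormal" keeps a power fault distinguishable from a genuine training failure, which is the separation this embodiment aims for.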
In yet another embodiment, the preset rule is to read data at preset times, and/or to read data periodically.
In this embodiment, reading data at preset times or periodically initiates the PCIE link training state analysis automatically. This keeps the obtained link training state information current, provides reliable data for the subsequent training state analysis, requires no manual intervention, and saves labor cost.
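A periodic read of this kind can be sketched as a bounded polling loop. The function name `poll` and its parameters are hypothetical, and the sleep function is injectable so the loop can be exercised without real delays; firmware on an actual baseboard management controller would more likely use a timer or scheduler.

```python
import time
from typing import Callable


def poll(
    read_once: Callable[[], None],
    interval_s: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call read_once a fixed number of times, waiting interval_s between calls."""
    for i in range(cycles):
        read_once()
        if i + 1 < cycles:  # no need to wait after the final read
            sleep(interval_s)
```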
In another embodiment, the present invention further provides a server, where the server includes the PCIE link training state monitoring apparatus described above.
In another embodiment, to facilitate understanding of the technical solution of the present invention, a two-way (dual-CPU) general-purpose server is taken as an example. Each CPU has a plurality of PCIE ports, each port has a bandwidth of X16 and a rate of GEN4, and a GEN3, X16 graphics card is assumed to be mounted under each port. The specific implementation process is as follows:
(1) the baseboard management controller acquires the PCIE Train state of each graphics card in the current server from the integrated south bridge through the PCIE bus and lists the Train states in a table; the VGA output is switched by the point-contact switch, and the Train state of each graphics card is displayed on a WEB interface;
(2) an EEPROM is mounted under the baseboard management controller; the baseboard management controller stores the Train state of each graphics card obtained from the integrated south bridge in the EEPROM, so that a tester can read the current Train state of each graphics card in real time through a serial port and obtain logs quickly and accurately;
(3) Table 1 shows the PCIE link training diagnosis result, where "OK" indicates that the channel was normal during the Train process and "FAIL" indicates that the channel was abnormal. The VGA output is switched by a point-contact key switch: when the switch is pressed, the VGA output switches to the content of the following table. The form of the content is not limited; the table is only an illustration.
TABLE 1 PCIE link training diagnosis results
CONTENT   CPU0 PORT0   CPU0 PORT1   CPU1 PORT0   CPU1 PORT1
LANE0     OK           OK           OK           OK
LANE1     OK           FAIL         OK           OK
LANE2     OK           OK           OK           OK
LANE3     OK           OK           OK           OK
LANE4     OK           OK           OK           OK
LANE5     OK           OK           OK           OK
LANE6     OK           OK           OK           OK
……        ……           ……           ……           ……
Results   OK           FAIL         OK           OK
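A table of this shape could be produced from per-lane states as sketched below. The aggregation rule (a port's overall result is FAIL if any of its lanes is FAIL) is inferred from Table 1, where CPU0 PORT1 has one failed lane and a FAIL result; the patent does not state the rule explicitly, and the function names are hypothetical.

```python
def port_result(lanes: list[str]) -> str:
    """A port is OK only if every one of its lanes is OK (inferred rule)."""
    return "OK" if all(state == "OK" for state in lanes) else "FAIL"


def render(ports: dict[str, list[str]]) -> str:
    """Render per-lane states as a text table with a final Results row."""
    names = list(ports)
    lane_count = len(next(iter(ports.values())))
    rows = [["CONTENT", *names]]
    for lane in range(lane_count):
        rows.append([f"LANE{lane}", *(ports[name][lane] for name in names)])
    rows.append(["Results", *(port_result(ports[name]) for name in names)])
    # Pad every cell to the widest cell so the columns line up.
    width = max(len(cell) for row in rows for cell in row)
    return "\n".join(
        "  ".join(cell.ljust(width) for cell in row).rstrip() for row in rows
    )
```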
The device of the invention has at least the following beneficial technical effects: in addition to analyzing the Train state, it shows the current Train result to the tester directly through the VGA output; the Train result is stored in an EEPROM, so that the tester can conveniently export logs and the current state is recorded and saved; and the baseboard management controller detects the state of the point-contact switch to trigger switching of the VGA output to the Train result for real-time monitoring.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A PCIE link training status monitoring device, characterized in that the device includes:
the integrated south bridge stores training state information of a PCIE equipment port;
the baseboard management controller is connected with the integrated south bridge and is configured to read and analyze the training state information from the integrated south bridge based on a preset rule so as to generate a link training diagnosis result of the PCIE port;
and the display terminal is connected with the baseboard management controller and is configured to display the link training diagnosis result.
2. The device of claim 1, wherein the baseboard management controller is connected to the integrated south bridge via a PCIE bus, the baseboard management controller further configured to:
reading training state information stored in an integrated south bridge through the PCIE bus;
analyzing the training state information to determine the actual bandwidth and the actual rate of each channel corresponding to the PCIE equipment port;
comparing the actual bandwidth and the actual rate with a preset bandwidth and a preset rate respectively to determine the channel states corresponding to the ports of the PCIE equipment, wherein the channel states comprise normal and abnormal;
and taking the state of each channel corresponding to the PCIE equipment port as the link training diagnosis result.
3. The device of claim 1, wherein the baseboard management controller comprises a first built-in EEPROM;
the baseboard management controller is further configured to store the link training diagnostic result in the first EEPROM.
4. The device of claim 1, wherein the device further comprises: a second EEPROM, which is arranged outside the baseboard management controller and is connected with the baseboard management controller; and
the baseboard management controller is further configured to store the link training diagnostic result in the second EEPROM.
5. The PCIE link training status monitoring device of claim 3 or 4, wherein the device further comprises a switch, the switch is connected with the baseboard management controller; and
the baseboard management controller is further configured to read the stored link training diagnosis result according to the trigger state of the switch, and send the read link training diagnosis result to the display terminal for displaying.
6. The device of claim 5, wherein the switch is a point-contact switch.
7. The device of claim 1, wherein the display terminal is connected to the baseboard management controller via a VGA interface, and the display terminal is configured to display the link training diagnosis result via a WEB interface.
8. The device of claim 1, wherein the device further comprises: a complex programmable logic device, one end of which is connected with the PCIE device and the other end of which is connected with the baseboard management controller, the complex programmable logic device being configured to detect the power-on state of the PCIE device;
the baseboard management controller is further configured to obtain a power-on state of the PCIE device from the complex programmable logic device before reading the training state information, and in response to the power-on state of the PCIE device being normal, read and analyze the training state information from the integrated south bridge based on a preset rule to generate a link training diagnosis result of the PCIE port.
9. The device of claim 1, wherein the preset rule is to read data at preset times and/or to read data periodically.
10. A server comprising the PCIE link training state monitoring apparatus of any one of claims 1 to 9.
CN202111444803.2A 2021-11-30 2021-11-30 PCIE link training state monitoring device and server Withdrawn CN114281618A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111444803.2A CN114281618A (en) 2021-11-30 2021-11-30 PCIE link training state monitoring device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111444803.2A CN114281618A (en) 2021-11-30 2021-11-30 PCIE link training state monitoring device and server

Publications (1)

Publication Number Publication Date
CN114281618A true CN114281618A (en) 2022-04-05

Family

ID=80870373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111444803.2A Withdrawn CN114281618A (en) 2021-11-30 2021-11-30 PCIE link training state monitoring device and server

Country Status (1)

Country Link
CN (1) CN114281618A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220405