CN109726055B - Method for detecting PCIe chip abnormity and computer equipment


Info

Publication number
CN109726055B
Authority
CN
China
Legal status
Active
Application number
CN201711044736.9A
Other languages
Chinese (zh)
Other versions
CN109726055A (en)
Inventor
柴峰 (Chai Feng)
陈加怀 (Chen Jiahuai)
李道宏 (Li Daohong)
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN201711044736.9A
Publication of CN109726055A
Application granted
Publication of CN109726055B

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

A method for detecting PCIe chip abnormity and a computer device are provided. The computer device comprises a CPU, a PCIe chip coupled to the CPU, and a CPLD coupled to the PCIe chip. A heartbeat signal pin of the PCIe chip is electrically connected to a signal detection pin of the CPLD, and the PCIe chip outputs, through the heartbeat signal pin, a heartbeat signal to the CPLD that indicates whether the PCIe chip is working normally. The CPLD determines, according to the heartbeat signal acquired on the signal detection pin, whether the PCIe chip is working normally, and powers down the PCIe chip upon determining that it is working abnormally. In the embodiments of this application, whether the PCIe chip is working normally is detected by hardware, and when an anomaly is detected, the link between the PCIe chip and the CPU is also broken by hardware, which speeds up anomaly detection and ensures that the link is broken promptly.

Description

Method for detecting PCIe chip abnormity and computer equipment
Technical Field
The embodiments of this application relate to the field of communications technologies, and in particular, to a method and a computer device for detecting an exception in a Peripheral Component Interconnect Express (PCIe) chip.
Background
PCIe is a high-performance, high-bandwidth serial communications interconnect standard. PCIe technology is widely used in computer devices such as personal computers, servers, and data centers.
A PCIe chip is a chip that supports the PCIe standard, such as a PCIe switch chip or a non-transparent bridge (NTB) chip. A PCIe chip is generally connected to the central processing unit (CPU) of a computer device and forwards messages sent by the CPU. During operation, a PCIe chip may malfunction because of factors such as abnormal code execution, heavy message load, or aging of its physical characteristics.
When the PCIe chip malfunctions, it can no longer process the messages sent by the CPU. If the PCIe link between the CPU and the PCIe chip is not disconnected at that point, the CPU, unaware of the malfunction, keeps waiting for the PCIe chip to respond, and messages subsequently destined for the PCIe chip keep accumulating in the CPU's buffer until the buffer overflows and an error is reported.
In the prior art, the CPU uses software to detect whether the PCIe chip is working normally and, upon detecting an anomaly, uses software to disconnect the PCIe link between the CPU and the PCIe chip. Specifically, a software program on the CPU polls the PCIe chip, and when the number of consecutive access failures reaches a preset number (e.g., 3), the program disconnects the PCIe link between the CPU and the PCIe chip.
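For contrast, the prior-art software approach can be sketched as a polling loop that counts consecutive access failures. This is an illustrative sketch only: the function name `poll_until_disconnect`, the representation of each access as a boolean success flag, and the default limit of 3 (taken from the example above) are assumptions, not the actual driver code.

```python
from typing import Iterable, Optional


def poll_until_disconnect(access_results: Iterable[bool], max_failures: int = 3) -> Optional[int]:
    """Return the poll index at which the software would break the PCIe link.

    Each element of access_results is True if that polling access to the
    PCIe chip succeeded and False if it failed. The link is broken once
    max_failures consecutive failures have been seen; None means it never was.
    """
    consecutive_failures = 0
    for index, ok in enumerate(access_results):
        if ok:
            consecutive_failures = 0  # any successful access resets the count
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_failures:
            return index  # the software disconnects the PCIe link here
    return None
```

For example, the sequence success, failure, success, failure, failure, failure triggers the disconnection at index 5. Detection therefore takes at least max_failures polling intervals, and the loop runs on the same CPU whose message path may already be blocked, which is the weakness the application attributes to software detection.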
Because the prior art detects PCIe chip anomalies and disconnects the link in software, detection is slow and the link is not broken in time.
Disclosure of Invention
The embodiments of this application provide a method for detecting PCIe chip abnormity and a computer device, which can be used to solve the problems of slow detection and untimely link disconnection that arise in the prior art when PCIe chip anomalies are detected and the link is disconnected in software.
In one aspect, an embodiment of this application provides a computer device, including a CPU, a PCIe chip coupled to the CPU, and a complex programmable logic device (CPLD) coupled to the PCIe chip. A heartbeat signal pin of the PCIe chip is electrically connected to a signal detection pin of the CPLD, and the PCIe chip outputs, through the heartbeat signal pin, a heartbeat signal to the CPLD that indicates whether the PCIe chip is working normally. The CPLD is configured to determine, according to the heartbeat signal acquired on the signal detection pin, whether the PCIe chip is working normally, and to power down the PCIe chip upon determining that it is working abnormally.
In the solution provided by this embodiment, the CPLD detects whether the PCIe chip is working normally according to the heartbeat signal output by the PCIe chip, and upon detecting that it is working abnormally, powers it down, thereby disconnecting the PCIe link between the PCIe chip and the CPU. This achieves both anomaly detection and link disconnection for the PCIe chip. Because the CPLD works in hardware, it is faster than software and is not affected when the CPU becomes sluggish or even hangs because message delivery is blocked, so an anomaly in the PCIe chip can be detected quickly. Moreover, disconnecting the PCIe link by powering down the PCIe chip makes the disconnection faster and more timely.
In one possible design, the heartbeat signal is a square wave with a preset frequency. The CPLD is further configured to determine that the PCIe chip is working abnormally when the duration of a high level or of a low level of the heartbeat signal exceeds a preset threshold.
Using a square wave as the heartbeat signal lets the CPLD recognize changes in the form of the heartbeat signal more simply and unambiguously, so that whether the PCIe chip is working normally can be determined quickly and accurately.
In one possible design, the CPLD is further configured to output a power-down signal to the power control chip when it is determined that the PCIe chip operates abnormally, and the power control chip is configured to stop supplying power to the PCIe chip according to the power-down signal.
In one possible design, the CPLD is further configured so that, if it receives a power-up signal output by the CPU after powering down the PCIe chip, it powers the PCIe chip up again according to that signal.
In one possible design, the PCIe chip is a PCIe switch chip; or the PCIe chip is an NTB chip.
In another aspect, an embodiment of this application provides a method for detecting a PCIe chip exception, applied to a CPLD of a computer device. The computer device includes a CPU, a PCIe chip coupled to the CPU, and a CPLD coupled to the PCIe chip. A heartbeat signal pin of the PCIe chip is electrically connected to a signal detection pin of the CPLD, and the PCIe chip outputs, through the heartbeat signal pin, a heartbeat signal to the CPLD that indicates whether the PCIe chip is working normally.
The method comprises the following steps: the CPLD determines whether the PCIe chip works normally or not according to the heartbeat signal acquired by the signal detection pin; and when the PCIe chip is determined to work abnormally, the CPLD controls the power down of the PCIe chip.
In one possible design, the heartbeat signal is a square wave with a preset frequency, and the step in which the CPLD determines, according to the heartbeat signal acquired on the signal detection pin, whether the PCIe chip is working normally includes: determining, by the CPLD, that the PCIe chip is working abnormally when the duration of a high level or of a low level of the heartbeat signal exceeds a preset threshold.
In one possible design, the powering down of the PCIe chip by the CPLD includes: outputting, by the CPLD, a power-down signal to a power control chip, where the power control chip is configured to stop supplying power to the PCIe chip according to the power-down signal.
In one possible design, after the CPLD controls the PCIe chip to power down, the method further includes: when a power-on signal output by the CPU is received, the CPLD controls the PCIe chip to be powered on again according to the power-on signal.
Compared with the prior art, in the solution provided by the embodiments of this application, the CPLD detects whether the PCIe chip is working normally according to the heartbeat signal output by the PCIe chip, and upon detecting that the PCIe chip is working abnormally, disconnects the PCIe link between the PCIe chip and the CPU by powering down the PCIe chip, thereby achieving both anomaly detection and link disconnection for the PCIe chip. Because the CPLD works in hardware, it is faster than software and is not affected when the CPU becomes sluggish or even hangs because message delivery is blocked, so an anomaly in the PCIe chip can be detected quickly. Moreover, disconnecting the PCIe link by powering down the PCIe chip makes the disconnection faster and more timely.
Drawings
FIG. 1 is a block diagram of a computer device according to an embodiment of this application;
FIG. 2A and FIG. 2B are schematic diagrams of heartbeat signals according to an embodiment of this application;
FIG. 3 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 4 is a schematic diagram of another application scenario according to an embodiment of this application;
FIG. 5 is a flowchart of a method for detecting a PCIe chip exception according to an embodiment of this application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the technical solution provided by the embodiments of this application, whether the PCIe chip is working normally is detected by hardware, and when the PCIe chip is working abnormally, the PCIe link between the PCIe chip and the CPU is broken by hardware, which speeds up anomaly detection and makes the disconnection more timely.
The embodiments of this application are described below in further detail on the basis of the common aspects described above.
Referring to FIG. 1, a block diagram of a computer device 10 provided by one embodiment of the present application is shown. The computer device 10 includes: CPU11, PCIe chip 12 and CPLD 13.
The computer device 10 may be any electronic device that supports PCIe technology, such as a personal computer, a server, or a data center.
The CPU11 is the computing and control core of the computer device 10; its main function is to interpret computer instructions and process data in computer software.
The PCIe chip 12 refers to a chip supporting the PCIe standard, such as a PCIe switch chip, an NTB chip, a PCIe network card, and the like. A PCIe link is established between the CPU11 and the PCIe chip 12, the CPU11 issues a message to the PCIe chip 12 through the PCIe link, and the PCIe chip 12 performs subsequent processing (such as forwarding and storing) on the received message.
The CPLD13 is configured to detect whether the PCIe chip 12 operates normally, and disconnect a link between the PCIe chip 12 and the CPU11 when detecting that the PCIe chip operates abnormally.
In the embodiment of the present application, as shown in fig. 1, the CPU11 is coupled to the PCIe chip 12, and the PCIe chip 12 is coupled to the CPLD 13.
The PCIe chip 12 has a heartbeat signal pin and the CPLD13 has a signal detection pin, and the two pins are electrically connected. The PCIe chip 12 outputs a heartbeat signal to the CPLD13 through the heartbeat signal pin, and the CPLD13 correspondingly receives that heartbeat signal through the signal detection pin.
The heartbeat signal indicates whether the PCIe chip 12 is working normally. It may be a high/low level signal with a square waveform. When the PCIe chip 12 works normally, it outputs a heartbeat signal of a first form to the CPLD13; when it works abnormally, it outputs a heartbeat signal of a second form, different from the first. The CPLD13 can therefore determine from the form of the received heartbeat signal whether the PCIe chip 12 is working normally. For example, during normal operation the heartbeat signal of the first form is a square wave that alternates between high and low levels. If the PCIe chip 12 fails while it is outputting a high level, it keeps outputting the high level and no longer switches to low; likewise, if it fails while outputting a low level, it keeps outputting the low level and no longer switches to high. Consequently, the heartbeat signal of the second form, output while the PCIe chip 12 is working abnormally, may be a continuous high-level signal or a continuous low-level signal.
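To make the two forms concrete, the sketch below models the heartbeat pin as a list of sampled logic levels: a square wave that toggles every half-period while the chip is healthy (first form), and a level that freezes at whatever value it had when the fault occurred (second form). The tick-based sampling model, the starting level, and the function name `heartbeat_samples` are illustrative assumptions, not part of the described design.

```python
from typing import List, Optional


def heartbeat_samples(num_samples: int, half_period: int, fault_at: Optional[int] = None) -> List[int]:
    """Model the heartbeat pin as one logic level (0 or 1) per sample tick.

    While healthy, the level toggles every half_period ticks (first form).
    From tick fault_at onward the level is frozen at its last value
    (second form: a continuous high or a continuous low).
    """
    if half_period <= 0:
        raise ValueError("half_period must be positive")
    samples: List[int] = []
    level = 1  # assumed starting level
    for tick in range(num_samples):
        if fault_at is None or tick < fault_at:
            # Healthy: high for one half-period, low for the next, and so on.
            level = 1 if (tick // half_period) % 2 == 0 else 0
        # After the fault, level is no longer updated, so it stays stuck.
        samples.append(level)
    return samples
```

With a two-tick half-period, `heartbeat_samples(8, 2, fault_at=5)` yields `[1, 1, 0, 0, 1, 1, 1, 1]`, the stuck-high case of part (a) of FIG. 2B, while `fault_at=3` freezes the pin low as in part (b).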
In one example, the heartbeat signal is a square wave with a preset frequency. Using a square wave lets the CPLD13 recognize changes in the form of the heartbeat signal simply and unambiguously, which helps it determine quickly and accurately whether the PCIe chip 12 is working normally. The preset frequency may be an empirical value chosen according to actual conditions, for example 10 Hz. In practice, it may be chosen according to the message rate between the CPU11 and the PCIe chip 12, or according to how quickly a malfunction of the PCIe chip 12 must be detected. For example, when the CPU11 sends messages to the PCIe chip 12 at a high rate, a malfunctioning PCIe chip must be detected sooner to avoid message blocking, so a higher preset frequency may be chosen.
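The trade-off between the preset frequency and detection speed can be made concrete with some arithmetic. A square wave at frequency f holds each level for half a period, 1/(2f), so a threshold for declaring a level stuck must be longer than that; and because a fault is declared only once that threshold is exceeded, the threshold also roughly bounds how long a fault can go unnoticed. The hypothetical helper `heartbeat_timing` below computes the half-period and how many half-periods fit into a chosen threshold; it is a sketch for reasoning about values, not part of the described design.

```python
from typing import Tuple


def heartbeat_timing(freq_hz: float, threshold_s: float) -> Tuple[float, float]:
    """Return (half_period_s, margin) for a square-wave heartbeat.

    half_period_s is how long each level lasts during normal operation;
    margin is how many half-periods fit into the stuck-level threshold.
    """
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    half_period_s = 1.0 / (2.0 * freq_hz)
    if threshold_s <= half_period_s:
        # A healthy level would last as long as the threshold: no margin for jitter.
        raise ValueError("threshold_s must be longer than the half-period")
    return half_period_s, threshold_s / half_period_s
```

With the values used in this description, 10 Hz gives a 50 ms half-period, so a 0.5 second threshold leaves a margin of 10 half-periods and a 1 second threshold a margin of 20. Raising the frequency to 100 Hz would allow, for instance, a 50 ms threshold with the same margin of 10, which is the sense in which a higher preset frequency speeds up detection.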
In addition, whether the PCIe chip 12 is working normally may be judged by whether it can properly process the messages issued by the CPU11: if it can, the PCIe chip 12 is considered to be working normally; if it cannot, the PCIe chip 12 is considered to be working abnormally.
In the embodiment of the present application, the CPLD13 is configured to determine whether the PCIe chip 12 works normally according to the heartbeat signal acquired by the signal detection pin. For example, when the heartbeat signal acquired by the signal detection pin is in the first form, the CPLD13 determines that the PCIe chip 12 normally operates; when the heartbeat signal acquired by the signal detection pin is in the second form, the CPLD13 determines that the PCIe chip 12 abnormally operates.
Optionally, when the heartbeat signal is a square wave with a preset frequency, the CPLD13 is further configured to determine that the PCIe chip 12 is working abnormally when the duration of a high level or of a low level of the heartbeat signal exceeds a preset threshold. The preset threshold may be an empirical value chosen according to actual conditions, for example based on the preset frequency: when the preset frequency is 10 Hz, the preset threshold may be 1 second or 0.5 second. Referring to FIG. 2A and FIG. 2B: FIG. 2A shows the first form of the heartbeat signal, output while the PCIe chip 12 works normally, which continuously switches between high and low levels; FIG. 2B shows the second form, output while the PCIe chip 12 works abnormally, which may be a continuous high-level signal (part (a) of FIG. 2B) or a continuous low-level signal (part (b) of FIG. 2B).
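The decision rule just described (declare an anomaly when either level persists longer than the preset threshold) can be sketched as a sampled state machine, which is roughly how a counter-based CPLD implementation would behave. This is a software model under stated assumptions: the pin is sampled once per tick, the threshold is expressed in ticks, and the names `HeartbeatMonitor` and `first_abnormal_tick` are hypothetical; real CPLD logic would be written in a hardware description language.

```python
from typing import Iterable, Optional


class HeartbeatMonitor:
    """Sampled model of the CPLD's stuck-level check on the signal detection pin."""

    def __init__(self, threshold_ticks: int) -> None:
        if threshold_ticks <= 0:
            raise ValueError("threshold_ticks must be positive")
        self.threshold_ticks = threshold_ticks
        self.last_level: Optional[int] = None  # level seen on the previous tick
        self.run_length = 0                     # consecutive ticks at the current level
        self.abnormal = False                   # latched once an anomaly is detected

    def sample(self, level: int) -> bool:
        """Feed one sample (0 or 1); return True once the chip is judged abnormal."""
        if self.abnormal:
            return True  # latched: later samples do not clear it in this model
        if level == self.last_level:
            self.run_length += 1
        else:
            self.last_level = level
            self.run_length = 1  # an edge restarts the duration count
        if self.run_length > self.threshold_ticks:
            self.abnormal = True  # high or low level held longer than the threshold
        return self.abnormal


def first_abnormal_tick(samples: Iterable[int], threshold_ticks: int) -> Optional[int]:
    """Return the first tick at which the monitor flags the chip, or None."""
    monitor = HeartbeatMonitor(threshold_ticks)
    for tick, level in enumerate(samples):
        if monitor.sample(level):
            return tick
    return None
```

With a two-tick half-period and a threshold of three ticks, a healthy square wave never trips the monitor, whereas the sequence 1, 1, 0, 0, 0, 0 is flagged at tick 5, when the low level has lasted four ticks. Latching the result means one fault yields one power-down request.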
The CPLD13 is further configured to power down the PCIe chip 12 upon determining that it is working abnormally. In this embodiment, the CPLD13 powers down the PCIe chip 12 immediately after determining that it is working abnormally; because the PCIe chip 12 loses power, the PCIe link between the PCIe chip 12 and the CPU11 is broken as well. Triggering the link disconnection by powering down the PCIe chip 12 makes the disconnection faster.
In one example, the CPLD13 is further configured to output a power-down signal to the power control chip upon determining that the PCIe chip 12 is working abnormally; the power-down signal instructs the power control chip to stop supplying power to the PCIe chip 12. Optionally, the power-down signal may be a high-level signal, a low-level signal, or a combination of high and low levels. The power control chip is configured to stop supplying power to the PCIe chip 12 according to the power-down signal; for example, after receiving the power-down signal, the power control chip stops outputting a high level to the PCIe chip 12, thereby cutting off its power supply.
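The power-down path can be modeled as a command sent from the CPLD to the power control chip. The source leaves the encoding of the power-down signal open (high, low, or a combination), so the `PowerSignal` enum, the `PowerControlChip` and `CpldPowerControl` classes, and the send-once behavior below are illustrative assumptions rather than the described circuit.

```python
from enum import Enum


class PowerSignal(Enum):
    """Hypothetical encoding of the CPLD-to-power-control-chip command."""
    POWER_DOWN = 0
    POWER_UP = 1


class PowerControlChip:
    """Toy model: the supply to the PCIe chip is either driven high or not."""

    def __init__(self) -> None:
        self.supply_high = True  # PCIe chip is powered initially

    def receive(self, signal: PowerSignal) -> None:
        # Stop driving the supply on POWER_DOWN; resume on POWER_UP.
        self.supply_high = signal is PowerSignal.POWER_UP


class CpldPowerControl:
    """Issues one power-down command when the heartbeat check reports a fault."""

    def __init__(self, power_chip: PowerControlChip) -> None:
        self.power_chip = power_chip
        self.powered_down = False

    def on_heartbeat_check(self, abnormal: bool) -> None:
        if abnormal and not self.powered_down:
            self.power_chip.receive(PowerSignal.POWER_DOWN)
            # With the PCIe chip unpowered, its PCIe link to the CPU drops as well.
            self.powered_down = True
```

Sending the command only once per fault keeps the model simple; a real design might instead hold the power-down level for as long as the fault condition is latched.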
Optionally, after detecting that the link to the PCIe chip 12 is broken, the CPU11 may execute an Advanced Error Reporting (AER) recovery procedure, in which it sends a power-on signal to the CPLD13 to trigger the CPLD13 to power the PCIe chip 12 on again. Optionally, the CPU11 sends the power-on signal by writing a register: the CPU11 writes into a register of the CPLD13 a value indicating that the PCIe chip 12 should be powered on again, and after reading this value the CPLD13 powers the PCIe chip 12 on again. That is, the CPLD13 is further configured so that, if it receives a power-on signal output by the CPU11 after powering down the PCIe chip 12, it powers the PCIe chip 12 on again according to that signal. For example, upon receiving the power-on signal, the CPLD13 outputs a power-on signal to the power control chip, which then resumes supplying a high level to the PCIe chip 12, restoring its power supply. After being powered up, the PCIe chip 12 executes the link establishment procedure with the CPU11 in an attempt to re-establish the PCIe link. If the fault that caused the PCIe chip 12 to malfunction has been cleared, the PCIe link is re-established successfully; if it has not (for example, the PCIe chip 12 is permanently damaged), re-establishment fails, but the fault no longer affects the CPU11.
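The register-based power-up request can be sketched as a small register block shared by the CPU and the CPLD. The register offset 0x10 and the value 0x01 are placeholders invented for this example; the source only says the CPU writes "a value" into a CPLD register. `CpldRegisters` and its methods are likewise hypothetical, and the model stops at restoring power, leaving link re-establishment to the CPU and the PCIe chip as described above.

```python
class CpldRegisters:
    """Hypothetical CPLD register block shared with the CPU."""

    POWER_CTRL_REG = 0x10  # placeholder offset, not taken from the source
    POWER_UP_VALUE = 0x01  # placeholder value meaning "power the PCIe chip on again"

    def __init__(self) -> None:
        self.regs = {self.POWER_CTRL_REG: 0x00}
        self.pcie_powered = False  # state after an earlier fault-triggered power-down

    def cpu_write(self, addr: int, value: int) -> None:
        """CPU side, e.g. from its AER recovery flow: write a CPLD register."""
        self.regs[addr] = value

    def poll(self) -> None:
        """CPLD side: read the register and act on a pending power-up request."""
        if not self.pcie_powered and self.regs.get(self.POWER_CTRL_REG) == self.POWER_UP_VALUE:
            # Here the CPLD would send a power-on signal to the power control chip.
            self.pcie_powered = True
            self.regs[self.POWER_CTRL_REG] = 0x00  # clear so one write gives one attempt
```

Clearing the register after acting on it is a design choice of this sketch, so that one CPU write produces exactly one power-up attempt.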
The technical solution provided by the embodiment of the present application is described below with reference to two application scenarios.
In one exemplary scenario, as shown in FIG. 3, the PCIe chip 12 is a PCIe switch chip 121, which interconnects the CPU11 with multiple PCIe devices. The PCIe switch chip 121 has one input port and N output ports, where N is a positive integer. The CPU11 is electrically connected to the input port, for example through a PCIe bus, and each of the N output ports is electrically connected to one PCIe device, for example through a PCIe bus. The PCIe switch chip 121 thus allows the CPU11 to communicate with multiple PCIe devices simultaneously.
When the PCIe switch chip 121 starts working normally, it sends a notification to the CPLD13 instructing it to enable heartbeat detection. Once heartbeat detection is enabled, the CPLD13 receives the heartbeat signal output by the PCIe switch chip 121 on its signal detection pin and determines from it whether the PCIe switch chip 121 is working normally. Upon determining that the PCIe switch chip 121 is working abnormally, the CPLD13 powers down the PCIe switch chip 121, which disconnects the PCIe link between the PCIe switch chip 121 and the CPU11.
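The enable handshake can be sketched by gating the stuck-level check on a flag that the switch chip's notification sets. A plausible reason for the gate, although the text does not state it, is that the heartbeat pin is not yet toggling while the chip starts up and would otherwise be misread as a stuck level. The class `GatedHeartbeatMonitor`, its tick-based sampling, and the `enable` method standing in for the notification are all hypothetical.

```python
from typing import Optional


class GatedHeartbeatMonitor:
    """Stuck-level check that runs only after heartbeat detection is enabled."""

    def __init__(self, threshold_ticks: int) -> None:
        self.threshold_ticks = threshold_ticks
        self.enabled = False
        self.last_level: Optional[int] = None
        self.run_length = 0

    def enable(self) -> None:
        """Stand-in for the chip's notification that it has started working normally."""
        self.enabled = True
        self.last_level = None  # start timing from the first sample after enabling
        self.run_length = 0

    def sample(self, level: int) -> bool:
        """Return True if the chip should be treated as abnormal on this tick."""
        if not self.enabled:
            return False  # heartbeat pin is ignored until detection is enabled
        if level == self.last_level:
            self.run_length += 1
        else:
            self.last_level = level
            self.run_length = 1
        return self.run_length > self.threshold_ticks
```

Before `enable()` is called, even a long flat line on the pin is ignored; afterwards, the same stuck-level rule as in the device embodiment applies.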
In another exemplary scenario, as shown in FIG. 4, the PCIe chip 12 is an NTB chip 122. NTB chips are commonly used for data synchronization in dual-controller systems. As shown in FIG. 4, the computer device 10(a) and the computer device 10(b) form a dual-controller system: the computer device 10(a) includes the CPU11(a), the NTB chip 122(a), and the CPLD13(a), and the computer device 10(b) includes the CPU11(b), the NTB chip 122(b), and the CPLD13(b).
The CPLD13(a) receives the heartbeat signal output by the NTB chip 122(a) on its signal detection pin and determines from it whether the NTB chip 122(a) is working normally; likewise, the CPLD13(b) receives the heartbeat signal output by the NTB chip 122(b) on its signal detection pin and determines from it whether the NTB chip 122(b) is working normally. Assume that the CPLD13(a) determines that the NTB chip 122(a) is working abnormally. The CPLD13(a) then powers down the NTB chip 122(a), which disconnects both the PCIe link between the NTB chip 122(a) and the CPU11(a) and the PCIe link between the NTB chip 122(a) and the NTB chip 122(b), thereby preventing the CPU11(b) from also operating abnormally.
In the solution provided by the embodiments of this application, the CPLD13 detects whether the PCIe chip 12 is working normally according to the heartbeat signal output by the PCIe chip 12, and upon detecting that the PCIe chip 12 is working abnormally, powers it down, which disconnects the PCIe link between the PCIe chip 12 and the CPU11; this achieves both anomaly detection and link disconnection for the PCIe chip 12. Because the CPLD13 works in hardware, it is faster than software and is not affected when the CPU11 becomes sluggish or even hangs because message delivery is blocked, so an anomaly in the PCIe chip 12 can be detected quickly. Disconnecting the PCIe link by powering down the PCIe chip 12 also makes the disconnection faster and more timely. The solution improves the reliability and maintainability of PCIe chip 12 management and shortens fault duration.
Referring to FIG. 5, which shows a flowchart of a method for detecting a PCIe chip exception according to an embodiment of this application, the method is applicable to the CPLD13 of the embodiment of FIG. 1 and may include the following steps:
In step 501, the CPLD13 determines, according to the heartbeat signal acquired on the signal detection pin, whether the PCIe chip 12 is working normally.
The PCIe chip 12 has a heartbeat signal pin and the CPLD13 has a signal detection pin, and the two pins are electrically connected. The PCIe chip 12 outputs a heartbeat signal to the CPLD13 through the heartbeat signal pin, and the CPLD13 correspondingly receives that heartbeat signal through the signal detection pin.
The heartbeat signal indicates whether the PCIe chip 12 is working normally. It may be a high/low level signal with a square waveform. When the PCIe chip 12 works normally, it outputs a heartbeat signal of a first form to the CPLD13; when it works abnormally, it outputs a heartbeat signal of a second form, different from the first. The CPLD13 can therefore determine from the form of the received heartbeat signal whether the PCIe chip 12 is working normally. For example, during normal operation the heartbeat signal of the first form is a square wave that alternates between high and low levels. If the PCIe chip 12 fails while it is outputting a high level, it keeps outputting the high level and no longer switches to low; likewise, if it fails while outputting a low level, it keeps outputting the low level and no longer switches to high. Consequently, the heartbeat signal of the second form, output while the PCIe chip 12 is working abnormally, may be a continuous high-level signal or a continuous low-level signal.
In one example, the heartbeat signal is a square wave with a preset frequency. Using a square wave lets the CPLD13 recognize changes in the form of the heartbeat signal simply and unambiguously, which helps it determine quickly and accurately whether the PCIe chip 12 is working normally. The preset frequency may be an empirical value chosen according to actual conditions, for example 10 Hz. In practice, it may be chosen according to the message rate between the CPU11 and the PCIe chip 12, or according to how quickly a malfunction of the PCIe chip 12 must be detected. For example, when the CPU11 sends messages to the PCIe chip 12 at a high rate, a malfunctioning PCIe chip must be detected sooner to avoid message blocking, so a higher preset frequency may be chosen.
In addition, whether the PCIe chip 12 is working normally may be judged by whether it can properly process the messages issued by the CPU11: if it can, the PCIe chip 12 is considered to be working normally; if it cannot, the PCIe chip 12 is considered to be working abnormally.
In the embodiment of the present application, the CPLD13 determines whether the PCIe chip 12 normally operates according to the heartbeat signal acquired by the signal detection pin. For example, when the heartbeat signal acquired by the signal detection pin is in the first form, the CPLD13 determines that the PCIe chip 12 normally operates; when the heartbeat signal acquired by the signal detection pin is in the second form, the CPLD13 determines that the PCIe chip 12 abnormally operates.
Optionally, when the heartbeat signal is a square wave with a preset frequency, the CPLD13 determines that the PCIe chip 12 is working abnormally when the duration of a high level or of a low level of the heartbeat signal exceeds a preset threshold. The preset threshold may be an empirical value chosen according to actual conditions, for example based on the preset frequency: when the preset frequency is 10 Hz, the preset threshold may be 1 second or 0.5 second. Referring to FIG. 2A and FIG. 2B: FIG. 2A shows the first form of the heartbeat signal, output while the PCIe chip 12 works normally, which continuously switches between high and low levels; FIG. 2B shows the second form, output while the PCIe chip 12 works abnormally, which may be a continuous high-level signal (part (a) of FIG. 2B) or a continuous low-level signal (part (b) of FIG. 2B).
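In programmable logic, the duration of a level is typically measured with a counter that runs on the CPLD clock and is cleared on every edge of the heartbeat, so the preset threshold becomes a terminal count. The hypothetical helper below sizes that counter. The 25 MHz clock in the example is an assumption chosen for illustration; the source does not specify the CPLD clock.

```python
import math
from typing import Tuple


def stuck_counter_size(clock_hz: int, threshold_s: float) -> Tuple[int, int]:
    """Return (terminal_count, width_bits) for a level-duration counter.

    The counter advances once per CPLD clock while the heartbeat level is
    unchanged and is cleared on every edge. A count above terminal_count
    means the level has lasted longer than threshold_s, so the PCIe chip is
    treated as abnormal.
    """
    if clock_hz <= 0 or threshold_s <= 0:
        raise ValueError("clock_hz and threshold_s must be positive")
    terminal_count = math.ceil(clock_hz * threshold_s)
    # The counter must be able to reach terminal_count + 1 to signal "exceeded".
    width_bits = (terminal_count + 1).bit_length()
    return terminal_count, width_bits
```

At an assumed 25 MHz clock, the 0.5 second threshold corresponds to a terminal count of 12,500,000 and a 24-bit counter, while the 1 second threshold needs 25 bits. A prescaler that divides the clock before the counter would reduce the width further.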
In step 502, when it is determined that the PCIe chip 12 works abnormally, the CPLD13 controls the PCIe chip 12 to power down.
In this embodiment, when the CPLD13 determines that the PCIe chip 12 is working abnormally, it immediately powers down the PCIe chip 12; because the PCIe chip 12 loses power, the PCIe link between the PCIe chip 12 and the CPU11 is broken as well. Triggering the link disconnection by powering down the PCIe chip 12 makes the disconnection faster.
In one example, upon determining that the PCIe chip 12 is working abnormally, the CPLD13 outputs a power-down signal to the power control chip, instructing it to stop supplying power to the PCIe chip 12. Optionally, the power-down signal may be a high-level signal, a low-level signal, or a combination of high and low levels. The power control chip is configured to stop supplying power to the PCIe chip 12 according to the power-down signal; for example, after receiving the power-down signal, the power control chip stops outputting a high level to the PCIe chip 12, thereby cutting off its power supply.
Optionally, after detecting that the link to the PCIe chip 12 is broken, the CPU11 may execute an AER recovery procedure, in which it sends a power-on signal to the CPLD13 to trigger the CPLD13 to power the PCIe chip 12 on again. If the CPLD13 receives the power-on signal output by the CPU11 after powering down the PCIe chip 12, the CPLD13 powers the PCIe chip 12 on again according to that signal. Optionally, the CPU11 sends the power-on signal by writing a register: the CPU11 writes into a register of the CPLD13 a value indicating that the PCIe chip 12 should be powered on again, and after reading this value the CPLD13 powers the PCIe chip 12 on again. For example, upon receiving the power-on signal, the CPLD13 outputs a power-on signal to the power control chip, which then resumes supplying a high level to the PCIe chip 12, restoring its power supply. After being powered up, the PCIe chip 12 executes the link establishment procedure with the CPU11 in an attempt to re-establish the PCIe link. If the fault that caused the PCIe chip 12 to malfunction has been cleared, the PCIe link is re-established successfully; if it has not (for example, the PCIe chip 12 is permanently damaged), re-establishment fails, but the fault no longer affects the CPU11.
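Putting steps 501 and 502 together with the CPU-triggered power-up gives the loop below. It is a simplified, tick-based sketch under assumptions: the heartbeat is sampled once per tick, power-up requests from the CPU are given as a set of tick numbers, and samples are ignored while the chip is unpowered. The function name `run_cpld` and its event-log output are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def run_cpld(samples: Iterable[int], threshold_ticks: int, power_up_requests: Set[int]) -> List[str]:
    """Return an event log for a simplified CPLD loop (steps 501 and 502 plus re-power-up)."""
    events: List[str] = []
    powered = True
    last_level: Optional[int] = None
    run_length = 0
    for tick, level in enumerate(samples):
        if not powered:
            # While the chip is off, only a CPU power-up request matters.
            if tick in power_up_requests:
                powered = True
                last_level, run_length = None, 0  # restart timing after power returns
                events.append(f"{tick}: power up")
            continue
        # Step 501: judge the chip from how long the current level has lasted.
        if level == last_level:
            run_length += 1
        else:
            last_level, run_length = level, 1
        # Step 502: power the chip down once a level lasts longer than the threshold.
        if run_length > threshold_ticks:
            powered = False
            events.append(f"{tick}: power down")
    return events
```

For example, `run_cpld([1, 0, 1, 1, 1, 1, 1, 0, 1, 0], 2, {7})` returns `['4: power down', '7: power up']`: the third consecutive high sample at tick 4 exceeds the two-tick threshold, and the chip stays off until the CPU's request at tick 7.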
For details not disclosed in the above method embodiment, refer to the product embodiment shown in FIG. 1.
In the solution provided by the embodiments of this application, the CPLD 13 detects whether the PCIe chip 12 is operating normally according to the heartbeat signal output by the PCIe chip 12. Upon detecting abnormal operation, it powers down the PCIe chip 12 to disconnect the PCIe link between the PCIe chip 12 and the CPU 11, thereby implementing both abnormality detection and link disconnection for the PCIe chip 12. Because the CPLD 13 works in hardware, it processes signals faster than software and is unaffected when the CPU 11 stalls or even hangs because of blocked message transmission, so an abnormality of the PCIe chip 12 can be detected quickly. Because the PCIe link is disconnected by powering down the PCIe chip 12, the disconnection is also prompt. The technical solution provided by the embodiments of this application therefore improves the reliability and maintainability of PCIe chip 12 management and shortens fault duration.
An exemplary embodiment of this application further provides a CPLD 13 into which firmware is written, the firmware being used to implement the method provided in the embodiment of FIG. 5.
It should be understood that "a plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first", "second", and the like herein do not denote any order, quantity, or importance; they are used only to distinguish one object from another.
The foregoing embodiments describe the objectives, technical solutions, and advantages of the embodiments of this application in further detail. It should be understood that they are merely specific embodiments of this application and are not intended to limit its protection scope. Any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the embodiments of this application shall fall within the protection scope of the embodiments of this application.

Claims (10)

1. A computer device, characterized in that the computer device comprises a central processing unit (CPU), a peripheral component interconnect express (PCIe) chip coupled to the CPU, and a complex programmable logic device (CPLD) coupled to the PCIe chip; a heartbeat signal pin of the PCIe chip is electrically connected to a signal detection pin of the CPLD, the PCIe chip outputs a heartbeat signal to the CPLD through the heartbeat signal pin, and the heartbeat signal indicates whether the PCIe chip is operating normally;
the CPLD is configured to determine, according to the heartbeat signal acquired by the signal detection pin, whether the PCIe chip is operating normally, and to control the PCIe chip to power down when the PCIe chip is determined to be operating abnormally;
wherein whether the PCIe chip is operating normally includes whether the PCIe chip can normally process a message sent by the CPU.
2. The computer device according to claim 1, wherein the heartbeat signal is a square wave signal of a preset frequency.
3. The computer device of claim 2,
the CPLD is further configured to determine that the PCIe chip is operating abnormally when a high-level duration or a low-level duration of the heartbeat signal is greater than a preset threshold.
4. The computer device of any of claims 1 to 3,
the CPLD is further configured to output a power-down signal to a power supply control chip when the PCIe chip is determined to be operating abnormally, and the power supply control chip is configured to stop supplying power to the PCIe chip according to the power-down signal.
5. The computer device of any of claims 1 to 3,
the CPLD is further configured to, after controlling the PCIe chip to power down, control the PCIe chip to power on again according to a power-on signal output by the CPU if the power-on signal is received.
6. The computer device of any of claims 1 to 3, wherein the PCIe chip is a PCIe switch chip, or the PCIe chip is a non-transparent bridge (NTB) chip.
7. A method for detecting an abnormality of a peripheral component interconnect express (PCIe) chip, characterized in that the method is applied to a complex programmable logic device (CPLD) of a computer device; the computer device comprises a central processing unit (CPU), the PCIe chip coupled to the CPU, and the CPLD coupled to the PCIe chip; a heartbeat signal pin of the PCIe chip is electrically connected to a signal detection pin of the CPLD, the PCIe chip outputs a heartbeat signal to the CPLD through the heartbeat signal pin, the heartbeat signal indicates whether the PCIe chip is operating normally, and whether the PCIe chip is operating normally includes whether the PCIe chip can normally process a message issued by the CPU;
the method comprises the following steps:
the CPLD determines, according to the heartbeat signal acquired by the signal detection pin, whether the PCIe chip is operating normally; and
when the PCIe chip is determined to be operating abnormally, the CPLD controls the PCIe chip to power down.
8. The method according to claim 7, wherein the heartbeat signal is a square wave signal of a preset frequency;
the determining, by the CPLD according to the heartbeat signal acquired by the signal detection pin, whether the PCIe chip is operating normally comprises:
determining, by the CPLD, that the PCIe chip is operating abnormally when a high-level duration or a low-level duration of the heartbeat signal is greater than a preset threshold.
9. The method of claim 7 or 8, wherein the controlling, by the CPLD, of the PCIe chip to power down comprises:
outputting, by the CPLD, a power-down signal to a power supply control chip, the power supply control chip being configured to stop supplying power to the PCIe chip according to the power-down signal.
10. The method of claim 7 or 8, wherein after the CPLD controls the PCIe chip to power down, the method further comprises:
when a power-on signal output by the CPU is received, controlling, by the CPLD, the PCIe chip to power on again according to the power-on signal.
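The detection rule of claims 3 and 8 (a high-level or low-level duration of the heartbeat greater than a preset threshold) can be sketched as a sampled watchdog. This minimal Python model assumes that the CPLD samples the signal detection pin on a fixed clock, so the duration of a level is counted as a run of identical samples. `HeartbeatWatchdog`, `threshold_samples` and `check` are hypothetical names. A real threshold would be derived from the preset heartbeat frequency and the sampling clock.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HeartbeatWatchdog:
    """Behavioral model of the CPLD heartbeat check (hypothetical names)."""
    threshold_samples: int            # max consecutive samples allowed at one level
    last_level: Optional[int] = None
    run_length: int = 0
    abnormal: bool = False            # latched verdict

    def sample(self, level: int) -> bool:
        """Feed one sample of the signal detection pin; return the verdict."""
        if level == self.last_level:
            self.run_length += 1
        else:
            # Edge detected: start measuring the new level's duration.
            self.last_level = level
            self.run_length = 1
        if self.run_length > self.threshold_samples:
            # High- or low-level duration exceeded the preset threshold.
            self.abnormal = True
        return self.abnormal


def check(samples: Iterable[int], threshold_samples: int) -> bool:
    """Run a sequence of pin samples through a fresh watchdog."""
    wd = HeartbeatWatchdog(threshold_samples)
    for level in samples:
        wd.sample(level)
    return wd.abnormal
```

For instance, a square wave sampled as `[1, 1, 0, 0] * 10` with `threshold_samples=3` never holds one level for more than two samples and is judged healthy. A pin stuck at either level trips the watchdog on its fourth identical sample. The verdict latches, because after power-down the heartbeat stops anyway.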
CN201711044736.9A 2017-10-31 2017-10-31 Method for detecting PCIe chip abnormity and computer equipment Active CN109726055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711044736.9A CN109726055B (en) 2017-10-31 2017-10-31 Method for detecting PCIe chip abnormity and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711044736.9A CN109726055B (en) 2017-10-31 2017-10-31 Method for detecting PCIe chip abnormity and computer equipment

Publications (2)

Publication Number Publication Date
CN109726055A CN109726055A (en) 2019-05-07
CN109726055B true CN109726055B (en) 2021-01-12

Family

ID=66293094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711044736.9A Active CN109726055B (en) 2017-10-31 2017-10-31 Method for detecting PCIe chip abnormity and computer equipment

Country Status (1)

Country Link
CN (1) CN109726055B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112867107B (en) * 2019-11-28 2022-09-23 华为技术有限公司 Wireless fidelity WIFI chip control method and related equipment thereof
CN113791368A (en) * 2021-09-10 2021-12-14 苏州浪潮智能科技有限公司 Method and device for automatically checking misplugging of interconnection cables of server and GPU (graphics processing Unit) box

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104125049A (en) * 2014-08-08 2014-10-29 浪潮电子信息产业股份有限公司 Redundancy implementation method of PCIE (Peripheral Component Interface Express) device based on BRICKLAND platform
CN104461805A (en) * 2014-12-29 2015-03-25 浪潮电子信息产业股份有限公司 CPLD-based system state detecting method, CPLD and server mainboard
CN104639304A (en) * 2015-02-05 2015-05-20 南京阖云骥联信息科技有限公司 Dual-controller communication system based on internet of vehicles and dual-controller communication method based on internet of vehicles
JP2015225522A (en) * 2014-05-28 2015-12-14 富士ゼロックス株式会社 System and failure processing method
CN105912089A (en) * 2016-04-07 2016-08-31 浪潮电子信息产业股份有限公司 Battery redundancy method, device and system

Also Published As

Publication number Publication date
CN109726055A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
US9697167B2 (en) Implementing health check for optical cable attached PCIE enclosure
CN100388219C (en) Arbitration method and system for redundant controllers
CN104102559B (en) A kind of double controller storage system restarting link based on redundancy heart beating link and opposite end
CN106610712B (en) Substrate management controller resetting system and method
CN111767244B (en) Dual-redundancy computer equipment based on domestic Loongson platform
RU2614569C2 (en) Rack with automatic recovery function and method of automatic recovery for this rack
CN104050061A (en) Multi-main-control-panel redundant backup system based on PCIe bus
US9026685B2 (en) Memory module communication control
JP2024512316A (en) Independent slot control for expansion cards
CN109726055B (en) Method for detecting PCIe chip abnormity and computer equipment
CN212541329U (en) Dual-redundancy computer equipment based on domestic Loongson platform
US20240220385A1 (en) Power source consumption management apparatus for four-way server
CN110985426B (en) Fan control system and method for PCIE Switch product
CN115809164A (en) Embedded equipment, embedded system and hierarchical reset control method
US6943463B2 (en) System and method of testing connectivity between a main power supply and a standby power supply
US20200210201A1 (en) Information processing system and relay device
CN116644011B (en) Quick identification method, device and equipment of I2C equipment and storage medium
JPH09146875A (en) Separation of adaptor card slot for hot plugging
CN114296995B (en) Method, system, equipment and storage medium for server to autonomously repair BMC
CN107276832B (en) Method and device for improving communication reliability of PSU and system
CN103378902B (en) The main/standby switching method of OLT system and OLT system
JP2019159439A (en) Computer system
JP2014026317A (en) Peripheral apparatus extension control board, peripheral apparatus extension device, and peripheral apparatus extension control program
JP2014164488A (en) Control device, control method, and control program
JP2013254333A (en) Multiple system control system and control method therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200421

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 301, A building, room 3, building 301, foreshore Road, No. 310052, Binjiang District, Zhejiang, Hangzhou

Applicant before: Hangzhou Huawei Digital Technology Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211223

Address after: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Patentee after: xFusion Digital Technologies Co., Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.
