CN117149484A - Degradation error processing method, degradation error processing device, degradation error processing server, degradation error processing equipment and storage medium

Publication number
CN117149484A
Authority
CN
China
Prior art keywords
peripheral component
power
target peripheral
degradation
degradation error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311074465.7A
Other languages
Chinese (zh)
Inventor
何业缘
魏东
刘庆元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202311074465.7A priority Critical patent/CN117149484A/en
Publication of CN117149484A publication Critical patent/CN117149484A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G06F13/4081 Live connection to bus, e.g. hot-plugging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Abstract

The embodiments of the present disclosure relate to a degradation error processing method, apparatus, server, device, and storage medium. The method mainly comprises: receiving a degradation error notification from a platform controller hub, and determining, according to the degradation error notification, a target Peripheral Component Interconnect Express (PCIE) device in which a degradation error has occurred; controlling the target PCIE device to power down; and controlling the target PCIE device to power up, the target PCIE device being configured to perform a retraining operation after power-up. With this method, degradation errors can be eliminated while the utilization of the server's system functions is improved. When a degradation error occurs in a PCIE device, the target PCIE device can be accurately identified and individually powered down, powered up, and retrained, so that it is repaired automatically without user intervention.

Description

Degradation error processing method, degradation error processing device, degradation error processing server, degradation error processing equipment and storage medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a degradation error processing method, apparatus, server, device, and storage medium.
Background
A PCIE (Peripheral Component Interconnect Express) device is a communication device that uses a high-speed serial computer expansion bus standard, and can be used to expand the input/output interface resources of the components of the CPU (Central Processing Unit) of a server.
In general, during operation of the server, certain conditions may cause a degradation error (Degradation Error) to occur in a PCIE device. For example, inaccurate SI (Signal Integrity) parameter settings may cause a degradation error in a PCIE device. After a degradation error occurs in a PCIE device, the server may be restarted so that all PCIE devices are retrained, thereby eliminating the degradation error.
However, eliminating degradation errors by restarting the server makes the system functions of the server unavailable during the restart.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, apparatus, server, device, and storage medium that can improve the system function utilization of a server while eliminating degradation errors.
In a first aspect, an embodiment of the present disclosure provides a degradation error processing method, including:
receiving a degradation error notification from a platform controller hub, and determining, according to the degradation error notification, a target PCIE device in which a degradation error has occurred;
controlling the target PCIE device to power down;
controlling the target PCIE device to power up, wherein the target PCIE device is configured to perform a retraining operation after power-up.
In some embodiments, receiving a degradation error notification from a platform controller hub includes:
receiving a first-class level signal that is transmitted by the platform controller hub through a first-class general-purpose input/output interface and indicates a degradation error;
receiving a second-class level signal that is transmitted by the platform controller hub through a second-class general-purpose input/output interface and represents the identification of the target PCIE device;
determining the first-class level signal and the second-class level signal to be the degradation error notification.
In some embodiments, determining, according to the degradation error notification, the target PCIE device in which the degradation error has occurred comprises:
determining, according to the second-class level signal, a number serving as the identification of the target PCIE device.
In some embodiments, the second-class general-purpose input/output interface comprises a plurality of general-purpose input/output interfaces. Determining, according to the second-class level signal, the number serving as the identification of the target PCIE device comprises:
determining one number value according to the level signal of each general-purpose input/output interface;
determining the number from the plurality of number values.
In some embodiments, determining, according to the second-class level signal, the number serving as the identification of the target PCIE device comprises:
determining level change state information of the second-class level signal during the maintenance period of the first-class level signal;
determining the number from the level change state information.
In some embodiments, the degradation error processing method further comprises: sending a hot-removal request to a central processing unit; and receiving a power-down signal. The central processing unit is configured to send a hot-removal instruction to the target PCIE device according to the hot-removal request and to feed back the power-down signal. Accordingly, controlling the target PCIE device to power down includes:
controlling the target PCIE device to power down according to the power-down signal.
In some embodiments, the degradation error processing method further comprises: sending a hot-plug request to a central processing unit; and receiving a power-up signal. The central processing unit is configured to send a hot-plug instruction to the target PCIE device according to the hot-plug request and to feed back the power-up signal. Accordingly, controlling the target PCIE device to power up includes:
controlling the target PCIE device to power up according to the power-up signal.
In some embodiments, controlling the target PCIE device to power down includes: controlling the power controller corresponding to the target PCIE device to power off.
In some embodiments, controlling the target PCIE device to power up includes: controlling the power controller corresponding to the target PCIE device to power on.
In some embodiments, the degradation error processing method further comprises: starting a timer for recording the power-down time of the target PCIE device. Accordingly, sending the hot-plug request to the central processing unit includes: sending the hot-plug request to the central processing unit when the duration timed by the timer reaches a preset duration.
In a second aspect, embodiments of the present disclosure provide a degradation error processing apparatus, the apparatus comprising:
a degradation error notification receiving module, configured to receive a degradation error notification from a platform controller hub and to determine, according to the degradation error notification, a target PCIE device in which a degradation error has occurred;
a power-down control module, configured to control the target PCIE device to power down;
a power-up control module, configured to control the target PCIE device to power up, wherein the target PCIE device is configured to perform a retraining operation after power-up.
In a third aspect, embodiments of the present disclosure provide a server including a processor and a platform controller hub. The processor is configured to: receive a degradation error notification from the platform controller hub, and determine, according to the degradation error notification, a target PCIE device in which a degradation error has occurred; control the target PCIE device to power down; and control the target PCIE device to power up, the target PCIE device being configured to perform a retraining operation after power-up. The platform controller hub is configured to generate the degradation error notification according to the detected degradation error and the identified number of the PCIE device in which the degradation error occurred.
In some embodiments, the server further comprises a central processor.
The central processing unit is configured to send a hot-removal instruction to the target PCIE device according to a hot-removal request from the processor and to feed back a power-down signal to the processor; the processor is further configured to receive the power-down signal and to control the target PCIE device to power down according to the power-down signal; and/or,
the central processing unit is configured to send a hot-plug instruction to the target PCIE device according to a hot-plug request from the processor and to feed back a power-up signal to the processor; the processor is further configured to receive the power-up signal and to control the target PCIE device to power up according to the power-up signal.
In a fourth aspect, embodiments of the present disclosure provide a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the degradation error handling method in any of the embodiments of the present disclosure in the first aspect when the computer program is executed by the processor.
In a fifth aspect, embodiments of the present disclosure provide a computer readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the degradation error handling method in any embodiment of the first aspect of the present disclosure.
According to the degradation error processing method, apparatus, server, device, and storage medium, when a degradation error occurs in a PCIE device, the processor can determine the target PCIE device from among multiple PCIE devices according to the received degradation error notification, which includes the identification information of the target PCIE device in which the degradation error occurred, and can then control the target PCIE device to power down and power up, so that the target PCIE device is retrained after power-up to eliminate the degradation error. The other PCIE devices remain powered and continue to operate normally, so degradation errors are eliminated while the utilization of the server's system functions is improved. In addition, when a degradation error occurs in a PCIE device, the target PCIE device can be accurately identified and individually powered down, powered up, and retrained, so that it is repaired automatically without user intervention.
Drawings
FIG. 1 is an application environment diagram of a degradation error processing method in some embodiments;
FIG. 2 is a flow diagram of a degradation error processing method in some embodiments;
FIG. 3 is a flow diagram of the steps of receiving a degradation error notification in some embodiments;
FIG. 4 is a flow diagram of the steps of determining the target PCIE device based on the degradation error notification in some embodiments;
FIG. 5 is a flow diagram of the steps of determining a number in some embodiments;
FIG. 6 is a flow diagram of further steps of determining a number in some embodiments;
FIG. 7 is a flow diagram of the steps of controlling power-down of a target PCIE device in some embodiments;
FIG. 8 is a flow diagram of the steps of controlling power-up of a target PCIE device in some embodiments;
FIG. 9 is an application environment diagram of a further degradation error processing method in some embodiments;
FIG. 10 is a flow diagram of the steps of starting a timer in some embodiments;
FIG. 11 is a block diagram of a degradation error processing apparatus in some embodiments;
FIG. 12 is an internal block diagram of a server in some embodiments;
FIG. 13 is an internal block diagram of a computer device in some embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present disclosure.
The degradation error processing method provided by the embodiments of the present disclosure can be applied to the application environment shown in FIG. 1, in which the PCH 101 (Platform Controller Hub) is communicatively connected to the processor 102 and may send a degradation error notification to the processor 102. The processor 102 is connected to the PCIE device 103 and may control power-up and power-down of the PCIE device 103.
The processor 102 may be implemented in at least one hardware form of a complex programmable logic device (CPLD), a programmable logic array (PLA), a field-programmable gate array (FPGA), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a general-purpose processor, or another programmable logic device.
PCIE device 103 is a communication device that uses the high-speed serial computer expansion bus standard, and can be used to expand the input/output interface resources of the components of the CPU of the server. PCIE device 103 may include one or more PCIE devices. In some cases, PCIE device 103 may be a motherboard that includes PCIE slots.
In a first aspect, an embodiment of the present disclosure provides a degradation error processing method, described here as applied to the processor 102 in FIG. 1. In some embodiments, with reference to FIG. 1 and FIG. 2, the degradation error processing method includes steps S201, S202, and S203, which may be executed by the processor 102. Each step is described below.
Step S201: receiving a degradation error notification from the platform controller hub, and determining, according to the degradation error notification, a target PCIE device in which a degradation error has occurred.
When the PCH 101 detects that a degradation error has occurred in the PCIE device 103, it may identify the number of the PCIE device 103 in which the degradation error occurred, generate a degradation error notification, and send it to the processor 102. In some alternative embodiments, the PCH 101 may detect the degradation error by having BIOS (Basic Input Output System) code compare the current bandwidth and rate with the rated bandwidth and rate and determine from the comparison result whether a degradation error has occurred. In some alternatives, the PCH 101 may also extract information on the PCIE device 103 in which a degradation error occurred through a hardware error checking tool (e.g., MCE log). Of course, those skilled in the art may also configure the PCH 101 to detect degradation errors and send degradation error notifications in other ways, depending on specific needs.
The degradation error may be a Link Width Error or a Link Speed Error. A Link Width Error means that the actual bandwidth of the PCIE device 103 is less than its rated bandwidth; a Link Speed Error means that the actual rate of the PCIE device 103 is less than its rated rate. The bandwidth of the PCIE device 103 refers to the number of Lanes (a unidirectional, single-signal physical transmission channel) used for data transmission, and the bandwidth types may include x1, x2, x4, x8, and x16. The rate of the PCIE device 103 refers to the rate of each Lane, and may include Gen 1 (2.5 GT/s), Gen 2 (5 GT/s), Gen 3 (8 GT/s), Gen 4 (16 GT/s), and Gen 5 (32 GT/s). In some cases, Gen 1 through Gen 5 are the different selectable operating modes of a PCIE slot of the PCIE device 103.
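The comparison described above can be illustrated with a minimal Python sketch. It is not the BIOS code referred to in the text; the names `LinkState`, `GEN_RATE`, and `classify_degradation` are hypothetical and are introduced only to show how a Link Width Error and a Link Speed Error might be told apart once the rated and actual link parameters are known.

```python
from dataclasses import dataclass

# Per-Lane rate for Gen 1..Gen 5, using the values listed in the text above.
GEN_RATE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}


@dataclass(frozen=True)
class LinkState:
    """Hypothetical container for a link's width and generation."""
    width: int  # number of Lanes: 1, 2, 4, 8 or 16
    gen: int    # PCIE generation, 1..5


def classify_degradation(rated: LinkState, actual: LinkState) -> list:
    """Return the degradation errors implied by comparing actual to rated."""
    errors = []
    if actual.width < rated.width:
        errors.append("Link Width Error")
    if GEN_RATE[actual.gen] < GEN_RATE[rated.gen]:
        errors.append("Link Speed Error")
    return errors
```

For instance, a slot rated x16 Gen 4 that trains at x8 Gen 4 would be reported with a Link Width Error only.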
In some alternative embodiments, the degradation error notification may include information indicating that a degradation error has occurred and the identification information of the target PCIE device in which the degradation error occurred.
In general, the PCIE device 103 may include multiple PCIE devices. The target PCIE device is the PCIE device, among the multiple PCIE devices, in which the degradation error has occurred. Each PCIE device has unique identification information, and the target PCIE device can be located through its identification information.
In some alternative embodiments, the PCH 101 and the processor 102 may communicate the degradation error notification through GPIOs (General Purpose Input Output, general-purpose input/output interfaces), in which case the degradation error may be represented by the level signals of the GPIOs.
Step S202: controlling the target PCIE device to power down.
To power down the target PCIE device, the processor 102 may use a power controller to open the power circuit of the target PCIE device, or may use any other means capable of disconnecting the target PCIE device from its power supply.
Step S203: controlling the target PCIE device to power up, wherein the target PCIE device is configured to perform a retraining operation after power-up.
To power up the target PCIE device, the processor 102 may use a power controller to close the power circuit of the target PCIE device, or may use any other means capable of connecting the target PCIE device to its power supply.
After the target PCIE device is powered up, it performs the retraining operation. Typically, a degradation error in a PCIE device can be recovered by retraining; recovery may include restoring the normal bandwidth and restoring the normal rate.
By executing steps S201, S202, and S203, when a degradation error occurs in the PCIE device 103, the processor 102 can determine the target PCIE device from among the multiple PCIE devices according to the received degradation error notification, which includes the identification information of the target PCIE device in which the degradation error occurred, and can then control the target PCIE device to power down and power up, so that the target PCIE device retrains after power-up to eliminate the degradation error. The other PCIE devices remain powered and continue to operate normally, so the degradation error is eliminated while the utilization of the server's system functions is improved.
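The flow of steps S201 to S203 can be sketched as follows. This is an illustrative model only: `PowerController`, `handle_degradation`, and the dictionary-shaped notification are assumptions made for the example, and a real implementation would drive hardware power rails rather than update a Python dictionary.

```python
class PowerController:
    """Hypothetical per-device power switch that records what it was asked to do."""

    def __init__(self, device_ids):
        self.powered = {device_id: True for device_id in device_ids}
        self.log = []

    def power_off(self, device_id):
        self.powered[device_id] = False
        self.log.append(("off", device_id))

    def power_on(self, device_id):
        self.powered[device_id] = True
        self.log.append(("on", device_id))


def handle_degradation(notification, power):
    """Locate the target device (S201), power it down (S202), power it up (S203)."""
    target = notification["device_id"]       # S201: identification from the notification
    if target not in power.powered:
        raise KeyError(f"unknown PCIE device {target}")
    power.power_off(target)                  # S202: only the target loses power
    power.power_on(target)                   # S203: the device retrains after power-up
    return target
```

Only the target device appears in the controller's log; the other devices are never touched, which is the property the paragraph above relies on.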
In some embodiments, the degradation error notification is represented by the level signals of GPIOs. As shown in FIG. 3, step S201 may further include the following steps:
Step S301: receiving a first-class level signal that is transmitted by the platform controller hub through a first-class general-purpose input/output interface and indicates a degradation error;
Step S302: receiving a second-class level signal that is transmitted by the platform controller hub through a second-class general-purpose input/output interface and represents the identification of the target PCIE device;
Step S303: determining the first-class level signal and the second-class level signal to be the degradation error notification.
The GPIOs between the PCH 101 and the processor 102 may include first-class GPIOs and second-class GPIOs, where the level signal of a first-class GPIO is a first-class level signal and the level signal of a second-class GPIO is a second-class level signal.
In some alternative embodiments, the PCH 101 indicates through the first-class level signal that a degradation error has occurred in the PCIE device 103, and indicates through the second-class level signal the identification of the target PCIE device in which the degradation error occurred.
In some alternative embodiments, the first-class GPIOs may include one or more GPIOs, and the first-class level signal used to indicate that a degradation error has occurred may be a high-level signal, a low-level signal, or a particular pattern of level changes of the first-class GPIO within a specific period.
In some embodiments, the second-class GPIOs may include one or more GPIOs. As shown in FIG. 4, the step in S201 of determining the target PCIE device according to the degradation error notification may further include step S401: determining, according to the second-class level signal, a number serving as the identification of the target PCIE device.
The processor 102 may parse the second-class level signal in the received degradation error notification to obtain a number, and use the number as the PCIE device identification to locate the target PCIE device.
In some embodiments, the second class of GPIOs may include a plurality of GPIOs, as shown in fig. 5, and step S401 may include:
Step S501: determining one number value according to the level signal of each general-purpose input/output interface;
Step S502: determining the number from the plurality of number values.
The level signal of each GPIO in the second-class GPIOs may correspond to one number value, which may be the digit 0 or 1, and the number values are combined to obtain the number.
In some alternatives, FIG. 9 illustrates an application environment for the steps shown in FIG. 5. As shown in FIG. 9, the PCIE device 103 may include a first PCIE device 931, a second PCIE device 932, and a third PCIE device 933. The first-class GPIOs include a first GPIO 911, and the second-class GPIOs may include a second GPIO 912, a third GPIO 913, a fourth GPIO 914, and a fifth GPIO 915, so that a four-digit number can be obtained from the high and low level signals of the four second-class GPIOs. For example, when a degradation error occurs in the first PCIE device 931, the level signals sent by the PCH 101 through the second-class GPIOs are, in order, a low-level signal, a low-level signal, a low-level signal, and a high-level signal, and the processor 102 parses them to obtain the number 0001; when a degradation error occurs in the second PCIE device 932, the level signals sent by the PCH 101 through the second-class GPIOs are, in order, a low-level signal, a low-level signal, a high-level signal, and a low-level signal, and the processor 102 parses them to obtain the number 0010. The correspondence between high and low levels and the number values 1 and 0, the number of GPIOs in the second-class GPIOs, and the correspondence between numbers and PCIE devices 103 can be set by those skilled in the art according to actual needs.
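Steps S501 and S502 amount to reading one bit per second-class GPIO and combining the bits into a number. The following Python sketch assumes that a high level maps to 1, a low level maps to 0, and the first GPIO listed carries the most significant bit; the text leaves all three choices to the implementer. The function name `decode_parallel` and the lookup table are hypothetical, and the table contains only the two mappings given in the example above.

```python
def decode_parallel(levels):
    """Combine one level per GPIO (True = high) into a number, most significant bit first."""
    number = 0
    for level in levels:
        # S501: one number value (0 or 1) per GPIO; S502: shift it into the number.
        number = (number << 1) | (1 if level else 0)
    return number


# Mappings taken from the example in the text; other entries would be set as needed.
DEVICE_BY_NUMBER = {
    0b0001: "first PCIE device 931",
    0b0010: "second PCIE device 932",
}
```

With four second-class GPIOs this scheme can distinguish up to sixteen numbers, which is why the text notes that the GPIO count can be chosen according to how many PCIE devices must be identified.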
In some embodiments, as shown in fig. 6, step S401 may further include:
Step S601: determining level change state information of the second-class level signal during the maintenance period of the first-class level signal;
Step S602: determining the number from the level change state information.
The level change state information of the second-class level signal characterizes how the second-class level signal changes between high and low levels during the maintenance period of the first-class level signal.
To ensure that the number obtained by parsing the second-class level signal and the degradation error information obtained by parsing the first-class level signal refer to the degradation error of the same PCIE device, the period over which the level change state information is acquired coincides with the maintenance period of the first-class level signal.
In this case, the number of second-class GPIOs may be fewer than 4, for example 1 or 2. Of course, the number of second-class GPIOs may also be greater than 4. In any case, by executing steps S601 and S602, a small number of GPIOs combined with different level change states can correspond to the identifications of a large number of PCIE devices.
Taking the case where the second-class GPIOs include a single GPIO as an example, the level signal of that GPIO can change in many ways within a preset duration. For example, with the preset duration set to 4 unit time periods, the level signals appearing within the preset duration may be, in order, a high-level signal followed by a low-level signal, and this sequence represents one state of level change. As another example, the level signals appearing within the preset duration may be, in order, a low-level signal followed by a high-level signal, and this sequence represents another state of level change. Of course, other combinations of high and low signals can also appear in sequence over the four unit time periods, and each such combination represents a different state of level change.
Each state of level change corresponds to one number, and the processor 102 may obtain the number of the target PCIE device in which the degradation error occurred by parsing the level change state information.
The preset duration and the number of unit time periods can be set according to actual needs. Where the second-class GPIOs include a plurality of GPIOs, a person skilled in the art can make corresponding settings by following the single-GPIO example above, so that the processor 102 can determine the number according to the level change state information of the plurality of GPIOs.
By executing steps S601 and S602, the number can be determined from the level change state information of one GPIO or a small number of GPIOs, which reduces the number of GPIOs used for transmitting the degradation error notification. At the same time, the number of level change states can be set as needed, so that the level signals of a small number of GPIOs can represent more PCIE device identifications, improving GPIO utilization in the server.
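A single second-class GPIO can carry the same number serially, one bit per unit time period, while the first-class level signal is held. The sketch below is one possible reading of steps S601 and S602, under several assumptions not fixed by the text: the first-class signal is active high, the levels have already been sampled once per unit time period, and a high level in a period counts as 1 with the earliest period most significant. The name `decode_serial` is hypothetical.

```python
def decode_serial(samples, periods=4):
    """Decode a number from one GPIO sampled once per unit time period.

    samples: sequence of (error_asserted, id_level) pairs, where error_asserted
    is the sampled first-class level and id_level the sampled second-class level.
    """
    # S601: keep only the samples taken while the first-class signal is held.
    window = [id_level for error_asserted, id_level in samples if error_asserted]
    if len(window) != periods:
        raise ValueError(f"expected {periods} samples in the window, got {len(window)}")
    # S602: map the observed level sequence to a number, earliest sample first.
    number = 0
    for level in window:
        number = (number << 1) | (1 if level else 0)
    return number
```

Restricting the window to the maintenance period of the first-class signal is what ties the decoded number to the same degradation error, as the text requires.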
In some embodiments, as shown in fig. 7, the degraded error handling method further comprises the following steps that the processor 102 may perform:
step S701: sending a hot pull request to a central processing unit;
step S702: receiving a power-down signal.
Accordingly, step S202 includes step S703: controlling the target peripheral component interconnect express device to power down according to the power-down signal.
The central processing unit is configured to send a hot-pull (hot-unplug) instruction to the target peripheral component interconnect express device according to the hot-pull request, and to feed back a power-down signal. Specifically, the central processing unit sends a hot-pull instruction to the target PCIE device according to the received hot-pull request sent by the processor 102 through the management bus, so that the target PCIE device switches to the hot-pulled state according to the hot-pull instruction. In some optional embodiments, the central processing unit may, according to the received hot-pull request, release the resources of the target PCIE device to be pulled out, so that the target PCIE device is in the hot-pulled state.
After sending the hot-pull instruction to the target PCIE device, the central processing unit feeds back a power-down signal to the processor 102, where the power-down signal indicates that the target PCIE device is already in the hot-pulled state. The processor 102 then controls the target PCIE device to power down according to the received power-down signal.
By executing steps S701 to S703, the processor 102 powers down the target PCIE device only after receiving the power-down signal fed back by the central processing unit, which avoids the central processing unit reporting errors caused by a surprise (forced) removal.
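The ordering constraint of steps S701 to S703 can be sketched as follows. The classes `FakeCpu` and `FakePowerRail` and the function `power_down_after_handshake` are hypothetical stand-ins, not interfaces from the disclosure; they only model the rule that power is cut after the power-down signal arrives, and never before.

```python
class FakeCpu:
    """Stand-in for the central processing unit (hypothetical interface)."""

    def __init__(self):
        self.log = []

    def handle_hot_pull_request(self, device_id: int) -> bool:
        # The CPU instructs the device to enter the hot-pulled state and
        # then feeds back the power-down signal (modelled as True).
        self.log.append(("hot_pull_instruction", device_id))
        return True


class FakePowerRail:
    """Stand-in for the power supply of one PCIE device."""

    def __init__(self):
        self.enabled = True


def power_down_after_handshake(cpu, rail, device_id: int) -> bool:
    """Request the hot-pull, wait for the signal, then cut power."""
    power_down_signal = cpu.handle_hot_pull_request(device_id)  # S701, S702
    if not power_down_signal:
        return False  # no acknowledgement: leave the device powered
    rail.enabled = False  # S703
    return True


cpu, rail = FakeCpu(), FakePowerRail()
print(power_down_after_handshake(cpu, rail, device_id=931), rail.enabled)
```

Keeping the power cut behind the acknowledgement is what separates this flow from a forced removal: the central processing unit has already released the device before the supply disappears.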
In some embodiments, as shown in fig. 8, the degradation error handling method further includes:
step S801: sending a hot plug request to a central processing unit;
step S802: receiving a power-on signal.
Accordingly, step S203 includes step S803: controlling the target peripheral component interconnect express device to power up according to the power-on signal.
The central processing unit is used for sending a hot plug instruction to the target peripheral component quick interconnection device according to the hot plug request and feeding back a power-on signal.
The central processing unit sends a hot plug instruction to the target PCIE device according to the received hot plug request sent by the processor 102, so that the target PCIE device switches to the hot-plugged state according to the hot plug instruction. In some optional embodiments, the central processing unit may allocate resources for the target PCIE device according to the received hot plug request, so that the target PCIE device is in the hot-plugged state.
After sending the hot plug instruction to the target PCIE device, the central processing unit feeds back a power-on signal to the processor 102, where the power-on signal indicates that the target PCIE device is already in the hot-plugged state.
The processor 102 controls the power-up of the target PCIE device according to the received power-up signal.
By executing steps S801 to S803, the processor 102 powers on the target PCIE device only after receiving the power-on signal fed back by the central processing unit, which avoids the central processing unit reporting errors caused by a surprise (forced) insertion.
In some embodiments, the central processor in step S701 and step S801 may be the CPU 901 in fig. 9.
In some embodiments, the step of controlling the target peripheral component interconnect express device to power down in step S703 includes: controlling the power supply controller corresponding to the target peripheral component interconnect express device to power off.
Where the PCIE device 103 includes multiple PCIE devices, a corresponding power controller may be configured for each PCIE device. As shown in fig. 1 and fig. 9, the PCIE device 103 may include a first PCIE device 931, a second PCIE device 932, and a third PCIE device 933, where the first PCIE device 931 is provided with a corresponding first power controller 951, the second PCIE device 932 is provided with a corresponding second power controller 952, and the third PCIE device 933 is provided with a corresponding third power controller 953.
The target PCIE device may be powered down by controlling the power supply controller corresponding to the target PCIE device to power off. In some alternative embodiments, the processor 102 switches the corresponding power controller on and off by driving its enable signal to the high level or the low level; when the enable signal is at the low level, the power controller cuts off its power output, and the corresponding target PCIE device is powered down.
The power supply controller can be a power supply chip or other devices capable of realizing power supply on-off control.
In some embodiments, the step of controlling the target peripheral component interconnect express device to power up in step S803 includes: controlling the power supply controller corresponding to the target peripheral component interconnect express device to power on.
In the case where each PCIE device of the multiple PCIE devices corresponds to a respective power controller, the processor 102 may control the power controllers corresponding to the target PCIE devices to power on the target PCIE devices. In some alternative embodiments, the processor 102 controls the on/off of the corresponding power supply controller by controlling the high level and the low level of the enable signal, and when the enable signal is at the high level, the power supply controller turns on the power supply output, and the corresponding target PCIE device is powered on.
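The per-device enable-signal scheme can be pictured with a small model. The `PowerController` class, the `controllers` table, and `set_device_power` are illustrative names only; the keys reuse the reference numerals of the three PCIE devices, and the mapping of a high enable level to "output on" follows the optional embodiment described above.

```python
from dataclasses import dataclass


@dataclass
class PowerController:
    """Models a power chip whose output follows its enable input."""

    enable: int = 1  # 1 = high level (output on), 0 = low level (output off)

    @property
    def output_on(self) -> bool:
        return self.enable == 1


# One controller per PCIE device, keyed by the device reference numerals.
controllers = {
    931: PowerController(),
    932: PowerController(),
    933: PowerController(),
}


def set_device_power(device: int, on: bool) -> None:
    """Drive the enable signal of the controller paired with `device`."""
    controllers[device].enable = 1 if on else 0


set_device_power(932, on=False)  # only the second PCIE device loses power
print({dev: ctrl.output_on for dev, ctrl in controllers.items()})
```

Because each device has its own controller, cycling power on the degraded device leaves the other PCIE devices untouched.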
In some embodiments, as shown in fig. 10, the degradation error processing method further includes step S1001: a timer is started for recording the power-down time of the target peripheral component interconnect express device.
Accordingly, step S801 may include step S1002: sending a hot plug request to the central processing unit when the timing duration of the timer reaches the preset duration.
The timer is used for starting to count after the processor 102 controls the target PCIE device to power down, and the preset duration may be 10 ms, 50 ms, 100 ms, or other durations, which may be set by those skilled in the art according to actual needs. When the time duration of the timer reaches the aforementioned preset time duration, the processor 102 may issue a hot plug request to the central processor.
Before sending the hot plug request to the central processing unit, the processor 102 confirms that the power-down duration of the target PCIE device has reached the preset duration, which ensures that the target PCIE device has fully powered down. This avoids a restart failure of the target PCIE device that could otherwise occur if, because of instruction transmission delay, the hot plug request were sent and the power-up operation performed while the target PCIE device was still powered.
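A minimal sketch of this timer guard is given below, assuming a software timer driven by a monotonic clock. `PowerDownTimer` and `maybe_request_hot_plug` are hypothetical names, and the injectable `clock` parameter exists only so that the behaviour can be checked without real delays; in a CPLD the same guard would more likely be a hardware counter.

```python
import time


class PowerDownTimer:
    """Records how long the target PCIE device has been powered down."""

    def __init__(self, preset_s: float, clock=time.monotonic):
        self._preset_s = preset_s
        self._clock = clock
        self._started_at = None

    def start(self) -> None:
        # Step S1001: begin timing once the device has been powered down.
        self._started_at = self._clock()

    def expired(self) -> bool:
        if self._started_at is None:
            return False  # timer never started, so never expired
        return self._clock() - self._started_at >= self._preset_s


def maybe_request_hot_plug(timer: PowerDownTimer, send_request) -> bool:
    """Step S1002: send the request only after the preset duration."""
    if not timer.expired():
        return False
    send_request()
    return True
```

A caller would start the timer right after cutting power and poll `maybe_request_hot_plug` until it returns `True`, for example with a preset of 0.05 s to match the 50 ms example value.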
It should be understood that, although the steps in the flowcharts of figs. 2-8 and 10 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps illustrated in figs. 2-8 and 10, and the steps involved in other embodiments, are not strictly limited in their order of execution and may be performed in other orders. Moreover, at least some of the steps of the foregoing embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In a second aspect, the disclosed embodiments provide a degraded error handling apparatus that may be applied to the processor 102 in fig. 1. As shown in fig. 11, the degradation error processing apparatus 1100 includes:
a degradation error notification receiving module 1101, configured to receive a degradation error notification from the platform control hub, and determine, according to the degradation error notification, a target peripheral component interconnect express device in which a degradation error occurs;
A power-down control module 1102, configured to control the target peripheral component rapid interconnection device to power down;
the power-on control module 1103 is configured to control the target peripheral component rapid interconnection device to power on, where the target peripheral component rapid interconnection device is configured to perform a retraining operation after power-on.
In some embodiments, the degradation error notification reception module 1101 may include:
a first signal receiving sub-module (not shown) for receiving a first type level signal representing degradation errors transmitted by the platform control hub through a first type general purpose input/output interface;
a second signal receiving sub-module (not shown) for receiving a second type level signal representing the target peripheral component interconnect express device identifier transmitted by the platform control hub through a second type general purpose input/output interface;
a signal determination sub-module (not shown) for determining the first type of level signal and the second type of level signal as degraded error notifications.
In some embodiments, the degradation error notification reception module 1101 may further include:
a digital number determining sub-module (not shown) for determining a digital number as a target peripheral component interconnect express device identifier according to the second class level signal.
In some embodiments, the second type general purpose input/output interface comprises a plurality of general purpose input/output interfaces, and the digital number determining sub-module may comprise:
a number value determining unit, configured to determine a number value according to the level signal of each general purpose input/output interface;
a target number determining unit, configured to determine the digital number according to the plurality of number values.
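One way to picture the two units is the sketch below, in which each GPIO contributes a binary-weighted number value and the digital number is the sum of those values. The weighting (GPIO i contributes level × 2^i) and the function names are assumptions for illustration; the disclosure only requires that a number value be derived per GPIO and that the digital number be derived from the set of values.

```python
from typing import List, Sequence


def number_values(levels: Sequence[int]) -> List[int]:
    """Number value determining unit: one value per GPIO (assumed weights)."""
    return [level << index for index, level in enumerate(levels)]


def digital_number(levels: Sequence[int]) -> int:
    """Target number determining unit: combine the per-GPIO values."""
    return sum(number_values(levels))


# Three second type GPIOs at high, low, high.
print(number_values([1, 0, 1]), digital_number([1, 0, 1]))
```

With this weighting, n parallel GPIOs can identify up to 2^n PCIE devices in a single sample.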
In some embodiments, the digital number determining sub-module may include:
a state information determining unit, configured to determine level change state information of the second type level signal during the period in which the first type level signal is maintained;
a number determining unit, configured to determine the digital number according to the level change state information.
In some embodiments, the degradation error processing apparatus 1100 may further include:
a hot-pull request module (not shown) for issuing a hot-pull request to the central processing unit;
a power-down signal receiving module (not shown) for receiving the power-down signal.
Correspondingly, the central processing unit is used for sending a hot-pull instruction to the target peripheral component quick interconnection device according to the hot-pull request and feeding back a power-down signal. The power down control module 1102 may include: a power-down signal response sub-module (not shown) for controlling the target peripheral component quick interconnect device to power down based on the power-down signal.
In some embodiments, the degradation error processing apparatus 1100 may further include:
a hot plug request module (not shown) for issuing a hot plug request to the central processor;
a power-on signal receiving module (not shown) for receiving a power-on signal.
Accordingly, the central processing unit is used for sending a hot plug instruction to the target peripheral component quick interconnection device according to the hot plug request and feeding back a power-on signal. The power-on control module 1103 may include: a power-up signal response sub-module (not shown) for controlling the target peripheral component quick interconnect device to power up according to the power-up signal.
In some embodiments, the power down control module 1102 may include: a power-off control sub-module (not shown), configured to control the power supply controller corresponding to the target peripheral component interconnect express device to power off.
In some embodiments, the power-on control module 1103 may include: a power-on control sub-module (not shown), configured to control the power supply controller corresponding to the target peripheral component interconnect express device to power on.
In some embodiments, the degradation error processing apparatus 1100 may further include: a timer starting module (not shown) for starting a timer for recording a power-down time period of the target peripheral component interconnect express device. Accordingly, the hot plug request module may include: a timing execution sub-module, configured to send the hot plug request to the central processing unit when the timing duration of the timer reaches the preset duration.
For specific limitations on the degradation error processing apparatus 1100, reference may be made to the limitations on the degradation error processing method above; that is, the degradation error processing apparatus 1100 may also be used to perform further steps of the degradation error processing method in any embodiment of the present disclosure, which are not repeated here. Each module of the degradation error processing apparatus 1100 may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In a third aspect, embodiments of the present disclosure provide a server. As shown in fig. 12, in some embodiments, a server 1200 may include a processor 1201 and a platform control hub 1202. The processor 1201 may be configured to perform the steps of: receiving a degradation error notification from the platform control hub 1202 and determining, according to the degradation error notification, a target peripheral component interconnect express device in which the degradation error occurred; controlling the target peripheral component interconnect express device to power down; and controlling the target peripheral component interconnect express device to power up, where the target peripheral component interconnect express device is configured to perform a retraining operation after power-up.
The platform control hub 1202 may be configured to perform the step of: generating the degradation error notification according to the detected degradation error and the identified number of the peripheral component interconnect express device in which the degradation error occurred.
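On the platform control hub side, generating the notification can be sketched as driving one first type GPIO as an error flag and a group of second type GPIOs with the device number. The `DegradationNotification` tuple, the `build_notification` function, the active-high error flag, and the LSB-first bit layout are all assumptions for illustration and are not specified by the disclosure.

```python
from typing import List, NamedTuple


class DegradationNotification(NamedTuple):
    error_level: int      # first type GPIO: 1 signals a degradation error
    id_levels: List[int]  # second type GPIOs: device number, LSB first


def build_notification(device_number: int, id_gpio_count: int) -> DegradationNotification:
    """Encode the number of the degraded PCIE device onto GPIO levels."""
    if not 0 <= device_number < (1 << id_gpio_count):
        raise ValueError("device number does not fit in the available GPIOs")
    id_levels = [(device_number >> bit) & 1 for bit in range(id_gpio_count)]
    return DegradationNotification(error_level=1, id_levels=id_levels)


print(build_notification(device_number=5, id_gpio_count=3))
```

The range check makes the capacity limit explicit: with three identifier GPIOs under this layout, device numbers 0 to 7 can be reported.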
In some embodiments, as shown in fig. 12, the server 1200 may also include a central processor 1203.
The central processor 1203 may be configured to send a hot-pull instruction to the target peripheral component interconnect express device in response to a hot-pull request from the processor 1201 and to feed back a power-down signal to the processor 1201. Accordingly, the processor 1201 of the server 1200 may be configured to receive the power-down signal and control the target peripheral component interconnect express device to power down according to the power-down signal.
The central processor 1203 may be configured to send a hot plug instruction to the target peripheral component interconnect express device in response to a hot plug request from the processor 1201 and to feed back a power-on signal to the processor 1201. Accordingly, the processor 1201 of the server 1200 may be configured to receive the power-on signal and control the target peripheral component interconnect express device to power up according to the power-on signal.
In some embodiments, the processor 1201 may be a CPLD.
In other embodiments, the processor 1201, the platform control hub 1202, and the central processor 1203 may also perform, alone or in combination, further steps in the description of the degraded error handling method provided by the embodiments of the present disclosure in the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer device, which may be a server, and may employ the internal structure diagram shown in fig. 13. The computer device shown in fig. 13 includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device may be used to communicate with external devices via a network connection. The computer program, when executed by a processor, implements the degraded error handling method in any of the embodiments herein. In some specific embodiments, the processor in fig. 13 may be a CPLD.
It will be appreciated by those skilled in the art that the structure shown in fig. 13 is merely a block diagram of a portion of the structure associated with an embodiment of the present disclosure and is not limiting of the computer device to which an embodiment of the present disclosure is applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In some embodiments, a computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of:
receiving a degradation error notification from the platform control hub, and determining a target peripheral component quick interconnection device with degradation error according to the degradation error notification;
controlling the target peripheral component quick interconnection equipment to be powered down;
the target peripheral component interconnect express device is controlled to power up and is configured to perform a retraining operation after power up.
In other embodiments, the processor, when executing the computer program, may also implement other steps of the degraded error handling method in any of the embodiments herein.
In a fifth aspect, embodiments of the present disclosure provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
receiving a degradation error notification from the platform control hub, and determining a target peripheral component quick interconnection device with degradation error according to the degradation error notification;
controlling the target peripheral component quick interconnection equipment to be powered down;
The target peripheral component interconnect express device is controlled to power up and is configured to perform a retraining operation after power up.
In other embodiments, the computer program may also implement other steps of the degraded error handling method in any of the embodiments herein when executed by a processor.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in embodiments provided by the present disclosure may include non-volatile and/or volatile memory. The non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples represent only some embodiments of the present disclosure. Although they are described in specific detail, they are not to be construed as limiting the scope of the present disclosure. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the disclosure, and these fall within the scope of the disclosure. Accordingly, the scope of the present disclosure should be determined from the following claims.

Claims (13)

1. A degradation error handling method, the method comprising:
receiving a degradation error notification from a platform control hub, and determining a target peripheral component quick interconnection device with a degradation error according to the degradation error notification;
controlling the target peripheral component quick interconnection device to be powered down;
and controlling the target peripheral component quick interconnection device to be powered on, wherein the target peripheral component quick interconnection device is used for executing retraining operation after power-on.
2. The method of claim 1, wherein receiving a degradation error notification from a platform control hub comprises:
receiving a first type level signal which is transmitted by the platform control hub through a first type general input/output interface and indicates degradation errors;
receiving a second class level signal which is transmitted by the platform control hub through a second class general input/output interface and represents the identification of the target peripheral component quick interconnection device;
determining the first type of level signal and the second type of level signal as the degradation error notification.
3. The method of claim 2, wherein the determining, based on the degradation error notification, a target peripheral component fast interconnect device in which a degradation error occurred comprises:
and determining a digital number serving as the target peripheral component quick interconnection equipment identifier according to the second class level signal.
4. A method according to claim 3, wherein said second type of universal input output interface comprises a plurality of universal input output interfaces, said determining a digital number as said target peripheral component interconnect express device identification based on said second type of level signal comprising:
determining a number value according to the level signal of each general input/output interface;
determining the digital number according to the plurality of number values.
5. The method according to claim 1, wherein the method further comprises:
sending a hot-pull request to a central processing unit, wherein the central processing unit is used for sending a hot-pull instruction to the target peripheral component quick interconnection equipment according to the hot-pull request and feeding back a power-down signal;
receiving the power-down signal;
the controlling the target peripheral component interconnect express device to power down includes:
and controlling the target peripheral component quick interconnection equipment to be powered down according to the power-down signal.
6. The method according to claim 1, wherein the method further comprises:
a hot plug request is sent to a central processing unit, and the central processing unit is used for sending a hot plug instruction to the target peripheral component quick interconnection equipment according to the hot plug request and feeding back a power-on signal;
receiving the power-on signal;
the controlling the target peripheral component interconnect express device to power up includes:
and controlling the target peripheral component quick interconnection device to be powered on according to the power-on signal.
7. The method of claim 1, wherein:
the controlling the target peripheral component interconnect express device to power down includes: controlling the power supply controller corresponding to the target peripheral component quick interconnection device to be powered off; and/or
the controlling the target peripheral component interconnect express device to power up includes: controlling the power supply controller corresponding to the target peripheral component quick interconnection device to be powered on.
8. The method of claim 6, wherein the method further comprises: a timer for recording the power-down time of the target peripheral component interconnect express device is started,
the issuing of the hot plug request to the central processing unit includes: and when the timing duration of the timer reaches the preset duration, sending a hot plug request to the central processing unit.
9. A degradation error handling apparatus, the apparatus comprising:
a degradation error notification receiving module, configured to receive a degradation error notification from a platform control hub and to determine, according to the degradation error notification, a target peripheral component quick interconnection device in which a degradation error occurred;
the power-down control module is used for controlling the target peripheral component quick interconnection equipment to be powered down;
And the power-on control module is used for controlling the target peripheral component quick interconnection device to be powered on, and the target peripheral component quick interconnection device is used for executing retraining operation after being powered on.
10. A server, the server comprising:
a processor for: receiving a degradation error notification from a platform control hub, and determining a target peripheral component quick interconnection device with a degradation error according to the degradation error notification; controlling the target peripheral component quick interconnection device to be powered down; controlling the target peripheral component rapid interconnection device to be powered on, wherein the target peripheral component rapid interconnection device is used for executing retraining operation after the target peripheral component rapid interconnection device is powered on;
and the platform control hub is used for generating the degradation error notification according to the detected degradation error and the identified number of the peripheral component quick interconnection device in which the degradation error occurred.
11. The server of claim 10, wherein the server further comprises a central processor;
the central processing unit is used for sending a hot pull-out instruction to the target peripheral component quick interconnection device according to a hot pull-out request from the processor and feeding back a power-down signal to the processor; the processor is further used for receiving the power-down signal and controlling the target peripheral component quick interconnection device to be powered down according to the power-down signal; and/or
The central processing unit is used for sending a hot plug instruction to the target peripheral component quick interconnection device according to a hot plug request from the processor and feeding back a power-on signal to the processor; the processor is also used for receiving the power-on signal and controlling the target peripheral component quick interconnection device to be powered on according to the power-on signal.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when the computer program is executed by the processor.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311074465.7A CN117149484A (en) 2023-08-24 2023-08-24 Degradation error processing method, degradation error processing device, degradation error processing server, degradation error processing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311074465.7A CN117149484A (en) 2023-08-24 2023-08-24 Degradation error processing method, degradation error processing device, degradation error processing server, degradation error processing equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117149484A true CN117149484A (en) 2023-12-01

Family

ID=88911217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311074465.7A Pending CN117149484A (en) 2023-08-24 2023-08-24 Degradation error processing method, degradation error processing device, degradation error processing server, degradation error processing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117149484A (en)

Similar Documents

Publication Publication Date Title
CN108228374B (en) Equipment fault processing method, device and system
US9015458B2 (en) Computer system and method for updating basic input/output system by switching between local mode and bypass mode through baseboard management controller
CN111324494B (en) Processor control method, device and storage medium
WO2021098485A1 (en) Method and system for power-on and power-off control of pcie device
EP3407187B1 (en) Optical line terminal, and method for upgrading master device and slave device
EP3198361B1 (en) Hardware controlled power domains with automatic power on request
CN108334372B (en) Firmware upgrading processing method, device and system
US8954619B1 (en) Memory module communication control
US11175715B2 (en) Method of supplying electric power to a computer system
US8555118B2 (en) System and method for processing network data of a server
CN106610712A (en) Substrate management controller reset system and method
US9377966B2 (en) Method and apparatus for efficiently processing storage commands
US9772795B2 (en) Processing apparatus to recognize peripheral component interconnect express devices during bootup
CN117149484A (en) Degradation error processing method, degradation error processing device, degradation error processing server, degradation error processing equipment and storage medium
CN117289963A (en) Method and equipment for online updating target area of server platform service firmware
EP2750030A1 (en) Method, apparatus and processor for reading BIOS
JP6583942B1 (en) BMC, determination method and BMC firmware
US20130061030A1 (en) System capable of booting through a universal serial bus device and method thereof
JP2009187474A (en) Semiconductor device, portable electronic equipment, self-diagnosis method, and self-diagnosis program
CN108287670B (en) Method for protecting data during system shutdown and BMC
CN111381535A (en) Information processing apparatus, control method for information processing apparatus, and storage medium
CN113448905B (en) Equipment hot adding method, system, equipment and medium
CN114613418B (en) System and method for NVMe-MI function test of solid state disk
CN111414272B (en) Electronic device and reset method thereof
CN103870253A (en) Chip application circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination