WO2021212943A1 - Maintenance method, device, equipment, and medium for a server power supply - Google Patents

Maintenance method, device, equipment, and medium for a server power supply

Info

Publication number
WO2021212943A1
Authority
WO
WIPO (PCT)
Prior art keywords
power supply
psu
maintenance
server
bmc
Prior art date
Application number
PCT/CN2021/073602
Other languages
English (en)
French (fr)
Inventor
滕学军
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2021212943A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates

Definitions

  • The present invention relates to the technical field of servers, and in particular to a maintenance method, device, equipment, and medium for a server power supply.
  • PSU: Power Supply Unit
  • The purpose of the present invention is to provide a maintenance method, device, equipment, and medium for a server power supply that improve the maintenance efficiency of the server power supply and reduce the maintenance cost for operation and maintenance personnel.
  • The specific solution is as follows:
  • A maintenance method for a server power supply, applied to a controller, includes:
  • extracting the fault information of the PSU failure from the operating information and sending the fault information to the BMC, so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
  • Preferably, the controller is specifically a single-chip microcomputer or a CPLD.
  • Preferably, the PSU is specifically a PSU with a dual-core structure.
  • Preferably, the method further includes:
  • connecting the PSU and the BMC in advance by means of a relay.
  • Preferably, the method further includes:
  • when a fault is detected in the first I2C port in the PSU, restarting the first I2C port.
  • Preferably, the method further includes:
  • when a fault is detected in the second I2C port in the BMC and/or an error occurs in the PEC transmission between the BMC and the PSU, restarting the second I2C port.
  • Preferably, after the step of upgrading the target firmware, the method further includes determining whether the target firmware has been upgraded successfully and, if not, performing the step of upgrading the target firmware again.
  • Correspondingly, the present invention also discloses a maintenance device for a server power supply, applied to a controller, including:
  • a fault detection module, configured to detect the operating information of the PSU in the target server in real time and determine from the operating information whether the PSU has failed;
  • a fault determination module, configured to lock the PSU and restart the PSU when the determination result of the fault detection module is yes;
  • a restart judgment module, configured to judge whether the PSU can be restarted successfully;
  • a PSU unlocking module, configured to unlock the PSU when the judgment result of the restart judgment module is yes;
  • a firmware upgrade module, configured to extract the fault information of the PSU failure from the operating information and send the fault information to the BMC, so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
  • Correspondingly, the present invention also discloses maintenance equipment for a server power supply, including:
  • a memory, configured to store a computer program;
  • a processor, configured to implement the steps of the server power supply maintenance method disclosed above when executing the computer program.
  • Correspondingly, the present invention also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the server power supply maintenance method disclosed above are implemented.
  • In the present invention, the controller first detects the operating information of the PSU in the target server in real time and determines from it whether the PSU has failed. If the PSU has failed, the PSU is locked to prevent it from outputting wrong power supply information to the target server. The PSU is then restarted, and if it restarts successfully it is unlocked, which avoids the situation in which a glitch signal encountered by the PSU protection circuit during operation mistakenly locks the PSU up.
  • At the same time, the controller extracts from the operating information the fault information recorded when the PSU failed and sends it to the BMC, so that the BMC can determine from the fault information which target firmware in the PSU has failed. The BMC then upgrades the target firmware, which achieves the purpose of repairing it.
  • Because this maintenance method spares operation and maintenance personnel the cumbersome process of going on site to repair the PSU, it both improves the efficiency of server power supply maintenance and reduces the maintenance cost for operation and maintenance personnel.
  • The maintenance device, equipment, and medium for a server power supply provided by the present invention have the same beneficial effects.
  • FIG. 1 is a flowchart of a maintenance method for a server power supply provided by an embodiment of the present invention;
  • FIG. 2 is a topology diagram of the BMC-ME-PSU communication link in a server system in the prior art;
  • FIG. 3 is a schematic diagram of a connection between a BMC and a PSU provided by an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of a BMC and a PSU connected through a Buffer provided by an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of a maintenance device for a server power supply provided by an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of maintenance equipment for a server power supply provided by an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a computer-readable storage medium for a server power supply provided by an embodiment of the present invention.
  • Referring to FIG. 1, FIG. 1 is a flowchart of a maintenance method for a server power supply provided by an embodiment of the present invention.
  • The maintenance method for the server power supply includes:
  • Step S11: Detect the operating information of the PSU (Power Supply Unit) in the target server in real time, and determine from the operating information whether the PSU has failed; if so, perform step S12;
  • Step S12: Lock the PSU and restart the PSU;
  • Step S13: Determine whether the PSU can be restarted successfully; if so, perform step S14;
  • Step S14: Unlock the PSU;
  • Step S15: Extract the fault information of the PSU failure from the operating information and send it to the BMC (Baseboard Management Controller), so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
  • BMC: Baseboard Management Controller
  • This embodiment provides a maintenance method for a server power supply.
  • The maintenance method both improves the maintenance efficiency of the server power supply and reduces the maintenance cost for operation and maintenance personnel.
  • The method is described with the controller as the executing entity.
  • Specifically, the controller first detects the operating information of the PSU in the target server in real time and determines from it whether the PSU has failed. If the operating information shows that the PSU has failed, a fault lock signal can be sent to the PSU to lock the faulty PSU, which prevents the PSU from outputting wrong power supply information to the target server.
  • After the PSU is locked, the PSU can be restarted to check that it was not locked by mistake because of a glitch signal in the target server. If the PSU restarts successfully, it is unlocked, which avoids the situation in which a glitch signal in the target server mistakenly locks the PSU up.
  • After the PSU is locked, the controller can also extract the fault information of the PSU failure from the operating information of the PSU and send it to the BMC.
  • Once the BMC has obtained the fault information, it can use it to determine the target firmware of the PSU that has failed.
  • Upgrading the failed target firmware then eliminates the fault in the PSU, which achieves the purpose of repairing the target server.
  • It should be noted that, in the prior art, the PSU in the target server is usually upgraded offline: the PSU is taken out of the target server, and PSUs are upgraded one by one using a fixture board, a computer, a programmer, a USB cable, a USB adapter, and a PMBus cable.
  • In the method provided by this embodiment, the failed target firmware in the PSU is upgraded online. This not only ensures that the PSU is upgraded effectively, but also prevents upgrade failures, system crashes, and similar problems caused by abnormal conditions during the upgrade such as interruption, signal interference, bit errors, or sudden power loss.
  • When upgrading the target firmware, the BMC first confirms the current firmware version and the upgrade version of the PSU, and then sends the upgrade command and the firmware upgrade program together to the PSU.
  • The firmware upgrade program records its own size, version information, and verification information, as well as the model information of the target server and the PSU.
  • After the PSU to be upgraded receives the upgrade command, it first confirms whether the PSU type and PSU model that the firmware upgrade program targets match its own attributes. If they match, the PSU can respond to this upgrade. The BMC then obtains the version information of the firmware upgrade program and determines whether it is the same as the firmware version of the PSU itself. If the versions are the same, the PSU does not need to be upgraded; if they differ, the PSU needs to be upgraded.
  • Because the controller can automatically repair the PSU failure in the target server, the entire maintenance process requires no participation by operation and maintenance personnel, which both improves the maintenance efficiency of the target server power supply and reduces the maintenance cost.
  • In this embodiment, the controller first detects the operating information of the PSU in the target server in real time and determines from it whether the PSU has failed. If the PSU has failed, the PSU is locked to prevent it from outputting wrong power supply information to the target server. The PSU is then restarted, and if it restarts successfully it is unlocked, which avoids the situation in which a glitch signal encountered by the PSU protection circuit during operation mistakenly locks the PSU up.
  • At the same time, the controller extracts from the operating information the fault information recorded when the PSU failed and sends it to the BMC, so that the BMC can determine from the fault information which target firmware in the PSU has failed. The BMC then upgrades the target firmware, which achieves the purpose of repairing it.
  • Because this maintenance method spares operation and maintenance personnel the cumbersome process of going on site to repair the PSU, it both improves the efficiency of server power supply maintenance and reduces the maintenance cost for operation and maintenance personnel.
  • As a preferred implementation, the controller is specifically a single-chip microcomputer or a CPLD.
  • In this embodiment the controller can be a single-chip microcomputer. A single-chip microcomputer is small and highly integrated, consumes little power, and is easy to expand, so this choice both reduces the space the controller occupies and improves its peripheral expansion capability.
  • Alternatively, the controller can be a CPLD (Complex Programmable Logic Device). A CPLD is a high-density, low-power programmable logic device with relatively fast logic computation, so this choice further speeds up the controller's repair of PSU faults in the target server.
  • CPLD: Complex Programmable Logic Device
  • As a preferred implementation, the PSU is specifically a PSU with a dual-core structure.
  • With a dual-core PSU, upgrading and repairing the target firmware can proceed at the same time: when one chip in the PSU has a problem, the other chip can continue to upgrade and repair the failed target firmware in the PSU.
  • In other words, a dual-core PSU does not suffer a system crash because of an illegal interruption. If a firmware application crashes in the hardware system, the PSU still starts normally and uses the other core to upgrade the target firmware, which further guarantees the safety and reliability of the PSU in use.
  • Based on the above embodiment, this embodiment further explains and optimizes the technical solution.
  • As a preferred implementation, the above maintenance method for the server power supply further includes connecting the PSU and the BMC in advance by means of a relay.
  • Referring to FIG. 2, FIG. 2 is a topology diagram of the BMC-ME-PSU communication link in a server system in the prior art.
  • In this BMC-ME-PSU communication link, the BMC has to go through an ME (Management Engine) to obtain the power consumption data of the PSU.
  • ME: Management Engine
  • To solve this problem, in this embodiment a relay is used to connect the BMC and the PSU in advance.
  • Referring to FIG. 3, FIG. 3 is a schematic diagram of a connection between a BMC and a PSU provided by an embodiment of the present invention. When the BMC and the PSU are connected through the relay, the BMC can communicate directly with the PSU in any state, which avoids interference from other stray signals and/or intermediate hops in the information transfer between the BMC and the PSU.
  • When the ME suspends the I2C bus between the BMC and the PSU, the BMC also generates a "false alarm" because it cannot detect the power consumption information of the PSU.
  • Once the BMC and the PSU are connected directly through the relay, if communication between them is interrupted, the BMC keeps initiating the communication handshake with the PSU and resumes data transmission between the BMC 14 and the PSU 11 only after the PSU has responded, which further reduces the probability of a "false alarm" from the BMC.
  • As a preferred implementation, the relay is specifically a Buffer.
  • In this embodiment the relay between the BMC and the PSU is a Buffer.
  • Referring to FIG. 4, FIG. 4 is a schematic diagram of a BMC and a PSU connected through a Buffer provided by an embodiment of the present invention.
  • A Buffer is the most widely used buffering device in practice and is easy to program and reliable in operation, so using a Buffer as the relay further improves the overall ease of use of the server power supply system provided by this application.
  • In practical applications, the number of Buffers can be adjusted to suit actual conditions, which is not described in detail here.
  • As a preferred implementation, the above maintenance method for the server power supply further includes:
  • when a fault is detected in the first I2C port in the PSU, restarting the first I2C port.
  • To prevent a failure of the first I2C port in the PSU from disturbing the normal operation of the server power supply system, the first I2C port in the PSU is restarted when a fault is detected in it. During this process the BMC keeps the PMBus I2C bus between the BMC and the PSU unchanged. This avoids the problem of an I2C fault in the PSU not being recovered in time during communication between the BMC and the PSU and affecting the normal operation of the server power supply system.
  • As a preferred implementation, the above maintenance method for the server power supply further includes:
  • when a fault is detected in the second I2C port in the BMC and/or an error occurs in the PEC transmission between the BMC and the PSU, restarting the second I2C port.
  • When the second I2C port in the BMC fails and/or an error occurs in the PEC transmission between the BMC and the PSU, the I2C bus between the BMC and the PSU enters a deadlock state and affects the normal operation of the server power supply system. Therefore, in this embodiment, when either condition is detected, the second Reset port in the BMC is restarted to restore normal communication between the BMC and the PSU.
  • Because a failed communication link is recovered automatically without manual intervention, and the design of the server power supply is optimized, the operation and maintenance cost of the server is greatly reduced without affecting the performance of the customer's business applications.
  • Based on the above embodiment, this embodiment further explains and optimizes the technical solution.
  • In practice, the target firmware may fail to upgrade or may be upgraded to the wrong version. Therefore, if the target firmware is judged not to have been upgraded successfully, it can be upgraded again, which further increases the probability of a successful upgrade.
  • After the target firmware is upgraded, the PSU activates and runs the new firmware, returns to the BMC an upgrade success identifier indicating that the firmware upgrade succeeded, and at the same time reports the upgraded PSU firmware version information.
  • After receiving the upgrade success identifier, the BMC can judge from the reported firmware version whether this upgrade succeeded. If the BMC determines that the upgrade failed or that the upgraded version is wrong, it can upgrade the target firmware again.
  • Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a maintenance device for a server power supply provided by an embodiment of the present invention.
  • The maintenance device includes:
  • a fault detection module 21, configured to detect the operating information of the PSU in the target server in real time and determine from the operating information whether the PSU has failed;
  • a fault determination module 22, configured to lock the PSU and restart the PSU when the determination result of the fault detection module is yes;
  • a restart judgment module 23, configured to judge whether the PSU can be restarted successfully;
  • a PSU unlocking module 24, configured to unlock the PSU when the judgment result of the restart judgment module is yes;
  • a firmware upgrade module 25, configured to extract the fault information of the PSU failure from the operating information and send the fault information to the BMC, so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
  • Preferably, the device further includes:
  • a signal connection module, configured to connect the PSU and the BMC in advance by means of a relay.
  • Preferably, the device further includes:
  • a first restart module, configured to restart the first I2C port when a fault is detected in the first I2C port in the PSU.
  • Preferably, the device further includes:
  • a second restart module, configured to restart the second I2C port when a fault is detected in the second I2C port in the BMC and/or an error occurs in the PEC transmission between the BMC and the PSU.
  • Preferably, the device further includes:
  • an upgrade judgment module, configured to judge whether the target firmware has been upgraded successfully after the process of upgrading the target firmware;
  • a firmware re-upgrade module, configured to perform the step of upgrading the target firmware again when the judgment result of the upgrade judgment module is no.
  • The maintenance device for a server power supply provided by the embodiment of the present invention has the beneficial effects of the maintenance method for a server power supply disclosed above.
  • Referring to FIG. 6, FIG. 6 is a schematic structural diagram of maintenance equipment for a server power supply provided by an embodiment of the present invention.
  • The maintenance equipment includes:
  • a memory 31, configured to store a computer program;
  • a processor 32, configured to implement the steps of the maintenance method for a server power supply disclosed above when executing the computer program.
  • The maintenance equipment for a server power supply provided by the embodiment of the present invention has the beneficial effects of the maintenance method for a server power supply disclosed above.
  • Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a computer-readable storage medium for a server power supply provided by an embodiment of the present invention.
  • The computer-readable storage medium 601 stores a computer program 610; when the computer program 610 is executed by a processor, it implements the steps of the maintenance method for a server power supply disclosed above.
  • The computer-readable storage medium provided by the embodiment of the present invention has the beneficial effects of the maintenance method for a server power supply disclosed above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Power Sources (AREA)

Abstract

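The present application discloses a maintenance method, device, equipment, and medium for a server power supply. The method includes: detecting the operating information of a PSU in a target server in real time, and determining from the operating information whether the PSU has failed; if so, locking the PSU and restarting the PSU; determining whether the PSU can be restarted successfully; if so, unlocking the PSU; extracting the fault information of the PSU failure from the operating information and sending the fault information to a BMC, so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware. Because this maintenance method spares operation and maintenance personnel the cumbersome process of going on site to repair the PSU, it both improves the efficiency of server power supply maintenance and reduces the maintenance cost for operation and maintenance personnel.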

Description

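Maintenance method, device, equipment, and medium for a server power supply
This application claims priority to Chinese patent application No. 202010331104.6, entitled "Maintenance method, device, equipment, and medium for a server power supply", filed with the China National Intellectual Property Administration on April 23, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of servers, and in particular to a maintenance method, device, equipment, and medium for a server power supply.
Background
A PSU (Power Supply Unit) supplies power stably and reliably, so PSUs are generally used to power servers. However, if the PSU in a server fails, not only can the server no longer run normally, but the data being processed on the server is also severely affected, which can cause the user great economic loss.
In the prior art, whenever the PSU in a server fails, operation and maintenance personnel must go on site to repair it. This makes server power supply maintenance inefficient and greatly increases the maintenance cost. At present there is no effective solution to this technical problem.
It can be seen that how to improve the maintenance efficiency of a server power supply while also reducing the maintenance cost for operation and maintenance personnel is a technical problem that those skilled in the art urgently need to solve.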
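Summary of the Invention
In view of this, the purpose of the present invention is to provide a maintenance method, device, equipment, and medium for a server power supply that improve the maintenance efficiency of the server power supply while also reducing the maintenance cost for operation and maintenance personnel. The specific solution is as follows:
A maintenance method for a server power supply, applied to a controller, includes:
detecting the operating information of the PSU in the target server in real time, and determining from the operating information whether the PSU has failed;
if so, locking the PSU and restarting the PSU;
determining whether the PSU can be restarted successfully;
if so, unlocking the PSU;
extracting the fault information of the PSU failure from the operating information and sending the fault information to the BMC, so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
Preferably, the controller is specifically a single-chip microcomputer or a CPLD.
Preferably, the PSU is specifically a PSU with a dual-core structure.
Preferably, the method further includes:
connecting the PSU and the BMC in advance by means of a relay.
Preferably, the method further includes:
when a fault is detected in the first I2C port in the PSU, restarting the first I2C port.
Preferably, the method further includes:
when a fault is detected in the second I2C port in the BMC and/or an error occurs in the PEC transmission between the BMC and the PSU, restarting the second I2C port.
Preferably, after the step of upgrading the target firmware, the method further includes:
determining whether the target firmware has been upgraded successfully;
if not, performing the step of upgrading the target firmware again.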
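Correspondingly, the present invention also discloses a maintenance device for a server power supply, applied to a controller, including:
a fault detection module, configured to detect the operating information of the PSU in the target server in real time and determine from the operating information whether the PSU has failed;
a fault determination module, configured to lock the PSU and restart the PSU when the determination result of the fault detection module is yes;
a restart judgment module, configured to judge whether the PSU can be restarted successfully;
a PSU unlocking module, configured to unlock the PSU when the judgment result of the restart judgment module is yes;
a firmware upgrade module, configured to extract the fault information of the PSU failure from the operating information and send the fault information to the BMC, so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
Correspondingly, the present invention also discloses maintenance equipment for a server power supply, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the maintenance method for a server power supply disclosed above when executing the computer program.
Correspondingly, the present invention also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the maintenance method for a server power supply disclosed above are implemented.
In the present invention, the controller first detects the operating information of the PSU in the target server in real time and determines from it whether the PSU has failed. If the PSU has failed, the PSU is locked to prevent it from outputting wrong power supply information to the target server. The PSU is then restarted, and if it restarts successfully it is unlocked, which avoids the situation in which a glitch signal encountered by the PSU protection circuit during operation mistakenly locks the PSU up. At the same time, the controller extracts from the operating information the fault information recorded when the PSU failed and sends it to the BMC, so that the BMC can determine from the fault information which target firmware in the PSU has failed. The BMC then upgrades the target firmware, which achieves the purpose of repairing it. Because this maintenance method spares operation and maintenance personnel the cumbersome process of going on site to repair the PSU, it both improves the efficiency of server power supply maintenance and reduces the maintenance cost for operation and maintenance personnel. Correspondingly, the maintenance device, equipment, and medium for a server power supply provided by the present invention have the same beneficial effects.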
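Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a maintenance method for a server power supply provided by an embodiment of the present invention;
FIG. 2 is a topology diagram of the BMC-ME-PSU communication link in a server system in the prior art;
FIG. 3 is a schematic diagram of a connection between a BMC and a PSU provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a BMC and a PSU connected through a Buffer provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a maintenance device for a server power supply provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of maintenance equipment for a server power supply provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer-readable storage medium for a server power supply provided by an embodiment of the present invention.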
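Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a maintenance method for a server power supply provided by an embodiment of the present invention. The maintenance method includes:
Step S11: Detect the operating information of the PSU (Power Supply Unit) in the target server in real time, and determine from the operating information whether the PSU has failed; if so, perform step S12;
Step S12: Lock the PSU and restart the PSU;
Step S13: Determine whether the PSU can be restarted successfully; if so, perform step S14;
Step S14: Unlock the PSU;
Step S15: Extract the fault information of the PSU failure from the operating information and send it to the BMC (Baseboard Management Controller), so that the BMC uses the fault information to determine the target firmware of the PSU that has failed and upgrades the target firmware.
This embodiment provides a maintenance method for a server power supply that both improves the maintenance efficiency of the server power supply and reduces the maintenance cost for operation and maintenance personnel. The method is described with the controller as the executing entity.
Specifically, in this embodiment the controller first detects the operating information of the PSU in the target server in real time and determines from it whether the PSU has failed. If the operating information shows that the PSU has failed, a fault lock signal can be sent to the PSU to lock the faulty PSU, which prevents the PSU from outputting wrong power supply information to the target server.
After the PSU is locked, the PSU can be restarted to check that it was not locked by mistake because of a glitch signal in the target server, and the outcome of the restart is used to rule that case out. That is, if the PSU restarts successfully, it is unlocked, which avoids the situation in which a glitch signal in the target server mistakenly locks the PSU up.
After the PSU is locked, the controller can also extract the fault information of the PSU failure from the operating information of the PSU and send it to the BMC. Once the BMC has obtained the fault information, it can use it to determine the target firmware of the PSU that has failed. Upgrading the failed target firmware then eliminates the fault in the PSU, which achieves the purpose of repairing the target server.
In practice, whether the PSU can be restarted successfully can also be judged against a preset restart time. If the PSU restarts within the preset time, the restart is judged successful; if it does not, the restart is judged unsuccessful and the PSU cannot be unlocked.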
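Steps S11 to S15 and the preset-time restart check can be sketched as a small controller routine. This is a minimal illustration rather than the patented implementation: `PsuModel`, `maintain_psu`, the 5-second default timeout, and the simulated `restart_seconds` field are all hypothetical names and values introduced here, and the sketch assumes that the fault information is forwarded to the BMC whether or not the restart succeeds.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PsuModel:
    """Hypothetical stand-in for a PSU; the fields are illustrative only."""
    faulty: bool = False
    locked: bool = False
    restart_seconds: float = 1.0          # simulated time the restart takes
    fault_records: List[str] = field(default_factory=list)


def maintain_psu(psu: PsuModel,
                 send_to_bmc: Callable[[List[str]], None],
                 restart_timeout_s: float = 5.0) -> str:
    """Run one pass of the S11-S15 flow and return the resulting state."""
    # S11: decide from the operating information whether the PSU has failed.
    if not psu.faulty:
        return "healthy"

    # S12: lock the PSU so it cannot report wrong power data, then restart it.
    psu.locked = True

    # S13: the restart counts as successful only if it finishes in the preset time.
    restarted = psu.restart_seconds <= restart_timeout_s

    # S14: unlock only after a successful restart (the lock was likely a glitch).
    if restarted:
        psu.locked = False

    # S15: forward the fault information so the BMC can pick the firmware to upgrade.
    # Assumption: this happens regardless of the restart outcome.
    send_to_bmc(list(psu.fault_records))
    return "unlocked" if restarted else "locked"
```

A caller would pass a function that delivers the records to the BMC; in a test, `sent.append` on a plain list is enough to observe what was forwarded.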
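It should be noted that, in the prior art, the PSU in the target server is usually upgraded offline: the PSU is taken out of the target server, and PSUs are upgraded one by one using a fixture board, a computer, a programmer, a USB cable, a USB adapter, and a PMBus cable. In the method provided by this embodiment, the failed target firmware in the PSU is upgraded online. This not only ensures that the PSU is upgraded effectively, but also prevents upgrade failures, system crashes, and similar problems caused by abnormal conditions during the upgrade such as interruption, signal interference, bit errors, or sudden power loss.
When upgrading the target firmware, the BMC first confirms the current firmware version and the upgrade version of the PSU, and then sends the upgrade command and the firmware upgrade program together to the PSU. The firmware upgrade program records its own size, version information, and verification information, as well as the model information of the target server and the PSU. After the PSU to be upgraded receives the upgrade command, it first confirms whether the PSU type and PSU model that the firmware upgrade program targets match its own attributes. If they match, the PSU can respond to this upgrade. The BMC then obtains the version information of the firmware upgrade program and determines whether it is the same as the firmware version of the PSU itself. If the versions are the same, the PSU does not need to be upgraded; if they differ, the PSU needs to be upgraded.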
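The compatibility and version checks described above amount to a three-way decision. The following is a minimal sketch under stated assumptions: the record layouts `FirmwareImage` and `PsuIdentity`, their field names, and the return values `"reject"`, `"skip"`, and `"upgrade"` are hypothetical, since the source does not specify any data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmwareImage:
    """Metadata the upgrade program is said to carry (field names are assumed)."""
    psu_type: str
    psu_model: str
    version: str
    size: int
    checksum: int


@dataclass(frozen=True)
class PsuIdentity:
    """What the PSU knows about itself (field names are assumed)."""
    psu_type: str
    psu_model: str
    firmware_version: str


def upgrade_decision(image: FirmwareImage, psu: PsuIdentity) -> str:
    # The PSU first checks that the image targets its own type and model.
    if (image.psu_type, image.psu_model) != (psu.psu_type, psu.psu_model):
        return "reject"
    # Identical versions mean there is nothing to upgrade.
    if image.version == psu.firmware_version:
        return "skip"
    # Any differing version triggers an upgrade, as the text describes.
    return "upgrade"
```

Note that the text compares versions only for equality, so the sketch does the same and does not try to order versions.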
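In the maintenance method provided by this embodiment, the controller automatically repairs the PSU failure in the target server, and the entire maintenance process requires no participation by operation and maintenance personnel. This both improves the maintenance efficiency of the target server power supply and reduces the maintenance cost.
In this embodiment, the controller first detects the operating information of the PSU in the target server in real time and determines from it whether the PSU has failed. If the PSU has failed, the PSU is locked to prevent it from outputting wrong power supply information to the target server. The PSU is then restarted, and if it restarts successfully it is unlocked, which avoids the situation in which a glitch signal encountered by the PSU protection circuit during operation mistakenly locks the PSU up. At the same time, the controller extracts from the operating information the fault information recorded when the PSU failed and sends it to the BMC, so that the BMC can determine from the fault information which target firmware in the PSU has failed. The BMC then upgrades the target firmware, which achieves the purpose of repairing it. Because this maintenance method spares operation and maintenance personnel the cumbersome process of going on site to repair the PSU, it both improves the efficiency of server power supply maintenance and reduces the maintenance cost for operation and maintenance personnel.
Based on the above embodiment, this embodiment further explains and optimizes the technical solution. As a preferred implementation, the controller is specifically a single-chip microcomputer or a CPLD.
Specifically, in this embodiment the controller can be a single-chip microcomputer. A single-chip microcomputer is small and highly integrated, consumes little power, and is easy to expand, so this choice both reduces the space the controller occupies and improves its peripheral expansion capability.
Alternatively, the controller can be a CPLD (Complex Programmable Logic Device). A CPLD is a high-density, low-power programmable logic device with relatively fast logic computation, so this choice further speeds up the controller's repair of PSU faults in the target server.
Based on the above embodiment, this embodiment further explains and optimizes the technical solution. As a preferred implementation, the PSU is specifically a PSU with a dual-core structure.
In this embodiment the PSU has a dual-core structure, so upgrading and repairing the target firmware can proceed at the same time: when one chip in the PSU has a problem, the other chip can continue to upgrade and repair the failed target firmware in the PSU. In other words, a dual-core PSU does not suffer a system crash because of an illegal interruption. If a firmware application crashes in the hardware system, the PSU still starts normally and uses the other core to upgrade the target firmware, which further guarantees the safety and reliability of the PSU in use.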
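Based on the above embodiment, this embodiment further explains and optimizes the technical solution. As a preferred implementation, the above maintenance method for the server power supply further includes:
connecting the PSU and the BMC in advance by means of a relay.
Referring to FIG. 2, FIG. 2 is a topology diagram of the BMC-ME-PSU communication link in a server system in the prior art. In this link the BMC has to go through an ME (Management Engine) to obtain the power consumption data of the PSU. If the ME fails in any way during this process, the BMC and the PSU cannot communicate normally, and the BMC generates a "false alarm".
To solve this problem, in this embodiment a relay is used to connect the BMC and the PSU in advance. Referring to FIG. 3, FIG. 3 is a schematic diagram of a connection between a BMC and a PSU provided by an embodiment of the present invention. When the BMC and the PSU are connected through the relay, the BMC can communicate directly with the PSU in any state, which avoids interference from other stray signals and/or intermediate hops in the information transfer between the BMC and the PSU.
In addition, the complex electromagnetic environment of a data center equipment room inevitably affects its communication links, so signal interference occurs there with a small probability and degrades communication quality. Because the ME is developed by Intel, it is particularly strict about the quality of the signals it receives, and no fault-tolerance mechanism has been developed for the ME's own communication link. As a result, when the communication link between the BMC and the PSU is disturbed, the I2C bus between them enters a deadlock state: the ME suspends the I2C bus between the BMC and the PSU and stops transmitting data signals. In that case the BMC also generates a "false alarm" because it cannot detect the power consumption information of the PSU. Once the BMC and the PSU are connected directly through the relay, if communication between them is interrupted, the BMC keeps initiating the communication handshake with the PSU and resumes data transmission between the BMC 14 and the PSU 11 only after the PSU has responded, which further reduces the probability of a "false alarm" from the BMC.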
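The handshake-until-acknowledged behavior can be illustrated with a short retry loop. This is only a sketch: `wait_for_psu_ack` and its `probe` callback are hypothetical names, and the upper bound on attempts is an assumption added so the loop terminates, whereas the source only says that the BMC keeps initiating the handshake.

```python
from typing import Callable, Optional


def wait_for_psu_ack(probe: Callable[[], bool],
                     max_attempts: int = 10) -> Optional[int]:
    """Repeat the handshake until the PSU answers.

    Returns the 1-based attempt on which the PSU acknowledged, or None if it
    never did within max_attempts. Data transfer should resume only after a
    non-None result.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():          # one handshake attempt; True means the PSU replied
            return attempt
    return None
```

Raising an alarm only when the function returns `None`, rather than on the first missed reply, is one way to obtain the lower false-alarm rate that the paragraph describes.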
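As a preferred implementation, the relay is specifically a Buffer.
Specifically, in this embodiment the relay between the BMC and the PSU is a Buffer. Referring to FIG. 4, FIG. 4 is a schematic diagram of a BMC and a PSU connected through a Buffer provided by an embodiment of the present invention. A Buffer is the most widely used buffering device in practice and is easy to program and reliable in operation, so using a Buffer as the relay further improves the overall ease of use of the server power supply system provided by this application. In practical applications, the number of Buffers can also be adjusted to suit actual conditions, which is not described in detail here.
Based on the above embodiment, this embodiment further explains and optimizes the technical solution. As a preferred implementation, the above maintenance method for the server power supply further includes:
when a fault is detected in the first I2C port in the PSU, restarting the first I2C port.
Signal interference occurs in a data center equipment room with a small probability and degrades the quality of its communication links. When the communication link between the BMC and the PSU is disturbed, the I2C bus between them enters a deadlock state: the ME suspends the I2C bus between the BMC and the PSU and stops transmitting data signals, and the communication link between the BMC and the PSU then cannot recover.
Therefore, in this embodiment, to prevent a failure of the first I2C port in the PSU from disturbing the normal operation of the server power supply system, the first I2C port in the PSU is restarted when a fault is detected in it. During this process the BMC keeps the PMBus I2C bus between the BMC and the PSU unchanged. This avoids the problem of an I2C fault in the PSU not being recovered in time during communication between the BMC and the PSU and affecting the normal operation of the server power supply system.
Based on the above embodiment, this embodiment further explains and optimizes the technical solution. As a preferred implementation, the above maintenance method for the server power supply further includes:
when a fault is detected in the second I2C port in the BMC and/or an error occurs in the PEC transmission between the BMC and the PSU, restarting the second I2C port.
When the second I2C port in the BMC fails and/or an error occurs in the PEC (Parity Check) transmission between the BMC and the PSU, the I2C bus between the BMC and the PSU enters a deadlock state and affects the normal operation of the server power supply system. Therefore, in this embodiment, when either condition is detected, the second Reset port in the BMC is restarted to restore normal communication between the BMC and the PSU.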
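Detecting a PEC error and resetting the port in response can be sketched as follows. The sketch assumes that PEC here is the SMBus/PMBus Packet Error Checking byte, a CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07) and initial value 0, even though the text expands the acronym as a parity check. `read_with_recovery`, `read_frame`, and `reset_port` are hypothetical names, and the retry bound is an assumption.

```python
from typing import Callable, Optional, Tuple


def smbus_pec(data: bytes) -> int:
    """CRC-8 (polynomial 0x07, initial value 0) over all bytes covered by the PEC."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift left; on overflow of the top bit, fold in the polynomial.
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def read_with_recovery(read_frame: Callable[[], Tuple[bytes, int]],
                       reset_port: Callable[[], None],
                       max_resets: int = 3) -> Optional[bytes]:
    """Read a frame; on a PEC mismatch reset the I2C port and try again."""
    for _ in range(max_resets + 1):
        covered_bytes, received_pec = read_frame()
        if smbus_pec(covered_bytes) == received_pec:
            return covered_bytes
        reset_port()         # recover the link before the next attempt
    return None
```

In a real SMBus transaction the checked bytes include the address and command bytes as well as the data, so `covered_bytes` stands for that whole sequence.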
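This arrangement prevents and resolves the problem of a failure of the second I2C port in the BMC affecting the communication link between the BMC and the PSU, which further improves the reliability of the server power supply system in operation.
In addition, in the maintenance method provided by this application, a failed communication link is recovered automatically without manual intervention and the design of the server power supply is optimized, so the operation and maintenance cost of the server is greatly reduced without affecting the performance of the customer's business applications.
Based on the above embodiment, this embodiment further explains and optimizes the technical solution. As a preferred implementation, after the step of upgrading the target firmware, the method further includes:
determining whether the target firmware has been upgraded successfully;
if not, performing the step of upgrading the target firmware again.
In practice, the target firmware may fail to upgrade or may be upgraded to the wrong version. Therefore, if the target firmware is judged not to have been upgraded successfully, it can be upgraded again, which further increases the probability of a successful upgrade.
In this embodiment, after the target firmware is upgraded, the PSU activates and runs the new firmware, returns to the BMC an upgrade success identifier indicating that the firmware upgrade succeeded, and at the same time reports the upgraded PSU firmware version information. After receiving the upgrade success identifier, the BMC can judge from the reported firmware version whether this upgrade succeeded. If the BMC determines that the upgrade failed or that the upgraded version is wrong, it can upgrade the target firmware again.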
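The verify-then-retry step can be sketched as a bounded loop on the BMC side. This is an illustration under stated assumptions: `upgrade_with_retry` and the `flash` callback, which returns the success identifier together with the version reported by the PSU, are hypothetical, and the cap of three attempts is introduced here because the source does not say how many times the upgrade may be repeated.

```python
from typing import Callable, Tuple


def upgrade_with_retry(flash: Callable[[], Tuple[bool, str]],
                       expected_version: str,
                       max_attempts: int = 3) -> bool:
    """Upgrade, then confirm both the success flag and the reported version."""
    for _ in range(max_attempts):
        success_flag, reported_version = flash()
        # A success flag alone is not enough: the version must also match,
        # which catches the "upgraded to the wrong version" case.
        if success_flag and reported_version == expected_version:
            return True
    return False
```

Checking the reported version in addition to the flag mirrors the text, in which the BMC judges success from the firmware version information returned after the upgrade.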
显然,通过本实施例所提供的技术方案,可以进一步保证目标固件在升级过程中的完整性与可靠性。
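See FIG. 5, which is a schematic structural diagram of a maintenance device for a server power supply according to an embodiment of the present invention. The maintenance device includes:
a fault detection module 21, configured to detect the operating information of the PSU in the target server in real time and determine from the operating information whether the PSU has failed;
a fault determination module 22, configured to lock the PSU and restart the PSU when the determination result of the fault detection module is yes;
a restart judgment module 23, configured to determine whether the PSU can restart successfully;
a PSU unlocking module 24, configured to unlock the PSU when the determination result of the restart judgment module is yes;
a firmware upgrade module 25, configured to extract the fault information of the PSU failure from the operating information and send the fault information to the BMC, so that the BMC uses the fault information to identify the target firmware in which the PSU failed and upgrades the target firmware.
Preferably, the device further includes:
a signal connection module, configured to connect the PSU and the BMC in advance through a relay.
Preferably, the device further includes:
a first restart module, configured to restart the first I2C port when a fault is detected in the first I2C port in the PSU.
Preferably, the device further includes:
a second restart module, configured to restart the second I2C port when a fault is detected in the second I2C port in the BMC and/or a PEC transmission error is detected between the BMC and the PSU.
Preferably, the device further includes:
an upgrade judgment module, configured to determine, after the target firmware has been upgraded, whether the upgrade succeeded;
a firmware re-upgrade module, configured to perform the step of upgrading the target firmware again when the determination result of the upgrade judgment module is no.
The maintenance device for a server power supply provided by the embodiment of the present invention has the beneficial effects of the maintenance method for a server power supply disclosed above.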
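The way modules 21 to 25 cooperate can be sketched as one maintenance pass. This is a simplified model, not the patented implementation: `SimulatedPsu`, `SimulatedBmc`, `maintenance_pass`, and the `"fault"` key in the operating information are hypothetical. The sketch also assumes that a PSU whose restart fails stays locked and that the fault information is reported to the BMC in either case; this section of the source does not spell out that branch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimulatedPsu:
    """Hypothetical PSU model exposing the operations the modules rely on."""
    info: Dict[str, Optional[str]]
    restart_succeeds: bool = True
    locked: bool = False
    log: List[str] = field(default_factory=list)

    def read_operating_info(self) -> Dict[str, Optional[str]]:
        return self.info

    def lock(self) -> None:
        self.locked = True
        self.log.append("lock")

    def restart(self) -> bool:
        self.log.append("restart")
        return self.restart_succeeds

    def unlock(self) -> None:
        self.locked = False
        self.log.append("unlock")


@dataclass
class SimulatedBmc:
    """Hypothetical BMC model that records the fault information it receives."""
    received: List[str] = field(default_factory=list)

    def report_fault(self, fault_info: str) -> None:
        self.received.append(fault_info)


def maintenance_pass(psu: SimulatedPsu, bmc: SimulatedBmc) -> bool:
    """Run one detection cycle and return True if a fault was handled."""
    info = psu.read_operating_info()      # module 21: detect operating information
    fault = info.get("fault")
    if fault is None:
        return False
    psu.lock()                            # module 22: lock and restart the PSU
    if psu.restart():                     # module 23: judge whether the restart succeeded
        psu.unlock()                      # module 24: unlock on success
    bmc.report_fault(fault)               # module 25: hand the fault information to the BMC
    return True
```

The BMC-side firmware lookup and upgrade are left out here because they run on the BMC after it receives the fault information.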
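See FIG. 6, which is a schematic structural diagram of maintenance equipment for a server power supply according to an embodiment of the present invention. The maintenance equipment includes:
a memory 31, configured to store a computer program;
a processor 32, configured to implement the steps of the maintenance method for a server power supply disclosed above when executing the computer program.
The maintenance equipment for a server power supply provided by the embodiment of the present invention has the beneficial effects of the maintenance method for a server power supply disclosed above.
See FIG. 7, which is a schematic structural diagram of a computer-readable storage medium for a server power supply according to an embodiment of the present invention. A computer program 610 is stored on the computer-readable storage medium 601, and when the computer program 610 is executed by a processor, it implements the steps of the maintenance method for a server power supply disclosed above.
The computer-readable storage medium provided by the embodiment of the present invention has the beneficial effects of the maintenance method for a server power supply disclosed above.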
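The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be understood by cross-reference.
Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The maintenance method, device, equipment, and medium for a server power supply provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. For a person of ordinary skill in the art, the specific implementation and scope of application may change according to the idea of the present invention. The content of this specification should not be construed as limiting the present invention.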

Claims (10)

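  1. A maintenance method for a server power supply, characterized in that it is applied to a controller and comprises:
    detecting operating information of a power supply unit in a target server in real time, and determining from the operating information whether the power supply unit has failed;
    if so, locking the power supply unit and restarting the power supply unit;
    determining whether the power supply unit can restart successfully;
    if so, unlocking the power supply unit;
    extracting fault information of the power supply unit failure from the operating information, and sending the fault information to a baseboard management controller, so that the baseboard management controller uses the fault information to identify the target firmware in which the power supply unit failed and upgrades the target firmware.
  2. The maintenance method according to claim 1, characterized in that the controller is specifically a microcontroller or a complex programmable logic device.
  3. The maintenance method according to claim 1, characterized in that the power supply unit is specifically a power supply unit with a dual-core structure.
  4. The maintenance method according to claim 1, characterized in that it further comprises:
    connecting the power supply unit and the baseboard management controller in advance through a relay.
  5. The maintenance method according to claim 4, characterized in that it further comprises:
    when a fault is detected in a first I2C port in the power supply unit, restarting the first I2C port.
  6. The maintenance method according to claim 4, characterized in that it further comprises:
    when a fault is detected in a second I2C port in the baseboard management controller and/or a parity check transmission error is detected between the baseboard management controller and the power supply unit, restarting the second I2C port.
  7. The maintenance method according to any one of claims 1 to 6, characterized in that, after the step of upgrading the target firmware, it further comprises:
    determining whether the target firmware was upgraded successfully;
    if not, performing the step of upgrading the target firmware again.
  8. A maintenance device for a server power supply, characterized in that it is applied to a controller and comprises:
    a fault detection module, configured to detect operating information of a power supply unit in a target server in real time and determine from the operating information whether the power supply unit has failed;
    a fault determination module, configured to lock the power supply unit and restart the power supply unit when the determination result of the fault detection module is yes;
    a restart judgment module, configured to determine whether the power supply unit can restart successfully;
    a power supply unit unlocking module, configured to unlock the power supply unit when the determination result of the restart judgment module is yes;
    a firmware upgrade module, configured to extract fault information of the power supply unit failure from the operating information and send the fault information to a baseboard management controller, so that the baseboard management controller uses the fault information to identify the target firmware in which the power supply unit failed and upgrades the target firmware.
  9. Maintenance equipment for a server power supply, characterized in that it comprises:
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the maintenance method for a server power supply according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, it implements the steps of the maintenance method for a server power supply according to any one of claims 1 to 7.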
PCT/CN2021/073602 2020-04-23 2021-01-25 Maintenance method, device, equipment, and medium for a server power supply WO2021212943A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010331104.6A CN111538624A (zh) 2020-04-23 2020-04-23 Maintenance method, device, equipment, and medium for a server power supply
CN202010331104.6 2020-04-23

Publications (1)

Publication Number Publication Date
WO2021212943A1 true WO2021212943A1 (zh) 2021-10-28

Family

ID=71977225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073602 WO2021212943A1 (zh) 2020-04-23 2021-01-25 Maintenance method, device, equipment, and medium for a server power supply

Country Status (2)

Country Link
CN (1) CN111538624A (zh)
WO (1) WO2021212943A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
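CN114442786A (zh) * 2022-01-21 2022-05-06 苏州浪潮智能科技有限公司 Power supply fault alarm and recovery method, device, and storage medium
CN115309250A (zh) * 2022-07-29 2022-11-08 苏州浪潮智能科技有限公司 Method and system for improving power supply ORing reliability
CN115442207A (zh) * 2022-07-29 2022-12-06 中电科思仪科技股份有限公司 Hardware operation and maintenance management system based on BMC + SoC + network switching module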

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
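CN111538624A (zh) * 2020-04-23 2020-08-14 苏州浪潮智能科技有限公司 Maintenance method, device, equipment, and medium for a server power supply
CN113851155B (zh) * 2021-09-17 2023-08-22 苏州浪潮智能科技有限公司 Storage device and method for controlling locking of a memory
CN114138587B (zh) * 2021-10-25 2024-01-12 苏州浪潮智能科技有限公司 Reliability verification method, device, and equipment for server power supply firmware upgrade
CN114003426A (zh) * 2021-10-29 2022-02-01 联想(北京)有限公司 Fault handling method, system, and electronic device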

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7251723B2 (en) * 2001-06-19 2007-07-31 Intel Corporation Fault resilient booting for multiprocessor system using appliance server management
CN104834575A (zh) * 2015-05-07 2015-08-12 杭州昆海信息技术有限公司 Firmware recovery method and device
CN104991629A (zh) * 2015-07-10 2015-10-21 英业达科技有限公司 Power failure detection system and method thereof
CN106610712A (zh) * 2015-10-21 2017-05-03 鸿富锦精密电子(天津)有限公司 Baseboard management controller reset system and method
CN107315675A (zh) * 2017-07-24 2017-11-03 郑州云海信息技术有限公司 Server switching power supply protection device and method
US9946600B2 (en) * 2016-02-03 2018-04-17 Mitac Computing Technology Corporation Method of detecting power reset of a server, a baseboard management controller, and a server
CN108919935A (zh) * 2018-07-12 2018-11-30 浪潮电子信息产业股份有限公司 Monitoring method, device, and equipment for a power supply on a server motherboard
CN109683696A (zh) * 2018-12-25 2019-04-26 浪潮电子信息产业股份有限公司 Server power supply fault detection system, method, device, equipment, and medium
CN110618909A (zh) * 2019-09-27 2019-12-27 苏州浪潮智能科技有限公司 Fault locating method, device, equipment, and storage medium based on I2C communication
CN111538624A (zh) * 2020-04-23 2020-08-14 苏州浪潮智能科技有限公司 Maintenance method, device, equipment, and medium for a server power supply

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
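CN114442786A (zh) * 2022-01-21 2022-05-06 苏州浪潮智能科技有限公司 Power supply fault alarm and recovery method, device, and storage medium
CN114442786B (zh) * 2022-01-21 2023-07-14 苏州浪潮智能科技有限公司 Power supply fault alarm and recovery method, device, and storage medium
CN115309250A (zh) * 2022-07-29 2022-11-08 苏州浪潮智能科技有限公司 Method and system for improving power supply ORing reliability
CN115442207A (zh) * 2022-07-29 2022-12-06 中电科思仪科技股份有限公司 Hardware operation and maintenance management system based on BMC + SoC + network switching module
CN115442207B (zh) * 2022-07-29 2024-01-26 中电科思仪科技股份有限公司 Hardware operation and maintenance management system based on BMC + SoC + network switching module
CN115309250B (zh) * 2022-07-29 2024-05-24 苏州浪潮智能科技有限公司 Method and system for improving power supply ORing reliability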

Also Published As

Publication number Publication date
CN111538624A (zh) 2020-08-14

Similar Documents

Publication Publication Date Title
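WO2021212943A1 (zh) Maintenance method, device, equipment, and medium for a server power supply
EP1703401A2 (en) Information processing apparatus and control method therefor
WO2020239060A1 (zh) Error recovery method and apparatus
KR20190029995A (ko) Apparatus and method for improving the reliability of a watchdog circuit that controls a vehicle central processing unit
TWI529624B (zh) Method and system of fault tolerance for multiple servers
WO2018095107A1 (zh) Exception handling method and apparatus for a BIOS program
US10275330B2 (en) Computer readable non-transitory recording medium storing pseudo failure generation program, generation method, and generation apparatus
US20110145634A1 (en) Apparatus, a recovery method and a program thereof
CN113946148A (zh) MCU chip wake-up system based on multi-ECU cooperative control
CN117389790B (zh) Firmware detection system capable of recovering from faults, method, storage medium, and server
CN113360347A (zh) Server and control method thereof
CN114116280A (zh) Interactive BMC self-recovery method, system, terminal, and storage medium
US20080288828A1 (en) structures for interrupt management in a processing environment
CN110764829B (zh) CPU isolation method and system for a multi-socket server
CN111488246A (zh) CPLD upgrade method, device, electronic equipment, and readable storage medium
JP2008152552A (ja) Computer system and failure information management method
CN111078454A (zh) Cloud platform configuration recovery method and device
US7533297B2 (en) Fault isolation in a microcontroller based computer
CN213751052U (zh) Dual-core chip capable of program backup and recovery
KR101100894B1 (ko) Error detection and recovery method for an embedded device
CN111522718A (zh) Server power supply system and server
CN113032026A (zh) Firmware management method, device, equipment, and medium for a server motherboard
JP3325785B2 (ja) Computer failure detection and recovery method
CN113835971A (zh) Monitoring method for abnormal lighting of a server backplane and related components
CN111539044A (zh) Server power supply firmware write protection control method, device, equipment, and storage medium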

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21793342

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21793342

Country of ref document: EP

Kind code of ref document: A1