CN117435018A - Computing equipment - Google Patents


Info

Publication number
CN117435018A
Authority
CN
China
Prior art keywords
conversion circuit
bmc
fault
voltage data
memory
Legal status
Pending
Application number
CN202311282061.7A
Other languages
Chinese (zh)
Inventor
王彦斌
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202311282061.7A
Publication of CN117435018A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/263 Arrangements for using multiple switchable power supplies, e.g. battery and AC
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G06F1/30 Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Power Engineering (AREA)
  • Computing Systems (AREA)
  • Dc-Dc Converters (AREA)
  • Power Sources (AREA)

Abstract

Embodiments of the present application provide a computing device comprising a DC-DC conversion circuit, a baseboard management controller (BMC), and a logic circuit. The BMC is electrically connected to both the logic circuit and the DC-DC conversion circuit. The logic circuit is configured to update the alarm information in its alarm register when the DC-DC conversion circuit abnormally powers down. The BMC is configured to, upon determining from the alarm information that the DC-DC conversion circuit has abnormally powered down, read fault information from a fault register in the DC-DC conversion circuit and generate an alarm log based on the fault information and the alarm information. With this solution, the cause of an abnormal power-down of the DC-DC conversion circuit can be determined.

Description

Computing equipment
Technical Field
The present invention relates to the field of server technology, and in particular to a computing device.
Background
A server may suffer an abnormal power-down during operation, causing downtime, and currently only the name of the power source that failed is recorded in the log of the baseboard management controller (BMC). For example, each power source in the server has a power good signal and an enable signal, both of which are connected to a logic circuit in the server. When a power source drops out while the server is running, its power good signal flips, for example from high level to low level. After detecting the flip, the logic circuit sets a corresponding alarm register, and the BMC, which polls the alarm registers of the logic circuit, reports the power failure.
However, the BMC does not record the cause of the abnormal power-down, so operation and maintenance personnel cannot determine that cause during maintenance.
Disclosure of Invention
Embodiments of the present invention provide a computing device that can obtain the cause of an abnormal power-down when a DC-DC conversion circuit abnormally powers down.
The computing device provided in the embodiments of the present application comprises a DC-DC conversion circuit, a baseboard management controller (BMC), and a logic circuit. The BMC is electrically connected to both the logic circuit and the DC-DC conversion circuit. The logic circuit is configured to update the alarm information in its alarm register when the DC-DC conversion circuit abnormally powers down. The BMC is configured to, upon determining from the alarm information that the DC-DC conversion circuit has abnormally powered down, read fault information from the fault register in the DC-DC conversion circuit and generate an alarm log based on the fault information and the alarm information.
With this computing device, the BMC learns of an abnormal power-down of the DC-DC conversion circuit when it polls the alarm register of the CPLD. The BMC can then read the fault information from the fault register of the DC-DC conversion circuit and generate an alarm log from it. During later maintenance, operation and maintenance personnel can check the alarm log to learn why the DC-DC conversion circuit powered down abnormally, and thus understand the state of the computing device at the operating site and the cause of the fault, which improves maintenance efficiency.
In one possible implementation, the BMC is further configured to write the acquired voltage data of the DC-DC conversion circuit into a first memory while the DC-DC conversion circuit has not abnormally powered down, and to copy the voltage data from the first memory to a second memory when the DC-DC conversion circuit abnormally powers down, where the first memory is a volatile memory and the second memory is a nonvolatile memory. The BMC then generates the alarm log based on the fault information, the alarm information, and the voltage data in the second memory.
The BMC can thus acquire the voltage data of each DC-DC conversion circuit in real time and combine it with the fault information to generate the alarm log. Because the alarm log stores both the fault cause and the voltage data of the DC-DC conversion circuit before the abnormal power-down, operation and maintenance personnel can analyze the cause from the voltage data in the log, for example a gradual voltage drop or a gradual voltage rise, and carry out targeted maintenance, which improves maintenance efficiency.
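The buffering scheme in this implementation can be sketched as a small behavioral model. It is a minimal illustration under stated assumptions, not the applicant's firmware: the class name `BmcVoltageStore`, the dictionaries standing in for the first (volatile) and second (nonvolatile) memories, and the log field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BmcVoltageStore:
    """Hypothetical model of the BMC's first (volatile) and second (nonvolatile) memories."""

    first_memory: dict = field(default_factory=dict)   # volatile: contents lost on power-down
    second_memory: dict = field(default_factory=dict)  # nonvolatile: contents survive power-down

    def record(self, rail: str, volts: float) -> None:
        # Normal operation: overwrite the latest sample for this rail in volatile memory.
        self.first_memory[rail] = volts

    def on_abnormal_power_down(self) -> None:
        # Abnormal power-down detected: copy the latest samples into nonvolatile memory.
        self.second_memory.update(self.first_memory)

    def alarm_log(self, rail: str, alarm_info: str, fault_info: str) -> dict:
        # Combine alarm information, fault information and the persisted voltage data.
        return {
            "rail": rail,
            "alarm": alarm_info,
            "fault": fault_info,
            "last_voltage": self.second_memory.get(rail),
        }


store = BmcVoltageStore()
store.record("VCC_5V", 5.01)
store.record("VCC_5V", 4.62)   # the newer sample replaces the older one
store.on_abnormal_power_down()
store.first_memory.clear()     # simulate loss of the volatile contents
print(store.alarm_log("VCC_5V", "alarm bit set", "E02"))
```

Keeping only the newest sample per rail in volatile memory, and copying it to nonvolatile memory only when a power-down is flagged, mirrors the rolling-refresh idea of saving storage space while still preserving the pre-failure voltage.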
In one possible implementation, the BMC is further configured to, when the DC-DC conversion circuit abnormally powers down, read the fault information from the fault register of the DC-DC conversion circuit and write it into a third memory, which is a nonvolatile memory; and, in response to an alarm log collection instruction, read the voltage data from the second memory and the fault information from the third memory, and generate the alarm log based on the alarm information, the voltage data, and the fault information.
This implementation applies whether or not the computing device has been powered off. For example, if the server loses power at the fault site before an alarm log is collected, it can be brought back from the fault site to a maintenance site and powered on again to collect the alarm log. This implementation also applies when the BMC is still powered at the fault site, in which case the alarm log can be obtained directly there. Because the alarm log contains the voltage data from before the DC-DC conversion circuit lost power and the fault information that caused the fault, operation and maintenance personnel can learn from it why the DC-DC conversion circuit abnormally powered down.
In one possible implementation, the BMC is further configured to, when the DC-DC conversion circuit abnormally powers down, respond to an alarm log collection instruction by reading the fault information from the fault register of the DC-DC conversion circuit and the voltage data from the second memory, and generate the alarm log based on the alarm information, the voltage data, and the fault information.
This implementation suits a computing device that has gone down but has not lost power. The BMC does not first store the fault information in the third memory; instead, it reads the fault information directly from the fault register of the DC-DC conversion circuit and generates the alarm log from that fault information and the voltage data of the DC-DC conversion circuit, that is, the BMC generates the alarm log before power is lost.
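The two collection paths above, reading the fault information back from the third memory after the device has been powered on again, or reading it directly from the fault register while the device is down but still powered, can be sketched as one function. The function and parameter names, and the use of plain mappings for the memories and registers, are hypothetical.

```python
from typing import Mapping, Optional


def collect_alarm_log(
    rail: str,
    alarm_info: str,
    second_memory: Mapping[str, float],
    third_memory: Mapping[str, str],
    live_fault_registers: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build an alarm log in response to a (hypothetical) log collection instruction.

    If live_fault_registers is supplied, the DC-DC circuit is assumed to still be
    powered, so its fault register is read directly. Otherwise the fault
    information saved earlier in the nonvolatile third memory is used.
    """
    if live_fault_registers is not None:
        fault_info = live_fault_registers[rail]  # device down but not powered down
    else:
        fault_info = third_memory[rail]          # device powered down, then powered on again
    return {
        "rail": rail,
        "alarm": alarm_info,
        "voltage": second_memory[rail],          # pre-failure voltage from nonvolatile memory
        "fault": fault_info,
    }
```

In both paths the voltage data comes from the second memory; only the source of the fault information differs.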
In one possible implementation, the logic circuit is configured to determine whether the DC-DC conversion circuit has abnormally powered down based on the power good signal of the DC-DC conversion circuit, and to determine that it has abnormally powered down when that power good signal flips.
Each DC-DC conversion circuit has a different communication address on the BMC's bus, and the power good signals of different DC-DC conversion circuits are connected to different IO pins of the logic circuit, so every power good signal received by the logic circuit has a corresponding address. The alarm register of the logic circuit therefore already identifies which DC-DC conversion circuit has abnormally powered down, and the BMC can read only the fault register of that circuit without reading the normal DC-DC conversion circuits.
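Because each power good signal lands on its own IO pin, the contents of the alarm register directly identify the failed rails. A minimal sketch of how a BMC could use that to read only the affected fault registers, assuming (hypothetically) one alarm bit per rail, a table mapping bit positions to I2C addresses, and a `read_fault_register` callable standing in for the real bus transaction:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wiring: alarm-register bit -> (rail name, I2C address of its DC-DC circuit)
BIT_TO_RAIL: Dict[int, Tuple[str, int]] = {
    0: ("VRM_CPU0", 0x40),
    1: ("VRM_CPU1", 0x41),
    2: ("POL_3V3", 0x50),
    3: ("POL_1V8", 0x51),
}


def read_failed_rails(
    alarm_register: int,
    read_fault_register: Callable[[int], int],
) -> List[Tuple[str, int]]:
    """Return (rail, fault_code) for every rail whose alarm bit is set.

    Only circuits flagged in the alarm register are read; normal ones are skipped.
    """
    results = []
    for bit, (rail, address) in BIT_TO_RAIL.items():
        if alarm_register & (1 << bit):
            results.append((rail, read_fault_register(address)))
    return results
```

For example, an alarm register value of `0b0101` causes only the circuits at 0x40 and 0x50 to be read.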
In one possible implementation, the BMC is further configured to determine the fault cause based on the fault information and to display it, where the fault cause includes at least one of over-temperature, over-current, or over-voltage. Other fault causes, such as undervoltage, may also be included and are not described here.
In one possible implementation, the BMC is further configured to obtain voltage data of the DC-DC conversion circuit and, each time, update the voltage data in the first memory with the voltage data for a preset duration.
BMC 100 records the voltage data of the DC-DC conversion circuit with a rolling refresh: each new record overwrites the previous one, so only the latest voltage data is kept, which saves storage space.
In one possible implementation, the BMC, the DC-DC conversion circuit, and the logic circuit all have I2C interfaces, and the BMC communicates with the DC-DC conversion circuit and the logic circuit through them.
The embodiments of the present application do not particularly limit the interfaces used for communication among the BMC, the DC-DC conversion circuit, and the logic circuit; data may also be exchanged over other serial links, such as SPI.
In one possible implementation, the DC-DC conversion circuit is a voltage regulator module (VRM) or a point-of-load (POL) power supply. The embodiments of the present application do not particularly limit how the DC-DC conversion circuit is implemented; for example, it may be a Buck circuit or a Buck-Boost circuit.
In one possible implementation, the computing device further comprises an analog-to-digital converter electrically connected to the BMC and the DC-DC conversion circuit; the analog-to-digital converter collects the voltage data of the DC-DC conversion circuit and sends it to the BMC.
Drawings
FIG. 1 is a schematic diagram of a computing device provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of another computing device provided in an embodiment of the present application;
FIG. 3A is a schematic diagram of another computing device provided in an embodiment of the present application;
FIG. 3B is a schematic diagram of yet another computing device provided by an embodiment of the present application;
FIG. 4 is a flowchart of a method for alarming on an abnormal power-down of a computing device according to an embodiment of the present application;
FIG. 5 is a flowchart of another method for alarming on an abnormal power-down of a computing device according to an embodiment of the present application;
FIG. 6 is a flowchart of yet another method for alarming on an abnormal power-down of a computing device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application do not specifically limit the application scenario of the computing device. The description takes a server as an example, and the type of server is not specifically limited either; for example, the computing device may be a rack server or an edge server. The server may be located in a data center or elsewhere, which is likewise not specifically limited.
A server, as a type of computing device, runs faster and carries heavier loads than an ordinary computer. It provides computing or application services to other clients in the network (e.g., PCs and smartphones). A server offers high-speed CPU computing, long-term reliable operation, strong external data throughput, and good scalability. By form factor, servers are classified as rack, blade, tower, and cabinet servers.
A server generally includes a motherboard and a power supply that powers the loads on the motherboard. The voltage level supplied to the motherboard is not particularly limited; 12 V DC is used as an example.
The motherboard is an important circuit board in the server and includes a central processing unit (CPU), a controller, memory, connectors, and other components. Because the controller has a limited number of interfaces, its interfaces can be expanded through connectors to facilitate the connection of peripherals, and a serial data interface can be expanded to connect devices such as a graphics card. The controller may be one or more of a microcontroller unit (MCU), a complex programmable logic device (CPLD), or a field programmable gate array (FPGA).
The motherboard generally carries various loads, such as the CPU, fans, and memory. The embodiments of the present application do not specifically limit the type of memory; for example, the memory includes, but is not limited to, dual in-line memory modules (DIMMs). The motherboard may include one CPU or multiple CPUs.
A baseboard management controller (BMC) is an essential component of a server that monitors its operating conditions, such as temperature, fan speed, power supply status, and operating system status. The BMC runs independently of the server and is not affected by it: even when the server is not started, the BMC can upgrade firmware, check the machine's devices, remotely power on the machine, and perform other operations, and it can record key logs when the server crashes.
I2C (Inter-Integrated Circuit) is a bidirectional two-wire synchronous serial bus for transferring information between the devices connected to it. It uses two lines: SDA (serial data line) and SCL (serial clock line).
A voltage regulator module (VRM) converts the input supply voltage into a stable supply voltage for the CPU.
Point of load (POL) refers to placing a small VRM close to the load to convert the supply voltage into the low DC voltage that the load needs. This enables efficient, stable voltage regulation and reduces problems such as power transmission loss and electromagnetic interference (EMI).
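As an illustration of how a master such as the BMC addresses a device on this bus: in standard 7-bit I2C addressing, the first byte after the START condition carries the 7-bit target address in bits 7 to 1 and the read/write flag in bit 0 (0 for write, 1 for read). The helper below computes that byte; the address 0x40 used in the usage note is arbitrary.

```python
def i2c_address_byte(address_7bit: int, read: bool) -> int:
    """Return the first byte sent after START for a 7-bit I2C target address.

    Bits 7..1 hold the address; bit 0 is the R/W flag (1 = read, 0 = write).
    """
    if not 0 <= address_7bit <= 0x7F:
        raise ValueError("7-bit I2C addresses range from 0x00 to 0x7F")
    return (address_7bit << 1) | (1 if read else 0)
```

For address 0x40 this gives 0x80 for a write and 0x81 for a read. A register read such as an SMBus "Read Byte" first sends the address byte with the write flag followed by the register number, then a repeated START and the address byte with the read flag, after which the target returns the data byte.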
In order to enable a person skilled in the art to better understand the technical solution provided by the embodiments of the present application, an application scenario is first described below with reference to the accompanying drawings.
Referring to FIG. 1, which is a schematic diagram of a computing device according to an embodiment of the present application.
The computing device in this embodiment is described by taking a server as an example.
For example, to improve power supply reliability, the computing device includes two power supply units, a first power supply circuit PSU1 and a second power supply circuit PSU2. The loads include, for example, at least one of the following: CPU, DIMM, HDD, PCIe devices, fans, a logic circuit (e.g., CPLD 300), and BMC 100.
The outputs of PSU1 and PSU2 are both connected to a DC bus, for example a 12 V bus. The computing device further includes DC-DC (DC/DC) conversion circuits, including but not limited to a Buck step-down circuit, a VRM, or a POL. Each DC/DC circuit steps down the 12 V bus voltage and supplies the stepped-down voltage, for example 5 V, 3.3 V, or 1.8 V, to a load.
CPLD 300 receives the power good signal of each DC/DC circuit and determines whether each power source is normal by checking whether its power good signal has flipped. For example, while a power source is normal its power good signal is high; a flip to low indicates that the power source has abnormally powered down during operation.
When CPLD 300 detects that a power good signal has flipped, it sets the corresponding alarm register, and BMC 100 learns of the abnormal power-down by polling the alarm registers of CPLD 300. However, BMC 100 only knows that the power source is abnormal, not why it lost power, and once the server is powered down all of this information is lost. For example, operation and maintenance personnel may bring the server back from the work site without knowing the cause of its abnormal power-down, which makes locating the problem laborious.
To solve the above technical problem, the embodiments of the present application provide a computing device in which the BMC can obtain the cause of an abnormal power-down of a power source. This enables targeted maintenance, reduces the workload of locating problems during maintenance, and improves maintenance efficiency.
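The detection path just described, in which a power good signal falls, the CPLD latches an alarm bit, and the BMC discovers it by polling, can be modeled behaviorally as below. This is a Python sketch, not CPLD logic; the class name, the one-bit-per-rail register layout, and the assumption that alarm bits stay set until cleared are illustrative.

```python
from typing import List


class AlarmLatch:
    """Behavioral model of a CPLD alarm register fed by power good (PG) inputs."""

    def __init__(self, num_rails: int) -> None:
        self.prev_pg = [True] * num_rails  # PG is high (True) while each rail is normal
        self.alarm_register = 0            # one bit per rail; 1 = abnormal power-down seen

    def sample(self, pg_levels: List[bool]) -> None:
        """Process one sample of all PG inputs and latch any high-to-low transition."""
        for rail, (was_high, is_high) in enumerate(zip(self.prev_pg, pg_levels)):
            if was_high and not is_high:
                self.alarm_register |= 1 << rail  # sticky: stays set even if PG recovers
        self.prev_pg = list(pg_levels)


def bmc_poll(latch: AlarmLatch) -> List[int]:
    """Return the indices of rails whose alarm bit is set, as the BMC sees them when polling."""
    return [i for i in range(len(latch.prev_pg)) if (latch.alarm_register >> i) & 1]


latch = AlarmLatch(3)
latch.sample([True, True, True])
latch.sample([True, False, True])  # rail 1 loses power good
latch.sample([True, True, True])   # the alarm bit remains latched
print(bmc_poll(latch))
```

Latching on the edge rather than reading the live level is what lets a slower polling loop on the BMC still see a brief drop-out.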
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the following detailed description is provided with reference to the accompanying drawings.
Referring to FIG. 2, which is a schematic diagram of another computing device according to an embodiment of the present application.
The computing device provided in this embodiment comprises a DC-DC conversion circuit 200, a baseboard management controller (BMC) 100, and a logic circuit 300.
BMC 100 is electrically connected to logic circuit 300 and to DC-DC conversion circuit 200. The number of DC-DC conversion circuits 200 in the computing device is not particularly limited and may be one or more; one load may correspond to one DC-DC conversion circuit or to several. Note that FIG. 2 shows a single DC-DC conversion circuit 200 schematically. Logic circuit 300 is configured to update the alarm information in its alarm register when DC-DC conversion circuit 200 abnormally powers down. The embodiments of the present application do not specifically limit how the alarm register of logic circuit 300 is updated; for example, the alarm register may read 0 while the power source is normal and be set to 1 when the power source abnormally powers down.
For example, each DC-DC conversion circuit generates a power good signal that is passed to logic circuit 300, and logic circuit 300 determines whether that DC-DC conversion circuit has abnormally powered down based on whether the power good signal flips. For instance, for a DC-DC conversion circuit with a normal output of 5 V, the power good signal flips from 1 to 0 when the output drops to 4 V. The 4 V value is only an example; the power good signal may flip at other voltage values.
BMC 100 is configured to, upon determining from the alarm information that DC-DC conversion circuit 200 has abnormally powered down, read the fault information from the fault register in DC-DC conversion circuit 200 and generate an alarm log based on the fault information and the alarm information.
The fault causes corresponding to the fault information in the fault register may include, but are not limited to, at least one of over-temperature, over-current, and over-voltage. The fault information is stored as fault codes, and the cause indicated by each fault code can be looked up in the manual of the DC-DC conversion circuit; for example, E01 indicates an over-temperature fault, E02 an over-current fault, and E03 an over-voltage fault. These fault codes are only examples; other code formats may also be used.
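The power good behavior in this example can be expressed as a threshold comparison followed by a search for the first 1-to-0 transition. The sketch uses the 5 V rail and 4 V threshold from the text; the function names are hypothetical, and hysteresis and signal filtering are deliberately omitted.

```python
from typing import Optional, Sequence


def power_good(v_out: float, pg_threshold: float = 4.0) -> int:
    """Return the power good level: 1 while the output is at or above the threshold, else 0.

    The 4.0 V default matches the 5 V example in the text and is illustrative only.
    """
    return 1 if v_out >= pg_threshold else 0


def first_flip(samples: Sequence[float], pg_threshold: float = 4.0) -> Optional[int]:
    """Return the index of the first sample where power good goes from 1 to 0, or None."""
    levels = [power_good(v, pg_threshold) for v in samples]
    for i in range(1, len(levels)):
        if levels[i - 1] == 1 and levels[i] == 0:
            return i
    return None
```

For a sagging 5 V rail sampled as 5.0, 4.6, 4.2, 3.9, 3.5 V, the flip is detected at the fourth sample (index 3).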
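A lookup from the example fault codes to fault causes could look like the following. The codes E01 to E03 are the illustrative values from the text; an actual mapping has to be taken from the specific DC-DC converter's manual, and the names `FAULT_CAUSES` and `fault_cause` are hypothetical.

```python
# Illustrative fault-code table using the example codes from the text.
FAULT_CAUSES = {
    "E01": "over-temperature",
    "E02": "over-current",
    "E03": "over-voltage",
}


def fault_cause(code: str) -> str:
    """Translate a fault code read from the fault register into a readable cause."""
    normalized = code.strip().upper()
    return FAULT_CAUSES.get(normalized, f"unknown fault code {code!r}")
```

Returning an explicit "unknown" string instead of raising keeps the alarm log usable when a converter reports a code that the table does not cover.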
With this computing device, the BMC learns of an abnormal power-down of the DC-DC conversion circuit when it polls the alarm register of the CPLD. The BMC can then read the fault information from the fault register of the DC-DC conversion circuit and generate an alarm log from it. During later maintenance, operation and maintenance personnel can check the alarm log to learn why the DC-DC conversion circuit powered down abnormally, and thus understand the state of the computing device at the operating site and the cause of the fault, which improves maintenance efficiency.
So that the server monitoring platform can learn the running state of the server in time, the BMC of the computing device provided in the embodiments of the present application is further configured to report the cause of the abnormal power-down based on the fault information in the fault register.
The BMC is further configured to determine the fault cause based on the fault information and to display it, where the fault cause includes at least one of over-temperature, over-current, or over-voltage.
In addition, from the power good signal flip alone, the CPLD only knows that the DC-DC conversion circuit has abnormally powered down; it does not know how the circuit's voltage changed, for example the specific voltage value before the power-down. To capture the voltage before an abnormal power-down more accurately, the BMC can also acquire and store the voltage of each DC-DC conversion circuit in real time, so that the specific voltage values can help analyze the fault cause when a DC-DC conversion circuit abnormally powers down.
Referring now to FIG. 3A, which is a schematic diagram of yet another computing device according to an embodiment of the present application.
BMC 100 is further configured to write the acquired voltage data of DC-DC conversion circuit 200 into the first memory 10 while DC-DC conversion circuit 200 has not abnormally powered down;
BMC 100 is further configured to copy the voltage data from the first memory 10 to the second memory 20 when DC-DC conversion circuit 200 abnormally powers down, where the first memory 10 is a volatile memory and the second memory 20 is a nonvolatile memory;
BMC 100 is configured to generate the alarm log based on the fault information, the alarm information, and the voltage data in the second memory 20.
BMC 100 may poll the alarm register of logic circuit 300 and learn from its data that DC-DC conversion circuit 200 has abnormally powered down.
In addition, BMC 100 records the voltage data of DC-DC conversion circuit 200 in real time. In one possible implementation, BMC 100 includes an analog-to-digital conversion (ADC) interface through which it collects the voltage data of DC-DC conversion circuit 200. Alternatively, BMC 100 may collect the voltage data through a separately provided external analog-to-digital converter. When the computing device includes multiple DC-DC conversion circuits, BMC 100 can obtain voltage data for each of them.
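Converting an ADC reading into a rail voltage is a simple scaling step. The sketch below assumes, hypothetically, a 10-bit ADC whose full-scale code (2^bits - 1) corresponds to a 1.8 V reference, with an optional resistive divider in front of the input for rails above the ADC's range; none of these values come from the source, and some ADCs define full scale as 2^bits instead, so the convention must be checked against the part's datasheet.

```python
def adc_to_rail_voltage(
    raw: int,
    bits: int = 10,
    vref: float = 1.8,
    r_top: float = 0.0,
    r_bottom: float = 1.0,
) -> float:
    """Convert a raw ADC count to the voltage on the monitored rail.

    Assumes full scale (2**bits - 1) corresponds to vref and that the rail is
    scaled down by a divider r_top / r_bottom before reaching the ADC pin.
    The defaults (r_top=0) mean "no divider".
    """
    full_scale = (1 << bits) - 1
    if not 0 <= raw <= full_scale:
        raise ValueError("raw reading out of range for the given resolution")
    v_pin = raw * vref / full_scale                # voltage at the ADC input pin
    return v_pin * (r_top + r_bottom) / r_bottom  # undo the divider ratio
```

With a 10 kΩ / 1 kΩ divider (ratio 11), a 12 V rail presents about 1.09 V at the pin, which reads as a raw code of 620 and converts back to 12.0 V.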
BMC 100 records the voltage data of DC-DC conversion circuit 200 with a rolling refresh: each new record overwrites the previous one, so only the latest voltage data is kept, which saves storage space. When BMC 100 learns that a power source has abnormally powered down, it writes the latest recorded voltage data into the first nonvolatile memory (the second memory 20), so that the voltage data of each DC-DC conversion circuit from before the power loss remains stored there after the computing device loses power.
The embodiments of the present application do not particularly limit how the first nonvolatile memory (the second memory 20) is implemented; for example, it may be a read-only memory (ROM) or Flash.
BMC 100 can also read the fault information from the fault register of each DC-DC conversion circuit and combine it with the voltage data of that circuit to generate an alarm log. The alarm log thus stores both the fault cause and the voltage data of the DC-DC conversion circuit before the abnormal power-down, so operation and maintenance personnel can analyze the cause from the information in the alarm log and carry out targeted maintenance, improving maintenance efficiency.
Collection of the alarm log may be triggered automatically, by the monitoring back end, or through a human-machine interface: BMC 100 receives an alarm log collection instruction and generates the alarm log in response. Before generating the alarm log, BMC 100 has already saved the voltage of the DC-DC conversion circuit and the fault information of the fault register.
A specific implementation will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 3B, a schematic diagram of yet another computing device is provided in accordance with an embodiment of the present application.
The computing device provided by the embodiment of the application can comprise a plurality of DC-DC conversion circuits; for convenience of description, the first DC-DC conversion circuit is any one of a plurality of DC-DC conversion circuits; the logic circuit is used for updating the alarm register corresponding to the first DC-DC conversion circuit when the first DC-DC conversion circuit is abnormally powered down; the plurality of DC-DC conversion circuits can correspond to different alarm registers, and can also correspond to different addresses in the same alarm register. Acquiring voltage data of the DC-DC conversion circuit and storing the voltage to the first memory, comprising: voltage data of the DC-DC conversion circuit is acquired, and the voltage data in the first memory is updated with the voltage data for a preset period of time each time.
And the baseboard management controller is used for polling all alarm registers of the logic circuit, knowing that the first DC-DC conversion circuit is abnormally powered down, reading the fault information of the fault register of the first DC-DC conversion circuit and writing the fault information into the third memory, namely the second nonvolatile memory.
The computing device provided by the embodiments of the present application is described with reference to a DC-DC conversion circuit including, but not limited to, VRM400 and POL 500.
For example, the BMC100 obtains voltage data for each DC-DC conversion circuit through the ADC interface, such as the voltage at which each DC-DC conversion circuit is normal, including but not limited to one or more of the following: 12V, 5V, 3.3V, 1.8V, 1.2V, etc. It should be appreciated that the voltage data sampled by the BMC100 through the ADC interface may include other values, and that the output voltage may fluctuate, e.g., gradually drop, when the DC-DC conversion circuit is abnormal.
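To illustrate how stored voltage samples could distinguish a gradual drop from a gradual rise before a power-down, here is a hypothetical helper. It assumes the retained voltage data covers several samples (oldest first) from the preset period; the function name, the 0.05 V dead band, and the category labels are assumptions, since the source only states that personnel analyze the voltage data.

```python
from typing import Sequence


def voltage_trend(samples: Sequence[float], tolerance: float = 0.05) -> str:
    """Classify a window of voltage samples (oldest first).

    Net changes smaller than `tolerance` volts count as stable; otherwise a
    monotonic fall or rise is reported as gradual, and anything else as fluctuating.
    """
    if len(samples) < 2:
        return "insufficient data"
    steps = [b - a for a, b in zip(samples, samples[1:])]
    net = samples[-1] - samples[0]
    if abs(net) < tolerance:
        return "stable"
    if all(s <= 0 for s in steps):
        return "gradual drop"
    if all(s >= 0 for s in steps):
        return "gradual rise"
    return "fluctuating"
```

For example, samples of 5.0, 4.8, 4.5, 4.1 V classify as a gradual drop, while 5.0, 4.0, 4.8, 4.2 V classify as fluctuating.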
In the embodiments of the present application, the BMC100 is described as exchanging data with the CPLD300, the VRM400 and the POL500 over I2C; other serial buses may also be used. For example, when the BMC100 learns through the CPLD300 that the VRM400 has been abnormally powered down, it reads the fault information from the fault register of the VRM400 over I2C. Each DC-DC conversion circuit has its own communication address on the bus, and the power good signal of each DC-DC conversion circuit is connected to a different IO pin of the CPLD300, so each power good signal also has a corresponding address. The alarm register of the CPLD300 therefore already identifies which DC-DC conversion circuit has been abnormally powered down, and the BMC100 can read only the fault register of that circuit, without reading those of the DC-DC conversion circuits that are operating normally. It should be understood that the BMC100 may also read the fault registers of all DC-DC conversion circuits; the embodiments of the present application do not limit this.
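The selective read can be sketched as follows, assuming the I2C byte read is injected as a callback so the example runs without hardware. The converter addresses and the fault-register offset are hypothetical; 0x78 is used only because it is the STATUS_BYTE command code on PMBus devices, and an actual converter may use a different register.

```python
from typing import Callable, Dict, List, Mapping

# read_byte(i2c_address, register) -> int. On a real BMC this would be an
# I2C/SMBus byte read; it is injected here so the sketch runs without a bus.
ReadByte = Callable[[int, int], int]

# Hypothetical 7-bit I2C addresses of the converters.
I2C_ADDRESS: Mapping[str, int] = {"VRM400": 0x40, "POL500": 0x41}
# Assumed fault-register offset (0x78 is STATUS_BYTE on PMBus devices).
FAULT_REGISTER = 0x78


def read_fault_registers(faulted: List[str], read_byte: ReadByte,
                         addresses: Mapping[str, int] = I2C_ADDRESS,
                         register: int = FAULT_REGISTER) -> Dict[str, int]:
    """Read the fault register only of the circuits flagged by the CPLD
    alarm register; converters that are operating normally are not read."""
    return {name: read_byte(addresses[name], register) for name in faulted}


if __name__ == "__main__":
    fake_bus = {(0x40, 0x78): 0x00, (0x41, 0x78): 0x10}
    print(read_fault_registers(["POL500"], lambda a, r: fake_bus[(a, r)]))
```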
In a possible implementation, the BMC100 is configured to acquire the voltage data of the DC-DC conversion circuit in real time and to update the volatile memory with the voltage data acquired in each preset time period. The length of the preset time period is not limited; taking 10 seconds as an example, the BMC100 acquires the voltage data of the DC-DC conversion circuit periodically, once every 10 seconds. The volatile memory is likewise not limited and may be, for example, a RAM. It should be understood that, to save space, each reading taken every 10 seconds overwrites the previous one, so only the most recent voltage data is retained in the RAM.
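The rolling refresh can be sketched as below: a minimal Python model, assuming the voltage reader is a callback and the RAM is represented by an in-memory object that keeps only the latest sample. The 10-second period is the example from the text; the class and function names are hypothetical, and a real BMC would run the loop indefinitely rather than for a fixed number of cycles.

```python
import time
from typing import Callable, Dict, Optional

PERIOD_S = 10.0  # example preset period from the text


class RollingVoltageBuffer:
    """Volatile 'first memory': holds only the most recent sample per circuit."""

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}
        self.timestamp: Optional[float] = None

    def refresh(self, sample: Dict[str, float], now: float) -> None:
        # Overwrite the previous sample instead of appending, to save space.
        self._latest = dict(sample)
        self.timestamp = now

    def snapshot(self) -> Dict[str, float]:
        """Copy of the latest sample, e.g. for saving to nonvolatile memory."""
        return dict(self._latest)


def sample_loop(read_voltages: Callable[[], Dict[str, float]],
                buffer: RollingVoltageBuffer, cycles: int,
                period_s: float = PERIOD_S,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> None:
    """Sample all rails once per period for a fixed number of cycles."""
    for _ in range(cycles):
        buffer.refresh(read_voltages(), clock())
        sleep(period_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting for real time to pass.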
When the DC-DC conversion circuit is abnormally powered down, the BMC writes the voltage data of the DC-DC conversion circuit stored in the volatile memory (first memory) into the first nonvolatile memory (second memory). For example, the BMC writes the voltage data held in the RAM into a ROM, so that the voltage data of the DC-DC conversion circuit is retained even after the ROM loses power.
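A possible way to perform this copy, assuming the nonvolatile memory is exposed to BMC software as a file system (for example on flash), is sketched below. Writing to a temporary file and renaming it over the target with `os.replace` means that, on POSIX file systems, a reader sees either the previous snapshot or the new one rather than a partially written file. The JSON format and the function names are assumptions.

```python
import json
import os
import tempfile
from typing import Dict


def persist_snapshot(snapshot: Dict[str, float], path: str) -> None:
    """Copy the RAM snapshot to nonvolatile storage.

    The data is written to a temporary file in the same directory, flushed
    and fsync'ed, then renamed over the target in one step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(snapshot, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temporary file behind
        raise


def load_snapshot(path: str) -> Dict[str, float]:
    """Read a snapshot back, e.g. after the BMC has been power-cycled."""
    with open(path) as f:
        return json.load(f)
```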
Two specific implementations are introduced below, in which the computing device collects the alarm log in a downtime state or in a shutdown state. The two states differ as follows. In the downtime state, the inputs of the first power supply circuit and the second power supply circuit of the server are still connected to power, but the VRM, the board cards, the CPU and so on are powered down. In the shutdown state, the inputs of the first and second power supply circuits are not connected to power, that is, the outputs of the first and second power supply circuits deliver no electrical energy.
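The distinction between the two states, and which of the two implementations below applies in each, can be expressed as a small classification. This is a sketch under the assumption that the BMC can tell whether each power supply circuit's input is connected; the application does not address the case in which only one input is connected, so the sketch rejects it rather than guess. All names are hypothetical.

```python
from enum import Enum
from typing import Tuple


class PowerState(Enum):
    DOWNTIME = "downtime"  # inputs powered; VRM, board cards, CPU are down
    SHUTDOWN = "shutdown"  # neither power supply circuit input is powered


def classify_after_power_down(psu1_input: bool, psu2_input: bool) -> PowerState:
    """Classify a server whose loads (VRM, board cards, CPU) have lost power."""
    if psu1_input and psu2_input:
        return PowerState.DOWNTIME
    if not psu1_input and not psu2_input:
        return PowerState.SHUTDOWN
    raise ValueError("only one input powered: case not covered by the text")


def applicable_implementations(state: PowerState) -> Tuple[int, ...]:
    # First implementation: usable whether or not the device is shut down.
    # Second implementation: needs the BMC powered, i.e. downtime only.
    return (1, 2) if state is PowerState.DOWNTIME else (1,)
```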
First implementation: applicable both when the computing device is shut down and when it is not.
While the computing device is operating normally, that is, before any DC-DC conversion circuit is abnormally powered down, the BMC periodically collects the voltage data of the DC-DC conversion circuits at the preset time period and refreshes it into the RAM on a rolling basis.
When the BMC learns that a DC-DC conversion circuit has been abnormally powered down, it writes the voltage data of that DC-DC conversion circuit held in the RAM into a first nonvolatile memory, such as a first ROM, and reads the fault information from the fault register of the DC-DC conversion circuit and writes it into a second nonvolatile memory, such as a second ROM. At this moment the BMC itself has not lost power, that is, the BMC and the CPLD are still powered. Because the voltage data is stored in the first ROM and the fault information in the second ROM, the BMC may subsequently lose power, and after it is powered on again the voltage data can still be read from the first ROM and the fault information from the second ROM. For example, once the BMC is powered on again, it collects the alarm log in response to an alarm log collection instruction: it reads the voltage data of the DC-DC conversion circuit from the first ROM, reads the fault information of the fault register from the second ROM, and generates the alarm log based on the voltage data and the fault information.
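The first implementation can be modelled in a few lines, assuming the first and second ROMs are represented by two files and the alarm log is one text line per circuit. The point it illustrates is that `collect_alarm_log` needs nothing but the two nonvolatile copies, so it works even after the BMC has been power-cycled. The log format and all names are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict


def on_abnormal_power_down(circuit: str, ram_snapshot: Dict[str, float],
                           fault_code: int, rom1: str, rom2: str) -> None:
    """Runs while the BMC is still powered: persist both kinds of evidence."""
    with open(rom1, "w") as f:  # first nonvolatile memory: voltage data
        json.dump({circuit: ram_snapshot[circuit]}, f)
    with open(rom2, "w") as f:  # second nonvolatile memory: fault information
        json.dump({circuit: fault_code}, f)


def collect_alarm_log(rom1: str, rom2: str) -> str:
    """Runs on a later alarm log collection instruction, possibly after a
    BMC power cycle; only the two nonvolatile copies are needed."""
    with open(rom1) as f:
        voltages = json.load(f)
    with open(rom2) as f:
        faults = json.load(f)
    return "\n".join(
        f"{name}: last voltage {voltages[name]:.3f} V, "
        f"fault code 0x{faults[name]:02X}"
        for name in sorted(faults))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rom1, rom2 = os.path.join(d, "rom1.json"), os.path.join(d, "rom2.json")
        on_abnormal_power_down("VRM400", {"VRM400": 11.7}, 0x10, rom1, rom2)
        print(collect_alarm_log(rom1, rom2))
```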
It should be understood that "powered down" here means that the DC-DC conversion circuit can no longer perform power conversion, that is, it can no longer supply power to its load. The auxiliary power supply of the DC-DC conversion circuit is still present and provides the basic operating voltage for the circuit.
The first implementation is applicable to a server that has been powered down at the fault site without the alarm log being obtained: after the server is brought back from the fault site to a maintenance site, it can be powered up again and the alarm log obtained. It should also be understood that the first implementation is equally applicable when the BMC is still powered at the fault site and the alarm log is obtained there directly.
Because the alarm log contains the voltage data from before the DC-DC conversion circuit lost power and the fault information that caused the fault, operation and maintenance personnel can determine from the alarm log why the DC-DC conversion circuit was abnormally powered down.
Second implementation: applicable when the computing device is down but not shut down.
While the computing device is operating normally, that is, before any DC-DC conversion circuit is abnormally powered down, the BMC periodically collects the voltage data of the DC-DC conversion circuits at the preset time period and refreshes it into the RAM on a rolling basis.
When the computing device goes down, that is, a DC-DC conversion circuit is abnormally powered down, the board cards and the CPU lose power but the CPLD and the BMC remain powered. The BMC writes the voltage data of the DC-DC conversion circuit stored in the RAM into a first nonvolatile memory, such as a first ROM. While the computing device is in the downtime state, the BMC responds to an alarm log collection instruction by collecting the alarm log: it reads the fault information from the fault register of the DC-DC conversion circuit over I2C, reads the voltage data of the DC-DC conversion circuit from the first ROM, and generates the alarm log based on the fault information and the voltage data.
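For comparison, here is a sketch of the second implementation under similar assumptions: the first ROM is a file and the fault-register read is an injected callback. Only the voltage data is persisted at power-down time; the fault code is read live when the collection instruction arrives, which relies on the converter's auxiliary supply keeping its fault register readable. The names and the returned record layout are hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Union


def on_downtime(circuit: str, ram_snapshot: Dict[str, float], rom1: str) -> None:
    """Board cards and CPU are down, BMC and CPLD still powered: only the
    volatile voltage data needs to be saved."""
    with open(rom1, "w") as f:
        json.dump({circuit: ram_snapshot[circuit]}, f)


def collect_alarm_log_live(circuit: str, rom1: str,
                           read_fault: Callable[[str], int]
                           ) -> Dict[str, Union[str, float, int]]:
    """On a collection instruction during downtime, read the fault register
    directly (e.g. over I2C) instead of from a second nonvolatile memory."""
    with open(rom1) as f:
        voltage = json.load(f)[circuit]
    return {"circuit": circuit, "voltage_v": voltage,
            "fault_code": read_fault(circuit)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rom1 = os.path.join(d, "rom1.json")
        on_downtime("POL500", {"POL500": 0.9}, rom1)
        print(collect_alarm_log_live("POL500", rom1, lambda name: 0x10))
```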
The second implementation differs from the first in that the BMC does not first store the fault information of the fault register in a second nonvolatile memory, such as a second ROM. Instead, the BMC reads the fault information directly from the fault register of the DC-DC conversion circuit and generates the alarm log directly from the fault information and the voltage data, that is, the BMC generates the alarm log before power is removed.
The technical solution provided by the embodiments of the present application applies to any DC-DC conversion circuit in the computing device and does not limit the type of board on which the DC-DC conversion circuit is located, which may be the mainboard or another board of the computing device. The solution implements a board-level power black box for the computing device, so that the cause of an abnormal power-down of a DC-DC conversion circuit can still be determined after the circuit has lost power.
Based on the computing device provided in the foregoing embodiments, the embodiments of the present application further provide an abnormal power-down alarm method for the computing device, which is described in detail below with reference to the accompanying drawings.
Referring to FIG. 4, which is a flowchart of an abnormal power-down alarm method for a computing device according to an embodiment of the present application.
In the abnormal power-down alarm method provided by the embodiments of the present application, the computing device includes a DC-DC conversion circuit and a logic circuit; the DC-DC conversion circuit is electrically connected to the logic circuit; and the logic circuit is configured to update its alarm register when it detects that the DC-DC conversion circuit is abnormally powered down. The method includes the following steps:
S401: when polling the alarm register of the logic circuit reveals that the DC-DC conversion circuit has been abnormally powered down, read the fault information from the fault register of the DC-DC conversion circuit.
The CPLD can receive the power good signals of the DC-DC conversion circuits and determines whether each power supply is normal by whether its power good signal flips. For example, the power good signal is high while the power supply is normal, and a flip to low indicates that the power supply has been abnormally powered down. Correspondingly, the alarm register may be set on an abnormal power-down: it holds 0 while the power supply is normal and is set to 1 when the power supply is abnormally powered down.
When the CPLD detects that a power good signal has flipped, it sets the alarm register in the CPLD, and the BMC, by polling that alarm register, learns that the power supply has been abnormally powered down.
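In practice this CPLD behaviour would be written in an HDL; the Python model below only illustrates the logic. It assumes active-high power good inputs (one per IO pin) and alarm bits that stay set after a high-to-low flip until cleared by some mechanism the text does not describe. The class and method names are hypothetical.

```python
from typing import List


class AlarmRegisterModel:
    """Model of the CPLD: one power good input and one alarm bit per
    DC-DC conversion circuit (bit i corresponds to input i)."""

    def __init__(self, num_rails: int) -> None:
        self.prev: List[int] = [1] * num_rails  # assume all rails start good
        self.alarm = 0                          # 0 = all power supplies normal

    def sample(self, power_good: List[int]) -> None:
        """Evaluate one sample of all power good inputs."""
        if len(power_good) != len(self.prev):
            raise ValueError("one power good level per rail expected")
        for i, (old, new) in enumerate(zip(self.prev, power_good)):
            if old == 1 and new == 0:   # high -> low flip: abnormal power-down
                self.alarm |= 1 << i    # set the bit and keep it set
        self.prev = list(power_good)

    def read(self) -> int:
        """What the BMC sees when it polls the alarm register."""
        return self.alarm


if __name__ == "__main__":
    cpld = AlarmRegisterModel(num_rails=2)
    cpld.sample([1, 1])
    cpld.sample([1, 0])        # rail 1 loses power
    print(bin(cpld.read()))
```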
S402: generate an alarm log based on the alarm information and the fault information. The alarm information includes at least the identifier of the DC-DC conversion circuit that was abnormally powered down, from which the BMC can determine which DC-DC conversion circuit lost power.
The fault causes corresponding to the fault information in the fault register may include, but are not limited to, at least one of over-temperature, over-current and over-voltage. It should be understood that the fault information is stored as codes, and the fault cause indicated by each code can be found in the manual of the DC-DC conversion circuit.
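The text leaves the code table to the converter's manual. If the converter happens to implement PMBus, its STATUS_BYTE (command code 0x78) is one common place to find such codes, and the sketch below decodes it using the bit layout defined by the PMBus specification. Whether a given VRM or POL exposes this register, and which additional STATUS_* registers it provides, must be checked in its datasheet; the function name is hypothetical.

```python
from typing import List

# Assumption: the converter exposes a PMBus STATUS_BYTE (command 0x78).
# Bit positions follow the PMBus specification; a converter that is not
# PMBus-compliant needs the table from its own manual instead.
STATUS_BYTE_BITS = {
    7: "BUSY",
    6: "OFF",
    5: "VOUT_OV_FAULT (output over-voltage)",
    4: "IOUT_OC_FAULT (output over-current)",
    3: "VIN_UV_FAULT (input under-voltage)",
    2: "TEMPERATURE (over-temperature fault or warning)",
    1: "CML (communication, memory or logic fault)",
    0: "NONE_OF_THE_ABOVE",
}


def decode_status_byte(value: int) -> List[str]:
    """Return the causes flagged in a STATUS_BYTE value, highest bit first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("STATUS_BYTE is a single byte")
    return [STATUS_BYTE_BITS[bit] for bit in range(7, -1, -1)
            if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_status_byte(0x50))  # OFF together with over-current
```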
With this abnormal power-down alarm method, the BMC learns of an abnormal power-down of a DC-DC conversion circuit when it polls the alarm register of the CPLD, reads the fault information from the fault register, and generates the alarm log from it. During later maintenance, operation and maintenance personnel can check the alarm log to learn why the DC-DC conversion circuit lost power, as well as the state of the computing device at the operating site and the cause of the fault, which improves maintenance efficiency.
In a possible implementation, the CPLD can learn only from the flip of the power good signal that the DC-DC conversion circuit has been abnormally powered down; it does not know how the voltage of the DC-DC conversion circuit changed, for example, the actual voltage values before the power-down. To capture the voltage data before the abnormal power-down more accurately, the BMC can also acquire and store the voltage data of each DC-DC conversion circuit in real time, so that when a DC-DC conversion circuit is abnormally powered down, the specific voltage values can help analyse the fault cause. The abnormal power-down alarm method provided by the embodiments of the present application therefore further includes: when it is learned that the DC-DC conversion circuit has been abnormally powered down, writing the voltage data of the DC-DC conversion circuit recorded in the volatile memory into the nonvolatile memory.
Writing the recorded voltage data of the DC-DC conversion circuit from the first volatile memory into the nonvolatile memory specifically includes: when the DC-DC conversion circuit is abnormally powered down, writing the voltage data of the DC-DC conversion circuit stored in the first volatile memory into the first nonvolatile memory.
The length of the preset time period is not limited; taking 10 seconds as an example, the BMC100 acquires the voltage data of the DC-DC conversion circuit periodically, once every 10 seconds. The first volatile memory is likewise not limited and may be, for example, a RAM. It should be understood that, to save space, each reading taken every 10 seconds overwrites the previous one, so only the most recent voltage data is retained in the RAM.
Two specific implementations are introduced below, in which the computing device collects the alarm log in a downtime state or in a shutdown state. The two states differ as follows. In the downtime state, the inputs of the first power supply circuit and the second power supply circuit of the server are still connected to power, but the VRM, the board cards, the CPU and so on are powered down. In the shutdown state, the inputs of the first and second power supply circuits are not connected to power, that is, the outputs of the first and second power supply circuits deliver no electrical energy.
First implementation: applicable both when the computing device is shut down and when it is not.
In this implementation, reading the fault information from the fault register of the DC-DC conversion circuit and generating the alarm log based on the fault information and the voltage data of the DC-DC conversion circuit held in the nonvolatile memory specifically includes:
when it is learned that the DC-DC conversion circuit has been abnormally powered down, reading the fault information from the fault register of the DC-DC conversion circuit and writing it into a second nonvolatile memory; and, in response to an alarm log collection instruction, reading the voltage data of the DC-DC conversion circuit from the first nonvolatile memory and the fault information from the second nonvolatile memory, and generating the alarm log based on the voltage data and the fault information.
The specific working principle is described in detail below with reference to the flowcharts.
Referring to FIG. 5, which is a flowchart of another abnormal power-down alarm method for a computing device according to an embodiment of the present application.
S501: the BMC collects the voltage data of each DC-DC conversion circuit and, with the preset time period as the period, refreshes the collected voltage data into the RAM on a rolling basis;
S502: the BMC polls the alarm information in the alarm register of the CPLD and learns that a DC-DC conversion circuit has been abnormally powered down;
S503: the BMC stores the voltage data of that DC-DC conversion circuit held in the RAM into the first ROM;
S504: the BMC reads the fault information from the fault register of the abnormally powered-down DC-DC conversion circuit and stores it into the second ROM;
The embodiments of the present application do not limit the order of S503 and S504; they may be performed simultaneously or one after the other.
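Because S503 and S504 write different memories and do not depend on each other, they can run in either order or concurrently. A minimal Python sketch, assuming each step is wrapped as a callable (the names are hypothetical), using the standard `concurrent.futures.ThreadPoolExecutor`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def handle_power_down(save_voltage: Callable[[], None],
                      save_fault: Callable[[], None],
                      concurrent: bool = True) -> None:
    """Run S503 (voltage data -> first ROM) and S504 (fault info -> second ROM)."""
    if concurrent:
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(save_voltage), pool.submit(save_fault)]
            for fut in futures:
                fut.result()  # re-raise any exception from either step
    else:
        save_voltage()
        save_fault()
```

Calling `fut.result()` ensures that a failure in either write is not silently lost before the BMC moves on to S505.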
S505: the BMC receives an alarm log collection instruction;
S506: the BMC reads the voltage data of the DC-DC conversion circuit from the first ROM and the fault information from the second ROM, and generates an alarm log based on the alarm information, the voltage data and the fault information.
The first implementation is applicable to a server that has been powered down at the fault site without the alarm log being obtained: after the server is brought back from the fault site to a maintenance site, it can be powered up again and the alarm log obtained. It should also be understood that the first implementation is equally applicable when the BMC is still powered at the fault site and the alarm log is obtained there directly.
Because the alarm log contains the voltage data from before the DC-DC conversion circuit lost power and the fault information that caused the fault, operation and maintenance personnel can determine from the alarm log why the DC-DC conversion circuit was abnormally powered down.
Second implementation: applicable when the computing device is down but not shut down.
In this implementation, reading the fault information from the fault register of the DC-DC conversion circuit and generating the alarm log based on the fault information and the voltage data of the DC-DC conversion circuit held in the nonvolatile memory specifically includes:
while the computing device is in the downtime state, in response to an alarm log collection instruction, reading the voltage data of the DC-DC conversion circuit from the nonvolatile memory, reading the fault information from the fault register of the DC-DC conversion circuit, and generating the alarm log based on the fault information and the voltage data.
The specific working principle is described in detail below with reference to the flowcharts.
Referring to FIG. 6, which is a flowchart of another abnormal power-down alarm method for a computing device according to an embodiment of the present application.
S601: while the computing device is operating normally, the ADC interface of the BMC acquires the voltage data of each DC-DC conversion circuit, and the BMC, with the preset time period as the period, refreshes the collected voltage data into the RAM on a rolling basis;
S602: the BMC polls the alarm information in the alarm register of the CPLD and learns that a DC-DC conversion circuit has been abnormally powered down;
S603: the BMC stores the voltage data of that DC-DC conversion circuit held in the RAM into the first ROM;
S604: the BMC receives an alarm log collection instruction while the computing device is in the downtime state;
S605: the BMC reads the voltage data of the DC-DC conversion circuit from the first ROM and reads the fault information from the fault register of the abnormally powered-down DC-DC conversion circuit;
S606: the BMC generates an alarm log based on the alarm information, the voltage data and the fault information.
The second implementation differs from the first in that the BMC does not first store the fault information of the fault register in a second nonvolatile memory, such as a second ROM. Instead, the BMC reads the fault information directly from the fault register of the DC-DC conversion circuit and generates the alarm log directly from the fault information and the voltage data, that is, the BMC generates the alarm log before power is removed.
The abnormal power-down alarm method applies to any DC-DC conversion circuit in the computing device and does not limit the type of board on which the DC-DC conversion circuit is located, which may be the mainboard or another board of the computing device. The method implements a board-level power black box for the computing device, so that the cause of an abnormal power-down of a DC-DC conversion circuit can still be determined after the circuit has lost power.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the present application in any way. While the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application. Any person skilled in the art may make many possible variations and modifications to the technical solution of the present application, or modify equivalent embodiments, using the methods and technical contents disclosed above, without departing from the scope of the technical solution of the present application. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present application, which do not depart from the content of the technical solution of the present application, still fall within the scope of protection of the technical solution of the present application.

Claims (10)

1. A computing device, comprising: a DC-DC conversion circuit, a baseboard management controller (BMC), and a logic circuit;
wherein the BMC is electrically connected to the logic circuit and to the DC-DC conversion circuit;
the logic circuit is configured to update alarm information in an alarm register of the logic circuit when it detects that the DC-DC conversion circuit is abnormally powered down; and
the BMC is configured to, upon determining from the alarm information that the DC-DC conversion circuit has been abnormally powered down, read fault information from a fault register in the DC-DC conversion circuit and generate an alarm log based on the fault information and the alarm information.
2. The computing device of claim 1, wherein the BMC is further configured to write acquired voltage data of the DC-DC conversion circuit into a first memory while the DC-DC conversion circuit has not been abnormally powered down;
the BMC is further configured to store the voltage data from the first memory into a second memory when the DC-DC conversion circuit is abnormally powered down, the first memory being a volatile memory and the second memory being a nonvolatile memory; and
the BMC is configured to generate the alarm log based on the fault information, the alarm information and the voltage data in the second memory.
3. The computing device of claim 2, wherein the BMC is further configured to: when the DC-DC conversion circuit is abnormally powered down, read the fault information from the fault register of the DC-DC conversion circuit and write the fault information into a third memory, the third memory being a nonvolatile memory; in response to an alarm log collection instruction, read the voltage data from the second memory and the fault information from the third memory; and generate the alarm log based on the alarm information, the voltage data and the fault information.
4. The computing device of claim 2, wherein the BMC is further configured to: when the DC-DC conversion circuit is abnormally powered down, in response to an alarm log collection instruction, read the fault information from the fault register of the DC-DC conversion circuit and the voltage data from the second memory; and generate the alarm log based on the alarm information, the voltage data and the fault information.
5. The computing device of any of claims 1-4, wherein the logic circuit is configured to determine, based on a power good signal of the DC-DC conversion circuit, whether the DC-DC conversion circuit is abnormally powered down, and to determine that the DC-DC conversion circuit is abnormally powered down when the power good signal of the DC-DC conversion circuit flips.
6. The computing device of any of claims 1-5, wherein the BMC is further configured to determine a fault cause based on the fault information and to display the fault cause, the fault cause including over-temperature, over-current and/or over-voltage.
7. The computing device of any of claims 2-6, wherein the BMC is further configured to acquire voltage data of the DC-DC conversion circuit and to update the voltage data in the first memory each time with the voltage data acquired over a preset duration.
8. The computing device of any of claims 1-7, wherein the BMC, the DC-DC conversion circuit and the logic circuit each have an I2C interface, and the BMC communicates with the DC-DC conversion circuit and the logic circuit through the I2C interface.
9. The computing device of any of claims 1-8, wherein the DC-DC conversion circuit is a voltage regulation module (VRM) or a point-of-load (POL) power supply.
10. The computing device of any of claims 2-9, further comprising an analog-to-digital converter electrically connected to the BMC and to the DC-DC conversion circuit;
wherein the analog-to-digital converter is configured to collect voltage data of the DC-DC conversion circuit and send the voltage data to the BMC.
CN202311282061.7A 2023-09-28 2023-09-28 Computing equipment Pending CN117435018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311282061.7A CN117435018A (en) 2023-09-28 2023-09-28 Computing equipment

Publications (1)

Publication Number Publication Date
CN117435018A true CN117435018A (en) 2024-01-23

Family

ID=89556083

Country Status (1)

Country Link
CN (1) CN117435018A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination