CN114676019A - Method, device, equipment and storage medium for monitoring state of central processing unit - Google Patents


Publication number
CN114676019A
Authority
CN
China
Prior art keywords
processing unit
central processing
state information
temperature
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210302352.7A
Other languages
Chinese (zh)
Inventor
梅飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210302352.7A priority Critical patent/CN114676019A/en
Publication of CN114676019A publication Critical patent/CN114676019A/en
Priority to PCT/CN2023/083130 priority patent/WO2023179684A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations

Abstract

The application discloses a method, a device, equipment and a storage medium for monitoring the state of a central processing unit, which comprise the following steps: reading the current state information of the central processing unit recorded in a preset register inside the central processing unit through a special single-wire bus which is in communication connection with the central processing unit in advance, and locally storing the current state information; judging whether the current state information is consistent with the last state information of the central processing unit stored locally; and if the current state information is inconsistent with the last state information, performing corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule. By the method and the system, the accurate current state information of the central processing unit can be acquired and reported in time to inform an administrator, the good use performance and the prolonged service life of the central processing unit are favorably maintained, the problems of server downtime and the like caused by high temperature of the central processing unit are avoided as much as possible, and the condition of mistakenly relieving alarms can be effectively prevented.

Description

Method, device, equipment and storage medium for monitoring state of central processing unit
Technical Field
The invention relates to the technical field of server management software, in particular to a method, a device, equipment and a storage medium for monitoring the state of a central processing unit.
Background
A Central Processing Unit (CPU) is the core component for computation and control in a server system, and its state needs to be monitored during use to guard against a CPU Prochot or a CPU Error. The CPU Prochot signal is triggered when the CPU temperature reaches a preset high-temperature threshold.
Currently, on the EGS (Eagle Stream) platform, the CPU Prochot pin is designed as a unidirectional input pin, so the CPLD (Complex Programmable Logic Device) can only obtain the ambient temperature near the CPU as detected by a VR (Voltage Regulator) chip, and then determines from that ambient temperature whether to trigger the CPU Prochot signal. Because the ambient temperature near the CPU detected by the VR chip lags behind the CPU core temperature, the BMC (Baseboard Management Controller) cannot obtain the CPU Prochot state through the CPLD in time and therefore cannot trigger an alarm in time.
In summary, how to monitor the CPU state accurately and raise abnormal state alarms accurately, so that operation and maintenance personnel can adjust heat dissipation strategies or troubleshoot faults in time, is a problem to be solved in the art.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device, and a storage medium for monitoring a state of a central processing unit, which can accurately monitor the state of the central processing unit, accurately alarm an abnormal state, and facilitate operation and maintenance personnel to adjust a heat dissipation strategy or troubleshoot faults in time. The specific scheme is as follows:
In a first aspect, the present application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and includes:
reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current state information;
judging whether the current state information is consistent with the last state information of the central processing unit stored locally;
and if the current state information is inconsistent with the last state information, performing a corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule.
Optionally, the reading, by using a dedicated single-wire bus that establishes a communication connection with a central processing unit in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current state information includes:
reading, through a Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current temperature state information.
Optionally, the determining whether the current state information is consistent with the last state information of the central processing unit stored locally includes:
if the current temperature state information is consistent with the last temperature state information of the central processing unit stored locally, neither performing a corresponding abnormal state alarm nor removing the abnormal state alarm, and returning to the step of reading, through the Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and locally storing the current temperature state information.
Optionally, if the current state information is inconsistent with the last state information of the central processing unit stored locally, performing a corresponding abnormal state alarm according to a preset abnormal state alarm rule, including:
if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature abnormal state information, triggering a temperature abnormal state reporting instruction, recording an alarm log for the abnormal temperature state through the baseboard management controller, and performing a corresponding temperature abnormal state alarm.
Optionally, if the current state information is inconsistent with the last state information of the central processing unit stored locally, performing a corresponding abnormal state alarm removal according to a preset abnormal state alarm rule, including:
detecting and recording the system time of the server when the central processing unit is in an abnormal temperature state each time;
if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature normal state information, calculating the time difference between the current system time of the server and the system time of the server recorded when the central processing unit was last in the abnormal temperature state;
and selecting whether to remove the abnormal state alarm or not according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor arranged in the voltage regulator.
Optionally, the selecting whether to cancel the abnormal state alarm according to the time difference and the cpu temperature state information detected by the temperature sensor built in the voltage regulator includes:
when the time difference is smaller than the preset time difference, not removing the abnormal state alarm, and returning to the step of reading, through the Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and locally storing the current temperature state information.
Optionally, the selecting whether to cancel the abnormal state alarm according to the time difference and the cpu temperature state information detected by the temperature sensor built in the voltage regulator includes:
if the time difference is greater than the preset time difference and the temperature state information of the central processing unit detected by the temperature sensor built into the voltage regulator is normal temperature state information, recording a log for the normal temperature state through the baseboard management controller and removing the abnormal state alarm.
In a second aspect, the present application discloses a central processing unit status monitoring device, including:
the information reading module is used for reading the current state information of the central processing unit recorded in a preset register in the central processing unit through a special single-wire bus which is in communication connection with the central processing unit in advance, and locally storing the current state information;
the information judging module is used for judging whether the current state information is consistent with the last state information of the central processing unit stored locally;
and the state monitoring module is used for carrying out corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule if the current state information is inconsistent with the last state information.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the central processor state monitoring method as disclosed in the foregoing.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the central processor state monitoring method as disclosed in the foregoing.
Thus, the present application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and includes: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current state information; judging whether the current state information is consistent with the last state information of the central processing unit stored locally; and if the current state information is inconsistent with the last state information, performing a corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule. Because the current state information of the central processing unit is read directly over the dedicated single-wire bus, accurate current state information can be obtained, which helps maintain good performance and extend the service life of the central processing unit, avoids as far as possible problems such as server downtime caused by high temperature of the central processing unit, and brings considerable economic benefits. Performing or removing the abnormal state alarm according to the preset abnormal state alarm rule also effectively prevents alarms from being removed by mistake.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a method for monitoring the status of a central processing unit according to the present disclosure;
FIG. 2 is a flow chart of a particular CPU status monitoring method disclosed herein;
FIG. 3 is a flow chart of a specific CPU status monitoring method disclosed herein;
FIG. 4 is a schematic structural diagram of a central processing unit state monitoring apparatus disclosed in the present application;
FIG. 5 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Currently, on the EGS platform, the CPU Prochot pin is designed as a unidirectional input pin, so the CPLD can only obtain the ambient temperature near the CPU as detected by the VR chip, and then determines from that ambient temperature whether to trigger the CPU Prochot signal. Because the ambient temperature near the CPU detected by the VR chip lags behind the CPU core temperature, the BMC cannot obtain the CPU Prochot state through the CPLD in time and therefore cannot trigger an alarm in time.
To this end, the present application provides a CPU state monitoring scheme that monitors the CPU state accurately and raises abnormal state alarms accurately, so that operation and maintenance personnel can adjust the heat dissipation strategy or troubleshoot faults in time.
Referring to fig. 1, an embodiment of the present invention discloses a method for monitoring a state of a central processing unit, which is applied to a baseboard management controller, and specifically includes:
Step S11: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current state information.
In this embodiment, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit is read through a Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, and is stored locally. Specifically, the BMC periodically reads the value of bit0 in the CPU Package Thermal Status register through the PECI and stores this temperature state information locally. Bit0 of the register represents the Prochot state of the CPU: 1 indicates that the CPU is in the Prochot state, and 0 indicates that it is in the normal state.
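The bit0 extraction described above can be sketched as follows. This is a minimal illustration, not the BMC firmware itself: `read_register` is a hypothetical callable standing in for whatever PECI read the platform provides, and the dictionary `store` is an assumed stand-in for local storage.

```python
PROCHOT_BIT = 0  # bit0 of the CPU Package Thermal Status value, per the description


def prochot_state(register_value: int) -> int:
    """Return 1 if bit0 is set (Prochot state), otherwise 0 (normal state)."""
    return (register_value >> PROCHOT_BIT) & 1


def poll_once(read_register, store: dict) -> int:
    """Read the register once, extract bit0, and save it locally.

    `read_register` is a hypothetical zero-argument callable that returns the
    raw register value; a real BMC would issue a PECI read here.
    """
    state = prochot_state(read_register())
    store["current"] = state
    return state
```

Only bit0 is inspected; any other bits in the register value are ignored by this sketch.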
Step S12: and judging whether the current state information is consistent with the last state information of the central processing unit stored locally.
In this embodiment, it is judged whether the current state information is consistent with the last state information of the central processing unit stored locally. If the current temperature state information is consistent with the last temperature state information stored locally, no abnormal state alarm is performed and no abnormal state alarm is removed, and the process returns to the step of reading, through the PECI, the current temperature state information recorded in the preset register inside the central processing unit and storing it locally. That is, the temperature state information just read is compared with the last temperature state information stored locally. For example, the value of bit0 in the CPU Package Thermal Status register is read, and the last temperature state information is fetched from local storage so that the bit0 value read this time can be compared with the bit0 value read last time. If the current bit0 value is 0 and the last bit0 value was 0, the comparison result is consistent: the CPU temperature was in the normal state on both reads, and the BMC does not need to report. If the current bit0 value is 1 and the last bit0 value was 1, the comparison result is also consistent: the CPU temperature was in the abnormal state on both reads and the CPU is still in the temperature alarm state, so the state has not changed and the BMC does not need to report again.
Step S13: and if the current state information is inconsistent with the last state information, performing corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule.
In this embodiment, in one implementation, if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally, and the current temperature state information is temperature abnormal state information, a temperature abnormal state reporting instruction is triggered, an alarm log for the abnormal temperature state is recorded by the baseboard management controller, and a corresponding temperature abnormal state alarm is performed. For example, when the value of bit0 in the CPU Package Thermal Status register is currently read as 1 and the last stored value of bit0 is 0, the CPU temperature state detected last time was normal and the CPU temperature state detected now is abnormal. The two readings are inconsistent and the current temperature state is abnormal, which indicates that, after a complete CPU Prochot trigger/release cycle, the temperature abnormal state reporting instruction is triggered; the BMC then needs to record an alarm log and perform the corresponding abnormal state alarm.
In another implementation, if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally, and the current temperature state information is temperature normal state information, whether to remove the abnormal state alarm is determined according to the abnormal state alarm rule. For example, when the value of bit0 in the CPU Package Thermal Status register is currently read as 0 and the last stored value of bit0 is 1, the CPU temperature state detected last time was abnormal and the CPU temperature state detected now is normal. The two readings are inconsistent and the current temperature state is normal. Because bit0 of the CPU Package Thermal Status register may be oscillating at this point, the abnormal state alarm cannot be removed immediately; whether to remove it must be further determined based on the abnormal state alarm rule.
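The four combinations of last and current bit0 values discussed above reduce to three outcomes. The sketch below makes that mapping explicit; the `Action` enum and the `classify` function are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "no report needed"
    RAISE_ALARM = "record alarm log and raise alarm"
    EVALUATE_REMOVAL = "apply alarm rule before removing alarm"


def classify(last_bit0: int, current_bit0: int) -> Action:
    """Map a (last, current) pair of bit0 readings to the action described."""
    if current_bit0 == last_bit0:
        # 0 -> 0 (still normal) or 1 -> 1 (still in Prochot): nothing to report.
        return Action.NONE
    if current_bit0 == 1:
        # 0 -> 1: the CPU has entered the Prochot state.
        return Action.RAISE_ALARM
    # 1 -> 0: bit0 may be oscillating, so removal needs the further checks.
    return Action.EVALUATE_REMOVAL
```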
Further, the present application may also monitor the CPU Error state, which may specifically include, but is not limited to: IERR (internal error), Processor Disabled (processor damaged or disabled), UCE (uncorrectable machine check exception), CE (correctable error), and the like.
Thus, the present application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and includes: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current state information; judging whether the current state information is consistent with the last state information of the central processing unit stored locally; and if the current state information is inconsistent with the last state information, performing a corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule. Because the current state information of the central processing unit is read directly over the dedicated single-wire bus, accurate current state information can be obtained, which helps maintain good performance and extend the service life of the central processing unit, avoids as far as possible problems such as server downtime caused by high temperature of the central processing unit, and brings considerable economic benefits. Performing or removing the abnormal state alarm according to the preset abnormal state alarm rule also effectively prevents alarms from being removed by mistake.
Referring to fig. 2 and fig. 3, the embodiment of the present invention discloses a specific method for monitoring the state of a central processing unit, and compared with the previous embodiment, the present embodiment further describes and optimizes the technical solution. Specifically, the method comprises the following steps:
Step S21: reading, through a Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit, and locally storing the current temperature state information.
In this embodiment, the PECI is used to read the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and this information is stored locally. In other words, the temperature state is read directly from the register inside the central processing unit over the PECI, rather than being inferred by the CPLD from the ambient temperature near the central processing unit detected by the VR chip, so the BMC can obtain the Prochot state of the central processing unit through the PECI promptly and accurately.
Step S22: and judging whether the current temperature state information is consistent with the last temperature state information of the central processing unit stored locally.
Step S23: detecting and recording the system time of the server each time the central processing unit is in an abnormal temperature state; and if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature normal state information, calculating the time difference between the current system time of the server and the system time of the server recorded when the central processing unit was last in the abnormal temperature state.
In this embodiment, the system time of the server is detected and recorded each time the central processing unit is in the abnormal temperature state. For example, whenever the value of bit0 of the CPU Package Thermal Status register is detected to be 1, the system time of the server is recorded and stored. Suppose the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally, for example not_value is not equal to last_value, and the value of bit0 of the CPU Package Thermal Status register is detected to be 0, that is, the current temperature state information is temperature normal state information. At this point the CPU core temperature may have only just reached the Prochot threshold, in which case bit0 oscillates, that is, its value jumps repeatedly between 0 and 1. To prevent the abnormal state alarm from being removed by mistake, the time difference between the current system time of the server and the system time recorded when the central processing unit was last in the abnormal temperature state is calculated, and whether to remove the abnormal state alarm is then determined.
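The bookkeeping in step S23 can be sketched as a small tracker that remembers the most recent time at which bit0 was 1. The class name `AbnormalTimeTracker` is hypothetical, and timestamps are passed in as plain seconds rather than read from a real system clock so that the example stays deterministic.

```python
from typing import Optional


class AbnormalTimeTracker:
    """Remember the server time of the most recent abnormal (bit0 == 1) reading."""

    def __init__(self) -> None:
        self.time_last: Optional[float] = None  # no abnormal reading seen yet

    def observe(self, bit0: int, time_now: float) -> None:
        """Record time_now whenever the reading is abnormal."""
        if bit0 == 1:
            self.time_last = time_now

    def time_difference(self, time_now: float) -> Optional[float]:
        """Return time_now - time_last, or None if no abnormal reading was recorded."""
        if self.time_last is None:
            return None
        return time_now - self.time_last
```

Returning `None` before any abnormal reading is an assumption of this sketch; the description only covers the case where an abnormal state has already been recorded.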
Step S24: and selecting whether to remove the abnormal state alarm or not according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor arranged in the voltage regulator.
In this embodiment, the time difference between the current system time of the server and the most recently recorded time at which bit0 of the CPU Package Thermal Status register was 1 is compared with a preset time difference. In one implementation, when the time difference is smaller than the preset time difference, the abnormal state alarm is not removed, and the process returns to the step of reading, through the PECI, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit and storing it locally. For example, with a preset time difference of 20 s and a measured time difference of 13 s, time_now - time_last < 20 s. This indicates that bit0 of the CPU Package Thermal Status register is still oscillating, so the abnormal state alarm cannot be removed, and the process returns to the step of reading the current temperature state information through the PECI and storing it locally.
In another implementation, if the time difference is greater than the preset time difference and the temperature state information of the central processing unit detected by the temperature sensor built into the voltage regulator is normal temperature state information, the baseboard management controller records a log for the normal temperature state and removes the abnormal state alarm. For example, with a preset time difference of 20 s and a measured time difference of 26 s, time_now - time_last > 20 s. An auxiliary judgment is then made according to the ambient temperature state near the central processing unit detected by the temperature sensor built into the VR chip. If the VR chip detects that the ambient temperature near the central processing unit is also in the normal temperature state, the BMC records a log for the normal temperature state and removes the abnormal state alarm; if the VR chip detects that the ambient temperature near the central processing unit is in the abnormal temperature state, the abnormal state alarm is not removed.
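The removal rule in step S24 combines the debounce interval with the VR sensor reading. The sketch below uses the 20 s value from the example above; `should_remove_alarm` and `vr_temperature_normal` are hypothetical names. The description does not say what happens when the time difference equals the preset exactly, so treating that case as "do not remove" is an assumption of this sketch.

```python
PRESET_TIME_DIFFERENCE_S = 20.0  # example value used in the description


def should_remove_alarm(
    time_now: float,
    time_last: float,
    vr_temperature_normal: bool,
    preset: float = PRESET_TIME_DIFFERENCE_S,
) -> bool:
    """Decide whether the abnormal state alarm may be removed.

    time_last is the time of the most recent bit0 == 1 reading.
    vr_temperature_normal is True when the sensor built into the voltage
    regulator also reports a normal temperature state.
    """
    time_difference = time_now - time_last
    if time_difference <= preset:
        # bit0 may still be oscillating; keep the alarm and poll again.
        # (The equality case is an assumption, see the lead-in.)
        return False
    # Past the debounce interval: the VR sensor provides the auxiliary check.
    return vr_temperature_normal
```

With the figures from the text, a difference of 13 s keeps the alarm, while a difference of 26 s removes it only if the VR sensor also reports a normal temperature.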
Therefore, the embodiment of the present application reads the bit0 value of the CPU Package Thermal Status register through the PECI and can thus monitor the internal Prochot state of the CPU accurately and in real time. This solves the problem that the BMC on the EGS platform cannot monitor high-temperature alarms for the server CPU core temperature, and it is faster than the original approach of reading the Prochot pin signal passed through by the CPLD. Abnormal state alarms can be reported to the administrator more promptly, and can be removed promptly and accurately, which helps operation and maintenance personnel adjust the heat dissipation strategy or troubleshoot faults in time, helps maintain good performance and extend the service life of the CPU, avoids as far as possible problems such as server downtime caused by high CPU temperature, and brings considerable economic benefits.
Referring to fig. 4, an embodiment of the present invention discloses a central processing unit status monitoring apparatus, including:
the information reading module 11 is configured to read current state information of the central processing unit recorded in a preset register inside the central processing unit through a dedicated single-wire bus that is in communication connection with the central processing unit in advance, and locally store the current state information;
the information judging module 12 is configured to judge whether the current state information is consistent with last state information of the central processing unit stored locally;
and the state monitoring module 13 is configured to perform corresponding abnormal state alarm or remove the abnormal state alarm according to a preset abnormal state alarm rule if the current state information is inconsistent with the previous state information.
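One way to picture how the three modules cooperate is the sketch below. It is not the patented implementation: the class name, the callbacks, and the choice to represent the state as a boolean (True meaning abnormal) are assumptions made for illustration, the initial stored state is assumed to be normal, and the preset alarm rule for clearing is reduced to a single callback.

```python
from typing import Callable


class CpuStateMonitor:
    """Hypothetical sketch of modules 11 (read), 12 (judge) and 13 (monitor)."""

    def __init__(self, read_state: Callable[[], bool],
                 raise_alarm: Callable[[], None],
                 clear_alarm: Callable[[], None]) -> None:
        self.read_state = read_state    # stands in for the single-wire bus read
        self.raise_alarm = raise_alarm  # issues the abnormal state alarm
        self.clear_alarm = clear_alarm  # stands in for the preset clearing rule
        self.last_state = False         # assumption: CPU starts in the normal state

    def poll_once(self) -> str:
        # Information reading module 11: read the current state and store it locally.
        current = self.read_state()
        previous, self.last_state = self.last_state, current

        # Information judging module 12: compare with the last stored state.
        if current == previous:
            return "unchanged"

        # State monitoring module 13: the state changed, so alarm or clear.
        if current:
            self.raise_alarm()
            return "alarm"
        self.clear_alarm()
        return "clear"
```

Calling `poll_once()` repeatedly with the readings normal, abnormal, abnormal, normal would produce "unchanged", "alarm", "unchanged", "clear" in that order, so an alarm action is taken only when the state changes.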
Therefore, the application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and comprises the following steps: reading the current state information of the central processing unit recorded in a preset register inside the central processing unit through a dedicated single-wire bus which has established a communication connection with the central processing unit in advance, and locally storing the current state information; judging whether the current state information is consistent with the last state information of the central processing unit stored locally; and if the current state information is inconsistent with the last state information, issuing the corresponding abnormal state alarm or clearing the abnormal state alarm according to a preset abnormal state alarm rule. Because the current state information is acquired directly from the central processing unit through the dedicated single-wire bus, accurate current state information is obtained, which helps maintain good performance of the central processing unit and prolong its service life, avoids as far as possible problems such as server downtime caused by high temperature of the central processing unit, and yields considerable economic benefit. Issuing or clearing the abnormal state alarm according to the preset abnormal state alarm rule also effectively prevents an alarm from being cleared by mistake.
Further, an embodiment of the present application discloses an electronic device. Fig. 5 is a block diagram of the electronic device 20 according to an exemplary embodiment, and the content of the figure should not be construed as limiting the scope of the application.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps in the central processing unit status monitoring method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to acquire external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it may include an operating system 221, a computer program 222, and so on; and the storage may be transient or permanent.
The operating system 221 is used for managing and controlling each hardware device on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, or the like. The computer program 222 may further include a computer program that can be used to perform other specific tasks in addition to the computer program that can be used to perform the central processor state monitoring method performed by the electronic device 20 disclosed in any of the foregoing embodiments.
Further, the present application also discloses a computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the central processor state monitoring method disclosed above. For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not described herein again.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above method, apparatus, device and storage medium for monitoring the state of a central processing unit according to the present invention are described in detail, and a specific example is applied in the description to explain the principle and implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A central processing unit state monitoring method, applied to a baseboard management controller, comprising the following steps:
reading the current state information of the central processing unit recorded in a preset register inside the central processing unit through a special single-wire bus which is in communication connection with the central processing unit in advance, and locally storing the current state information;
judging whether the current state information is consistent with the last state information of the central processing unit stored locally;
and if the current state information is inconsistent with the last state information, performing corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule.
2. The method for monitoring the state of a central processing unit according to claim 1, wherein the reading the current state information of the central processing unit recorded in a preset register inside the central processing unit and locally saving the current state information via a dedicated single-wire bus that establishes a communication connection with the central processing unit in advance comprises:
reading the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit through a Platform Environment Control Interface (PECI) which has established a communication connection with the central processing unit in advance, and locally storing the current temperature state information.
3. The method according to claim 2, wherein said determining whether the current state information is consistent with the last state information of the central processing unit stored locally comprises:
if the current temperature state information is consistent with the last temperature state information of the central processing unit stored locally, neither issuing the corresponding abnormal state alarm nor clearing the abnormal state alarm, and returning to the step of reading, through the Platform Environment Control Interface which has established a communication connection with the central processing unit in advance, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and locally storing the current temperature state information.
4. The method for monitoring the state of the central processing unit according to claim 2, wherein if the current state information is inconsistent with the last state information of the central processing unit stored locally, performing a corresponding abnormal state alarm according to a preset abnormal state alarm rule, comprising:
if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature abnormal state information, triggering a temperature abnormal state reporting instruction, recording an alarm log generated by the temperature abnormal state through the baseboard management controller, and issuing the corresponding temperature abnormal state alarm.
5. The method for monitoring the state of the central processing unit according to claim 2, wherein if the current state information is inconsistent with the last state information of the central processing unit stored locally, performing a corresponding abnormal state alarm removal according to a preset abnormal state alarm rule, includes:
detecting and recording the system time of the server when the central processing unit is in an abnormal temperature state each time;
if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature normal state information, calculating the time difference between the current system time of the server and the system time of the server recorded when the central processing unit was last in an abnormal temperature state;
and selecting whether to remove the abnormal state alarm or not according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor arranged in the voltage regulator.
6. The method for monitoring the state of the central processing unit according to claim 5, wherein the selecting whether to cancel the abnormal state alarm according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor built in the voltage regulator comprises:
when the time difference is smaller than the preset time difference, not clearing the abnormal state alarm, and returning to the step of reading, through the Platform Environment Control Interface which has established a communication connection with the central processing unit in advance, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and locally storing the current temperature state information.
7. The method for monitoring the state of the central processing unit according to claim 5, wherein the selecting whether to cancel the abnormal state alarm according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor built in the voltage regulator comprises:
if the time difference is larger than the preset time difference and the temperature state information of the central processing unit detected by the temperature sensor built into the voltage regulator is normal temperature state information, recording a log generated in the normal temperature state through the baseboard management controller and clearing the abnormal state alarm.
8. A central processing unit state monitoring device, comprising:
the information reading module is used for reading the current state information of the central processing unit recorded in a preset register in the central processing unit through a special single-wire bus which is in communication connection with the central processing unit in advance, and locally storing the current state information;
the information judging module is used for judging whether the current state information is consistent with the last state information of the central processing unit stored locally;
and the state monitoring module is used for carrying out corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule if the current state information is inconsistent with the last state information.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to carry out the steps of the central processing unit state monitoring method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, realizes the steps of the central processing unit state monitoring method according to any one of claims 1 to 7.
CN202210302352.7A 2022-03-25 2022-03-25 Method, device, equipment and storage medium for monitoring state of central processing unit Pending CN114676019A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210302352.7A CN114676019A (en) 2022-03-25 2022-03-25 Method, device, equipment and storage medium for monitoring state of central processing unit
PCT/CN2023/083130 WO2023179684A1 (en) 2022-03-25 2023-03-22 Method and apparatus for monitoring state of central processing unit, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210302352.7A CN114676019A (en) 2022-03-25 2022-03-25 Method, device, equipment and storage medium for monitoring state of central processing unit

Publications (1)

Publication Number Publication Date
CN114676019A true CN114676019A (en) 2022-06-28

Family

ID=82076556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210302352.7A Pending CN114676019A (en) 2022-03-25 2022-03-25 Method, device, equipment and storage medium for monitoring state of central processing unit

Country Status (2)

Country Link
CN (1) CN114676019A (en)
WO (1) WO2023179684A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179684A1 (en) * 2022-03-25 2023-09-28 苏州浪潮智能科技有限公司 Method and apparatus for monitoring state of central processing unit, and device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60195649A (en) * 1984-03-16 1985-10-04 Nec Corp Error reporting system of microprogram-controlled type data processor
CN108089964A (en) * 2017-12-07 2018-05-29 郑州云海信息技术有限公司 A kind of device and method by BMC monitoring server CPLD states
CN108268360A (en) * 2018-01-19 2018-07-10 郑州云海信息技术有限公司 A kind of BMC obtains method, system, device and the storage medium of memory temperature
CN109656767A (en) * 2018-12-21 2019-04-19 广东浪潮大数据研究有限公司 A kind of acquisition methods, system and the associated component of CPLD status information
CN111767184A (en) * 2020-09-01 2020-10-13 苏州浪潮智能科技有限公司 Fault diagnosis method and device, electronic equipment and storage medium
CN114676019A (en) * 2022-03-25 2022-06-28 苏州浪潮智能科技有限公司 Method, device, equipment and storage medium for monitoring state of central processing unit

Also Published As

Publication number Publication date
WO2023179684A1 (en) 2023-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination