CN114676019A - Method, device, equipment and storage medium for monitoring state of central processing unit - Google Patents
- Publication number
- CN114676019A (application CN202210302352.7A)
- Authority
- CN
- China
- Prior art keywords
- processing unit
- central processing
- state information
- temperature
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3024—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
Abstract
The application discloses a method, an apparatus, a device and a storage medium for monitoring the state of a central processing unit. The method comprises: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current state information locally; determining whether the current state information is consistent with the locally stored last state information of the central processing unit; and, if the current state information is inconsistent with the last state information, raising a corresponding abnormal state alarm or releasing the abnormal state alarm according to a preset abnormal state alarm rule. In this way, accurate current state information of the central processing unit can be acquired and reported in time to notify an administrator, which helps maintain the performance and prolong the service life of the central processing unit, avoids as far as possible problems such as server downtime caused by high central processing unit temperature, and effectively prevents alarms from being released by mistake.
Description
Technical Field
The invention relates to the technical field of server management software, in particular to a method, a device, equipment and a storage medium for monitoring the state of a central processing unit.
Background
A central processing unit (CPU) is the core component for operation and control in a server system, and its state needs to be monitored during use to prevent the occurrence of a CPU Prochot or CPU Error condition. The CPU Prochot signal is triggered when the CPU temperature reaches a preset high-temperature threshold.
Currently, on the EGS (Eagle Stream) platform, the CPU Prochot pin is designed as a unidirectional input pin, so the CPLD (Complex Programmable Logic Device) can only obtain the ambient temperature near the CPU as detected by the VR (Voltage Regulator) chip, and then decide whether to trigger the CPU Prochot signal according to that ambient temperature. Because the ambient temperature detected by the VR chip lags behind the CPU core temperature, the BMC (Baseboard Management Controller) cannot obtain the CPU Prochot state through the CPLD in time and therefore cannot trigger an alarm in time.
In summary, how to monitor the CPU state accurately and raise abnormal state alarms accurately, so that operation and maintenance personnel can adjust heat dissipation strategies or troubleshoot faults in time, is a problem to be solved in the art.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device, and a storage medium for monitoring the state of a central processing unit, which can monitor the state of the central processing unit accurately, raise abnormal state alarms accurately, and help operation and maintenance personnel adjust the heat dissipation strategy or troubleshoot faults in time. The specific scheme is as follows:
In a first aspect, the present application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and includes:
reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current state information locally;
determining whether the current state information is consistent with the locally stored last state information of the central processing unit;
and, if the current state information is inconsistent with the last state information, raising a corresponding abnormal state alarm or releasing the abnormal state alarm according to a preset abnormal state alarm rule.
Optionally, reading the current state information of the central processing unit through the dedicated single-wire bus and storing it locally includes:
reading, through a Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and storing the current temperature state information locally.
Optionally, determining whether the current state information is consistent with the locally stored last state information of the central processing unit includes:
if the current temperature state information is consistent with the locally stored last temperature state information of the central processing unit, neither raising a corresponding abnormal state alarm nor releasing the abnormal state alarm, and returning to the step of reading the current temperature state information of the central processing unit through the PECI and storing it locally.
Optionally, if the current state information is inconsistent with the locally stored last state information of the central processing unit, raising a corresponding abnormal state alarm according to the preset abnormal state alarm rule includes:
if the current temperature state information is inconsistent with the locally stored last temperature state information of the central processing unit and the current temperature state information is temperature abnormal state information, triggering a temperature abnormal state reporting instruction, recording, by the baseboard management controller, an alarm log generated for the temperature abnormal state, and raising a corresponding temperature abnormal state alarm.
Optionally, if the current state information is inconsistent with the locally stored last state information of the central processing unit, releasing the abnormal state alarm according to the preset abnormal state alarm rule includes:
detecting and recording the system time of the server each time the central processing unit is in a temperature abnormal state;
if the current temperature state information is inconsistent with the locally stored last temperature state information of the central processing unit and the current temperature state information is temperature normal state information, calculating the time difference between the current system time of the server and the system time of the server when the central processing unit was last in the temperature abnormal state;
and deciding whether to release the abnormal state alarm according to the time difference and the central processing unit temperature state information detected by a temperature sensor built into the voltage regulator.
Optionally, deciding whether to release the abnormal state alarm according to the time difference and the central processing unit temperature state information detected by the temperature sensor built into the voltage regulator includes:
when the time difference is smaller than a preset time difference, not releasing the abnormal state alarm, and returning to the step of reading the current temperature state information of the central processing unit through the PECI and storing it locally.
Optionally, deciding whether to release the abnormal state alarm according to the time difference and the central processing unit temperature state information detected by the temperature sensor built into the voltage regulator includes:
if the time difference is larger than the preset time difference and the central processing unit temperature state information detected by the temperature sensor built into the voltage regulator is temperature normal state information, recording, by the baseboard management controller, a log generated for the temperature normal state and releasing the abnormal state alarm.
In a second aspect, the present application discloses a central processing unit status monitoring device, including:
an information reading module, configured to read, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and to store the current state information locally;
an information judging module, configured to determine whether the current state information is consistent with the locally stored last state information of the central processing unit;
and a state monitoring module, configured to raise a corresponding abnormal state alarm or release the abnormal state alarm according to a preset abnormal state alarm rule if the current state information is inconsistent with the last state information.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the central processor state monitoring method as disclosed in the foregoing.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the central processor state monitoring method as disclosed in the foregoing.
Therefore, the application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and comprises the following steps: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current state information locally; determining whether the current state information is consistent with the locally stored last state information of the central processing unit; and, if the current state information is inconsistent with the last state information, raising a corresponding abnormal state alarm or releasing the abnormal state alarm according to a preset abnormal state alarm rule. Because the current state information is read directly from the central processing unit over the dedicated single-wire bus, accurate current state information can be obtained, which helps maintain the performance and prolong the service life of the central processing unit, avoids as far as possible problems such as server downtime caused by high central processing unit temperature, and brings considerable economic benefits. Raising or releasing the abnormal state alarm according to the preset abnormal state alarm rule then effectively prevents alarms from being released by mistake.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a method for monitoring the state of a central processing unit disclosed in the present application;
FIG. 2 is a flow chart of a specific method for monitoring the state of a central processing unit disclosed in the present application;
FIG. 3 is a flow chart of a specific method for monitoring the state of a central processing unit disclosed in the present application;
FIG. 4 is a schematic structural diagram of an apparatus for monitoring the state of a central processing unit disclosed in the present application;
FIG. 5 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Currently, on the EGS platform, the CPU Prochot pin is designed as a unidirectional input pin, so the CPLD can only obtain the ambient temperature near the CPU as detected by the VR chip, and then decide whether to trigger the CPU Prochot signal according to that ambient temperature. Because the ambient temperature detected by the VR chip lags behind the CPU core temperature, the BMC cannot obtain the CPU Prochot state through the CPLD in time and therefore cannot trigger an alarm in time.
For this reason, the present application provides a CPU state monitoring scheme that monitors the CPU state accurately and raises abnormal state alarms accurately, so that operation and maintenance personnel can adjust the heat dissipation strategy or troubleshoot faults in time.
Referring to FIG. 1, an embodiment of the present invention discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and specifically includes:
Step S11: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current state information locally.
In this embodiment, the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit is read through the Platform Environment Control Interface (PECI), over which a communication connection with the central processing unit has been established in advance, and the current temperature state information is stored locally. Specifically, the BMC periodically reads the value of bit0 of the CPU Package Thermal Status register through PECI and stores this temperature state information locally. Bit0 of this register indicates the Prochot state of the CPU: 1 means the CPU is in the Prochot state, and 0 means it is in the normal state.
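As a minimal sketch of the bit0 extraction described above, the following Python fragment masks a raw register value and returns the Prochot state. The function and constant names are hypothetical, and the raw value is passed in as a plain integer; how a BMC actually issues the PECI read is platform-specific and is not shown here.

```python
# Hypothetical names; the text only states that bit0 carries the Prochot state.
PROCHOT_BIT_MASK = 0x1  # bit0 of the CPU Package Thermal Status register


def prochot_state(register_value: int) -> int:
    """Return 1 if bit0 is set (Prochot state), 0 otherwise (normal state)."""
    return register_value & PROCHOT_BIT_MASK


def read_and_store(register_value: int, local_store: dict) -> int:
    """Extract the state from a raw register value and keep it locally.

    `local_store` stands in for whatever local storage the BMC uses;
    a dict is an assumption made only for this illustration.
    """
    state = prochot_state(register_value)
    local_store["now_value"] = state
    return state
```

The other bits of the register are ignored by the mask, so any unrelated status flags do not affect the result.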
Step S12: determining whether the current state information is consistent with the locally stored last state information of the central processing unit.
In this embodiment, it is determined whether the current state information is consistent with the locally stored last state information of the central processing unit. If the current temperature state information is consistent with the locally stored last temperature state information, no abnormal state alarm is raised or released, and the method returns to the step of reading the current temperature state information of the central processing unit through PECI and storing it locally. In other words, the temperature state information just read is compared with the last temperature state information stored locally. For example, the value of bit0 in the CPU Package Thermal Status register is read, the last temperature state information is fetched from local storage, and the value of bit0 read this time is compared with the value read last time. If bit0 is 0 both this time and last time, the results are consistent, the CPU temperature was normal on both reads, and the BMC does not need to report anything. If bit0 is 1 both this time and last time, the results are also consistent: the CPU temperature was abnormal on both reads, the CPU is still in the temperature alarm state, its temperature state has not changed, and the BMC does not need to report again.
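The comparison in step S12 can be sketched as follows. This is an illustration only: the `saved` dict stands in for the BMC's local storage, the key name is hypothetical, and treating the state before the very first read as normal (0) is an assumption, since the text does not say how the first poll is handled.

```python
def state_changed(current_value: int, saved: dict) -> bool:
    """Compare the bit0 value just read with the one stored last time.

    Returns True only when the two differ, i.e. when the BMC has
    something to act on. The current value is always stored so that it
    becomes the 'last' value for the next poll.
    """
    # Assumption: before the first read the CPU is treated as normal (0).
    last_value = saved.get("last_value", 0)
    saved["last_value"] = current_value
    return current_value != last_value
```

With this helper, two consecutive reads of 0 or two consecutive reads of 1 both return False, which corresponds to the two "no report needed" cases described above.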
Step S13: if the current state information is inconsistent with the last state information, raising a corresponding abnormal state alarm or releasing the abnormal state alarm according to a preset abnormal state alarm rule.
In one implementation of this embodiment, if the current temperature state information is inconsistent with the locally stored last temperature state information of the central processing unit, and the current temperature state information is temperature abnormal state information, a temperature abnormal state reporting instruction is triggered, the baseboard management controller records an alarm log generated for the temperature abnormal state, and a corresponding temperature abnormal state alarm is raised. That is, when the value of bit0 in the CPU Package Thermal Status register is currently 1 and the value stored last time is 0, the CPU temperature state detected last time was normal and the state detected now is abnormal. The two readings are inconsistent and the current temperature state is abnormal, which indicates that, following a complete Prochot trigger/release cycle of the CPU, a temperature abnormal state reporting instruction is triggered, and the BMC needs to record an alarm log and raise the corresponding abnormal state alarm.
In another implementation, if the current temperature state information is inconsistent with the locally stored last temperature state information of the central processing unit, and the current temperature state information is temperature normal state information, the abnormal state alarm is released according to the abnormal state alarm rule. That is, when the value of bit0 in the CPU Package Thermal Status register is currently 0 and the value stored last time is 1, the CPU temperature state detected last time was abnormal and the state detected now is normal. The two readings are inconsistent and the current temperature state is normal. However, because bit0 of the CPU Package Thermal Status register may be oscillating at this point, the abnormal state alarm cannot be released immediately, and whether to release it must be decided further according to the abnormal state alarm rule.
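The two inconsistent cases of step S13, together with the consistent case of step S12, amount to a small transition table. The sketch below classifies a pair of bit0 readings into the action the BMC would take next; the enum and function names are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "no report needed"
    RAISE_ALARM = "log and raise temperature abnormal alarm"
    CHECK_RELEASE = "apply alarm rule before releasing"


def classify_transition(last_value: int, now_value: int) -> Action:
    """Map (last bit0, current bit0) to the next BMC action."""
    if now_value == last_value:
        return Action.NONE            # 0 -> 0 or 1 -> 1: state unchanged
    if now_value == 1:
        return Action.RAISE_ALARM     # 0 -> 1: CPU entered the Prochot state
    # 1 -> 0: bit0 may be oscillating, so the alarm is not released
    # directly; the alarm rule has to confirm it first.
    return Action.CHECK_RELEASE
```

Keeping the 1 to 0 case as a separate "check" outcome, rather than an immediate release, mirrors the text's point that the release decision is deferred to the abnormal state alarm rule.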
Further, the present application may also monitor CPU Error states, which may include but are not limited to: IERR (internal error), Processor Disabled, UCE (uncorrectable error, reported as a machine check exception), CE (correctable error), and the like.
Referring to FIG. 2 and FIG. 3, an embodiment of the present invention discloses a specific method for monitoring the state of a central processing unit. Compared with the previous embodiment, this embodiment further describes and optimizes the technical solution. Specifically, the method comprises the following steps:
Step S21: reading, through the Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current temperature state information locally.
In this embodiment, PECI is used to read the current temperature state information of the central processing unit recorded in the preset register inside the central processing unit, and the current temperature state information is stored locally. The state is thus read directly from the register inside the central processing unit, rather than being inferred by the CPLD from the ambient temperature near the central processing unit as detected by the VR chip, so the BMC can obtain the Prochot state of the central processing unit through PECI in a timely and accurate manner.
Step S22: determining whether the current temperature state information is consistent with the locally stored last temperature state information of the central processing unit.
Step S23: detecting and recording the system time of the server each time the central processing unit is in a temperature abnormal state; and, if the current temperature state information is inconsistent with the locally stored last temperature state information of the central processing unit and the current temperature state information is temperature normal state information, calculating the time difference between the current system time of the server and the system time of the server when the central processing unit was last in the temperature abnormal state.
In this embodiment, the system time of the server is detected and recorded each time the central processing unit is in a temperature abnormal state; for example, whenever the value of bit0 of the CPU Package Thermal Status register is detected to be 1, the system time of the server is recorded and stored. Suppose the current temperature state information is inconsistent with the locally stored last temperature state information, that is, now_value is not equal to last_value, and the value of bit0 of the CPU Package Thermal Status register is detected to be 0, so the current temperature state information is temperature normal state information. When the CPU core temperature has only just risen to the Prochot threshold, bit0 is in an oscillating state, meaning its value jumps repeatedly between 0 and 1. To prevent the abnormal state alarm from being released by mistake, the time difference between the current system time of the server and the system time of the server when the central processing unit was last in the temperature abnormal state must be calculated before deciding whether to release the alarm.
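The timestamp bookkeeping of step S23 can be sketched as below. The class name is hypothetical, and timestamps are passed in explicitly as seconds so that the example does not depend on a real clock; a BMC would presumably take them from the server's system time.

```python
from typing import Optional


class AbnormalTimeTracker:
    """Remember when bit0 was last seen as 1 and report the elapsed time."""

    def __init__(self) -> None:
        self.time_last: Optional[float] = None  # last time bit0 == 1

    def observe(self, bit0: int, time_now: float) -> Optional[float]:
        """Record an observation.

        Returns the time difference (time_now - time_last) when bit0 is 0
        and an abnormal state has been seen before; otherwise returns None.
        """
        if bit0 == 1:
            self.time_last = time_now  # refresh on every abnormal reading
            return None
        if self.time_last is None:
            return None                # never abnormal, nothing to compare
        return time_now - self.time_last
```

Because the stored time is refreshed on every reading of 1, an oscillating bit0 keeps resetting the reference point, so the time difference only grows once the bit has stayed at 0.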
Step S24: deciding whether to release the abnormal state alarm according to the time difference and the central processing unit temperature state information detected by a temperature sensor built into the voltage regulator.
In this embodiment, the time difference between the current system time of the server and the recorded time at which bit0 of the CPU Package Thermal Status register was last 1 is compared with a preset time difference. In one implementation, when the time difference is smaller than the preset time difference, the abnormal state alarm is not released, and the method returns to the step of reading the current temperature state information of the central processing unit through PECI and storing it locally. For example, with a preset time difference of 20 s and a measured time difference of 13 s, time_now - time_last < 20 s, which indicates that bit0 of the CPU Package Thermal Status register is still oscillating. The abnormal state alarm cannot be released, and the method returns to the reading step.
In another implementation, if the time difference is greater than the preset time difference and the central processing unit temperature state information detected by the temperature sensor built into the voltage regulator is temperature normal state information, the baseboard management controller records a log generated for the temperature normal state and releases the abnormal state alarm. For example, with a preset time difference of 20 s and a measured time difference of 26 s, time_now - time_last > 20 s. In this case an auxiliary check is made against the ambient temperature state near the central processing unit detected by the temperature sensor built into the VR chip. If the VR chip also reports that the ambient temperature near the central processing unit is normal, the BMC records a log generated for the temperature normal state and releases the abnormal state alarm. If the VR chip reports that the ambient temperature near the central processing unit is abnormal, the abnormal state alarm is not released.
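The release rule of step S24 can be written as a single predicate. The sketch below uses the 20 s value from the example as a default; the function and parameter names are hypothetical, and the VR sensor result is passed in as a boolean rather than read from hardware. The text does not say what happens when the time difference equals the preset value exactly, so holding the alarm in that case is an assumption.

```python
PRESET_TIME_DIFF_S = 20.0  # example value used in the text


def should_release_alarm(time_now: float,
                         time_last: float,
                         vr_temperature_normal: bool,
                         preset_s: float = PRESET_TIME_DIFF_S) -> bool:
    """Decide whether the abnormal state alarm may be released.

    time_last is the system time at which bit0 was last seen as 1.
    """
    time_diff = time_now - time_last
    # Assumption: a difference exactly equal to the preset keeps the alarm.
    if time_diff <= preset_s:
        return False  # bit0 may still be oscillating
    # Long enough since the last abnormal reading: the VR chip's built-in
    # sensor must also report a normal temperature before releasing.
    return vr_temperature_normal
```

With the figures from the text, a 13 s difference keeps the alarm regardless of the VR reading, while a 26 s difference releases it only if the VR sensor also reports a normal temperature.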
Therefore, the embodiment of the application reads the bit0 value of the CPU Package Thermal Status register through PECI and can monitor the internal Prochot state of the CPU accurately and in real time. This solves the problem that the BMC on the EGS platform cannot monitor high-temperature alarms for the core temperature of the server CPU, and it is faster than the original approach in which the CPLD passes the Prochot pin signal through to the BMC. Abnormal state alarms can be reported to the administrator more promptly, and they can be released in a timely and accurate manner. This helps operation and maintenance personnel adjust the heat dissipation strategy or troubleshoot faults in time, helps maintain the performance and prolong the service life of the CPU, avoids as far as possible problems such as server shutdown caused by high CPU temperature, and brings considerable economic benefits.
Referring to Fig. 4, an embodiment of the present invention discloses a central processing unit state monitoring apparatus, including:
the information reading module 11, configured to read, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and to store the current state information locally;
the information judging module 12, configured to judge whether the current state information is consistent with the last state information of the central processing unit stored locally;
and the state monitoring module 13, configured to perform a corresponding abnormal state alarm or remove the abnormal state alarm according to a preset abnormal state alarm rule if the current state information is inconsistent with the last state information.
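The division of work among the three modules can be sketched roughly as below. This is an illustrative outline only: the class name `CpuStateMonitor`, the `read_state` and `alarm_rule` callables, and the string results are all hypothetical, and the alarm rule is passed in as a plain function rather than modelling the full time-difference and VR-sensor check.

```python
from typing import Callable, Optional


class CpuStateMonitor:
    """Polls a CPU state value and reacts only when it changes."""

    def __init__(self, read_state: Callable[[], int],
                 alarm_rule: Callable[[int], str],
                 initial_state: int = 0) -> None:
        self.read_state = read_state    # stands in for the single-wire bus read
        self.alarm_rule = alarm_rule    # stands in for the preset alarm rule
        self.last_state = initial_state  # locally stored last state

    def poll(self) -> Optional[str]:
        # Information reading module 11: read the current state.
        current = self.read_state()
        # Information judging module 12: compare with the stored last state.
        changed = current != self.last_state
        # Store the current state locally for the next comparison.
        self.last_state = current
        if not changed:
            return None
        # State monitoring module 13: apply the alarm rule on a change.
        return self.alarm_rule(current)
```

A caller would invoke `poll()` periodically; a `None` result means the state is unchanged and no alarm action is taken.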
Therefore, the application discloses a method for monitoring the state of a central processing unit, which is applied to a baseboard management controller and comprises the following steps: reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current state information locally; judging whether the current state information is consistent with the last state information of the central processing unit stored locally; and if the current state information is inconsistent with the last state information, performing a corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule. Because the current state information is acquired directly from the central processing unit over the dedicated single-wire bus, it is accurate, which helps maintain good performance and prolong the service life of the central processing unit, avoids as far as possible problems such as server downtime caused by high central processing unit temperature, and brings considerable economic benefit. Performing the abnormal state alarm or removing it according to the preset abnormal state alarm rule also effectively prevents the alarm from being removed by mistake.
Further, an electronic device is disclosed in the embodiments of the present application, and fig. 5 is a block diagram of the electronic device 20 according to an exemplary embodiment, which should not be construed as limiting the scope of the application.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps in the central processing unit status monitoring method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to acquire external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it may include an operating system 221, a computer program 222, and the like, and the storage may be transient or persistent.
The operating system 221 is used for managing and controlling each hardware device on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, or the like. The computer program 222 may further include a computer program that can be used to perform other specific tasks in addition to the computer program that can be used to perform the central processor state monitoring method performed by the electronic device 20 disclosed in any of the foregoing embodiments.
Further, the present application also discloses a computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the central processor state monitoring method disclosed above. For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not described herein again.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.
Those of skill in the art would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above method, apparatus, device and storage medium for monitoring the state of a central processing unit according to the present invention are described in detail, and a specific example is applied in the description to explain the principle and implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (10)
1. A central processing unit state monitoring method, applied to a baseboard management controller, comprising:
reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current state information locally;
judging whether the current state information is consistent with the last state information of the central processing unit stored locally;
and if the current state information is inconsistent with the last state information, performing a corresponding abnormal state alarm or removing the abnormal state alarm according to a preset abnormal state alarm rule.
2. The method for monitoring the state of a central processing unit according to claim 1, wherein the reading, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, of the current state information of the central processing unit recorded in a preset register inside the central processing unit, and the storing of the current state information locally, comprise:
reading, through a Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current temperature state information locally.
3. The method according to claim 2, wherein the judging whether the current state information is consistent with the last state information of the central processing unit stored locally comprises:
if the current temperature state information is consistent with the last temperature state information of the central processing unit stored locally, neither performing a corresponding abnormal state alarm nor removing the abnormal state alarm, and returning to the step of reading, through the Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current temperature state information locally.
4. The method for monitoring the state of the central processing unit according to claim 2, wherein, if the current state information is inconsistent with the last state information of the central processing unit stored locally, performing a corresponding abnormal state alarm according to a preset abnormal state alarm rule comprises:
if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature abnormal state information, triggering a temperature abnormal state reporting instruction, recording, through the baseboard management controller, an alarm log generated by the temperature abnormal state, and performing a corresponding temperature abnormal state alarm.
5. The method for monitoring the state of the central processing unit according to claim 2, wherein, if the current state information is inconsistent with the last state information of the central processing unit stored locally, removing the corresponding abnormal state alarm according to a preset abnormal state alarm rule comprises:
detecting and recording the system time of the server each time the central processing unit is in an abnormal temperature state;
if the current temperature state information is inconsistent with the last temperature state information of the central processing unit stored locally and the current temperature state information is temperature normal state information, calculating the time difference between the current system time of the server and the system time of the server recorded when the central processing unit was last in the abnormal temperature state;
and determining whether to remove the abnormal state alarm according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor built into the voltage regulator.
6. The method for monitoring the state of the central processing unit according to claim 5, wherein the determining whether to remove the abnormal state alarm according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor built into the voltage regulator comprises:
when the time difference is smaller than the preset time difference, not removing the abnormal state alarm, and returning to the step of reading, through the Platform Environment Control Interface (PECI) over which a communication connection with the central processing unit has been established in advance, the current temperature state information of the central processing unit recorded in a preset register inside the central processing unit, and storing the current temperature state information locally.
7. The method for monitoring the state of the central processing unit according to claim 5, wherein the determining whether to remove the abnormal state alarm according to the time difference and the temperature state information of the central processing unit detected by a temperature sensor built into the voltage regulator comprises:
if the time difference is greater than the preset time difference and the temperature state information of the central processing unit detected by the temperature sensor built into the voltage regulator is normal temperature state information, recording, through the baseboard management controller, a log generated in the normal temperature state and removing the abnormal state alarm.
8. A central processing unit state monitoring apparatus, comprising:
an information reading module, configured to read, through a dedicated single-wire bus over which a communication connection with the central processing unit has been established in advance, the current state information of the central processing unit recorded in a preset register inside the central processing unit, and to store the current state information locally;
an information judging module, configured to judge whether the current state information is consistent with the last state information of the central processing unit stored locally;
and a state monitoring module, configured to perform a corresponding abnormal state alarm or remove the abnormal state alarm according to a preset abnormal state alarm rule if the current state information is inconsistent with the last state information.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to carry out the steps of the central processing unit state monitoring method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the steps of the central processing unit state monitoring method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210302352.7A CN114676019A (en) | 2022-03-25 | 2022-03-25 | Method, device, equipment and storage medium for monitoring state of central processing unit |
PCT/CN2023/083130 WO2023179684A1 (en) | 2022-03-25 | 2023-03-22 | Method and apparatus for monitoring state of central processing unit, and device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210302352.7A CN114676019A (en) | 2022-03-25 | 2022-03-25 | Method, device, equipment and storage medium for monitoring state of central processing unit |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114676019A (en) | 2022-06-28 |
Family
ID=82076556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210302352.7A Pending CN114676019A (en) | 2022-03-25 | 2022-03-25 | Method, device, equipment and storage medium for monitoring state of central processing unit |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114676019A (en) |
WO (1) | WO2023179684A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023179684A1 (en) * | 2022-03-25 | 2023-09-28 | 苏州浪潮智能科技有限公司 | Method and apparatus for monitoring state of central processing unit, and device and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS60195649A (en) * | 1984-03-16 | 1985-10-04 | Nec Corp | Error reporting system of microprogram-controlled type data processor |
CN108089964A (en) * | 2017-12-07 | 2018-05-29 | 郑州云海信息技术有限公司 | A kind of device and method by BMC monitoring server CPLD states |
CN108268360A (en) * | 2018-01-19 | 2018-07-10 | 郑州云海信息技术有限公司 | A kind of BMC obtains method, system, device and the storage medium of memory temperature |
CN109656767A (en) * | 2018-12-21 | 2019-04-19 | 广东浪潮大数据研究有限公司 | A kind of acquisition methods, system and the associated component of CPLD status information |
CN111767184A (en) * | 2020-09-01 | 2020-10-13 | 苏州浪潮智能科技有限公司 | Fault diagnosis method and device, electronic equipment and storage medium |
CN114676019A (en) * | 2022-03-25 | 2022-06-28 | 苏州浪潮智能科技有限公司 | Method, device, equipment and storage medium for monitoring state of central processing unit |
Also Published As
Publication number | Publication date |
---|---|
WO2023179684A1 (en) | 2023-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||