CN117453507A - Server hardware monitoring system and safety detection method thereof - Google Patents

Info

Publication number
CN117453507A
CN117453507A CN202311454121.9A CN202311454121A CN117453507A CN 117453507 A CN117453507 A CN 117453507A CN 202311454121 A CN202311454121 A CN 202311454121A CN 117453507 A CN117453507 A CN 117453507A
Authority
CN
China
Prior art keywords
monitoring parameter
security
monitoring
server hardware
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202311454121.9A
Other languages
Chinese (zh)
Inventor
钟阳
曾泓瀚
邵伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuxi Semiconductor Shenzhen Co ltd
Original Assignee
Fuxi Semiconductor Shenzhen Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Alarm Systems (AREA)

Abstract

The invention provides a server hardware monitoring system and a safety detection method thereof. A security detection class object of the server hardware is constructed, the security detection class object comprising a security evaluation index that reflects the security state of each monitoring parameter of the server hardware. The security detection class object is instantiated as a security detection instance object. Monitoring parameters of the server hardware are acquired in a preset data acquisition period, and the security evaluation index corresponding to each monitoring parameter in the security detection instance object is updated based on the acquired monitoring parameters. The security evaluation indexes in the security detection instance object are traversed in a preset security detection period, and the security of the server hardware is evaluated according to them, so that hardware faults of the server can be identified safely and efficiently.

Description

Server hardware monitoring system and safety detection method thereof
Technical Field
The invention relates to the technical field of computers, in particular to a server hardware monitoring system and a safety detection method thereof.
Background
In recent years, with continuous breakthroughs in network technology, network and information computing technologies such as big data, cloud computing, and artificial intelligence have emerged one after another, and the products and services derived from them have greatly changed how people live, play, and work. These products and services rest on data centers composed of many types of servers, which makes the data center one of the key infrastructures for Internet operation. The number of servers in a data center is very large; in some large-scale data centers it reaches hundreds of thousands, in order to meet the computing and storage requirements of Internet services. With so many servers, manually monitoring hardware status is impractical, so a BMC (Baseboard Management Controller) system is used to monitor the hardware status of the servers remotely. Existing BMCs typically identify hardware faults by threshold decision: a parameter threshold is set for each monitored parameter, and a hardware fault is considered to have occurred when a monitored hardware parameter, such as temperature, voltage, or current, rises above or falls below the set threshold. However, this approach can only detect a fault after it has occurred, and a severe fault may by then already have caused large and irreparable losses.
To solve this problem, methods have been studied that predict server hardware failures by training a neural-network-based hardware failure prediction model. For example, CN111124852A proposes a failure prediction method based on a BMC health management module, which collects data through the IPMI protocol, analyzes the received data according to the usage standards and parameters of each hardware resource in the device to determine abnormal conditions, and calculates a prediction result with a BP neural network prediction method in combination with historical data and model parameter selection. As another example, CN111143173A proposes a neural-network-based server failure monitoring method, which uses a BMC to obtain server information, analyzes and predicts through a neural network whether the server will fail, feeds the result back to a web page for display, and monitors the state of the server at the same time, so as to improve server stability. However, on the one hand, training a hardware failure prediction model involves collecting and organizing a large amount of data, selecting a model, configuring training and testing parameters, training and testing the model, optimizing parameters, and so on; the process is tedious and the cycle is long. On the other hand, a hardware failure prediction model is an artificial intelligence model whose operation usually requires substantial computing power and storage space; existing BMC systems lack the processing and storage capacity to fully support running such a model, and offloading it to cloud computing introduces processing efficiency and security problems.
Disclosure of Invention
Based on the above problems, the invention provides a server hardware monitoring system and a safety detection method thereof, which can safely and efficiently identify hardware faults of a server.
In view of this, a first aspect of the present invention proposes a server hardware monitoring system comprising a business service subsystem for running a business service program and a baseboard management subsystem for monitoring the target server, the baseboard management subsystem being configured to:
constructing a security detection class object of the server hardware, wherein the security detection class object comprises a security evaluation index for reflecting the security state of each monitoring parameter in the server hardware;
instantiating the security detection class object as a security detection instance object;
acquiring monitoring parameters of the server hardware in a preset data acquisition period;
updating a security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware;
traversing the security evaluation index in the security detection instance object in a preset security detection period;
and evaluating the security of the server hardware according to the security evaluation index in the security detection instance object.
The second aspect of the present invention proposes a security detection method for a server hardware monitoring system, including:
constructing a security detection class object of the server hardware, wherein the security detection class object comprises a security evaluation index for reflecting the security state of each monitoring parameter in the server hardware;
instantiating the security detection class object as a security detection instance object;
acquiring monitoring parameters of the server hardware in a preset data acquisition period;
updating a security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware;
traversing the security evaluation index in the security detection instance object in a preset security detection period;
and evaluating the security of the server hardware according to the security evaluation index in the security detection instance object.
Preferably, before the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period, the method further includes:
constructing a monitoring parameter class object of the server hardware;
instantiating the monitoring parameter class object into a monitoring parameter real-time object and a monitoring parameter historical object, wherein the monitoring parameter real-time object is used for storing the value of each monitoring parameter acquired last time in each data acquisition period, and the monitoring parameter historical object is used for storing the historical value of each monitoring parameter in each safety detection period;
The step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period specifically comprises the following steps:
and writing the monitoring parameters of the server hardware into the monitoring parameter real-time object.
Preferably, after the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period, the method further includes:
instantiating the monitoring parameter class object into a monitoring parameter average value object, wherein the monitoring parameter average value object is used for storing the average value of each monitoring parameter;
in each data acquisition period, carrying out average calculation on the numerical value of each monitoring parameter stored in the monitoring parameter real-time object and the current average value of each monitoring parameter stored in the monitoring parameter average value object to obtain a new average value;
and writing the calculated new average value into the monitoring parameter average value object.
Preferably, the step of calculating the average value of each monitoring parameter stored in the real-time object of the monitoring parameter and the current average value of each monitoring parameter stored in the average object of the monitoring parameter to obtain a new average value in each data acquisition period specifically includes:
sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and the corresponding attribute value in the monitoring parameter mean object [symbol missing], where $i \in [1, n_p]$, $n_p$ is the number of monitoring parameters in the monitoring parameter class object, and $t$ is the current data acquisition cycle number;
determining whether [formula missing] holds;
when [formula missing], let [formula missing];
when [formula missing], let [formula missing].
Preferably, the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection object based on the monitoring parameter of the server hardware specifically includes:
sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and each attribute value $ph_i(t)$ in the monitoring parameter history object;
calculating the dynamic step length of each monitoring parameter according to the average value of each monitoring parameter in the monitoring parameter average value object: [formula missing]
wherein $\sigma$ is the dynamic step size division coefficient of each monitoring parameter and $\sigma > 1$;
updating the security evaluation index of each monitoring parameter according to the dynamic step length: [formula missing]
Preferably, after the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware, the method further comprises writing each attribute value in the monitoring parameter real-time object into the corresponding attribute variable of the monitoring parameter history object.
Preferably, after the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware, the method further comprises writing each attribute value in the monitoring parameter mean object into the corresponding attribute variable of the monitoring parameter history object.
Preferably, the security detection class object further includes a counting variable $AI_i(t)$ for recording the number of periods in which the security evaluation index is continuously positive or continuously negative; after the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware, the method further comprises:
calculating the product of $I_i(t)$ and $I_i(t-1)$:
$MI_i(t) = I_i(t) \times I_i(t-1)$;
when $MI_i(t) \geq 0$, let $AI_i(t) = AI_i(t-1) + 1$;
when $MI_i(t) < 0$, let $AI_i(t) = 0$.
Preferably, the step of evaluating the security of the server hardware according to the security evaluation index in the security detection instance object specifically includes:
calculating, according to the counting variable $AI_i(t)$, a security score corresponding to the monitoring parameter: [formula missing]
wherein $A_{max}$ is a pre-configured maximum accumulation period number.
With the server hardware monitoring system and safety detection method provided by the invention, a security detection class object of the server hardware is constructed, the security detection class object comprising a security evaluation index that reflects the security state of each monitoring parameter of the server hardware; the security detection class object is instantiated as a security detection instance object; monitoring parameters of the server hardware are acquired in a preset data acquisition period; the security evaluation index corresponding to each monitoring parameter in the security detection instance object is updated based on the acquired monitoring parameters; the security evaluation indexes in the security detection instance object are traversed in a preset security detection period; and the security of the server hardware is evaluated according to them. Hardware faults of the server can thereby be identified safely and efficiently.
Drawings
Fig. 1 is a flowchart of a security detection method of a server hardware monitoring system according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
In the description of the present invention, unless explicitly defined otherwise, the term "plurality" means two or more. The orientation or positional relationship indicated by terms such as "upper" and "lower" is based on the orientation or positional relationship shown in the drawings; these terms are used merely for convenience and simplicity of description, do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. The terms "coupled," "mounted," "secured," and the like are to be construed broadly: a connection may, for example, be fixed, detachable, or integral, and may be direct or indirect through an intermediate medium. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first", "second", etc. may explicitly or implicitly include one or more such features.
In the description of this specification, the terms "one embodiment," "some implementations," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
A server hardware monitoring system and a security detection method thereof according to some embodiments of the present invention are described below with reference to the accompanying drawings.
A first aspect of the present invention proposes a server hardware monitoring system comprising a business service subsystem for running a business service program and a baseboard management subsystem for monitoring the target server. As shown in fig. 1, the baseboard management subsystem is configured to:
constructing a security detection class object of the server hardware, wherein the security detection class object comprises a security evaluation index for reflecting the security state of each monitoring parameter in the server hardware;
instantiating the security detection class object as a security detection instance object;
acquiring monitoring parameters of the server hardware in a preset data acquisition period;
updating a security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware;
traversing the security evaluation index in the security detection instance object in a preset security detection period;
and evaluating the security of the server hardware according to the security evaluation index in the security detection instance object.
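The configured steps above can be sketched as a simple loop. This is a minimal illustration rather than the patented implementation: the names `SecurityDetection`, `run_cycles`, `read_sensors`, `update_index` and `evaluate` are hypothetical, the two periods are modelled as cycle counts instead of timers, and the index is updated on every acquisition, which is only one possible reading of the step order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SecurityDetection:
    """Security detection class object: one evaluation index per monitoring parameter."""
    index: Dict[str, float] = field(default_factory=dict)


def run_cycles(
    read_sensors: Callable[[], Dict[str, float]],
    update_index: Callable[[float, float], float],
    evaluate: Callable[[Dict[str, float]], bool],
    n_acquisitions: int,
    acquisitions_per_detection: int,
) -> List[bool]:
    """Run acquire -> update -> traverse -> evaluate for a fixed number of cycles."""
    detection = SecurityDetection()              # instantiate the class object
    verdicts: List[bool] = []
    for cycle in range(1, n_acquisitions + 1):
        params = read_sensors()                  # acquire monitoring parameters
        for name, value in params.items():
            # Indexes start at 0 and are updated by a caller-supplied rule (assumed).
            old = detection.index.get(name, 0.0)
            detection.index[name] = update_index(old, value)
        if cycle % acquisitions_per_detection == 0:
            # Security detection period: traverse the indexes and evaluate them.
            verdicts.append(evaluate(dict(detection.index)))
    return verdicts
```

The update and evaluation rules are passed in as callables because they depend on the formulas of the specific embodiment.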
Specifically, the business service subsystem is an in-band system and the baseboard management subsystem is an out-of-band system. The business service subsystem comprises a first processing unit, a first storage unit, a first power supply unit, and a first communication unit. The baseboard management subsystem comprises a second processing unit, a second storage unit, a second power supply unit, a second communication unit, and a sensing unit; the sensing unit is arranged on each device of the business service subsystem to acquire state data of that device. The security detection method of the server hardware monitoring system is implemented by the second processing unit of the baseboard management subsystem running a computer program stored in the second storage unit. The baseboard management subsystem also comprises a cache unit, which is a volatile storage unit, that is, the cache unit loses the data stored in it after power is cut off. In the technical solution of the foregoing embodiment, the step of instantiating the security detection class object as a security detection instance object specifically means instantiating the security detection class object to generate the security detection instance object in the cache unit. This step further comprises initializing the security detection instance object, that is, initializing each attribute value of the security detection instance object to 0.
In the technical solution of the invention, the security detection class object can be constructed in several ways. In the first, the security detection class object is constructed per hardware device: one security detection class object is constructed for each hardware device of the server, and the security evaluation index corresponding to each monitoring parameter of that device is an attribute value of the object. For example, the security detection class object of an in-band hard disk may contain the security evaluation indexes corresponding to monitoring parameters such as temperature, voltage, and current. In the second, the security detection class object is constructed per monitoring parameter type: one security detection class object is constructed for each monitoring parameter type of the server, so that a single security detection class object covers monitoring parameters of the same type across multiple hardware devices, and the security evaluation index corresponding to each monitoring parameter of that type is an attribute value of the object. For example, a temperature security detection class object may contain the security evaluation indexes for the temperature of each in-band processor, each in-band memory, each in-band hard disk, and so on. In another embodiment, a single security detection class object containing all the monitoring parameters of the server may be constructed, but such an object holds so many security evaluation indexes that they are hard to distinguish and use.
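As a rough illustration of the first two construction options, the sketch below defines one class per hardware device and one class per monitoring parameter type. The class and attribute names (`HardDiskSecurity`, `TemperatureSecurity`, `processor_0_index`, and so on) are hypothetical and chosen only for this example; default values of 0 mirror the initialization of instance attributes to 0.

```python
from dataclasses import dataclass


@dataclass
class HardDiskSecurity:
    """Per-hardware variant: one class per device, one index per monitored parameter."""
    temperature_index: float = 0.0
    voltage_index: float = 0.0
    current_index: float = 0.0


@dataclass
class TemperatureSecurity:
    """Per-parameter-type variant: one class per parameter type, one index per device."""
    processor_0_index: float = 0.0
    memory_0_index: float = 0.0
    hard_disk_0_index: float = 0.0


# Instantiating a class object yields an instance object with all indexes at 0.
disk = HardDiskSecurity()
temps = TemperatureSecurity()
```

Either layout keeps the number of attributes per object small, which is the stated drawback of the single all-parameters object.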
To keep server hardware status monitoring timely, the data acquisition period is typically configured to a relatively small value, such as a few seconds or even less than 1 second. The security detection period is greater than or equal to the data acquisition period; preferably, to reduce the processing burden on the server hardware monitoring system, the security detection period is configured to a relatively large value, such as a few minutes or even tens of minutes.
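The relationship between the two periods can be captured in a small configuration object. This is a sketch under assumptions: the `Periods` class, its field names, and the use of seconds as the unit are hypothetical; only the constraint that the security detection period is not shorter than the data acquisition period comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Periods:
    acquisition_s: float   # data acquisition period, in seconds (assumed unit)
    detection_s: float     # security detection period, in seconds (assumed unit)

    def __post_init__(self) -> None:
        if self.acquisition_s <= 0:
            raise ValueError("acquisition period must be positive")
        if self.detection_s < self.acquisition_s:
            raise ValueError("detection period must be >= acquisition period")

    def acquisitions_per_detection(self) -> int:
        """Number of whole acquisition cycles that fit in one detection cycle."""
        return int(self.detection_s // self.acquisition_s)
```

For instance, a 2-second acquisition period with a 10-minute detection period gives 300 acquisitions per detection cycle.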
Preferably, before the step of acquiring the monitoring parameters of the server hardware at a preset data acquisition period, the baseboard management subsystem is configured to:
constructing a monitoring parameter class object of the server hardware;
instantiating the monitoring parameter class object into a monitoring parameter real-time object and a monitoring parameter historical object, wherein the monitoring parameter real-time object is used for storing the value of each monitoring parameter acquired last time in each data acquisition period, and the monitoring parameter historical object is used for storing the historical value of each monitoring parameter in each safety detection period;
the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period specifically comprises the following steps:
And writing the monitoring parameters of the server hardware into the monitoring parameter real-time object.
In the foregoing embodiment, in the step of writing the monitoring parameter of the server hardware into the monitoring parameter real-time object, the baseboard management subsystem is configured to:
clearing data in the monitoring parameter real-time object;
and reassigning the real-time object of the monitoring parameters according to the collected monitoring parameters of the server hardware.
It should be noted that the monitoring parameter real-time object is a variable that is reused at high frequency: the monitoring parameters acquired in every data acquisition period are read and written through it once it has been instantiated. Instantiating a new monitoring parameter real-time object in each data acquisition period to read and write the new data would occupy a large amount of cache, so the existing object is cleared and reassigned instead.
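The clear-and-reassign pattern can be sketched as follows. The `MonitoringParams` class, its `clear` and `assign` methods, and the parameter names are hypothetical; the point illustrated is only that the same real-time instance is reused across acquisition periods instead of allocating a new one each time.

```python
class MonitoringParams:
    """Monitoring parameter class object: one attribute slot per monitoring parameter."""

    def __init__(self, names):
        self.values = {name: 0.0 for name in names}

    def clear(self):
        # Reset every slot without allocating a new object.
        for name in self.values:
            self.values[name] = 0.0

    def assign(self, sample):
        # Copy the newly acquired values into the existing slots.
        for name in self.values:
            self.values[name] = sample[name]


names = ["cpu_temp", "psu_voltage"]
realtime = MonitoringParams(names)   # instantiated once, then reused
history = MonitoringParams(names)    # same structure, separate instance


def on_acquisition(sample):
    """Called once per data acquisition period with the latest readings."""
    realtime.clear()
    realtime.assign(sample)
```

Because both instances come from the same class, their attributes correspond one-to-one, which is what later steps rely on when comparing real-time and historical values.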
Preferably, after the step of acquiring the monitoring parameters of the server hardware at a preset data acquisition period, the baseboard management subsystem is configured to:
instantiating the monitoring parameter class object into a monitoring parameter average value object, wherein the monitoring parameter average value object is used for storing the average value of each monitoring parameter;
in each data acquisition period, carrying out average calculation on the numerical value of each monitoring parameter stored in the monitoring parameter real-time object and the current average value of each monitoring parameter stored in the monitoring parameter average value object to obtain a new average value;
and writing the calculated new average value into the monitoring parameter average value object.
Specifically, the monitoring parameter average object is the same as the monitoring parameter real-time object and the monitoring parameter history object, and is obtained by instantiation of the monitoring parameter class object, that is, the three object instances have the same object structure, the attribute values of the three object instances have one-to-one correspondence, and each attribute value corresponds to one monitoring parameter of the server hardware.
Preferably, in the step of averaging the value of each monitoring parameter stored in the real-time object of the monitoring parameter with the current average value of each monitoring parameter stored in the average object of the monitoring parameter to obtain a new average value in each data acquisition period, the baseboard management subsystem is configured to:
sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and the corresponding attribute value in the monitoring parameter mean object [symbol missing], where $i \in [1, n_p]$, $n_p$ is the number of monitoring parameters in the monitoring parameter class object, and $t$ is the current data acquisition cycle number;
determining whether [formula missing] holds;
when [formula missing], let [formula missing];
when [formula missing], let [formula missing].
In the technical solutions of the above embodiments, $i$ is less than or equal to $n_p$; for the same value of $i$, $pn_i(t)$ and the corresponding attribute value in the monitoring parameter mean object refer to the same monitoring parameter. It should be noted that the values from all data acquisition periods are not kept in the cache unit; preferably, in the technical solution of the present invention, only the values from the most recent data acquisition period are stored in the cache unit. The assignment in the above embodiment is in practice an iterative calculation on a single variable: in each data acquisition period, the value currently stored in the variable is used to compute a new value, which is then assigned back to the same variable.
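The in-place, iterative nature of the mean update can be illustrated with a short sketch. The exact condition and assignment formulas are not reproduced in this text, so the rule below is purely an assumption for illustration: the first sample seeds the stored mean, and every later sample is averaged with the mean already stored. The function name `update_mean` and the `seeded` flag are hypothetical.

```python
def update_mean(mean, realtime, seeded):
    """Update the stored means in place; return True once the means are seeded.

    mean     -- dict of stored averages (the mean object), modified in place
    realtime -- dict of values from the latest data acquisition period
    seeded   -- False until the first sample has been stored (assumed rule)
    """
    for name, value in realtime.items():
        if not seeded:
            # Assumed: the first acquired value becomes the initial mean.
            mean[name] = value
        else:
            # Assumed: new mean is the average of the stored mean and the new value.
            mean[name] = (mean[name] + value) / 2
    return True
```

Only one stored value per parameter is needed, which matches the statement that values from earlier acquisition periods are not retained in the cache unit.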
Preferably, in the step of updating the security evaluation index corresponding to each monitored parameter in the security inspection object based on the monitored parameter of the server hardware, the baseboard management subsystem is configured to:
sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and each attribute value $ph_i(t)$ in the monitoring parameter history object;
calculating the dynamic step length of each monitoring parameter according to the average value of each monitoring parameter in the monitoring parameter average value object: [formula missing]
wherein $\sigma$ is the dynamic step size division coefficient of each monitoring parameter and $\sigma > 1$;
updating the security evaluation index of each monitoring parameter according to the dynamic step length: [formula missing]
Specifically, in the foregoing embodiment, the dynamic step size division coefficient $\sigma$ is configured as a positive integer greater than 1. The increment of the security evaluation index of a monitoring parameter is determined by the relative magnitude of the attribute value $pn_i(t)$ in the monitoring parameter real-time object and the attribute value $ph_i(t)$ in the monitoring parameter history object, and may be positive or negative. The dynamic step length is a dynamic value calculated in real time in each security detection period from the dynamic average value of each monitoring parameter. A security evaluation index calculated with the dynamic step length can reflect the magnitude of the change in a monitoring parameter's value during the current security detection period relative to the global average of that monitoring parameter. In other embodiments of the present invention, a static step length of fixed size may instead be configured for each monitoring parameter and used to calculate its security evaluation index.
Preferably, after the step of updating the security evaluation index corresponding to each of the monitoring parameters in the security inspection object based on the monitoring parameters of the server hardware, the baseboard management subsystem is configured to: and writing each attribute value in the monitoring parameter real-time object into an attribute variable corresponding to the monitoring parameter history object.
In the technical solution of this embodiment, the values stored in the monitoring parameter real-time object in each security detection period serve as the historical values of the corresponding monitoring parameters in the monitoring parameter history object for the next security detection period, which reflects the change in each monitoring parameter's value relative to the previous security detection period.
Preferably, after the step of updating the security evaluation index corresponding to each of the monitoring parameters in the security inspection object based on the monitoring parameters of the server hardware, the baseboard management subsystem is configured to: and writing each attribute value in the monitoring parameter mean value object into an attribute variable corresponding to the monitoring parameter history object.
In the technical solution of this embodiment, the moving average value of the monitoring parameter stored in the monitoring parameter average value object is used as the historical value of each corresponding monitoring parameter in the monitoring parameter historical object of the next safety detection period, which can reflect the magnitude of the value of the monitoring parameter of each safety detection period relative to the global average value.
Preferably, the security detection class object further includes a counting variable $AI_i(t)$ for recording the number of periods in which the security evaluation index is continuously positive or continuously negative; after the step of updating the security assessment index corresponding to each monitored parameter in the security inspection object based on the monitored parameters of the server hardware, the baseboard management subsystem is configured to:
calculating the product of $I_i(t)$ and $I_i(t-1)$:
$MI_i(t) = I_i(t) \times I_i(t-1)$;
when $MI_i(t) \geq 0$, letting $AI_i(t) = AI_i(t-1) + 1$;
when $MI_i(t) < 0$, letting $AI_i(t) = 0$.
In the technical solution of the foregoing embodiment, initializing and assigning 0 to each attribute value of the security detection instance object further includes initializing and assigning 0 to a count variable corresponding to each security evaluation index.
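The counting rule above can be illustrated with a minimal Python sketch. The function name `update_streak` and the sample index values are hypothetical and chosen only for illustration; the rule itself ($MI_i(t) = I_i(t) \times I_i(t-1)$, increment on a non-negative product, reset otherwise) follows the text.

```python
def update_streak(index_now: float, index_prev: float, streak_prev: int) -> int:
    """Return AI_i(t) from I_i(t), I_i(t-1) and AI_i(t-1)."""
    product = index_now * index_prev  # MI_i(t)
    if product >= 0:
        # Same sign as the previous period (or one of the two is zero).
        return streak_prev + 1
    # Sign flipped: the run of same-signed periods is broken.
    return 0


# Hypothetical sequence of safety evaluation indices I_i(t).
indices = [0.0, 1.0, 2.0, 3.0, -1.0, -2.0]
streak = 0  # the counting variable is initialized to 0
streaks = []
for prev, now in zip(indices, indices[1:]):
    streak = update_streak(now, prev, streak)
    streaks.append(streak)

print(streaks)
```

Note that, as the rule is written, an index of exactly 0 yields a product of 0 and is therefore counted as a continuation of the current run.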
After the step of updating the security assessment index corresponding to each monitored parameter in the security inspection object based on the monitored parameters of the server hardware, the baseboard management subsystem is configured to:
judging whether the value of each monitoring parameter corresponding to the monitoring parameter real-time object exceeds a preset safety value range or not;
when the value of any monitoring parameter in the monitoring parameter real-time object exceeds the preset safety value range, resetting the corresponding counting variable $AI_i(t)$ to 0 and outputting a safety warning.
Preferably, in the step of evaluating the security of the server hardware according to the security evaluation index in the security detection instance object, the baseboard management subsystem is configured to:
according to the counting variable $AI_i(t)$, calculating a security score corresponding to the monitoring parameter:
wherein $A_{max}$ is a preconfigured maximum number of accumulation periods.
In the foregoing embodiment, after the step of updating the security evaluation index corresponding to each of the monitoring parameters in the security inspection object based on the monitoring parameters of the server hardware, the baseboard management subsystem is configured to: when $AI_i(t) > A_{max}$, reset the corresponding counting variable $AI_i(t)$ to 0 and output a safety warning.
As shown in fig. 1, a second aspect of the present invention proposes a security detection method of a server hardware monitoring system, including:
constructing a security detection class object of the server hardware, wherein the security detection class object comprises a security evaluation index for reflecting the security state of each monitoring parameter in the server hardware;
instantiating the security detection class object as a security detection instance object;
acquiring monitoring parameters of the server hardware in a preset data acquisition period;
Updating a security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware;
traversing the security evaluation index in the security detection instance object in a preset security detection period;
and evaluating the security of the server hardware according to the security evaluation index in the security detection instance object.
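As a rough illustration of how the six steps fit together, the following Python sketch wires them into one loop. It is a sketch under stated assumptions, not the patented implementation: the names `SecurityDetection`, `instantiate` and `run_cycles` are hypothetical, the update and evaluation rules are passed in as placeholder callables, and updating the index once per acquisition cycle is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SecurityDetection:
    """Security detection class object: one evaluation index per monitoring parameter."""
    indices: Dict[str, float] = field(default_factory=dict)


def instantiate(parameter_names: List[str]) -> SecurityDetection:
    # Instantiation initializes every attribute value to 0.
    return SecurityDetection({name: 0.0 for name in parameter_names})


def run_cycles(detector: SecurityDetection,
               read_sensors: Callable[[int], Dict[str, float]],
               update_index: Callable[[float, float], float],
               evaluate: Callable[[float], str],
               acquisitions_per_detection: int,
               total_acquisitions: int) -> List[Dict[str, str]]:
    reports = []
    for tick in range(1, total_acquisitions + 1):
        sample = read_sensors(tick)                      # acquire monitoring parameters
        for name, value in sample.items():               # update each evaluation index
            detector.indices[name] = update_index(detector.indices[name], value)
        if tick % acquisitions_per_detection == 0:       # security detection period elapsed
            reports.append({name: evaluate(idx)          # traverse and evaluate
                            for name, idx in detector.indices.items()})
    return reports


# Placeholder rules, purely illustrative.
detector = instantiate(["cpu_temp"])
reports = run_cycles(
    detector,
    read_sensors=lambda tick: {"cpu_temp": 40.0 + tick},
    update_index=lambda idx, value: idx + (1 if value > 42 else 0),
    evaluate=lambda idx: "ok" if idx < 3 else "warn",
    acquisitions_per_detection=3,
    total_acquisitions=6,
)
print(reports)
```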
Specifically, the business service subsystem is an in-band system and the baseboard management subsystem is an out-of-band system. The business service subsystem comprises a first processing unit, a first storage unit, a first power supply unit and a first communication unit. The baseboard management subsystem comprises a second processing unit, a second storage unit, a second power supply unit, a second communication unit and a sensing unit; the sensing unit is arranged on each device of the business service subsystem to acquire state data of that device. The security detection method of the server hardware monitoring system is implemented by the second processing unit of the baseboard management subsystem running a computer program stored in the second storage unit. The baseboard management subsystem also comprises a cache unit, which is a volatile storage unit, that is, the cache unit loses the data stored in it after power is cut off. In the technical solution of the foregoing embodiment, the step of instantiating the security detection class object as a security detection instance object specifically means instantiating the security detection class object to generate the security detection instance object in the cache unit. This step further comprises initializing the security detection instance object, that is, initializing each attribute value of the security detection instance object to 0.
In the technical scheme of the invention, the construction of the security detection class object can have various embodiments, firstly, the security detection class object is constructed by taking hardware as a unit, namely, a security detection class object is constructed for each hardware of the server, and a security evaluation index corresponding to each monitoring parameter of each hardware is used as an attribute value of the security detection class object, for example, the security detection class object of one in-band hard disk can contain the security evaluation index corresponding to the monitoring parameters such as temperature, voltage, current and the like; and secondly, constructing the safety detection class object by taking the monitoring parameter type as a unit, namely constructing a safety detection class object aiming at each monitoring parameter type of the server, wherein the same safety detection class object relates to the same type of monitoring parameters of a plurality of hardware in the server, and a safety evaluation index corresponding to each monitoring parameter of the type is used as an attribute value of the safety detection class object, for example, the temperature of each in-band processor, the temperature of each in-band memory, the temperature of each in-band hard disk and the like can be contained in the temperature safety detection class object. Of course, in another embodiment, a security detection class object containing all the monitoring parameters of the server may be constructed, but the security detection class object constructed in this embodiment has too many security evaluation indexes to be easily distinguished and used.
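The two construction schemes can be contrasted with a small Python sketch. The hardware names, parameter types and function names below are hypothetical; each resulting dictionary stands in for one security detection class object whose attribute values (the evaluation indices) start at 0.

```python
from collections import defaultdict

# Hypothetical (hardware, parameter type) pairs monitored on one server.
monitored = [
    ("disk0", "temperature"), ("disk0", "voltage"),
    ("cpu0", "temperature"), ("cpu0", "voltage"),
]


def group_by_hardware(params):
    """One detection object per hardware; attributes are its parameter types."""
    objects = defaultdict(dict)
    for hardware, kind in params:
        objects[hardware][kind] = 0.0
    return dict(objects)


def group_by_parameter_type(params):
    """One detection object per parameter type; attributes are the hardware items."""
    objects = defaultdict(dict)
    for hardware, kind in params:
        objects[kind][hardware] = 0.0
    return dict(objects)


by_hardware = group_by_hardware(monitored)
by_type = group_by_parameter_type(monitored)
print(by_hardware["disk0"])
print(by_type["temperature"])
```

Both groupings hold the same set of indices; they differ only in which key is traversed first during a security detection period.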
To maintain the timeliness of the server hardware monitoring system to server hardware status monitoring, the data acquisition period is typically configured to a relatively small value, such as a few seconds or even less than 1 second. The security detection period is greater than or equal to the data acquisition period, and preferably, in order to reduce the processing burden on the server hardware monitoring system, the security detection period may be configured to a relatively large value, such as a few minutes or even tens of minutes.
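The relationship between the two periods can be captured in a small configuration check. This is a minimal sketch; the class name `Periods` and the example values are assumptions, and only the constraint that the security detection period is not shorter than the data acquisition period comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Periods:
    acquisition_s: float  # data acquisition period, in seconds
    detection_s: float    # security detection period, in seconds

    def __post_init__(self):
        if self.acquisition_s <= 0:
            raise ValueError("acquisition period must be positive")
        if self.detection_s < self.acquisition_s:
            raise ValueError("detection period must be >= acquisition period")

    @property
    def samples_per_detection(self) -> int:
        """Number of complete acquisition cycles within one detection cycle."""
        return int(self.detection_s // self.acquisition_s)


# Example: sample every 2 seconds, run security detection every 5 minutes.
periods = Periods(acquisition_s=2.0, detection_s=300.0)
print(periods.samples_per_detection)
```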
Preferably, before the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period, the method further includes:
constructing a monitoring parameter class object of the server hardware;
instantiating the monitoring parameter class object into a monitoring parameter real-time object and a monitoring parameter historical object, wherein the monitoring parameter real-time object is used for storing the value of each monitoring parameter acquired last time in each data acquisition period, and the monitoring parameter historical object is used for storing the historical value of each monitoring parameter in each safety detection period;
the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period specifically comprises the following steps:
and writing the monitoring parameters of the server hardware into the monitoring parameter real-time object.
In the foregoing technical solution of the foregoing embodiment, the step of writing the monitoring parameter of the server hardware into the monitoring parameter real-time object specifically includes:
clearing data in the monitoring parameter real-time object;
and reassigning the real-time object of the monitoring parameters according to the collected monitoring parameters of the server hardware.
It should be noted that the monitoring parameter real-time object is a variable reused at high frequency: the monitoring parameters acquired in each data acquisition period are read and written through the instantiated monitoring parameter real-time object. Instantiating a new monitoring parameter real-time object in every data acquisition period to read and write the new data would occupy a large amount of cache, so the existing object is cleared and reassigned instead.
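The clear-then-reassign pattern can be sketched as follows. The class name `MonitoringParameters`, its methods and the sample values are hypothetical; the point illustrated is that a single instance is reused across acquisition cycles instead of allocating a new one each time.

```python
class MonitoringParameters:
    """Stand-in for an instance of the monitoring parameter class object."""

    def __init__(self, names):
        self.values = {name: 0.0 for name in names}

    def clear(self):
        # Reset every attribute in place; no new object is allocated.
        for name in self.values:
            self.values[name] = 0.0

    def assign(self, sample):
        # Overwrite every attribute with the newly collected value.
        for name in self.values:
            self.values[name] = sample[name]


realtime = MonitoringParameters(["cpu_temp", "fan_rpm"])
object_id = id(realtime)

# Hypothetical samples from three consecutive data acquisition periods.
samples = [
    {"cpu_temp": 41.0, "fan_rpm": 3000.0},
    {"cpu_temp": 43.5, "fan_rpm": 3200.0},
    {"cpu_temp": 42.0, "fan_rpm": 3100.0},
]
for sample in samples:
    realtime.clear()
    realtime.assign(sample)

print(realtime.values, id(realtime) == object_id)
```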
Preferably, after the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period, the method further includes:
Instantiating the monitoring parameter class object into a monitoring parameter average value object, wherein the monitoring parameter average value object is used for storing the average value of each monitoring parameter;
in each data acquisition period, carrying out average calculation on the numerical value of each monitoring parameter stored in the monitoring parameter real-time object and the current average value of each monitoring parameter stored in the monitoring parameter average value object to obtain a new average value;
and writing the calculated new average value into the monitoring parameter average value object.
Specifically, the monitoring parameter average object is the same as the monitoring parameter real-time object and the monitoring parameter history object, and is obtained by instantiation of the monitoring parameter class object, that is, the three object instances have the same object structure, the attribute values of the three object instances have one-to-one correspondence, and each attribute value corresponds to one monitoring parameter of the server hardware.
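A minimal sketch of three instances sharing one structure is given below. The class and attribute names are hypothetical examples; the text only states that the real-time, history and mean objects are instantiated from the same monitoring parameter class object and therefore have attributes in one-to-one correspondence.

```python
from dataclasses import dataclass, fields


@dataclass
class MonitoringParameters:
    # Each attribute corresponds to one monitoring parameter of the server hardware.
    cpu_temp: float = 0.0
    disk_temp: float = 0.0
    supply_voltage: float = 0.0


realtime = MonitoringParameters()  # latest collected values
history = MonitoringParameters()   # historical values per security detection period
mean = MonitoringParameters()      # running mean values

attribute_names = [f.name for f in fields(MonitoringParameters)]

# Writing to one instance leaves the other two untouched.
realtime.cpu_temp = 55.0
print(attribute_names, history.cpu_temp, mean.cpu_temp)
```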
Preferably, the step of calculating the average value of each monitoring parameter stored in the real-time object of the monitoring parameter and the current average value of each monitoring parameter stored in the average object of the monitoring parameter to obtain a new average value in each data acquisition period specifically includes:
Sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and the corresponding attribute value in the monitoring parameter mean object, where $i \in [1, n_p]$, $n_p$ is the number of monitoring parameters in the monitoring parameter class object, and $t$ is the current data acquisition cycle number;
judging whether [condition formula missing in source] holds;
when [condition formula missing in source], letting [assignment formula missing in source];
when [condition formula missing in source], letting [assignment formula missing in source].
In the technical solutions of the above embodiments, $i \leq n_p$, and for the same value of $i$, $pn_i(t)$ and the corresponding mean value refer to the same monitoring parameter. It should be noted that the cache unit does not hold $pn_i(t)$ and the mean values of all data acquisition cycles; preferably, in the technical solution of the present invention, only $pn_i(t)$ and the mean value of the most recent data acquisition cycle are stored in the cache unit. In the technical solution of the above embodiment, the mean assignment is in practice an iterative calculation on a single variable: in each data acquisition period, the value currently stored in the variable is used in the calculation and the result is then assigned back to the same variable.
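The exact condition and assignment formulas are not recoverable from this text, so the following Python sketch rests on an explicit assumption: the first cycle seeds the mean with the first sample, and every later cycle averages the stored mean with the new sample. The function name `update_mean` is hypothetical. What the sketch does reflect from the text is that only one stored value per parameter is kept and overwritten in place each cycle.

```python
def update_mean(current_mean: float, sample: float, cycle: int) -> float:
    """Iteratively update a stored mean from the newest sample.

    ASSUMPTION: cycle 1 seeds the mean; later cycles use the pairwise
    average of the stored mean and the new sample. The source formulas
    are missing, so this rule is illustrative only.
    """
    if cycle == 1:
        return sample
    return (current_mean + sample) / 2


mean_value = 0.0  # initialized to 0 on instantiation
means = []
for cycle, sample in enumerate([40.0, 44.0, 42.0], start=1):
    mean_value = update_mean(mean_value, sample, cycle)  # same variable reassigned
    means.append(mean_value)

print(means)
```

A cumulative mean, `current_mean + (sample - current_mean) / cycle`, would be an alternative reading that likewise needs only the stored mean and the cycle number.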
Preferably, the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection object based on the monitoring parameter of the server hardware specifically includes:
Sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and each attribute value $ph_i(t)$ in the monitoring parameter history object;
Calculating the dynamic step length of each monitoring parameter according to the average value of each monitoring parameter in the monitoring parameter average value object:
wherein σ is a dynamic step size division coefficient of each monitoring parameter and σ >1;
updating the safety evaluation index of each monitoring parameter according to the dynamic step length:
Specifically, in the foregoing embodiment, the dynamic step size division coefficient σ is configured as a positive integer greater than 1. The increment of the safety evaluation index of a monitoring parameter is determined according to the magnitudes of the attribute value $pn_i(t)$ in the monitoring parameter real-time object and the attribute value $ph_i(t)$ in the monitoring parameter history object, and may be positive or negative. The dynamic step size is a dynamic value calculated in real time in each safety detection period from the dynamic average value of each monitoring parameter. A safety evaluation index calculated with the dynamic step size can therefore reflect the magnitude of the change in the monitoring parameter during the current safety detection period relative to the global average of that parameter. In other embodiments of the present invention, a static step size of fixed size may instead be configured for each monitoring parameter to calculate its safety evaluation index.
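Because the step size and index update formulas are not reproduced in this text, the sketch below is only one plausible reading and should be treated as an assumption: the dynamic step is taken as the mean divided by σ, and the index increment is the change between the real-time and historical values expressed in units of that step. The function names are hypothetical.

```python
def dynamic_step(mean: float, sigma: int) -> float:
    """ASSUMED form of the dynamic step size: mean divided by sigma (sigma > 1)."""
    if sigma <= 1:
        raise ValueError("sigma must be greater than 1")
    return mean / sigma


def update_index(index: float, realtime: float, history: float,
                 mean: float, sigma: int) -> float:
    """ASSUMED update: add the change (realtime - history) measured in steps.

    The increment is positive when the parameter rose and negative when it fell.
    """
    step = dynamic_step(mean, sigma)
    if step == 0:
        return index  # no meaningful scale yet; leave the index unchanged
    return index + (realtime - history) / step


# Mean 40 and sigma 4 give a step of 10; a rise of 5 adds half a step.
index = update_index(0.0, realtime=45.0, history=40.0, mean=40.0, sigma=4)
print(index)
# A later fall of 10 subtracts one full step.
index = update_index(index, realtime=35.0, history=45.0, mean=40.0, sigma=4)
print(index)
```

A variant that adds or subtracts exactly one step depending on the sign of the change would be equally consistent with the surviving description.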
Preferably, after the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection object based on the monitoring parameter of the server hardware, each attribute value in the monitoring parameter real-time object is written into the attribute variable corresponding to the monitoring parameter history object.
In the technical solution of this embodiment, the value of the monitoring parameter stored in the real-time object of one monitoring parameter in each safety detection period is used as the historical value of each corresponding monitoring parameter in the historical object of the monitoring parameter in the next safety detection period, which can reflect the change of the value of the monitoring parameter in each safety detection period relative to the previous safety detection period.
Preferably, after the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection object based on the monitoring parameter of the server hardware, each attribute value in the monitoring parameter mean object is written into the attribute variable corresponding to the monitoring parameter history object.
In the technical solution of this embodiment, the moving average value of the monitoring parameter stored in the monitoring parameter average value object is used as the historical value of each corresponding monitoring parameter in the monitoring parameter historical object of the next safety detection period, which can reflect the magnitude of the value of the monitoring parameter of each safety detection period relative to the global average value.
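The two alternative ways of refreshing the history object can be sketched with one helper. The function name `roll_history` and the `use_mean` flag are hypothetical; plain dictionaries stand in for the three monitoring parameter instances.

```python
def roll_history(history: dict, realtime: dict, mean: dict, use_mean: bool = False) -> None:
    """Copy either the real-time values or the mean values into the history object.

    use_mean=False: the next period compares against the previous period's values.
    use_mean=True:  the next period compares against the running mean.
    """
    source = mean if use_mean else realtime
    for name in history:
        history[name] = source[name]


realtime = {"cpu_temp": 47.0}
mean = {"cpu_temp": 42.0}

history_a = {"cpu_temp": 0.0}
roll_history(history_a, realtime, mean)                 # period-over-period baseline

history_b = {"cpu_temp": 0.0}
roll_history(history_b, realtime, mean, use_mean=True)  # global-mean baseline

print(history_a, history_b)
```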
Preferably, the security detection class object further includes a counting variable $AI_i(t)$ for recording the number of periods in which the security evaluation index is continuously positive or continuously negative; after the step of updating the security evaluation index corresponding to each of the monitored parameters in the security detection object based on the monitored parameters of the server hardware, the method further comprises:
calculating the product of $I_i(t)$ and $I_i(t-1)$:
$MI_i(t) = I_i(t) \times I_i(t-1)$;
when $MI_i(t) \geq 0$, letting $AI_i(t) = AI_i(t-1) + 1$;
when $MI_i(t) < 0$, letting $AI_i(t) = 0$.
In the technical solution of the foregoing embodiment, initializing and assigning 0 to each attribute value of the security detection instance object further includes initializing and assigning 0 to a count variable corresponding to each security evaluation index.
After the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection object based on the monitoring parameter of the server hardware, the method further comprises:
judging whether the value of each monitoring parameter corresponding to the monitoring parameter real-time object exceeds a preset safety value range or not;
when the value of any monitoring parameter in the monitoring parameter real-time object exceeds the preset safety value range, resetting the corresponding counting variable $AI_i(t)$ to 0 and outputting a safety warning.
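The range check can be sketched as below. The function name `check_ranges`, the parameter names and the range values are hypothetical; the behavior (reset the counting variable and emit a warning when a value leaves its preset range) follows the two steps above.

```python
def check_ranges(realtime: dict, safe_ranges: dict, streaks: dict) -> list:
    """Reset the counter and collect a warning for every out-of-range parameter."""
    warnings = []
    for name, value in realtime.items():
        low, high = safe_ranges[name]
        if not (low <= value <= high):
            streaks[name] = 0  # reset the counting variable AI_i(t)
            warnings.append(f"{name}={value} outside [{low}, {high}]")
    return warnings


realtime = {"cpu_temp": 96.0, "supply_voltage": 12.1}
safe_ranges = {"cpu_temp": (5.0, 90.0), "supply_voltage": (11.4, 12.6)}
streaks = {"cpu_temp": 7, "supply_voltage": 3}

warnings = check_ranges(realtime, safe_ranges, streaks)
print(warnings, streaks)
```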
Preferably, the step of evaluating the security of the server hardware according to the security evaluation index in the security detection instance object specifically includes:
According to the counting variable $AI_i(t)$, calculating a security score corresponding to the monitoring parameter:
wherein $A_{max}$ is a preconfigured maximum number of accumulation periods.
In the foregoing embodiment, after the step of updating the security evaluation index corresponding to each of the monitoring parameters in the security detection object based on the monitoring parameters of the server hardware, the method further includes: when $AI_i(t) > A_{max}$, resetting the corresponding counting variable $AI_i(t)$ to 0 and outputting a safety warning.
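The cap on the counting variable can be sketched as follows. The function name `enforce_max_streak` and the sample values are hypothetical. The security score formula itself is not reproduced in this text and is therefore not modeled here; only the reset-and-warn rule for $AI_i(t) > A_{max}$ is shown.

```python
def enforce_max_streak(streaks: dict, a_max: int) -> list:
    """Reset any counter that exceeds a_max and return the names that triggered a warning."""
    warned = []
    for name, count in streaks.items():
        if count > a_max:
            streaks[name] = 0  # reset AI_i(t)
            warned.append(name)
    return warned


streaks = {"cpu_temp": 13, "disk_temp": 12, "fan_rpm": 4}
warned = enforce_max_streak(streaks, a_max=12)
print(warned, streaks)
```

The comparison is strict, so a counter equal to $A_{max}$ is left unchanged.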
It should be noted that in this document relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Embodiments in accordance with the present invention, as described above, are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention and various modifications as are suited to the particular use contemplated. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A server hardware monitoring system comprising a business service subsystem for running a business service program and a baseboard management subsystem for monitoring the target server, the baseboard management subsystem configured to:
constructing a security detection class object of the server hardware, wherein the security detection class object comprises a security evaluation index for reflecting the security state of each monitoring parameter in the server hardware;
instantiating the security detection class object as a security detection instance object;
acquiring monitoring parameters of the server hardware in a preset data acquisition period;
Updating a security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware;
traversing the security evaluation index in the security detection instance object in a preset security detection period;
and evaluating the security of the server hardware according to the security evaluation index in the security detection instance object.
2. A security detection method for a server hardware monitoring system, comprising:
constructing a security detection class object of the server hardware, wherein the security detection class object comprises a security evaluation index for reflecting the security state of each monitoring parameter in the server hardware;
instantiating the security detection class object as a security detection instance object;
acquiring monitoring parameters of the server hardware in a preset data acquisition period;
updating a security evaluation index corresponding to each monitoring parameter in the security detection instance object based on the monitoring parameters of the server hardware;
traversing the security evaluation index in the security detection instance object in a preset security detection period;
and evaluating the security of the server hardware according to the security evaluation index in the security detection instance object.
3. The security detection method according to claim 2, further comprising, before the step of acquiring the monitoring parameters of the server hardware at a preset data acquisition period:
constructing a monitoring parameter class object of the server hardware;
instantiating the monitoring parameter class object into a monitoring parameter real-time object and a monitoring parameter historical object, wherein the monitoring parameter real-time object is used for storing the value of each monitoring parameter acquired last time in each data acquisition period, and the monitoring parameter historical object is used for storing the historical value of each monitoring parameter in each safety detection period;
the step of acquiring the monitoring parameters of the server hardware in a preset data acquisition period specifically comprises the following steps:
and writing the monitoring parameters of the server hardware into the monitoring parameter real-time object.
4. A security detection method according to claim 3, further comprising, after the step of acquiring the monitoring parameters of the server hardware at a preset data acquisition period:
instantiating the monitoring parameter class object into a monitoring parameter average value object, wherein the monitoring parameter average value object is used for storing the average value of each monitoring parameter;
In each data acquisition period, carrying out average calculation on the numerical value of each monitoring parameter stored in the monitoring parameter real-time object and the current average value of each monitoring parameter stored in the monitoring parameter average value object to obtain a new average value;
and writing the calculated new average value into the monitoring parameter average value object.
5. The method according to claim 4, wherein the step of averaging the value of each monitoring parameter stored in the real-time object of the monitoring parameter with the current average value of each monitoring parameter stored in the average object of the monitoring parameter to obtain a new average value in each data acquisition period comprises:
sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and the corresponding attribute value in the monitoring parameter mean object, where $i \in [1, n_p]$, $n_p$ is the number of monitoring parameters in the monitoring parameter class object, and $t$ is the current data acquisition cycle number;
judging whether [condition formula missing in source] holds;
when [condition formula missing in source], letting [assignment formula missing in source];
when [condition formula missing in source], letting [assignment formula missing in source].
6. The method according to claim 5, wherein the step of updating the security evaluation index corresponding to each of the monitored parameters in the security detection object based on the monitored parameters of the server hardware specifically comprises:
sequentially reading each attribute value $pn_i(t)$ in the monitoring parameter real-time object and each attribute value $ph_i(t)$ in the monitoring parameter history object;
Calculating the dynamic step length of each monitoring parameter according to the average value of each monitoring parameter in the monitoring parameter average value object:
wherein σ is a dynamic step size division coefficient of each monitoring parameter and σ >1;
updating the safety evaluation index of each monitoring parameter according to the dynamic step length:
7. The security detection method according to claim 6, further comprising writing each attribute value in the real-time object of the monitored parameter into an attribute variable corresponding to the history object of the monitored parameter after the step of updating the security evaluation index corresponding to each monitored parameter in the security detection object based on the monitored parameter of the server hardware.
8. The security detection method according to claim 6, further comprising writing each attribute value in the monitoring parameter mean object into an attribute variable corresponding to the monitoring parameter history object after the step of updating the security evaluation index corresponding to each monitoring parameter in the security detection object based on the monitoring parameter of the server hardware.
9. The security detection method according to claim 6, wherein the security detection class object further includes a counting variable $AI_i(t)$ for recording the number of periods in which the security evaluation index is continuously positive or continuously negative, and after the step of updating the security evaluation index corresponding to each of the monitored parameters in the security detection object based on the monitored parameters of the server hardware, the method further comprises:
calculating the product of $I_i(t)$ and $I_i(t-1)$:
$MI_i(t) = I_i(t) \times I_i(t-1)$;
when $MI_i(t) \geq 0$, letting $AI_i(t) = AI_i(t-1) + 1$;
when $MI_i(t) < 0$, letting $AI_i(t) = 0$.
10. The method according to claim 9, wherein the step of evaluating the security of the server hardware according to the security evaluation index in the security detection instance object specifically comprises:
according to the counting variable $AI_i(t)$, calculating a security score corresponding to the monitoring parameter:
wherein $A_{max}$ is a preconfigured maximum number of accumulation periods.
CN202311454121.9A 2023-11-02 2023-11-02 Server hardware monitoring system and safety detection method thereof Withdrawn CN117453507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311454121.9A CN117453507A (en) 2023-11-02 2023-11-02 Server hardware monitoring system and safety detection method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311454121.9A CN117453507A (en) 2023-11-02 2023-11-02 Server hardware monitoring system and safety detection method thereof

Publications (1)

Publication Number Publication Date
CN117453507A true CN117453507A (en) 2024-01-26

Family

ID=89579675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311454121.9A Withdrawn CN117453507A (en) 2023-11-02 2023-11-02 Server hardware monitoring system and safety detection method thereof

Country Status (1)

Country Link
CN (1) CN117453507A (en)

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20240126