CN116450463A - Processing method for monitoring server hardware - Google Patents
Processing method for monitoring server hardware
- Publication number
- CN116450463A CN202310488972.9A
- Authority
- CN
- China
- Prior art keywords
- server
- information
- state information
- state
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3024—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Power Sources (AREA)
Abstract
An embodiment of the invention relates to a processing method for monitoring server hardware, comprising the following steps: a monitoring server periodically sends a first device query instruction to each first server and records the sending time of the instruction as a first server time; the monitoring server receives first device response data returned by each first server and records the receiving time of the data as a second server time; the first server time, the second server time and the first device response data form a first record, which is stored in a first record list; the running state of the server hardware is analyzed according to the latest first record in the first record list; and the running risk of the server hardware is predicted according to all first records in the first record list that fall within the most recent specified time period. The method makes it possible to monitor the running state of a monitored server in real time and to predict its risk.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a processing method for monitoring server hardware.
Background
With the development of informatization, information networks are now used across many industries. To keep each network running stably and effectively, an operation and maintenance monitoring scheme is conventionally configured for it, and such schemes currently focus on monitoring the traffic of the servers in the network. In practice this conventional approach has a number of problems, one of which is that the other hardware states of a server are not monitored, so the monitoring platform cannot monitor the server comprehensively.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a processing method, an electronic device and a computer-readable storage medium for monitoring server hardware, in which a monitoring server periodically collects the time, CPU (central processing unit) state, memory state, hard disk state, power state, fan state and slot parameter set of each monitored server, analyzes the server's running state in real time according to the latest collection result, and predicts running risk from the historical collection results using an artificial intelligence model. The invention remedies the incomplete monitoring of server hardware in the conventional scheme, and allows the running state of any server to be monitored in real time and its risk to be predicted.
To achieve the above object, a first aspect of an embodiment of the present invention provides a processing method for monitoring server hardware, where the method includes:
the monitoring server periodically sends a first device query instruction to each first server; records the sending time of the first device query instruction as the corresponding first server time; receives the first device response data returned by each first server; records the receiving time of the first device response data as the corresponding second server time; and forms a corresponding first record from the first server time, the second server time and the first device response data and stores it in a corresponding first record list;
analyzing the running state of the server hardware according to the latest first record in the first record list to generate a corresponding first analysis result and displaying the first analysis result;
and predicting the running risk of the server hardware according to all the first records in the first record list within the most recent specified time period, to generate and display a corresponding first prediction result.
Preferably, the monitoring server sends instructions to each first server and receives response data from each first server based on the SNMP protocol.
Preferably, the first record list includes a plurality of the first records; the first record includes the first server time, the second server time, and the first device response data; the first device response data includes a first server IP address, a first server name, a first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information, and a first slot parameter set; the first server type includes blade, tower and desktop; the first CPU state information comprises a CPU model, a CPU parameter, a CPU temperature and a CPU utilization rate; the first memory state information comprises a plurality of pieces of first memory bank information, and the first memory bank information comprises a memory bank interface, a memory bank capacity and a memory bank utilization rate; the first hard disk state information comprises a plurality of pieces of first hard disk information, and the first hard disk information comprises a hard disk interface, a hard disk type, a hard disk total capacity and a hard disk residual capacity; the first power state information comprises a power supply mode, a power supply state and a battery power, wherein the power supply mode comprises a battery power supply mode and an alternating current power supply mode, and the power supply state comprises a normal state and an abnormal state; the first fan state information comprises a fan model, fan power and a fan state, and the fan state comprises a normal state and an abnormal state; the first slot parameter set includes a plurality of first slot parameters, each first slot parameter including a slot type and a slot state, and the slot state including an occupied state and an idle state.
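The record layout described in this paragraph can be sketched as plain data classes. This is only an illustrative sketch: the class names, field names, units and the use of fractions in [0, 1] for utilization rates are assumptions made for the example and are not specified by the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CpuState:
    model: str
    parameter: str
    temperature: float  # unit assumed to be degrees Celsius
    usage: float        # assumed to be a fraction in [0, 1]


@dataclass
class MemoryModule:
    interface: str
    capacity: float
    usage: float        # assumed to be a fraction in [0, 1]


@dataclass
class HardDisk:
    interface: str
    disk_type: str
    total_capacity: float
    remaining_capacity: float


@dataclass
class PowerState:
    mode: str           # "battery" or "ac"
    state: str          # "normal" or "abnormal"
    battery_level: float


@dataclass
class FanState:
    model: str
    power: float
    state: str          # "normal" or "abnormal"


@dataclass
class SlotParameter:
    slot_type: str
    state: str          # "occupied" or "idle"


@dataclass
class DeviceResponse:
    ip_address: str
    name: str
    server_type: str    # "blade", "tower" or "desktop"
    cpu: CpuState
    memory: List[MemoryModule]
    disks: List[HardDisk]
    power: PowerState
    fan: FanState
    slots: List[SlotParameter]


@dataclass
class Record:
    first_server_time: float   # time the query instruction was sent
    second_server_time: float  # time the response data was received
    response: DeviceResponse
```

A first record list is then simply a `List[Record]` kept per monitored server.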
Preferably, the method further comprises:
when the first server receives the first device query instruction sent by the monitoring server, locally obtaining a preset server IP address, server name and server type as the corresponding first server IP address, first server name and first server type; collecting the CPU information of the local server to obtain the corresponding first CPU state information; collecting the memory bank information of the local server to obtain the corresponding first memory state information; collecting the hard disk information of the local server to obtain the corresponding first hard disk state information; collecting the power supply information of the local server to obtain the corresponding first power state information; collecting the fan information of the local server to obtain the corresponding first fan state information; collecting the slot parameters of the local server to obtain the corresponding first slot parameter set; and forming the corresponding first device response data from the obtained first server IP address, first server name, first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information and first slot parameter set, and returning it to the monitoring server.
Preferably, the analyzing the running state of the server hardware according to the latest first record in the first record list to generate and display a corresponding first analysis result specifically includes:
extracting the first record with the latest time from the first record list as the corresponding current record; extracting the first server IP address, the first server name, the first server type, the first CPU state information, the first memory state information, the first hard disk state information, the first power state information, the first fan state information and the first slot parameter set of the current record as the corresponding current server IP address, current server name, current server type, current CPU state information, current memory state information, current hard disk state information, current power state information, current fan state information and current slot parameter set;
identifying whether the CPU utilization rate of the current CPU state information exceeds a preset CPU utilization rate warning threshold; if yes, setting the corresponding first information as preset CPU resource occupation excessive alarm information; if not, setting the corresponding first information to be empty;
Identifying whether the CPU temperature of the current CPU state information exceeds a preset CPU temperature warning threshold; if yes, setting the corresponding second information as preset CPU temperature higher alarm information; if not, setting the corresponding second information to be empty;
calculating the average value of all the memory bank utilization rates of the current memory state information to obtain corresponding average memory utilization rates; identifying whether the average memory usage exceeds a preset memory usage warning threshold; if yes, setting the corresponding third information as preset excessive memory resource occupation alarm information; if not, setting the corresponding third information to be empty;
summing all the hard disk total capacities of the current hard disk state information to obtain a corresponding first total amount; summing all the hard disk residual capacities of the current hard disk state information to obtain a corresponding second total amount; calculating the corresponding hard disk residual percentage = [(first total amount - second total amount)/first total amount] × 100% according to the first total amount and the second total amount; identifying whether the hard disk residual percentage is lower than a preset hard disk residual percentage warning threshold value; if yes, setting the corresponding fourth information as preset hard disk resource occupation excessive alarm information; if not, setting the corresponding fourth information to be null;
Identifying whether the battery power of the current power state information is lower than a preset battery power warning threshold; if yes, setting the corresponding fifth information as preset electric quantity shortage alarm information; if not, setting the corresponding fifth information to be null;
identifying whether the fan state of the current fan state information is an abnormal state; if yes, setting the corresponding sixth information as preset abnormal fan alarm information; if not, setting the corresponding sixth information to be null;
identifying whether the fan power of the current fan state information exceeds a preset fan power warning threshold; if yes, setting the corresponding seventh information as preset high power consumption warning information of the fan; if not, setting the corresponding seventh information to be empty;
counting the number of the first slot parameters in the current slot parameter set, wherein the slot state is the occupied state, so as to obtain a corresponding first number, counting the number of the first slot parameters in the current slot parameter set, so as to obtain a corresponding second number, and calculating the corresponding slot utilization rate= (first number/second number) ×100% according to the first number and the second number; identifying whether the slot utilization rate exceeds a preset slot utilization rate warning threshold value; if yes, setting the corresponding eighth information as preset slot resource deficiency alarm information; if not, setting the corresponding eighth information to be null;
Identifying whether the first, second, third, fourth, fifth, sixth, seventh and eighth information is all empty; if yes, setting the corresponding first analysis information as preset normal information of the running state of the server; if not, the first analysis information corresponding to the first, second, third, fourth, fifth, sixth, seventh and eighth information is formed;
and the corresponding first analysis result is formed by the current server IP address, the current server name, the current server type and the first analysis information and displayed.
Preferably, the predicting the running risk of the server hardware according to all the first records in the last specified period in the first record list to generate and display a corresponding first prediction result specifically includes:
extracting the first server IP address, the first server name and the first server type of any one of the first records in the first record list as the corresponding current server IP address, current server name and current server type;
extracting all the first records in the first record list within the most recent specified time period, and sorting them in chronological order to generate a corresponding first record sequence;
in the first record sequence, extracting the CPU temperature and the CPU utilization rate of each first record to form a corresponding first CPU feature; extracting the memory bank capacity and memory bank utilization rate of each piece of first memory bank information in the first memory state information of each first record to form a corresponding first memory bank feature, and forming a corresponding first memory feature from all the obtained first memory bank features; extracting the hard disk total capacity and hard disk residual capacity of each piece of first hard disk information in the first hard disk state information of each first record to form a corresponding first hard disk feature, and forming a corresponding first storage feature from all the obtained first hard disk features; extracting the first power state information of each first record to form a corresponding first power feature; extracting the fan power and the fan state of the first fan state information of each first record to form a corresponding first fan feature; extracting each slot state in the first slot parameter set of each first record to form a corresponding first slot feature, and forming a corresponding second slot feature from all the obtained first slot features; forming a corresponding first feature vector from the first CPU feature, the first memory feature, the first storage feature, the first power feature, the first fan feature and the second slot feature corresponding to each first record; and forming a corresponding first feature tensor from all the obtained first feature vectors;
Inputting the first characteristic tensor into a preset running risk classification prediction model to perform running risk classification prediction processing to obtain a corresponding first prediction vector; the first prediction vector includes a plurality of first classification probabilities; each first classification probability corresponds to a preset risk type;
forming corresponding first-type prediction information by each first classification probability and the corresponding risk type; and the corresponding first prediction results are formed and displayed by all the obtained first type of prediction information.
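The last step, pairing each first classification probability with its preset risk type, can be illustrated with a short sketch. The risk type names below are hypothetical placeholders (the source does not enumerate them), and the prediction vector is assumed to be an ordered sequence of probabilities as it would be returned by whatever classification model is used.

```python
from typing import List, Sequence, Tuple

# Hypothetical risk types; the source only says they are preset.
RISK_TYPES: List[str] = ["cpu_overload", "overheating", "disk_exhaustion"]


def build_prediction_result(
    prediction_vector: Sequence[float],
    risk_types: Sequence[str] = RISK_TYPES,
) -> List[Tuple[str, float]]:
    """Pair each classification probability with its risk type.

    Each (risk type, probability) pair plays the role of one piece of
    first-type prediction information; the returned list plays the role
    of the first prediction result.
    """
    if len(prediction_vector) != len(risk_types):
        raise ValueError("one probability is required per risk type")
    return [(risk, float(p)) for risk, p in zip(risk_types, prediction_vector)]
```

For display, the pairs could be sorted by probability so that the most likely risk is shown first; that ordering is a presentation choice, not something the source prescribes.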
A second aspect of an embodiment of the present invention provides an electronic device, including: memory, processor, and transceiver;
the processor is coupled to the memory, and reads and executes the instructions in the memory to implement the method of the first aspect;
the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.
The embodiments of the invention provide a processing method, an electronic device and a computer-readable storage medium for monitoring server hardware, in which a monitoring server periodically collects the time, CPU (central processing unit) state, memory state, hard disk state, power state, fan state and slot parameter set of each monitored server, analyzes the server's running state in real time according to the latest collection result, and predicts running risk from the historical collection results using an artificial intelligence model. The invention allows the running state of any server to be monitored in real time and its risk to be predicted, thereby remedying the incomplete monitoring of server hardware in the conventional scheme.
Drawings
FIG. 1 is a schematic diagram of a processing method for monitoring server hardware according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic diagram of a processing method for monitoring server hardware according to a first embodiment of the present invention, where as shown in fig. 1, the method mainly includes the following steps:
Step 1, a monitoring server periodically sends a first device query instruction to each first server; records the sending time of the first device query instruction as the corresponding first server time; receives the first device response data returned by each first server; records the receiving time of the first device response data as the corresponding second server time; and forms a corresponding first record from the first server time, the second server time and the first device response data and stores it in a corresponding first record list;
the monitoring server sends instructions to each first server based on an SNMP protocol and receives response data of each first server;
the first record list includes a plurality of first records; each first record includes a first server time, a second server time and first device response data; the first device response data includes a first server IP address, a first server name, a first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information, and a first slot parameter set; the first server type includes blade, tower, and desktop; the first CPU state information comprises a CPU model, a CPU parameter, a CPU temperature and a CPU utilization rate; the first memory state information comprises a plurality of pieces of first memory bank information, and the first memory bank information comprises a memory bank interface, a memory bank capacity and a memory bank utilization rate; the first hard disk state information comprises a plurality of pieces of first hard disk information, wherein the first hard disk information comprises a hard disk interface, a hard disk type, a hard disk total capacity and a hard disk residual capacity; the first power state information comprises a power supply mode, a power supply state and a battery power, wherein the power supply mode comprises a battery power supply mode and an alternating current power supply mode, and the power supply state comprises a normal state and an abnormal state; the first fan state information comprises a fan model, fan power and a fan state, and the fan state comprises a normal state and an abnormal state; the first slot parameter set includes a plurality of first slot parameters, the first slot parameters including a slot type and a slot state, the slot state including an occupied state and an idle state.
It should be noted that, when the first server receives the first device query instruction sent by the monitoring server, it performs the following processing:
locally obtaining a preset server IP address, server name and server type as the corresponding first server IP address, first server name and first server type; collecting the CPU information of the local server to obtain the corresponding first CPU state information; collecting the memory bank information of the local server to obtain the corresponding first memory state information; collecting the hard disk information of the local server to obtain the corresponding first hard disk state information; collecting the power supply information of the local server to obtain the corresponding first power state information; collecting the fan information of the local server to obtain the corresponding first fan state information; collecting the slot parameters of the local server to obtain the corresponding first slot parameter set; and forming the corresponding first device response data from the obtained first server IP address, first server name, first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information and first slot parameter set, and returning it to the monitoring server.
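One polling round of step 1 can be sketched as follows. The sketch is transport-agnostic: `query` is a hypothetical callable standing in for the SNMP request and response exchange, the `clock` parameter and the tuple layout of a record are assumptions made for the example, and a real deployment would call `poll_once` on a timer to obtain the periodic behaviour.

```python
import time
from typing import Callable, Dict, List, Tuple

# (first server time, second server time, first device response data)
Record = Tuple[float, float, dict]


def poll_once(
    servers: List[str],
    query: Callable[[str], dict],
    record_lists: Dict[str, List[Record]],
    clock: Callable[[], float] = time.time,
) -> None:
    """Query every monitored server once and append one record per server."""
    for address in servers:
        sent_at = clock()          # instruction sending time -> first server time
        response = query(address)  # hypothetical stand-in for the SNMP exchange
        received_at = clock()      # data receiving time -> second server time
        record_lists.setdefault(address, []).append((sent_at, received_at, response))
```

Keeping one record list per server address matches the "corresponding first record list" wording; passing the clock in makes the timing behaviour easy to test.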
Step 2, analyzing the running state of the server hardware according to the latest first record in the first record list to generate a corresponding first analysis result and displaying the first analysis result;
the method specifically comprises the following steps: step 2a, extracting the first record with the latest time from the first record list as the corresponding current record; extracting the first server IP address, first server name, first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information and first slot parameter set of the current record as the corresponding current server IP address, current server name, current server type, current CPU state information, current memory state information, current hard disk state information, current power state information, current fan state information and current slot parameter set;
step 2b, identifying whether the CPU utilization rate of the current CPU state information exceeds a preset CPU utilization rate warning threshold value; if yes, setting the corresponding first information as preset CPU resource occupation excessive alarm information; if not, setting the corresponding first information to be empty;
here, the CPU usage alert threshold is a preset proportional threshold constant with a value in the range [0, 1];
Step 2c, identifying whether the CPU temperature of the current CPU state information exceeds a preset CPU temperature warning threshold; if yes, setting the corresponding second information as preset CPU temperature higher alarm information; if not, setting the corresponding second information to be empty;
here, the CPU temperature alert threshold is a preset temperature threshold constant;
step 2d, performing average calculation on all memory bank utilization rates of the current memory state information to obtain corresponding average memory utilization rates; identifying whether the average memory usage exceeds a preset memory usage warning threshold; if yes, setting the corresponding third information as preset excessive memory resource occupation alarm information; if not, setting the corresponding third information to be null;
here, the memory usage alert threshold is a preset proportional threshold constant with a value in the range [0, 1];
step 2e, calculating the sum of all the hard disk total capacities of the current hard disk state information to obtain a corresponding first total amount; calculating the sum of all the hard disk residual capacities of the current hard disk state information to obtain a corresponding second total amount; calculating the corresponding hard disk residual percentage according to the first total amount and the second total amount, where hard disk residual percentage = [(first total amount - second total amount)/first total amount] × 100%; identifying whether the hard disk residual percentage is lower than a preset hard disk residual percentage warning threshold value; if yes, setting the corresponding fourth information as preset hard disk resource occupation excessive alarm information; if not, setting the corresponding fourth information to be null;
here, the hard disk residual percentage warning threshold is a preset proportional threshold constant with a value in the range [0, 1];
step 2f, identifying whether the battery power of the current power supply state information is lower than a preset battery power warning threshold; if yes, setting the corresponding fifth information as preset electric quantity shortage alarm information; if not, setting the corresponding fifth information to be null;
here, the battery power alert threshold is a preset battery power constant;
step 2g, identifying whether the fan state of the current fan state information is an abnormal state; if yes, setting the corresponding sixth information as preset abnormal fan alarm information; if not, setting the corresponding sixth information to be null;
step 2h, identifying whether the fan power of the current fan state information exceeds a preset fan power warning threshold; if yes, setting the corresponding seventh information as preset high power consumption warning information of the fan; if not, setting the corresponding seventh information to be null;
here, the fan power alert threshold is a preset power threshold constant;
step 2i, counting the number of first slot parameters in the current slot parameter set whose slot state is the occupied state to obtain a corresponding first number, counting the number of first slot parameters in the current slot parameter set to obtain a corresponding second number, and calculating the corresponding slot utilization rate = (first number/second number) × 100% according to the first number and the second number; identifying whether the slot utilization rate exceeds a preset slot utilization rate warning threshold value; if yes, setting the corresponding eighth information as preset slot resource deficiency alarm information; if not, setting the corresponding eighth information to be null;
here, the slot utilization warning threshold is a preset proportional threshold constant with a value in the range [0, 1];
step 2j, identifying whether the first, second, third, fourth, fifth, sixth, seventh and eighth information is all empty; if yes, setting the corresponding first analysis information as preset normal information of the running state of the server; if not, the first, second, third, fourth, fifth, sixth, seventh and eighth information forms corresponding first analysis information;
step 2k, forming a corresponding first analysis result from the current server IP address, the current server name, the current server type and the first analysis information, and displaying the first analysis result.
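Steps 2b to 2j amount to eight independent threshold checks whose non-empty results are collected. The sketch below is one possible reading, under several assumptions: all threshold values, dictionary keys and message strings are hypothetical; utilization rates are treated as fractions in [0, 1]; and empty (non-triggered) items are simply dropped rather than carried as empty fields. Note also that the expression given in step 2e, (first total amount - second total amount)/first total amount, evaluates to the fraction of disk space in use; because the alarm fires when the value is low, the sketch assumes the intended quantity is the remaining fraction, second total amount/first total amount.

```python
from typing import Dict, List

# Hypothetical threshold values; the source only says they are preset constants.
THRESHOLDS: Dict[str, float] = {
    "cpu_usage": 0.90,        # fraction in [0, 1]
    "cpu_temperature": 85.0,  # unit assumed to be degrees Celsius
    "memory_usage": 0.90,     # fraction in [0, 1]
    "disk_remaining": 0.10,   # fraction in [0, 1]
    "battery_level": 0.20,    # assumed fraction of full charge
    "fan_power": 50.0,        # unit assumed to be watts
    "slot_usage": 0.90,       # fraction in [0, 1]
}
NORMAL = "server running state normal"


def ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator/denominator, or `default` when the denominator is zero."""
    return numerator / denominator if denominator else default


def analyze(current: dict, t: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the triggered alarm messages, or the normal message if none fire."""
    usages = [m["usage"] for m in current["memory"]]
    avg_memory = ratio(sum(usages), len(usages))                # step 2d
    total = sum(d["total"] for d in current["disks"])           # first total amount
    remaining = sum(d["remaining"] for d in current["disks"])   # second total amount
    # Assumption: remaining fraction = second total / first total (see lead-in).
    disk_remaining = ratio(remaining, total, default=1.0)       # step 2e
    occupied = sum(1 for s in current["slots"] if s["state"] == "occupied")
    slot_usage = ratio(occupied, len(current["slots"]))         # step 2i

    checks = [
        (current["cpu"]["usage"] > t["cpu_usage"], "CPU usage too high"),
        (current["cpu"]["temperature"] > t["cpu_temperature"], "CPU temperature too high"),
        (avg_memory > t["memory_usage"], "memory usage too high"),
        (disk_remaining < t["disk_remaining"], "hard disk usage too high"),
        (current["power"]["battery_level"] < t["battery_level"], "battery level low"),
        (current["fan"]["state"] == "abnormal", "fan abnormal"),
        (current["fan"]["power"] > t["fan_power"], "fan power consumption high"),
        (slot_usage > t["slot_usage"], "slot resources insufficient"),
    ]
    alarms = [message for triggered, message in checks if triggered]
    return alarms if alarms else [NORMAL]                       # step 2j
```

The first analysis result of step 2k would then combine the returned list with the current server IP address, name and type.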
Step 3, predicting the running risk of the server hardware according to all the first records in the latest appointed time period in the first record list to generate a corresponding first prediction result and displaying the first prediction result;
the method specifically comprises the following steps: step 3a, extracting the first server IP address, the first server name and the first server type of any first record in the first record list as the corresponding current server IP address, current server name and current server type;
step 3b, extracting all first records in the latest appointed time period in the first record list, and sequencing the first records in time sequence to generate a corresponding first record sequence;
Step 3c, in the first record sequence, extracting the CPU temperature and the CPU utilization rate of each first record to form a corresponding first CPU feature; extracting each group of memory bank capacity and memory bank utilization rate in the first memory state information of each first record to form a corresponding first memory bank feature, and forming a corresponding first memory feature from all the obtained first memory bank features; extracting each group of hard disk total capacity and hard disk residual capacity in the first hard disk state information of each first record to form a corresponding first hard disk bar feature, and forming a corresponding first storage feature from all the obtained first hard disk bar features; extracting the first power state information of each first record to form a corresponding first power feature; extracting the fan power and the fan state of the first fan state information of each first record to form a corresponding first fan feature; extracting each slot state in the first slot parameter set of each first record to form a corresponding first slot feature, and forming a corresponding second slot feature from all the obtained first slot features; forming a corresponding first feature vector from the first CPU feature, the first memory feature, the first storage feature, the first power feature, the first fan feature and the second slot feature corresponding to each first record; and forming a corresponding first feature tensor from all the obtained first feature vectors;
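Steps 3b and 3c can be sketched as follows, assuming each first record is held as a dictionary. The key names (`time`, `cpu`, `memory`, `disks`, `power`, `fan`, `slots`), the numeric encoding of states and the use of a single `time` field for ordering are all assumptions introduced for this illustration.

```python
from typing import Any, Dict, List


def record_to_vector(rec: Dict[str, Any]) -> List[float]:
    """Step 3c: flatten one first record into a first feature vector."""
    cpu = [rec["cpu"]["temperature"], rec["cpu"]["usage"]]
    # One (capacity, usage) pair per memory bank, one (total, remaining) pair per disk.
    memory = [v for m in rec["memory"] for v in (m["capacity"], m["usage"])]
    storage = [v for d in rec["disks"] for v in (d["total"], d["remaining"])]
    p = rec["power"]
    power = [
        1.0 if p["mode"] == "battery" else 0.0,    # power supply mode
        1.0 if p["state"] == "abnormal" else 0.0,  # power supply state
        p["battery"],                              # battery power
    ]
    fan = [rec["fan"]["power"], 1.0 if rec["fan"]["state"] == "abnormal" else 0.0]
    slots = [1.0 if s == "occupied" else 0.0 for s in rec["slots"]]
    return [float(x) for x in cpu + memory + storage + power + fan + slots]


def build_feature_tensor(records: List[Dict[str, Any]],
                         window_start: float) -> List[List[float]]:
    """Steps 3b and 3c: keep records inside the time window, sort them, vectorize them."""
    recent = [r for r in records if r["time"] >= window_start]
    recent.sort(key=lambda r: r["time"])
    return [record_to_vector(r) for r in recent]
```

The resulting list of vectors has one row per first record in chronological order. All rows have the same length only if the number of memory banks, hard disks and slots stays constant across records, which this sketch assumes.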
Step 3d, inputting the first characteristic tensor into a preset running risk classification prediction model to perform running risk classification prediction processing to obtain a corresponding first prediction vector;
wherein the first predictive vector includes a plurality of first classification probabilities; each first classification probability corresponds to a preset risk type;
here, the running risk classification prediction model according to the embodiment of the present invention can predict the risk types that may occur for a monitored server at a future time based on a section of the latest historical data of the monitored server, that is, the first record sequence, and assigns a corresponding probability, that is, a first classification probability, to each risk type when outputting the prediction;
it should be noted that the running risk classification prediction model in the embodiment of the present invention is an artificial intelligence prediction model implemented based on a classifier model, and the classifier model can be implemented in various ways: one is based on an SVM model structure, one is based on an MLP network structure, one is based on a random forest model structure, and it can also be based on other neural networks or algorithm models capable of classification prediction; before the running risk classification prediction model is used, the model needs to be trained on a sufficient number of pairs of historical data and risk type labels;
It should be further noted that the risk types that can be predicted by the running risk classification prediction model according to the embodiment of the present invention include: CPU resource excessive loss risk, memory resource excessive loss risk, storage resource excessive loss risk, capacity expansion reduction risk, downtime risk and the like; if the CPU temperature of the first CPU features in the first feature tensor shows an increasing trend over time, the predicted probabilities of the CPU resource excessive loss risk and the downtime risk increase, and if the CPU utilization rate shows an increasing trend over time, the predicted probability of the CPU resource excessive loss risk increases; if the memory bank capacity of all the first memory bank features in the first memory features is lower than a preset large memory threshold, the predicted probability of the memory resource excessive loss risk is higher, and if the memory bank utilization rate shows an increasing trend over time, the predicted probabilities of the CPU resource excessive loss risk and the downtime risk increase; if the ratio of the hard disk residual capacity to the hard disk total capacity of the first hard disk bar features in the first storage features shows a decreasing trend over time and decreases too quickly, the predicted probabilities of the storage resource excessive loss risk and the downtime risk increase; if the number of first slot features in the second slot features whose slot state is the unoccupied state is smaller than a preset idle slot threshold, the predicted probability of the capacity expansion reduction risk increases; if the battery power of the first power features shows a decreasing trend over time, the predicted probability of the downtime risk increases; the longer the fan power of the first fan features exceeds the preset power threshold, or the longer the fan state remains in the abnormal state, the higher the predicted probability of the downtime risk;
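The notion of an increasing or decreasing trend over time used above can be made concrete with a least-squares slope. The sketch below is only one possible reading: the embodiment leaves trend detection to the trained model, and the function names and the `min_slope` parameter are hypothetical.

```python
from typing import Sequence


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values sampled at equally spaced times 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0  # a single sample carries no trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_increasing(values: Sequence[float], min_slope: float = 0.0) -> bool:
    """True if the fitted slope exceeds min_slope (for example, rising CPU temperature)."""
    return trend_slope(values) > min_slope


def is_decreasing(values: Sequence[float], min_slope: float = 0.0) -> bool:
    """True if the fitted slope is below -min_slope (for example, falling battery power)."""
    return trend_slope(values) < -min_slope
```

A slope computed this way assumes the first records are collected at a fixed period; with irregular collection times the actual timestamps would have to replace the indices.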
Step 3e, forming corresponding first-type prediction information by each first classification probability and the corresponding risk type; and all the obtained first type of prediction information forms a corresponding first prediction result and is displayed.
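Step 3e can be sketched as pairing each first classification probability with its risk type. The order of `RISK_TYPES`, the example probabilities and the sorting by descending probability for display are assumptions made here, not details given by the embodiment.

```python
from typing import List, Sequence, Tuple

# Risk types named in the description; their order in the model output is assumed.
RISK_TYPES = [
    "CPU resource excessive loss risk",
    "memory resource excessive loss risk",
    "storage resource excessive loss risk",
    "capacity expansion reduction risk",
    "downtime risk",
]


def build_prediction_result(prediction_vector: Sequence[float]) -> List[Tuple[str, float]]:
    """Step 3e: pair each first classification probability with its risk type."""
    if len(prediction_vector) != len(RISK_TYPES):
        raise ValueError("prediction vector length must match the number of risk types")
    pairs = list(zip(RISK_TYPES, prediction_vector))
    # Assumption: the most probable risk is listed first when displayed.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for risk, prob in build_prediction_result([0.10, 0.20, 0.05, 0.05, 0.60]):
        print(f"{risk}: {prob:.0%}")
```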
Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention. The electronic device may be the aforementioned terminal device or server, or may be a terminal device or server connected to the aforementioned terminal device or server for implementing the method of the embodiment of the present invention. As shown in fig. 2, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving actions of the transceiver 303. The memory 302 may store various instructions for performing the various processing functions and implementing the processing steps described in the method embodiments previously described. Preferably, the electronic device according to the embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripheral devices.
The system bus 305 referred to in fig. 2 may be a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The system bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is represented in the figure by only one bold line, but this does not mean that there is only one bus or only one type of bus. The communication interface is used to enable communication between the database access apparatus and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may comprise random access memory (Random Access Memory, RAM) and may also include non-volatile memory (Non-Volatile Memory), such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a graphics processor (Graphics Processing Unit, GPU), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
It should be noted that the embodiments of the present invention also provide a computer readable storage medium storing instructions which, when run on a computer, cause the computer to perform the methods and processes provided in the above embodiments.
The embodiment of the invention also provides a chip for running the instructions, and the chip is used for executing the processing steps described in the embodiment of the method.
The embodiment of the invention provides a processing method, an electronic device and a computer readable storage medium for monitoring server hardware, wherein a monitoring server periodically collects the time, CPU (Central Processing Unit) state, memory state, hard disk state, power supply state, fan state and slot parameter set of a monitored server, analyzes the running state of the server in real time according to the latest collection result, and predicts the running risk according to the historical collection results by using an artificial intelligence model. With the invention, the running state of any server can be monitored in real time and risk prediction can be performed, which overcomes the defect that server hardware is not monitored comprehensively in conventional schemes.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments has been provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (8)
1. A method of processing for monitoring server hardware, the method comprising:
the monitoring server periodically sends a first equipment query instruction to each first server; recording the instruction sending time of the first equipment query instruction as corresponding first server time; receiving first equipment response data returned by each first server; and recording the data receiving time of the first equipment response data as the corresponding second server time; the first server time, the second server time and the first equipment response data form corresponding first records which are stored in a corresponding first record list;
analyzing the running state of the server hardware according to the latest first record in the first record list to generate a corresponding first analysis result and displaying the first analysis result;
and predicting the running risk of the server hardware according to all the first records in the latest appointed time period in the first record list to generate a corresponding first prediction result and displaying the first prediction result.
2. The processing method for monitoring server hardware according to claim 1, wherein
the monitoring server sends instructions to each first server based on an SNMP protocol and receives response data of each first server.
3. The processing method for monitoring server hardware according to claim 1, wherein
the first record list comprises a plurality of first records; the first record includes the first server time, the second server time, and the first device response data; the first device response data includes a first server IP address, a first server name, a first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information, and a first slot parameter set; the first server type includes blade, tower and desktop; the first CPU state information comprises a CPU model, a CPU parameter, a CPU temperature and a CPU utilization rate; the first memory state information comprises a plurality of pieces of first memory bank information, and the first memory bank information comprises a memory bank interface, a memory bank capacity and a memory bank utilization rate; the first hard disk state information comprises a plurality of pieces of first hard disk information, and the first hard disk information comprises a hard disk interface, a hard disk type, a hard disk total capacity and a hard disk residual capacity; the first power state information comprises a power supply mode, a power supply state and a battery power, wherein the power supply mode comprises a battery power supply mode and an alternating current power supply mode, and the power supply state comprises a normal state and an abnormal state; the first fan state information comprises a fan model, fan power and a fan state, and the fan state comprises a normal state and an abnormal state; the first slot parameter set includes a plurality of first slot parameters, and the first slot parameter includes a slot type and a slot state, the slot state including an occupied state and an idle state.
4. A method of processing for monitoring server hardware according to claim 3, further comprising:
when the first server receives the first equipment query instruction sent by the monitoring server, obtaining a preset server IP address, server name and server type locally as the corresponding first server IP address, first server name and first server type; collecting the CPU information of the local server to obtain the corresponding first CPU state information; collecting the memory bank information of the local server to obtain the corresponding first memory state information; collecting the hard disk information of the local server to obtain the corresponding first hard disk state information; collecting the power supply information of the local server to obtain the corresponding first power state information; collecting the fan information of the local server to obtain the corresponding first fan state information; collecting the slot parameters of the local server to obtain the corresponding first slot parameter set; and forming the corresponding first device response data from the obtained first server IP address, first server name, first server type, first CPU state information, first memory state information, first hard disk state information, first power state information, first fan state information and first slot parameter set, and returning the first device response data to the monitoring server.
5. The method for monitoring server hardware according to claim 3, wherein the analyzing the running state of the server hardware according to the latest first record in the first record list to generate and display a corresponding first analysis result specifically includes:
extracting the first record with the latest time from the first record list as a corresponding current record; extracting the currently recorded first server IP address, the first server name, the first server type, the first CPU state information, the first memory state information, the first hard disk state information, the first power state information, the first fan state information and the first slot parameter set as corresponding current server IP address, current server name, current server type, current CPU state information, current memory state information, current hard disk state information, current power state information, current fan state information and current slot parameter set;
identifying whether the CPU utilization rate of the current CPU state information exceeds a preset CPU utilization rate warning threshold; if yes, setting the corresponding first information as preset CPU resource occupation excessive alarm information; if not, setting the corresponding first information to be empty;
Identifying whether the CPU temperature of the current CPU state information exceeds a preset CPU temperature warning threshold; if yes, setting the corresponding second information as preset CPU temperature higher alarm information; if not, setting the corresponding second information to be empty;
calculating the average value of all the memory bank utilization rates of the current memory state information to obtain corresponding average memory utilization rates; identifying whether the average memory usage exceeds a preset memory usage warning threshold; if yes, setting the corresponding third information as preset excessive memory resource occupation alarm information; if not, setting the corresponding third information to be empty;
summing up all the total capacities of the hard disks of the current hard disk state information to obtain a corresponding first total amount; and performing sum calculation on all the residual capacities of the hard disks of the current hard disk state information to obtain a corresponding second total amount; calculating corresponding hard disk residual percentage = [(first total amount - second total amount)/first total amount] × 100% according to the first total amount and the second total amount; identifying whether the hard disk residual percentage is lower than a preset hard disk residual percentage warning threshold value; if yes, setting the corresponding fourth information as preset hard disk resource occupation excessive alarm information; if not, setting the corresponding fourth information to be null;
Identifying whether the battery power of the current power state information is lower than a preset battery power warning threshold; if yes, setting the corresponding fifth information as preset electric quantity shortage alarm information; if not, setting the corresponding fifth information to be null;
identifying whether the fan state of the current fan state information is an abnormal state; if yes, setting the corresponding sixth information as preset abnormal fan alarm information; if not, setting the corresponding sixth information to be null;
identifying whether the fan power of the current fan state information exceeds a preset fan power warning threshold; if yes, setting the corresponding seventh information as preset high power consumption warning information of the fan; if not, setting the corresponding seventh information to be empty;
counting the number of the first slot parameters in the current slot parameter set, wherein the slot state is the occupied state, so as to obtain a corresponding first number, counting the number of the first slot parameters in the current slot parameter set, so as to obtain a corresponding second number, and calculating the corresponding slot utilization rate= (first number/second number) ×100% according to the first number and the second number; identifying whether the slot utilization rate exceeds a preset slot utilization rate warning threshold value; if yes, setting the corresponding eighth information as preset slot resource deficiency alarm information; if not, setting the corresponding eighth information to be null;
Identifying whether the first, second, third, fourth, fifth, sixth, seventh and eighth information is all empty; if yes, setting the corresponding first analysis information as preset normal information of the running state of the server; if not, the first analysis information corresponding to the first, second, third, fourth, fifth, sixth, seventh and eighth information is formed;
and the corresponding first analysis result is formed by the current server IP address, the current server name, the current server type and the first analysis information and displayed.
6. The method for monitoring server hardware according to claim 3, wherein said predicting the risk of running server hardware according to all the first records in the last specified period in the first record list generates and displays a corresponding first prediction result, specifically including:
extracting the first server IP address, the first server name and the first server type of any one of the first records in the first record list as a corresponding current server IP address, current server name and current server type;
Extracting all the first records in the first record list within the latest appointed time period, and sequencing the first records in time sequence to generate a corresponding first record sequence;
in the first record sequence, extracting the CPU temperature and the CPU utilization rate of each first record to form a corresponding first CPU feature; extracting each group of memory bank capacity and memory bank utilization rate in the first memory state information of each first record to form a corresponding first memory bank feature, and forming a corresponding first memory feature from all the obtained first memory bank features; extracting each group of hard disk total capacity and hard disk residual capacity in the first hard disk state information of each first record to form a corresponding first hard disk bar feature, and forming a corresponding first storage feature from all the obtained first hard disk bar features; extracting the first power state information of each first record to form a corresponding first power feature; extracting the fan power and the fan state of the first fan state information of each first record to form a corresponding first fan feature; extracting each slot state in the first slot parameter set of each first record to form a corresponding first slot feature, and forming a corresponding second slot feature from all the obtained first slot features; forming a corresponding first feature vector from the first CPU feature, the first memory feature, the first storage feature, the first power feature, the first fan feature and the second slot feature corresponding to each first record; and forming a corresponding first feature tensor from all the obtained first feature vectors;
Inputting the first characteristic tensor into a preset running risk classification prediction model to perform running risk classification prediction processing to obtain a corresponding first prediction vector; the first prediction vector includes a plurality of first classification probabilities; each first classification probability corresponds to a preset risk type;
forming corresponding first-type prediction information by each first classification probability and the corresponding risk type; and the corresponding first prediction results are formed and displayed by all the obtained first type of prediction information.
7. An electronic device, comprising: memory, processor, and transceiver;
the processor being operative to couple with the memory, read and execute instructions in the memory to implement the method of any one of claims 1-6;
the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.
8. A computer readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310488972.9A CN116450463A (en) | 2023-04-27 | 2023-04-27 | Processing method for monitoring server hardware |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310488972.9A CN116450463A (en) | 2023-04-27 | 2023-04-27 | Processing method for monitoring server hardware |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116450463A (en) | 2023-07-18 |
Family
ID=87133722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310488972.9A Pending CN116450463A (en) | 2023-04-27 | 2023-04-27 | Processing method for monitoring server hardware |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116450463A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117539727A (en) * | 2024-01-10 | 2024-02-09 | 深圳市网时云计算有限公司 | Computer running state monitoring method and system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117539727A (en) * | 2024-01-10 | 2024-02-09 | 深圳市网时云计算有限公司 | Computer running state monitoring method and system |
CN117539727B (en) * | 2024-01-10 | 2024-05-10 | 深圳市网时云计算有限公司 | Computer running state monitoring method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8909762B2 (en) | Network system management | |
CN108804668A (en) | Data processing method and device | |
CN110740061B (en) | Fault early warning method and device and computer storage medium | |
CN105745868A (en) | Method and apparatus for anomaly detection in a network | |
US20210042578A1 (en) | Feature engineering orchestration method and apparatus | |
CN111949429A (en) | Server fault monitoring method and system based on density clustering algorithm | |
CN113992602B (en) | Cable monitoring data uploading method, device, equipment and storage medium | |
CN105871957A (en) | Monitoring framework design method, monitoring server, proxy unit and center control server | |
CN114723082A (en) | Abnormity early warning method and system for intelligent low-voltage complete equipment | |
CN111753875A (en) | Power information system operation trend analysis method and device and storage medium | |
CN116450463A (en) | Processing method for monitoring server hardware | |
CN108809720A (en) | The management method and device of alarming assignment in cloud data system | |
CN112532435B (en) | Operation and maintenance method, operation and maintenance management platform, equipment and medium | |
CN117061335A (en) | Cloud platform equipment health management and control method and device, storage medium and electronic equipment | |
CN111078503B (en) | Abnormality monitoring method and system | |
CN114202256B (en) | Architecture upgrading early warning method and device, intelligent terminal and readable storage medium | |
CN115686756A (en) | Virtual machine migration method and device, storage medium and electronic equipment | |
CN116610521A (en) | Processing method for monitoring database | |
CN115222181A (en) | Robot operation state monitoring system and method | |
CN106897113A (en) | The method and device of a kind of virtualized host operation conditions prediction | |
CN116450462A (en) | Processing method for monitoring storage equipment | |
CN113835961B (en) | Alarm information monitoring method, device, server and storage medium | |
CN116450299A (en) | Processing method for monitoring virtual machine | |
CN117149569A (en) | Board running state early warning method and device and electronic equipment | |
CN113285978B (en) | Fault identification method based on block chain and big data and general computing node |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||