CN116450462A - Processing method for monitoring storage equipment - Google Patents
- Publication number
- CN116450462A (CN202310480070.0A)
- Authority
- CN
- China
- Prior art keywords
- equipment
- time
- information
- record
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3037—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
An embodiment of the invention relates to a processing method for monitoring storage devices, comprising the following steps: a monitoring server periodically sends a first device query instruction to each first storage device and records the instruction sending time as a first server time; the server receives the first device response data returned by each first storage device and records the data receiving time as a second server time; the first server time, the second server time and the first device response data form a first record, which is stored in a first record list; network timeout analysis is performed according to the latest first record in the first record list; the running state of the storage device is analyzed according to the latest first record in the first record list; and the running risk of the storage device is predicted according to all first records within the latest specified time period in the first record list. The invention can monitor and predict the running state of any storage device.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a processing method for monitoring storage equipment.
Background
With the development of informatization, information networks have spread into every industry. To ensure that each network operates stably and effectively, a corresponding operation and maintenance monitoring scheme is conventionally configured for each network. Currently, conventional monitoring schemes only monitor traffic or load bottlenecks of network-connected devices (such as routers and switches); they neither monitor the operating state of storage devices (such as disk arrays and databases) nor predict their risk.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a processing method for monitoring storage devices, an electronic device and a computer-readable storage medium. A monitoring server periodically collects the time, CPU (central processing unit) utilization, power supply mode, fan status, total disk space, remaining disk space, disk health status percentage and storage index parameter set of any storage device, performs real-time network timeout analysis and storage device running state analysis according to the latest collection result, and performs running risk prediction on the historical collection results using an artificial intelligence model. The invention compensates for the insufficient monitoring range and single monitoring content of conventional schemes, can monitor multiple running states of any storage device in real time, and can predict the future state trend of any storage device.
To achieve the above object, a first aspect of an embodiment of the present invention provides a processing method for monitoring a storage device, where the method includes:
the monitoring server periodically sends a first device query instruction to each first storage device; recording the instruction sending time of the first equipment query instruction as corresponding first server time; receiving first device response data returned by each first storage device; and recording the data receiving time of the first equipment response data as the corresponding second server time; the first server time, the second server time and the first equipment response data form corresponding first records which are stored in a corresponding first record list;
according to the latest first record in the first record list, performing network timeout analysis to generate a corresponding first analysis result and displaying the first analysis result;
analyzing the running state of the storage equipment according to the latest first record in the first record list to generate a corresponding second analysis result and displaying the second analysis result;
and predicting the running risk of the storage equipment according to all the first records in the latest appointed time period in the first record list to generate a corresponding first prediction result and displaying the first prediction result.
Preferably, the monitoring server sends an instruction to each of the first storage devices based on SNMP protocol and receives response data of each of the first storage devices.
Preferably, the first record list includes a plurality of the first records; the first record includes the first server time, the second server time, and the first device response data; the first device response data includes a first device IP address, a first device name, a first device time, a second device time, a first device CPU utilization, a first device power mode, a first device fan status, a first device disk space total amount, a first device disk space remaining amount, a first device disk health status percentage, and a first device storage index parameter set; the first device power mode includes battery powered and AC powered; the first device fan status includes normal and abnormal; the first device storage index parameter set includes a plurality of first index parameters; the first index parameter includes a first parameter name and a first parameter value.
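As an illustration only, the record layout described above could be modeled as follows. This is a minimal sketch: the class names, field names, units, and the sample values (including the IP address and parameter names) are assumptions made for the example, not identifiers from the patent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class PowerMode(Enum):
    BATTERY = "battery"
    AC = "ac"


class FanStatus(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass
class DeviceResponse:
    """First device response data (field names are hypothetical)."""
    ip_address: str
    name: str
    first_device_time: float   # time the device received the query
    second_device_time: float  # device system time after collection
    cpu_utilization: float     # assumed to be a fraction in [0, 1]
    power_mode: PowerMode
    fan_status: FanStatus
    disk_total: int            # assumed unit: bytes
    disk_remaining: int        # assumed unit: bytes
    disk_health_pct: float     # assumed to be a fraction in [0, 1]
    # first parameter name -> first parameter value
    storage_params: Dict[str, float] = field(default_factory=dict)


@dataclass
class FirstRecord:
    """One first record: two server timestamps plus the device response."""
    first_server_time: float   # time the query was sent
    second_server_time: float  # time the response was received
    response: DeviceResponse


example = FirstRecord(
    first_server_time=1000.00,
    second_server_time=1000.30,
    response=DeviceResponse(
        ip_address="192.0.2.10",
        name="example-array",
        first_device_time=1000.10,
        second_device_time=1000.20,
        cpu_utilization=0.35,
        power_mode=PowerMode.AC,
        fan_status=FanStatus.NORMAL,
        disk_total=4_000_000_000_000,
        disk_remaining=1_500_000_000_000,
        disk_health_pct=0.97,
        storage_params={"io_read_rate": 120.0, "io_write_rate": 80.0},
    ),
)

# The first record list for one device, newest record last.
first_record_list: List[FirstRecord] = [example]
```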
Preferably, the method further comprises:
when the first storage device receives the first device query instruction sent by the monitoring server, it records the instruction receiving time as the corresponding first device time; obtains the locally preset device IP address and device name as the corresponding first device IP address and first device name; measures the current CPU usage of the device to generate the corresponding first device CPU usage; obtains the current power supply mode of the device as the corresponding first device power mode; obtains the fan working state of the device as the corresponding first device fan status; obtains the total disk space of the device as the corresponding first device disk space total amount; obtains the remaining disk space of the device as the corresponding first device disk space remaining amount; evaluates the disk health status of the device to generate the corresponding first device disk health status percentage; evaluates the storage index parameters of the device to generate the corresponding first device storage index parameter set; forms the corresponding first collected data from the first device IP address, the first device name, the first device time, the first device CPU usage, the first device power mode, the first device fan status, the first device disk space total amount, the first device disk space remaining amount, the first device disk health status percentage, and the first device storage index parameter set; after the first collected data is obtained, acquires the device system time once as the corresponding second device time; and forms the corresponding first device response data from the second device time and the first collected data and returns it to the monitoring server.
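The device-side sequence above (timestamp on receipt, collect, timestamp again, reply) can be sketched as below. The function name `build_device_response`, the injected `collect_metrics` callable, the dictionary keys, and the sample values are all hypothetical; how a real device gathers each metric is not specified in the text and is left out.

```python
import time
from typing import Any, Callable, Dict


def build_device_response(
    collect_metrics: Callable[[], Dict[str, Any]],
    clock: Callable[[], float] = time.time,
) -> Dict[str, Any]:
    """Assemble the first device response data for one query."""
    # First device time: when the query instruction was received.
    first_device_time = clock()
    # First collected data: IP address, name, CPU, power, fan, disk, parameters.
    collected = dict(collect_metrics())
    collected["first_device_time"] = first_device_time
    # Second device time: device system time read once after collection.
    second_device_time = clock()
    return {"second_device_time": second_device_time, **collected}


def fake_metrics() -> Dict[str, Any]:
    # Stand-in collector with made-up values, for illustration only.
    return {
        "ip_address": "192.0.2.10",
        "name": "example-array",
        "cpu": 0.35,
        "power_mode": "ac",
        "fan_status": "normal",
        "disk_total": 4000,
        "disk_remaining": 1500,
        "disk_health": 0.97,
        "params": {"io_read_rate": 120.0},
    }


# A fake clock makes the two device timestamps deterministic.
ticks = iter([100.0, 100.5])
response = build_device_response(fake_metrics, clock=lambda: next(ticks))
```

Injecting the clock and the collector keeps the sketch testable without touching real hardware counters.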
Preferably, the performing network timeout analysis according to the latest first record in the first record list to generate and display a corresponding first analysis result specifically includes:
extracting the latest first record in the first record list to serve as a corresponding current record; extracting the first server time, the second server time, the first equipment time and the second equipment time which are recorded currently as corresponding first time, second time, third time and fourth time;
calculating the absolute values of the time differences of the first time and the second time to generate corresponding first time differences; calculating the absolute values of the time differences of the first time and the third time to generate a corresponding second time difference; calculating the absolute values of the time differences of the second time and the fourth time to generate a corresponding third time difference;
identifying whether the first time difference exceeds a preset first time difference threshold; if yes, setting the corresponding first information as preset network receiving and transmitting timeout information; if not, setting the corresponding first information to be empty;
identifying whether the second time difference exceeds a preset second time difference threshold; if yes, setting the corresponding second information as preset network sending timeout information; if not, setting the corresponding second information to be empty;
Identifying whether the third time difference exceeds a preset third time difference threshold; if yes, setting the corresponding third information as preset network receiving timeout information; if not, setting the corresponding third information to be empty;
identifying whether the first, second and third information are all empty; if yes, setting the corresponding first analysis information as preset network transceiving normal information; if not, the first analysis information corresponding to the first, second and third information is formed;
and the first device IP address and the first device name of the current record, together with the first analysis information, form the corresponding first analysis result, which is displayed.
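The three threshold checks above can be sketched as a small function. The threshold values, message strings, and function name are assumptions for illustration; the text only requires the thresholds to be preset. Note also that the second and third differences compare server clocks with device clocks, so they are only meaningful if the two clocks are reasonably synchronized.

```python
from typing import List, Tuple


def analyze_network_timeout(
    first_server_time: float,
    second_server_time: float,
    first_device_time: float,
    second_device_time: float,
    thresholds: Tuple[float, float, float] = (2.0, 1.0, 1.0),  # assumed, in seconds
) -> List[str]:
    """Return the first analysis information for the latest record."""
    # First time difference: full round trip as seen by the server.
    round_trip = abs(first_server_time - second_server_time)
    # Second time difference: server send time vs. device receive time.
    send_delay = abs(first_server_time - first_device_time)
    # Third time difference: server receive time vs. device reply time.
    receive_delay = abs(second_server_time - second_device_time)

    messages: List[str] = []
    if round_trip > thresholds[0]:
        messages.append("network send/receive timeout")
    if send_delay > thresholds[1]:
        messages.append("network send timeout")
    if receive_delay > thresholds[2]:
        messages.append("network receive timeout")
    # All three empty means the network is reported as normal.
    return messages or ["network send/receive normal"]


normal = analyze_network_timeout(0.0, 0.5, 0.1, 0.4)
slow = analyze_network_timeout(0.0, 3.0, 0.2, 0.4)
```

In the second call the round trip (3.0 s) and the receive leg (2.6 s) exceed their assumed thresholds, while the send leg (0.2 s) does not.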
Preferably, the analyzing the running state of the storage device according to the latest first record in the first record list to generate and display a corresponding second analysis result specifically includes:
extracting the first record with the latest time from the first record list as a corresponding current record; extracting the first device IP address, the first device name, the first device CPU usage, the first device power mode, the first device fan status, the first device disk space total, the first device disk space remaining, the first device disk health status percentage, and the first device storage index parameter set of the current record as a corresponding current device IP address, current device name, current device CPU usage, current device power mode, current device fan status, current device disk space total, current device disk space remaining, current device disk health status percentage, and current device storage index parameter set;
identifying whether the current device CPU usage exceeds a preset CPU usage warning threshold; if yes, setting the corresponding fourth information to preset device CPU resource over-occupation alarm information; if not, setting the corresponding fourth information to be null;
calculating the corresponding disk space availability from the current device disk space total and the current device disk space remaining: disk space availability = current device disk space remaining / current device disk space total;
identifying whether the disk space availability is lower than a preset availability warning threshold; if yes, setting the corresponding fifth information to preset device disk availability insufficiency alarm information; if not, setting the corresponding fifth information to be null;
identifying whether the current equipment disk health status percentage is lower than a preset health status percentage warning threshold; if yes, setting the corresponding sixth information as preset equipment health status percentage lower alarm information; if not, setting the corresponding sixth information to be null;
identifying whether the first parameter values of all the first index parameters of the index parameter set stored by the current equipment meet a first parameter threshold range corresponding to each first parameter value; if yes, setting the corresponding seventh information to be null; if not, extracting the first parameter names of the first index parameters, of which the first parameter values do not meet the corresponding first parameter threshold range, in the current equipment storage index parameter set to form a corresponding first parameter name sequence, and forming corresponding seventh information by preset equipment storage index parameter abnormality warning information and the first parameter name sequence;
Identifying whether the fourth, fifth, sixth and seventh information are all empty; if yes, setting the corresponding second analysis information as preset equipment running state normal information; if not, the fourth, fifth, sixth and seventh information form the corresponding second analysis information;
and the current device IP address, the current device name and the second analysis information form the corresponding second analysis result, which is displayed.
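The running-state checks above might look like the following sketch. It assumes that disk space availability is the ratio of remaining to total disk space, that all ratios are fractions in [0, 1], and that a parameter with no configured range is treated as satisfying its range; the threshold values, alarm strings, and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def analyze_running_state(
    cpu: float,
    disk_total: float,
    disk_remaining: float,
    health_pct: float,
    params: Dict[str, float],
    param_ranges: Dict[str, Tuple[float, float]],
    cpu_limit: float = 0.9,           # assumed CPU usage warning threshold
    availability_floor: float = 0.1,  # assumed availability warning threshold
    health_floor: float = 0.6,        # assumed health percentage warning threshold
) -> List[str]:
    """Return the second analysis information for the latest record."""
    alarms: List[str] = []
    if cpu > cpu_limit:
        alarms.append("device CPU usage too high")
    # Assumed definition: availability = remaining / total.
    availability = disk_remaining / disk_total if disk_total > 0 else 0.0
    if availability < availability_floor:
        alarms.append("device disk availability insufficient")
    if health_pct < health_floor:
        alarms.append("device disk health percentage low")
    # Collect names of parameters whose values fall outside their range.
    out_of_range = [
        name
        for name, value in params.items()
        if name in param_ranges
        and not (param_ranges[name][0] <= value <= param_ranges[name][1])
    ]
    if out_of_range:
        alarms.append("storage index parameter abnormal: " + ", ".join(out_of_range))
    return alarms or ["device running state normal"]


healthy = analyze_running_state(0.3, 1000, 600, 0.95, {"io_read_rate": 100.0},
                                {"io_read_rate": (0.0, 400.0)})
degraded = analyze_running_state(
    0.95, 1000, 50, 0.9,
    {"io_read_rate": 500.0, "io_write_rate": 80.0},
    {"io_read_rate": (0.0, 400.0), "io_write_rate": (0.0, 400.0)},
)
```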
Preferably, the predicting the running risk of the storage device according to all the first records in the last specified period in the first record list to generate and display a corresponding first prediction result specifically includes:
extracting the first device IP address and the first device name of any one of the first records in the first record list as a corresponding current device IP address and a corresponding current device name;
extracting all the first records in the first record list within the latest appointed time period, and sequencing the first records in time sequence to generate a corresponding first record sequence;
extracting the first server time, the second server time, the first device time, the second device time, the first device CPU usage rate, the first device power supply mode, the first device fan state, the first device disk space total amount, the first device disk space remaining amount, the first device disk health state percentage and the first device storage index parameter set of each first record in the first record sequence to form a corresponding first data vector; and forming a corresponding first data tensor by all the obtained first data vectors;
Inputting the first data tensor into a preset running risk classification prediction model to perform running risk classification prediction processing to obtain a corresponding first prediction vector; the first prediction vector includes a plurality of first classification probabilities; each first classification probability corresponds to a preset risk type;
forming corresponding first-type prediction information by each first classification probability and the corresponding risk type; and the corresponding first prediction results are formed and displayed by all the obtained first type of prediction information.
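The prediction flow above (select recent records, sort them, encode each as a vector, run a classifier, pair each probability with its risk type) can be sketched as follows. The risk type labels, the numeric encoding of power mode and fan status, the dictionary keys, and `stub_model` are assumptions; `stub_model` is a hand-written stand-in so the example runs, not the trained model the text refers to.

```python
import math
from typing import Any, Callable, Dict, List

RISK_TYPES = ["disk_failure", "capacity_exhaustion", "overheating"]  # hypothetical


def to_vector(r: Dict[str, Any]) -> List[float]:
    """Encode one first record as a first data vector (encoding is assumed)."""
    return [
        r["first_server_time"], r["second_server_time"],
        r["first_device_time"], r["second_device_time"],
        r["cpu"],
        1.0 if r["power_mode"] == "battery" else 0.0,
        1.0 if r["fan_status"] == "abnormal" else 0.0,
        r["disk_total"], r["disk_remaining"], r["disk_health"],
        *[r["params"][k] for k in sorted(r["params"])],
    ]


def build_tensor(records, now: float, window_seconds: float) -> List[List[float]]:
    """Keep records inside the latest period, sort by time, stack the vectors."""
    recent = [r for r in records if now - r["second_server_time"] <= window_seconds]
    recent.sort(key=lambda r: r["second_server_time"])
    return [to_vector(r) for r in recent]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stub_model(tensor: List[List[float]]) -> List[float]:
    # Stand-in for the preset classification model; not a trained model.
    n = len(tensor)
    health_risk = sum(1.0 - v[9] for v in tensor) / n
    space_risk = sum(1.0 - v[8] / v[7] for v in tensor) / n
    fan_risk = sum(v[6] for v in tensor) / n
    return softmax([health_risk, space_risk, fan_risk])


def predict(tensor, model: Callable[[List[List[float]]], List[float]]):
    """Pair each first classification probability with its risk type."""
    if not tensor:
        return []
    return [{"risk_type": t, "probability": p}
            for t, p in zip(RISK_TYPES, model(tensor))]


def sample(t: float, remaining: float) -> Dict[str, Any]:
    return {"first_server_time": t - 0.3, "second_server_time": t,
            "first_device_time": t - 0.2, "second_device_time": t - 0.1,
            "cpu": 0.4, "power_mode": "ac", "fan_status": "normal",
            "disk_total": 1000.0, "disk_remaining": remaining,
            "disk_health": 0.9, "params": {"io_read_rate": 120.0}}


records = [sample(950.0, 200.0), sample(100.0, 900.0), sample(900.0, 300.0)]
tensor = build_tensor(records, now=1000.0, window_seconds=300.0)
prediction = predict(tensor, stub_model)
```

Any model that maps the tensor to one probability per risk type can replace `stub_model` without changing the surrounding code.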
A second aspect of an embodiment of the present invention provides an electronic device, including: memory, processor, and transceiver;
the processor is coupled to the memory, and reads and executes the instructions in the memory to implement the method of the first aspect;
the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.
The embodiments of the invention provide a processing method for monitoring storage devices, an electronic device and a computer-readable storage medium. A monitoring server periodically collects the time, CPU (central processing unit) utilization, power supply mode, fan status, total disk space, remaining disk space, disk health status percentage and storage index parameter set of any storage device, performs real-time network timeout analysis and storage device running state analysis according to the latest collection result, and performs running risk prediction on the historical collection results using an artificial intelligence model. The invention can monitor multiple running states of any storage device in real time and predict the future state trend of any storage device, thereby compensating for the insufficient monitoring range and single monitoring content of conventional schemes.
Drawings
FIG. 1 is a schematic diagram of a processing method for monitoring a storage device according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic diagram of a processing method for monitoring a storage device according to a first embodiment of the present invention, where, as shown in fig. 1, the method mainly includes the following steps:
step 1, a monitoring server periodically sends a first device query instruction to each first storage device; the instruction sending time of the first equipment query instruction is recorded as corresponding first server time; receiving first device response data returned by each first storage device; and recording the data receiving time of the first equipment response data as the corresponding second server time; the first server time, the second server time and the first equipment response data form corresponding first records which are stored in a corresponding first record list;
the monitoring server sends instructions to each first storage device based on an SNMP protocol and receives response data of each first storage device;
the first record list comprises a plurality of first records; the first record includes a first server time, a second server time, and first device response data; the first device response data includes a first device IP address, a first device name, a first device time, a second device time, a first device CPU utilization, a first device power mode, a first device fan status, a first device disk space total amount, a first device disk space remaining amount, a first device disk health status percentage, and a first device storage index parameter set; the first device power mode includes battery powered and AC powered; the first device fan status includes normal and abnormal; the first device storage index parameter set comprises a plurality of first index parameters; the first index parameter comprises a first parameter name and a first parameter value; here, the first device name may be a manufacturer+model+model of the storage device; the first parameter names include at least names that reflect the various storage indexes, such as an I/O read rate index parameter, an I/O write rate index parameter, a cache read rate index parameter, a cache write rate index parameter, a disk read rate index parameter and a disk write rate index parameter.
When the first storage device receives a first device query instruction sent by the monitoring server, it records the instruction receiving time as the corresponding first device time; obtains the locally preset device IP address and device name as the corresponding first device IP address and first device name; measures the current CPU utilization of the device to generate the corresponding first device CPU utilization; obtains the current power supply mode of the device as the corresponding first device power mode; obtains the fan working state of the device as the corresponding first device fan status; obtains the total disk space of the device as the corresponding first device disk space total amount; obtains the remaining disk space of the device as the corresponding first device disk space remaining amount; evaluates the disk health status of the device to generate the corresponding first device disk health status percentage; evaluates the storage index parameters of the device to generate the corresponding first device storage index parameter set; forms the corresponding first collected data from the first device IP address, the first device name, the first device time, the first device CPU utilization, the first device power mode, the first device fan status, the first device disk space total amount, the first device disk space remaining amount, the first device disk health status percentage and the first device storage index parameter set; after the first collected data is obtained, acquires the device system time once as the corresponding second device time; and forms the corresponding first device response data from the second device time and the first collected data and sends it back to the monitoring server.
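On the server side, one polling round of step 1 can be sketched as below. The transport is deliberately abstracted behind a hypothetical `query_device` callable (in the described embodiment it would wrap an SNMP exchange); the function name `poll_once`, the dictionary keys, and the sample addresses are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, List


def poll_once(
    devices: List[str],
    query_device: Callable[[str], Dict[str, Any]],
    record_lists: Dict[str, List[Dict[str, Any]]],
    clock: Callable[[], float] = time.time,
) -> None:
    """Query every storage device once and append one first record per device."""
    for address in devices:
        first_server_time = clock()        # instruction sending time
        response = query_device(address)   # hypothetical transport call
        second_server_time = clock()       # data receiving time
        record_lists.setdefault(address, []).append({
            "first_server_time": first_server_time,
            "second_server_time": second_server_time,
            "response": response,
        })


# Deterministic demonstration with a fake clock and a fake transport.
ticks = iter([1.0, 1.2, 2.0, 2.5])
record_lists: Dict[str, List[Dict[str, Any]]] = {}
poll_once(
    ["192.0.2.10", "192.0.2.11"],
    query_device=lambda address: {"ip_address": address, "name": "example"},
    record_lists=record_lists,
    clock=lambda: next(ticks),
)
```

Periodic sending would then amount to calling `poll_once` in a loop with a fixed sleep interval between rounds; the interval itself is not specified in the text.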
Step 2, performing network timeout analysis according to the latest first record in the first record list to generate a corresponding first analysis result and displaying the first analysis result;
the method specifically comprises the following steps: step 21, extracting the latest first record in the first record list as the corresponding current record; extracting the first server time, the second server time, the first equipment time and the second equipment time which are recorded currently as corresponding first time, second time, third time and fourth time;
step 22, calculating the absolute value of the time difference between the first time and the second time to generate a corresponding first time difference; calculating the absolute values of the time differences of the first time and the third time to generate a corresponding second time difference; calculating the absolute values of the time differences of the second time and the fourth time to generate a corresponding third time difference;
step 23, identifying whether the first time difference exceeds a preset first time difference threshold; if yes, setting the corresponding first information as preset network receiving and transmitting timeout information; if not, setting the corresponding first information to be empty;
here, the first time difference threshold is a preset time difference threshold constant greater than zero;
step 24, identifying whether the second time difference exceeds a preset second time difference threshold; if yes, setting the corresponding second information as preset network sending timeout information; if not, setting the corresponding second information to be empty;
Here, the second time difference threshold is a preset time difference threshold constant greater than zero;
step 25, identifying whether the third time difference exceeds a preset third time difference threshold; if yes, setting the corresponding third information as preset network receiving timeout information; if not, setting the corresponding third information to be null;
here, the third time difference threshold is a preset time difference threshold constant greater than zero;
step 26, identifying whether the first, second and third information are all empty; if yes, setting the corresponding first analysis information as preset network transceiving normal information; if not, the first, second and third information form corresponding first analysis information;
step 27, the first device IP address and the first device name of the current record, together with the first analysis information, form the corresponding first analysis result, which is displayed.
Step 3, analyzing the running state of the storage equipment according to the latest first record in the first record list to generate a corresponding second analysis result and displaying the second analysis result;
the method specifically comprises the following steps: step 31, extracting the first record with the latest time in the first record list as the corresponding current record; extracting the first device IP address, first device name, first device CPU utilization, first device power mode, first device fan status, first device disk space total amount, first device disk space remaining amount, first device disk health status percentage and first device storage index parameter set of the current record as the corresponding current device IP address, current device name, current device CPU utilization, current device power mode, current device fan status, current device disk space total amount, current device disk space remaining amount, current device disk health status percentage and current device storage index parameter set;
step 32, identifying whether the current device CPU utilization exceeds a preset CPU utilization warning threshold; if yes, setting the corresponding fourth information to preset device CPU resource over-occupation alarm information; if not, setting the corresponding fourth information to be null;
here, the CPU utilization warning threshold is a preset ratio threshold constant with a value range of [0, 1];
step 33, calculating the corresponding disk space availability from the current device disk space total amount and the current device disk space remaining amount: disk space availability = current device disk space remaining amount / current device disk space total amount;
identifying whether the disk space availability is lower than a preset availability warning threshold; if yes, setting the corresponding fifth information to preset device disk availability insufficiency alarm information; if not, setting the corresponding fifth information to be null;
here, the availability guard threshold is a preset proportional threshold constant with a value range between [0,1 ];
step 34, identifying whether the health status percentage of the magnetic disk of the current device is lower than a preset health status percentage warning threshold; if yes, setting the corresponding sixth information as preset equipment health status percentage lower alarm information; if not, setting the corresponding sixth information to be null;
here, the health status percentage warning threshold is a preset percentage constant with a value range of [0, 1];
step 35, identifying whether the first parameter values of all the first index parameters of the index parameter set stored by the current device meet the respective corresponding first parameter threshold ranges; if yes, setting the corresponding seventh information to be null; if not, extracting first parameter names of first index parameters of which the first parameter values do not meet the corresponding first parameter threshold range from the current equipment storage index parameter set to form a corresponding first parameter name sequence, and forming corresponding seventh information by preset equipment storage index parameter abnormality alarm information and the first parameter name sequence;
here, a storage index parameter threshold range set is preset on the monitoring server according to the embodiment of the present invention, where the storage index parameter threshold range set is composed of a plurality of first parameter threshold ranges, and each first parameter threshold range corresponds to one first index parameter;
step 38, identifying whether the fourth, fifth, sixth and seventh information is all empty; if yes, setting the corresponding second analysis information as preset equipment running state normal information; if not, the fourth, fifth, sixth and seventh information form corresponding second analysis information;
Step 39, forming a corresponding second analysis result from the current device IP address, the current device name and the second analysis information, and displaying it.
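The threshold checks of steps 32 to 39 can be sketched as follows. This is only an illustrative sketch: the threshold values, the alarm texts, the dictionary keys and the function name `analyze_running_state` are assumptions introduced here, since the text only states that the thresholds and alarm information are preset.

```python
from typing import Dict, List, Tuple

# Hypothetical preset thresholds; the text only says they are ratios in [0, 1].
CPU_USAGE_WARNING = 0.90
AVAILABILITY_WARNING = 0.10
HEALTH_WARNING = 0.60

# Hypothetical texts standing in for the preset alarm information.
CPU_ALARM = "device CPU resource occupation too high"
DISK_ALARM = "device disk availability insufficient"
HEALTH_ALARM = "device disk health status percentage too low"
PARAM_ALARM = "device storage index parameter abnormal"
NORMAL_INFO = "device running state normal"


def analyze_running_state(
    state: Dict[str, object],
    param_ranges: Dict[str, Tuple[float, float]],
) -> Tuple[str, str, List[str]]:
    """Return (IP address, device name, second analysis information)."""
    findings: List[str] = []

    # Step 32: CPU usage against its warning threshold.
    if state["cpu_usage"] > CPU_USAGE_WARNING:
        findings.append(CPU_ALARM)

    # Step 33: availability = remaining / total (guarding against a zero total).
    total = state["disk_total"]
    availability = state["disk_remaining"] / total if total > 0 else 0.0
    if availability < AVAILABILITY_WARNING:
        findings.append(DISK_ALARM)

    # Step 34: disk health percentage against its warning threshold.
    if state["disk_health"] < HEALTH_WARNING:
        findings.append(HEALTH_ALARM)

    # Step 35: collect the names of parameters outside their threshold range.
    out_of_range = [
        name
        for name, value in state["index_params"].items()
        if name in param_ranges
        and not (param_ranges[name][0] <= value <= param_ranges[name][1])
    ]
    if out_of_range:
        findings.append(f"{PARAM_ALARM}: {', '.join(out_of_range)}")

    # Step 38: no finding at all means the device is running normally.
    if not findings:
        findings.append(NORMAL_INFO)

    # Step 39: the result combines address, name and analysis information.
    return state["ip_address"], state["name"], findings
```

Collecting the findings in one list keeps the "all empty" test of step 38 to a single emptiness check.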
Step 4, predicting the running risk of the storage equipment according to all the first records in the latest appointed time period in the first record list to generate a corresponding first prediction result and displaying the first prediction result;
the method specifically comprises the following steps: step 41, extracting the first device IP address and the first device name of any first record in the first record list as the corresponding current device IP address and current device name;
step 42, extracting all the first records in the first record list within the latest appointed time period, and sorting the first records in time sequence to generate a corresponding first record sequence;
step 43, extracting the first server time, the second server time, the first device time, the second device time, the first device CPU usage, the first device power mode, the first device fan status, the first device disk space total amount, the first device disk space remaining amount, the first device disk health status percentage, and the first device storage index parameter set of each first record in the first record sequence to form a corresponding first data vector; and forming corresponding first data tensors by all the obtained first data vectors;
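Steps 42 and 43 can be illustrated with a short sketch. The dictionary keys, the numeric encodings of the power mode and fan state, and the function names are assumptions made for illustration; the text does not specify how categorical fields are encoded or in what order the index parameters are laid out.

```python
from typing import Dict, List, Sequence

# Hypothetical numeric encodings for the two categorical fields.
POWER_MODE_CODE = {"ac": 0.0, "battery": 1.0}
FAN_STATE_CODE = {"normal": 0.0, "abnormal": 1.0}


def record_to_vector(record: Dict[str, object], param_names: Sequence[str]) -> List[float]:
    """Flatten one first record into a first data vector (step 43)."""
    vector = [
        record["first_server_time"],
        record["second_server_time"],
        record["first_device_time"],
        record["second_device_time"],
        record["cpu_usage"],
        POWER_MODE_CODE[record["power_mode"]],
        FAN_STATE_CODE[record["fan_state"]],
        record["disk_total"],
        record["disk_remaining"],
        record["disk_health"],
    ]
    # Index parameters are appended in a fixed name order so that every
    # vector has the same layout.
    vector.extend(record["index_params"][name] for name in param_names)
    return vector


def build_tensor(
    records: Sequence[Dict[str, object]],
    window_start: float,
    param_names: Sequence[str],
) -> List[List[float]]:
    """Select records in the latest period, sort by time, stack the vectors."""
    recent = [r for r in records if r["first_server_time"] >= window_start]
    recent.sort(key=lambda r: r["first_server_time"])  # step 42: time order
    return [record_to_vector(r, param_names) for r in recent]
```

The result is a nested list of shape (number of records, number of features), which can be converted to whatever array type the chosen model expects.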
Step 44, inputting the first data tensor into a preset running risk classification prediction model to perform running risk classification prediction processing to obtain a corresponding first prediction vector;
wherein the first predictive vector includes a plurality of first classification probabilities; each first classification probability corresponds to a preset risk type;
here, the running risk classification prediction model of the embodiment of the present invention may predict risk types that may occur in a current storage device at a future time based on a section of latest historical data of the device, that is, a first record sequence, and allocate a corresponding possible probability, that is, a first classification probability, for each risk type when outputting the prediction;
it should be noted that the running risk classification prediction model in the embodiment of the present invention is an artificial intelligence prediction model implemented based on a classifier model, and the classifier model may be implemented in various ways: based on an SVM model structure, based on an MLP network structure, based on a random forest model structure, or based on other neural networks or algorithm models capable of classification prediction; before the running risk classification prediction model is used, it needs to be trained on a sufficient amount of historical data labeled with risk types;
It should also be noted that the risk types predictable by the running risk classification prediction model of the embodiment of the present invention include: network delay risk, network packet loss risk, network blocking risk, network disconnection risk, I/O blocking risk, cache blocking risk, disk blocking risk, excessive resource loss risk and downtime risk. If the absolute value of the time difference between the first server time and the second server time increases over time, or the absolute value of the time difference between the first server time and the first device time increases over time, or the absolute value of the time difference between the second server time and the second device time increases over time, the predicted probability of network delay risk increases. If the absolute value of the time difference between the first server time and the first device time increases over time, the predicted probability of network packet loss risk increases. If the absolute value of the time difference between the first device time and the second device time increases over time, the predicted probability of network blocking risk increases. The more often the absolute value of the time difference between the first server time and the second server time exceeds the preset maximum delay threshold, the greater the predicted probability of network disconnection risk. If the I/O read rate index parameter or the I/O write rate index parameter of the first device storage index parameter set decreases over time, the predicted probability of I/O blocking risk increases. If the cache read rate index parameter or the cache write rate index parameter of the first device storage index parameter set decreases over time, the predicted probability of cache blocking risk increases. If the disk read rate index parameter or the disk write rate index parameter of the first device storage index parameter set decreases over time, the predicted probability of disk blocking risk increases. If the first device CPU usage increases over time, or the ratio of the first device disk space remaining amount to the first device disk space total amount increases over time, or the first device disk health status percentage decreases over time, the predicted probabilities of excessive resource loss risk and downtime risk increase. If the first device power mode is battery powered and the first device fan state is abnormal, the predicted probability of downtime risk increases.
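To make the notion of a "trend over time" concrete, the following sketch estimates the trend of each time difference as a least-squares slope over the record sequence. This is an assumption for illustration only: the text describes these relationships as behaviour of the trained classifier and does not say that trends are computed explicitly. The function names, the tuple layout and the `max_delay` parameter are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (first server time, second server time, first device time, second device time)
Record = Tuple[float, float, float, float]


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index; > 0 means increasing."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def timing_trends(records: Sequence[Record], max_delay: float) -> Dict[str, float]:
    """Summarise the time-difference signals named in the text."""
    round_trip = [abs(s2 - s1) for s1, s2, d1, d2 in records]
    send_gap = [abs(d1 - s1) for s1, s2, d1, d2 in records]
    receive_gap = [abs(d2 - s2) for s1, s2, d1, d2 in records]
    device_gap = [abs(d2 - d1) for s1, s2, d1, d2 in records]
    return {
        "round_trip_slope": slope(round_trip),    # linked to network delay risk
        "send_gap_slope": slope(send_gap),        # delay and packet loss risk
        "receive_gap_slope": slope(receive_gap),  # network delay risk
        "device_gap_slope": slope(device_gap),    # network blocking risk
        # linked to network disconnection risk
        "over_max_delay_count": float(sum(1 for v in round_trip if v > max_delay)),
    }
```

The same `slope` helper could be applied to the CPU usage, the disk ratios or the read and write rate parameters; whether such hand-made features help depends on the classifier chosen.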
Step 45, forming corresponding first-type prediction information by each first classification probability and the corresponding risk type; and all the obtained first type of prediction information forms a corresponding first prediction result and is displayed.
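Step 45 pairs each first classification probability with its risk type. A minimal sketch follows, assuming that the model emits the probabilities in the same order as the risk types are listed in the description; that ordering, the English labels and the function name are assumptions.

```python
from typing import List, Sequence, Tuple

# Risk types from the description; the index order is an assumption.
RISK_TYPES = [
    "network delay",
    "network packet loss",
    "network blocking",
    "network disconnection",
    "I/O blocking",
    "cache blocking",
    "disk blocking",
    "excessive resource loss",
    "downtime",
]


def build_prediction_result(prediction_vector: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each classification probability with its risk type (step 45)."""
    if len(prediction_vector) != len(RISK_TYPES):
        raise ValueError("prediction vector length must match the number of risk types")
    return list(zip(RISK_TYPES, prediction_vector))


def format_prediction_result(result: Sequence[Tuple[str, float]]) -> str:
    """Render the first prediction result as one line per risk type."""
    return "\n".join(f"{risk} risk: {prob:.1%}" for risk, prob in result)
```

A display layer might additionally sort the pairs by probability, which the text neither requires nor excludes.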
Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention. The electronic device may be the aforementioned terminal device or server, or may be a terminal device or server connected to the aforementioned terminal device or server for implementing the method of the embodiment of the present invention. As shown in fig. 2, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving actions of the transceiver 303. The memory 302 may store various instructions for performing the various processing functions and implementing the processing steps described in the method embodiments previously described. Preferably, the electronic device according to the embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripheral devices.
The system bus 305 referred to in fig. 2 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The system bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is drawn as a single bold line in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access apparatus and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may include random access memory (Random Access Memory, RAM) and may also include non-volatile memory (Non-Volatile Memory), such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a graphics processor (Graphics Processing Unit, GPU), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It should be noted that the embodiments of the present invention also provide a computer readable storage medium storing instructions which, when run on a computer, cause the computer to perform the methods and processes provided in the above embodiments.
The embodiment of the invention also provides a chip for running the instructions, and the chip is used for executing the processing steps described in the embodiment of the method.
The embodiment of the invention provides a processing method for monitoring storage equipment, electronic equipment and a computer readable storage medium, wherein a monitoring server periodically collects the time, CPU usage rate, power supply mode, fan state, total disk space, remaining disk space, disk health status percentage and storage index parameter set of any storage device, performs real-time network timeout analysis and storage device running state analysis according to the latest collection result, and performs running risk prediction according to historical collection results by using an artificial intelligence model. The invention can monitor multiple running states of any storage device in real time and predict its future state trend, thereby effectively remedying technical defects of conventional schemes such as insufficient monitoring range and single monitoring content.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention, and is not meant to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (9)
1. A method of processing for monitoring a storage device, the method comprising:
the monitoring server periodically sends a first device query instruction to each first storage device; recording the instruction sending time of the first equipment query instruction as corresponding first server time; receiving first device response data returned by each first storage device; and recording the data receiving time of the first equipment response data as the corresponding second server time; the first server time, the second server time and the first equipment response data form corresponding first records which are stored in a corresponding first record list;
according to the latest first record in the first record list, performing network timeout analysis to generate a corresponding first analysis result and displaying the first analysis result;
analyzing the running state of the storage equipment according to the latest first record in the first record list to generate a corresponding second analysis result and displaying the second analysis result;
and predicting the running risk of the storage equipment according to all the first records in the latest appointed time period in the first record list to generate a corresponding first prediction result and displaying the first prediction result.
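The collection step of claim 1 can be sketched as a single polling pass. The `query_device` callable is a hypothetical stand-in for the real transport (claim 2 names SNMP), and the dictionary keys and function name are assumptions; the injectable `clock` exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, List, Sequence


def poll_once(
    devices: Sequence[str],
    query_device: Callable[[str], Dict[str, object]],
    record_lists: Dict[str, List[Dict[str, object]]],
    clock: Callable[[], float] = time.time,
) -> None:
    """Query every device once and append one first record per device."""
    for device in devices:
        first_server_time = clock()      # instruction sending time
        response = query_device(device)  # first device response data
        second_server_time = clock()     # data receiving time
        record_lists.setdefault(device, []).append(
            {
                "first_server_time": first_server_time,
                "second_server_time": second_server_time,
                "response": response,
            }
        )
```

Calling `poll_once` from a timer or scheduler loop would give the periodic behaviour described in the claim; the analysis and prediction steps then read from `record_lists`.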
2. The processing method for monitoring a storage device according to claim 1, wherein
the monitoring server sends instructions to each first storage device based on an SNMP protocol and receives response data of each first storage device.
3. The processing method for monitoring a storage device according to claim 1, wherein
the first record list comprises a plurality of first records; the first record includes the first server time, the second server time, and the first device response data; the first device response data comprises a first device IP address, a first device name, a first device time, a second device time, a first device CPU utilization rate, a first device power supply mode, a first device fan state, a first device disk space total amount, a first device disk space residual amount, a first device disk health state percentage and a first device storage index parameter set; the first device power mode includes battery powered and ac powered; the first device fan status includes normal and abnormal; the first device storing an index parameter set comprising a plurality of first index parameters; the first index parameter includes a first parameter name and a first parameter value.
4. A processing method for monitoring a storage device according to claim 3, further comprising:
when the first storage device receives the first device query instruction sent by the monitoring server, recording the instruction receiving time as the corresponding first device time; obtaining the locally preset device IP address and device name as the corresponding first device IP address and first device name; measuring the current CPU utilization rate of the device to generate the corresponding first device CPU utilization rate; obtaining the current power supply mode of the device as the corresponding first device power supply mode; obtaining the fan working state of the device as the corresponding first device fan state; obtaining the total disk space of the device as the corresponding first device disk space total amount; obtaining the remaining disk space of the device as the corresponding first device disk space residual amount; evaluating the disk health state of the device to generate the corresponding first device disk health state percentage; evaluating the storage index parameters of the device to generate the corresponding first device storage index parameter set; forming corresponding first acquired data from the obtained first device IP address, first device name, first device time, first device CPU utilization rate, first device power supply mode, first device fan state, first device disk space total amount, first device disk space residual amount, first device disk health state percentage and first device storage index parameter set; after the first acquired data is obtained, acquiring the device system time once as the corresponding second device time; and forming the corresponding first device response data from the second device time and the first acquired data and returning it to the monitoring server.
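On the device side, the order of operations in claim 4 matters: the first device time is taken on receipt, the metrics are collected, and only then is the second device time taken. A minimal sketch follows, in which `collect_metrics` is a hypothetical callable supplying the device-specific values (CPU usage, power mode, fan state and so on), and `disk_space` shows one way to read disk totals with the standard library; all names and keys are assumptions.

```python
import shutil
import time
from typing import Callable, Dict, Tuple


def disk_space(path: str = ".") -> Tuple[int, int]:
    """Return (total bytes, free bytes) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.free


def build_response(
    collect_metrics: Callable[[], Dict[str, object]],
    clock: Callable[[], float] = time.time,
) -> Dict[str, object]:
    """Assemble first device response data for one query instruction."""
    first_device_time = clock()   # instruction receiving time
    acquired = dict(collect_metrics())
    acquired["first_device_time"] = first_device_time
    second_device_time = clock()  # taken after the data is acquired
    response = dict(acquired)
    response["second_device_time"] = second_device_time
    return response
```

How disk health or the storage index parameters are evaluated is device specific and is left to `collect_metrics` here.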
5. The method for monitoring a storage device according to claim 3, wherein the performing network timeout analysis according to the latest first record in the first record list generates and displays a corresponding first analysis result, and specifically includes:
extracting the latest first record in the first record list to serve as a corresponding current record; extracting the first server time, the second server time, the first equipment time and the second equipment time which are recorded currently as corresponding first time, second time, third time and fourth time;
calculating the absolute values of the time differences of the first time and the second time to generate corresponding first time differences; calculating the absolute values of the time differences of the first time and the third time to generate a corresponding second time difference; calculating the absolute values of the time differences of the second time and the fourth time to generate a corresponding third time difference;
identifying whether the first time difference exceeds a preset first time difference threshold; if yes, setting the corresponding first information as preset network receiving and transmitting timeout information; if not, setting the corresponding first information to be empty;
Identifying whether the second time difference exceeds a preset second time difference threshold; if yes, setting the corresponding second information as preset network sending timeout information; if not, setting the corresponding second information to be empty;
identifying whether the third time difference exceeds a preset third time difference threshold; if yes, setting the corresponding third information as preset network receiving timeout information; if not, setting the corresponding third information to be empty;
identifying whether the first, second and third information are all empty; if yes, setting the corresponding first analysis information as preset network transceiving normal information; if not, forming the corresponding first analysis information from the first, second and third information;
and the first equipment IP address, the first equipment name and the first analysis information which are recorded currently form the corresponding first analysis result and are displayed.
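The three time-difference checks of claim 5 can be sketched as below. The threshold values, the message texts, the dictionary keys and the function name are assumptions; the claim only states that the thresholds and messages are preset.

```python
from typing import Dict, List, Tuple

# Hypothetical texts standing in for the preset timeout information.
TRANSCEIVE_TIMEOUT = "network send/receive timeout"
SEND_TIMEOUT = "network send timeout"
RECEIVE_TIMEOUT = "network receive timeout"
NETWORK_NORMAL = "network send/receive normal"


def analyze_network_timeout(
    record: Dict[str, float],
    thresholds: Tuple[float, float, float] = (1.0, 0.5, 0.5),  # seconds, assumed
) -> List[str]:
    """Return the first analysis information for the latest first record."""
    t1 = record["first_server_time"]
    t2 = record["second_server_time"]
    t3 = record["first_device_time"]
    t4 = record["second_device_time"]

    first_diff = abs(t2 - t1)   # whole round trip seen by the server
    second_diff = abs(t3 - t1)  # server send to device receive
    third_diff = abs(t4 - t2)   # device reply to server receive

    findings: List[str] = []
    if first_diff > thresholds[0]:
        findings.append(TRANSCEIVE_TIMEOUT)
    if second_diff > thresholds[1]:
        findings.append(SEND_TIMEOUT)
    if third_diff > thresholds[2]:
        findings.append(RECEIVE_TIMEOUT)
    return findings or [NETWORK_NORMAL]
```

Note that the second and third differences compare clocks on different machines, so in practice they are only meaningful if the server and device clocks are synchronised.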
6. The method for monitoring a storage device according to claim 3, wherein the analyzing the running state of the storage device according to the latest first record in the first record list to generate and display a corresponding second analysis result specifically includes:
Extracting the first record with the latest time from the first record list as a corresponding current record; extracting the first device IP address, the first device name, the first device CPU usage, the first device power mode, the first device fan state, the first device disk space total amount, the first device disk space remaining amount, the first device disk health status percentage, and the first device storage index parameter set recorded currently as a corresponding current device IP address, a current device name, a current device CPU usage, a current device power mode, a current device fan state, a current device disk space total amount, a current device disk space remaining amount, a current device disk health status percentage, and a current device storage index parameter set;
identifying whether the CPU utilization rate of the current equipment exceeds a preset CPU utilization rate warning threshold; if yes, setting the corresponding fourth information as preset excessive alarm information of equipment CPU resource occupation; if not, setting the corresponding fourth information to be null;
calculating the corresponding disk space availability according to the total amount of the disk space of the current device and the residual amount of the disk space of the current device, namely $\text{disk space availability} = \frac{\text{current device disk space remaining amount}}{\text{current device disk space total amount}}$; identifying whether the disk space availability is lower than a preset availability warning threshold; if yes, setting the corresponding fifth information as preset equipment disk availability insufficiency alarm information; if not, setting the corresponding fifth information to be null;
identifying whether the current equipment disk health status percentage is lower than a preset health status percentage warning threshold; if yes, setting the corresponding sixth information as preset equipment health status percentage lower alarm information; if not, setting the corresponding sixth information to be null;
identifying whether the first parameter values of all the first index parameters of the index parameter set stored by the current equipment meet a first parameter threshold range corresponding to each first parameter value; if yes, setting the corresponding seventh information to be null; if not, extracting the first parameter names of the first index parameters, of which the first parameter values do not meet the corresponding first parameter threshold range, in the current equipment storage index parameter set to form a corresponding first parameter name sequence, and forming corresponding seventh information by preset equipment storage index parameter abnormality warning information and the first parameter name sequence;
Identifying whether the fourth, fifth, sixth and seventh information are all empty; if yes, setting the corresponding second analysis information as preset equipment running state normal information; if not, the fourth, fifth, sixth and seventh information form the corresponding second analysis information;
and the corresponding second analysis result is formed by the current equipment IP address, the current equipment name and the second analysis information and displayed.
7. The method for monitoring a storage device according to claim 3, wherein said predicting the risk of running the storage device according to all the first records in the last specified period in the first record list to generate and display a corresponding first prediction result specifically includes:
extracting the first equipment IP address and the first equipment name of any one of the first records in the first record list as a corresponding current equipment IP address and a corresponding current equipment name;
extracting all the first records in the first record list within the latest appointed time period, and sequencing the first records in time sequence to generate a corresponding first record sequence;
Extracting the first server time, the second server time, the first device time, the second device time, the first device CPU usage rate, the first device power supply mode, the first device fan state, the first device disk space total amount, the first device disk space remaining amount, the first device disk health state percentage and the first device storage index parameter set of each first record in the first record sequence to form a corresponding first data vector; and forming a corresponding first data tensor by all the obtained first data vectors;
inputting the first data tensor into a preset running risk classification prediction model to perform running risk classification prediction processing to obtain a corresponding first prediction vector; the first prediction vector includes a plurality of first classification probabilities; each first classification probability corresponds to a preset risk type;
forming corresponding first-type prediction information by each first classification probability and the corresponding risk type; and the corresponding first prediction results are formed and displayed by all the obtained first type of prediction information.
8. An electronic device, comprising: memory, processor, and transceiver;
the processor being operative to couple with the memory, read and execute instructions in the memory to implement the method of any one of claims 1-7;
the transceiver is coupled to the processor and is controlled by the processor to transmit and receive messages.
9. A computer readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310480070.0A CN116450462A (en) | 2023-04-27 | 2023-04-27 | Processing method for monitoring storage equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116450462A | 2023-07-18 |
Family
ID=87121884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310480070.0A Pending CN116450462A (en) | 2023-04-27 | 2023-04-27 | Processing method for monitoring storage equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116450462A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||