CN111949429A - Server fault monitoring method and system based on density clustering algorithm - Google Patents

Server fault monitoring method and system based on density clustering algorithm

Publication number
CN111949429A
Authority
CN
China
Prior art keywords
sample
server
fault
monitoring
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010823489.8A
Other languages
Chinese (zh)
Inventor
杨柳
马晓光
赖一鹏
张永健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Chaoyue CNC Electronics Co Ltd
Original Assignee
Shandong Chaoyue CNC Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Chaoyue CNC Electronics Co Ltd filed Critical Shandong Chaoyue CNC Electronics Co Ltd
Priority to CN202010823489.8A priority Critical patent/CN111949429A/en
Publication of CN111949429A publication Critical patent/CN111949429A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447Performance evaluation by modeling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a server fault monitoring method and system based on a density clustering algorithm, belongs to the technical field of server fault monitoring, and aims to solve the technical problems of how to perform fault monitoring and early warning on a server and reset a downtime fault through the density-based clustering algorithm. The method comprises the following steps: acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample; carrying out normalization processing on the sample data; constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model to obtain a trained monitoring model; analyzing sample data in the test sample through the trained monitoring model and outputting a monitoring result; and if the server has faults in the monitoring result, the monitoring result is fed back to the web page to be displayed, and if the server is down, the server is reset through the BMC. The system comprises a data acquisition module, a data preprocessing module, a model training module and a result output module.

Description

Server fault monitoring method and system based on density clustering algorithm
Technical Field
The invention relates to the technical field of server fault monitoring, in particular to a server fault monitoring method and system based on a density clustering algorithm.
Background
A server is a computer characterized by fast operation, high load capacity, and strong performance, and the ability to run for long periods is one of its key performance indicators. Monitoring the running state of the server is an important means of ensuring long-term reliable operation. Once the server can no longer run normally, operations such as a reset must be performed through the server's remote controller, the BMC. The BMC can control the server, acquire health information such as voltage, temperature, and fan speed from the server, and process that information. The information acquired by the BMC can be analyzed in real time by machine learning algorithms, such as neural networks and clustering, to predict whether the server has a latent fault.
At present, artificial intelligence algorithms are diverse, including neural networks based on supervised learning and clustering algorithms based on unsupervised learning. Clustering algorithms are lightweight and fast: cluster centers are established, and the error is reduced over successive iterations to achieve classification and prediction. Determining the cluster centers in light of the data characteristics of the application scenario plays a vital role in the classification and prediction results. Clustering algorithms are divided into partitioning methods, hierarchical methods, density-based algorithms, graph-theoretic clustering methods, grid-based algorithms, model-based algorithms, and so on. Among these, density-based clustering algorithms overcome the limitation that distance-based algorithms can only find "quasi-circular" clusters.
Based on the advantages of the density-based clustering algorithm, how to perform fault monitoring and early warning on the server and reset the downtime fault through the density-based clustering algorithm is a technical problem to be solved.
Disclosure of Invention
The technical task of the invention is to provide a server fault monitoring method and system based on a density clustering algorithm to solve the problem of how to perform fault monitoring and early warning on a server and reset a downtime fault through the density-based clustering algorithm.
In a first aspect, the present invention provides a server fault monitoring method based on a density clustering algorithm, which analyzes health information data of a server and predicts an operating state of the server through the density clustering algorithm, performs fault early warning on the server, and resets when the server is down, and the method includes the following steps:
acquiring health information data of a server through a BMC (baseboard management controller) and constructing samples, wherein the samples are divided into training samples and test samples, the sample data in the training samples must be labeled with the current server state, and the server state comprises fault types and the numerical data corresponding to each fault type;
normalizing the sample data;
constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model;
inputting a test sample into a trained monitoring model, analyzing sample data in the test sample through the trained monitoring model and outputting a monitoring result, wherein the monitoring result comprises whether a server has a fault or not and a fault type;
and if the server has faults in the monitoring result, feeding the monitoring result back to the web page for displaying, further monitoring whether the server is down, and if the server is down, resetting the server through the BMC.
Preferably, the BMC obtains the health information data of the server through I2C.
Preferably, the health information data is health information data of relevant components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage.
Preferably, the method for optimizing the parameters of the monitoring model by taking the training samples as input comprises the following steps:
L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;
L200, selecting any sample point from the training samples and judging whether the sample point has been assigned a cluster label, and executing step L300 if the sample point has not been assigned a cluster label;
L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points in the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point, and if the number of sample points in the neighborhood sample subset is greater than the cluster sample number min_sample, marking the sample point as a core sample point and assigning a cluster label to the core sample point;
L400, traversing the neighborhood sample subset, judging whether each other sample point in the neighborhood sample subset has been assigned a cluster label, assigning the cluster label of the core sample point to every other sample point that has not been assigned one, and executing step L500;
L500, judging whether the other sample points that had not been assigned cluster labels are core sample points, and if so, executing steps L300-L400 for each such core sample point;
L600, selecting another unvisited sample point from the training samples and executing steps L200-L500 in sequence until all sample points in the training samples have been visited;
L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy meets the expected requirement, finally obtaining the trained monitoring model.
Preferably, the method for analyzing the test sample and outputting the monitoring result by the trained monitoring model comprises the following steps:
inputting a test sample into the trained monitoring model;
calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that type as the predicted fault type.
In a second aspect, the present invention provides a server fault monitoring system based on a density clustering algorithm, which is characterized in that a server is subjected to fault monitoring by the server fault monitoring method based on the density clustering algorithm according to any one of the first aspect, and the system includes:
a data acquisition module, configured to acquire health information data of a server through a BMC (baseboard management controller) and construct samples, wherein the samples are divided into training samples and test samples, the sample data in the training samples must be labeled with the current server state, and the server state comprises fault types and the numerical data corresponding to each fault type;
a data preprocessing module, configured to normalize the sample data;
a model training module, configured to construct a monitoring model based on the DBSCAN algorithm and optimize the parameters of the monitoring model with the training samples as input to obtain a trained monitoring model;
and a result output module, configured to input the test sample into the trained monitoring model, analyze the sample data in the test sample through the trained monitoring model, and output a monitoring result, wherein the monitoring result comprises whether the server has a fault and the fault type; if the monitoring result indicates that the server has a fault, the monitoring result is fed back to a web page for display and the server is further monitored for downtime, and if the server is down, the server is reset through the BMC.
Preferably, the BMC obtains the health information data of the server through I2C.
Preferably, the health information data is health information data of relevant components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage.
Preferably, the model training module is configured to train the monitoring model by:
L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;
L200, selecting any sample point from the training samples and judging whether the sample point has been assigned a cluster label, and executing step L300 if the sample point has not been assigned a cluster label;
L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points in the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point, and if the number of sample points in the neighborhood sample subset is greater than the cluster sample number min_sample, marking the sample point as a core sample point and assigning a cluster label to the core sample point;
L400, traversing the neighborhood sample subset, judging whether each other sample point in the neighborhood sample subset has been assigned a cluster label, assigning the cluster label of the core sample point to every other sample point that has not been assigned one, and executing step L500;
L500, judging whether the other sample points that had not been assigned cluster labels are core sample points, and if so, executing steps L300-L400 for each such core sample point;
L600, selecting another unvisited sample point from the training samples and executing steps L200-L500 in sequence until all sample points in the training samples have been visited;
L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy meets the expected requirement, finally obtaining the trained monitoring model.
Preferably, the result output module is configured to output the monitoring result by:
inputting a test sample into the trained monitoring model;
calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that type as the predicted fault type.
The server fault monitoring method and system based on the density clustering algorithm have the following advantages: the server health information data acquired by the BMC is analyzed by the clustering algorithm to predict the running state of the server, which realizes early warning of server faults; while issuing fault warnings, the server is monitored and reset when a downtime fault occurs, which improves the stability of the server.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block diagram of the server fault monitoring method based on the density clustering algorithm in embodiment 1;
FIG. 2 is a block diagram of the DBSCAN algorithm flow in the server fault monitoring method based on the density clustering algorithm in embodiment 1.
Detailed Description
The present invention is further described in the following with reference to the drawings and the specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not to be construed as limiting the present invention, and the embodiments and the technical features of the embodiments can be combined with each other without conflict.
The embodiment of the invention provides a server fault monitoring method and system based on a density clustering algorithm, which are used for solving the technical problems of how to carry out fault monitoring and early warning on a server and reset a downtime fault through the density-based clustering algorithm.
Example 1:
The invention provides a server fault monitoring method based on a density clustering algorithm, which analyzes health information data of a server through the density clustering algorithm, predicts the running state of the server, provides fault early warning for the server, and resets the server when it goes down. The method comprises the following steps:
S100, acquiring health information data of a server through a BMC (baseboard management controller) and constructing samples, wherein the samples are divided into training samples and test samples, the sample data in the training samples must be labeled with the current server state, and the server state comprises fault types and the numerical data corresponding to each fault type;
S200, normalizing the sample data;
S300, constructing a monitoring model based on the DBSCAN algorithm, and optimizing the parameters of the monitoring model with the training samples as input to obtain a trained monitoring model;
S400, inputting a test sample into the trained monitoring model, analyzing the sample data in the test sample through the trained monitoring model, and outputting a monitoring result, wherein the monitoring result comprises whether the server has a fault and the fault type; if the monitoring result indicates that the server has a fault, feeding the monitoring result back to the web page for display and further monitoring whether the server is down, and if the server is down, resetting the server through the BMC.
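The decision logic at the end of the steps above (display on fault, reset only on downtime) might be sketched as follows. This is a minimal illustration, not the patented implementation: the names MonitoringResult, handle_result, show_on_web, server_is_down, and bmc_reset are all hypothetical, and the three callbacks stand in for whatever web front end and BMC interface a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MonitoringResult:
    """Hypothetical container for the model output: fault flag plus fault type."""
    has_fault: bool
    fault_type: Optional[str] = None


def handle_result(
    result: MonitoringResult,
    show_on_web: Callable[[MonitoringResult], None],
    server_is_down: Callable[[], bool],
    bmc_reset: Callable[[], None],
) -> str:
    """Return the action taken: 'none', 'reported', or 'reset'."""
    if not result.has_fault:
        return "none"
    # A fault was detected: feed the result back to the web page first.
    show_on_web(result)
    # Only a downtime condition triggers a reset through the BMC.
    if server_is_down():
        bmc_reset()
        return "reset"
    return "reported"
```

Passing the side effects in as callbacks keeps the sketch independent of any particular BMC or web API, which the source does not specify.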
The BMC obtains the health information data of the server through I2C, where the health information data is the health information data of relevant components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage rate.
Because the acquired data are continuous and the value ranges of the different quantities differ greatly, the sample data must be normalized to a unified range. After normalization, the training samples are used as input to train the constructed monitoring model; the test samples are then used as input, and the trained monitoring model analyzes the sample data in the test samples to provide fault early warning for the server.
Fault monitoring based on the DBSCAN clustering algorithm is divided into two parts. In the first part, before monitoring, the neighborhood radius eps and the cluster sample number min_sample are determined from a certain amount of labeled data, and noise samples are removed. In the second part, during real-time monitoring, sample data are taken as input, and it is calculated whether the number of sample points within the neighborhood radius eps of the test data reaches min_sample; if so, the test data are assigned to that class. In this embodiment, the monitoring model is constructed based on the DBSCAN algorithm, and its parameters are optimized with the training samples as input through the following steps:
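The text does not say which normalization is used. As one common choice, a min-max scaling of each quantity into [0, 1] could look like the sketch below; the function name min_max_normalize and the sample values are illustrative assumptions only.

```python
def min_max_normalize(samples):
    """Scale each column of a list of equal-length rows into [0, 1].

    Min-max scaling is an assumed choice; the source only requires that
    the ranges be unified.
    """
    if not samples:
        return []
    columns = list(zip(*samples))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in samples:
        normalized.append([
            # A constant column carries no information; map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical readings: voltage (V), temperature (deg C), fan speed (RPM).
readings = [
    [12.0, 40.0, 3000.0],
    [11.0, 60.0, 6000.0],
    [12.5, 50.0, 4500.0],
]
scaled = min_max_normalize(readings)
```

In practice the minimum and maximum found on the training samples would likely be reused when scaling test samples, so that both are expressed on the same scale.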
L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;
L200, selecting any sample point from the training samples and judging whether the sample point has been assigned a cluster label, and executing step L300 if the sample point has not been assigned a cluster label;
L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points in the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point, and if the number of sample points in the neighborhood sample subset is greater than the cluster sample number min_sample, marking the sample point as a core sample point and assigning a cluster label to the core sample point;
L400, traversing the neighborhood sample subset, judging whether each other sample point in the neighborhood sample subset has been assigned a cluster label, assigning the cluster label of the core sample point to every other sample point that has not been assigned one, and executing step L500;
L500, judging whether the other sample points that had not been assigned cluster labels are core sample points, and if so, executing steps L300-L400 for each such core sample point;
L600, selecting another unvisited sample point from the training samples and executing steps L200-L500 in sequence until all sample points in the training samples have been visited;
L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy meets the expected requirement, finally obtaining the trained monitoring model.
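The clustering pass described in the steps above can be sketched in plain Python as follows. This is an illustrative reading rather than the patented implementation. Two points are assumptions: a point with exactly min_sample neighbors is treated as a core point (the text leaves the equality case open), and a point first marked as noise is relabeled as a border point when a later core point reaches it, as in standard DBSCAN. The names dbscan and region_query are hypothetical.

```python
import math

NOISE = -1
UNASSIGNED = None


def region_query(points, i, eps):
    """Indices of all other points within distance eps of points[i]."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_sample):
    """Return one label per point: a cluster index >= 0, or NOISE."""
    labels = [UNASSIGNED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNASSIGNED:           # L200: already labeled
            continue
        neighbors = region_query(points, i, eps)  # L300: neighborhood subset
        if len(neighbors) < min_sample:
            labels[i] = NOISE
            continue
        cluster += 1                              # new core point, new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:                              # L400-L500: expand the cluster
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster               # assumed: noise becomes border
                continue
            if labels[j] is not UNASSIGNED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_sample:    # j is itself a core point
                queue.extend(j_neighbors)
    return labels
```

The brute-force neighbor search is quadratic in the number of samples, which is acceptable for a sketch; a spatial index would be the usual replacement for larger data sets.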
The test sample is analyzed through the trained monitoring model and the monitoring result is output through the following steps:
L100, inputting a test sample into the trained monitoring model;
L200, calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that type as the predicted fault type.
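The prediction step amounts to a majority vote among the labeled training samples that fall within eps of the test sample. A minimal sketch follows; the function name predict_fault_type, the optional min_sample threshold, and the example fault labels are assumptions made for illustration.

```python
import math
from collections import Counter


def predict_fault_type(test_point, train_points, train_labels, eps, min_sample=1):
    """Majority fault type among training samples within eps of test_point.

    Returns None when no type reaches min_sample votes (an assumed rule;
    the source does not say what happens in that case).
    """
    votes = Counter(
        label
        for p, label in zip(train_points, train_labels)
        if math.dist(test_point, p) <= eps
    )
    if not votes:
        return None
    fault_type, count = votes.most_common(1)[0]
    return fault_type if count >= min_sample else None


# Hypothetical normalized training samples and their labeled states.
train_points = [(0.10, 0.10), (0.12, 0.10), (0.90, 0.90), (0.15, 0.12)]
train_labels = ["normal", "normal", "overheat", "overheat"]
predicted = predict_fault_type((0.11, 0.11), train_points, train_labels, eps=0.1)
```

When two fault types receive the same number of votes, Counter.most_common returns the one encountered first, so a real system would probably want an explicit tie-breaking rule.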
The server fault monitoring method based on the density clustering algorithm of this embodiment analyzes the health information data acquired by the BMC through the clustering algorithm and predicts the running state of the server, thereby realizing early warning of server faults and, during fault early warning, monitoring the server and resetting it when a downtime fault occurs.
Example 2:
The invention provides a server fault monitoring system based on a density clustering algorithm, which comprises a data acquisition module, a data preprocessing module, a model training module, and a result output module. The data acquisition module is used for acquiring health information data of a server through a BMC (baseboard management controller) and constructing samples, wherein the samples are divided into training samples and test samples, the sample data in the training samples must be labeled with the current server state, and the server state comprises fault types and the numerical data corresponding to each fault type. The data preprocessing module is used for normalizing the sample data. The model training module is used for constructing a monitoring model based on the DBSCAN algorithm and optimizing the parameters of the monitoring model with the training samples as input to obtain a trained monitoring model. The result output module is used for inputting the test sample into the trained monitoring model, analyzing the sample data in the test sample through the trained monitoring model, and outputting a monitoring result, wherein the monitoring result comprises whether the server has a fault and the fault type; if the monitoring result indicates that the server has a fault, the monitoring result is fed back to a web page for display and the server is further monitored for downtime, and if the server is down, the server is reset through the BMC.
The BMC obtains the health information data of the server through I2C, where the health information data is the health information data of relevant components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage rate.
The model training module is used for training the monitoring model through the following steps:
L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;
L200, selecting any sample point from the training samples and judging whether the sample point has been assigned a cluster label, and executing step L300 if the sample point has not been assigned a cluster label;
L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points in the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point, and if the number of sample points in the neighborhood sample subset is greater than the cluster sample number min_sample, marking the sample point as a core sample point and assigning a cluster label to the core sample point;
L400, traversing the neighborhood sample subset, judging whether each other sample point in the neighborhood sample subset has been assigned a cluster label, assigning the cluster label of the core sample point to every other sample point that has not been assigned one, and executing step L500;
L500, judging whether the other sample points that had not been assigned cluster labels are core sample points, and if so, executing steps L300-L400 for each such core sample point;
L600, selecting another unvisited sample point from the training samples and executing steps L200-L500 in sequence until all sample points in the training samples have been visited;
L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy meets the expected requirement, finally obtaining the trained monitoring model.
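The final parameter-adjustment step does not specify a search procedure. One possible reading is a small grid search that stops once a target accuracy is reached, as sketched below. Everything here is an assumption made for illustration: the function names, the candidate grids, and the leave-one-out accuracy measure, which scores each labeled training sample by the majority label of its neighbors within eps.

```python
import math
from collections import Counter
from itertools import product


def leave_one_out_accuracy(points, labels, eps, min_sample):
    """Fraction of samples whose neighbors within eps vote for their own label."""
    correct = 0
    for i, (p, truth) in enumerate(zip(points, labels)):
        votes = Counter(
            labels[j] for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if votes:
            predicted, count = votes.most_common(1)[0]
            if count >= min_sample and predicted == truth:
                correct += 1
    return correct / len(points)


def tune_parameters(points, labels, eps_grid, min_sample_grid, target):
    """Try (eps, min_sample) pairs in order; stop early once target is met.

    Returns (eps, min_sample, accuracy) for the best pair seen.
    """
    best = None
    for eps, min_sample in product(eps_grid, min_sample_grid):
        acc = leave_one_out_accuracy(points, labels, eps, min_sample)
        if best is None or acc > best[2]:
            best = (eps, min_sample, acc)
        if acc >= target:
            break
    return best
```

If no pair reaches the target, the function still returns the best pair it saw, which lets the caller decide whether to widen the grids.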
The result output module is used for outputting the monitoring result through the following steps:
inputting a test sample into the trained monitoring model;
calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that type as the predicted fault type.
The server fault monitoring system based on the density clustering algorithm of this embodiment performs fault monitoring on the server through the server fault monitoring method based on the density clustering algorithm disclosed in embodiment 1, feeds the fault type back to a web page for display, monitors the state of the server, and resets the server when it is down.
It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that many more embodiments of the invention are possible that combine the features of the different embodiments described above and still fall within the scope of the invention.

Claims (10)

1. A server fault monitoring method based on a density clustering algorithm is characterized in that health information data of a server are analyzed through the density clustering algorithm, the running state of the server is predicted, fault early warning is carried out on the server, and resetting is carried out when the server is down, and the method comprises the following steps:
acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample, wherein the sample is divided into a training sample and a testing sample, the sample data in the training sample needs to mark the current server state, and the server state comprises fault types and numerical data corresponding to various fault types;
normalizing the sample data;
constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model;
inputting a test sample into a trained monitoring model, analyzing sample data in the test sample through the trained monitoring model and outputting a monitoring result, wherein the monitoring result comprises whether a server has a fault or not and a fault type;
and if the server has faults in the monitoring result, feeding the monitoring result back to the web page for displaying, further monitoring whether the server is down, and if the server is down, resetting the server through the BMC.
2. The server fault monitoring method based on the density clustering algorithm as claimed in claim 1, wherein the BMC obtains the health information data of the server through I2C.
3. The server fault monitoring method based on the density clustering algorithm according to claim 1 or 2, wherein the health information data is health information data of relevant components in a server main board, including but not limited to voltage, current, temperature value, and CPU memory usage.
4. The server fault monitoring method based on the density clustering algorithm as claimed in claim 1, wherein training samples are taken as input to optimize parameters of the monitoring model, comprising the following steps:
L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;
L200, selecting any sample point from the training sample, judging whether the sample point has been allocated a cluster label, and executing step L300 if the sample point has not been allocated a cluster label;
L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points within the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point; and if the number of sample points within the neighborhood sample subset is greater than the cluster sample number min_sample, marking the sample point as a core sample point and allocating a cluster label to the core sample point;
L400, traversing the neighborhood sample subset, judging whether the other sample points in the neighborhood sample subset have been allocated cluster labels, allocating the cluster label of the core sample point to the other sample points that have not been allocated a cluster label, and executing step L500;
L500, judging whether the other sample points that had not been allocated cluster labels are core sample points, and if so, executing steps L300-L400 for each of the other sample points that is a core sample point;
L600, selecting another sample point that has not been visited from the training sample, and executing steps L200-L500 in sequence until all sample points in the training sample have been visited;
and L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy of sample classification meets the expected requirement, and finally obtaining a trained monitoring model.
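The clustering pass described in the steps of this claim can be sketched in plain Python as follows. This is an illustrative reading, not the applicant's implementation, and it rests on several assumptions the claim leaves open: distance is Euclidean, a neighborhood includes the point itself, a neighborhood of exactly min_sample points counts as core (the claim only names the "less than" and "greater than" cases), and a point first marked as noise is relabelled as a border point if a core point later reaches it, as in textbook DBSCAN. The names `dbscan`, `neighbors_within` and `NOISE` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for sample points that belong to no cluster


def neighbors_within(points: Sequence[Sequence[float]], index: int,
                     eps: float) -> List[int]:
    """Indices of all points within eps of points[index], itself included."""
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float,
           min_sample: int) -> List[Optional[int]]:
    """Return one cluster label per point (NOISE for noise points)."""
    labels: List[Optional[int]] = [None] * len(points)
    next_cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:          # already labelled or marked noise
            continue
        neighborhood = neighbors_within(points, i, eps)
        if len(neighborhood) < min_sample:
            labels[i] = NOISE              # too sparse to be a core point
            continue
        cluster = next_cluster             # point i is a core point
        next_cluster += 1
        labels[i] = cluster
        pending = list(neighborhood)       # points still to be examined
        while pending:
            j = pending.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # noise reached by a core: border
            elif labels[j] is None:
                labels[j] = cluster
                reach = neighbors_within(points, j, eps)
                if len(reach) >= min_sample:
                    pending.extend(reach)  # j is core too: keep expanding
    return labels


# Two dense groups and one isolated point (hypothetical data).
demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
demo_labels = dbscan(demo, eps=1.5, min_sample=3)
```

The expansion uses an explicit worklist instead of recursion, so large clusters do not hit the interpreter's recursion limit. The neighbor search here is a linear scan, which is adequate for a sketch but quadratic overall.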
5. The server fault monitoring method based on the density clustering algorithm as claimed in claim 4, wherein the test samples are analyzed by the trained monitoring model and the monitoring result is output, comprising the following steps:
inputting a test sample into the trained monitoring model;
calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps of the test sample data, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that fault type as the predicted fault type.
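The prediction rule of this claim amounts to a majority vote among the labelled training samples that fall within eps of the test sample. A minimal sketch follows, assuming Euclidean distance on normalized data; the function name `predict_fault_type`, the fault-type strings, and the choice to return `None` when no training sample lies within eps are all assumptions, since the claim does not cover the empty-neighborhood case.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def predict_fault_type(test_point: Sequence[float],
                       train_points: Sequence[Sequence[float]],
                       train_labels: Sequence[str],
                       eps: float) -> Optional[str]:
    """Most frequent label among training samples within eps of test_point."""
    votes = Counter(
        label
        for point, label in zip(train_points, train_labels)
        if math.dist(test_point, point) <= eps
    )
    if not votes:
        return None                      # nothing nearby: no prediction
    # On a tie, Counter.most_common keeps the label that was counted first.
    return votes.most_common(1)[0][0]


# Hypothetical normalized training data with its fault-type labels.
train_points = [(0.10, 0.10), (0.15, 0.10), (0.12, 0.20),
                (0.90, 0.90), (0.85, 0.95)]
train_labels = ["normal", "normal", "over_temperature",
                "over_voltage", "over_voltage"]

near_normal = predict_fault_type((0.12, 0.12), train_points, train_labels, 0.2)
near_fault = predict_fault_type((0.88, 0.90), train_points, train_labels, 0.2)
```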
6. A server fault monitoring system based on a density clustering algorithm, which is characterized in that the fault monitoring is carried out on a server by the server fault monitoring method based on the density clustering algorithm according to any one of claims 1 to 5, and the system comprises:
the data acquisition module is used for acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample, wherein the sample is divided into a training sample and a test sample, the sample data in the training sample is labeled with the current server state, and the server state comprises fault types and numerical data corresponding to the various fault types;
the data preprocessing module is used for carrying out normalization processing on the sample data;
the model training module is used for constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model;
and the result output module is used for inputting the test sample into the trained monitoring model, analyzing the sample data in the test sample through the trained monitoring model and outputting a monitoring result, wherein the monitoring result comprises whether the server has a fault and the fault type; if the monitoring result indicates that the server has a fault, the monitoring result is fed back to a web page for display and whether the server is down is further monitored, and if the server is down, the server is reset through the BMC.
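The decision flow of the result output module (display on fault, then check for downtime, then reset) can be sketched as below. The claim does not specify how the web page is updated, how downtime is detected, or which BMC command performs the reset, so all three are passed in as callables; `MonitoringResult`, `handle_monitoring_result` and the action strings are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MonitoringResult:
    has_fault: bool
    fault_type: Optional[str] = None


def handle_monitoring_result(result: MonitoringResult,
                             publish_to_web: Callable[[MonitoringResult], None],
                             server_is_down: Callable[[], bool],
                             reset_via_bmc: Callable[[], None]) -> List[str]:
    """Apply the claimed flow and return the actions that were taken."""
    actions: List[str] = []
    if not result.has_fault:
        return actions                 # healthy server: nothing to do
    publish_to_web(result)             # feed the result back for display
    actions.append("published")
    if server_is_down():               # only a downed server is reset
        reset_via_bmc()
        actions.append("reset")
    return actions


# Stand-ins for the web page and the BMC, recording what they were asked to do.
log: List[tuple] = []
taken = handle_monitoring_result(
    MonitoringResult(True, "over_temperature"),
    publish_to_web=lambda r: log.append(("web", r.fault_type)),
    server_is_down=lambda: True,
    reset_via_bmc=lambda: log.append(("bmc", "reset")),
)
```

Injecting the three operations keeps the flow testable without hardware; in a real deployment they would be bound to whatever web backend and BMC interface the system uses.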
7. The server fault monitoring system based on the density clustering algorithm as claimed in claim 6, wherein the BMC obtains health information data of the server through I2C.
8. The server fault monitoring system based on density clustering algorithm according to claim 6, wherein the health information data is health information data of related components in the server motherboard, including but not limited to voltage, current, temperature value, and CPU memory usage.
9. The server fault monitoring system based on density clustering algorithm according to claim 6, wherein the model training module is configured to train the monitoring model by:
L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;
L200, selecting any sample point from the training sample, judging whether the sample point has been allocated a cluster label, and executing step L300 if the sample point has not been allocated a cluster label;
L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points within the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point; and if the number of sample points within the neighborhood sample subset is greater than the cluster sample number min_sample, marking the sample point as a core sample point and allocating a cluster label to the core sample point;
L400, traversing the neighborhood sample subset, judging whether the other sample points in the neighborhood sample subset have been allocated cluster labels, allocating the cluster label of the core sample point to the other sample points that have not been allocated a cluster label, and executing step L500;
L500, judging whether the other sample points that had not been allocated cluster labels are core sample points, and if so, executing steps L300-L400 for each of the other sample points that is a core sample point;
L600, selecting another sample point that has not been visited from the training sample, and executing steps L200-L500 in sequence until all sample points in the training sample have been visited;
and L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy of sample classification meets the expected requirement, and finally obtaining a trained monitoring model.
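The final step of this claim, adjusting eps and min_sample until the classification accuracy is acceptable, does not prescribe a search strategy. One simple possibility, shown here purely as an assumption, is an exhaustive search over candidate values that stops as soon as a target accuracy is reached. The function `tune_neighborhood_parameters`, the `accuracy_of` callback, and the accuracy figures in the lookup table are hypothetical; in practice `accuracy_of` would cluster the training sample with the given parameters and compare the outcome against the labelled server states.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def tune_neighborhood_parameters(
        eps_candidates: Iterable[float],
        min_sample_candidates: Iterable[int],
        accuracy_of: Callable[[float, int], float],
        target_accuracy: float) -> Optional[Tuple[float, int, float]]:
    """Return (eps, min_sample, accuracy) of the best setting tried.

    The search stops early at the first setting that meets the target;
    otherwise the most accurate setting seen is returned.
    """
    best: Optional[Tuple[float, int, float]] = None
    for eps, min_sample in product(eps_candidates, min_sample_candidates):
        accuracy = accuracy_of(eps, min_sample)
        if best is None or accuracy > best[2]:
            best = (eps, min_sample, accuracy)
        if accuracy >= target_accuracy:
            break                          # expected requirement met
    return best


# Made-up accuracies standing in for a real evaluation on labelled samples.
fake_accuracy = {(0.1, 3): 0.70, (0.1, 5): 0.75,
                 (0.3, 3): 0.92, (0.3, 5): 0.96}

chosen = tune_neighborhood_parameters(
    [0.1, 0.3], [3, 5],
    accuracy_of=lambda eps, m: fake_accuracy[(eps, m)],
    target_accuracy=0.95,
)
```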
10. The server fault monitoring system based on the density clustering algorithm as claimed in claim 6, wherein the result output module is configured to output the monitoring result by:
inputting a test sample into the trained monitoring model;
calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps of the test sample data, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that fault type as the predicted fault type.
CN202010823489.8A 2020-08-17 2020-08-17 Server fault monitoring method and system based on density clustering algorithm Pending CN111949429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010823489.8A CN111949429A (en) 2020-08-17 2020-08-17 Server fault monitoring method and system based on density clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010823489.8A CN111949429A (en) 2020-08-17 2020-08-17 Server fault monitoring method and system based on density clustering algorithm

Publications (1)

Publication Number Publication Date
CN111949429A (en) 2020-11-17

Family

ID=73341983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010823489.8A Pending CN111949429A (en) 2020-08-17 2020-08-17 Server fault monitoring method and system based on density clustering algorithm

Country Status (1)

Country Link
CN (1) CN111949429A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222782A (en) * 2019-06-13 2019-09-10 齐鲁工业大学 Supervised binary classification data analysis method and system based on density clustering
CN111143173A (en) * 2020-01-02 2020-05-12 山东超越数控电子股份有限公司 Server fault monitoring method and system based on neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
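石鑫, 朱永利: "Research on clustering of power transformer condition monitoring data", Electric Power Information and Communication Technology (电力信息与通信技术), vol. 13, no. 11, pages 32-34 *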

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032218A (en) * 2021-03-26 2021-06-25 山东英信计算机技术有限公司 Server fault detection method, system and computer readable storage medium
CN113032218B (en) * 2021-03-26 2022-07-29 山东英信计算机技术有限公司 Server fault detection method, system and computer readable storage medium
CN113347079A (en) * 2021-05-31 2021-09-03 中国工商银行股份有限公司 Mail identification method, mail identification device, electronic equipment and readable storage medium
CN113347079B (en) * 2021-05-31 2022-12-09 中国工商银行股份有限公司 Mail identification method, mail identification device, electronic equipment and readable storage medium
CN113554055A (en) * 2021-06-11 2021-10-26 杭州玖欣物联科技有限公司 Processing condition identification method based on clustering algorithm
CN113835962A (en) * 2021-09-24 2021-12-24 超越科技股份有限公司 Server fault detection method and device, computer equipment and storage medium
WO2023045512A1 (en) * 2021-09-24 2023-03-30 浪潮集团有限公司 Method and system for performing fault detection on industrial hardware on basis of machine learning
WO2024169123A1 (en) * 2023-02-13 2024-08-22 浪潮通用软件有限公司 Clustering-algorithm-based cluster control device health monitoring method and device, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201117