CN111949429A - Server fault monitoring method and system based on density clustering algorithm

Info

Publication number: CN111949429A
Application number: CN202010823489.8A
Authority: CN
Inventors: 杨柳; 马晓光; 赖一鹏; 张永健
Original assignee: Shandong Chaoyue CNC Electronics Co Ltd
Current assignee: Shandong Chaoyue CNC Electronics Co Ltd
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2020-11-17

Abstract

The invention discloses a server fault monitoring method and system based on a density clustering algorithm, belongs to the technical field of server fault monitoring, and aims to solve the technical problems of how to perform fault monitoring and early warning on a server and reset a downtime fault through the density-based clustering algorithm. The method comprises the following steps: acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample; carrying out normalization processing on the sample data; constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model to obtain a trained monitoring model; analyzing sample data in the test sample through the trained monitoring model and outputting a monitoring result; and if the server has faults in the monitoring result, the monitoring result is fed back to the web page to be displayed, and if the server is down, the server is reset through the BMC. The system comprises a data acquisition module, a data preprocessing module, a model training module and a result output module.

Description

Server fault monitoring method and system based on density clustering algorithm

Technical Field

The invention relates to the technical field of server fault monitoring, in particular to a server fault monitoring method and system based on a density clustering algorithm.

Background

A server is a computer with fast operation, high load capacity and strong performance, and long-term operation is one of its important performance indicators. Monitoring the running state of the server is an important means of guaranteeing long-term reliable operation. Once the server fails to operate normally, operations such as a reset must be performed through the server's remote controller, the BMC. The BMC can control the server, acquire health information such as voltage, temperature and fan speed from it, and process that information. The information acquired by the BMC can be analyzed in real time by machine learning algorithms such as neural networks and clustering to predict whether the server has a latent fault.

At present, artificial intelligence algorithms are diverse, for example neural networks based on supervised learning and clustering algorithms based on unsupervised learning. Clustering algorithms are lightweight and fast: cluster centers are established, and the error is reduced iteratively to achieve classification and prediction. Determining the cluster centers in combination with the data characteristics of the application scenario plays a vital role in the classification and prediction results. Clustering algorithms are divided into partitioning methods, hierarchical methods, density-based algorithms, graph-theoretic clustering methods, grid-based algorithms, model-based algorithms and the like. Density-based clustering algorithms overcome the limitation that distance-based algorithms can only find "quasi-circular" clusters.

Based on the advantages of the density-based clustering algorithm, how to perform fault monitoring and early warning on the server and reset the downtime fault through the density-based clustering algorithm is a technical problem to be solved.

Disclosure of Invention

The technical task of the invention is to provide a server fault monitoring method and system based on a density clustering algorithm to solve the problem of how to perform fault monitoring and early warning on a server and reset a downtime fault through the density-based clustering algorithm.

In a first aspect, the present invention provides a server fault monitoring method based on a density clustering algorithm, which analyzes health information data of a server and predicts an operating state of the server through the density clustering algorithm, performs fault early warning on the server, and resets when the server is down, and the method includes the following steps:

acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample, wherein the sample is divided into a training sample and a testing sample, the sample data in the training sample needs to mark the current server state, and the server state comprises fault types and numerical data corresponding to various fault types;

normalizing the sample data;

constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model;

inputting a test sample into a trained monitoring model, analyzing sample data in the test sample through the trained monitoring model and outputting a monitoring result, wherein the monitoring result comprises whether a server has a fault or not and a fault type;

and if the server has faults in the monitoring result, feeding the monitoring result back to the web page for displaying, further monitoring whether the server is down, and if the server is down, resetting the server through the BMC.

Preferably, the BMC obtains the health information data of the server through I2C.

Preferably, the health information data is health information data of relevant components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage.

Preferably, the method for optimizing the parameters of the monitoring model by taking the training samples as input comprises the following steps:

L100, setting neighborhood parameters, wherein the neighborhood parameters comprise a neighborhood radius eps and a cluster sample number min_sample;

L200, selecting any sample point from the training samples, judging whether the sample point has been allocated a cluster label, and executing step L300 if it has not;

L300, finding all other sample points within the neighborhood radius eps of the sample point and forming a neighborhood sample subset from them; if the number of sample points in the neighborhood sample subset is less than the cluster sample number min_sample, marking the sample point as a noise point, and if the number is greater than the cluster sample number min_sample, marking the sample point as a core sample point and allocating a cluster label to it;

L400, traversing the neighborhood sample subset, judging whether other sample points in it have not been allocated cluster labels, allocating the cluster label of the core sample point to those that have not, and executing step L500;

L500, judging whether those other sample points are core sample points, and if so, executing steps L300-L400 for each other sample point that is a core sample point;

L600, selecting another unvisited sample point from the training samples and executing steps L200-L500 in sequence until all sample points in the training samples have been visited;

and L700, adjusting the neighborhood parameters according to the accuracy of sample classification until the accuracy meets the expected requirement, finally obtaining a trained monitoring model.
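The cluster-labelling steps above can be sketched in plain Python as follows. This is a minimal illustration and not the patented implementation: the function names `region_query` and `dbscan`, the Euclidean distance, and the choice to treat a neighborhood of exactly min_sample points as a core point (as in the conventional DBSCAN formulation) are all assumptions made for the example.

```python
import math

NOISE = -1          # label given to noise points
UNASSIGNED = None   # sample that has no cluster label yet


def region_query(points, idx, eps):
    """Indices of all other points within distance eps of points[idx]."""
    return [j for j, q in enumerate(points)
            if j != idx and math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_sample):
    """Give every point a cluster label (0, 1, ...) or NOISE."""
    labels = [UNASSIGNED] * len(points)
    cluster = -1
    for i in range(len(points)):            # visit every sample once
        if labels[i] is not UNASSIGNED:     # already labelled: skip
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_sample:    # too sparse: noise (for now)
            labels[i] = NOISE
            continue
        cluster += 1                        # core point starts a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:                        # grow the cluster outwards
            j = queue.pop()
            if labels[j] == NOISE:          # earlier noise becomes a border point
                labels[j] = cluster
                continue
            if labels[j] is not UNASSIGNED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_sample:   # j is itself a core point
                queue.extend(j_neighbours)
    return labels
```

Parameter adjustment is deliberately left out of this sketch; it only shows how one choice of eps and min_sample partitions the training samples into clusters and noise.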

Preferably, the method for analyzing the test sample and outputting the monitoring result by the trained monitoring model comprises the following steps:

inputting a test sample into the trained monitoring model;

calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that type as the predicted fault type.
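The prediction rule described above amounts to a majority vote among the labeled training samples that fall within eps of the test sample. A hedged sketch follows; the function name `predict_fault_type`, the Euclidean distance, the fault-type strings in the usage note, and the choice to return `None` for an empty neighborhood are illustrative assumptions rather than details given in the text.

```python
import math
from collections import Counter


def predict_fault_type(train_points, train_labels, sample, eps):
    """Most frequent fault type among training samples within eps of `sample`.

    Returns None when no training sample lies within eps (an assumption;
    the source does not say how an empty neighborhood is handled).
    """
    votes = Counter(
        label
        for point, label in zip(train_points, train_labels)
        if math.dist(point, sample) <= eps
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For example, with three normalized training samples near (0.1, 0.1) labeled "normal", "normal" and "over_voltage", a test sample at (0.12, 0.12) with eps = 0.1 would be assigned "normal".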

In a second aspect, the present invention provides a server fault monitoring system based on a density clustering algorithm, which is characterized in that a server is subjected to fault monitoring by the server fault monitoring method based on the density clustering algorithm according to any one of the first aspect, and the system includes:

the data acquisition module is used for acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample, wherein the sample is divided into a training sample and a testing sample, the sample data in the training sample needs to mark the current server state, and the server state comprises fault types and numerical data corresponding to various fault types;

the data preprocessing module is used for carrying out normalization processing on the sample data;

the model training module is used for constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model;

and the result output module is used for inputting the test sample into the trained monitoring model, analyzing the sample data in the test sample through the trained monitoring model and outputting a monitoring result, wherein the monitoring result comprises whether the server has a fault and a fault type; if the server has a fault in the monitoring result, the monitoring result is fed back to a web page for displaying and whether the server is down is further monitored, and if the server is down, the server is reset through the BMC.

Preferably, the model training module is configured to train the monitoring model by:

Preferably, the result output module is configured to output the monitoring result by:

inputting a test sample into the trained monitoring model;

The server fault monitoring method and system based on the density clustering algorithm have the following advantages: the server health information data acquired by the BMC is analyzed through the clustering algorithm and the running state of the server is predicted, so that early warning of server faults is realized; during fault early warning the server is monitored and reset when a downtime fault occurs, which improves the stability of the server.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a server failure monitoring method based on a density clustering algorithm in accordance with embodiment 1;

FIG. 2 is a block diagram of the DBSCAN algorithm flow in the server fault monitoring method based on the density clustering algorithm in embodiment 1.

Detailed Description

The present invention is further described in the following with reference to the drawings and the specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not to be construed as limiting the present invention, and the embodiments and the technical features of the embodiments can be combined with each other without conflict.

The embodiment of the invention provides a server fault monitoring method and system based on a density clustering algorithm, which are used for solving the technical problems of how to carry out fault monitoring and early warning on a server and reset a downtime fault through the density-based clustering algorithm.

Example 1:

The server fault monitoring method based on a density clustering algorithm of this embodiment analyzes health information data of a server through the density clustering algorithm, predicts the running state of the server, performs fault early warning on the server and resets the server when it goes down. The method comprises the following steps:

S100, acquiring health information data of a server through a BMC (baseboard management controller) and constructing a sample, wherein the sample is divided into a training sample and a testing sample, the sample data in the training sample needs to mark the current server state, and the server state comprises fault types and numerical data corresponding to various fault types;

S200, normalizing the sample data;

S300, constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model;

S400, inputting a test sample into the trained monitoring model, analyzing sample data in the test sample through the trained monitoring model, and outputting a monitoring result, wherein the monitoring result comprises whether the server has a fault and a fault type; if the server has a fault in the monitoring result, feeding the monitoring result back to the web page for displaying and further monitoring whether the server is down, and if the server is down, resetting the server through the BMC.
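The follow-up actions in the last step above (display a fault on the web page, check for downtime, reset through the BMC) can be expressed as a small decision routine. The sketch below is only illustrative: `MonitoringResult`, `handle_result` and the three injected callables `show_on_web`, `server_is_down` and `reset_via_bmc` are hypothetical names standing in for whatever web interface and BMC commands a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MonitoringResult:
    has_fault: bool
    fault_type: Optional[str] = None


def handle_result(
    result: MonitoringResult,
    show_on_web: Callable[[MonitoringResult], None],   # hypothetical web hook
    server_is_down: Callable[[], bool],                # hypothetical liveness check
    reset_via_bmc: Callable[[], None],                 # hypothetical BMC reset command
) -> str:
    """Apply the follow-up actions and report which branch was taken."""
    if not result.has_fault:
        return "ok"
    show_on_web(result)        # feed the monitoring result back to the web page
    if server_is_down():       # further monitor whether the server is down
        reset_via_bmc()        # reset the server through the BMC
        return "reset"
    return "warned"
```

Passing the side effects in as callables keeps the decision logic testable without a BMC or a web server.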

The BMC obtains the health information data of the server through I2C, where the health information data is the health information data of relevant components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage rate.

Because the acquired data is continuous data and the range difference of each data is large, the sample data needs to be normalized and the range is unified. After normalization processing, training samples are used as input, a constructed monitoring model is trained, test samples are used as input, and the trained monitoring model is used for analyzing sample data in the test samples so as to perform fault early warning on a server.
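The text does not specify which normalization is used. As one common possibility, the sketch below applies min-max scaling, with the bounds learned from the training samples and reused for test samples. The function names `fit_min_max` and `normalize`, and the example feature order (voltage, temperature, fan speed), are assumptions made for illustration.

```python
def fit_min_max(samples):
    """Per-feature (min, max) pairs computed from the training samples."""
    columns = list(zip(*samples))
    return [(min(col), max(col)) for col in columns]


def normalize(sample, bounds):
    """Min-max scale one sample with previously fitted bounds.

    Values inside the training range map to [0, 1]; values outside it
    fall outside that interval. Constant features map to 0.0.
    """
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(sample, bounds)
    ]


# Hypothetical readings: voltage (V), temperature (deg C), fan speed (rpm)
training = [[12.0, 40.0, 3000.0], [11.0, 80.0, 9000.0], [12.5, 60.0, 6000.0]]
bounds = fit_min_max(training)
scaled = normalize([11.0, 80.0, 6000.0], bounds)   # [0.0, 1.0, 0.5]
```

After scaling, a single neighborhood radius eps is meaningful across features whose raw ranges differ by orders of magnitude.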

Fault monitoring based on the DBSCAN clustering algorithm is divided into two parts. In the first part, before monitoring, the neighborhood radius eps and the cluster sample number min_sample are determined from a certain amount of labeled data, and noise samples are removed. In the second part, during real-time monitoring, sample data is taken as input, and it is checked whether the number of sample points within the neighborhood radius eps of the test data reaches min_sample; if so, the test data is assigned to that class. In this embodiment, the monitoring model is constructed based on the DBSCAN algorithm, and its parameters are optimized with the training samples as input.
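The text does not say how eps and min_sample are adjusted or how classification accuracy is measured. One plausible reading, sketched below under stated assumptions, is a grid search that scores each parameter pair by leave-one-out accuracy on the labeled samples, using the neighborhood vote with a min_sample threshold as the classifier. The names `classify`, `leave_one_out_accuracy` and `tune`, the candidate grids, and the stopping target are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def classify(points, labels, sample, eps, min_sample, skip=None):
    """Majority label within eps of `sample`, or None if the neighborhood
    holds fewer than min_sample points. `skip` excludes one training index."""
    votes = Counter(
        labels[j] for j, p in enumerate(points)
        if j != skip and math.dist(p, sample) <= eps
    )
    if not votes or sum(votes.values()) < min_sample:
        return None
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, eps, min_sample):
    """Fraction of labeled samples recovered when each is held out in turn."""
    hits = sum(
        classify(points, labels, p, eps, min_sample, skip=i) == labels[i]
        for i, p in enumerate(points)
    )
    return hits / len(points)


def tune(points, labels, eps_grid, min_sample_grid, target):
    """Try parameter pairs in order; stop at the first one whose accuracy
    reaches `target`, otherwise return the best pair seen."""
    best = None
    for eps, min_sample in product(eps_grid, min_sample_grid):
        acc = leave_one_out_accuracy(points, labels, eps, min_sample)
        if best is None or acc > best[2]:
            best = (eps, min_sample, acc)
        if acc >= target:
            break
    return best   # (eps, min_sample, accuracy)
```

Any other accuracy measure or search strategy could be substituted; the point is only that the parameters are tuned against labeled data until the expected accuracy is met.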

The test sample is analyzed through the trained monitoring model and a monitoring result is output by the following steps:

L100, inputting a test sample into the trained monitoring model;

L200, calculating, through the trained monitoring model, the number of samples of each fault type within the neighborhood radius eps, selecting the fault type with the largest number of samples as the type of the test sample data, and outputting that type as the predicted fault type.

The server fault monitoring method based on the density clustering algorithm of this embodiment analyzes the health information data acquired by the BMC through the clustering algorithm and predicts the running state of the server, thereby realizing early warning of server faults; during fault early warning, the server is monitored and reset when a downtime fault occurs.

Example 2:

the invention discloses a server fault monitoring system based on a density clustering algorithm, which comprises a data acquisition module, a data preprocessing module, a model training module and a result output module, wherein the data acquisition module is used for acquiring health information data of a server through a BMC (baseboard management controller) and constructing samples, the samples are divided into training samples and testing samples, the sample data in the training samples are required to mark the current server state, and the server state comprises fault types and numerical data corresponding to various fault types; the data preprocessing module is used for carrying out normalization processing on the sample data; the model training module is used for constructing a monitoring model based on a DBSCAN algorithm, and optimizing parameters of the monitoring model by taking a training sample as input to obtain a trained monitoring model; the result output module is used for inputting the test sample into the trained monitoring model, analyzing the sample data in the test sample through the trained monitoring model and outputting a monitoring result, wherein the monitoring result comprises whether the server has a fault and a fault type, if the server has a fault in the monitoring result, the monitoring result is fed back to a web page to be displayed, whether the server is down is further monitored, and if the server is down, the resetting operation is carried out on the server through the BMC.

The model training module is used for training the monitoring model through the following steps:

The result output module is used for outputting the monitoring result through the following steps:

inputting a test sample into the trained monitoring model;

The server fault monitoring system based on the density clustering algorithm of this embodiment performs fault monitoring on the server through the server fault monitoring method disclosed in embodiment 1, feeds the fault type back to a web page for display, monitors the state of the server, and performs a reset operation when the server is down.

It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.

While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that many more embodiments of the invention are possible that combine the features of the different embodiments described above and still fall within the scope of the invention.

Claims

1. A server fault monitoring method based on a density clustering algorithm is characterized in that health information data of a server are analyzed through the density clustering algorithm, the running state of the server is predicted, fault early warning is carried out on the server, and resetting is carried out when the server is down, and the method comprises the following steps:

normalizing the sample data;

2. The server fault monitoring method based on the density clustering algorithm as claimed in claim 1, wherein the BMC obtains the health information data of the server through I2C.

3. The server fault monitoring method based on the density clustering algorithm according to claim 1 or 2, wherein the health information data is health information data of relevant components in a server main board, including but not limited to voltage, current, temperature value, and PCU memory usage.

4. The server fault monitoring method based on the density clustering algorithm as claimed in claim 1, wherein training samples are taken as input to optimize parameters of the monitoring model, comprising the following steps:

5. The server fault monitoring method based on the density clustering algorithm as claimed in claim 4, wherein the test samples are analyzed by the trained monitoring model and the monitoring result is output, comprising the following steps:

inputting a test sample into the trained monitoring model;

6. A server fault monitoring system based on a density clustering algorithm, which is characterized in that the fault monitoring is carried out on a server by the server fault monitoring method based on the density clustering algorithm according to any one of claims 1 to 5, and the system comprises:

7. The server fault monitoring system based on the density clustering algorithm as claimed in claim 6, wherein the BMC obtains health information data of the server through I2C.

8. The server fault monitoring system based on density clustering algorithm according to claim 6, wherein the health information data is health information data of related components in the server motherboard, including but not limited to voltage, current, temperature value, and PCU memory usage.

9. The server fault monitoring system based on density clustering algorithm according to claim 6, wherein the model training module is configured to train the monitoring model by:

10. The server fault monitoring system based on the density clustering algorithm as claimed in claim 6, wherein the result output module is configured to output the monitoring result by:

inputting a test sample into the trained monitoring model;